diff --git a/news/README.md b/news/README.md
index df87ec92658a8eee770ac07f2eb8b6946e0349ff..72b5fee068df8c384d65bd6a333494db61a7ad71 100644
--- a/news/README.md
+++ b/news/README.md
@@ -5,6 +5,1944 @@
 * [2022](2022.md)
 * [2023 - First Half](2023-1st-half.md)
+## 20231224: Issue 71
+
+### Kernel News
+
+#### RISC-V Architecture Support
+
+**[v3: Add support for reading D1 efuse speed bin](http://lore.kernel.org/linux-riscv/20231222111407.104270-1-fusibrandon13@gmail.com/)**
+
+> This series is an attempt to get feedback on decoding D1 efuse speed bins
+> in the Sun50i H6 cpufreq driver, and turning the result into a meaningful
+> value that selects voltage ranges in an OPP table.
+>
+
+**[v10: StarFive's Pulse Width Modulation driver support](http://lore.kernel.org/linux-riscv/20231222094548.54103-1-william.qiu@starfivetech.com/)**
+
+> This patchset adds initial rudimentary support for the StarFive
+> Pulse Width Modulation controller driver. And this driver will
+> be used in StarFive's VisionFive 2 board.The first patch add
+> Documentations for the device and Patch 2 adds device probe for
+> the module.
+>
+
+**[v1: Introduce & Optimize compat-mode helpers](http://lore.kernel.org/linux-riscv/20231222074605.452452-1-leobras@redhat.com/)**
+
+> I just saw the opportunity of optimizing the helper is_compat_task() by
+> introducing a compile-time test, and it made possible to remove some
+> #ifdef's without any loss of performance.
+>
+
+**[GIT PULL: RISC-V Devicetrees for v6.8](http://lore.kernel.org/linux-riscv/20231221-skimmed-boxy-b78aed8afdc4@spud/)**
+
+> Please pull dt changes for RISC-V. I've got the T-Head SoCs here as
+> Jisheng has been busy IRL this cycle and not had the time to set up
+> branches etc yet.
+>
+
+**[GIT PULL: RISC-V cache drivers for v6.8](http://lore.kernel.org/linux-riscv/20231221-catatonic-monday-d4c61283b136@spud/)**
+
+> Please pull the move of the ccache driver out of drivers/soc and the
+> addition of support for the non-standard non-coherent cache operations
+> on the jh7100. Despite it being an early(ish) SoC and being succeeded by
+> the jh7110, there's still people actively adding mainline support for
+> some of the peripherals etc.
+>
+
+**[GIT PULL: RISC-V SoC drivers for v6.8](http://lore.kernel.org/linux-riscv/20231221-droop-unblock-81e4fe14acee@spud/)**
+
+> The FPGA preprogramming driver that we discussed a while back (was it
+> at LPC?) is in this PR. It's in the soc-drivers branch because there's
+> a bunch of pre-req patches to the existing soc driver that it relies on.
+> I'll put stuff for drivers/firmware into a separate branch in the future.
+>
+
+**[v1: Add Pinctrl driver for Starfive JH8100 SoC](http://lore.kernel.org/linux-riscv/20231221083622.3445726-1-yuklin.soo@starfivetech.com/)**
+
+> Starfive JH8100 SoC consists of 4 pinctrl domains - sys_east,
+> sys_west, sys_gmac, and aon. This patch series adds pinctrl
+> drivers for these 4 pinctrl domains and this patch series is
+> depending on the JH8100 base patch series in [1] and [2].
+> The relevant dt-binding documentation for each pinctrl domain has
+> been updated accordingly.
+>
+
+**[v1: riscv: vector: Check SR_SD before saving vstate](http://lore.kernel.org/linux-riscv/20231221070449.1809020-1-songshuaishuai@tinylab.org/)**
+
+> The SD bit summarizes the dirty states of FS, VS, or XS fields,
+> providing a "fast check" before saving fstate or vstate.
+>
+
+**[v13: riscv: Add fine-tuned checksum functions](http://lore.kernel.org/linux-riscv/20231220-optimize_checksum-v13-0-a73547e1cad8@rivosinc.com/)**
+
+> Each architecture generally implements fine-tuned checksum functions to
+> leverage the instruction set. This patch adds the main checksum
+> functions that are used in networking. Tested on QEMU, this series
+> allows the CHECKSUM_KUNIT tests to complete an average of 50.9% faster.
+>
+
+**[v4: RISC-V: Add steal-time support](http://lore.kernel.org/linux-riscv/20231220160012.40184-15-ajones@ventanamicro.com/)**
+
+> One frequently touted benefit of virtualization is the ability to
+> consolidate machines, increasing resource utilization. It may even be
+> desirable to overcommit, at the risk of one or more VCPUs having to wait.
+> Hypervisors which have interfaces for guests to retrieve the amount of
+> time each VCPU had to wait give observers within the guests ways to
+> account for less progress than would otherwise be expected. The SBI STA
+> extension[1] provides a standard interface for guest VCPUs to retrieve
+> the amount of time "stolen".
+>
+
+**[v2: riscv: hwprobe: add Zicond, Zacas and Ztso support](http://lore.kernel.org/linux-riscv/20231220155723.684081-1-cleger@rivosinc.com/)**
+
+> This series add support for a few more extensions that are present in
+> the RVA22U64/RVA23U64 (either mandatory or optional) and that are useful
+> for userspace:
+> - Zicond
+> - Zacas
+> - Ztso
+>
+
+**[v1: riscv: put va_kernel_xip_pa_offset into CONFIG_XIP_KERNEL](http://lore.kernel.org/linux-riscv/20231220103428.61758-1-cuiyunhui@bytedance.com/)**
+
+> opitmize the kernel_mapping_pa_to_va() and kernel_mapping_va_to_pa().
+>
+
+**[v5: riscv: mm: execute local TLB flush after populating vmemmap](http://lore.kernel.org/linux-riscv/20231220024343.1547648-1-vincent.chen@sifive.com/)**
+
+> The spare_init() calls memmap_populate() many times to create VA to PA
+> mapping for the VMEMMAP area, where all "struct page" are located once
+> CONFIG_SPARSEMEM_VMEMMAP is defined. These "struct page" are later
+> initialized in the zone_sizes_init() function. However, during this
+> process, no sfence.vma instruction is executed for this VMEMMAP area.
+> This omission may cause the hart to fail to perform page table walk
+> because some data related to the address translation is invisible to the
+> hart. To solve this issue, the local_flush_tlb_kernel_range() is called
+> right after the spare_init() to execute a sfence.vma instruction for this
+> VMEMMAP area, ensuring that all data related to the address translation
+> is visible to the hart.
+>
+
+**[v3: RISC-V: ACPI: Add external interrupt controller support](http://lore.kernel.org/linux-riscv/20231219174526.2235150-1-sunilvl@ventanamicro.com/)**
+
+> This series adds support for the below ECR approved by ASWG.
+> 1) MADT - https://drive.google.com/file/d/1oMGPyOD58JaPgMl1pKasT-VKsIKia7zR/view?usp=sharing
+>
+> The series primarily enables irqchip drivers for RISC-V ACPI based
+> platforms.
+>
+
+**[v1: riscv: support fast gup](http://lore.kernel.org/linux-riscv/20231219175046.2496-1-jszhang@kernel.org/)**
+
+> This series adds fast gup support to riscv.
+>
+> The First patch fixes a bug in __p*d_free_tlb(). Per the riscv
+> privileged spec, if non-leaf PTEs I.E pmd, pud or p4d is modified, a
+> sfence.vma is a must.
+>
+
+**[v8: Add timer driver for StarFive JH7110 RISC-V SoC](http://lore.kernel.org/linux-riscv/20231219145402.7879-1-xingyu.wu@starfivetech.com/)**
+
+> This patch serises are to add timer driver for the StarFive JH7110
+> RISC-V SoC. The first patch adds documentation to describe device
+> tree bindings. The subsequent patch adds timer driver and support
+> JH7110 SoC. The last patch adds device node about timer in JH7110
+> dts.
+>
+
+**[v1: mm/gup: Unify hugetlb, part 2](http://lore.kernel.org/linux-riscv/20231219075538.414708-1-peterx@redhat.com/)**
+
+> This is v1 of the series. The series removes the hugetlb slow gup path
+> after a previous refactor work [1], so that slow gup now uses the exact
+> same path to handle all kinds of memory including hugetlb.
+>
+
+**[v4: Enable networking support for StarFive JH7100 SoC](http://lore.kernel.org/linux-riscv/20231218214451.2345691-1-cristian.ciocaltea@collabora.com/)**
+
+> This patch series adds ethernet support for the StarFive JH7100 SoC and makes it
+> available for the StarFive VisionFive V1 and BeagleV Starlight boards, although
+> I could only validate on the former SBC. Thank you Emil and Geert for helping
+> with tests on BeagleV!
+>
+
+**[v2: cpufreq support for the D1](http://lore.kernel.org/linux-riscv/20231218110543.64044-1-fusibrandon13@gmail.com/)**
+
+> This patch series adds support for cpufreq on the D1 SoC, and has been
+> tested on a Lichee RV module.
+>
+
+**[v2: Translated the RISC-V architecture boot documentation.](http://lore.kernel.org/linux-riscv/20231218092924.200165-1-longjin@DragonOS.org/)**
+
+> The patch adds a new file boot.rst to the Documentation/translations/zh_CN/
+> arch/riscv/ directory, and adds a reference to the new file
+> in the index.rst file.
+>
+
+**[v4: riscv: sophgo: add clock support for Sophgo CV1800 SoCs](http://lore.kernel.org/linux-riscv/IA1PR20MB495354167CE560FC18E28DC5BB90A@IA1PR20MB4953.namprd20.prod.outlook.com/)**
+
+> Add clock controller support for the Sophgo CV1800B and CV1812H.
+>
+> This patch follow this patch series:
+> https://lore.kernel.org/all/IA1PR20MB495399CAF2EEECC206ADA7ABBBD5A@IA1PR20MB4953.namprd20.prod.outlook.com/
+>
+
+**[v6: Add D1/T113s thermal sensor controller support](http://lore.kernel.org/linux-riscv/20231217210629.131486-1-bigunclemax@gmail.com/)**
+
+> This series adds support for Allwinner D1/T113s thermal sensor controller.
+> THIS controller is similar to the one on H6, but with only one sensor and
+> uses a different scale and offset values.
+>
+
+#### Process Scheduling
+
+**[v1: sched/eevdf: Correct comment in place_entity](http://lore.kernel.org/lkml/202312201748+0800-wangjinchao@xfusion.com/)**
+
+> Fix variable names in the place_entity function comments
+> to accurately represent lag calculation.
+>
+
+**[v1: sched: move access of avg_rt and avg_dl into existing helper functions](http://lore.kernel.org/lkml/20231220065522.351915-1-sshegde@linux.vnet.ibm.com/)**
+
+> This is a minor code simplification. There are helper functions called
+> cpu_util_dl and cpu_util_rt which gives the average utilization of DL
+> and RT respectively. But there are few places in code where these
+> variables are used directly.
+>
+
+**[v1: sched/fair: Correct comment for enqueue_task_fair](http://lore.kernel.org/lkml/202312192042+0800-wangjinchao@xfusion.com/)**
+
+> There is `add_nr_running(rq, 1);` in `enqueue_task_fair`, so the
+> original comment is confusing, correct it.
+>
+
+**[v1: sched: Can we rename 'core scheduling' to 'smt scheduling'?](http://lore.kernel.org/lkml/202312191503+0800-wangjinchao@xfusion.com/)**
+
+> The term 'core' in 'kernel/sched/' implies a relation to the kernel of sched,
+> and at the same time, 'core' is used in 'core scheduling' to represent a CPU core.
+> Both meanings coexist in the 'core.c' file and appear numerous times.
+>
+
+**[v1: sched/core: Consolidate switch_count assignment](http://lore.kernel.org/lkml/202312191445+0800-wangjinchao@xfusion.com/)**
+
+> Eliminate redundancy by consolidating the switch_count assignment,
+> removing the duplicate operation.
+>
+
+**[v1: sched/debug: Dump end of stack when detected corrupted](http://lore.kernel.org/lkml/20231219032254.96685-1-feng.tang@intel.com/)**
+
+> When debugging a kernel hang during suspend/resume, there are random
+> memory corruptions in different places like being detected by scheduler
+> with error message:
+>
+> "Kernel panic - not syncing: corrupted stack end detected inside scheduler"
+>
+
+#### Memory Management
+
+**[v5: mempolicy2, mbind2, and weighted interleave](http://lore.kernel.org/linux-mm/20231223181101.1954-1-gregory.price@memverge.com/)**
+
+> Weighted interleave is a new interleave policy intended to make
+> use of a the new distributed-memory environment made available
+> by CXL. The existing interleave mechanism does an even round-robin
+> distribution of memory across all nodes in a nodemask, while
+> weighted interleave can distribute memory across nodes according
+> the available bandwidth that that node provides.
+>
+
+**[v1: mm-unstable: mm/mglru: skip special VMAs in lru_gen_look_around()](http://lore.kernel.org/linux-mm/20231223045647.1566043-1-yuzhao@google.com/)**
+
+> Special VMAs like VM_PFNMAP can contain anon pages from COW. There
+> isn't much profit in doing lookaround on them. Besides, they can
+> trigger the pte_special() warning in get_pte_pfn().
+>
+
+**[v1: mm: abstract shadow stack vma behind arch_is_shadow_stack_vma](http://lore.kernel.org/linux-mm/20231222235248.576482-1-debug@rivosinc.com/)**
+
+> x86 has used VM_SHADOW_STACK (alias to VM_HIGH_ARCH_5) to encode shadow
+> stack VMA. VM_SHADOW_STACK is thus not possible on 32bit. Some arches may
+> need a way to encode shadow stack on 32bit and 64bit both and they may
+> encode this information differently in VMAs.
+>
+
+**[v2: kexec: Allow preservation of ftrace buffers](http://lore.kernel.org/linux-mm/20231222193607.15474-1-graf@amazon.com/)**
+
+> Kexec today considers itself purely a boot loader: When we enter the new
+> kernel, any state the previous kernel left behind is irrelevant and the
+> new kernel reinitializes the system.
+>
+
+**[v1: mm: swap: async free swap slot cache entries](http://lore.kernel.org/linux-mm/20231221-async-free-v1-1-94b277992cb0@kernel.org/)**
+
+> We discovered that 1% swap page fault is 100us+ while 50% of
+> the swap fault is under 20us.
+>
+> Further investigation show that a large portion of the time
+> spent in the free_swap_slots() function for the long tail case.
+>
+
+**[v1: mm: kasan: Mark unpoison_slab_object() as static](http://lore.kernel.org/linux-mm/20231221180042.104694-1-andrey.konovalov@linux.dev/)**
+
+> With -Wmissing-prototypes enabled, there is a warning that
+> unpoison_slab_object() has no prototype, breaking the build with
+> CONFIG_WERROR=y:
+>
+> mm/kasan/common.c:271:6: error: no previous prototype for 'unpoison_slab_object' [-Werror=missing-prototypes]
+> 271 | void unpoison_slab_object(struct kmem_cache *cache, void *object, gfp_t flags,
+> | ^
+>
+> cc1: all warnings being treated as errors
+>
+> Mark the function as static, as it is not used outside of this
+> translation unit, clearing up the warning.
+>
+
+**[v5: netfs, afs, 9p: Delegate high-level I/O to netfslib](http://lore.kernel.org/linux-mm/20231221132400.1601991-1-dhowells@redhat.com/)**
+
+> I have been working on my netfslib helpers to the point that I can run
+> xfstests on AFS to completion (both with write-back buffering and, with a
+> small patch, write-through buffering in the pagecache). I have a patch for
+> 9P, but am currently unable to test it.
+>
+
+**[v1: iommu/intel: Free empty page tables on unmaps](http://lore.kernel.org/linux-mm/20231221031915.619337-1-pasha.tatashin@soleen.com/)**
+
+> This series frees empty page tables on unmaps. It intends to be a
+> low overhead feature.
+>
+> The read-writer lock is used to synchronize page table, but most of
+> time the lock is held is reader. It is held as a writer for short
+> period of time when unmapping a page that is bigger than the current
+> iova request. For all other cases this lock is read-only.
+>
+
+**[v2: mm/rmap: interface overhaul](http://lore.kernel.org/linux-mm/20231220224504.646757-1-david@redhat.com/)**
+
+> This series overhauls the rmap interface, to get rid of the "bool compound"
+> / RMAP_COMPOUND parameter with the goal of making the interface less error
+> prone, more future proof, and more natural to extend to "batching". Also,
+> this converts the interface to always consume folio+subpage, which speeds
+> up operations on large folios.
+>
+
+#### File Systems
+
+**[GIT PULL: overlayfs backing file helpers for 6.8](http://lore.kernel.org/linux-fsdevel/20231223154405.941062-1-amir73il@gmail.com/)**
+
+> Please pull the overlayfs backing file helpers for 6.8.
+>
+> The only change since the patches that you reviewed [1] is that I added
+> assertion to all the helpers that file is a backing_file as you requested.
+>
+
+**[v1: ovl: Reject mounting case-insensitive filesystems](http://lore.kernel.org/linux-fsdevel/87a5q1eecy.fsf_-_@mailhost.krisman.be/)**
+
+> Eric Biggers writes:
+>
+> >> When case-insensitive and fscrypt were adapted to work together, we moved the
+> >> code that sets the dentry operations for case-insensitive dentries(d_hash and
+> >> d_compare) to happen from a helper inside ->lookup. This is because fscrypt
+> >> wants to set d_revalidate only on some dentries, so it does it only for them in
+> >> d_revalidate.
+> >>
+
+**[v19: Implement copy offload support](http://lore.kernel.org/linux-fsdevel/20231222061313.12260-1-nj.shetty@samsung.com/)**
+
+> The patch series covers the points discussed in past and most recently
+> in LSFMM'23[0].
+> We have covered the initial agreed requirements in this patch set and
+> further additional features suggested by community.
+>
+
+**[v1: fs/ntfs3: Disable ATTR_LIST_ENTRY size check](http://lore.kernel.org/linux-fsdevel/894db108-509b-4026-a90e-666a759a3f9f@paragon-software.com/)**
+
+> The use of sizeof(struct ATTR_LIST_ENTRY) has been replaced with le_size(0)
+> due to alignment peculiarities on different platforms.
+>
+> Closes:
+> https://lore.kernel.org/oe-kbuild-all/202312071005.g6YrbaIe-lkp@intel.com/
+>
+
+**[v1: Intruduce stacking filesystem vfs helpers](http://lore.kernel.org/linux-fsdevel/20231221095410.801061-1-amir73il@gmail.com/)**
+
+> Christian,
+>
+> These patches essentially just lift some overlayfs code to common
+> file fs/backing_file.c.
+>
+> They are based on vfs.rw and overlayfs-next branches.
+>
+
+**[[PATCH v2 12/11 man-pages] splice.2: document 6.8 blocking behaviour](http://lore.kernel.org/linux-fsdevel/ii3qfagelsu6j2zddtzl6cruy6bpd5wimx35dabhktymjxrwli@tarta.nabijaczleweli.xyz/)**
+
+> Hypothetical text that matches v2.
+>
+
+**[v1: eventfs: Have event files and directories default to parent uid and gid](http://lore.kernel.org/linux-fsdevel/20231220105017.1489d790@gandalf.local.home/)**
+
+> Dongliang reported:
+>
+> I found that in the latest version, the nodes of tracefs have been
+> changed to dynamically created.
+>
+> To fix this, have the files created default to taking the ownership of
+> the parent dentry unless the ownership was previously set by the user.
+>
+
+**[v3: RESEND: fuse: Add support for resend pending requests](http://lore.kernel.org/linux-fsdevel/20231220084928.298302-1-winters.zc@antgroup.com/)**
+
+> After the FUSE daemon crashes, the fuse mount point becomes inaccessible.
+> In some production environments, a watchdog daemon is used to preserve
+> the FUSE connection's file descriptor (fd). When the FUSE daemon crashes,
+> a new FUSE daemon is started and takes over the fd from the watchdog
+> daemon, allowing it to continue providing services.
+>
+
+**[v1: security: new security_file_ioctl_compat() hook](http://lore.kernel.org/linux-fsdevel/20231219090909.2827497-1-alpic@google.com/)**
+
+> Some ioctl commands do not require ioctl permission, but are routed to
+> other permissions such as FILE_GETATTR or FILE_SETATTR. This routing is
+> done by comparing the ioctl cmd to a set of 64-bit flags (FS_IOC_*).
+>
+
+**[v1: bpf-next: bpf: add BPF_F_TOKEN_FD flag to pass with BPF token FD](http://lore.kernel.org/linux-fsdevel/20231219053150.336991-1-andrii@kernel.org/)**
+
+> Add BPF_F_TOKEN_FD flag to be used across bpf() syscall commands
+> that accept BPF token FD: BPF_PROG_LOAD, BPF_MAP_CREATE, and
+> BPF_BTF_LOAD. This flag has to be set whenever token FD is provided.
+>
+
+**[v8: Pass data lifetime information to SCSI disk devices](http://lore.kernel.org/linux-fsdevel/20231219000815.2739120-1-bvanassche@acm.org/)**
+
+> UFS vendors need the data lifetime information to achieve good performance.
+> Providing data lifetime information to UFS devices can result in up to 40%
+> lower write amplification. Hence this patch series that adds support in F2FS
+> and also in the block layer for data lifetime information. The SCSI disk (sd)
+> driver is modified such that it passes write hint information to SCSI devices
+> via the GROUP NUMBER field.
+>
+
+#### Network Devices
+
+**[v1: net: selftests: bonding: do not set port down when adding to bond](http://lore.kernel.org/netdev/20231223125922.3280841-1-liuhangbin@gmail.com/)**
+
+> Similar to commit be809424659c ("selftests: bonding: do not set port down
+> before adding to bond"). The bond-arp-interval-causes-panic test failed
+> after commit a4abfa627c38 ("net: rtnetlink: Enslave device before bringing
+> it up") as the kernel will set the port down _after_ adding to bond if setting
+> port down specifically.
+>
+
+**[v1: net/ps3_gelic_net: Add gelic_descr structures](http://lore.kernel.org/netdev/2e4bd247-e217-47a6-a7e3-20375d05ff25@infradead.org/)**
+
+> In an effort to make the PS3 gelic driver easier to maintain, create two
+> new structures, struct gelic_hw_regs and struct gelic_chain_link, and
+> replace the corresponding members of struct gelic_descr with the new
+> structures.
+>
+
+**[v2: net-next: bnxt_en: Add basic ntuple filter support](http://lore.kernel.org/netdev/20231223042210.102485-1-michael.chan@broadcom.com/)**
+
+> The current driver only supports ntuple filters added by aRFS. This
+> patch series adds basic support for user defined TCP/UDP ntuple filters
+> added by the user using ethtool. Many of the patches are refactoring
+> patches to make the existing code more general to support both aRFS
+> and user defined filters. aRFS filters always have the Toeplitz hash
+> value from the NIC. A Toepliz hash function is added in patch 5 to
+> get the same hash value for user defined filters. The hash is used
+> to store all ntuple filters in the table and all filters must be
+> hashed identically using the same function and key.
+>
+
+**[v1: net-next: Christmas 3-serie XDP for idpf (+generic stuff)](http://lore.kernel.org/netdev/20231223025554.2316836-1-aleksander.lobakin@intel.com/)**
+
+> I was highly asked to send this WIP before the holidays to trigger
+> some discussions at least for the generic parts.
+>
+> This all depends on libie[0] and WB-on-ITR fix[1]. The RFC does not
+> guarantee to work perfectly, but at least regular XDP seems to work
+> for me...
+>
+
+**[v3: net-next: net: dsa: realtek: variants to drivers, interfaces to a common module](http://lore.kernel.org/netdev/20231223005253.17891-1-luizluca@gmail.com/)**
+
+> The current driver consists of two interface modules (SMI and MDIO) and
+> two family/variant modules (RTL8365MB and RTL8366RB). The SMI and MDIO
+> modules serve as the platform and MDIO drivers, respectively, calling
+> functions from the variant modules. In this setup, one interface module
+> can be loaded independently of the other, but both variants must be
+> loaded (if not disabled at build time) for any type of interface. This
+> approach doesn't scale well, especially with the addition of more switch
+> variants (e.g., RTL8366B), leading to loaded but unused modules.
+> Additionally, this also seems upside down, as the specific driver code
+> normally depends on the more generic functions and not the other way
+> around.
+>
+
+**[v1: net-next: Revert "net: ethtool: add support for symmetric-xor RSS hash"](http://lore.kernel.org/netdev/20231222210000.51989-1-gerhard@engleder-embedded.com/)**
+
+> The tsnep driver and at least also the macb driver implement the ethtool
+> operation set_rxnfc but not the get_rxfh operation. With this commit
+> set_rxnfc returns -EOPNOTSUPP if get_rxfh is not implemented. This renders
+> set_rxnfc unuseable for drivers without get_rxfh.
+>
+
+**[v1: net-next: selftests: Add TEST_INCLUDES directive and adjust tests to use it](http://lore.kernel.org/netdev/20231222135836.992841-1-bpoirier@nvidia.com/)**
+
+> After commit 25ae948b4478 ("selftests/net: add lib.sh"), some net
+> selftests encounter errors when they are being exported and run. This is
+> because the new net/lib.sh is not exported along with the tests.
+>
+
+**[v1: net-next: Documentation: add pyyaml to requirements.txt](http://lore.kernel.org/netdev/20231222133628.3010641-1-vegard.nossum@oracle.com/)**
+
+> Commit f061c9f7d058 ("Documentation: Document each netlink family") added
+> a new Python script that is invoked during 'make htmldocs' and which reads
+> the netlink YAML spec files.
+>
+
+**[v5: ipsec-next: xfrm: introduce forwarding of ICMP Error messages](http://lore.kernel.org/netdev/38d9daba2f601602ef115942f82b80e56b54c560.1703249432.git.antony.antony@secunet.com/)**
+
+> This commit aligns with RFC 4301, Section 6, and addresses the
+> requirement to forward unauthenticated ICMP error messages that do not
+> match any xfrm policies. It utilizes the ICMP payload as an skb and
+> performs a reverse lookup. If a policy match is found, forward
+> the packet.
+>
+
+**[v1: net-next: mptcp: add CurrEstab MIB counter](http://lore.kernel.org/netdev/20231222-upstream-net-next-20231221-mptcp-currestab-v1-0-c1eb73d6b2b2@kernel.org/)**
+
+> This MIB counter is similar to the one of TCP -- CurrEstab -- available
+> in /proc/net/snmp. This is useful to quickly list the number of MPTCP
+> connections without having to iterate over all of them.
+>
+
+**[v6: net-next: Implement more ethtool_ops for Wangxun](http://lore.kernel.org/netdev/20231222101639.1499997-1-jiawenwu@trustnetic.com/)**
+
+> Provide ethtool functions to operate pause param, ring param, coalesce
+> channel number and msglevel, for driver txgbe/ngbe.
+>
+
+**[v3: StarFive DWMAC support for JH7100](http://lore.kernel.org/netdev/20231222101001.2541758-1-cristian.ciocaltea@collabora.com/)**
+
+> This is just a subset of the initial patch series [1] adding networking
+> support for StarFive JH7100 SoC.
+>
+> [1]: https://lore.kernel.org/lkml/20231218214451.2345691-1-cristian.ciocaltea@collabora.com/
+>
+
+**[v1: net-next: net: stmmac: Enable Per DMA Channel interrupt](http://lore.kernel.org/netdev/20231222054451.2683242-1-leong.ching.swee@intel.com/)**
+
+> Add Per DMA Channel interrupt feature for DWXGMAC IP.
+>
+> Patchset (link below) contains per DMA channel interrupt, But it was
+> achieved.
+> https://lore.kernel.org/lkml/20230821203328.GA2197059-robh@kernel.org/t/#m849b529a642e1bff89c05a07efc25d6a94c8bfb4
+>
+
+**[v1: net-next: virtio-net: support device stats](http://lore.kernel.org/netdev/20231222033021.20649-1-xuanzhuo@linux.alibaba.com/)**
+
+> https://github.com/oasis-tcs/virtio-spec/commit/42f389989823039724f95bbbd243291ab0064f82
+>
+> The virtio net supports to get device stats.
+>
+
+**[v1: net/tcp_sigpool: Use kref_get_unless_zero()](http://lore.kernel.org/netdev/20231222-tcp-ao-kref_get_unless_zero-v1-1-551c2edd0136@arista.com/)**
+
+> The freeing and re-allocation of algorithm are protected by cpool_mutex,
+> so it doesn't fix an actual use-after-free, but avoids a deserved
+> refcount_warn_saturate() warning.
+>
+
+**[v22: nvme-tcp receive offloads](http://lore.kernel.org/netdev/20231221213358.105704-1-aaptel@nvidia.com/)**
+
+> The next iteration of our nvme-tcp receive offload series.
+> Main change is get_netdev_for_sock() using netdevice_tracker.
+>
+
+**[v1: net-next: net/sched: retire tc ipt action](http://lore.kernel.org/netdev/20231221213105.476630-1-jhs@mojatatu.com/)**
+
+> In keeping up with my status as a hero who removes code: another one bites the
+> dust.
+> The tc ipt action was intended to run all netfilter/iptables target.
+> Unfortunately it has not benefitted over the years from proper updates when
+> netfilter changes, and for that reason it has remained rudimentary.
+>
+
+#### Security Hardening
+
+**[v1: lsm: Add a __counted_by() annotation to lsm_ctx.ctx](http://lore.kernel.org/linux-hardening/20231221-lsm-fix-counted-by-v1-1-12cc27597cdf@kernel.org/)**
+
+> The ctx in struct lsm_ctx is an array of size ctx_len, tell the compiler
+> about this using __counted_by() where supported to improve the ability to
+> detect overflow issues.
+>
+
+**[v1: next: bcachefs: Replace zero-length array with flex-array member and use __counted_by](http://lore.kernel.org/linux-hardening/ZYDi1bWIKRSs2NpH@work/)**
+
+> Fake flexible arrays (zero-length and one-element arrays) are
+> deprecated, and should be replaced by flexible-array members.
+> So, replace zero-length array with a flexible-array member in
+> `struct bch_ioctl_fsck_offline`.
+>
+
+**[v3: shrink lib/string.i via IWYU](http://lore.kernel.org/linux-hardening/20231218-libstringheader-v3-0-500bd58f0f75@google.com/)**
+
+> This patch series changes the include list of string.c to minimize
+> the preprocessing size. The patch series intends to remove REPEAT_BYE
+> from kernel.h and move it into its own header file because
+> word-at-a-time.h has an implicit dependancy on it but it is declared
+> in kernel.h which is bloated.
+>
+
+#### Async IO
+
+**[v1: io_uring/rw: ensure io->bytes_done is always initialized](http://lore.kernel.org/io-uring/175bbf4a-b0a9-4771-b91e-928ebdbf5319@kernel.dk/)**
+
+> If IOSQE_ASYNC is set and we fail importing an iovec for a readv or
+> writev request, then we leave ->bytes_done uninitialized and hence the
+> eventual failure CQE posted can potentially have a random res value
+> rather than the expected -EINVAL.
+>
+
+**[v1: io_uring/register: move io_uring_register(2) related code to register.c](http://lore.kernel.org/io-uring/52bf7df6-8ab2-4a9b-825c-8751b4719dc7@kernel.dk/)**
+
+> Most of this code is basically self contained, move it out of the core
+> io_uring file to bring a bit more separation to the registration related
+> bits. This moves another
+> 10% of the code into register.c.
+>
+
+#### BPF
+
+**[v2: bpf-next: bpf: inline bpf_kptr_xchg()](http://lore.kernel.org/bpf/20231223104042.1432300-1-houtao@huaweicloud.com/)**
+
+> The motivation of inlining bpf_kptr_xchg() comes from the performance
+> profiling of bpf memory allocator benchmark [1]. The benchmark uses
+> bpf_kptr_xchg() to stash the allocated objects and to pop the stashed
+> objects for free. After inling bpf_kptr_xchg(), the performance for
+> object free on 8-CPUs VM increases about 2%
+> 10%. However the performance
+> gain comes with costs: both the kasan and kcsan checks on the pointer
+> will be unavailable. Initially the inline is implemented in do_jit() for
+> x86-64 directly, but I think it will more portable to implement the
+> inline in verifier.
+>
+
+**[v11: bpf-next: Relax tracing prog recursive attach rules](http://lore.kernel.org/bpf/20231222151153.31291-1-9erthalion6@gmail.com/)**
+
+> Currently, it's not allowed to attach an fentry/fexit prog to another
+> fentry/fexit. At the same time it's not uncommon to see a tracing
+> program with lots of logic in use, and the attachment limitation
+> prevents usage of fentry/fexit for performance analysis (e.g. with
+> "bpftool prog profile" command) in this case. An example could be
+> falcosecurity libs project that uses tp_btf tracing programs for
+> offloading certain part of logic into tail-called programs, but the
+> use-case is still generic enough -- a tracing program could be
+> complicated and heavy enough to warrant its profiling, yet frustratingly
+> it's not possible to do so use best tooling for that.
+>
+
+**[v1: bpf-next: bpf: introduce BPF_MAP_TYPE_RELAY](http://lore.kernel.org/bpf/20231222122146.65519-1-lulie@linux.alibaba.com/)**
+
+> The patch set introduce a new type of map, BPF_MAP_TYPE_RELAY, based on
+> relay interface [0]. It provides a way for persistent and overwritable data
+> transfer.
+>
+> As stated in [0], relay is a efficient method for log and data transfer.
+> And the interface is simple enough so that we can implement and use this
+> type of map with current map interfaces. Besides we need a new helper
+> bpf_relay_output to output data to user, similar with bpf_ringbuf_output.
+>
+
+**[v6: bpf-next: bpf: Reduce memory usage for bpf_global_percpu_ma](http://lore.kernel.org/bpf/20231222031729.1287957-1-yonghong.song@linux.dev/)**
+
+> Currently when a bpf program intends to allocate memory for percpu kptr,
+> the verifier will call bpf_mem_alloc_init() to prefill all supported
+> unit sizes and this caused memory consumption very big for large number
+> of cpus. For example, for 128-cpu system, the total memory consumption
+> with initial prefill is
+> 175MB. Things will become worse for systems
+> with even more cpus.
+>
+
+**[v1: bpf-next: bpf: Avoid unnecessary use of comma operator in verifier](http://lore.kernel.org/bpf/20231221-bpf-verifier-comma-v1-1-cde2530912e9@kernel.org/)**
+
+> Although it does not seem to have any untoward side-effects,
+> the use of ';' to separate to assignments seems more appropriate than ','.
+>
+
+**[v7: bpf-next: bpf: tcp: Support arbitrary SYN Cookie at TC.](http://lore.kernel.org/bpf/20231221012806.37137-1-kuniyu@amazon.com/)**
+
+> Under SYN Flood, the TCP stack generates SYN Cookie to remain stateless
+> for the connection request until a valid ACK is responded to the SYN+ACK.
+>
+> The cookie contains two kinds of host-specific bits, a timestamp and
+> secrets, so only can it be validated by the generator. It means SYN
+> Cookie consumes network resources between the client and the server;
+> intermediate nodes must remember which nodes to route ACK for the cookie.
+>
+
+**[v1: bpf-next: Libbpf-side __arg_ctx fallback support](http://lore.kernel.org/bpf/20231220233127.1990417-1-andrii@kernel.org/)**
+
+> Support __arg_ctx global function argument tag semantics even on older kernels
+> that don't natively support it through btf_decl_tag("arg:ctx").
+>
+> Patches #2-#6 are preparatory work to allow to postpone BTF loading into the
+> kernel until after all the BPF program relocations (including global func
+> appending to main programs) are done. Patch #4 is perhaps the most important
+> and establishes pre-created stable placeholder FDs, so that relocations can
+> embed valid map FDs into ldimm64 instructions.
+>
+
+**[v15: bpf-next: Registrating struct_ops types from modules](http://lore.kernel.org/bpf/20231220222654.1435895-1-thinker.li@gmail.com/)**
+
+> Given the current constraints of the current implementation,
+> struct_ops cannot be registered dynamically. This presents a
+> significant limitation for modules like coming fuse-bpf, which seeks
+> to implement a new struct_ops type. To address this issue, a new API
+> is introduced that allows the registration of new struct_ops types
+> from modules.
+>
+
+**[v1: dwarves: pahole: Inject kfunc decl tags into BTF](http://lore.kernel.org/bpf/421d18942d6ad28625530a8b3247595dc05eb100.1703110747.git.dxu@dxuuu.xyz/)**
+
+> This commit teaches pahole to parse symbols in .BTF_ids section in
+> vmlinux and discover exported kfuncs. Pahole then takes the list of
+> kfuncs and injects a BTF_KIND_DECL_TAG for each kfunc.
+>
+
+**[v1: bpf-next: Improvements for tracking scalars in the BPF verifier](http://lore.kernel.org/bpf/20231220214013.3327288-1-maxtram95@gmail.com/)**
+
+> The goal of this series is to extend the verifier's capabilities of
+> tracking scalars when they are spilled to stack, especially when the
+> spill or fill is narrowing. It also contains a fix by Eduard for
+> infinite loop detection and a state pruning optimization by Eduard that
+> compensates for a verification complexity regression introduced by
+> tracking unbounded scalars. These improvements reduce the surface of
+> false rejections that I saw while working on Cilium codebase.
+>
+
+**[v10: bpf-next: Relax tracing prog recursive attach rules](http://lore.kernel.org/bpf/20231220180422.8375-1-9erthalion6@gmail.com/)**
+
+> Currently, it's not allowed to attach an fentry/fexit prog to another
+> fentry/fexit. At the same time it's not uncommon to see a tracing
+> program with lots of logic in use, and the attachment limitation
+> prevents usage of fentry/fexit for performance analysis (e.g. with
+> "bpftool prog profile" command) in this case. An example could be
+> falcosecurity libs project that uses tp_btf tracing programs for
+> offloading certain part of logic into tail-called programs, but the
+> use-case is still generic enough -- a tracing program could be
+> complicated and heavy enough to warrant its profiling, yet frustratingly
+> it's not possible to do so use best tooling for that.
+>
+
+**[v3: use preserve_static_offset in bpf uapi headers](http://lore.kernel.org/bpf/20231220133411.22978-1-eddyz87@gmail.com/)**
+
+> For certain program context types, the verifier applies the
+> verifier.c:convert_ctx_access() transformation.
+> It modifies ST/STX/LDX instructions that access program context.
+> convert_ctx_access() updates the offset field of these instructions
+> changing "virtual" offset by offset corresponding to data
+> representation in the running kernel.
+>
+
+**[v1: samples/bpf: use %lu format specifier for unsigned long values](http://lore.kernel.org/bpf/20231219152307.368921-1-colin.i.king@gmail.com/)**
+
+> Currently %ld format specifiers are being used for unsigned long
+> values. Fix this by using %lu instead. Cleans up cppcheck warnings:
+>
+> warning: %ld in format string (no. 1) requires 'long' but the argument
+> type is 'unsigned long'. [invalidPrintfArgType_sint]
+>
+
+**[v5: bpf-next: bpf: support to track BPF_JNE](http://lore.kernel.org/bpf/20231219134800.1550388-1-menglong8.dong@gmail.com/)**
+
+> For now, the reg bounds is not handled for BPF_JNE case, which can cause
+> the failure of following case:
+>
+> /* The type of "a" is u32 */
+> if (a > 0 && a < 100) {
+> /* the range of the register for a is [0, 99], not [1, 99],
+> * and will cause the following error:
+> *
+> * invalid zero-sized read
+> *
+> * as a can be 0.
+> */ +> bpf_skb_store_bytes(skb, xx, xx, a, 0); +> } +> +> In the code above, "a > 0" will be compiled to "if a == 0 goto xxx". In +> the TRUE branch, the dst_reg will be marked as known to 0. However, in the +> fallthrough(FALSE) branch, the dst_reg will not be handled, which makes +> the [min, max] for a is [0, 99], not [1, 99]. +> + +**[v1: libbpf: skip DWARF sections in linker sanity check](http://lore.kernel.org/bpf/20231219110324.8989-1-hi@alyssa.is/)** + +> clang can generate (with -g -Wa,--compress-debug-sections) 4-byte +> aligned DWARF sections that declare themselves to be 8-byte aligned in +> the section header. Since DWARF sections are dropped during linking +> anyway, just skip running the sanity checks on them. +> + +**[v1: net-next: xsk: make struct xsk_cb_desc available outside CONFIG_XDP_SOCKETS](http://lore.kernel.org/bpf/20231219110205.1289506-1-vladimir.oltean@nxp.com/)** + +> The ice driver fails to build when CONFIG_XDP_SOCKETS is disabled. +> +> drivers/net/ethernet/intel/ice/ice_base.c:533:21: error: +> variable has incomplete type 'struct xsk_cb_desc' +> struct xsk_cb_desc desc = {}; +> ^ +> include/net/xsk_buff_pool.h:15:8: note: +> forward declaration of 'struct xsk_cb_desc' +> struct xsk_cb_desc; +> ^ +> + +**[v1: bpf-next: bpf: use nla_ok() instead of checking nla_len directly](http://lore.kernel.org/bpf/20231218231904.260440-1-kuba@kernel.org/)** + +> nla_len may also be too short to be sane, in which case after +> recent changes nla_len() will return a wrapped value. +> + +**[v2: bpf-next: bpf: ensure precise is reset to false in __mark_reg_const_zero()](http://lore.kernel.org/bpf/20231218173601.53047-1-andrii@kernel.org/)** + +> It is safe to always start with imprecise SCALAR_VALUE register. +> Previously __mark_reg_const_zero() relied on caller to reset precise +> mark, but it's very error prone and we already missed it in a few +> places. 
So instead make __mark_reg_const_zero() reset precision always, +> as it's a safe default for SCALAR_VALUE. Explanation is basically the +> same as for why we are resetting (or rather not setting) precision in +> current state. If necessary, precision propagation will set it to +> precise correctly. +> + +### Related Technology Updates + +#### Qemu + +**[v2: target/riscv: deprecate riscv_cpu_options[]](http://lore.kernel.org/qemu-devel/20231222122235.545235-1-dbarboza@ventanamicro.com/)** + +> This new version fixes all instances of 'const PropertyInfo' added, +> changing it to 'static const PropertyInfo', like suggested by Richard in +> v1. +> + +**[v1: target/riscv/kvm: QEMU support for KVM Guest Debug on RISC-V](http://lore.kernel.org/qemu-devel/20231221094923.7349-1-duchao@eswincomputing.com/)** + +> This series implements QEMU KVM Guest Debug on RISC-V. Currently, we can +> debug RISC-V KVM guest from the host side, with software breakpoints. +> + +**[v2: target/riscv: Implement optional CSR mcontext of debug Sdtrig extension](http://lore.kernel.org/qemu-devel/20231219123244.290935-1-alvinga@andestech.com/)** + +> The debug Sdtrig extension defines an CSR "mcontext". This commit +> implements its predicate and read/write operations into CSR table. +> Its value is reset as 0 when the trigger module is reset. +> + +**[v2: target/riscv: add RVV CSRs](http://lore.kernel.org/qemu-devel/20231218204321.75757-1-dbarboza@ventanamicro.com/)** + +> This version was rebased on top of Alistair's riscv-to-apply.next. A +> small tweak was needed in patch 4 due to changes in the branch. +> + +**[v1: target/riscv/kvm: do not use non-portable strerrorname_np()](http://lore.kernel.org/qemu-devel/20231218162301.14817-1-ncopa@alpinelinux.org/)** + +> strerrorname_np is non-portable and breaks building with musl libc. +> +> Use strerror(errno) instead, like we do other places.
+> + +**[v13: riscv: RVA22 profiles support](http://lore.kernel.org/qemu-devel/20231218125334.37184-1-dbarboza@ventanamicro.com/)** + +> This is a merge of the two profile series: +> +> "v12: for-9.0: riscv: rv64i/rva22u64 CPUs, RVA22U64 profile support" +> "v2: for-9.0: target/riscv: implement RVA22S64 profile" +> +> I'm sending them together since the second series is dependent on the first. +> + +#### Buildroot + +**[boot/edk2: add support for RISC-V 64bit architecture](http://lore.kernel.org/buildroot/20231223133658.CECBF87CF3@busybox.osuosl.org/)** + +> commit: https://git.buildroot.net/buildroot/commit/?id=1b2498fa91b752f4cc3578600bc653ece734c001 +> branch: https://git.buildroot.net/buildroot/commit/?id=refs/heads/master +> +> RISC-V 64bit qemu virt machine support has been added in edk2 +> version "stable202302". See [1]. +> + +**[package/libopenssl: use riscv-specific configure target](http://lore.kernel.org/buildroot/20231223104438.EDEC187BF7@busybox.osuosl.org/)** + +> commit: https://git.buildroot.net/buildroot/commit/?id=fc8eff0c76ab35db7d783ce7959193ff4c30a01e +> branch: https://git.buildroot.net/buildroot/commit/?id=refs/heads/master +> +> Adds BR2_PACKAGE_LIBOPENSSL_TARGET_ARCH for riscv32 and riscv64. +> Otherwise, riscv targets fall back to the linux-generic libopenssl +> configs. This exacerbates the issue partially addressed in +> openssl/openssl#22871 which causes build failures. +> + +#### U-Boot + +**[v1: riscv: Extend board compatible string with "qemu,mbv"](http://lore.kernel.org/u-boot/575ac34167776f3a3a00aa23cc5e182d1e41492f.1703084004.git.michal.simek@amd.com/)** + +> Extend compatible string to match the latest change in dt binding. 
+> + **[GIT PULL: Please pull u-boot-amlogic-next-20231220](http://lore.kernel.org/u-boot/8c32e92f-810f-434b-ba77-3074490458bd@linaro.org/)** + +> Please pull into next branch support for the new GXL MDIO mux along +> with a Linux v6.4 DT sync, this will permit using upstream Linux DT +> as-is on GXL & GXM based boards. +> + +**[v3: acpi: add ACPI support on QEMU ARM and RISC-V](http://lore.kernel.org/u-boot/20231219122439.31983-1-heinrich.schuchardt@canonical.com/)** + +> QEMU 8.1.2 can create ACPI tables for the ARM and RISC-V architectures +> Allow passing them through to the operating system. +> Provide a new config fragment that enables this. +> + +**[GIT PULL: u-boot-riscv/next](http://lore.kernel.org/u-boot/ZYAwj034fNBg1HOk@swlinux02/)** + +> The following changes since commit fdefb4e194c65777fa11479119adaa71651f41d4: +> +> Merge tag 'efi-next-20231217' of https://source.denx.de/u-boot/custodians/u-boot-efi into next (2023-12-17 09:11:06 -0500) +> +> are available in the Git repository at: +> +> https://source.denx.de/u-boot/custodians/u-boot-riscv.git next +> +> for you to fetch changes up to 44a792c99498f5a9d3526019779d66585978c491: +> +> riscv: sifive: unmatched: migrate to text environment (2023-12-18 11:09:01 +0800) +> + +## 20231127: Issue 70 + +### Kernel Updates + +#### RISC-V Architecture Support + +**[v1: riscv: dts: microchip: move timebase-frequency to mpfs.dtsi](http://lore.kernel.org/linux-riscv/20231126-unlighted-favorably-4627f2361a59@spud/)** + +> The timebase-frequency on PolarFire SoC is not set by an oscillator on +> the board, but rather by an internal divider, so move the property to +> mpfs.dtsi. +> + +**[v1: riscv: dts: starfive: add Milkv Mars board device tree](http://lore.kernel.org/linux-riscv/20231126100055.1595-1-jszhang@kernel.org/)** + +> Add the devicetree file describing the currently supported features, +> namely PMIC, UART, SD card, QSPI Flash, eMMC and Ethernet.
+> + +**[v1: riscv: enable lockless lockref implementation](http://lore.kernel.org/linux-riscv/20231125082144.311-1-jszhang@kernel.org/)** + +> This series selects ARCH_USE_CMPXCHG_LOCKREF to enable the +> cmpxchg-based lockless lockref implementation for riscv. Then, +> implement arch_cmpxchg64_{relaxed|acquire|release}. +> + +**[v1: RISC-V: Add dynamic TSO support](http://lore.kernel.org/linux-riscv/20231124072142.2786653-1-christoph.muellner@vrull.eu/)** + +> The upcoming RISC-V Ssdtso specification introduces a bit in the senvcfg +> CSR to switch the memory consistency model at run-time from RVWMO to TSO +> (and back). The active consistency model can therefore be switched on a +> per-hart base and managed by the kernel on a per-process/thread base. +> + +**[v5: RISC-V SBI debug console extension support](http://lore.kernel.org/linux-riscv/20231124070905.1043092-1-apatel@ventanamicro.com/)** + +> The SBI v2.0 specification is now frozen. The SBI v2.0 specification defines +> SBI debug console (DBCN) extension which replaces the legacy SBI v0.1 +> functions sbi_console_putchar() and sbi_console_getchar(). +> (Refer v2.0-rc5 at https://github.com/riscv-non-isa/riscv-sbi-doc/releases) +> + +**[v2: kexec_file: print out debugging message if required](http://lore.kernel.org/linux-riscv/20231124033642.520686-1-bhe@redhat.com/)** + +> Currently, specifying '-d' will print a lot of debugging information +> about kexec/kdump loading with kexec_load interface. +> +> However, kexec_file_load prints nothing even though '-d' is specified. +> It's very inconvenient to debug or analyze the kexec/kdump loading when +> something wrong happened with kexec/kdump itself or develper want to +> check the kexec/kdump loading. 
+> + +**[v1: riscv: Select ARCH_WANTS_NO_INSTR](http://lore.kernel.org/linux-riscv/20231123142223.1787-1-jszhang@kernel.org/)** + +> As said in the help of ARCH_WANTS_NO_INSTR entry in arch/Kconfig: +> "An architecture should select this if the noinstr macro is being used on +> functions to denote that the toolchain should avoid instrumenting such +> functions and is required for correctness." +> + +**[v1: riscv: Use asm-generic for {read,write}{bwlq} and their relaxed variant](http://lore.kernel.org/linux-riscv/20231123142003.1759-1-jszhang@kernel.org/)** + +> The asm-generic implementation is functionally identical to the riscv +> version. +> + +**[v1: riscv: declare overflow_stack as exported from traps.c](http://lore.kernel.org/linux-riscv/20231123134214.81481-1-ben.dooks@codethink.co.uk/)** + +> The percpu area overflow_stacks is exported from arch/riscv/kernel/traps.c +> for use in the entry code, but is not declared anywhere. Add the relevant +> declaration to arch/riscv/include/asm/stacktrace.h to silence the following +> sparse warning: +> +> arch/riscv/kernel/traps.c:395:1: warning: symbol '__pcpu_scope_overflow_stack' was not declared. Should it be static? +> + +**[v1: riscv: Introduce 64K base page](http://lore.kernel.org/linux-riscv/20231123065708.91345-1-luxu.kernel@bytedance.com/)** + +> Some existing architectures like ARM supports base page larger than 4K +> as their MMU supports more page sizes. Thus, besides hugetlb page and +> transparent huge page, there is another way for these architectures to +> enjoy the benefits of fewer TLB misses without worrying about cost of +> splitting and merging huge pages. However, on architectures with only +> 4K MMU, larger base page is unavailable now. 
+> + **[v1: riscv: Create and document PR_RISCV_SET_ICACHE_FLUSH_CTX prctl](http://lore.kernel.org/linux-riscv/20231122-fencei-v1-0-bec0811cb212@rivosinc.com/)** + +> Improve the performance of icache flushing by creating a new prctl flag +> PR_RISCV_SET_ICACHE_FLUSH_CTX. The interface is left generic to allow +> for future expansions such as with the proposed J extension [1]. +> + +**[v2: Support rv32 ULEB128 test](http://lore.kernel.org/linux-riscv/20231122-module_fixup-v2-1-dfb9565e9ea5@rivosinc.com/)** + +> Use opcodes available to both rv32 and rv64 in uleb128 module linking +> test. +> + +**[v3: RISC-V: hwprobe: Introduce which-cpus](http://lore.kernel.org/linux-riscv/20231122164700.127954-6-ajones@ventanamicro.com/)** + +> This series introduces a flag for the hwprobe syscall which effectively +> reverses its behavior from getting the values of keys for a set of cpus +> to getting the cpus for a set of key-value pairs. +> + +**[v1: perf callchain: Support riscv cross-platform](http://lore.kernel.org/linux-riscv/20231122155548.2449-1-p4ranlee@gmail.com/)** + +> Support riscv cross platform callchain unwind. +> Tested on RISCV 64 Starfive VisionFive2 Board +> $ uname -ra +> Linux starfive 5.15.0-starfive #1 SMP Mon Dec 19 07:56:37 EST 2022 riscv64 GNU/Linux +> paran@starfive: +> /linux-next$ cat /etc/os-release +> PRETTY_NAME="Debian GNU/Linux bookworm/sid" +> NAME="Debian GNU/Linux" +> ID=debian +> HOME_URL="https://www.debian.org/" +> SUPPORT_URL="https://www.debian.org/support" +> + +**[v4: RESEND: perf vendor events riscv: add T-HEAD C9xx JSON file](http://lore.kernel.org/linux-riscv/IA1PR20MB495325FCF603BAA841E29281BBBAA@IA1PR20MB4953.namprd20.prod.outlook.com/)** + +> Add json file of T-HEAD C9xx series events. +> +> The event idx (raw value) is summary as following: +> +> event id range | support cpu +> +> The event ids are based on the public document of T-HEAD and cover +> the c900 series.
+> + +**[v4: Support Andes PMU extension](http://lore.kernel.org/linux-riscv/20231122121235.827122-1-peterlin@andestech.com/)** + +> This patch series introduces the Andes PMU extension, which serves +> the same purpose as Sscofpmf. To use FDT-based probing the hardware +> support of the PMU extension, instead of adding another CPU errata, +> we have converted T-Head's PMU alternative along with this series. +> + +**[v1: riscv: make unexported items static](http://lore.kernel.org/linux-riscv/20231122090255.188851-1-ben.dooks@codethink.co.uk/)** + +> The relocation_hashtable and used_buckets_list are not used +> outside of the module.c file and therefore should be made +> static to avoid the follwoing sdparse warnings: +> +> arch/riscv/kernel/module.c:48:19: warning: symbol 'relocation_hashtable' was not declared. Should it be static? +> arch/riscv/kernel/module.c:50:18: warning: symbol 'used_buckets_list' was not declared. Should it be static? +> + +**[v4: perf vendor events riscv: add StarFive Dubhe-90 JSON file](http://lore.kernel.org/linux-riscv/20231122030908.2981502-1-jisheng.teoh@starfivetech.com/)** + +> Similar to StarFive's Dubhe-80, Dubhe-90 supports raw event id +> The raw events are enabled through PMU node of DT binding. +> Besides raw event, add standard RISC-V firmware events to +> support monitoring of firmware event. +> + +**[v1: riscv: Add kernel-mode FPU support for amdgpu](http://lore.kernel.org/linux-riscv/20231122030621.3759313-1-samuel.holland@sifive.com/)** + +> This series allows using newer AMD GPUs (e.g. Navi) on RISC-V boards +> such as SiFive's HiFive Unmatched. Those GPUs need CONFIG_DRM_AMD_DC_FP +> to initialize, which requires kernel-mode FPU support. 
+> + +**[v3: riscv: ASID-related and UP-related TLB flush enhancements](http://lore.kernel.org/linux-riscv/20231122010815.3545294-1-samuel.holland@sifive.com/)** + +> While reviewing Alexandre Ghiti's "riscv: tlb flush improvements" +> series[1], I noticed that most TLB flush functions end up as a call to +> local_flush_tlb_all() when SMP is disabled. This series resolves that. +> Along the way, I realized that we should be using single-ASID flushes +> wherever possible, so I implemented that as well. +> +> [1]: https://lore.kernel.org/linux-riscv/20231030133027.19542-1-alexghiti@rivosinc.com/ +> + +**[v1: riscv: errata: andes: Probe IOCP during boot stage](http://lore.kernel.org/linux-riscv/20231121202459.36874-1-prabhakar.mahadev-lad.rj@bp.renesas.com/)** + +> We should be probing for IOCP during boot stage only. As we were probing +> for IOCP for all the stages this caused the below issue during module-init +> stage, +> + +**[v1: riscv: mm: implement pgprot_nx](http://lore.kernel.org/linux-riscv/20231121160637.3856-1-jszhang@kernel.org/)** + +> commit cca98e9f8b5e ("mm: enforce that vmap can't map pages +> executable") enforces the W^X protection by not allowing remapping +> existing pages as executable. Add riscv bits so that riscv can benefit +> the same protection. +> + +**[v1: riscv: select ARCH_HAS_FAST_MULTIPLIER](http://lore.kernel.org/linux-riscv/20231121144340.3492-1-jszhang@kernel.org/)** + +> Currently, riscv linux requires at least IMA, so all platforms have a +> multiplier. And I assume the 'mul' efficiency is comparable or better +> than a sequence of five or so register-dependent arithmetic +> instructions. Select ARCH_HAS_FAST_MULTIPLIER to get slightly nicer +> codegen. Refer to commit f9b4192923fa ("v1: bitops: hweight() +> speedup") for more details. 
+> + #### Process Scheduling + +**[v1: sched/cputime: exclude ktimer threads in irqtime_account_irq](http://lore.kernel.org/lkml/20231124063450.GA18089@didi-ThinkCentre-M930t-N000/)** + +> In CONFIG_PREEMPT_RT kernel, ktimer threads need to be excluded as well as +> ksoftirqd when accounting CPUTIME_SOFTIRQ in irqtime_account_irq. +> Also add this_cpu_ktimers to keep consistency with this_cpu_ksoftirqd. +> + +**[v1: sched/pelt: avoid underestimate of task utilization](http://lore.kernel.org/lkml/20231122140119.472110-1-vincent.guittot@linaro.org/)** + +> It has been reported that thread's util_est can significantly decrease as +> a result of sharing the CPU with other threads. The use case can be easily +> reproduced with a periodic task TA that runs 1ms and sleeps 100us. +> When the task is alone on the CPU, its max utilization and its util_est is +> around 888. If another similar task starts to run on the same CPU, TA will +> have to share the CPU runtime and its maximum utilization will decrease +> around half the CPU capacity (512) then TA's util_est will follow this new +> maximum trend which is only the result of sharing the CPU with others +> tasks. +> + +**[v1: net/sched: cls: Load net classifier modules via alias](http://lore.kernel.org/lkml/20231121175640.9981-1-mkoutny@suse.com/)** + +> The classifier modules may be loaded lazily without user's awareness and +> control. Add respective aliases to modules and request them under these +> aliases so that modprobe's blacklisting mechanism works also for +> classifier modules. (The same pattern exists e.g. for filesystem +> modules.) +> + +**[v1: freezer,sched: do not restore saved_state of a thawed task](http://lore.kernel.org/lkml/20231120-freezer-state-multiple-thaws-v1-0-f2e1dd7ce5a2@quicinc.com/)** + +> This series applies couple fixes to commit 8f0eed4a78a8 ("freezer,sched: +> Use saved_state to reduce some spurious wakeups") which was found while +> testing with legacy cgroup freezer.
My original testing was only with +> system-wide freezer. We found that thaw_task could be called on a task +> which was already frozen. Prior to commit 8f0eed4a78a8 ("freezer,sched: +> Use saved_state to reduce some spurious wakeups"), this wasn't an issue +> as kernel would try to wake up TASK_FROZEN, which wouldn't match the +> thawed task state, and no harm done to task. After commit 8f0eed4a78a8 +> ("freezer,sched: Use saved_state to reduce some spurious wakeups"), it +> was possible to overwrite the state of thawed task. +> + +**[v1: sched/eevdf: Avoid NULL in pick_eevdf](http://lore.kernel.org/lkml/20231120073821.1304-1-xuewen.yan@unisoc.com/)** + +> Now in pick_eevdf function, add the pick_first_entity to prevent +> picking null when using eevdf, however, the leftmost may be null. +> As a result, it would cause oops because the se is NULL. +> + +#### Memory Management + +**[v1: prctl: Get private anonymous memory region name](http://lore.kernel.org/linux-mm/tencent_977CBF8E8CA6234A1B740A35655D5D7EAA0A@qq.com/)** + +> In commit 9a10064f5625 ("mm: add a field to store names for private anonymous +> memory") add PR_SET_VMA options and PR_SET_VMA_ANON_NAME for the prctl +> system call, then the PR_GET_VMA interface should be provided accordingly, +> which is necessary, as the userspace program usually wants to know what +> VMA name it has configured for the anonymous page. +> + +**[Re: Re: v1: mm,oom_reaper: avoid run queue_oom_reaper if task is not oom](http://lore.kernel.org/linux-mm/1d84bf0d1aed45bbbc5941483d8e1695@hihonor.com/)** + +> This problem is likely caused by concurrency. I will try to create a concurrent scenario of oom or kill process to reproduce the issue, +> and if discover anything, I will send it here. +> Thank you, Michal and Andrew, for analyzing and discussing the issue.
+> + +**[v3: Permission Overlay Extension](http://lore.kernel.org/linux-mm/20231124163510.1835740-1-joey.gouly@arm.com/)** + +> This series implements the Permission Overlay Extension introduced in 2022 +> VMSA enhancements [1]. It is based on v6.7-rc2. +> + +**[v2: fs/Kconfig: Make hugetlbfs a menuconfig](http://lore.kernel.org/linux-mm/20231124151902.1075697-1-peterx@redhat.com/)** + +> Hugetlb vmemmap default option (HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON) +> is a sub-option to hugetlbfs, but it shows in the same level as hugetlbfs +> itself, under "Pesudo filesystems". +> + +**[v1: WIP: mm: precise "mapped shared" vs. "mapped exclusively" detection for PTE-mapped THP / partially-mappable folios](http://lore.kernel.org/linux-mm/20231124132626.235350-1-david@redhat.com/)** + +> Are you interested in some made-up math, new locking primitives and +> slightly unpleasant performance numbers on first sight? :) +> +> Great, you've come to the right place! As an added bonus, we'll be messing +> with Copy-On-Write and mapcounts, which we all love, of course. +> + +**[v2: mm: page_alloc: unreserve highatomic page blocks before oom](http://lore.kernel.org/linux-mm/1700823445-27531-1-git-send-email-quic_charante@quicinc.com/)** + +> __alloc_pages_direct_reclaim() is called from slowpath allocation where +> high atomic reserves can be unreserved after there is a progress in +> reclaim and yet no suitable page is found. Later should_reclaim_retry() +> gets called from slow path allocation to decide if the reclaim needs to +> be retried before OOM kill path is taken. +> + +**[v1: mm: filemap: avoid unnecessary major faults in filemap_fault()](http://lore.kernel.org/linux-mm/20231124023107.571059-1-zhangpeng362@huawei.com/)** + +> The major fault occurred when using mlockall(MCL_CURRENT | MCL_FUTURE) +> in application, which leading to an unexpected performance issue[1]. 
+> +> This caused by temporarily cleared pte during a read/modify/write update +> of the pte, eg, do_numa_page()/change_pte_range(). +> + +**[v1: mm/Kconfig: Make userfaultfd a menuconfig](http://lore.kernel.org/linux-mm/20231123224204.1060152-1-peterx@redhat.com/)** + +> PTE_MARKER_UFFD_WP is a subconfig for userfaultfd. To make it clear, +> switch to use menuconfig for userfaultfd. +> + +**[v3: mm: memcg: improve vmscan tracepoints](http://lore.kernel.org/linux-mm/20231123193937.11628-1-ddrokosov@salutedevices.com/)** + +> The motivation behind this commit is to enhance the traceability and +> understanding of memcg events. By integrating the function cgroup_ino() +> into the existing memcg tracepoints, this patch series introduces a new +> tracepoint template for the begin() and end() events. It utilizes a new +> entry field ino to store the cgroup ino, enabling developers to easily +> identify the cgroup associated with a specific memcg tracepoint event. +> + +**[v3: cxl: Add support for CXL feature commands, CXL device patrol scrub control and DDR5 ECS control features](http://lore.kernel.org/linux-mm/20231123174355.1176-1-shiju.jose@huawei.com/)** + +> 1. Add support for CXL feature mailbox commands. +> 2. Add CXL device scrub driver supporting patrol scrub control and DDR5 ECS +> control features. +> 3. Add scrub driver supports configuring memory scrubs in the system. +> 4. Add scrub attributes for DDR5 ECS control to the memory scrub driver. +> 5. Register CXL device patrol scrub and ECS with scrub control driver. +> 6. Add documentation for CXL memory device scrub control attributes. +> + +**[v1: hugetlb: parallelize hugetlb page allocation on boot](http://lore.kernel.org/linux-mm/20231123133036.68540-1-gang.li@linux.dev/)** + +> Inspired by these patches [1][2], this series aims to speed up the +> initialization of hugetlb during the boot process through +> parallelization. 
+> + +**[v1: linux-next:mm, oom:dump_tasks add rss detailed information printing](http://lore.kernel.org/linux-mm/202311231840181856667@zte.com.cn/)** + +> When the system is under oom, it prints out the RSS information of +> each process. However, we don't know the size of rss_anon, rss_file, +> and rss_shmem. +> + +**[v3: samples: introduce cgroup events listeners](http://lore.kernel.org/linux-mm/20231123071945.25811-1-ddrokosov@salutedevices.com/)** + +> To begin with, this patch series relocates the cgroup example code to +> the samples/cgroup directory, which is the appropriate location for such +> code snippets. +> + +**[v5: mm/gup: Introduce pin_user_pages_fd() for pinning shmem/hugetlbfs file pages (v5)](http://lore.kernel.org/linux-mm/20231123064443.1035709-1-vivek.kasireddy@intel.com/)** + +> The first two patches were previously reviewed but not yet merged. +> These ones need to be merged first as the fourth patch depends on +> the changes introduced in them and they also fix bugs seen in +> very specific scenarios (running Qemu with hugetlb=on, blob=true +> and rebooting guest VM). +> + +**[v2: mm: Implement ECC handling for pfn with no struct page](http://lore.kernel.org/linux-mm/20231123003513.24292-1-ankita@nvidia.com/)** + +> The kernel MM currently handles ECC errors / poison only on memory page +> backed by struct page. As part of [1], the nvgrace-gpu-vfio-pci module +> maps the device memory to user VA (Qemu) using remap_pfn_range without +> being added to the kernel. These pages are not backed by struct page. 
+> + +**[v1: shrinker debugging improvements](http://lore.kernel.org/linux-mm/20231122232515.177833-1-kent.overstreet@linux.dev/)** + +> This patchset does a few things to aid in OOM debugging, in particular +> when shrinkers are involved: +> +> - improves the show_mem OOM report: it now reports on shrinkers, and +> for both shrinkers and slab we only report the top 10 entries, +> sorted, not the full list +> +> - add shrinker_to_text(), for the show_mem report and debugfs, and a +> an optional shrinker.to_text() callback to report extra +> driver-specific information +> + +**[v1: mm: slub, kasan: improve interaction of KASAN and slub_debug poisoning](http://lore.kernel.org/linux-mm/20231122231202.121277-1-andrey.konovalov@linux.dev/)** + +> When both KASAN and slub_debug are enabled, when a free object is being +> prepared in setup_object, slub_debug poisons the object data before KASAN +> initializes its per-object metadata. +> + +**[v1: mm/mbind: Introduce process_mbind() syscall for external memory binding](http://lore.kernel.org/linux-mm/ZV50MX4STKRCohiB@r13-u19.micron.com/)** + +> This patch introduces `process_mbind()` to enable a userspace orchestrator with +> an understanding of another process's memory layout to alter its memory policy. +> As system memory configurations become more and more complex (e.g., DDR+HBM+CXL memories), +> such a userspace orchestrator can explore more advanced techniques to guide memory placement +> to individual NUMA nodes across memory tiers. This allows for a more efficient allocation of +> memory resources, leading to enhanced application performance. +> + +**[v1: mm/mempolicy: Make task->mempolicy externally modifiable via syscall and procfs](http://lore.kernel.org/linux-mm/20231122211200.31620-1-gregory.price@memverge.com/)** + +> The patch set changes task->mempolicy to be modifiable by tasks other +> than just current. 
+> +> The ultimate goal is to make mempolicy more flexible and extensible, +> such as adding interleave weights (which may need to change at runtime +> due to hotplug events). Making mempolicy externally modifiable allows +> for userland daemons to make runtime performance adjustments to running +> tasks without that software needing to be made numa-aware. +> + +**[v1: kfence: Replace local_clock() with ktime_get_boot_fast_ns()](http://lore.kernel.org/linux-mm/VI1P193MB0752A2F21C050D701945B62799BAA@VI1P193MB0752.EURP193.PROD.OUTLOOK.COM/)** + +> The time obtained by local_clock() is the local CPU time, which may +> drift between CPUs and is not suitable for comparison across CPUs. +> +> It is possible for allocation and free to occur on different CPUs, +> and using local_clock() to record timestamps may cause confusion. +> + +**[v3: kasan: Improve free meta storage in Generic KASAN](http://lore.kernel.org/linux-mm/VI1P193MB0752675D6E0A2D16CE656F8299BAA@VI1P193MB0752.EURP193.PROD.OUTLOOK.COM/)** + +> Currently free meta can only be stored in object if the object is +> not smaller than free meta. +> +> After the improvement, when the object is smaller than free meta and +> SLUB DEBUG is not enabled, it is possible to store part of the free +> meta in the object, reducing the increased size of the red zone. +> + +**[v7: Small-sized THP for anonymous memory](http://lore.kernel.org/linux-mm/20231122162950.3854897-1-ryan.roberts@arm.com/)** + +> Note: I'm resending this at Andrew's suggestion due to having originally sent +> it during LPC. I'm hoping its in a position where the feedback is minor enough +> that I can rework in time for v6.8, but so far haven't had any. 
+> + +**[v2: mm, security, bpf: Fine-grained control over memory policy adjustments with lsm bpf](http://lore.kernel.org/linux-mm/20231122141559.4228-1-laoar.shao@gmail.com/)** + +> In a containerized environment, independent memory binding by a user can +> lead to unexpected system issues or disrupt tasks being run by other users +> on the same server. If a user genuinely requires memory binding, we will +> allocate dedicated servers to them by leveraging kubelet deployment. +> + +**[v2: eventfd: simplify signal helpers](http://lore.kernel.org/linux-mm/20231122-vfs-eventfd-signal-v2-0-bd549b14ce0c@kernel.org/)** + +> This simplifies the eventfd_signal() and eventfd_signal_mask() helpers +> significantly. They can be made void and not take any unnecessary +> arguments. +> +> I've added a few more simplifications based on Sean's suggestion. +> + +**[v7: arm64/gcs: Provide support for GCS in userspace](http://lore.kernel.org/linux-mm/20231122-arm64-gcs-v7-0-201c483bd775@kernel.org/)** + +> The arm64 Guarded Control Stack (GCS) feature provides support for +> hardware protected stacks of return addresses, intended to provide +> hardening against return oriented programming (ROP) attacks and to make +> it easier to gather call stacks for applications such as profiling. +> + +**[v1: mm/page.c: move mem_init_print_info() to later place](http://lore.kernel.org/linux-mm/20231122043550.489889-1-bhe@redhat.com/)** + +> Currently if CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, only part of +> pages are initialized and added into buddy allocator at early stage. Then +> the system memory information printed by mem_init_print_info() is +> incorrect. The snippets of boot log are pasted here: +> [ 0.059606] mem auto-init: stack:all(zero), heap alloc:off, heap free:off +> [ 0.059622] software IO TLB: area num 64. 
+> [ 0.143887] Memory: 1767888K/133954872K available (20480K kernel code, 3284K rwdata, 8972K rodata, 4572K init, 4916K bss, 2529756K reserved, 0K cma-reserved) +> [ 0.145111] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=64, Nodes=2 +> + +#### File Systems + +**[RFC: map multiple blocks per ->map_blocks in iomap writeback](http://lore.kernel.org/linux-fsdevel/20231126124720.1249310-1-hch@lst.de/)** + +> this series overhaults a large chunk of the iomap writeback code with +> the end result that ->map_blocks can now map multiple blocks at a time, +> at least as long as they are all inside the same folio. +> +> Variants of this have passed a lot of testing on XFS, but I haven't even +> starting testing it with other file systems. In terms of performance +> there is a very slight reduction in large write workloads, but the main +> point for me was to enable other work anyway. +> + +**[v1: simpler way to get benefits of "vfs: shave work on failed file open"](http://lore.kernel.org/linux-fsdevel/20231126020834.GC38156@ZenIV/)** + +> IMO 93faf426e3cc "vfs: shave work on failed file open" had gone overboard - +> avoiding an RCU delay in that particular case is fine, but it's done on +> the wrong level. A file that has never gotten FMODE_OPENED will never +> have RCU-accessed references, its final fput() is equivalent to file_free() +> and if it doesn't have FMODE_BACKING either, it can be done from any context +> and won't need task_work treatment. +> + +**[v6: blksnap - block devices snapshots module](http://lore.kernel.org/linux-fsdevel/20231124165933.27580-1-sergei.shtepa@linux.dev/)** + +> I am happy to offer an improved version of the Block Devices Snapshots +> Module. It allows creating non-persistent snapshots of any block devices. +> The main purpose of such snapshots is to provide backups of block devices. +> See more in Documentation/block/blksnap.rst.
+> + +**[v3: simplifying fast_dput(), dentry_kill() et.al.](http://lore.kernel.org/linux-fsdevel/20231124060200.GR38156@ZenIV/)** + +> The series below is the fallout of trying to document the dentry +> refcounting and life cycle - basically, getting rid of the bits that +> had been too subtle and ugly to write them up. +> + +**[v1: ext4: use iomap for regular file's buffered IO path and enable large foilo](http://lore.kernel.org/linux-fsdevel/20231123125121.4064694-1-yi.zhang@huaweicloud.com/)** + +> This is a RFC patch set based on 6.6 that partial switch to use iomap +> for regular file's buffered IO path in ext4. Now this only support ext4 +> filesystem with the default features and mount options, didn't support +> inline_data, bigalloc, dax, fs_verity, fs_crypt, and data=journal mode +> yet. I have test it through fstests -g quick with 4K block size and +> some simple performance tests, all the apparent issues have been fixed +> right now. This is just for discussion and check the overall plan is +> feasible or not, I haven't done other tests, there must be some other +> bugs need to fix later. This is the first time I've developed such a +> large feature for ext4, so I hope you would like it, I will keep on +> testing and improving these patches, any comments are helpful. For the +> convenience of review, I split the implements into small patches. +> + +**[v1: scsi: target: core: add missing file_{start,end}_write()](http://lore.kernel.org/linux-fsdevel/20231123092000.2665902-1-amir73il@gmail.com/)** + +> The callers of vfs_iter_write() are required to hold file_start_write(). +> file_start_write() is a no-op for the S_ISBLK() case, but it is really +> needed when the backing file is a regular file. +> + +**[v1: fs/aio: obey min_nr when doing wakeups](http://lore.kernel.org/linux-fsdevel/20231122234257.179390-1-kent.overstreet@linux.dev/)** + +> Unclear who's maintaining fs/aio.c these days - who wants to take this? 
+> -- >8 -- +> +> I've been observing workloads where IPIs due to wakeups in +> aio_complete() are +> 15% of total CPU time in the profile. Most of those +> wakeups are unnecessary when completion batching is in use in +> io_getevents(). +> + +#### Network Devices + +**[v4: net-next: netlink: specs: devlink: add some(not all) missing attributes in devlink.yaml](http://lore.kernel.org/netdev/20231126105246.195288-1-swarupkotikalapudi@gmail.com/)** + +> Add some missing(not all) attributes in devlink.yaml. +> +> Re-generate the related devlink-user.[ch] code. +> + +**[v6: add qca8084 ethernet phy driver](http://lore.kernel.org/netdev/20231126060732.31764-1-quic_luoj@quicinc.com/)** + +> QCA8084 is four-port PHY with maximum link capability 2.5G, +> which supports the interface mode qusgmii and sgmii mode, +> there are two PCSs available to connected with ethernet port. +> + +**[v1: net: r8169: prevent potential deadlock in rtl8169_close](http://lore.kernel.org/netdev/1ec5982e-a68d-4837-af56-619e87a59741@gmail.com/)** + +> ndo_stop() is RTNL-protected by net core, and the worker function takes +> RTNL as well. Therefore we will deadlock when trying to execute a +> pending work synchronously. To fix this execute any pending work +> asynchronously. This will do no harm because netif_running() is false +> in ndo_stop(), and therefore the work function is effectively a no-op. +> + +**[v1: net: octeontx2-pf: Restore TC ingress police rules when interface is up](http://lore.kernel.org/netdev/1700930217-5707-1-git-send-email-sbhatta@marvell.com/)** + +> TC ingress policer rules depends on interface receive queue +> contexts since the bandwidth profiles are attached to RQ +> contexts. When an interface is brought down all the queue +> contexts are freed. This in turn frees bandwidth profiles in +> hardware causing ingress police rules non-functional after +> the interface is brought up. Fix this by applying all the ingress +> police rules config to hardware in otx2_open.
Also allow +> adding ingress rules only when interface is running +> since no contexts exist for the interface when it is down. +> + +**[v1: net: phylink: set phy_state interface when attaching SFP](http://lore.kernel.org/netdev/8abed37d01d427bf9d27a157860c54375c994ea1.1700887953.git.daniel@makrotopia.org/)** + +> Then the link seemingly comes up (but is dead) because no subsequent +> call to phylink_major_config actually configured MAC and PCS for +> +> This is because phylink_mac_initial_config() considers +> pl->phy_state.interface if in MLO_AN_PHY mode while +> phylink_sfp_set_config() only sets pl->link_config.interface. +> +> Also set pl->phy_state.interface in phylink_sfp_set_config(). +> + +**[v2: iwl-next: i40e: add ability to reset vf for tx and rx mdd events](http://lore.kernel.org/netdev/20231124160804.2672341-1-aleksandr.loktionov@intel.com/)** + +> In cases when vf sends malformed packets that are classified as +> malicious, sometimes it causes tx queue to freeze. This frozen queue can be +> stuck for several minutes being unusable. When mdd event occurs, there is a +> posibility to perform a graceful vf reset to quickly bring vf back to +> operational state. +> + +**[v5: net-next: net: dsa: microchip: enable setting rmii reference](http://lore.kernel.org/netdev/cover.1700841353.git.ante.knezic@helmholz.de/)** + +> KSZ88X3 devices can select between internal and external RMII reference clock. +> This patch series introduces new device tree property for setting reference +> clock to internal. +> + +**[v5: net-next: net: intel: start The Great Code Dedup + Page Pool for iavf](http://lore.kernel.org/netdev/20231124154732.1623518-1-aleksander.lobakin@intel.com/)** + +> Here's a two-shot: introduce Intel Ethernet common library (libie) and +> switch iavf to Page Pool. Details are in the commit messages; here's +> a summary: +> +> Not a secret there's a ton of code duplication between two and more Intel +> ethernet modules. 
Before introducing new changes, which would need to be +> copied over again, start decoupling the already existing duplicate +> functionality into a new module, which will be shared between several +> Intel Ethernet drivers. The first name that came to my mind was +> "libie" -- "Intel Ethernet common library". Also this sounds like +> + +**[v5: iwl-next: i40e: Simplify VSI and VEB handling](http://lore.kernel.org/netdev/20231124150343.81520-1-ivecera@redhat.com/)** + +> The series simplifies handling of VSIs and VEBs by introducing for-each +> iterating macros, 'find' helper functions. Also removes the VEB +> recursion because the VEBs cannot have sub-VEBs according datasheet and +> fixes the support for floating VEBs. +> + +**[v2: net-next: net/smc: implement SMCv2.1 virtual ISM device support](http://lore.kernel.org/netdev/1700836935-23819-1-git-send-email-guwen@linux.alibaba.com/)** + +> The fourth edition of SMCv2 adds the SMC version 2.1 feature updates for +> SMC-Dv2 with virtual ISM. Virtual ISM are created and supported mainly by +> OS or hypervisor software, comparable to IBM ISM which is based on platform +> firmware or hardware. +> + +**[v1: net-next: Add Marvell CPT CN10KB/CN10KA B0 support](http://lore.kernel.org/netdev/20231124125047.2329693-1-schalla@marvell.com/)** + +> Marvell OcteonTX2's next gen platform CN10KB/CN10KA B0 +> introduced changes in CPT SG input format(SGv2) to make +> it compatibile with NIX SG input format, to support inline +> IPsec in SG mode. +> + +**[v1: net-next: net: phylink: improve PHY validation](http://lore.kernel.org/netdev/ZWCWn+uNkVLPaQhn@shell.armlinux.org.uk/)** + +> One of the issues which has concerned me about the rate matching +> implenentation that we have is that phy_get_rate_matching() returns +> whether rate matching will be used for a particular interface, and we +> enquire only for one interface. 
+> + +**[v1: iwl-next: ice: Rename E822 to E82X](http://lore.kernel.org/netdev/20231124114555.253412-1-karol.kolacinski@intel.com/)** + +> When code is applicable for both E822 and E823 devices, rename it from +> E822 to E82X. +> ICE_PHY_PER_NAC_E822 was unused, so just remove it. +> + +**[v1: net-next: mm/page_pool: catch page_pool memory leaks](http://lore.kernel.org/netdev/170082101266.1085481.12199867179160710331.stgit@firesoul/)** + +> Pages belonging to a page_pool (PP) instance must be freed through the +> PP APIs in-order to correctly release any DMA mappings and release +> refcnt on the DMA device when freeing PP instance. When PP release a +> page (page_pool_release_page) the page->pp_magic value is cleared. +> + +**[v3: net-next: skbuff: Optimize SKB coalescing for page pool](http://lore.kernel.org/netdev/20231124073439.52626-1-liangchen.linux@gmail.com/)** + +> The combination of the following condition was excluded from skb coalescing: +> +> from->pp_recycle = 1 +> from->cloned = 1 +> to->pp_recycle = 1 +> +> With page pool in use, this combination can be quite common(ex. +> NetworkMananger may lead to the additional packet_type being registered, +> thus the cloning). In scenarios with a higher number of small packets, it +> can significantly affect the success rate of coalescing. +> + +**[v1: rhashtable: Better error message on allocation failure](http://lore.kernel.org/netdev/20231123235949.421106-1-kent.overstreet@linux.dev/)** + +> Memory allocation failures print backtraces by default, but when we're +> running out of a rhashtable worker the backtrace is useless - it doesn't +> tell us which hashtable the allocation failure was for. +> + +**[v2: net-next: tcp: Dump bound-only sockets in inet_diag.](http://lore.kernel.org/netdev/bfb52b5103de808cda022e2d16bac6cf3ef747d6.1700780828.git.gnault@redhat.com/)** + +> Walk the hashinfo->bhash2 table so that inet_diag can dump TCP sockets +> that are bound but haven't yet called connect() or listen(). 
+> +> This allows ss to dump bound-only TCP sockets, together with listening +> sockets (as there's no specific state for bound-only sockets). This is +> similar to the UDP behaviour for which bound-only sockets are already +> dumped by ss -lu. +> + +**[v1: net-next: packet: Account for VLAN_HLEN in csum_start when virtio_net_hdr is enabled](http://lore.kernel.org/netdev/20231123183835.635210-1-mkp@redhat.com/)** + +> Af_packet provides checksum offload offsets to usermode applications +> through struct virtio_net_hdr when PACKET_VNET_HDR is enabled on the +> socket. For skbuffs with a vlan being sent to a SOCK_RAW socket, +> af_packet will include the link level header and so csum_start needs +> to be adjusted accordingly. +> + +#### Security Hardening + +**[v1: sysctl: constify sysctl ctl_tables](http://lore.kernel.org/linux-hardening/20231125-const-sysctl-v1-0-5e881b0e0290@weissschuh.net/)** + +> The kernel contains a lot of struct ctl_table throught the tree. +> These are very often 'static' definitions. +> It would be good to mark these tables const to avoid accidental or +> malicious modifications. +> Unfortunately the tables can not be made const because the core +> registration functions expect mutable tables. +> + +#### Async I/O + +**[v1: io_uring/fs: consider link->flags when getting path for LINKAT](http://lore.kernel.org/io-uring/20231120105545.1209530-1-cmirabil@redhat.com/)** + +> In order for `AT_EMPTY_PATH` to work as expected, the fact +> that the user wants that behavior needs to make it to `getname_flags` +> or it will return ENOENT. +> + +#### Rust For Linux + +**[v1: rust: replace with in rust/exports.c](http://lore.kernel.org/rust-for-linux/20231124142617.713096-1-masahiroy@kernel.org/)** + +> is the right header to include for using +> EXPORT_SYMBOL_GPL. includes much more bloat.
+> + +**[v8: net-next: Rust abstractions for network PHY drivers](http://lore.kernel.org/rust-for-linux/20231123050412.1012252-1-fujita.tomonori@gmail.com/)** + +> This patchset adds Rust abstractions for phylib. It doesn't fully +> cover the C APIs yet but I think that it's already useful. I implement +> two PHY drivers (Asix AX88772A PHYs and Realtek Generic FE-GE). Seems +> they work well with real hardware. +> + +**[v2: Kunit to check the longest symbol length](http://lore.kernel.org/rust-for-linux/20231119180145.157455-1-sergio.collado@gmail.com/)** + +> The longest length of a symbol (KSYM_NAME_LEN) was increased to 512 +> in the reference [1]. This patch adds a kunit test to check the longest +> symbol length. +> + +#### BPF + +**[v5: bpf-next: selftests/bpf: Use pkg-config to determine ld flags](http://lore.kernel.org/bpf/20231125084253.85025-1-akihiko.odaki@daynix.com/)** + +> When linking statically, libraries may require other dependencies to be +> included to ld flags. In particular, libelf may require libzstd. Use +> pkg-config to determine such dependencies. +> + +**[v1: rethook: Use __rcu pointer for rethook::handler](http://lore.kernel.org/bpf/170078778632.209874.7893551840863388753.stgit@devnote2/)** + +> Since the rethook::handler is an RCU-maganged pointer so that it will +> notice readers the rethook is stopped (unregistered) or not, it should +> be an __rcu pointer and use appropriate functions to be accessed. This +> will use appropriate memory barrier when accessing it. OTOH, rethook::data +> is never changed, so we don't need to check it in get_kretprobe(). 
+> + +**[GIT PULL: Networking for v6.7-rc3](http://lore.kernel.org/bpf/20231123171825.957077-1-kuba@kernel.org/)** + +> The following changes since commit 7475e51b87969e01a6812eac713a1c8310372e8a: +> +> Merge tag 'net-6.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net (2023-11-16 07:51:26 -0500) +> +> are available in the Git repository at: +> +> git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git net-6.7-rc3 +> + +**[v1: bpf-next: bpf: add sock_ops callbacks for data send/recv/acked events](http://lore.kernel.org/bpf/20231123030732.111576-1-lulie@linux.alibaba.com/)** + +> Add 3 sock_ops operators, namely BPF_SOCK_OPS_DATA_SEND_CB, +> BPF_SOCK_OPS_DATA_RECV_CB, and BPF_SOCK_OPS_DATA_ACKED_CB. A flag +> BPF_SOCK_OPS_DATA_EVENT_CB_FLAG is provided to minimize the performance +> impact. The flag must be explicitly set to enable these callbacks. +> + +**[v1: kprobes: consistent rcu api usage for kretprobe holder](http://lore.kernel.org/bpf/20231122132058.3359-1-inwardvessel@gmail.com/)** + +> It seems that the pointer-to-kretprobe "rp" within the kretprobe_holder is +> RCU-managed, based on the (non-rethook) implementation of get_kretprobe(). +> The thought behind this patch is to make use of the RCU API where possible +> when accessing this pointer so that the needed barriers are always in place +> and to self-document the code. +> + +**[v1: bpf-next: libbpf: Start v1.4 development cycle](http://lore.kernel.org/bpf/20231123000439.12025-1-eddyz87@gmail.com/)** + +> Bump libbpf.map to v1.4.0 to start a new libbpf version cycle. +> + +**[v1: bpf-next: Verify global subprogs lazily](http://lore.kernel.org/bpf/20231122213112.3596548-1-andrii@kernel.org/)** + +> See patch #2 for justification. In few words, current eager verification of +> global func prevents BPF CO-RE approaches to be applied to global functions. 
+> + +**[v2: bpf-next: bpf: Relax tracing prog recursive attach rules](http://lore.kernel.org/bpf/20231122191816.5572-1-9erthalion6@gmail.com/)** + +> Currently, it's not allowed to attach an fentry/fexit prog to another +> one of the same type. At the same time it's not uncommon to see a +> tracing program with lots of logic in use, and the attachment limitation +> prevents usage of fentry/fexit for performance analysis (e.g. with +> "bpftool prog profile" command) in this case. An example could be +> falcosecurity libs project that uses tp_btf tracing programs. +> + +**[v1: ipsec-next: Add bpf_xdp_get_xfrm_state() kfunc](http://lore.kernel.org/bpf/cover.1700676682.git.dxu@dxuuu.xyz/)** + +> This patchset adds two kfunc helpers, bpf_xdp_get_xfrm_state() and +> bpf_xdp_xfrm_state_release() that wrap xfrm_state_lookup() and +> xfrm_state_put(). The intent is to support software RSS (via XDP) for +> the ongoing/upcoming ipsec pcpu work [0]. Recent experiments performed +> on (hopefully) reproducible AWS testbeds indicate that single tunnel +> pcpu ipsec can reach line rate on 100G ENA nics. +> + +**v1: C inlined assembly for reproducing max** + +> Thanks for your reply. +> The C inlined assembly code is attached. +> I'm using clang-16, but it still fails. +> + +**[v1: bpf: add __printf() to for printf fmt strings](http://lore.kernel.org/bpf/20231122133656.290475-1-ben.dooks@codethink.co.uk/)** + +> The btf_seq_show() and btf_snprintf_show() take a printk format +> string so add a __printf() to these two functions.
This fixes the +> following extended warnings: +> +> kernel/bpf/btf.c:7094:29: error: function ‘btf_seq_show’ might be a candidate for ‘gnu_printf’ format attribute [-Werror=suggest-attribute=format] +> kernel/bpf/btf.c:7131:9: error: function ‘btf_snprintf_show’ might be a candidate for ‘gnu_printf’ format attribute [-Werror=suggest-attribute=format] +> + +**[v3: bpf-next: bpf: tcp: Support arbitrary SYN Cookie at TC.](http://lore.kernel.org/bpf/20231121184245.69569-1-kuniyu@amazon.com/)** + +> Under SYN Flood, the TCP stack generates SYN Cookie to remain stateless +> for the connection request until a valid ACK is responded to the SYN+ACK. +> +> The cookie contains two kinds of host-specific bits, a timestamp and +> secrets, so only can it be validated by the generator. It means SYN +> Cookie consumes network resources between the client and the server; +> intermediate nodes must remember which nodes to route ACK for the cookie. +> + +**[v6: x86/bugs: Add a separate config for each mitigation](http://lore.kernel.org/bpf/20231121160740.1249350-1-leitao@debian.org/)** + +> Currently, the CONFIG_SPECULATION_MITIGATIONS is halfway populated, +> where some mitigations have entries in Kconfig, and they could be +> modified, while others mitigations do not have Kconfig entries, and +> could not be controlled at build time. +> + +**[v2: bpf-next: skmsg: Add the data length in skmsg to SIOCINQ ioctl and rx_queue](http://lore.kernel.org/bpf/1700565725-2706-1-git-send-email-yangpc@wangsu.com/)** + +> When using skmsg redirect, the msg is queued in psock->ingress_msg, +> and the application calling SIOCINQ ioctl will return a readable +> length of 0, and we cannot track the data length of ingress_msg with +> the ss tool. +> + +**[v4: bpf: verify callbacks as if they are called unknown number of times](http://lore.kernel.org/bpf/20231121020701.26440-1-eddyz87@gmail.com/)** + +> This was reported previously in [0]. 
+> The basic idea of the fix is to schedule callback entry state for +> verification in env->head until some identical, previously visited +> state in current DFS state traversal is found. Same logic as with open +> coded iterators, and builds on top recent fixes [1] for those. +> + +**[v2: bpf-next: Complete BPF verifier precision tracking support for register spills](http://lore.kernel.org/bpf/20231121002221.3687787-1-andrii@kernel.org/)** + +> *NOTE* this patch set conflicts with a fix [0] in bpf tree, so this has to +> wait until bpf and bpf-next trees converge to be rebased. I'm still submitting +> it for early review and discussion. +> + +**[v4: Faultable Tracepoints](http://lore.kernel.org/bpf/20231120205418.334172-1-mathieu.desnoyers@efficios.com/)** + +> Wire up the system call tracepoints with Tasks Trace RCU to allow +> the ftrace, perf, and eBPF tracers to handle page faults. +> +> This series does the initial wire-up allowing tracers to handle page +> faults, but leaves out the actual handling of said page faults as future +> work. +> + +**[v1: bpf-next: selftests/bpf: reduce verboseness of reg_bounds selftest logs](http://lore.kernel.org/bpf/20231120180452.145849-1-andrii@kernel.org/)** + +> Reduce verboseness of test_progs' output in reg_bounds set of tests with +> two changes. +> +> First, instead of each different operator (<, <=, >, ...) being it's own +> subtest, combine all different ops for the same (x, y, init_t, cond_t) +> values into single subtest. +> + +**[v2: LSM: Officially support appending LSM hooks after boot.](http://lore.kernel.org/bpf/93b5e861-c1ec-417c-b21e-56d0c4a3ae79@I-love.SAKURA.ne.jp/)** + +> This functionality will be used by TOMOYO security module. +> +> In order to officially use an LSM module, that LSM module has to be +> built into vmlinux. 
This limitation has been a big barrier for allowing +> distribution kernel users to use LSM modules which the organization who +> builds that distribution kernel cannot afford supporting [1]. Therefore, +> I've been asking for ability to append LSM hooks from LKM-based LSMs so +> that distribution kernel users can use LSMs which the organization who +> builds that distribution kernel cannot afford supporting. +> + +### 周边技术动态 + +#### Qemu + +**[v12: for-9.0: riscv: rv64i/rva22u64 CPUs, RVA22U64 profile support](http://lore.kernel.org/qemu-devel/20231124202353.1187814-1-dbarboza@ventanamicro.com/)** + +> This new version contains naming changes suggested by Drew in v11. We're +> also eliminating riscv_cpu_validate_zic64b() and open-coding it inside +> riscv_cpu_update_named_features() since it's not worth creating a helper +> just to do a single assignment. +> + +**[v1: for-9.0: target/riscv: implement RVA22S64 profile](http://lore.kernel.org/qemu-devel/20231123191532.1101644-1-dbarboza@ventanamicro.com/)** + +> Based-on: 20231123185122.1100436-1-dbarboza@ventanamicro.com +> ("v11: for-9.0: rv64i and rva22u64 CPUs, RVA22U64 profile support") +> +> This series builds upon the RVA22U64 support to add the supervisor mode +> profile RVA22S64 [1]. +> +> Patch 1 adds a new named feature called 'svade', which is a glorified +> way of telling "we do not want svadu". More info in the commit message. +> + +**[v2: linux-user/riscv: Add Zicboz extensions to hwprobe](http://lore.kernel.org/qemu-devel/20231123181300.2140622-1-christoph.muellner@vrull.eu/)** + +> Upstream Linux recently added RISC-V Zicboz support to the hwprobe API. +> This patch introduces this for QEMU's user space emulator. +> + +**[v1: RISC-V: Increase max vlen to 4096](http://lore.kernel.org/qemu-devel/20231123001709.64934-1-patrick@rivosinc.com/)** + +> QEMU currently limits the max vlenb to 1024. GCC sets the upper bound +> to 4096 [1]. 
There doesn't seem to be an upper bound set by the spec [2] +> so this patch just changes QEMU to match GCC's upper bound. +> + +**[v1: riscv-to-apply queue](http://lore.kernel.org/qemu-devel/20231122053800.1531799-1-alistair.francis@wdc.com/)** + +> The following changes since commit 8fa379170c2a12476021f5f50d6cf3f672e79e7b: +> +> Update version for v8.2.0-rc1 release (2023-11-21 13:56:12 -0500) +> +> are available in the Git repository at: +> +> https://github.com/alistair23/qemu.git tags/pull-riscv-to-apply-20231122 +> +> for you to fetch changes up to 6bca4d7d1ff2b8857486c3ff31f5c6fc3e3984b4: +> +> target/riscv/cpu_helper.c: Fix mxr bit behavior (2023-11-22 14:03:37 +1000) +> + +**[v4: Support RISC-V IOPMP](http://lore.kernel.org/qemu-devel/20231122053251.440723-1-ethan84@andestech.com/)** + +> This series implements IOPMP specification v1.0.0-draft4 rapid-k model. +> The specification url: +> https://github.com/riscv-non-isa/iopmp-spec/blob/main/riscv_iopmp_specification.pdf +> + +#### U-Boot + +**[v2: risc-v: add ACPI support on QEMU](http://lore.kernel.org/u-boot/20231121152740.24783-1-heinrich.schuchardt@canonical.com/)** + +> QEMU 8.1.2 can create ACPI tables for the RISC-V architecture. +> Allow passing them through to the operating system. +> Provide a new defconfig that enables this. +> +> This series depends on +> + +**[v3: smbios: arm64: Ensure table is written at a good address](http://lore.kernel.org/u-boot/20231121025818.741258-1-sjg@chromium.org/)** + +> U-Boot typically sets up its malloc() pool near the top of memory. On +> ARM64 systems this can result in an SMBIOS table above 4GB which is +> not supported by SMBIOSv2. +> + ## 20231119:第 69 期 ### 内核动态