diff --git a/news/README.md b/news/README.md
index e8a2708bdec4abe1f26f6859f0810ee652647ceb..0c30e7ae4c442e8ef7a2f2f952b5c2af680ecddb 100644
--- a/news/README.md
+++ b/news/README.md
@@ -5,6 +5,1258 @@
 * [2022](2022.md)
 * [2023 - First Half](2023-1st-half.md)
+## 20230705: Issue 52
+
+### Kernel Updates
+
+#### RISC-V Architecture Support
+
+**[v7: -next: support allocating crashkernel above 4G explicitly on riscv](http://lore.kernel.org/linux-riscv/20230704212327.1687310-1-chenjiahao16@huawei.com/)**
+
+> On riscv, the current crash kernel allocation logic is trying to
+> allocate within 32bit addressable memory region by default, if
+> failed, try to allocate without 4G restriction.
+>
+
+**[v1: riscv: Start of DRAM should at least be aligned on PMD size for the direct mapping](http://lore.kernel.org/linux-riscv/20230704121837.248976-1-alexghiti@rivosinc.com/)**
+
+> So that we do not end up mapping the whole linear mapping using 4K
+> pages, which is slow at boot time, and also very likely at runtime.
+>
+> So make sure we align the start of DRAM on a PMD boundary.
+>
+
+**[v4: Add initialization of clock for StarFive JH7110 SoC](http://lore.kernel.org/linux-riscv/20230704091948.85247-4-william.qiu@starfivetech.com/)**
+
+> This patchset adds initial rudimentary support for the StarFive
+> Quad SPI controller driver. And this driver will be used in
+> StarFive's VisionFive 2 board. In 6.4, the QSPI_AHB and QSPI_APB
+> clocks changed from the default ON state to the default OFF state,
+> so these clocks need to be enabled in the driver. At the same time,
+> dts patch is added to this series.
+>
+
+**[v1: Add SPI module for StarFive JH7110 SoC](http://lore.kernel.org/linux-riscv/20230704091948.85247-1-william.qiu@starfivetech.com/)**
+
+> This patchset adds initial rudimentary support for the StarFive
+> SPI controller. And this driver will be used in StarFive's
+> VisionFive 2 board. The first patch constrains minItems of clocks
+> for JH7110 SPI and Patch 2 adds support for StarFive JH7110 SPI.
+>
+
+**[v6: Add PLL clocks driver and syscon for StarFive JH7110 SoC](http://lore.kernel.org/linux-riscv/20230704064610.292603-1-xingyu.wu@starfivetech.com/)**
+
+> This patch series adds a PLL clocks driver and providers by writing
+> and reading syscon registers for the StarFive JH7110 RISC-V SoC. It also adds
+> documentation and nodes to describe StarFive System Controller (syscon)
+> registers. This patch series is based on Linux 6.4.
+>
+
+**[v4: riscv: Allow userspace to directly access perf counters](http://lore.kernel.org/linux-riscv/20230703124647.215952-1-alexghiti@rivosinc.com/)**
+
+> riscv used to allow direct access to cycle/time/instret counters,
+> bypassing the perf framework. This patchset intends to allow the user to
+> mmap any counter when accessed through perf. But we can't break the
+> existing behaviour so we introduce a sysctl perf_user_access like arm64
+> does, which defaults to the legacy mode described above.
+>
+
+**[v3: RISC-V: Probe DT extension support using riscv,isa-extensions & riscv,isa-base](http://lore.kernel.org/linux-riscv/20230703-repayment-vocalist-e4f3eeac2b2a@wendy/)**
+
+> Based on my latest iteration of deprecating riscv,isa [1], here's an
+> implementation of the new properties for Linux. The first few patches,
+> up to "RISC-V: split riscv_fill_hwcap() in 3", are all prep work that
+> further tames some of the extension related code, on top of my already
+> applied series that cleans up the ISA string parser.
+> Perhaps "RISC-V: shunt isa_ext_arr to cpufeature.c" is a bit gratuitous,
+> but I figured a bit of coalescing of extension related data structures
+> would be a good idea. Note that riscv,isa will still be used in the
+> absence of the new properties. Palmer suggested adding a Kconfig option
+> to turn off the fallback for DT, which I have gone and done. It's locked
+> behind the NONPORTABLE option for good reason.
+>
+
+**[v1: riscv: optimize ELF relocation function in riscv](http://lore.kernel.org/linux-riscv/1688355132-62933-1-git-send-email-lixiaoyun@binary-semi.com/)**
+
+> The patch can optimize the running time of the insmod command by modifying the ELF
+> relocation function.
+> In the 5.10 and latest kernels, when installing a riscv ELF driver which
+> contains multiple symbol table items to be relocated, the kernel takes a lot
+> of time to execute the relocation. For example, installing a 3+MB driver
+> takes 180+s.
+> We focus on the function that handles R_RISCV_HI20 and R_RISCV_LO20
+> type relocations in arch/riscv/kernel/module.c and
+> find that there are two loops in the function. If we modify the begin
+> number in the second for-loop iteration, we can save significant time
+> for installation. Installing the same 3+MB driver then takes just 2s.
+>
+
+**[v10: Add non-coherent DMA support for AX45MP](http://lore.kernel.org/linux-riscv/20230702203429.237615-1-prabhakar.mahadev-lad.rj@bp.renesas.com/)**
+
+> On the Andes AX45MP core, cache coherency is a specification option so it
+> may not be supported. In this case DMA will fail. To get around this
+> issue, this patch series does the below:
+>
+> 1] Andes alternative ports is implemented as errata which checks if the
+> IOCP is missing and only then applies to CMO errata. One vendor specific
+> SBI EXT (ANDES_SBI_EXT_IOCP_SW_WORKAROUND) is implemented as part of
+> errata.
+>
+
+**[v5: dt-bindings: riscv: deprecate riscv,isa](http://lore.kernel.org/linux-riscv/20230702-eats-scorebook-c951f170d29f@spud/)**
+
+> When the RISC-V dt-bindings were accepted upstream in Linux, the base
+> ISA etc had yet to be ratified. By the ratification of the base ISA,
+> incompatible changes had snuck into the specifications - for example the
+> Zicsr and Zifencei extensions were spun out of the base ISA.
+>
+
+**[v5: RISCV: Add KVM_GET_REG_LIST API](http://lore.kernel.org/linux-riscv/cover.1688010022.git.haibo1.xu@intel.com/)**
+
+> KVM_GET_REG_LIST will dump all register IDs that are available to
+> KVM_GET/SET_ONE_REG and it's very useful for identifying platform
+> regression issues during VM migration.
+>
+
+**[GIT PULL: RISC-V Patches for the 6.5 Merge Window, Part 1](http://lore.kernel.org/linux-riscv/mhng-ebcc1b82-5dd0-4f2d-824e-8d9250374abf@palmer-ri-x1c9/)**
+
+> The following changes since commit ac9a78681b921877518763ba0e89202254349d1b:
+>
+> Linux 6.4-rc1 (2023-05-07 13:34:35 -0700)
+>
+> are available in the Git repository at:
+>
+> git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git tags/riscv-for-linus-6.5-mw1
+>
+
+**[v1: Add missing pins for RZ/Five SoC](http://lore.kernel.org/linux-riscv/20230630120433.49529-1-prabhakar.mahadev-lad.rj@bp.renesas.com/)**
+
+> This patch series intends to incorporate the absent port pins P19 to P28,
+> which are exclusively available on the RZ/Five SoC.
+>
+
+**[v2: riscv: Add BUG_ON() for no cpu nodes in devicetree](http://lore.kernel.org/linux-riscv/20230630105938.1377262-1-suagrfillet@gmail.com/)**
+
+> When only the ACPI tables are passed to kernel, the tiny devicetree created
+> by EFI Stub doesn't provide cpu nodes.
+>
+
+**[v1: riscv: KCFI support](http://lore.kernel.org/linux-riscv/20230629234244.1752366-8-samitolvanen@google.com/)**
+
+> This series adds KCFI support for RISC-V. KCFI is a fine-grained
+> forward-edge control-flow integrity scheme supported in Clang >=16,
+> which ensures indirect calls in instrumented code can only branch to
+> functions whose type matches the function pointer type, thus making
+> code reuse attacks more difficult.
+>
+
+**[v1: RISC-V: Provide a more helpful error message on invalid ISA strings](http://lore.kernel.org/linux-riscv/20230629223502.1924-1-palmer@rivosinc.com/)**
+
+> This adds a warning for the cases where the ISA string isn't valid. It's still
+> above the BUG_ON cut, but hopefully it's at least a bit easier for users.
+>
+
+**[v4: riscv: Discard vector state on syscalls](http://lore.kernel.org/linux-riscv/20230629142228.1125715-1-bjorn@kernel.org/)**
+
+> The RISC-V vector specification states:
+> Executing a system call causes all caller-saved vector registers
+> (v0-v31, vl, vtype) and vstart to become unspecified.
+>
+> The vector registers are set to all 1s, vill is set (invalid), and the
+> vector status is set to Dirty.
+>
+
+**[v1: arch,fbdev: Move screen_info into arch/](http://lore.kernel.org/linux-riscv/20230629121952.10559-1-tzimmermann@suse.de/)**
+
+> The variables screen_info and edid_info provide information about
+> the system's screen, and possibly EDID data of the connected display.
+> Both are defined and set by architecture code. But both variables are
+> declared in non-arch header files. Dependencies are at best loosely
+> tracked. To resolve this, move the global state screen_info and its
+> companion edid_info into arch/. Only declare them on architectures
+> that define them. List dependencies on the variables in the Kconfig
+> files. Also clean up the callers.
+>
+
+**[v1: riscv: BUG_ON() for no cpu nodes in setup_smp](http://lore.kernel.org/linux-riscv/20230629105839.1160895-1-suagrfillet@gmail.com/)**
+
+> When booting with ACPI tables, the tiny devicetree created by
+> EFI Stub doesn't provide cpu nodes.
+>
+> In setup_smp(), of_parse_and_init_cpus() will bug on !found_boot_cpu
+> if acpi_disabled. That's unclear, so bug for no cpu nodes before
+> of_parse_and_init_cpus().
+>
+
+**[v8: Add JH7110 USB PHY driver support](http://lore.kernel.org/linux-riscv/20230629075115.11934-1-minda.chen@starfivetech.com/)**
+
+> This patchset adds USB and PCIe PHY for the StarFive JH7110 SoC.
+> The patch has been tested on the VisionFive 2 board.
+>
+
+**[v1: RISC-V: Document the ISA string parsing rules for ACPI](http://lore.kernel.org/linux-riscv/20230629031705.15575-1-palmer@rivosinc.com/)**
+
+> We've had a ton of issues around the ISA string parsing rules elsewhere
+> in RISC-V, so let's at least be clear about what the rules are so we can
+> try and avoid more issues.
+>
+
+**[v1: tools/nolibc: shrink arch support](http://lore.kernel.org/linux-riscv/cover.1687976753.git.falcon@tinylab.org/)**
+
+> This patchset further improves porting of nolibc to new architectures,
+> it is based on our previous v5 sysret helper series [1].
+>
+> It mainly shrinks the assembly _start by moving most of its operations
+> to a C version of _start_c() function. It also removes the old
+> sys_stat() support by using sys_statx() instead and therefore
+> removes all of the arch specific sys_stat_struct.
+>
+
+**[v2: RISC-V: archrandom support](http://lore.kernel.org/linux-riscv/20230628131442.3022772-1-sameo@rivosinc.com/)**
+
+> This patchset adds support for the archrandom API to the RISC-V
+> architecture.
+>
+> The ratified crypto scalar extensions provide entropy bits via the seed
+> CSR, as exposed by the Zkr extension.
+>
+
+**[v5: tools/nolibc: add a new syscall helper](http://lore.kernel.org/linux-riscv/cover.1687957589.git.falcon@tinylab.org/)**
+
+> It mainly applies the core part of suggestions from Thomas (Many thanks)
+> and cleans up the multiple whitespaces issues reported by
+> scripts/checkpatch.pl.
+>
+
+**[v1: riscv: sigcontext: Correct the comment of sigreturn](http://lore.kernel.org/linux-riscv/20230628091213.2908149-1-guoren@kernel.org/)**
+
+> The real-time signals enlarged the sigset_t type, and most architectures
+> have changed to using rt_sigreturn as the only way. The riscv is one of
+> them, and there is no sys_sigreturn in it. Only some old architecture
+> preserved sys_sigreturn as part of the historical burden.
+>
+
+**[GIT PULL: RISC-V: make ARCH_THEAD preclude XIP_KERNEL](http://lore.kernel.org/linux-riscv/20230628-left-attractor-94b7bd5fbb83@wendy/)**
+
+> Randy reported build errors in linux-next where XIP_KERNEL was enabled.
+> ARCH_THEAD requires alternatives to support the non-standard ISA
+> extensions used by the THEAD cores, which are mutually exclusive with
+> XIP kernels. Clone the dependency list from the Allwinner entry, since
+> Allwinner's D1 uses T-Head cores with the same non-standard extensions.
+>
+
+**[v1: Make SV39 the default address space](http://lore.kernel.org/linux-riscv/20230627222152.177716-1-charlie@rivosinc.com/)**
+
+> Make sv39 the default address space for mmap as some applications
+> currently depend on this assumption. The RISC-V specification enforces
+> that bits outside of the virtual address range are not used, so
+> restricting the size of the default address space as such should be
+> temporary. A hint address passed to mmap will cause the largest address
+> space that fits entirely into the hint to be used. If the hint is less
+> than or equal to 1<<38, a 39-bit address will be used. After an address
+> space is completely full, the next smallest address space will be used.
+>
+
+**[v3: Add support for Allwinner PWM on D1/T113s/R329 SoCs](http://lore.kernel.org/linux-riscv/20230627082334.1253020-1-privatesub2@gmail.com/)**
+
+> This series adds support for the PWM controller on new
+> Allwinner SoCs, such as D1, T113s and R329. The implemented driver
+> provides basic functionality for controlling PWM channels.
+>
+
+#### Process Scheduling
+
+**[v3: sched/core: introduce sched_core_idle_cpu()](http://lore.kernel.org/lkml/1688011324-42406-1-git-send-email-CruzZhao@linux.alibaba.com/)**
+
+> As core scheduling was introduced, a new state of idle is defined as
+> force idle, running idle task but nr_running greater than zero.
+>
+
+**[v1: sched/core: Use empty mask to reset cpumasks in sched_setaffinity()](http://lore.kernel.org/lkml/20230628211637.1679348-1-longman@redhat.com/)**
+
+> Since commit 8f9ea86fdf99 ("sched: Always preserve the user requested
+> cpumask"), user provided CPU affinity via sched_setaffinity(2) is
+> preserved even if the task is being moved to a different cpuset. However,
+> that affinity is also being inherited by any subsequently created child
+> processes which may not want or be aware of that affinity.
+>
+
+**[v3: Sched/fair: Block nohz tick_stop when cfs bandwidth in use](http://lore.kernel.org/lkml/20230628190227.894195-1-pauld@redhat.com/)**
+
+> CFS bandwidth limits and NOHZ full don't play well together. Tasks
+> can easily run well past their quotas before a remote tick does
+> accounting. This leads to long, multi-period stalls before such
+> tasks can run again. Currently, when presented with these conflicting
+> requirements the scheduler is favoring nohz_full and letting the tick
+> be stopped. However, nohz tick stopping is already best-effort, there
+> are a number of conditions that can prevent it, whereas cfs runtime
+> bandwidth is expected to be enforced.
+>
+
+#### Memory Management
+
+**[v3: MDWE without inheritance](http://lore.kernel.org/linux-mm/20230704153630.1591122-1-revest@chromium.org/)**
+
+> Joey recently introduced a Memory-Deny-Write-Executable (MDWE) prctl which tags
+> current with a flag that prevents pages that were previously not executable from
+> becoming executable.
+> This tag always gets inherited by children tasks. (it's in MMF_INIT_MASK)
+>
+
+**[v2: mm/slub: refactor freelist to use custom type](http://lore.kernel.org/linux-mm/20230704135834.3884421-1-matteorizzo@google.com/)**
+
+> Currently the SLUB code represents encoded freelist entries as "void*".
+> That's misleading, those things are encoded under
+> CONFIG_SLAB_FREELIST_HARDENED so that they're not actually dereferenceable.
+>
+
+**[v1: block: Make blkdev_get_by_*() return handle](http://lore.kernel.org/linux-mm/20230629165206.383-1-jack@suse.cz/)**
+
+> this patch series implements the idea of blkdev_get_by_*() calls returning
+> bdev_handle which is then passed to blkdev_put() [1]. This makes the get
+> and put calls for bdevs more obviously matching and allows us to propagate
+> context from get to put without having to modify all the users (again!).
+> In particular I need to propagate used open flags to blkdev_put() to be able
+> to count writeable opens and add support for blocking writes to mounted block
+> devices. I'll send that series separately.
+>
+
+**[v1: mm: memory-failure: add missing set_mce_nospec() for memory_failure()](http://lore.kernel.org/linux-mm/20230704121948.1331846-1-linmiaohe@huawei.com/)**
+
+> If memory_failure() succeeds to hwpoison a page, the set_mce_nospec() is
+> expected to be called to prevent speculative access to the page by marking
+> it not-present. Add such missing call to set_mce_nospec() in async memory
+> failure handling scene.
+>
+
+**[v1: mm: page_alloc: avoid false page outside zone error info](http://lore.kernel.org/linux-mm/20230704111823.940331-1-linmiaohe@huawei.com/)**
+
+> If pfn is outside zone boundaries in the first round, ret will be set
+> to 1. But if pfn is changed to inside the zone boundaries in zone span
+> seqretry path, ret is still set to 1 leading to false page outside zone
+> error info.
+>
+
+**[v3: Documentation: admin-guide: correct "it's" to possessive "its"](http://lore.kernel.org/linux-mm/20230703232024.8069-1-rdunlap@infradead.org/)**
+
+> Correct 2 uses of "it's" to the possessive "its" as needed.
+>
+
+**[v2: variable-order, large folios for anonymous memory](http://lore.kernel.org/linux-mm/20230703135330.1865927-1-ryan.roberts@arm.com/)**
+
+> This is v2 of a series to implement variable order, large folios for anonymous
+> memory. The objective of this is to improve performance by allocating larger
+> chunks of memory during anonymous page faults. See [1] for background.
+>
+
+**[[PATCH v10 rebased on v6.4 00/25] DEPT(Dependency Tracker)](http://lore.kernel.org/linux-mm/20230703094752.79269-1-byungchul@sk.com/)**
+
+> From now on, I can work on LKML again! I'm wondering if DEPT has been
+> helping kernel debugging well even though it's a form of patches yet.
+>
+
+**[v1: mm: make MEMFD_CREATE into a selectable config option](http://lore.kernel.org/linux-mm/20230630-config-memfd-v1-1-9acc3ae38b5a@weissschuh.net/)**
+
+> The memfd_create() syscall, enabled by CONFIG_MEMFD_CREATE, is useful on
+> its own even when not required by CONFIG_TMPFS or CONFIG_HUGETLBFS.
+>
+> Split it into its own proper bool option that can be enabled by users.
+>
+
+**[v2: Documentation: mm/memfd: vm.memfd_noexec](http://lore.kernel.org/linux-mm/20230629233454.4166842-1-jeffxu@google.com/)**
+
+> Add documentation for sysctl vm.memfd_noexec
+>
+> Link: https://lore.kernel.org/linux-mm/CABi2SkXUX_QqTQ10Yx9bBUGpN1wByOi_=gZU6WEy5a8MaQY3Jw@mail.gmail.com/T/
+>
+
+**[v2: mm/slub: disable slab merging in the default configuration](http://lore.kernel.org/linux-mm/20230629221910.359711-1-julian.pidancet@oracle.com/)**
+
+> Make CONFIG_SLAB_MERGE_DEFAULT default to n unless CONFIG_SLUB_TINY is
+> enabled. Benefits of slab merging are limited on systems that are not
+> memory constrained: the memory overhead is low and evidence of its
+> effect on cache hotness is hard to come by.
+>
+
+**[v25: crash: Kernel handling of CPU and memory hot un/plug](http://lore.kernel.org/linux-mm/20230629192119.6613-1-eric.devolder@oracle.com/)**
+
+> This series is dependent upon "refactor Kconfig to consolidate
+> KEXEC and CRASH options".
+> https://lore.kernel.org/lkml/20230626161332.183214-1-eric.devolder@oracle.com/
+>
+> Once the kdump service is loaded, if changes to CPUs or memory occur,
+> either by hot un/plug or off/onlining, the crash elfcorehdr must also
+> be updated.
+>
+
+**[v1: mm: Always downgrade mmap_lock if requested](http://lore.kernel.org/linux-mm/20230629191414.1215929-1-willy@infradead.org/)**
+
+> Now that stack growth must always hold the mmap_lock for write, we can
+> always downgrade the mmap_lock to read and safely unmap pages from the
+> page table, even if we're next to a stack.
+>
+
+**[v1: writeback: Account the number of pages written back](http://lore.kernel.org/linux-mm/20230628185548.981888-1-willy@infradead.org/)**
+
+> nr_to_write is a count of pages, so we need to decrease it by the number
+> of pages in the folio we just wrote, not by 1. Most callers specify
+> either LONG_MAX or 1, so are unaffected, but writeback_sb_inodes()
+> might end up writing 512x as many pages as it asked for.
+>
+
+**[v24: crash: Kernel handling of CPU and memory hot un/plug](http://lore.kernel.org/linux-mm/20230628185215.40707-1-eric.devolder@oracle.com/)**
+
+> This series is dependent upon "refactor Kconfig to consolidate
+> KEXEC and CRASH options".
+> https://lore.kernel.org/lkml/20230626161332.183214-1-eric.devolder@oracle.com/
+>
+> Once the kdump service is loaded, if changes to CPUs or memory occur,
+> either by hot un/plug or off/onlining, the crash elfcorehdr must also
+> be updated.
+>
+
+**[v1: fs/address_space: add alignment padding for i_map and i_mmap_rwsem to mitigate a false sharing.](http://lore.kernel.org/linux-mm/20230628105624.150352-1-lipeng.zhu@intel.com/)**
+
+> When running UnixBench/Shell Scripts, we observed high false sharing
+> for accessing i_mmap against i_mmap_rwsem.
+>
+> UnixBench/Shell Scripts are typical load/execute command test scenarios,
+> the i_mmap will be accessed frequently to insert/remove vma_interval_tree.
+> Meanwhile, the i_mmap_rwsem is frequently loaded. Unfortunately, they are
+> in the same cacheline.
+>
+
+**[v2: mm/slub: Optimize slub memory usage](http://lore.kernel.org/linux-mm/20230628095740.589893-1-jaypatel@linux.ibm.com/)**
+
+> In the previous version [1], we were able to reduce slub memory
+> wastage, but the total memory was also increasing, so to solve
+> this problem I have modified the patch as follows:
+>
+> 1) If min_objects * object_size > PAGE_ALLOC_COSTLY_ORDER, then it
+>    will return with PAGE_ALLOC_COSTLY_ORDER.
+> 2) Similarly, if min_objects * object_size < PAGE_SIZE, then it will
+>    return with slub_min_order.
+> 3) Additionally, I changed slub_max_order to 2. There is no specific
+>    reason for using the value 2, but it provided the best results in
+>    terms of performance without any noticeable impact.
+>
+
+#### File Systems
+
+**[v2: 0/6: block: Add config option to not allow writing to mounted devices](http://lore.kernel.org/linux-fsdevel/20230704122727.17096-1-jack@suse.cz/)**
+
+> This is the second version of the patches to add a config option to not allow writing
+> to mounted block devices. For motivation why this is interesting see patch 1/6.
+> I've been testing the patches more extensively this time and I've found a couple
+> of things that get broken by disallowing writes to mounted block devices:
+> 1) Bind mounts get broken because get_tree_bdev() / mount_bdev() first try to
+>    claim the bdev before searching whether it is already mounted. Patch 6
+>    reworks the mount code to avoid this problem.
+> 2) btrfs mounting is likely having the same problem as 1). It should be fixable
+>    AFAICS but for now I've left it alone until we settle on the rest of the
+>    series.
+> 3) "mount -o loop" gets broken because util-linux keeps the loop device open
+>    read-write when attempting to mount it. Hopefully fixable within util-linux.
+> 4) resize2fs online resizing gets broken because it tries to open the block
+>    device read-write only to call resizing ioctl. Trivial to fix within
+>    e2fsprogs.
+>
+
+**[v1: block: Make blkdev_get_by_*() return handle](http://lore.kernel.org/linux-fsdevel/20230629165206.383-1-jack@suse.cz/)**
+
+> this patch series implements the idea of blkdev_get_by_*() calls returning
+> bdev_handle which is then passed to blkdev_put() [1]. This makes the get
+> and put calls for bdevs more obviously matching and allows us to propagate
+> context from get to put without having to modify all the users (again!).
+> In particular I need to propagate used open flags to blkdev_put() to be able
+> to count writeable opens and add support for blocking writes to mounted block
+> devices. I'll send that series separately.
+>
+
+**[v5: fanotify accounting for fs/splice.c](http://lore.kernel.org/linux-fsdevel/cover.1688393619.git.nabijaczleweli@nabijaczleweli.xyz/)**
+
+> Previously: https://lore.kernel.org/linux-fsdevel/jbyihkyk5dtaohdwjyivambb2gffyjs3dodpofafnkkunxq7bu@jngkdxx65pux/t/#u
+>
+> In short:
+> * most read/write APIs generate ACCESS/MODIFY for the read/written file(s)
+> * except the [vm]splice/tee family
+>   (actually, since 6.4, splice itself /does/ generate events but only
+>   for the non-pipes being spliced from/to; this commit is Fixes:ed)
+> * userspace that registers (i|fa)notify on pipes usually relies on it
+>   actually working (coreutils tail -f is the primo example)
+> * it's sub-optimal when someone with a magic syscall can fill up a
+>   pipe simultaneously ensuring it will never get serviced
+>
+
+**[[PATCH v10 rebased on v6.4 00/25] DEPT(Dependency Tracker)](http://lore.kernel.org/linux-fsdevel/20230703094752.79269-1-byungchul@sk.com/)**
+
+> From now on, I can work on LKML again! I'm wondering if DEPT has been
+> helping kernel debugging well even though it's a form of patches yet.
+>
+
+**[GIT PULL: iomap: new code for 6.5](http://lore.kernel.org/linux-fsdevel/168831482682.535407.9162875426107097138.stg-ugh@frogsfrogsfrogs/)**
+
+> Please pull this branch with changes for iomap for 6.5-rc1.
+>
+> As usual, I did a test-merge with the main upstream branch as of a few
+> minutes ago, and didn't see any conflicts. Please let me know if you
+> encounter any problems.
+>
+
+**[v1: proc: proc_setattr for /proc/$PID/net](http://lore.kernel.org/linux-fsdevel/20230630140609.263790-1-falcon@tinylab.org/)**
+
+> Just applied your patchset on v6.4, and then:
+>
+> - revert the 1st patch: 'selftests/nolibc: drop test chmod_net' manually
+>
+> - do the 'run' test of nolibc on arm/vexpress-a9
+>
+
+**[v3: fuse: add a new fuse init flag to relax restrictions in no cache mode](http://lore.kernel.org/linux-fsdevel/20230630094602.230573-1-hao.xu@linux.dev/)**
+
+> Patch 1 is a fix for private mmap in FOPEN_DIRECT_IO mode
+> This is added here together since the latter two depend on it.
+> Patch 2 is the main dish
+> Patch 3 is to maintain direct write logic for shared mmap in FOPEN_DIRECT_IO mode
+>
+
+**[v1: fs: Optimize unixbench's file copy test](http://lore.kernel.org/linux-fsdevel/1688117303-8294-1-git-send-email-zenghongling@kylinos.cn/)**
+
+> The iomap_set_range_uptodate function checks if the file is a private
+> mapping, and if it is, it needs to do something about it. UnixBench's
+> file copy tests are mostly shared mappings; such a check would reduce
+> file copy scores, so we added the unlikely macro for optimization.
+> The score of file copy can be improved after branch optimization.
+>
+
+**[v1: fanotify: disallow mount/sb marks on kernel internal pseudo fs](http://lore.kernel.org/linux-fsdevel/20230629042044.25723-1-amir73il@gmail.com/)**
+
+> Hopefully, nobody is trying to abuse mount/sb marks for watching all
+> anonymous pipes/inodes.
+>
+> I cannot think of a good reason to allow this - it looks like an
+> oversight that dated back to the original fanotify API.
+>
+
+**[GIT PULL: sysctl changes for v6.5-rc1](http://lore.kernel.org/linux-fsdevel/ZJx62RvS9TwjUUCi@bombadil.infradead.org/)**
+
+> The following changes since commit f1fcbaa18b28dec10281551dfe6ed3a3ed80e3d6:
+>
+> Linux 6.4-rc2 (2023-05-14 12:51:40 -0700)
+>
+> are available in the Git repository at:
+>
+> git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/ tags/v6.5-rc1-sysctl-next
+>
+> for you to fetch changes up to 2f2665c13af4895b26761107c2f637c2f112d8e9:
+>
+> sysctl: replace child with an enumeration (2023-06-18 02:32:54 -0700)
+>
+
+#### Network Devices
+
+**[v3: net: nfp: clean mc addresses in application firmware when closing port](http://lore.kernel.org/netdev/20230705052818.7122-1-louis.peens@corigine.com/)**
+
+> When moving devices from one namespace to another, mc addresses are
+> cleaned in software while not removed from application firmware. Thus
+> the mc addresses remain and will cause a resource leak.
+>
+
+**[v2: iwl-net: ice: prevent call trace during reload](http://lore.kernel.org/netdev/20230705040510.906029-1-michal.swiatkowski@linux.intel.com/)**
+
+> Calling ethtool during reload can lead to call trace, because VSI isn't
+> configured for some time, but netdev is alive.
+>
+> To fix it add rtnl lock for VSI deconfig and config. Set ::num_q_vectors
+> to 0 after freeing and add a check for ::tx/rx_rings in ring related
+> ethtool ops.
+>
+> Add proper unroll of filters in ice_start_eth().
+>
+
+**[v1: net: octeontx2-af: Promisc enable/disable through mbox](http://lore.kernel.org/netdev/20230705033813.2744357-1-rkannoth@marvell.com/)**
+
+> In Legacy silicon, promisc mode is only modified
+> through CGX mbox messages. In CN10KB silicon, it is modified
+> from both CGX mbox and NIX. This breaks legacy application
+> behaviour. Fix this by removing call from NIX.
+>
+
+**[v2: vduse: add support for networking devices](http://lore.kernel.org/netdev/20230704164045.39119-1-maxime.coquelin@redhat.com/)**
+
+> This small series enables virtio-net device type in VDUSE.
+> With it, basic operation has been tested, both with
+> virtio-vdpa and vhost-vdpa using DPDK Vhost library series
+> adding VDUSE support using split rings layout (merged in
+> DPDK v23.07-rc1).
+>
+
+**[v1: net: ftmac100: add multicast filtering possibility](http://lore.kernel.org/netdev/20230704154053.3475336-1-saproj@gmail.com/)**
+
+> If netdev_mc_count() is not zero and not IFF_ALLMULTI, filter
+> incoming multicast packets. The chip has a Multicast Address Hash Table
+> for allowed multicast addresses, so we fill it.
+>
+
+**[v1: net: sched: Undo tcf_bind_filter in case of errors in set callbacks](http://lore.kernel.org/netdev/20230704151456.52334-1-victor@mojatatu.com/)**
+
+> Five different classifiers (fw, bpf, u32, matchall, and flower) are
+> calling tcf_bind_filter in their callbacks, but weren't undoing it by
+> calling tcf_unbind_filter if there was an error after binding.
+>
+> This patch set fixes all this by calling tcf_unbind_filter in such
+> cases.
+>
+
+**[v5: bpf-next: Add SO_REUSEPORT support for TC bpf_sk_assign](http://lore.kernel.org/netdev/20230613-so-reuseport-v5-0-f6686a0dbce0@isovalent.com/)**
+
+> We want to replace iptables TPROXY with a BPF program at TC ingress.
+> To make this work in all cases we need to assign a SO_REUSEPORT socket
+> to an skb, which is currently prohibited. This series adds support for
+> such sockets to bpf_sk_assign.
+>
+
+**[v1: resubmit: net: fec: Refactor: rename `adapter` to `fep`](http://lore.kernel.org/netdev/20230704114058.5785-1-csokas.bence@prolan.hu/)**
+
+> Rename local `struct fec_enet_private *adapter` to `fep` in `fec_ptp_gettime()` to match the rest of the driver
+>
+
+**[v1: igb: Add support for AF_XDP zero-copy](http://lore.kernel.org/netdev/20230704095915.9750-1-sriram.yagnaraman@est.tech/)**
+
+> Disclaimer: My first patches to Intel drivers, implemented AF_XDP
+> zero-copy feature which seemed to be missing for igb. Not sure if it was
+> a conscious choice to not spend time implementing this for older
+> devices, nevertheless I send them to the list for review.
+>
+
+**[v1: net: phy: at803x: support qca8081 1G version chip](http://lore.kernel.org/netdev/20230704090016.7757-1-quic_luoj@quicinc.com/)**
+
+> This patch series adds support for the qca8081 1G version chip. The 1G version
+> chip can be identified by the register mmd7.0x901d bit0.
+>
+
+**[v1: net-next: bnxt_en: use dev_consume_skb_any() in bnxt_tx_int](http://lore.kernel.org/netdev/20230704085236.9791-1-imagedong@tencent.com/)**
+
+> Replace dev_kfree_skb_any() with dev_consume_skb_any() in bnxt_tx_int()
+> to clear the unnecessary noise of "kfree_skb" event.
+>
+
+**[v2: net: dsa: SERDES support for mv88e632x family](http://lore.kernel.org/netdev/20230704065916.132486-1-michael.haener@siemens.com/)**
+
+> This patch series brings SERDES support for the mv88e632x family.
+>
+
+**[v1: can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock](http://lore.kernel.org/netdev/20230704064710.3189-1-astrajoan@yahoo.com/)**
+
+> The following 3 locks would race against each other, causing the
+> deadlock situation in the Syzbot bug report:
+>
+> - j1939_socks_lock
+> - active_session_list_lock
+> - sk_session_queue_lock
+>
+> A reasonable fix is to change j1939_socks_lock to an rwlock, since in
+> the rare situations where a write lock is required for the linked list
+> that j1939_socks_lock is protecting, the code does not attempt to
+> acquire any more locks. This would break the circular lock dependency,
+> where, for example, the current thread already locks j1939_socks_lock
+> and attempts to acquire sk_session_queue_lock, and at the same time,
+> another thread attempts to acquire j1939_socks_lock while holding
+> sk_session_queue_lock.
+>
+
+**[v2: bpf-next: XDP metadata via kfuncs for ice](http://lore.kernel.org/netdev/20230703181226.19380-1-larysa.zaremba@intel.com/)**
+
+> This series introduces XDP hints via kfuncs [0] to the ice driver.
+>
+> Series brings the following existing hints to the ice driver:
+> - HW timestamp
+> - RX hash with type
+>
+> Series also introduces new hints and adds their implementation
+> to ice and veth:
+> - VLAN tag with protocol
+> - Checksum level
+>
+
+**[v1: net: Replace strlcpy with strscpy](http://lore.kernel.org/netdev/20230703175840.3706231-1-azeemshaikh38@gmail.com/)**
+
+> strlcpy() reads the entire source buffer first.
+> This read may exceed the destination size limit.
+> This is both inefficient and can lead to linear read
+> overflows if a source string is not NUL-terminated [1].
+> In an effort to remove strlcpy() completely [2], replace
+> strlcpy() here with strscpy().
+> No return values were used, so direct replacement is safe.
+> + +**[v1: bpf, net: Allow setting SO_TIMESTAMPING* from BPF](http://lore.kernel.org/netdev/20230703175048.151683-1-jthinz@mailbox.tu-berlin.de/)** + +> BPF applications, e.g., a TCP congestion control, might benefit from +> precise packet timestamps. These timestamps are already available in +> __sk_buff and bpf_sock_ops, but could not be requested: A BPF program +> was not allowed to set SO_TIMESTAMPING* on a socket. This change enables +> BPF programs to actively request the generation of timestamps from a +> stream socket. +> + +**[v1: bpf-next: xsk: honor SO_BINDTODEVICE on bind](http://lore.kernel.org/netdev/20230703175329.3259672-1-i.maximets@ovn.org/)** + +> Initial creation of an AF_XDP socket requires CAP_NET_RAW capability. +> A privileged process might create the socket and pass it to a +> non-privileged process for later use. However, that process will be +> able to bind the socket to any network interface. Even though it will +> not be able to receive any traffic without modification of the BPF map, +> the situation is not ideal. +> + +**[v3: octeontx2-pf: Add additional check for MCAM rules](http://lore.kernel.org/netdev/20230703170054.2152662-1-sumang@marvell.com/)** + +> Due to hardware limitation, MCAM drop rule with +> ether_type == 802.1Q and vlan_id == 0 is not supported. Hence rejecting +> such rules. +> + +**[v1: netconsole: Append kernel version to message](http://lore.kernel.org/netdev/20230703154155.3460313-1-leitao@debian.org/)** + +> Create a new netconsole Kconfig option that prepends the kernel version in +> the netconsole message. This is useful to map kernel messages to kernel +> version in a simple way, i.e., without checking somewhere which kernel +> version the host that sent the message is using. 
+> + +**[v2: nf: netfilter: conntrack: Avoid nf_ct_helper_hash uses after free](http://lore.kernel.org/netdev/20230703145216.1096265-1-revest@chromium.org/)** + +> If nf_conntrack_init_start() fails (for example due to a +> register_nf_conntrack_bpf() failure), the nf_conntrack_helper_fini() +> clean-up path frees the nf_ct_helper_hash map. +> + +**[v1: vdpa: reject F_ENABLE_AFTER_DRIVER_OK if backend does not support it](http://lore.kernel.org/netdev/20230703142218.362549-1-eperezma@redhat.com/)** + +> With the current code it is accepted as long as userland sends it. +> +> Although userland should not set a feature flag that has not been +> offered to it with VHOST_GET_BACKEND_FEATURES, the current code will not +> complain about it. +> + +**[v1: Add a driver for the Marvell 88Q2110 PHY](http://lore.kernel.org/netdev/20230703124440.391970-1-eichest@gmail.com/)** + +> Add support for 1000BASE-T1 to the phy_device driver and add a first +> + +**[[net PATCH] octeontx2-af: Install TC filter rules in hardware based on priority](http://lore.kernel.org/netdev/20230703120536.2148918-1-sumang@marvell.com/)** + +> As of today, hardware does not support installing tc filter +> rules based on priority. This patch fixes the issue and installs +> the hardware rules based on priority. The final hardware rules +> will not be dependent on rule installation order, it will be strictly +> priority based, same as software. +> + +**[v1: net/sched: act_pedit: Add size check for TCA_PEDIT_PARMS_EX](http://lore.kernel.org/netdev/20230703110842.590282-1-linma@zju.edu.cn/)** + +> The attribute TCA_PEDIT_PARMS_EX is not included in pedit_policy and +> one malicious user could fake a TCA_PEDIT_PARMS_EX whose length is +> smaller than the intended sizeof(struct tc_pedit). Hence, the +> dereference in tcf_pedit_init() could access dirty heap data.
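The class of bug described for TCA_PEDIT_PARMS_EX, dereferencing an attribute payload that is shorter than the structure it is cast to, can be sketched in userspace. Everything below is an assumption for illustration: `attr_model`, `parms_model` and `parse_parms` are hypothetical stand-ins, not the netlink or act_pedit definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an attribute header (not struct nlattr). */
struct attr_model {
    uint16_t len;   /* header plus payload, in bytes */
    uint16_t type;
};

/* Hypothetical fixed-size payload the parser expects (not struct tc_pedit). */
struct parms_model {
    uint32_t index;
    uint32_t nkeys;
};

/* Returns 0 and fills *out when the payload is large enough, -1 otherwise. */
static int parse_parms(const unsigned char *buf, size_t buflen,
                       struct parms_model *out)
{
    struct attr_model hdr;

    if (buflen < sizeof(hdr))
        return -1;
    memcpy(&hdr, buf, sizeof(hdr));

    /* The declared length must cover the header and fit in the buffer. */
    if (hdr.len < sizeof(hdr) || hdr.len > buflen)
        return -1;

    /* The size check: reject payloads shorter than the expected struct
     * instead of reading whatever happens to follow them in memory. */
    if (hdr.len - sizeof(hdr) < sizeof(*out))
        return -1;

    memcpy(out, buf + sizeof(hdr), sizeof(*out));
    return 0;
}
```

Without the third check, a sender that declares a short attribute would cause the parser to read bytes past the payload it actually supplied.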
+> + +**[[net PATCH V2] octeontx2-pf: Add additional check for MCAM rules.](http://lore.kernel.org/netdev/20230703095600.2048397-1-sumang@marvell.com/)** + +> Due to hardware limitation, MCAM drop rule with +> ether_type == 802.1Q and vlan_id == 0 is not supported. Hence rejecting +> such rules. +> + +**[v1: I3C MCTP net driver](http://lore.kernel.org/netdev/20230703053048.275709-1-matt@codeconstruct.com.au/)** + +> This series adds an I3C transport for the kernel's MCTP network +> protocol. MCTP is a communication protocol between system components +> (BMCs, drives, NICs etc), with higher level protocols such as NVMe-MI or +> PLDM built on top of it (in userspace). It runs over various transports +> such as I2C, PCIe, or I3C. +> + +**[v4: wifi:mac80211: Replace the ternary conditional operator with conditional-statements](http://lore.kernel.org/netdev/20230703030200.1067-1-youkangren@vivo.com/)** + +> Replacing ternary conditional operators with conditional statements +> ensures proper expression of meaning while making it easier for +> the compiler to generate code. +> + +**[v5: vsock: MSG_ZEROCOPY flag support](http://lore.kernel.org/netdev/20230701063947.3422088-1-AVKrasnov@sberdevices.ru/)** + +> Difference with copy way is not significant. During packet allocation, +> non-linear skb is created and filled with pinned user pages. +> There are also some updates for vhost and guest parts of transport - in +> both cases i've added handling of non-linear skb for virtio part. vhost +> copies data from such skb to the guest's rx virtio buffers. In the guest, +> virtio transport fills tx virtio queue with pages from skb. 
+> + +**[v5: vsock: enable setting SO_ZEROCOPY](http://lore.kernel.org/netdev/20230701062310.3397129-14-AVKrasnov@sberdevices.ru/)** + +> For AF_VSOCK, zerocopy tx mode depends on transport, so this option must +> be set in AF_VSOCK implementation where transport is accessible (if +> transport is not set during setting SO_ZEROCOPY: for example socket is +> not connected, then SO_ZEROCOPY will be enabled, but once transport will +> be assigned, support of this type of transmission will be checked). +> + +**[v1: selftests/net: Add xt_policy config for xfrm_policy test](http://lore.kernel.org/netdev/20230701044103.1096039-1-daniel.diaz@linaro.org/)** + +> This is because IPsec "policy" match support is not available +> to the kernel. +> +> This patch adds CONFIG_NETFILTER_XT_MATCH_POLICY as a module +> to the selftests/net/config file, so that `make +> kselftest-merge` can take this into consideration. +> + +**[v1: Add virtio_rtc module and related changes](http://lore.kernel.org/netdev/20230630171052.985577-1-peter.hilber@opensynergy.com/)** + +> This patch series adds the virtio_rtc module, and related bugfixes and +> small interface extensions. The virtio_rtc module implements a driver +> compatible with the proposed Virtio RTC device specification [1]. The +> Virtio RTC (Real Time Clock) device provides information about current +> time. The device can provide different clocks, e.g. for the UTC or TAI time +> standards, or for physical time elapsed since some past epoch. The driver +> can read the clocks with simple or more accurate methods. +> + +#### Security Hardening + +**[v1: pstore: Replace crypto API compression with zlib calls](http://lore.kernel.org/linux-hardening/20230704135211.2471371-1-ardb@kernel.org/)** + +> The pstore layer implements support for compression of kernel log +> output, using a variety of compression algorithms provided by the +> [deprecated] crypto API 'comp' interface.
+> +> This appears to have been somebody's pet project rather than a solution +> to a real problem: the original deflate compression is reasonably fast, +> compressed well and is comparatively small in terms of code footprint, +> and so the flexibility that the crypto API integration provides does +> little more than complicate the code for no reason. +> + +**[v1: Revert "fortify: Allow KUnit test to build without FORTIFY"](http://lore.kernel.org/linux-hardening/20230703220210.never.615-kees@kernel.org/)** + +> The standard for KUnit is to not build tests at all when required +> functionality is missing, rather than doing test "skip". Restore this +> for the fortify tests, so that architectures without +> CONFIG_ARCH_HAS_FORTIFY_SOURCE do not emit unsolvable warnings. +> + +**[v1: wifi: mt76: Replace strlcpy with strscpy](http://lore.kernel.org/linux-hardening/20230703181256.3712079-1-azeemshaikh38@gmail.com/)** + +> strlcpy() reads the entire source buffer first. +> This read may exceed the destination size limit. +> This is both inefficient and can lead to linear read +> overflows if a source string is not NUL-terminated [1]. +> In an effort to remove strlcpy() completely [2], replace +> strlcpy() here with strscpy(). +> + +**[v1: kobject: Replace strlcpy with strscpy](http://lore.kernel.org/linux-hardening/20230703180528.3709258-1-azeemshaikh38@gmail.com/)** + +> strlcpy() reads the entire source buffer first. +> This read may exceed the destination size limit. +> This is both inefficient and can lead to linear read +> overflows if a source string is not NUL-terminated [1]. +> In an effort to remove strlcpy() completely [2], replace +> strlcpy() here with strscpy(). 
+> + +**[v1: kyber, blk-wbt: Replace strlcpy with strscpy](http://lore.kernel.org/linux-hardening/20230703172159.3668349-1-azeemshaikh38@gmail.com/)** + +> This patch series replaces strlcpy in the kyber and blk-wbt tracing subsystems wherever trivial +> replacement is possible, i.e return value from strlcpy is unused. The patches +> themselves are independent of each other and are applied to different subsystems. They are +> included as a series for ease of review. +> + +**[v1: perf: Replace strlcpy with strscpy](http://lore.kernel.org/linux-hardening/20230703165817.2840457-1-azeemshaikh38@gmail.com/)** + +> strlcpy() reads the entire source buffer first. +> This read may exceed the destination size limit. +> This is both inefficient and can lead to linear read +> overflows if a source string is not NUL-terminated [1]. +> In an effort to remove strlcpy() completely [2], replace +> strlcpy() here with strscpy(). +> No return values were used, so direct replacement is safe. +> + +**[v1: next: media: venus: Use struct_size_t() helper in pkt_session_unset_buffers()](http://lore.kernel.org/linux-hardening/ZKBfoqSl61jfpO2r@work/)** + +> Prefer struct_size_t() over struct_size() when no pointer instance +> of the structure type is present. +> + +**[v2: pid: Replace struct pid 1-element array with flex-array](http://lore.kernel.org/linux-hardening/20230630180418.gonna.286-kees@kernel.org/)** + +> For pid namespaces, struct pid uses a dynamically sized array member, +> "numbers". This was implemented using the ancient 1-element fake flexible +> array, which has been deprecated for decades. Replace it with a C99 +> flexible array, refactor the array size calculations to use struct_size(), +> and address elements via indexes. Note that the static initializer (which +> defines a single element) works as-is, and requires no special handling. 
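The conversion from a 1-element fake flexible array to a C99 flexible array member, with the allocation size computed in one overflow-checked place, can be sketched in userspace as follows. The names `upid_model`, `pid_model`, `pid_model_size` and `pid_model_alloc` are hypothetical; `pid_model_size` only imitates the saturating behaviour expected of the kernel's struct_size() helper and is not that macro.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical element type standing in for the per-namespace entry. */
struct upid_model {
    int nr;
};

/* Hypothetical container: the trailing array has no fixed length. */
struct pid_model {
    unsigned int level;
    struct upid_model numbers[];    /* C99 flexible array member */
};

/* Size of a pid_model with `count` trailing elements; saturates to
 * SIZE_MAX on overflow so that a later allocation fails cleanly. */
static size_t pid_model_size(size_t count)
{
    size_t base = sizeof(struct pid_model);

    if (count > (SIZE_MAX - base) / sizeof(struct upid_model))
        return SIZE_MAX;
    return base + count * sizeof(struct upid_model);
}

/* Allocate a zeroed pid_model with level + 1 entries in numbers[]. */
static struct pid_model *pid_model_alloc(unsigned int level)
{
    size_t size = pid_model_size((size_t)level + 1);
    struct pid_model *pid;

    if (size == SIZE_MAX)
        return NULL;

    pid = calloc(1, size);
    if (pid)
        pid->level = level;
    return pid;
}
```

With a true flexible array, `sizeof(struct pid_model)` no longer includes a phantom first element, so the element count passed to the size helper is the real count rather than "count minus one".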
+> + +**[[GIT PULL v2] flexible-array transformations for 6.5-rc1](http://lore.kernel.org/linux-hardening/ZJ8C4PtPrxr6LTA7@work/)** + +> The following changes since commit f1fcbaa18b28dec10281551dfe6ed3a3ed80e3d6: +> +> Linux 6.4-rc2 (2023-05-14 12:51:40 -0700) +> +> are available in the Git repository at: +> +> git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux.git tags/flex-array-transformations-6.5-rc1 +> + +**[v3: Add documentation for sysctl vm.memfd_noexec](http://lore.kernel.org/linux-hardening/20230630032535.625390-1-jeffxu@google.com/)** + +> Add documentation for sysctl vm.memfd_noexec +> +> Thanks to Dominique Martinet who reported this. +> see [1] for context. +> +> [1] https://lore.kernel.org/linux-mm/CABi2SkXUX_QqTQ10Yx9bBUGpN1wByOi_=gZU6WEy5a8MaQY3Jw@mail.gmail.com/T/ +> + +**[v1: usb: ch9: Replace bmSublinkSpeedAttr 1-element array with flexible array](http://lore.kernel.org/linux-hardening/20230629190900.never.787-kees@kernel.org/)** + +> Since commit df8fc4e934c1 ("kbuild: Enable -fstrict-flex-arrays=3"), +> UBSAN_BOUNDS no longer pretends 1-element arrays are unbounded. Walking +> bmSublinkSpeedAttr will trigger a warning, so make it a proper flexible +> array. Add a union to keep the struct size identical for userspace in +> case anything was depending on the old size. +> + +**[v1: next: scsi: aacraid: Replace one-element array with flexible-array member in struct user_sgmap](http://lore.kernel.org/linux-hardening/2ebb702f25c4764fb36ab29f4f40728e12b0e42b.1687974498.git.gustavoars@kernel.org/)** + +> Replace one-element array with flexible-array member in struct +> user_sgmap and refactor the rest of the code, accordingly. +> +> Issue found with the help of Coccinelle and audited and fixed, +> manually. +> +> This results in no differences in binary output. 
+> + +**[v1: next: scsi: aacraid: Use struct_size() helper in code related to struct sgmapraw](http://lore.kernel.org/linux-hardening/be2e5ecf1c4410ab419e2290341fbc8a0e2ba963.1687974498.git.gustavoars@kernel.org/)** + +> Prefer struct_size() over open-coded versions. +> + +**[v1: next: scsi: aacraid: Use struct_size() helper in aac_get_safw_ciss_luns()](http://lore.kernel.org/linux-hardening/cd80ea8f2446fe62ec15ffb0bbcecb69e0c342af.1687974498.git.gustavoars@kernel.org/)** + +> Prefer struct_size() over open-coded versions. +> +> This results in no differences in binary output. +> + +**[v1: next: scsi: aacraid: Replace one-element arrays with flexible-array members](http://lore.kernel.org/linux-hardening/cover.1687974498.git.gustavoars@kernel.org/)** + +> This series aims to replace one-element arrays with flexible-array +> members in multiple structures in drivers/scsi/aacraid/aacraid.h. +> +> This helps with the ongoing efforts to globally enable -Warray-bounds +> and get us closer to being able to tighten the FORTIFY_SOURCE routines +> on memcpy(). +> +> These issues were found with the help of Coccinelle and audited and fixed, +> manually. +> + +**[GIT PULL: flexible-array transformations for 6.5-rc1](http://lore.kernel.org/linux-hardening/ZJxZJDUDs1ry84Rc@work/)** + +> The following changes since commit f1fcbaa18b28dec10281551dfe6ed3a3ed80e3d6: +> +> Linux 6.4-rc2 (2023-05-14 12:51:40 -0700) +> +> are available in the Git repository at: +> +> git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux.git tags/flex-array-transformations-6.5-rc1 +> + +**[v1: pstore: ramoops: support pmsg size larger than kmalloc limitation](http://lore.kernel.org/linux-hardening/20230627202540.881909-2-yuxiaozhang@google.com/)** + +> Current pmsg implementation is using kmalloc for pmsg record buffer, +> which has max size limits based on page size. 
Currently, even if we +> allocate enough space with pmsg-size, pmsg will still fail if the +> file size is larger than what kmalloc allows. +> + +**[v4: Randomized slab caches for kmalloc()](http://lore.kernel.org/linux-hardening/20230626031835.2279738-1-gongruiqi@huaweicloud.com/)** + +> When exploiting memory vulnerabilities, "heap spraying" is a common +> technique targeting those related to dynamic memory allocation (i.e. the +> "heap"), and it plays an important role in a successful exploitation. +> Basically, it is to overwrite the memory area of vulnerable object by +> triggering allocation in other subsystems or modules and therefore +> getting a reference to the targeted memory location. It's usable on +> various types of vulnerability including use after free (UAF), heap out- +> of-bound write, etc. +> + +#### Asynchronous I/O + +**[v3: Add a sysctl to disable io_uring system-wide](http://lore.kernel.org/io-uring/20230630151003.3622786-1-matteorizzo@google.com/)** + +> Over the last few years we've seen many critical vulnerabilities in +> io_uring[1] which could be exploited by an unprivileged process to gain +> control over the kernel. This patch introduces a new sysctl which disables +> the creation of new io_uring instances system-wide. +> + +**[v1: io_uring: Add {} to maintain consistency in code format](http://lore.kernel.org/io-uring/20230630062512.10724-1-luhongfei@vivo.com/)** + +> In io_issue_sqe, the if (ret == IOU_OK) branch uses {}, so to maintain code +> format consistency, it is better to add {} in the else branch. +> + +**[v4: io_uring: Add io_uring command support for sockets](http://lore.kernel.org/io-uring/20230627134424.2784797-1-leitao@debian.org/)** + +> Enable io_uring commands on network sockets. Create two new +> SOCKET_URING_OP commands that will operate on sockets.
+> +> In order to call ioctl on sockets, use the file_operations->io_uring_cmd +> callbacks, and map it to a uring socket function, which handles the +> SOCKET_URING_OP accordingly, and calls socket ioctls. +> + +#### Rust For Linux + +**[v1: rust: types: make `Opaque` be `!Unpin`](http://lore.kernel.org/rust-for-linux/20230630150216.109789-1-benno.lossin@proton.me/)** + +> Adds a `PhantomPinned` field to `Opaque`. This removes the last Rust +> guarantee: the assumption that the type `T` can be freely moved. This is +> not the case for many types from the C side (e.g. if they contain a +> `struct list_head`). This change removes the need to add a +> `PhantomPinned` field manually to Rust structs that contain C structs +> which must not be moved. +> + +**[v1: rust: macros: add `paste!` proc macro](http://lore.kernel.org/rust-for-linux/20230628171108.1150742-1-gary@garyguo.net/)** + +> This macro provides a flexible way to concatenate identifiers +> and it allows the resulting identifier to be used to declare new items, +> which `concat_idents!` does not allow. It also allows identifiers to be +> transformed before being concatenated. +> + +**[v1: rust: build: Define MODULE macro iif the CONFIG_MODULES is enabled](http://lore.kernel.org/rust-for-linux/20230627121422.112246-1-wangrui@loongson.cn/)** + +> The LoongArch does not currently support modules when built with clang.
+> A pre-processor error is expected on building modules, that's caused by: +> +> #if defined(MODULE) && defined(CONFIG_AS_HAS_EXPLICIT_RELOCS) +> # if __has_attribute(model) +> # define PER_CPU_ATTRIBUTES __attribute__((model("extreme"))) +> # else +> # error compiler support for the model attribute is necessary when a recent assembler is used +> # endif +> #endif +> + +**[v2: rust: alloc: Add realloc and alloc_zeroed to the GlobalAlloc impl](http://lore.kernel.org/rust-for-linux/20230625232528.89306-1-boqun.feng@gmail.com/)** + +> While there are default impls for these methods, using the respective C +> api's is faster. Currently neither the existing nor these new +> GlobalAlloc method implementations are actually called. Instead the +> __rust_* function defined below the GlobalAlloc impl are used. With +> rustc 1.71 these functions will be gone and all allocation calls will go +> through the GlobalAlloc implementation. +> + +**[v1: Rust device mapper abstractions](http://lore.kernel.org/rust-for-linux/20230625121657.3631109-1-changxian.cqs@antgroup.com/)** + +> This is a version of device mapper abstractions. Based on +> these, we also implement a linear target as a PoC. +> Any suggestions are welcomed, thanks! +> + +#### BPF + +**[v3: um: vector: Replace undo_user_init in old code with out_free_netdev](http://lore.kernel.org/bpf/20230704042942.3984-1-duminjie@vivo.com/)** + +> Thanks for your response and suggestions, +> I made some mistakes. This is a resubmitted patch. +> I got some errors with my local repository, +> so I lost the commit SHA-1 ID. +> + +**[v9: bpf-next: selftests/bpf: Add benchmark for bpf memory allocator](http://lore.kernel.org/bpf/20230704025039.938914-1-houtao@huaweicloud.com/)** + +> The benchmark could be used to compare the performance of hash map +> operations and the memory usage between different flavors of bpf memory +> allocator (e.g., no bpf ma vs bpf ma vs reuse-after-gp bpf ma). 
It also +> could be used to check the performance improvement or the memory saving +> provided by optimization. +> + +**[v1: bpf, net: Allow setting SO_TIMESTAMPING* from BPF](http://lore.kernel.org/bpf/20230703175048.151683-1-jthinz@mailbox.tu-berlin.de/)** + +> BPF applications, e.g., a TCP congestion control, might benefit from +> precise packet timestamps. These timestamps are already available in +> __sk_buff and bpf_sock_ops, but could not be requested: A BPF program +> was not allowed to set SO_TIMESTAMPING* on a socket. This change enables +> BPF programs to actively request the generation of timestamps from a +> stream socket. +> + +**[v1: x86/BPF: Add new BPF helper call bpf_rdtsc](http://lore.kernel.org/bpf/20230703105745.1314475-1-tero.kristo@linux.intel.com/)** + +> This patch series adds a new x86 arch specific BPF helper, bpf_rdtsc() +> which can be used for reading the hardware time stamp counter (TSC.) +> Currently the same counter is directly accessible from userspace +> (using RDTSC instruction), and kernel space using various rdtsc_*() +> APIs, however eBPF lacks the support. +> + +**[v1: fs: Add kfuncs to handle idmapped mounts](http://lore.kernel.org/bpf/c35fbb4cb0a3a9b4653f9a032698469d94ca6e9c.1688123230.git.legion@kernel.org/)** + +> Since the introduction of idmapped mounts, file handling has become +> somewhat more complicated. If the inode has been found through an +> idmapped mount the idmap of the vfsmount must be used to get proper +> i_uid / i_gid. This is important, for example, to correctly take into +> account idmapped files when caching, LSM or for an audit. +> + +**[[v3 PATCH bpf-next 0/6] bpf: add percpu stats for bpf_map](http://lore.kernel.org/bpf/20230630082516.16286-1-aspsk@isovalent.com/)** + +> This series adds a mechanism for maps to populate per-cpu counters on +> insertions/deletions. The sum of these counters can be accessed by a new kfunc +> from map iterator and tracing programs. 
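The idea of per-CPU counters that are updated on insertion and deletion and only summed when read can be sketched in plain C. This is an assumption-laden model, not the patch's code: `MODEL_NR_CPUS`, `map_model` and the three helpers are hypothetical, and the caller passes the CPU index explicitly where the kernel would use the current CPU.

```c
#include <stddef.h>

#define MODEL_NR_CPUS 4   /* hypothetical CPU count for this sketch */

/* Hypothetical map with one element counter slot per CPU. */
struct map_model {
    long elem_count[MODEL_NR_CPUS];
};

/* Insertion on `cpu`: touch only that CPU's slot, no shared cache line. */
static void map_model_inc(struct map_model *m, unsigned int cpu)
{
    m->elem_count[cpu % MODEL_NR_CPUS]++;
}

/* Deletion on `cpu`: may drive this slot negative if the element was
 * inserted from a different CPU. */
static void map_model_dec(struct map_model *m, unsigned int cpu)
{
    m->elem_count[cpu % MODEL_NR_CPUS]--;
}

/* Reader side: only the sum over all slots is meaningful. */
static long map_model_sum(const struct map_model *m)
{
    long sum = 0;
    size_t i;

    for (i = 0; i < MODEL_NR_CPUS; i++)
        sum += m->elem_count[i];
    return sum;
}
```

The design trades an exact, instantly consistent count for cheap updates: writers never contend with each other, and a reader pays the cost of walking every slot.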
+> + +**[v5: RFC: introduce page_pool_alloc() API](http://lore.kernel.org/bpf/20230629120226.14854-1-linyunsheng@huawei.com/)** + +> In [1] & [2] & [3], there are usecases for veth and virtio_net +> to use frag support in page pool to reduce memory usage, and it +> may request different frag size depending on the head/tail +> room space for xdp_frame/shinfo and mtu/packet size. When the +> requested frag size is large enough that a single page can not +> be split into more than one frag, using frag support only have +> performance penalty because of the extra frag count handling +> for frag support. +> + +**[v1: bpf-next: bpf: Support new insns from cpu v4](http://lore.kernel.org/bpf/20230629063715.1646832-1-yhs@fb.com/)** + +> This patch set added kernel support for insns proposed in [1] except +> BPF_ST which already has full kernel support. Beside the above proposed +> insns, LLVM will generate BPF_ST insn as well under -mcpu=v4 ([2]). +> +> The patchset implements interpreter and jit support for these new +> insns. It has minimum verifier support in order to pass bpf selftests. +> More work will be required to cover verification and other aspects +> (e.g. blinding, etc.). +> + +**[[PATCH RESEND v3 bpf-next 00/14] BPF token](http://lore.kernel.org/bpf/20230629051832.897119-1-andrii@kernel.org/)** + +> This patch set introduces new BPF object, BPF token, which allows to delegate +> a subset of BPF functionality from privileged system-wide daemon (e.g., +> systemd or any other container manager) to a *trusted* unprivileged +> application. Trust is the key here. This functionality is not about allowing +> unconditional unprivileged BPF usage. 
Establishing trust, though, is +> completely up to the discretion of respective privileged application that +> would create a BPF token, as different production setups can and do achieve it +> through a combination of different means (signing, LSM, code reviews, etc), +> and it's undesirable and infeasible for kernel to enforce any particular way +> of validating trustworthiness of particular process. +> + +**[v1: fprobe: Ensure running fprobe_exit_handler() finished before calling rethook_free()](http://lore.kernel.org/bpf/168796344232.46347.7947681068822514750.stgit@devnote2/)** + +> Ensure running fprobe_exit_handler() has finished before +> calling rethook_free() in the unregister_fprobe() so that caller can free +> the fprobe right after unregister_fprobe(). +> +> unregister_fprobe() ensured that all running fprobe_entry/exit_handler() +> have finished by calling unregister_ftrace_function() which synchronizes +> RCU. But commit 5f81018753df ("fprobe: Release rethook after the ftrace_ops +> is unregistered") changed to call rethook_free() after +> unregister_ftrace_function(). So call rethook_stop() to make rethook +> disabled before unregister_ftrace_function() and ensure it again. +> + +**[v8: bpf-next: bpf, x86: allow function arguments up to 12 for TRACING](http://lore.kernel.org/bpf/20230627115319.13128-1-imagedong@tencent.com/)** + +> Therefore, let's enhance it by increasing the function arguments count +> allowed in arch_prepare_bpf_trampoline(), for now, only x86_64. +> +> In the 1st patch, we save/restore regs with BPF_DW size to make the code +> in save_regs()/restore_regs() simpler. +> +> In the 2nd patch, we make arch_prepare_bpf_trampoline() support to copy +> function arguments in stack for x86 arch. Therefore, the maximum +> arguments can be up to MAX_BPF_FUNC_ARGS for FENTRY, FEXIT and +> MODIFY_RETURN. Meanwhile, we clean the potential garbage value when we +> copy the arguments on-stack. 
+> + +**[v1: bpf-next: Support defragmenting IPv(4|6) packets in BPF](http://lore.kernel.org/bpf/cover.1687819413.git.dxu@dxuuu.xyz/)** + +> In the context of a middlebox, fragmented packets are tricky to handle. +> The full 5-tuple of a packet is often only available in the first +> fragment which makes enforcing consistent policy difficult. +> So stateful tracking is the only sane option. RFC 8900 [0] calls this +> out as well in section 6.3: +> +> Middleboxes [...] should process IP fragments in a manner that is +> consistent with [RFC0791] and [RFC8200]. In many cases, middleboxes +> must maintain state in order to achieve this goal. +> + +**[v1: Interest in additional endianness documentation](http://lore.kernel.org/bpf/CADx9qWgHCC4MML2d+mq25-aeTn+20qxjeTZSHMGPQrMq65a+bQ@mail.gmail.com/)** + +> Thank you to everyone in the community for building/working on such a +> great tool! I am helping build a userspace implementation of eBPF and +> following Dave's standardization process closely. 
+> + +### Related Technology Updates + +#### U-Boot + +**[u-boot compilation failure for Sifive unmatched board](http://lore.kernel.org/u-boot/CAK1XJzWofn5+OE7qCyf3nTb+hevULoAVss5dG2OzwbMh0C=YVA@mail.gmail.com/)** + +> This is Satish, compiling u-boot code based on the reference page: +> https://github.com/carlosedp/riscv-bringup/blob/master/unmatched/Readme.md#install-toolchain-to-build-kernel +> +> u-boot is failing with following commit id & its tag is +> commit d637294e264adfeb29f390dfc393106fd4d41b17 (HEAD, tag: v2022.01) +> + +**[Pull request: u-boot-rockchip-20230629](http://lore.kernel.org/u-boot/20230629121342.72391-1-kever.yang@rock-chips.com/)** + +> Please pull the fixes for rockchip platform: +> - rockchip inno phy fix; +> - pinctrl driver in SPL abort in specific case; +> - fix IO port voltage for rock5b-rk3588 board; +> +> CI: +> https://source.denx.de/u-boot/custodians/u-boot-rockchip/-/pipelines/16732 +> + +**[Trying to boot JH7110 RISCV-V CPU from MMC](http://lore.kernel.org/u-boot/e6cd461d-151e-3557-f58a-6118836f8e6a@ruabmbua.dev/)** + +> I am trying to use upstream u-boot + opensbi, to boot my visionfive2 SBC +> I got from external SD card. +> + +**[v1: riscv: sifive: fu70: downclock CPU clock for stability](http://lore.kernel.org/u-boot/20230628081530.3184607-1-uwu@icenowy.me/)** + +> When building the package `rustc` for AOSC OS on HiFive Unmatched, +> random SIGSEGV prevents the package from getting correctly built. +> Downclocking the CPU PLL clock seems to allow rustc to be built, +> although taking much more time. +> + ## 20230625: Issue 51 ### Kernel Updates