From 5d71385629506eb44707e66d2bd19ff14940f9eb Mon Sep 17 00:00:00 2001 From: yooyoyo <11251868+yooyoyo@user.noreply.gitee.com> Date: Sun, 23 Apr 2023 15:43:39 +0000 Subject: [PATCH] update news/README.md. Signed-off-by: yooyoyo <11251868+yooyoyo@user.noreply.gitee.com> --- news/README.md | 923 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 923 insertions(+) diff --git a/news/README.md b/news/README.md index 1fec006..dc38240 100644 --- a/news/README.md +++ b/news/README.md @@ -4,6 +4,929 @@ * [2022 年](2022.md) +## 20230423:第 43 期 + +### 内核动态 + +#### RISC-V 架构支持 + +**[v1: riscv: uprobes: Restore thread.bad_cause](http://lore.kernel.org/linux-riscv/1682214146-3756-1-git-send-email-yangtiezhu@loongson.cn/)** + +> thread.bad_cause is saved in arch_uprobe_pre_xol(), it should be restored +> in arch_uprobe_{post,abort}_xol() accordingly, otherwise the save operation +> is meaningless, this change is similar with x86 and powerpc. +> + +**[v1: dt-bindings: riscv: add sv57 mmu-type](http://lore.kernel.org/linux-riscv/20230421-voucher-ecology-7ddfdf801a71@spud/)** + +> Dumping the dtb from new versions of QEMU warns that sv57 is an +> undocumented mmu-type. The kernel has supported sv57 for about a year, +> so bring it into the fold. +> + +**[GIT PULL: KVM/riscv changes for 6.4](http://lore.kernel.org/linux-riscv/CAAhSdy2RLinG5Gx-sfOqrYDAT=xDa3WAk8r1jTu8ReO5Jo0LVA@mail.gmail.com/)** + +> We have the following KVM RISC-V changes for 6.4: +> 1) ONE_REG interface to enable/disable SBI extensions +> 2) Zbb extension for Guest/VM +> 3) AIA CSR virtualization +> 4) Few minor cleanups and fixes +> + +**[v17: Microchip Soft IP corePWM driver](http://lore.kernel.org/linux-riscv/20230421-neurology-trapezoid-b4fa29923a23@wendy/)** + +> Yet another version of this driver :) +> +> This time around I've implemented Uwe's simplified method for +> calculating the prescale & period_steps. 
For low values of prescale it
+> makes for much worse approximations of the period, but as the period
+> increases with respect to that of the pwm's underlying clock there
+> is mostly no difference in the approximations.
+>
+
+**[v1: riscv: mm: Ensure prot of VM_WRITE and VM_EXEC must be readable](http://lore.kernel.org/linux-riscv/20230421075111.1391952-1-woodrow.shen@sifive.com/)**
+
+> The commit 8aeb7b17f04e ("RISC-V: Make mmap() with PROT_WRITE imply PROT_READ")
+> allows riscv to use mmap with PROT_WRITE only, and meanwhile mmap with w+x is
+> also permitted. However, when userspace tries to access this page with
+> PROT_WRITE|PROT_EXEC, it causes an infinite loop at the load page fault and
+> triggers a soft lockup. According to the riscv privileged spec,
+> "Writable pages must also be marked readable". The fix is to drop
+> `PAGE_COPY_EXEC` and just use `PAGE_COPY_READ_EXEC` instead.
+> This aligns with the other arches (e.g. arm64) for protection_map.
+>
+
+**[v3: Add JH7110 cpufreq support](http://lore.kernel.org/linux-riscv/20230421031431.23010-1-mason.huo@starfivetech.com/)**
+
+> The StarFive JH7110 SoC has four RISC-V cores,
+> and it supports up to 4 cpu frequency loads.
+>
+> This patchset adds the compatible strings into the allowlist
+> for supporting the generic cpufreq driver on the JH7110 SoC.
+> Also, it enables the axp15060 pmic as the cpu power source.
+>
+
+**[v1: RISC-V: include cpufeature.h in cpufeature.c](http://lore.kernel.org/linux-riscv/20230420-wound-gizzard-2b2b589d9bea@spud/)**
+
+> Automation complains:
+> warning: symbol '__pcpu_scope_misaligned_access_speed' was not declared. Should it be static?
+>
+> cpufeature.c doesn't actually include the header of the same name, as it
+> had not previously used anything from it.
+> The per-cpu variable is declared there, so include it to silence the
+> complaints.
+> + +**[v5: Add JH7110 USB and USB PHY driver support](http://lore.kernel.org/linux-riscv/20230420110052.3182-1-minda.chen@starfivetech.com/)** + +> This patchset adds USB driver and USB PHY for the StarFive JH7110 SoC. +> USB work mode is peripheral and using USB 2.0 PHY in VisionFive 2 board. +> The patch has been tested on the VisionFive 2 board. +> + +**[v3: Change PWM-controlled LED pin active mode and algorithm](http://lore.kernel.org/linux-riscv/20230420093457.18936-1-nylon.chen@sifive.com/)** + +> According to the circuit diagram of User LEDs - RGB described in the manual hifive-unleashed-a00.pdf[0] and hifive-unmatched-schematics-v3.pdf[1]. +> + +**[v2: Add TDM audio on StarFive JH7110](http://lore.kernel.org/linux-riscv/20230420024118.22677-1-walker.chen@starfivetech.com/)** + +> This patchset adds TDM audio driver for the StarFive JH7110 SoC. The +> first patch adds device tree binding for TDM module. The second patch +> adds the item for JH7110 audio board to the dt-binding of StarFive +> SoC-based boards. The third patch adds tdm driver support for JH7110 +> SoC. The last patch adds device node of tdm and sound card to JH7110 dts. +> + +**[v1: kvmtool: RISC-V CoVE support](http://lore.kernel.org/linux-riscv/20230419222350.3604274-1-atishp@rivosinc.com/)** + +> This series is an initial version of the support for running confidential VMs on +> riscv architecture. This is to get feedback on the proposed COVH, COVI and COVG +> extensions for running Confidential VMs on riscv. The specification is available +> here [0]. Make sure to build it to get the latest changes as it gets updated +> from time to time. +> + +**[v2: Add JH7110 AON PMU support](http://lore.kernel.org/linux-riscv/20230419034833.43243-1-changhuang.liang@starfivetech.com/)** + +> This patchset adds aon power domain driver for the StarFive JH7110 SoC. +> It is used to turn on/off dphy rx/tx power switch. The series has been +> tested on the VisionFive 2 board. 
+>
+
+**[v1: pwm: sifive: Simplify using devm_clk_get_prepared()](http://lore.kernel.org/linux-riscv/20230418202102.117658-1-u.kleine-koenig@pengutronix.de/)**
+
+> Instead of preparing the clk after it was requested and unpreparing in
+> .probe()'s error path and .remove(), use devm_clk_get_prepared(), which
+> takes care of unpreparing automatically.
+>
+
+**[v1: Split ptdesc from struct page](http://lore.kernel.org/linux-riscv/20230417205048.15870-1-vishal.moola@gmail.com/)**
+
+> The MM subsystem is trying to shrink struct page. This patchset
+> introduces a memory descriptor for page table tracking - struct ptdesc.
+>
+> This patchset introduces ptdesc, splits ptdesc from struct page, and
+> converts many callers of page table constructors/destructors to use ptdescs.
+>
+
+**[v1: tools/nolibc: add stackprotector support for more architectures](http://lore.kernel.org/linux-riscv/20230408-nolibc-stackprotector-archs-v1-0-271f5c859c71@weissschuh.net/)**
+
+> Add stackprotector support for all remaining architectures, except s390.
+>
+> On s390 the stackprotectors are not supported in "global" mode; only
+> "sysreg" mode, which is not supported in nolibc.
+>
+
+**[v1: RISC-V: Add steal-time support](http://lore.kernel.org/linux-riscv/20230417103402.798596-1-ajones@ventanamicro.com/)**
+
+> One frequently touted benefit of virtualization is the ability to
+> consolidate machines, increasing resource utilization. It may even be
+> desirable to overcommit, at the risk of one or more VCPUs having to wait.
+> Hypervisors which have interfaces for guests to retrieve the amount of
+> time each VCPU had to wait give observers within the guests ways to
+> account for less progress than would otherwise be expected. The SBI STA
+> extension proposal[1] provides a standard interface for guest VCPUs to
+> retrieve the amount of time "stolen".
+>
+
+**[v3: riscv: mm: execute local TLB flush after populating vmemmap](http://lore.kernel.org/linux-riscv/20230417060618.639395-1-vincent.chen@sifive.com/)**
+
+> The sparse_init() calls vmemmap_populate() many times to create VA to PA
+> mappings for the VMEMMAP area, where all "struct page" are located once
+> CONFIG_SPARSEMEM_VMEMMAP is defined. These "struct page" are later
+> initialized in the zone_sizes_init() function. However, during this
+> process, no sfence.vma instruction is executed for this VMEMMAP area.
+> This omission may cause the hart to fail to perform a page table walk
+> because some data related to the address translation is invisible to the
+> hart. To solve this issue, local_flush_tlb_kernel_range() is called
+> right after sparse_init() to execute an sfence.vma instruction for the
+> VMEMMAP area, ensuring that all data related to the address translation
+> is visible to the hart.
+>
+
+**[v1: riscv: dts: starfive: Add PMU controller node](http://lore.kernel.org/linux-riscv/20230417034728.2670-1-walker.chen@starfivetech.com/)**
+
+> Add the pmu controller node for the StarFive JH7110 SoC. The PMU needs
+> to be used by other modules, e.g. VPU, ISP, etc.
+>
+
+#### 进程调度
+
+**[v2: net: net/sched: cls_api: Initialize miss_cookie_node when action miss is not used](http://lore.kernel.org/lkml/20230420183634.1139391-1-ivecera@redhat.com/)**
+
+> Function tcf_exts_init_ex() sets the exts->miss_cookie_node ptr only
+> when use_action_miss is true, so in the other case it assumes that
+> the field is set to NULL by the caller. If not, the field
+> contains garbage and a subsequent tcf_exts_destroy() call results
+> in a crash.
+> Ensure that the .miss_cookie_node pointer is NULL when the
+> use_action_miss parameter is false to avoid this potential scenario.
+>
+
+**[v2: sched/topology: add for_each_numa_cpu() macro](http://lore.kernel.org/lkml/20230420051946.7463-1-yury.norov@gmail.com/)**
+
+> for_each_cpu() is widely used in the kernel, and it's beneficial to create
+> a NUMA-aware version of the macro.
+>
+
+**[v1: net: sched: print jiffies when transmit queue time out](http://lore.kernel.org/lkml/20230419115632.738730-1-yajun.deng@linux.dev/)**
+
+> Although there is watchdog_timeo to let users know when the transmit queue
+> begins to stall, dev_watchdog() is only called at an interval. The jiffies
+> will always be greater than watchdog_timeo.
+>
+
+**[v1: drm/msm: Move cmdstream dumping out of sched kthread](http://lore.kernel.org/lkml/20230417225510.494951-1-robdclark@gmail.com/)**
+
+> This is something that can block for arbitrary amounts of time as
+> userspace consumes from the FIFO. So we don't really want this to
+> be in the fence signaling path.
+>
+
+**[v1: sched/uclamp: Introduce SCHED_FLAG_RESET_UCLAMP_ON_FORK flag](http://lore.kernel.org/lkml/20230416213406.2966521-1-davidai@google.com/)**
+
+> A userspace service may manage uclamp dynamically for individual tasks, and
+> a child task will unintentionally inherit a pseudo-random uclamp setting.
+> This could result in the child task being stuck with a static uclamp value
+> that results in poor performance or poor power.
+>
+
+**[GIT PULL: sched/urgent for v6.3-rc7](http://lore.kernel.org/lkml/20230416123412.GDZDvrRCv9VvvmXuPz@fat_crate.local/)**
+
+> pls pull an urgent scheduler fix for 6.3.
+>
+> Thx.
+>
+
+#### 内存管理
+
+**[v1: mm/gup: disallow GUP writing to file-backed mappings by default](http://lore.kernel.org/linux-mm/f86dc089b460c80805e321747b0898fd1efe93d7.1682168199.git.lstoakes@gmail.com/)**
+
+> It isn't safe to write to file-backed mappings as GUP does not ensure that
+> the semantics associated with such a write are performed correctly; for
+> instance, filesystems which rely upon write-notify will not be correctly
+> notified.
+> + +**[v12: cachestat: a new syscall for page cache state of files](http://lore.kernel.org/linux-mm/20230421231421.2401346-1-nphamcs@gmail.com/)** + +> There is currently no good way to query the page cache statistics of large +> files and directory trees. There is mincore(), but it scales poorly: the +> kernel writes out a lot of bitmap data that userspace has to aggregate, +> when the user really does not care about per-page information in that +> case. The user also needs to mmap and unmap each file as it goes along, +> which can be quite slow as well. +> + +**[v2: migrate: Avoid unbounded blocks in MIGRATE_SYNC_LIGHT](http://lore.kernel.org/linux-mm/20230421221249.1616168-1-dianders@chromium.org/)** + +> This series is the result of discussion around my RFC patch [1] where +> I talked about completely removing the waits for the folio_lock in +> migrate_folio_unmap(). +> + +**[v1: shmem: add support for blocksize > PAGE_SIZE](http://lore.kernel.org/linux-mm/20230421214400.2836131-1-mcgrof@kernel.org/)** + +> This is an initial attempt to add support for block size > PAGE_SIZE for tmpfs. +> Why would you want this? It helps us experiment with higher order folio uses +> with fs APIS and helps us test out corner cases which would likely need +> to be accounted for sooner or later if and when filesystems enable support +> for this. Better review early and burn early than continue on in the wrong +> direction so looking for early feedback. +> + +**[v1: [v2] kasan: use internal prototypes matching gcc-13 builtins](http://lore.kernel.org/linux-mm/20230421205754.106794-1-arnd@kernel.org/)** + +> This now passes all randconfig builds on arm, arm64 and x86, but I have +> not tested it on the other architectures that support kasan, since they +> tend to fail randconfig builds in other ways. This might fail if any +> of the 32-bit architectures expect a 'long' instead of 'int' for the +> size argument. 
+>
+
+**[v1: block: simplify with PAGE_SECTORS_SHIFT](http://lore.kernel.org/linux-mm/20230421195807.2804512-1-mcgrof@kernel.org/)**
+
+> A number of block drivers have their own incantations of
+> PAGE_SHIFT - SECTOR_SHIFT. Just simplify and use PAGE_SECTORS_SHIFT
+> all over.
+>
+
+**[v5: cgroup: eliminate atomic rstat flushing](http://lore.kernel.org/linux-mm/20230421174020.2994750-1-yosryahmed@google.com/)**
+
+> A previous patch series ([1] currently in mm-stable) changed most
+> atomic rstat flushing contexts to become non-atomic. This was done to
+> avoid an expensive operation that scales with # cgroups and # cpus to
+> happen with irqs disabled and scheduling not permitted. There were two
+> remaining atomic flushing contexts after that series. This series tries
+> to eliminate them as well, eliminating atomic rstat flushing completely.
+>
+
+**[v1: arm64: Also reset KASAN tag if page is not PG_mte_tagged](http://lore.kernel.org/linux-mm/20230420210945.2313627-1-pcc@google.com/)**
+
+> Consider the following sequence of events:
+>
+> 1) A page in a PROT_READ|PROT_WRITE VMA is faulted.
+> 2) Page migration allocates a page with the KASAN allocator,
+> causing it to receive a non-match-all tag, and uses it
+> to replace the page faulted in 1.
+> 3) The program uses mprotect() to enable PROT_MTE on the page faulted in 1.
+>
+
+**[v4: bio: check return values of bio_add_page](http://lore.kernel.org/linux-mm/20230420100501.32981-1-jth@kernel.org/)**
+
+> We have two functions for adding a page to a bio: __bio_add_page(), which is
+> used to add a single page to a freshly created bio, and bio_add_page(), which is
+> used to add a page to an existing bio.
+>
+
+**[v1: shmem: restrict noswap option to initial user namespace](http://lore.kernel.org/linux-mm/20230420-faxen-advokat-40abb4c1a152@brauner/)**
+
+> Prevent tmpfs instances mounted in an unprivileged namespace from
+> evading accounting of locked memory by using the "noswap" mount option.
+> + +**[v15: RESEND: Implement IOCTL to get and optionally clear info about PTEs](http://lore.kernel.org/linux-mm/20230420060156.895881-1-usama.anjum@collabora.com/)** + +> This syscall is used in Windows applications and games etc. This syscall is +> being emulated in pretty slow manner in userspace. Our purpose is to +> enhance the kernel such that we translate it efficiently in a better way. +> Currently some out of tree hack patches are being used to efficiently +> emulate it in some kernels. We intend to replace those with these patches. +> So the whole gaming on Linux can effectively get benefit from this. It +> means there would be tons of users of this code. +> + +**[v2: module: add debugging auto-load duplicate module support](http://lore.kernel.org/linux-mm/20230420003046.1604251-1-mcgrof@kernel.org/)** + +> The finit_module() system call can in the worst case use up to more than +> twice of a module's size in virtual memory. Duplicate finit_module() +> system calls are non fatal, however they unnecessarily strain virtual +> memory during bootup and in the worst case can cause a system to fail +> to boot. This is only known to currently be an issue on systems with +> larger number of CPUs. +> + +**[v15: Implement IOCTL to get and optionally clear info about PTEs](http://lore.kernel.org/linux-mm/20230419110716.4113627-1-usama.anjum@collabora.com/)** + +> This syscall is used in Windows applications and games etc. This syscall is +> being emulated in pretty slow manner in userspace. Our purpose is to +> enhance the kernel such that we translate it efficiently in a better way. +> Currently some out of tree hack patches are being used to efficiently +> emulate it in some kernels. We intend to replace those with these patches. +> So the whole gaming on Linux can effectively get benefit from this. It +> means there would be tons of users of this code. 
+>
+
+**[v1: mm/cma: retry allocation of dedicated area on EBUSY](http://lore.kernel.org/linux-mm/20230419083851.2555096-1-sergii.piatakov@globallogic.com/)**
+
+> Sometimes a continuous page range can't be successfully allocated, because
+> some pages in the range may not pass the isolation test. In this case,
+> the CMA allocator gets an EBUSY error and retries the allocation again (in
+> a slightly shifted range).
+>
+
+**[v1: printk: Enough to disable preemption in printk deferred context](http://lore.kernel.org/linux-mm/20230419074210.17646-1-pmladek@suse.com/)**
+
+> The comment above the printk_deferred_enter()/exit() definition claims
+> that it can be used only when interrupts are disabled.
+>
+
+**[v1: mm: skip CMA pages when they are not available](http://lore.kernel.org/linux-mm/1681882824-17532-1-git-send-email-zhaoyang.huang@unisoc.com/)**
+
+> It is a waste of effort to reclaim CMA pages if they are not available
+> for the current context during direct reclaim. Skip them in such
+> circumstances.
+>
+
+**[v1: mm/mmap: Map MAP_STACK to VM_STACK](http://lore.kernel.org/linux-mm/20230418210230.3495922-1-longman@redhat.com/)**
+
+> One of the flags of mmap(2) is MAP_STACK, to request a memory segment
+> suitable for a process or thread stack. The kernel currently ignores
+> this flag. Glibc uses MAP_STACK when mmapping a thread stack. However,
+> selinux has an execstack check in selinux_file_mprotect() which disallows
+> a stack VMA to be made executable.
+>
+
+**[v1: mm: reliable huge page allocator](http://lore.kernel.org/linux-mm/20230418191313.268131-1-hannes@cmpxchg.org/)**
+
+> As memory capacity continues to grow, 4k TLB coverage has not been
+> able to keep up. On Meta's 64G webservers, close to 20% of execution
+> cycles are observed to be handling TLB misses when using 4k pages
+> only. Huge pages are shifting from being a nice-to-have optimization
+> for HPC workloads to becoming a necessity for common applications.
+>
+
+#### 文件系统
+
+**[v1: io_uring: add getdents support, take 2](http://lore.kernel.org/linux-fsdevel/20230422-uring-getdents-v1-0-14c1db36e98c@codewreck.org/)**
+
+> The new API does nothing that cannot be achieved with plain syscalls, so
+> it shouldn't be introducing any new problem; the only downside is that
+> having the state in the file struct isn't very uring-ish, and if a
+> better solution is found later that will probably require duplicating
+> some logic in a new flag... But that seems like it would likely be a
+> distant future, and this version should be usable right away.
+>
+
+**[v2: Support negative dentries on case-insensitive ext4 and f2fs](http://lore.kernel.org/linux-fsdevel/20230422000310.1802-1-krisman@suse.de/)**
+
+> This is the v2 of the negative dentry support on case-insensitive directories.
+> It doesn't have any functional changes from v1, but it adds more context and a
+> comment to the dentry->d_name access I'm doing in d_revalidate, documenting
+> why (I understand) it is safe to do it without protecting from the parallel
+> directory changes.
+>
+
+**[GIT PULL: Turn single vector imports into ITER_UBUF](http://lore.kernel.org/linux-fsdevel/f16053ea-d3b8-a8a2-0178-3981fea5a656@kernel.dk/)**
+
+> This series turns single vector imports into ITER_UBUF, rather than
+> ITER_IOVEC. The former is more trivial to iterate and advance, and hence
+> a bit more efficient. From some very unscientific testing, 60% of all
+> iovec imports are single vector.
+>
+
+**[GIT PULL: pipe: nonblocking rw for io_uring](http://lore.kernel.org/linux-fsdevel/20230421-seilbahn-vorpreschen-bd73ac3c88d7@brauner/)**
+
+> /* Summary */
+> This contains Jens' work to support FMODE_NOWAIT and thus IOCB_NOWAIT
+> for pipes, ensuring that all places can deal with non-blocking requests.
+>
+> To this end, pass down the information that this is a nonblocking
+> request so that pipe locking, allocation, and buffer checking correctly
+> deal with those.
+>
+
+**[v1: fs/coredump: open coredump file in O_WRONLY instead of O_RDWR](http://lore.kernel.org/linux-fsdevel/20230420120409.602576-1-vsementsov@yandex-team.ru/)**
+
+> This makes it possible to use a stricter apparmor profile and not
+> allow the program to read any coredump in the system.
+>
+
+**[v2: shmem: Add user and group quota support for tmpfs](http://lore.kernel.org/linux-fsdevel/20230420080359.2551150-1-cem@kernel.org/)**
+
+> This is version 2 of the quota support for tmpfs, addressing some issues
+> discussed on V1 and a few extra things; details are within each patch. Original
+> cover-letter below.
+>
+
+**[v5: Introduce block provisioning primitives](http://lore.kernel.org/linux-fsdevel/20230420004850.297045-1-sarthakkukreti@chromium.org/)**
+
+> Next revision of adding support for block provisioning requests.
+>
+
+**[v2: ext4: Handle error pointers being returned from __filemap_get_folio](http://lore.kernel.org/linux-fsdevel/20230419120923.3152939-1-willy@infradead.org/)**
+
+> Commit "mm: return an ERR_PTR from __filemap_get_folio" changed from
+> returning NULL to returning an ERR_PTR(). This cannot be fixed in either
+> the ext4 tree or the mm tree, so this patch should be applied as part
+> of merging the two trees.
+>
+
+**[v10: Implement copy offload support](http://lore.kernel.org/linux-fsdevel/20230419114320.13674-1-nj.shetty@samsung.com/)**
+
+> The patch series covers the points discussed in the November 2021 virtual
+> call [LSF/MM/BFP TOPIC] Storage: Copy Offload [0].
+> We have covered the initial agreed requirements in this patchset and
+> further additional features suggested by the community.
+> The patchset borrows Mikulas's token based approach for the 2 bdev
+> implementation.
+>
+
+**[v1: Backport several fuse patches for 6.1.y](http://lore.kernel.org/linux-fsdevel/20230419095518.51373-1-yb203166@antfin.com/)**
+
+> Antgroup is using 5.10.y in its production environment; we found several
+> patches missing in the 5.10.y tree. These patches are needed for us, so we
+> backported them to 5.10.y, and also to 5.15.y and 6.1.y to prevent regressions.
+>
+
+**[v1: Backport several fuse patches for 5.15.y](http://lore.kernel.org/linux-fsdevel/20230419095424.51328-1-yb203166@antfin.com/)**
+
+> Antgroup is using 5.10.y in its production environment; we found several
+> patches missing in the 5.10.y tree. These patches are needed for us, so we
+> backported them to 5.10.y, and also to 5.15.y and 6.1.y to prevent regressions.
+>
+
+**[v1: Backport several fuse patches to 5.10.y](http://lore.kernel.org/linux-fsdevel/20230419094844.51110-1-yb203166@antfin.com/)**
+
+> Antgroup is using 5.10.y in its production environment; we found several
+> patches missing in the 5.10.y tree. These patches are needed for us, so we
+> backported them to 5.10.y, and also to 5.15.y and 6.1.y to prevent regressions.
+>
+
+**[v4: Introduce provisioning primitives for thinly provisioned storage](http://lore.kernel.org/linux-fsdevel/20230418221207.244685-1-sarthakkukreti@chromium.org/)**
+
+> This patch series is revision 4 of introducing a new mechanism to pass through provision requests on stacked thinly provisioned storage devices. See [1] for the original cover letter.
+>
+> [1] https://lore.kernel.org/lkml/ZDnMl8A1B1+Tfn5S@redhat.com/T/#md4f20113c2242755747ae069f84be720a6751012
+>
+
+**[v3: bpf-next: FUSE BPF: A Stacked Filesystem Extension for FUSE](http://lore.kernel.org/linux-fsdevel/20230418014037.2412394-1-drosen@google.com/)**
+
+> These patches extend FUSE to be able to act as a stacked filesystem. This
+> allows pure passthrough, where the fuse file system simply reflects the lower
+> filesystem, and also allows optional pre and post filtering in BPF and/or the
+> userspace daemon as needed. This can dramatically reduce or even eliminate
+> transitions to and from userspace.
+> + +**[v1: shmem: stable directory cookies](http://lore.kernel.org/linux-fsdevel/168175931561.2843.16288612382874559384.stgit@manet.1015granger.net/)** + +> The current cursor-based directory cookie mechanism doesn't work +> when a tmpfs filesystem is exported via NFS. This is because NFS +> clients do not open directories: each READDIR operation has to open +> the directory on the server, read it, then close it. The cursor +> state for that directory, being associated strictly with the opened +> struct file, is then discarded. +> + +**[v1: vfs: allow using kernel buffer during fiemap operation](http://lore.kernel.org/linux-fsdevel/bc30483b-7f9b-df4e-7143-8646aeb4b5a2@I-love.SAKURA.ne.jp/)** + +> syzbot is reporting circular locking dependency between ntfs_file_mmap() +> (which has mm->mmap_lock => ni->ni_lock => ni->file.run_lock dependency) +> and ntfs_fiemap() (which has ni->ni_lock => ni->file.run_lock => +> mm->mmap_lock dependency), for commit c4b929b85bdb ("vfs: vfs-level fiemap +> interface") implemented fiemap_fill_next_extent() using copy_to_user() +> where direct mm->mmap_lock dependency is inevitable. +> + +#### 网络设备 + +**[v5: net-next: net/smc: Introduce SMC-D-based OS internal communication acceleration](http://lore.kernel.org/netdev/1682252271-2544-1-git-send-email-guwen@linux.alibaba.com/)** + +> We found SMC-D can be used to accelerate OS internal communication, such as +> loopback or between two containers within the same OS instance. So this patch +> set provides a kind of SMC-D dummy device (we call it the SMC-D loopback device) +> to emulate an ISM device, so that SMC-D can also be used on architectures +> other than s390. The SMC-D loopback device are designed as a system global +> device, visible to all containers. +> + +**[v4: net-next: tsnep: XDP socket zero-copy support](http://lore.kernel.org/netdev/20230421194656.48063-1-gerhard@engleder-embedded.com/)** + +> Implement XDP socket zero-copy support for tsnep driver. 
I tried to
+> follow existing drivers like igc as far as possible. But one main
+
+**[v3: net: netlink: Use copy_to_user() for optval in netlink_getsockopt().](http://lore.kernel.org/netdev/20230421185255.94606-1-kuniyu@amazon.com/)**
+
+> Brad Spencer provided a detailed report [0] that when calling getsockopt()
+> for AF_NETLINK, some SOL_NETLINK options set only 1 byte even though such
+> options require at least sizeof(int) as the length.
+>
+
+**[v5: bpf-next: bpf: add netfilter program type](http://lore.kernel.org/netdev/20230421170300.24115-1-fw@strlen.de/)**
+
+> Changes since last version:
+> - rework test case in last patch wrt. ctx->skb dereference etc (Alexei)
+> - pacify bpf ci tests; the netfilter program type missed string translation
+> in the libbpf helper.
+>
+
+**[v5: drivers/net/phy: add driver for Microchip LAN867x 10BASE-T1S PHY](http://lore.kernel.org/netdev/ZEK8Hvl0Zl%2F0NntI@debian/)**
+
+> This patch adds support for the Microchip LAN867x 10BASE-T1S family
+> (LAN8670/1/2). The driver supports P2MP with PLCA.
+>
+
+**[v2: can: virtio: Initial virtio CAN driver.](http://lore.kernel.org/netdev/20230421145653.12811-1-Mikhail.Golubev-Ciuchea@opensynergy.com/)**
+
+> This is version 3 of the driver after having gotten review comments.
+>
+
+**[v1: net-next: net: dsa: MT7530, MT7531, and MT7988 improvements](http://lore.kernel.org/netdev/20230421143648.87889-1-arinc.unal@arinc9.com/)**
+
+> This patch series is focused on simplifying the code, and improving the
+> logic of the support for MT7530, MT7531, and MT7988 SoC switches.
+>
+> There's also a fix for the switch on the MT7988 SoC.
+>
+
+#### 异步 IO
+
+**[v1: io_uring: honor I/O nowait flag for read/write](http://lore.kernel.org/io-uring/20230421172822.8053-1-kch@nvidia.com/)**
+
+> When IO_URING_F_NONBLOCK is set on io_kiocb req->flags in io_write() or
+> io_read(), IOCB_NOWAIT is set on the kiocb when it is passed to the
+> respective rw_iter callback. This sets REQ_NOWAIT for the underlying I/O.
The result
+> is that the low level driver always sees the block layer request as REQ_NOWAIT
+> even if the user has submitted the request with nowait = 0, e.g. fio nowait=0.
+>
+
+**[v1: tools/io_uring: Add .gitignore](http://lore.kernel.org/io-uring/tencent_C8F457D8D10F44760333A1E1AC9B4B0C1507@qq.com/)**
+
+> Ignore {io_uring-bench,io_uring-cp}.
+>
+
+**[v2: io_uring: Pass the whole sqe to commands](http://lore.kernel.org/io-uring/20230421114440.3343473-1-leitao@debian.org/)**
+
+> These three patches prepare for the sock support in the io_uring cmd, as
+> described in the following RFC:
+>
+> https://lore.kernel.org/lkml/20230406144330.1932798-1-leitao@debian.org/
+>
+
+**[v1: test/file-verify.t: Don't run over mlock limit when run as non-root](http://lore.kernel.org/io-uring/20230420185728.4104-1-krisman@suse.de/)**
+
+> test/file-verify tries to get 2MB of pinned memory at once, which is
+> higher than the default allowed for non-root users in older
+> kernels (64kb before v5.16, nowadays 8mb). Skip the test for non-root
+> users if the registration fails instead of failing the test.
+>
+
+**[v1: Support for mapping SQ/CQ rings into huge page](http://lore.kernel.org/io-uring/20230419224805.693734-1-axboe@kernel.dk/)**
+
+> io_uring SQ/CQ rings are allocated by the kernel from contiguous, normal
+> pages, and then the application mmap()'s the rings into userspace. This
+> works fine, but does require contiguous pages to be available for the
+> given SQ and CQ ring sizes. As uptime increases on a given system, so
+> does memory fragmentation. Entropy is inevitable.
+>
+
+**[v1: io_uring: Pass whole sqe to commands](http://lore.kernel.org/io-uring/20230419102930.2979231-1-leitao@debian.org/)**
+
+> These two patches prepare for the sock support in the io_uring cmd, as
+> described in the following RFC:
+>
+> https://lore.kernel.org/lkml/20230406144330.1932798-1-leitao@debian.org/
+>
+
+**[v1: io_uring: Optimization of buffered random write](http://lore.kernel.org/io-uring/20230419092233.56338-1-luhongfei@vivo.com/)**
+
+> The buffered random write performance of io_uring is poor
+> due to the following reason:
+> By default, when performing buffered random writes, io_sq_thread
+> will call io_issue_sqe() to write the req, but due to the setting of
+> IO_URING_F_NONBLOCK, the req is executed asynchronously in iou-wrk,
+> where io_wq_submit_work() calls io_issue_sqe() to complete the write req,
+> with issue_flag as IO_URING_F_UNLOCKED | IO_URING_F_IOWQ,
+> which will reduce performance.
+> This patch determines whether the req is a buffered random write,
+> and if so, io_sq_thread directly calls io_issue_sqe(req, 0) to
+> complete the req instead of completing it asynchronously in iou-wrk.
+>
+
+**[v4: io_uring: add support for multishot timeouts](http://lore.kernel.org/io-uring/20230418225817.1905027-1-davidhwei@meta.com/)**
+
+> A multishot timeout submission will repeatedly generate completions with
+> the IORING_CQE_F_MORE cflag set.
+>
+
+**[v1: for-next: another round of rsrc refactoring](http://lore.kernel.org/io-uring/cover.1681822823.git.asml.silence@gmail.com/)**
+
+> The main part is Patch 3, which establishes a 1:1 relation between
+> struct io_rsrc_put and nodes, which removes io_rsrc_node_switch() /
+> io_rsrc_node_switch_start() and all the additional complexity with
+> pre-allocations. Note, it doesn't change any guarantees, as
+> io_queue_rsrc_removal() was doing allocations anyway and could
+> always fail.
+> + +**[v1: liburing: io_uring sendto](http://lore.kernel.org/io-uring/20230415165821.791763-1-ammarfaizi2@gnuweeb.org/)** + +> There are two patches in this series. The first patch adds the +> io_uring_prep_sendto() function. The second patch adds the +> manpage and CHANGELOG. +> + +#### Rust For Linux + +**[v1: v4.1: rust: lock: introduce `SpinLock`](http://lore.kernel.org/rust-for-linux/20230419174426.132207-1-wedsonaf@gmail.com/)** + +> This is the `spinlock_t` lock backend and allows Rust code to use the +> kernel spinlock idiomatically. +> + +**[v1: .gitattributes: set diff driver for Rust source code files](http://lore.kernel.org/rust-for-linux/20230418233048.335281-1-ojeda@kernel.org/)** + +> Git supports a builtin Rust diff driver [1] since v2.23.0 (2019). +> +> It improves the choice of hunk headers in some cases, such as +> + +**[v1: Rust 1.68.2 upgrade](http://lore.kernel.org/rust-for-linux/20230418214347.324156-1-ojeda@kernel.org/)** + +> This is the first upgrade to the Rust toolchain since the initial Rust +> merge, from 1.62.0 to 1.68.2 (i.e. the latest). +> + +#### BPF + +**[v4: bpf-next: bpftool: Show map IDs along with struct_ops links.](http://lore.kernel.org/bpf/20230421214131.352662-1-kuifeng@meta.com/)** + +> A new link type, BPF_LINK_TYPE_STRUCT_OPS, was added to attach +> struct_ops to links. (226bc6ae6405) It would be helpful for users to +> know which map is associated with the link. +> + +**[v1: bpf-next: selftests/bpf: verifier/prevent_map_lookup converted to inline assembly](http://lore.kernel.org/bpf/20230421204514.2450907-1-eddyz87@gmail.com/)** + +> Test verifier/prevent_map_lookup automatically converted to use inline assembly. +> +> This was a part of a series [1] but could not be applied because +> another patch from the series had to be withheld. +> + +**[v1: bpf-next: Second set of verifier/*.c migrated to inline assembly](http://lore.kernel.org/bpf/20230421174234.2391278-1-eddyz87@gmail.com/)** + +> This is a follow-up to RFC [1].
It migrates a second batch of 23 +> verifier/*.c tests to inline assembly and use of ./test_progs for +> actual execution. A link to the first batch is [2]. +> + +**[v1: Dump map id instead of value for map_of_maps types](http://lore.kernel.org/bpf/20230421101154.23690-1-kuro@kuroa.me/)** + +> When using `bpftool map dump` in plain format, it is usually +> more convenient to show the inner map id instead of the raw value. +> Changing this behavior would help with quick debugging with +> `bpftool` without disrupting scripted behavior, since the user +> could still dump the inner map by id but would otherwise need to convert the value. +> + +**[v2: bpf-next: Introduce a new kfunc of bpf_task_under_cgroup](http://lore.kernel.org/bpf/20230421090403.15515-1-zhoufeng.zf@bytedance.com/)** + +> When tracing sched-related functions such as enqueue_task_fair, it is necessary to +> check whether a specified task, rather than the current task, is within a given cgroup. +> + +**[v1: bpf-next: selftests/xsk: put MAP_HUGE_2MB in correct argument](http://lore.kernel.org/bpf/20230421062208.3772-1-magnus.karlsson@gmail.com/)** + +> Put the flag MAP_HUGE_2MB in the correct flags argument instead of the +> wrong offset argument. +> + +**[v3: bpf-next: net/smc: Introduce BPF injection capability](http://lore.kernel.org/bpf/1682051033-66125-1-git-send-email-alibuda@linux.alibaba.com/)** + +> This patch set attempts to introduce a BPF injection capability for SMC, +> and adds a selftest to ensure code stability. +> +> As we all know, the SMC protocol is not suitable for all scenarios, +> especially short-lived ones. However, most applications cannot +> guarantee that there are no such scenarios at all. Therefore, apps +> may need some specific strategy to decide whether to use SMC +> or not; for example, apps can limit the scope of SMC to a specific +> IP address or port.
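The xsk selftest fix above is subtle because MAP_HUGE_2MB is a flags-argument encoding, not an offset: mmap() packs log2 of the hugepage size into the bits above MAP_HUGE_SHIFT. A sketch of the encoding, with constant values taken from the Linux uapi headers (`map_huge_flag` is an illustrative helper, not a kernel API):

```python
MAP_HUGE_SHIFT = 26      # uapi/asm-generic/mman-common.h
MAP_HUGETLB = 0x40000    # request a hugetlb mapping

def map_huge_flag(page_size):
    """Encode a hugepage size for mmap()'s flags argument."""
    log2 = page_size.bit_length() - 1
    if (1 << log2) != page_size:
        raise ValueError("hugepage size must be a power of two")
    return log2 << MAP_HUGE_SHIFT

MAP_HUGE_2MB = map_huge_flag(2 * 1024 * 1024)  # 21 << 26
```

Passing this value in mmap()'s offset argument instead, as the selftest accidentally did, is interpreted as a file offset rather than a hugepage-size request.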
+> + +**[v2: bpf: Socket lookup BPF API from tc/xdp ingress does not respect VRF bindings.](http://lore.kernel.org/bpf/20230420145041.508434-1-gilad9366@gmail.com/)** + +> When calling socket lookup from L2 (tc, xdp), VRF boundaries aren't +> respected. This patchset fixes this by regarding the incoming device's +> VRF attachment when performing the socket lookups from tc/xdp. +> + +**[v1: net-next: net: lan966x: Don't use xdp_frame when action is XDP_TX](http://lore.kernel.org/bpf/20230420121152.2737625-1-horatiu.vultur@microchip.com/)** + +> When the action of an XDP program was XDP_TX, lan966x was creating +> an xdp_frame and using it to send the frame back. But it is also +> possible to send the frame back without needing an xdp_frame, because +> it can be sent back using the page. +> Then, once the frame is transmitted, it is possible to use +> page_pool_recycle_direct directly, as lan966x is using page pools. +> This saves some CPU usage on this path. +> + +**[v5: tracing: Add fprobe events](http://lore.kernel.org/bpf/168198993129.1795549.8306571027057356176.stgit@mhiramat.roam.corp.google.com/)** + +> Here is the 5th version, which improves fprobe and adds basic fprobe event +> support for ftrace (tracefs) and perf. Here is the previous version. +> + +**[v1: bpf-next: Introduce a new bpf helper of bpf_task_under_cgroup](http://lore.kernel.org/bpf/20230420072657.80324-1-zhoufeng.zf@bytedance.com/)** + +> When tracing sched-related functions such as enqueue_task_fair, it is necessary to +> check whether a specified task, rather than the current task, is within a given cgroup. +> + +**[v2: bpf-next: Dynptr helpers](http://lore.kernel.org/bpf/20230420071414.570108-1-joannelkoong@gmail.com/)** + +> This patchset is the 3rd in the dynptr series. The 1st (dynptr +> fundamentals) can be found here [0] and the second (skb + xdp dynptrs) +> can be found here [1].
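For readers new to the dynptr series above: a dynptr is essentially a pointer plus length that BPF helpers such as bpf_dynptr_read() bounds-check on every access, instead of trusting the caller. A loose userspace analogy, under the assumption that a failed access returns an error rather than reading out of bounds (class and method names are illustrative, not the BPF API):

```python
class Dynptr:
    """Userspace analogy to a bpf dynptr: data plus size, with every
    read bounds-checked instead of trusting the caller."""
    def __init__(self, data):
        self.data = bytes(data)

    def read(self, offset, n):
        # bpf_dynptr_read() fails with an error instead of reading
        # past the end; model that here as returning None.
        if offset < 0 or n < 0 or offset + n > len(self.data):
            return None
        return self.data[offset:offset + n]
```

The point of the abstraction is that the check lives in the accessor, so the verifier does not need to prove bounds at every call site.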
+> + +**[v2: bpf-next: Access variable length array relaxed for integer type](http://lore.kernel.org/bpf/20230420032735.27760-1-zhoufeng.zf@bytedance.com/)** + +> Add support for relaxed access to variable-length arrays of integer type. +> Add a selftest to check it. +> + +**[v1: bpf-next: bpftool: Replace "__fallthrough" by a comment to address merge conflict](http://lore.kernel.org/bpf/20230420003333.90901-1-quentin@isovalent.com/)** + +> The recent support for inline annotations in control flow graphs +> generated by bpftool introduced the usage of the "__fallthrough" macro +> in a switch/case block in btf_dumper.c. This change went through the +> bpf-next tree, but resulted in a merge conflict in linux-next, because +> this macro has been renamed "fallthrough" (no underscores) in the +> meantime. +> + +**[v1: bpf-next: bpf: handle another corner case in getsockopt](http://lore.kernel.org/bpf/20230418225343.553806-1-sdf@google.com/)** + +> Martin reports another case where getsockopt EFAULTs perfectly +> valid callers. Let's fix it and also replace EFAULT with +> pr_info_ratelimited. That should hopefully make this place +> less error-prone. +> + +**[v2: vmlinux.lds.h: Discard .note.gnu.property section](http://lore.kernel.org/bpf/20230418214925.ay3jpf2zhw75kgmd@treble/)** + +> When tooling reads ELF notes, it assumes each note entry is aligned to +> the value listed in the .note section header's sh_addralign field. +> +> The kernel-created ELF notes in the .note.Linux and .note.Xen sections +> are aligned to 4 bytes. This causes the toolchain to set those +> sections' sh_addralign values to 4. +> + +**[v1: bpf-next: bpftool: Register struct_ops with a link.](http://lore.kernel.org/bpf/20230418200058.603169-1-kuifeng@meta.com/)** + +> You can include an optional path after specifying the object name for the +> 'struct_ops register' subcommand.
+> +> Since commit 226bc6ae6405 ("Merge branch 'Transit between BPF TCP +> congestion controls.'") was accepted, it is now possible to create a +> link for a struct_ops. This can be done by defining a struct_ops in +> SEC(".struct_ops.link") to make libbpf return a real link. If we don't pin +> the links before leaving bpftool, they will disappear. To instruct bpftool +> to pin the links in a directory with the names of the maps, we need to +> provide the path of that directory. +> + +**[v6: bpf-next: bpf: Add socket destroy capability](http://lore.kernel.org/bpf/20230418153148.2231644-1-aditi.ghag@isovalent.com/)** + +> This patch adds the capability to destroy sockets in BPF. We plan to use +> the capability in Cilium to force client sockets to reconnect when their +> remote load-balancing backends are deleted. The other use case is +> on-the-fly policy enforcement where existing socket connections prevented +> by policies need to be terminated. +> + +**[v2: bpf-next: XDP-hints: XDP kfunc metadata for driver igc](http://lore.kernel.org/bpf/168182460362.616355.14591423386485175723.stgit@firesoul/)** + +> Implement both RX hash and RX timestamp XDP hints kfunc metadata +> for driver igc. +> + +### 周边技术动态 + +#### Qemu + +**[v8: target/riscv: rework CPU extension validation](http://lore.kernel.org/qemu-devel/20230421132727.121462-1-dbarboza@ventanamicro.com/)** + +> This version dropped patch 12 from v7. Alistair mentioned that it would +> limit static CPUs needlessly, since there's nothing preventing a static +> CPU from allowing extension changes during runtime, and that misa-w is +> enough to prevent write_misa() during runtime. I agree. +> + +**[v1: hw/riscv: virt: Enable booting M-mode or S-mode FW from pflash0](http://lore.kernel.org/qemu-devel/20230421043353.125701-1-sunilvl@ventanamicro.com/)** + +> Currently, the virt machine supports two pflash instances, each +> 32MB in size.
However, the first pflash is always assumed to +> contain M-mode firmware, and the reset vector is set to it if +> enabled. Hence, for S-mode payloads like EDK2, only one pflash +> instance is available for use. This means both the code and NV variables +> of EDK2 will need to use the same pflash. +> + +**[v3: riscv: Make sure an exception is raised if a pte is malformed](http://lore.kernel.org/qemu-devel/20230420150220.60919-1-alexghiti@rivosinc.com/)** + +> As per the specification, in 64-bit, if any of the pte reserved bits +> [...] Memory Protection"). In addition, we must check that the napot/pbmt bits are +> not set if those extensions are not active. +> + +**[v1: target/riscv: add Ventana's Veyron V1 CPU](http://lore.kernel.org/qemu-devel/20230418123624.16414-1-dbarboza@ventanamicro.com/)** + +> Add a virtual CPU for Ventana's first CPU, named veyron-v1. It exists +> exclusively for the rv64 target. It's tested with the 'virt' board. +> + +**[v7: target/riscv: rework CPU extensions validation](http://lore.kernel.org/qemu-devel/20230417140013.58893-1-dbarboza@ventanamicro.com/)** + +> In this v7 we have three extra patches: +> +> - patches 4 [1] and 5 [2], both from Weiwei Li, address an issue that +> we're going to have with Zca and RVC if we push the priv spec +> disabling code to the end of validation. More details can be seen in +> [3]. Patch 5's commit message also has some context on it; +> + +**[v2: Add RISC-V vector cryptographic instruction set support](http://lore.kernel.org/qemu-devel/20230417135821.609964-1-lawrence.hunter@codethink.co.uk/)** + +> This patchset provides an implementation for Zvbb, Zvbc, Zvkned, Zvknh, Zvksh, +> Zvkg, and Zvksed of the draft RISC-V vector cryptography extensions as per the +> v20230407 version of the specification(1) (3206f07). This is an update to the +> patchset submitted to qemu-devel on Friday, 10 Mar 2023 16:03:01 +0000.
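For context on the malformed-pte patch above: in the RV64 privileged spec, PTE bit 63 is N (Svnapot), bits 62-61 are PBMT (Svpbmt), and bits 60-54 are reserved and must be zero. A sketch of the check being enforced, with the bit layout from the spec (the function name and flag arguments are illustrative, not QEMU code):

```python
PTE_N    = 1 << 63      # Svnapot
PTE_PBMT = 0x3 << 61    # Svpbmt
PTE_RSVD = 0x7F << 54   # bits 60..54, reserved, must be zero

def pte_is_malformed(pte, have_napot=False, have_pbmt=False):
    """True if this 64-bit PTE should raise a page fault on walk."""
    if pte & PTE_RSVD:
        return True                      # reserved bits set
    if (pte & PTE_N) and not have_napot:
        return True                      # N set without Svnapot
    if (pte & PTE_PBMT) and not have_pbmt:
        return True                      # PBMT set without Svpbmt
    return False
```

The patch's point is that such a PTE must raise an exception during the page-table walk instead of being silently accepted.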
+> + +**[v2: target/riscv: Restore the predicate() NULL check behavior](http://lore.kernel.org/qemu-devel/20230417043054.3125614-1-bmeng@tinylab.org/)** + +> When reading a non-existent CSR, QEMU should raise an illegal instruction +> exception, but currently it just exits due to the g_assert() check. +> +> This actually reverts commit 0ee342256af9205e7388efdf193a6d8f1ba1a617. +> Some comments are also added to indicate that predicate() must be +> provided for an implemented CSR. +> + +**[v1: riscv: implement Ssqosid extension and CBQRI controllers](http://lore.kernel.org/qemu-devel/20230416232050.4094820-1-dfustini@baylibre.com/)** + +> This RFC series implements the Ssqosid extension and the sqoscfg CSR as +> defined in the RISC-V Capacity and Bandwidth Controller QoS Register +> Interface (CBQRI) specification [1]. Quality of Service (QoS) in this +> context is concerned with shared resources on an SoC such as cache +> capacity and memory bandwidth. +> + +#### U-Boot + +**[v5: Add StarFive JH7110 PCIe driver support](http://lore.kernel.org/u-boot/20230423105859.125764-1-minda.chen@starfivetech.com/)** + +> This patchset needs to be applied after the patchset in [1]. These PCIe +> patches are based on the JH7110 RISC-V SoC and VisionFive V2 board. +> +> [1] https://patchwork.ozlabs.org/project/uboot/cover/20230329034224.26545-1-yanhong.wang@starfivetech.com +> + +**[v1: u-boot-riscv/master](http://lore.kernel.org/u-boot/ZEHbqoEXAB+BAtmo@ubuntu01/)** + +> The following changes since commit 5db4972a5bbdbf9e3af48ffc9bc4fec73b7b6a79: +> +> Merge tag 'u-boot-nand-20230417' of https://source.denx.de/u-boot/custodians/u-boot-nand-flash (2023-04-17 10:47:33 -0400) +> + +**[v1: riscv: visionfive2: use OF_BOARD_SETUP](http://lore.kernel.org/u-boot/20230419112801.GA1907@lst.de/)** + +> U-Boot already has a mechanism to fix up the DT before OS boot. +> This avoids the excessive duplication of data and work proposed +> by the explicit separation of 1.2a and 1.3b board revisions.
It +> will also, to a good degree, improve the user experience, as +> pointed out by Matthias. +> + ## 20230416:第 42 期 ### 内核动态 -- Gitee