diff --git a/news/2023-1st-half.md b/news/2023-1st-half.md
new file mode 100644
index 0000000000000000000000000000000000000000..694593912806bc6475c54b6d3271b2ec29141a17
--- /dev/null
+++ b/news/2023-1st-half.md
@@ -0,0 +1,21450 @@
+# RISC-V Linux Kernel and Related Technology News
+
+## 20230604: Issue 48
+
+### Kernel News
+
+#### RISC-V Architecture Support
+
+**[v1: tools/nolibc: add two new syscall helpers](http://lore.kernel.org/linux-riscv/cover.1685856497.git.falcon@tinylab.org/)**
+
+> When I worked on adding new syscalls and the related library routines,
+> I have seen most of the library routines share the same syscall call and
+> return logic, this patchset adds two macros to simplify and shrink them.
+>
+
+**[v3: nolibc: add part2 of support for rv32](http://lore.kernel.org/linux-riscv/cover.1685780412.git.falcon@tinylab.org/)**
+
+> This is the v3 part2 of support for rv32, differs from the v2 part2 [1],
+> we only fix up compile issues in this patchset.
+>
+> With the v3 generic part1 [2] and this patchset, we can compile nolibc
+> for rv32 now.
+>
+> This is based on the idea of suggestions from Arnd [3], instead of
+> '#error' on the unsupported syscall on a target platform, a 'return
+> -ENOSYS' allow us to compile it at first and then allow we fix up the
+> test failures reported by nolibc-test one by one.
+>
+
+**[v3: nolibc: add generic part1 of prepare for rv32](http://lore.kernel.org/linux-riscv/cover.1685777982.git.falcon@tinylab.org/)**
+
+> This is the v3 generic part1 for rv32, all of the found issues of v2
+> part1 [1] have been fixed up, several generic patches have been fixed up
+> and merged from v2 part2 [2] to this series, the standalone test_fork
+> patch [4] is merged with a Reviewed-by line into this series too.
+>
+
+**[v2: Use MMU read lock for clear-dirty-log](http://lore.kernel.org/linux-riscv/20230602160914.4011728-1-vipinsh@google.com/)**
+
+> This series is on top of kvmarm/next as I needed to also modify Eager
+> page splitting logic in clear-dirty-log API. Eager page splitting is not
+> present in Linux 6.4-rc4.
+>
+
+**[v2: Add initialization of clock for StarFive JH7110 SoC](http://lore.kernel.org/linux-riscv/20230602084925.215411-1-william.qiu@starfivetech.com/)**
+
+> This patchset adds initial rudimentary support for the StarFive
+> Quad SPI controller driver. And this driver will be used in
+> StarFive's VisionFive 2 board. In 6.4, the QSPI_AHB and QSPI_APB
+> clocks changed from the default ON state to the default OFF state,
+> so these clocks need to be enabled in the driver.At the same time,
+> dts patch is added to this series.
+>
+
+**[v1: Add DRM driver for StarFive SoC JH7110](http://lore.kernel.org/linux-riscv/20230602074043.33872-1-keith.zhao@starfivetech.com/)**
+
+> This series is a DRM driver for StarFive SoC JH7110, which includes a
+> display controller driver for Verisilicon DC8200 and an HMDI driver.
+>
+> We use GEM framework for buffer management and allocate memory by
+> using DMA APIs.
+>
+
+**[v2: gpio: sifive: Add missing check for platform_get_irq](http://lore.kernel.org/linux-riscv/20230602072755.7314-1-jiasheng@iscas.ac.cn/)**
+
+> Add the missing check for platform_get_irq and return error code
+> if it fails.
+>
+
+**[v2: Add support for Allwinner GPADC on D1/T113s/R329/T507 SoCs](http://lore.kernel.org/linux-riscv/20230601223104.1243871-1-bigunclemax@gmail.com/)**
+
+> This series adds support for general purpose ADC (GPADC) on new
+> Allwinner's SoCs, such as D1, T113s, T507 and R329. The implemented driver
+> provides basic functionality for getting ADC channels data.
+>
+
+**[v2: riscv/purgatory: Do not use fortified string functions](http://lore.kernel.org/linux-riscv/20230601160025.gonna.868-kees@kernel.org/)**
+
+> This means that the memcpy() calls with "buf" as a destination in
+> sha256.c's code will attempt to perform run-time bounds checking, which
+> could lead to calling missing functions, specifically a potential
+> WARN_ONCE, which isn't callable from purgatory.
+>
+
+**[v1: mm: jit/text allocator](http://lore.kernel.org/linux-riscv/20230601101257.530867-1-rppt@kernel.org/)**
+
+> module_alloc() is used everywhere as a mean to allocate memory for code.
+>
+> Beside being semantically wrong, this unnecessarily ties all subsystmes
+> that need to allocate code, such as ftrace, kprobes and BPF to modules
+> and puts the burden of code allocation to the modules code.
+>
+
+**[v4: StarFive's Pulse Width Modulation driver support](http://lore.kernel.org/linux-riscv/20230601085154.36938-1-william.qiu@starfivetech.com/)**
+
+> This patchset adds initial rudimentary support for the StarFive
+> Pulse Width Modulation controller driver. And this driver will
+> be used in StarFive's VisionFive 2 board.The first patch add
+> Documentations for the device and Patch 2 adds device probe for
+> the module.
+>
+
+**[v3: Split ptdesc from struct page](http://lore.kernel.org/linux-riscv/20230531213032.25338-1-vishal.moola@gmail.com/)**
+
+> The MM subsystem is trying to shrink struct page. This patchset
+> introduces a memory descriptor for page table tracking - struct ptdesc.
+>
+> This patchset introduces ptdesc, splits ptdesc from struct page, and
+> converts many callers of page table constructor/destructors to use ptdescs.
+>
+
+**[v2: riscv: mm: Pre-allocate PGD entries for vmalloc/modules area](http://lore.kernel.org/linux-riscv/20230531093817.665799-1-bjorn@kernel.org/)**
+
+> The RISC-V port requires that kernel PGD entries are to be
+> synchronized between MMs. This is done via the vmalloc_fault()
+> function, that simply copies the PGD entries from init_mm to the
+> faulting one.
+>
+
+**[v3: RISC-V: KVM: Ensure SBI extension is enabled](http://lore.kernel.org/linux-riscv/20230530175024.354527-1-ajones@ventanamicro.com/)**
+
+> Ensure guests can't attempt to invoke SBI extension functions when the
+> SBI extension's probe function has stated that the extension is not
+> available.
+>
+
+**[v1: selftests/nolibc: add user-space 'efault' handler](http://lore.kernel.org/linux-riscv/cover.1685443199.git.falcon@tinylab.org/)**
+
+> This is not really for merge, but only let it work as a demo code to
+> test whether it is possible to restore the next test when there is a bad
+> pointer access in user-space [1].
+>
+
+**[v1: fdt: Mark "/reserved-memory" nodes as nosave if !reusable](http://lore.kernel.org/linux-riscv/20230530080425.18612-1-alexghiti@rivosinc.com/)**
+
+> In the RISC-V kernel, the firmware does not mark the region it uses as
+> "no-map" so that the kernel can avoid having holes in the linear mapping
+> and then use larger pages.
+>
+
+**[v2: nolibc: add part3 of support for rv32](http://lore.kernel.org/linux-riscv/cover.1685428087.git.falcon@tinylab.org/)**
+
+> Hi, Willy
+>
+> These two patches are based on part2 of support for rv32 [1], I have
+> forgotten to send them together.
+>
+
+**[v1: riscv: mm: Pre-allocate PGD entries vmalloc/modules area](http://lore.kernel.org/linux-riscv/20230529180023.289904-1-bjorn@kernel.org/)**
+
+> The RISC-V port requires that kernel PGD entries are to be
+> synchronized between MMs. This is done via the vmalloc_fault()
+> function, that simply copies the PGD entries from init_mm to the
+> faulting one.
+>
+
+**[v5: Add JH7110 MIPI DPHY RX support](http://lore.kernel.org/linux-riscv/20230529121503.3544-1-changhuang.liang@starfivetech.com/)**
+
+> This patchset adds mipi dphy rx driver for the StarFive JH7110 SoC.
+> It is used to transfer CSI camera data. The series has been tested on
+> the VisionFive 2 board.
+>
+
+**[v1: riscv: Enable ARCH_SUSPEND_POSSIBLE for s2idle](http://lore.kernel.org/linux-riscv/20230529101524.322076-1-songshuaishuai@tinylab.org/)**
+
+> With this configuration opened, the basic platform-independent s2idle is
+> provided by the sole "s2idle" string in `/sys/power/mem_sleep`.
+>
+> At the end of s2idle, harts will hit the `wfi` instruction or enter the
+> SUSPENDED state through the sbi_cpuidle driver. The interrupt of possible
+> wakeup devices will be kept to wake the system up.
+>
+
+**[v12: -next: riscv: Add independent irq/softirq stacks](http://lore.kernel.org/linux-riscv/20230529084600.2878130-1-guoren@kernel.org/)**
+
+> This patch series adds independent irq/softirq stacks to decrease the
+> press of the thread stack. Also, add a thread STACK_SIZE config for
+> users to adjust the proper size during compile time.
+>
+
+#### Process Scheduling
+
+**[v1: net: sched: wrap tc_skip_wrapper with CONFIG_RETPOLINE](http://lore.kernel.org/lkml/20230602235210.91262-1-minhuadotchen@gmail.com/)**
+
+> This patch fixes the following sparse warning:
+>
+> net/sched/sch_api.c:2305:1: sparse: warning: symbol 'tc_skip_wrapper' was not declared. Should it be static?
+>
+
+**[v1: net-next: net/sched: introduce pretty printers for Qdiscs](http://lore.kernel.org/lkml/20230602162935.2380811-1-vladimir.oltean@nxp.com/)**
+
+> Sometimes when debugging Qdiscs it may be confusing to know exactly what
+> you're looking at, especially since they're hierarchical. Pretty printing
+> the handle, parent handle and netdev is a bit cumbersome, so this patch
+> proposes a set of wrappers around __qdisc_printk() which are heavily
+> inspired from __net_printk().
+>
+
+**[v1: sched: EEVDF and latency-nice and/or slice-attr](http://lore.kernel.org/lkml/20230531115839.089944915@infradead.org/)**
+
+> Latest version of the EEVDF [1] patches.
+>
+> The only real change since last time is the fix for tick-preemption [2], and a
+> simple safe-guard for the mixed slice heuristic.
+>
+> Other than that, I've re-arranged the patches to make EEVDF come first and have
+> the latency-nice or slice-attribute patches on top.
+>
+
+**[v2: sched/fair: Don't balance task to its current running CPU](http://lore.kernel.org/lkml/20230530082507.10444-1-yangyicong@huawei.com/)**
+
+> The new_dst_cpu is chosen from the env->dst_grpmask. Currently it
+> contains CPUs in sched_group_span() and if we have overlapped groups it's
+> possible to run into this case. This patch makes env->dst_grpmask of
+> group_balance_mask() which exclude any CPUs from the busiest group and
+> solve the issue. For balancing in a domain with no overlapped groups
+> the behaviour keeps same as before.
+>
+
+**[v8: sched/fair: Scan cluster before scanning LLC in wake-up path](http://lore.kernel.org/lkml/20230530070253.33306-1-yangyicong@huawei.com/)**
+
+> This is the follow-up work to support cluster scheduler. Previously
+> we have added cluster level in the scheduler for both ARM64[1] and
+> X86[2] to support load balance between clusters to bring more memory
+> bandwidth and decrease cache contention. This patchset, on the other
+> hand, takes care of wake-up path by giving CPUs within the same cluster
+> a try before scanning the whole LLC to benefit those tasks communicating
+> with each other.
+>
+
+**[v1: sched: deadline: Simplify pick_earliest_pushable_dl_task()](http://lore.kernel.org/lkml/20230530181145.2880-1-kunyu@nfschina.com/)**
+
+> Using the while statement instead of the if and goto statements is more
+> concise and efficient.
+>
+
+#### Memory Management
+
+**[v6: mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8](http://lore.kernel.org/linux-mm/20230531154836.1366225-1-catalin.marinas@arm.com/)**
+
+> Here's version 6 of the series reducing the kmalloc() minimum alignment
+> on arm64 to 8 (from 128). There are patches already to do the same for
+> riscv (pretty straight-forward after this series).
+>
+> The first 11 patches decouple ARCH_KMALLOC_MINALIGN from
+> ARCH_DMA_MINALIGN and, for arm64, limit the kmalloc() caches to those
+> aligned to the run-time probed cache_line_size(). On arm64 we gain the
+> kmalloc-{64,192} caches.
+>
+
+**[v1: Introduce cmpxchg128() -- aka. the demise of cmpxchg_double()](http://lore.kernel.org/linux-mm/20230531130833.635651916@infradead.org/)**
+
+> After much breaking of things, find here the improved version.
+>
+
+**[v2: net-next: splice, net: Handle MSG_SPLICE_PAGES in AF_TLS](http://lore.kernel.org/linux-mm/20230531124528.699123-1-dhowells@redhat.com/)**
+
+> Here are patches to make AF_TLS handle the MSG_SPLICE_PAGES internal
+> sendmsg flag. MSG_SPLICE_PAGES is an internal hint that tells the protocol
+> that it should splice the pages supplied if it can. Its sendpage
+> implementations are then turned into wrappers around that.
+>
+
+**[v7: bio: check return values of bio_add_page](http://lore.kernel.org/linux-mm/cover.1685532726.git.johannes.thumshirn@wdc.com/)**
+
+> We have two functions for adding a page to a bio, __bio_add_page() which is
+> used to add a single page to a freshly created bio and bio_add_page() which is
+> used to add a page to an existing bio.
+>
+> While __bio_add_page() is expected to succeed, bio_add_page() can fail.
+>
+
+**[v2: net-next: splice, net: Handle MSG_SPLICE_PAGES in AF_KCM](http://lore.kernel.org/linux-mm/20230531110423.643196-1-dhowells@redhat.com/)**
+
+> Here are patches to make AF_KCM handle the MSG_SPLICE_PAGES internal
+> sendmsg flag. MSG_SPLICE_PAGES is an internal hint that tells the protocol
+> that it should splice the pages supplied if it can. Its sendpage
+> implementation is then turned into a wrapper around that.
+>
+
+**[v2: net-next: splice, net: Handle MSG_SPLICE_PAGES in Chelsio-TLS](http://lore.kernel.org/linux-mm/20230531110008.642903-1-dhowells@redhat.com/)**
+
+> Here are patches to make Chelsio-TLS handle the MSG_SPLICE_PAGES internal
+> sendmsg flag. MSG_SPLICE_PAGES is an internal hint that tells the protocol
+> that it should splice the pages supplied if it can. Its sendpage
+> implementation is then turned into a wrapper around that.
+>
+
+**[v1: make unregistration of super_block shrinker more faster](http://lore.kernel.org/linux-mm/20230531095742.2480623-1-qi.zheng@linux.dev/)**
+
+> The kernel test robot noticed a -88.8% regression of stress-ng.ramfs.ops_per_sec
+> on commit f95bdb700bc6 ("mm: vmscan: make global slab shrink lockless"). More
+> details can be seen from the link[1] below.
+>
+
+**[v2: mm/migrate_device: Try to handle swapcache pages](http://lore.kernel.org/linux-mm/20230531044018.17893-1-mpenttil@redhat.com/)**
+
+> Migrating file pages and swapcache pages into device memory is not supported.
+> The decision is done based on page_mapping(). For now, swapcache pages are not migrated.
+>
+
+**[v1: mm: zswap: multiple zpool support](http://lore.kernel.org/linux-mm/20230531022911.1168524-1-yosryahmed@google.com/)**
+
+> Support using multiple zpools of the same type in zswap, for concurrency
+> purposes. Add CONFIG_ZSWAP_NR_ZPOOLS_ORDER to control the number of
+> zpools. The order is specific by the config rather than the absolute
+> number to guarantee a power of 2. This is useful so that we can use
+> deterministically link each entry to a zpool by hashing the zswap_entry
+> pointer.
+>
+
+**[v3: zswap: do not shrink if cgroup may not zswap](http://lore.kernel.org/linux-mm/20230530232435.3097106-1-nphamcs@gmail.com/)**
+
+> Before storing a page, zswap first checks if the number of stored pages
+> exceeds the limit specified by memory.zswap.max, for each cgroup in the
+> hierarchy. If this limit is reached or exceeded, then zswap shrinking is
+> triggered and short-circuits the store attempt.
+>
+
+**[v1: mm: zswap: support exclusive loads](http://lore.kernel.org/linux-mm/20230530210251.493194-1-yosryahmed@google.com/)**
+
+> Commit 71024cb4a0bf ("frontswap: remove frontswap_tmem_exclusive_gets")
+> removed support for exclusive loads from frontswap as it was not used.
+>
+> Bring back exclusive loads support to frontswap by adding an
+> exclusive_loads argument to frontswap_ops. Add support for exclusive
+> loads to zswap behind CONFIG_ZSWAP_EXCLUSIVE_LOADS.
+>
+
+**[v1: zswap: do not shrink when memory.zswap.max is 0](http://lore.kernel.org/linux-mm/20230530162153.836565-1-nphamcs@gmail.com/)**
+
+> Before storing a page, zswap first checks if the number of stored pages
+> exceeds the limit specified by memory.zswap.max, for each cgroup in the
+> hierarchy. If this limit is reached or exceeded, then zswap shrinking is
+> triggered and short-circuits the store attempt.
+>
+
+**[v2: net-next: crypto, splice, net: Make AF_ALG handle sendmsg(MSG_SPLICE_PAGES)](http://lore.kernel.org/linux-mm/20230530141635.136968-1-dhowells@redhat.com/)**
+
+> Here's the fourth tranche of patches towards providing a MSG_SPLICE_PAGES
+> internal sendmsg flag that is intended to replace the ->sendpage() op with
+> calls to sendmsg(). MSG_SPLICE_PAGES is a hint that tells the protocol
+> that it should splice the pages supplied if it can.
+>
+
+**[v4: sock: Improve condition on sockmem pressure](http://lore.kernel.org/linux-mm/20230530114011.13368-1-wuyun.abel@bytedance.com/)**
+
+> Currently the memcg's status is also accounted into the socket's
+> memory pressure to alleviate the memcg's memstall. But there are
+> still cases that can be improved. Please check the patches for
+> detailed info.
+>
+
+**[v2: string: use __builtin_memcpy() in strlcpy/strlcat](http://lore.kernel.org/linux-mm/20230530083911.1104336-1-glider@google.com/)**
+
+> lib/string.c is built with -ffreestanding, which prevents the compiler
+> from replacing certain functions with calls to their library versions.
+>
+
+**[v1: -next: mm: page_alloc: simplify has_managed_dma()](http://lore.kernel.org/linux-mm/20230529144022.42927-1-wangkefeng.wang@huawei.com/)**
+
+> The ZONE_DMA should only exists on Node 0, only check NODE_DATA(0)
+> is enough, so simplify has_managed_dma() and make it inline.
+>
+
+**[v1: mm: free retracted page table by RCU](http://lore.kernel.org/linux-mm/35e983f5-7ed3-b310-d949-9ae8b130cdab@google.com/)**
+
+> Here is the third series of patches to mm (and a few architectures), based
+> on v6.4-rc3 with the preceding two series applied: in which khugepaged
+> takes advantage of pte_offset_map[_lock]() allowing for pmd transitions.
+>
+
+**[v1: Do not print page type when the page has no type](http://lore.kernel.org/linux-mm/ZHI0YKzZADjr1nyq@casper.infradead.org/)**
+
+> It is confusing and unnecessary to print the page type when the
+> page has no type.
+>
+
+#### File Systems
+
+**[v2: Create large folios in iomap buffered write path](http://lore.kernel.org/linux-fsdevel/20230602222445.2284892-1-willy@infradead.org/)**
+
+> Commit ebb7fb1557b1 limited the length of ioend chains to 4096 entries
+> to improve worst-case latency. Unfortunately, this had the effect of
+> limiting the performance of:
+>
+> fio -name write-bandwidth -rw=write -bs=1024Ki -size=32Gi -runtime=30 \
+> -iodepth 1 -ioengine sync -zero_buffers=1 -direct=0 -end_fsync=1 \
+> -numjobs=4 -directory=/mnt/test
+>
+> The problem ends up being lock contention on the i_pages spinlock as we
+> clear the writeback bit on each folio (and propagate that up through
+> the tree). By using larger folios, we decrease the number of folios
+> to be processed by a factor of 256 for this benchmark, eliminating the
+> lock contention.
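
The factor of 256 quoted in the iomap entry matches the fio block size, under two assumptions: the old path wrote back one 4 KiB (order-0, page-sized) folio per page, and the new path can cover each 1 MiB (`-bs=1024Ki`) write with a single folio:

$$
\frac{1\,\text{MiB per write}}{4\,\text{KiB per order-0 folio}} = \frac{2^{20}}{2^{12}} = 2^{8} = 256
$$

With these assumptions, each write clears the writeback bit (and takes the `i_pages` lock) once instead of 256 times.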
+>
+
+**[v1: highmem: Rename put_and_unmap_page() to unmap_and_put_page()](http://lore.kernel.org/linux-fsdevel/20230602103307.5637-1-fmdefrancesco@gmail.com/)**
+
+> With commit 849ad04cf562a ("new helper: put_and_unmap_page()"), Al Viro
+> introduced the put_and_unmap_page() to use in those many places where we
+> have a common pattern consisting of calls to kunmap_local() +
+> put_page().
+>
+
+**[v1: fs: Rename put_and_unmap_page() to unmap_and_put_page()](http://lore.kernel.org/linux-fsdevel/20230601132317.13606-1-fmdefrancesco@gmail.com/)**
+
+> With commit 849ad04cf562a ("new helper: put_and_unmap_page()"), Al Viro
+> introduced the put_and_unmap_page() to use in those many places where we
+> have a common pattern consisting of calls to kunmap_local() +
+> put_page().
+>
+
+**[v2: zonefs: use iomap for synchronous direct writes](http://lore.kernel.org/linux-fsdevel/20230601125636.205191-1-dlemoal@kernel.org/)**
+
+> Remove the function zonefs_file_dio_append() that is used to manually
+> issue REQ_OP_ZONE_APPEND BIOs for processing synchronous direct writes
+> and use iomap instead.
+>
+
+**[v1: fs.h: Optimize file struct to prevent false sharing](http://lore.kernel.org/linux-fsdevel/20230601092400.27162-1-zhiyin.chen@intel.com/)**
+
+> In the syscall test of UnixBench, performance regression occurred due
+> to false sharing.
+>
+
+**[v1: fuse: Abort the requests under processing queue with a spin_lock](http://lore.kernel.org/linux-fsdevel/20230531092643.45607-1-quic_pragalla@quicinc.com/)**
+
+> There is a potential race/timing issue while aborting the
+> requests on processing list between fuse_dev_release() and
+> fuse_abort_conn(). This is resulting into below warnings
+> and can even result into UAF issues.
+>
+
+**[v3: NFSD: recall write delegation on GETATTR conflict](http://lore.kernel.org/linux-fsdevel/1685500507-23598-1-git-send-email-dai.ngo@oracle.com/)**
+
+> This patch series adds the recall of write delegation when there is
+> conflict with a GETATTR and a counter in /proc/net/rpc/nfsd to keep
+> count of this recall.
+>
+
+**[v4: fs/sysv: Null check to prevent null-ptr-deref bug](http://lore.kernel.org/linux-fsdevel/20230531013141.19487-1-princekumarmaurya06@gmail.com/)**
+
+> sb_getblk(inode->i_sb, parent) return a null ptr and taking lock on
+> that leads to the null-ptr-deref bug.
+>
+> Reported-by: syzbot+aad58150cbc64ba41bdc@syzkaller.appspotmail.com
+> Closes: https://syzkaller.appspot.com/bug?extid=aad58150cbc64ba41bdc
+>
+
+**[v1: sysctl: move umh and keys sysctls](http://lore.kernel.org/linux-fsdevel/20230530232914.3689712-1-mcgrof@kernel.org/)**
+
+> If you look at kernel/sysctl.c there are two sysctl arrays which
+> are declared in header files but registered with no good reason now
+> on kernel/sysctl.c instead of the place they belong. So just do
+> the registration where it belongs.
+>
+
+**[v2: multiblock allocator improvements](http://lore.kernel.org/linux-fsdevel/cover.1685449706.git.ojaswin@linux.ibm.com/)**
+
+> So this patch was intended to remove a dead if-condition but it was not
+> actually dead code and removing it was causing a performance regression.
+> Unfortunately I somehow missed that when I was reviewing his patchset
+> and it already went in so I had to revert the commit. I've added details
+> of the regression and root cause in the revert commit. Also attaching
+> the performance numbers I observer:
+>
+
+**[v1: fs/buffer: using __bio_add_page in submit_bh_wbc()](http://lore.kernel.org/linux-fsdevel/20230530033239.17534-1-gouhao@uniontech.com/)**
+
+> In submit_bh_wbc(), bio is newly allocated, so it
+> does not need any merging logic.
+>
+> And using bio_add_page here will execute 'bio_flagged(
+> bio, BIO_CLONED)' and 'bio_full' twice, which is unnecessary.
+>
+
+**[v1: FUSE: dev: Change the posiion of spin_lock](http://lore.kernel.org/linux-fsdevel/20230529015656.3099390-1-lijun01@kylinos.cn/)**
+
+> just list_del need spin_lock ,so the spin_lock should be close to
+> "list_del(&req->list)", this may add a little benefit.
+>
+
+**[v1: Null check to prevent null-ptr-deref bug](http://lore.kernel.org/linux-fsdevel/20230528173546.593511-1-princekumarmaurya06@gmail.com/)**
+
+> sb_getblk(inode->i_sb, parent) return a null ptr and taking lock on
+> that leads to the null-ptr-deref bug.
+>
+
+#### Network Devices
+
+**[v5: net-next: net: flower: add cfm support](http://lore.kernel.org/netdev/20230604115825.2739031-1-zahari.doychev@linux.com/)**
+
+> The first patch adds cfm support to the flow dissector.
+> The second adds the flower classifier support.
+> The third adds a selftest for the flower cfm functionality.
+>
+> iproute2 changes will come in follow up patches.
+>
+
+**[v4: vsock: MSG_ZEROCOPY flag support](http://lore.kernel.org/netdev/20230603204939.1598818-1-AVKrasnov@sberdevices.ru/)**
+
+> Difference with copy way is not significant. During packet allocation,
+> non-linear skb is created and filled with pinned user pages.
+> There are also some updates for vhost and guest parts of transport - in
+> both cases i've added handling of non-linear skb for virtio part. vhost
+> copies data from such skb to the guest's rx virtio buffers. In the guest,
+> virtio transport fills tx virtio queue with pages from skb.
+>
+
+**[v1: Add support for sam9x7 SoC family](http://lore.kernel.org/netdev/20230603200243.243878-1-varshini.rajendran@microchip.com/)**
+
+> This patch series adds support for the new SoC family - sam9x7.
+> - The device tree, configs and drivers are added
+> - Clock driver for sam9x7 is added
+> - Support for basic peripherals is added
+>
+
+**[v1: RDMA/siw: Fabricate a GID on tun and loopback devices](http://lore.kernel.org/netdev/168580524310.5238.13720896895363588620.stgit@oracle-102.nfsv4bat.org/)**
+
+> LOOPBACK and NONE (tunnel) devices have all-zero MAC addresses.
+> Currently, siw_device_create() falls back to copying the IB device's
+> name in those cases, because an all-zero MAC address breaks the RDMA
+> core address resolution mechanism.
+>
+
+**[v1: net-next: Move KSZ9477 errata handling to PHY driver](http://lore.kernel.org/netdev/20230602234019.436513-1-robert.hancock@calian.com/)**
+
+> Patches to move handling for KSZ9477 PHY errata register fixes from
+> the DSA switch driver into the corresponding PHY driver, for more
+> proper layering and ordering.
+>
+
+**[v1: net: dsa: realtek: rtl8365mb: add missing case for digital interface 0](http://lore.kernel.org/netdev/40df61cc5bebe94e4d7d32f79776be0c12a37d61.1685746295.git.chunkeey@gmail.com/)**
+
+> when bringing up the switch on a Netgear WNDAP660, I observed that
+> no traffic got passed from the RTL8363 to the ethernet interface...
+>
+> Turns out, this was because the dropped case for
+> RTL8365MB_DIGITAL_INTERFACE_SELECT_REG(0) that
+> got deleted by accident.
+>
+
+**[v10: vfio: pds_vfio driver](http://lore.kernel.org/netdev/20230602220318.15323-1-brett.creeley@amd.com/)**
+
+> This is a patchset for a new vendor specific VFIO driver
+> (pds_vfio) for use with the AMD/Pensando Distributed Services Card
+> (DSC). This driver makes use of the pds_core driver.
+>
+
+**[v1: RDMA/core: Handle ARPHRD_NONE devices](http://lore.kernel.org/netdev/168573386075.5660.5037682341906748826.stgit@oracle-102.nfsv4bat.org/)**
+
+> We would like to enable the use of siw on top of a VPN that is
+> constructed and managed via a tun device. That hasn't worked up
+> until now because ARPHRD_NONE devices (such as tun devices) have
+> no GID for the RDMA/core to look up.
+>
+
+**[v1: net: dsa: realtek: rtl8365mb: use mdio passthrough to access PHYs](http://lore.kernel.org/netdev/0df383e20e5a90494e3cbd0cf23c508c5c943ab4.1685725191.git.chunkeey@gmail.com/)**
+
+> when bringing up the PHYs on a Netgear WNDAP660, I observed that
+> none of the PHYs are getting enumerated and the rtl8365mb fails
+> to load.
+>
+
+**[v1: net: rfs: annotate lockless accesses](http://lore.kernel.org/netdev/20230602163141.2115187-1-edumazet@google.com/)**
+
+> rfs runs without locks held, so we should annotate
+> read and writes to shared variables.
+>
+
+**[v5: net-next: net: ioctl: Use kernel memory on protocol ioctl callbacks](http://lore.kernel.org/netdev/20230602163044.1820619-1-leitao@debian.org/)**
+
+> Most of the ioctls to net protocols operates directly on userspace
+> argument (arg). Usually doing get_user()/put_user() directly in the
+> ioctl callback. This is not flexible, because it is hard to reuse these
+> functions without passing userspace buffers.
+>
+
+**[v1: iproute2: ipaddress: accept symbolic names](http://lore.kernel.org/netdev/20230602155419.8958-1-stephen@networkplumber.org/)**
+
+> The function rtnl_addproto_a2n() was defined but never used.
+> Use it to allow for symbolic names, and fix the function signatures
+> so protocol value is consistently __u8.
+>
+
+**[v1: net-next: complete Lynx mdio device handling](http://lore.kernel.org/netdev/ZHoOe9K%2FdZuW2pOe@shell.armlinux.org.uk/)**
+
+> This series completes the mdio device lifetime handling for Lynx PCS
+> users which do not create their own mdio device, but instead fetch
+> it using a firmware description - namely the DPAA2 and FMAN_MEMAC
+> drivers.
+>
+
+**[v1: net: net/sched: fq_pie: ensure reasonable TCA_FQ_PIE_QUANTUM values](http://lore.kernel.org/netdev/20230602123747.2056178-1-edumazet@google.com/)**
+
+> We got multiple syzbot reports, all duplicates of the following [1]
+>
+> syzbot managed to install fq_pie with a zero TCA_FQ_PIE_QUANTUM,
+> thus triggering infinite loops.
+>
+
+**[[PATCH RESEND net-next 0/5] Improve the taprio qdisc's relationship with its children](http://lore.kernel.org/netdev/20230602103750.2290132-1-vladimir.oltean@nxp.com/)**
+
+> [ Original patch set was lost due to an apparent transient problem with
+> kernel.org's DNSBL setup. This is an identical resend. ]
+>
+> Prompted by Vinicius' request to consolidate some child Qdisc
+> dereferences in taprio:
+> https://lore.kernel.org/netdev/87edmxv7x2.fsf@intel.com/
+>
+
+**[v1: net: enetc: correct the statistics of rx bytes](http://lore.kernel.org/netdev/20230602094659.965523-1-wei.fang@nxp.com/)**
+
+> The purpose of this patch set is to fix the issue of rx bytes
+> statistics. The first patch corrects the rx bytes statistics
+> of normal kernel protocol stack path, and the second patch is
+> used to correct the rx bytes statistics of XDP.
+>
+
+**[v1: net-next: ipv6: lower "link become ready"'s level message](http://lore.kernel.org/netdev/20230601-net-next-skip_print_link_becomes_ready-v1-1-7ff2b88dc9b8@tessares.net/)**
+
+> This following message is printed in the console each time a network
+> device configured with an IPv6 addresses is ready to be used:
+>
+
+**[v5: net-next: sock: Improve condition on sockmem pressure](http://lore.kernel.org/netdev/20230602081135.75424-1-wuyun.abel@bytedance.com/)**
+
+> Currently the memcg's status is also accounted into the socket's
+> memory pressure to alleviate the memcg's memstall. But there are
+> still cases that can be improved. Please check the patches for
+> detailed info.
+>
+
+**[v2: bpf-next: bpf, x86: allow function arguments up to 14 for TRACING](http://lore.kernel.org/netdev/20230602065958.2869555-1-imagedong@tencent.com/)**
+
+> For now, the BPF program of type BPF_PROG_TYPE_TRACING can only be used
+> on the kernel functions whose arguments count less than 6. This is not
+> friendly at all, as too many functions have arguments count more than 6.
+>
+> Therefore, let's enhance it by increasing the function arguments count
+> allowed in arch_prepare_bpf_trampoline(), for now, only x86_64.
+>
+
+**[v4: Introduce a vringh accessor for IO memory](http://lore.kernel.org/netdev/20230602055211.309960-1-mie@igel.co.jp/)**
+
+> Vringh is a host-side implementation of virtio rings, and supports the vring
+> located on three kinds of memories, userspace, kernel space and a space
+> translated iotlb.
+>
+
+**[v1: net-next: tools: ynl-gen: dust off the user space code](http://lore.kernel.org/netdev/20230602023548.463441-1-kuba@kernel.org/)**
+
+> Every now and then I wish I finished the user space part of
+> the netlink specs, Python scripts kind of stole the show but
+> C is useful for selftests and stuff which needs to be fast.
+> Recently someone asked me how to access devlink and ethtool
+> from C++ which pushed me over the edge.
+>
+
+**[v6: net-next: net: dsa: mv88e6xxx: implement USXGMII mode for mv88e6393x](http://lore.kernel.org/netdev/20230602001705.2747-1-msmulski2@gmail.com/)**
+
+> Changes from previous version:
+> * use phylink_decode_usxgmii_word() to decode USXGMII link state
+> * use existing include/uapi/linux/mdio.h defines when parsing status bits
+>
+
+**[v6: net-next: Brcm ASP 2.0 Ethernet Controller](http://lore.kernel.org/netdev/1685657551-38291-1-git-send-email-justin.chen@broadcom.com/)**
+
+> Add support for the Broadcom ASP 2.0 Ethernet controller which is first
+> introduced with 72165.
+>
+
+**[v1: net: dsa: qca8k: add CONFIG_LEDS_TRIGGERS dependency](http://lore.kernel.org/netdev/20230601213111.3182893-1-arnd@kernel.org/)**
+
+> There is a mix of 'depends on' and 'select' for LEDS_TRIGGERS, so it's
+> not clear what we should use here, but in general using 'depends on'
+> causes fewer problems, so use that.
+>
+
+**[v1: net: tcp: gso: really support BIG TCP](http://lore.kernel.org/netdev/20230601211732.1606062-1-edumazet@google.com/)**
+
+> oldlen name is a bit misleading, as it is the contribution
+> of skb->len on the input skb TCP checksum. I added a comment
+> to clarify this point.
+>
+
+**[v3: net/sctp: Make sha1 as default algorithm if fips is enabled](http://lore.kernel.org/netdev/1685643474-18654-1-git-send-email-kashwindayan@vmware.com/)**
+
+> MD5 is not FIPS compliant. But still md5 was used as the
+> default algorithm for sctp if fips was enabled.
+> Due to this, listen() system call in ltp tests was
+> failing for sctp in fips environment, with below error message.
+>
+
+**[GIT PULL: Networking for v6.4-rc5](http://lore.kernel.org/netdev/20230601180906.238637-1-
+
+> Additional napi fields such as PID association for napi
+> thread etc. can be supported in a follow-on patch set.
+>
+> This series only supports 'get' ability for retrieving
+> napi fields (specifically, napi ids and queue[s]). The 'set'
+> ability for setting queue[s] associated with a napi instance
+> via netdev-genl will be submitted as a separate patch series.
+>
+
+#### Security Hardening
+
+**[v5: checkpatch: Check for 0-length and 1-element arrays](http://lore.kernel.org/linux-hardening/20230601160746.up.948-kees@kernel.org/)**
+
+> Fake flexible arrays have been deprecated since last millennium. Proper
+> C99 flexible arrays must be used throughout the kernel so
+> CONFIG_FORTIFY_SOURCE and CONFIG_UBSAN_BOUNDS can provide proper array
+> bounds checking.
+> +> Fixed-by: Joe Perches +> + +**[v1: s390/purgatory: Do not use fortified string functions](http://lore.kernel.org/linux-hardening/20230531003414.never.050-kees@kernel.org/)** + +> This means that the memcpy() calls with "buf" as a destination in +> sha256.c's code will attempt to perform run-time bounds checking, which +> could lead to calling missing functions, specifically a potential +> WARN_ONCE, which isn't callable from purgatory. +> + +**[v1: x86/purgatory: Do not use fortified string functions](http://lore.kernel.org/linux-hardening/20230531003345.never.325-kees@kernel.org/)** + +> This means that the memcpy() calls with "buf" as a destination in +> sha256.c's code will attempt to perform run-time bounds checking, which +> could lead to calling missing functions, specifically a potential +> WARN_ONCE, which isn't callable from purgatory. +> + +**[v1: next: firewire: Replace zero-length array with flexible-array member](http://lore.kernel.org/linux-hardening/ZHT0V3SpvHyxCv5W@work/)** + +> Zero-length and one-element arrays are deprecated, and we are moving +> towards adopting C99 flexible-array members, instead. +> + +**[v1: next: drm/amdgpu/discovery: Replace fake flex-arrays with flexible-array members](http://lore.kernel.org/linux-hardening/ZHO4%2FZ+iO+lqV4rW@work/)** + +> Zero-length and one-element arrays are deprecated, and we are moving +> towards adopting C99 flexible-array members, instead. +> +> Use the DECLARE_FLEX_ARRAY() helper macro to transform zero-length +> arrays in a union into flexible-array members. And replace a one-element +> array with a C99 flexible-array member. +> + +#### Rust For Linux + +**[v1: add abstractions for network device drivers](http://lore.kernel.org/rust-for-linux/01010188843258ec-552cca54-4849-4424-b671-7a5bf9b8651a-000000@us-west-2.amazonses.com/)** + +> This patchset adds minimum abstractions for network device drivers and +> Rust dummy network device driver, a simpler version of drivers/net/dummy.c. 
+> +> The dummy network device driver doesn't attach any bus such as PCI so +> the dependency is minimum. Hopefully, it would make reviewing easier. +> + +**[v2: Rust scatterlist abstractions](http://lore.kernel.org/rust-for-linux/20230602101819.2134194-1-changxian.cqs@antgroup.com/)** + +> This is a version of scatterlist abstractions for Rust drivers. +> +> Scatterlist is used for efficient management of memory buffers, which is +> essential for many kernel-level operations such as Direct Memory Access +> (DMA) transfers and crypto APIs. +> + +**[v2: rust: workqueue: add bindings for the workqueue](http://lore.kernel.org/rust-for-linux/20230601134946.3887870-1-aliceryhl@google.com/)** + +> This patchset contains bindings for the kernel workqueue. +> +> One of the primary goals behind the design used in this patch is that we +> must support embedding the `work_struct` as a field in user-provided +> types, because this allows you to submit things to the workqueue without +> having to allocate, making the submission infallible. If we didn't have +> to support this, then the patch would be much simpler. One of the main +> things that make it complicated is that we must ensure that the function +> pointer in the `work_struct` is compatible with the struct it is +> contained within. +> + +**[v1: rust: error: integrate Rust error type with `errname`](http://lore.kernel.org/rust-for-linux/20230531174450.3733220-1-aliceryhl@google.com/)** + +> This integrates the `Error` type with the `errname` by making it +> accessible via the `name` method or via the `Debug` trait. +> + +#### BPF + +**[v11: evm: Do HMAC of multiple per LSM xattrs for new inodes](http://lore.kernel.org/bpf/20230603191518.1397490-1-roberto.sassu@huaweicloud.com/)** + +> One of the major goals of LSM stacking is to run multiple LSMs side by side +> without interfering with each other. The ultimate decision will depend on +> individual LSM decision. 
+> + +**[[PATCH RESEND bpf-next 00/18] BPF token](http://lore.kernel.org/bpf/20230602150011.1657856-1-andrii@kernel.org/)** + +> *Resending with trimmed CC list because original version didn't make it to +> the mailing list.* +> +> This patch set introduces new BPF object, BPF token, which allows to delegate +> a subset of BPF functionality from privileged system-wide daemon (e.g., +> systemd or any other container manager) to a *trusted* unprivileged +> application. Trust is the key here. This functionality is not about allowing +> unconditional unprivileged BPF usage. Establishing trust, though, is +> completely up to the discretion of respective privileged application that +> would create a BPF token. +> + +**[v1: selftests/bpf: Add missing selftests kconfig options](http://lore.kernel.org/bpf/20230602140108.1177900-1-void@manifault.com/)** + +> Our selftests of course rely on the kernel being built with +> CONFIG_DEBUG_INFO_BTF=y, though this (nor its dependencies of +> CONFIG_DEBUG_INFO=y and CONFIG_DEBUG_INFO_DWARF4=y) are not specified. +> This causes the wrong kernel to be built, and selftests to similarly +> fail to build. +> + +**[v10: vhost: virtio core prepares for AF_XDP](http://lore.kernel.org/bpf/20230602092206.50108-1-xuanzhuo@linux.alibaba.com/)** + +> Now, virtio may can not work with DMA APIs when virtio features do not have +> VIRTIO_F_ACCESS_PLATFORM. +> +> 1. I tried to let DMA APIs return phy address by virtio-device. But DMA APIs just +> work with the "real" devices. +> 2. I tried to let xsk support callballs to get phy address from virtio-net +> driver as the dma address. But the maintainers of xsk may want to use dma-buf +> to replace the DMA APIs. I think that may be a larger effort. We will wait +> too long. 
+> + +**[v1: bpf-next: bpf: Support ->fill_link_info for kprobe prog](http://lore.kernel.org/bpf/20230602085239.91138-1-laoar.shao@gmail.com/)** + +> Currently, it is not easy to determine which functions are probed by a +> kprobe_multi program. This patchset supports ->fill_link_info for it, +> allowing the user to easily obtain the probed functions. +> +> Although the user can retrieve the functions probed by a perf_event +> program using `bpftool perf show`, it would be beneficial to also support +> ->fill_link_info. This way, the user can obtain it in the same manner as +> other bpf links. +> + +**[v2: bpf-next: bpf_refcount followups (part 1)](http://lore.kernel.org/bpf/20230602022647.1571784-1-davemarchevsky@fb.com/)** + +> This series is the first of two (or more) followups to address issues in the +> bpf_refcount shared ownership implementation discovered by Kumar. +> Specifically, this series addresses the "bpf_refcount_acquire on non-owning ref +> in another tree" scenario described in [0], and does _not_ address issues +> raised in [1]. Further followups will address the other issues. +> + +**[v2: bpf-next: bpf/xdp: optimize bpf_xdp_pointer to avoid reading sinfo](http://lore.kernel.org/bpf/168563651438.3436004.17735707525651776648.stgit@firesoul/)** + +> Currently we observed a significant performance degradation in +> samples/bpf xdp1 and xdp2, due XDP multibuffer "xdp.frags" handling, +> added in commit 772251742262 ("samples/bpf: fixup some tools to be able +> to support xdp multibuffer"). +> + +**[v1: bpf-next: bpf: getsockopt hook to get optval without checking kernel retval](http://lore.kernel.org/bpf/20230601024900.22902-1-zhoufeng.zf@bytedance.com/)** + +> Remove the judgment on retval and pass bpf ctx by default. The +> advantage of this is that it is more flexible. Bpf getsockopt can +> support the new optname without using the module to call the +> nf_register_sockopt to register. 
+> + +**[v1: bpf-next: bpf: support BTF kind metadata to separate](http://lore.kernel.org/bpf/20230531201936.1992188-1-alan.maguire@oracle.com/)** + +> BTF kind metadata provides information to parse BTF kinds. +> By separating parsing BTF from using all the information +> it provides, we allow BTF to encode new features even if +> they cannot be used. This is helpful in particular for +> cases where newer tools for BTF generation run on an +> older kernel; BTF kinds may be present that the kernel +> cannot yet use, but at least it can parse the BTF +> provided. Meanwhile userspace tools with newer libbpf +> may be able to use the newer information. +> + +**[v1: net: ice: recycle/free all of the fragments from multi-buffer frame](http://lore.kernel.org/bpf/20230531154457.3216621-1-anthony.l.nguyen@intel.com/)** + +> The ice driver caches next_to_clean value at the beginning of +> ice_clean_rx_irq() in order to remember the first buffer that has to be +> freed/recycled after main Rx processing loop. The end boundary is +> indicated by first descriptor of frame that Rx processing loop has ended +> its duties. Note that if mentioned loop ended in the middle of gathering +> multi-buffer frame, next_to_clean would be pointing to the descriptor in +> the middle of the frame BUT freeing/recycling stage will stop at the +> first descriptor. This means that next iteration of ice_clean_rx_irq() +> will miss the (first_desc, next_to_clean - 1) entries. +> + +**[v1: bpf/tests: Use struct_size()](http://lore.kernel.org/bpf/20230531043251.989312-1-suhui@nfschina.com/)** + +> Use struct_size() instead of hand writing it. +> This is less verbose and more informative. +> + +**[v1: net: bpf, sockmap: avoid potential NULL dereference in sk_psock_verdict_data_ready()](http://lore.kernel.org/bpf/20230530195149.68145-1-edumazet@google.com/)** + +> syzbot found sk_psock(sk) could return NULL when called +> from sk_psock_verdict_data_ready(). +> +> Just make sure to handle this case. 
+> + +**[v2: bpf-next: verify scalar ids mapping in regsafe()](http://lore.kernel.org/bpf/20230530172739.447290-1-eddyz87@gmail.com/)** + +> To represent this set I use a u32_hashset data structure derived from +> tools/lib/bpf/hashmap.h. I tested it locally (see [1]), but I think +> that ideally it should be tested using KUnit. However, AFAIK, this +> would be the first use of KUnit in context of BPF verifier. +> If people are ok with this, I will prepare the tests and necessary +> CI integration. +> + +**[v1: bpf-next: samples/bpf: xdp1 and xdp2 reduce XDPBUFSIZE to 60](http://lore.kernel.org/bpf/168545704139.2996228.2516528552939485216.stgit@firesoul/)** + +> Default samples/pktgen scripts send 60 byte packets as hardware +> adds 4-bytes FCS checksum, which fulfils minimum Ethernet 64 bytes +> frame size. +> +> XDP layer will not necessary have access to the 4-bytes FCS checksum. +> + +**[v2: bpf-next: xsk: multi-buffer support](http://lore.kernel.org/bpf/20230529155024.222213-1-maciej.fijalkowski@intel.com/)** + +> This series of patches add multi-buffer support for AF_XDP. XDP and +> various NIC drivers already have support for multi-buffer packets. With +> this patch set, programs using AF_XDP sockets can now also receive and +> transmit multi-buffer packets both in copy as well as zero-copy mode. +> ZC multi-buffer implementation is based on ice driver. +> + +**[v1: net: tcp: introduce a compack timer handler in sack compression](http://lore.kernel.org/bpf/20230529113804.GA20300@didi-ThinkCentre-M920t-N000/)** + +> We've got some issues when sending a compressed ack is deferred to +> release phrase due to the socket owned by another user: +> 1. a compressed ack would not be sent because of lack of ICSK_ACK_TIMER +> flag. +> 2. the tp->compressed_ack counter should be decremented by 1. +> 3. we cannot pass timeout check and reset the delack timer in +> tcp_delack_timer_handler(). +> 4. we are not supposed to increment the LINUX_MIB_DELAYEDACKS counter. +> ... 
+> + +**[v1: bpf-next: multi-buffer support for XDP_REDIRECT samples](http://lore.kernel.org/bpf/20230529110608.597534-1-tariqt@nvidia.com/)** + +> This series adds multi-buffer support for two XDP_REDIRECT sample programs. +> It follows the pattern from xdp1 and xdp2. +> + +**[v2: net-next: support non-frag page for page_pool_alloc_frag()](http://lore.kernel.org/bpf/20230529092840.40413-1-linyunsheng@huawei.com/)** + +> In [1] & [2], there are usecases for veth and virtio_net to +> use frag support in page pool to reduce memory usage, and it +> may request different frag size depending on the head/tail +> room space for xdp_frame/shinfo and mtu/packet size. When the +> requested frag size is large enough that a single page can not +> be split into more than one frag, using frag support only have +> performance penalty because of the extra frag count handling +> for frag support. +> + +**[v1: bpf-next: bpf: Support ->show_fdinfo and ->fill_link_info for kprobe prog](http://lore.kernel.org/bpf/20230528142027.5585-1-laoar.shao@gmail.com/)** + +> Currently, it is not easy to determine which functions are probed by a +> kprobe_multi program. This patchset supports ->show_fdinfo and +> ->fill_link_info for it, allowing the user to easily obtain the probed +> functions. +> + +### Related Technology Updates + +#### Qemu + +**[RFC: target/riscv: Add support for Zacas extension](http://lore.kernel.org/qemu-devel/20230602121638.36342-1-rbradford@rivosinc.com/)** + +> The Zacas[1] extension is a proposed unprivileged ISA extension for +> adding support for atomic compare-and-swap. Since this extension is not +> yet frozen (although no significant changes are expected) these patches +> are RFC/informational. +> + +**[v2: linux-user/riscv: Add syscall riscv_hwprobe](http://lore.kernel.org/qemu-devel/f59f948fc42fdf0b250afd6dcd6f232013480d9c.camel@rivosinc.com/)** + +> This patch adds the new syscall for the +> "RISC-V Hardware Probing Interface" +> (https://docs.kernel.org/riscv/hwprobe.html).
+> + +**[v7: hw/riscv/virt: pflash improvements](http://lore.kernel.org/qemu-devel/20230601045910.18646-1-sunilvl@ventanamicro.com/)** + +> This series improves the pflash usage in RISC-V virt machine with solutions to +> below issues. +> +> 1) Currently the first pflash is reserved for ROM/M-mode firmware code. But S-mode +> payload firmware like EDK2 need both pflash devices to have separate code and variable +> store so that OS distros can keep the FW code as read-only. +> + +**[v1: disas/riscv: Add vendor extension support](http://lore.kernel.org/qemu-devel/20230530131843.1186637-1-christoph.muellner@vrull.eu/)** + +> This series adds vendor extension support to the QEMU disassembler +> for RISC-V. The following vendor extensions are covered: +> * XThead{Ba,Bb,Bs,Cmo,CondMov,FMemIdx,Fmv,Mac,MemIdx,MemPair,Sync} +> * XVentanaCondOps +> + +#### Buildroot + +**[package/openjdk{-bin}: security bump versions to 11.0.19+7 and 17.0.7+7](http://lore.kernel.org/buildroot/20230602202425.2C00186BCA@busybox.osuosl.org/)** + +> For details, see the announcements: +> https://mail.openjdk.org/pipermail/jdk-updates-dev/2023-April/021899.html +> https://mail.openjdk.org/pipermail/jdk-updates-dev/2023-April/021900.html +> + +#### U-Boot + +**[v4: SPL NVMe support](http://lore.kernel.org/u-boot/20230603140256.2443518-1-mchitale@ventanamicro.com/)** + +> This patchset adds support to load images of the SPL's next booting +> stage from a NVMe device. +> + +**[v1: riscv: JH7110: move pll clocks to their own device node (Was: The latest U-boot...) visionfive2 1.3B board](http://lore.kernel.org/u-boot/20230602171054.GB27915@lst.de/)** + +> Here is the revert, along with a work in progress attempt to make the DT +> match the hardware. Conor had asked me to share it, regardless of its +> early stage. It compiles, and boots Linux kernels, but there is no PLL +> driver I can find currently. So clocks are still hanging in PROBE_DEFER. 
+> + + +## 20230528: Issue 47 + +### Kernel Updates + +#### RISC-V Architecture Support + +**[v1: riscv: Reduce ARCH_KMALLOC_MINALIGN to 8](http://lore.kernel.org/linux-riscv/20230526165958.908-1-jszhang@kernel.org/)** + +> Currently, riscv defines ARCH_DMA_MINALIGN as L1_CACHE_BYTES, I.E +> 64Bytes, if CONFIG_RISCV_DMA_NONCOHERENT=y. To support unified kernel +> Image, usually we have to enable CONFIG_RISCV_DMA_NONCOHERENT, thus +> it brings some bad effects to for coherent platforms: +> +> Firstly, it wastes memory, kmalloc-96, kmalloc-32, kmalloc-16 and +> kmalloc-8 slab caches don't exist any more, they are replaced with +> either kmalloc-128 or kmalloc-64. +> + +**[v1: RISC-V: mark hibernation as nonportable](http://lore.kernel.org/linux-riscv/20230526-astride-detonator-9ae120051159@wendy/)** + +> Hibernation support depends on firmware marking its reserved/PMP +> protected regions as not accessible from Linux. +> The latest versions of the de-facto SBI implementation (OpenSBI) do +> not do this, having dropped the no-map property to enable 1 GiB huge +> page mappings by the kernel. +> This was exposed by commit 3335068f8721 ("riscv: Use PUD/P4D/PGD pages +> for the linear mapping"), which made the first 2 MiB of DRAM (where SBI +> typically resides) accessible by the kernel. +> + +**[v2: RISC-V: KVM: Ensure SBI extension is enabled](http://lore.kernel.org/linux-riscv/20230526102540.105013-1-ajones@ventanamicro.com/)** + +> Ensure guests can't attempt to invoke SBI extension functions when the +> SBI extension's probe function has stated that the extension is not +> available. +> + +**[v1: Add initialization of clock for StarFive JH7110 SoC](http://lore.kernel.org/linux-riscv/20230526062529.46747-1-william.qiu@starfivetech.com/)** + +> This patchset adds initial rudimentary support for the StarFive +> Quad SPI controller driver. And this driver will be used in +> StarFive's VisionFive 2 board.
In 6.4, the QSPI_AHB and QSPI_APB +> clocks changed from the default ON state to the default OFF state, +> so these clocks need to be enabled in the driver.At the same time, +> dts patch is added to this series. +> + +**[v2: RISCV: Add KVM_GET_REG_LIST API](http://lore.kernel.org/linux-riscv/cover.1684999824.git.haibo1.xu@intel.com/)** + +> KVM_GET_REG_LIST will dump all register IDs that are available to +> KVM_GET/SET_ONE_REG and It's very useful to identify some platform +> regression issue during VM migration. +> + +**[v1: riscv: Kconfig: Add select ARM_AMBA to SOC_STARFIVE](http://lore.kernel.org/linux-riscv/20230525061836.79223-1-jiajie.ho@starfivetech.com/)** + +> Selects ARM_AMBA platform support for StarFive SoCs required by spi and +> crypto dma engine. +> + +**[v1: tools/nolibc: riscv: Add full rv32 support](http://lore.kernel.org/linux-riscv/cover.1684949267.git.falcon@tinylab.org/)** + +> In the first series [1], we have fixed up the compile errors about +> _start and __NR_llseek for rv32, but left compile errors about tons of +> time32 syscalls (removed after kernel commit d4c08b9776b3 ("riscv: Use +> latest system call ABI")) and the missing fstat in nolibc-test.c [2], +> now we have fixed up all of them. +> + +**[v1: Add support for Allwinner GPADC on D1/T113s/R329 SoCs](http://lore.kernel.org/linux-riscv/20230524082744.3215427-1-bigunclemax@gmail.com/)** + +> This series adds support for general purpose ADC (GPADC) on new +> Allwinner's SoCs, such as D1, T113s and R329. The implemented driver +> provides basic functionality for getting ADC channels data. +> + +**[v2: dmaengine: pl330: rename _start to prevent build error](http://lore.kernel.org/linux-riscv/20230524045310.27923-1-rdunlap@infradead.org/)** + +> "_start" is used in several arches and proably should be reserved +> for ARCH usage. Using it in a driver for a private symbol can cause +> a build error when it conflicts with ARCH usage of the same symbol. 
+> + +**[v1: riscv: mm: try VMA lock-based page fault handling first](http://lore.kernel.org/linux-riscv/20230523165942.2630-1-jszhang@kernel.org/)** + +> Attempt VMA lock-based page fault handling first, and fall back to the +> existing mmap_lock-based handling if that fails. +> + +**[v2: riscv: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION](http://lore.kernel.org/linux-riscv/20230523165502.2592-1-jszhang@kernel.org/)** + +> When trying to run linux with various opensource riscv core on +> resource limited FPGA platforms, for example, those FPGAs with less +> than 16MB SDRAM, I want to save mem as much as possible. One of the +> major technologies is kernel size optimizations, I found that riscv +> does not currently support HAVE_LD_DEAD_CODE_DATA_ELIMINATION, which +> passes -fdata-sections, -ffunction-sections to CFLAGS and passes the +> --gc-sections flag to the linker. +> + +**[v3: Add Zawrs support and use it for spinlocks](http://lore.kernel.org/linux-riscv/20230521114715.955823-1-heiko.stuebner@vrull.eu/)** + +> Zawrs [0] was ratified in november 2022 [1], so I've resurrect the patch +> adding Zawrs support for spinlocks and adapted it to recent kernel +> changes. +> +> Also incorporated are the nice comments David Laight provided on v2. +> + +**[v1: tools/nolibc: autodetect stackprotector availability from compiler](http://lore.kernel.org/linux-riscv/20230521-nolibc-automatic-stack-protector-v1-0-dad6c80c51c1@weissschuh.net/)** + +> As suggested by Willy it is possible to detect the availability of +> stackprotector via preprocessor defines. +> Make use of that to simplify the code and interface of nolibc. +> + +**[v1: RISC-V: KVM: Redirect AMO load/store misaligned traps to guest](http://lore.kernel.org/linux-riscv/20230520150116.7451-1-waylingII@gmail.com/)** + +> The M-mode redirects an unhandled misaligned trap back +> to S-mode when not delegating it to VS-mode(hedeleg). 
+> However, KVM running in HS-mode terminates the VS-mode +> software when back from M-mode. +> The KVM should redirect the trap back to VS-mode, and +> let VS-mode trap handler decide the next step. +> + +#### Process Scheduling + +**[v1: sched/psi: make psi_cgroups_enabled static](http://lore.kernel.org/lkml/20230525103428.49712-1-linmiaohe@huawei.com/)** + +> The static key psi_cgroups_enabled is only used inside file psi.c. +> Make it static. +> + +**[v1: sched/fair: Don't balance task to its current running CPU](http://lore.kernel.org/lkml/20230524072018.62204-1-yangyicong@huawei.com/)** + +> Further investigation shows that the warning is superfluous, the migration +> disabled task is just going to be migrated to its current running CPU. +> This is because that on load balance if the dst_cpu is not allowed by the +> task, we'll re-select a new_dst_cpu as a candidate. If no task can be +> balanced to dst_cpu we'll try to balance the task to the new_dst_cpu +> instead. In this case when the migration disabled task is not on CPU it +> only allows to run on its current CPU, load balance will select its +> current CPU as new_dst_cpu and later triggers the the warning above. +> + +**[v1: sched/deadline: simplify dl_bw_cpus() using cpumask_weight_and()](http://lore.kernel.org/lkml/20230522115605.1238227-1-linmiaohe@huawei.com/)** + +> cpumask_weight_and() can be used to count of bits both in rd->span and +> cpu_active_mask. No functional change intended. +> + +#### Memory Management + +**[v1: Do not print page type when the page has no type](http://lore.kernel.org/linux-mm/ZHI0YKzZADjr1nyq@casper.infradead.org/)** + +> It is confusing and unnecessary to print the page type when the +> page has no type.
+> + +**[v4: block: Make old dio use iov_iter_extract_pages() and page pinning](http://lore.kernel.org/linux-mm/20230526214142.958751-1-dhowells@redhat.com/)** + +> Here are three patches that go on top of the similar patches for bio +> structs now in the block tree that make the old block direct-IO code use +> iov_iter_extract_pages() and page pinning. +> + +**[v1: tmpfs.5: extend with new noswap documentation](http://lore.kernel.org/linux-mm/20230526210703.934922-1-mcgrof@kernel.org/)** + +> Linux commit 2c6efe9cf2d7 ("shmem: add support to ignore swap") +> merged as of v6.4 added support to disable swap for tmpfs mounts. +> +> This extends the man page to document that. +> + +**[v3: mm: zswap: shrink until can accept](http://lore.kernel.org/linux-mm/20230526183227.793977-1-cerasuolodomenico@gmail.com/)** + +> This update addresses an issue with the zswap reclaim mechanism, which +> hinders the efficient offloading of cold pages to disk, thereby +> compromising the preservation of the LRU order and consequently +> diminishing, if not inverting, its performance benefits. +> + +**[v1: net-next: crypto, splice, net: Make AF_ALG handle sendmsg(MSG_SPLICE_PAGES)](http://lore.kernel.org/linux-mm/20230526143104.882842-1-dhowells@redhat.com/)** + +> Here's the fourth tranche of patches towards providing a MSG_SPLICE_PAGES +> internal sendmsg flag that is intended to replace the ->sendpage() op with +> calls to sendmsg(). MSG_SPLICE_PAGES is a hint that tells the protocol +> that it should splice the pages supplied if it can. +> + +**[v2: -next: memblock: unify memblock dump and debugfs show](http://lore.kernel.org/linux-mm/20230526120505.123693-1-wangkefeng.wang@huawei.com/)** + +> There are two interfaces to show the memblock information, memblock_dump_all() +> and /sys/kernel/debug/memblock/, but the content is displayed separately, +> let's unify them in case of more different changes over time. 
+> + +**[v2: add support for blocksize > PAGE_SIZE](http://lore.kernel.org/linux-mm/20230526075552.363524-1-mcgrof@kernel.org/)** + +> This is an initial attempt to add support for block size > PAGE_SIZE for tmpfs. +> Why would you want this? It helps us experiment with higher order folio uses +> with fs APIS and helps us test out corner cases which would likely need +> to be accounted for sooner or later if and when filesystems enable support +> for this. Better review early and burn early than continue on in the wrong +> direction so looking for early feedback. +> + +**[v2: block: simplify with PAGE_SECTORS_SHIFT](http://lore.kernel.org/linux-mm/20230526073336.344543-1-mcgrof@kernel.org/)** + +> A bit of block drivers have their own incantations with +> PAGE_SHIFT - SECTOR_SHIFT. Just simplfy and use PAGE_SECTORS_SHIFT +> all over. +> +> Based on linux-next next-20230525. +> + +**[v2: x86/mce: set MCE_IN_KERNEL_COPYIN for all MC-Safe Copy](http://lore.kernel.org/linux-mm/20230526063242.133656-1-wangkefeng.wang@huawei.com/)** + +> Both EX_TYPE_FAULT_MCE_SAFE and EX_TYPE_DEFAULT_MCE_SAFE exception +> fixup types are used to identify fixups which allow in kernel #MC +> recovery, that is the Machine Check Safe Copy. +> + +**[v4: mm, compaction: Skip all non-migratable pages during scan](http://lore.kernel.org/linux-mm/20230525191507.160076-1-khalid.aziz@oracle.com/)** + +> Pages pinned in memory through extra refcounts can not be migrated. +> Currently as isolate_migratepages_block() scans pages for +> compaction, it skips any pinned anonymous pages. All non-migratable +> pages should be skipped and not just the anonymous pinned pages. +> + +**[v16: Implement IOCTL to get and optionally clear info about PTEs](http://lore.kernel.org/linux-mm/20230525085517.281529-1-usama.anjum@collabora.com/)** + +> This syscall is used in Windows applications and games etc. This syscall is +> being emulated in pretty slow manner in userspace. 
Our purpose is to +> enhance the kernel such that we translate it efficiently in a better way. +> Currently some out of tree hack patches are being used to efficiently +> emulate it in some kernels. We intend to replace those with these patches. +> So the whole gaming on Linux can effectively get benefit from this. It +> means there would be tons of users of this code. +> + +**[v1: zonefs: Call zonefs_io_error() on any error from filemap_splice_read()](http://lore.kernel.org/linux-mm/3788353.1685003937@warthog.procyon.org.uk/)** + +> Call zonefs_io_error() after getting any error from filemap_splice_read() +> in zonefs_file_splice_read(), including non-fatal errors such as ENOMEM, +> EINTR and EAGAIN. +> + +**[v1: mm/memcontrol: export memcg.swap watermark via sysfs for v2 memcg](http://lore.kernel.org/linux-mm/20230524181734.125696-1-lars@pixar.com/)** + +> This patch is similar to commit 8e20d4b33266 ("mm/memcontrol: export +> memcg->watermark via sysfs for v2 memcg"), but exports the swap counter's +> watermark. +> + +**[v5: mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8](http://lore.kernel.org/linux-mm/20230524171904.3967031-1-catalin.marinas@arm.com/)** + +> Another version of the series reducing the kmalloc() minimum alignment +> on arm64 to 8 (from 128). Other architectures can easily opt in by +> defining ARCH_KMALLOC_MINALIGN as 8 and selecting +> DMA_BOUNCE_UNALIGNED_KMALLOC. +> + +**[v1: net-next: splice, net: Replace sendpage with sendmsg(MSG_SPLICE_PAGES), part 3](http://lore.kernel.org/linux-mm/20230524153311.3625329-1-dhowells@redhat.com/)** + +> Here's the third tranche of patches towards providing a MSG_SPLICE_PAGES +> internal sendmsg flag that is intended to replace the ->sendpage() op with +> calls to sendmsg(). MSG_SPLICE_PAGES is a hint that tells the protocol +> that it should splice the pages supplied if it can and copy them if not. 
+> + +**[v3: Optimize mremap during mutual alignment within PMD](http://lore.kernel.org/linux-mm/20230524153239.3036507-1-joel@joelfernandes.org/)** + +> The main changes are: +> 1. Care to be taken to move purely within a VMA, in other words this check +> in call_align_down(): +> if (vma->vm_start <= addr_masked) +> return false; +> +> As an example of why this is needed: +> Consider the following range which is 2MB aligned and is +> a part of a larger 10MB range which is not shown. Each +> character is 256KB below making the source and destination +> 2MB each. The lower case letters are moved (s to d) and the +> upper case letters are not moved. +> + +**[v1: mm/slab: add new flag SLAB_NO_MERGE to avoid merging per slab](http://lore.kernel.org/linux-mm/20230524101748.30714-1-dsterba@suse.com/)** + +> Add a flag that allows to disable merging per slab. This can be used for +> more fine grained control over the caches or for debugging builds where +> separate slabs can verify that no objects leak. +> +> The slab_nomerge boot option is too coarse and would need to be enabled +> on all testing hosts. There are some other ways how to disable merging, +> e.g. a slab constructor but this disables poisoning besides that it adds +> additional overhead. Other flags are internal and may have other +> semantics. +> + +**[v1: mm: deduct the number of pages reclaimed by madvise from workingset](http://lore.kernel.org/linux-mm/1684919574-28368-1-git-send-email-zhaoyang.huang@unisoc.com/)** + +> The pages reclaimed by madvise_pageout are made of inactive and dropped from LRU +> forcefully, which lead to the coming up refault pages possess a large refault +> distance than it should be. These could affect the accuracy of thrashing when +> madvise_pageout is used as a common way of memory reclaiming as ANDROID does now. 
+> + +**[v4: net-next/mm: page_pool: new approach for leak detection and shutdown phase](http://lore.kernel.org/linux-mm/168485351546.2849279.13771638045665633339.stgit@firesoul/)** + +> Patchset change summary: +> - Remove PP workqueue and inflight warnings, instead rely on inflight +> pages to trigger cleanup +> - Moves leak detection to the MM-layer page allocator when combined +> with CONFIG_DEBUG_VM. +> + +**[v1: mm/slab: rename CONFIG_SLAB to CONFIG_SLAB_DEPRECATED](http://lore.kernel.org/linux-mm/20230523091139.21449-1-vbabka@suse.cz/)** + +> As discussed at LSF/MM [1] [2] and with no objections raised there, +> deprecate the SLAB allocator. Rename the user-visible option so that +> users with CONFIG_SLAB=y get a new prompt with explanation during make +> oldconfig, while make olddefconfig will just switch to SLUB. +> + +#### File Systems + +**[v1: NFSD: recall write delegation on GETATTR conflict](http://lore.kernel.org/linux-fsdevel/1685122722-18287-1-git-send-email-dai.ngo@oracle.com/)** + +> This patch series adds the recall of write delegation when there is +> conflict with a GETATTR and a counter in /proc/net/rpc/nfsd to keep +> count of this recall. +> + +**[v1: init: Add support for rootwait timeout parameter](http://lore.kernel.org/linux-fsdevel/20230526130716.2932507-1-loic.poulain@linaro.org/)** + +> Add an optional timeout arg to 'rootwait' as the maximum time in +> seconds to wait for the root device to show up before attempting +> forced mount of the root filesystem. +> +> This can be helpful to force boot failure and restart in case the +> root device does not show up in time, allowing the bootloader to +> take any appropriate measures (e.g. recovery, A/B switch, retry...). +> + +**[v1: block layer patches for bcachefs](http://lore.kernel.org/linux-fsdevel/20230525214822.2725616-1-kent.overstreet@linux.dev/)** + +> Jens, here's the full series of block layer patches needed for bcachefs: +> +> Some of these (added exports, zero_fill_bio_iter?)
can probably go with +> the bcachefs pull and I'm just including here for completeness. The main +> ones are the bio_iter patches, and the __invalidate_super() patch. +> + +**[v2: Add support for Vendor Defined Error Types in Einj Module](http://lore.kernel.org/linux-fsdevel/20230525204422.4754-1-Avadhut.Naik@amd.com/)** + +> This patchset adds support for Vendor Defined Error types in the einj +> module by exporting a binary blob file in module's debugfs directory. +> Userspace tools can write OEM Defined Structures into the blob file as +> part of injecting Vendor defined errors. +> + +**[v1: multiblock allocator improvements](http://lore.kernel.org/linux-fsdevel/cover.1685009579.git.ojaswin@linux.ibm.com/)** + +> So this patch was intended to remove a dead if-condition but it was not +> actually dead code and removing it was causing a performance regression. +> Unfortunately I somehow missed that when I was reviewing his patchset +> and it already went in so I had to revert the commit. I've added details +> of the regression and root cause in the revert commit. Also attaching +> the performance numbers I observed: +> + +**[v4: bpf-next: Add O_PATH-based BPF_OBJ_PIN and BPF_OBJ_GET support](http://lore.kernel.org/linux-fsdevel/20230523170013.728457-1-andrii@kernel.org/)** + +> This feature is inspired as a result of recent conversations during +> LSF/MM/BPF 2023 conference about shortcomings of being able to perform BPF +> objects pinning only using lookup-based paths. +> + +**[v1: fs: use UB-safe check for signed addition overflow in remap_verify_area](http://lore.kernel.org/linux-fsdevel/20230523162628.17071-1-dsterba@suse.com/)** + +> As loff_t is a signed type, we should use the safe overflow checks +> instead of relying on compiler implementation. +> +> The bogus values are intentional and the test is supposed to verify the +> boundary conditions.
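The remap_verify_area entry concerns a classic C pitfall. For a signed type such as loff_t, an open-coded test like `pos + len < pos` depends on signed overflow, which is undefined behaviour, so the compiler is free to delete it. In the kernel, `check_add_overflow()` wraps the GCC/Clang `__builtin_add_overflow()` builtin. The minimal userspace sketch below calls the builtin directly. It assumes GCC or Clang; `range_is_valid()` is a hypothetical helper and `int64_t` stands in for loff_t. This is not the patch's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: does [pos, pos + len) describe a valid file range?
 * int64_t stands in for the kernel's signed loff_t.
 */
static bool range_is_valid(int64_t pos, int64_t len)
{
	int64_t end;

	if (pos < 0 || len < 0)
		return false;

	/*
	 * UB-prone alternative: `if (pos + len < pos)`. Signed overflow is
	 * undefined, so the compiler may assume it never happens and drop
	 * the check. The builtin computes the sum as if with infinite
	 * precision and returns true when the result does not fit in `end`.
	 */
	if (__builtin_add_overflow(pos, len, &end))
		return false;

	return true;
}
```

A range that ends exactly at INT64_MAX passes. Anything beyond that is rejected without the program ever evaluating an overflowing signed addition.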
+> + +**[v3: arch: Make virt_to_pfn into a static inline](http://lore.kernel.org/linux-fsdevel/20230503-virt-to-pfn-v6-4-rc1-v3-0-a16c19c03583@linaro.org/)** + +> This is an attempt to harden the typing on virt_to_pfn() +> and pfn_to_virt(). +> +> Making virt_to_pfn() a static inline taking a strongly typed +> (const void *) makes the contract of passing a pointer of that +> type to the function explicit and exposes any misuse of the +> macro virt_to_pfn() acting polymorphic and accepting many types +> such as (void *), (uintptr_t) or (unsigned long) as arguments +> without warnings. +> + +**[v21: block: Use page pinning](http://lore.kernel.org/linux-fsdevel/20230522205744.2825689-1-dhowells@redhat.com/)** + +> This patchset rolls page-pinning out to the bio struct and the block layer, +> using iov_iter_extract_pages() to get pages and noting with BIO_PAGE_PINNED +> if the data pages attached to a bio are pinned. If the data pages come +> from a non-user-backed iterator, then the pages are left unpinned and +> unref'd, relying on whoever set up the I/O to do the retaining. +> + +#### Network devices + +**[v1: wifi: rsi: Do not configure WoWlan in shutdown hook if not enabled](http://lore.kernel.org/netdev/20230527222833.273741-1-marex@denx.de/)** + +> In case WoWlan was never configured during the operation of the system, +> the hw->wiphy->wowlan_config will be NULL. rsi_config_wowlan() checks +> whether wowlan_config is non-NULL and if it is not, then WARNs about it. +> The warning is valid, as during normal operation the rsi_config_wowlan() +> should only ever be called with non-NULL wowlan_config. In shutdown this +> rsi_config_wowlan() should only ever be called if WoWlan was configured +> before by the user.
+> + +**[v1: net: ipa: Use the correct value for IPA_STATUS_SIZE](http://lore.kernel.org/netdev/7ae8af63b1254ab51d45c870e7942f0e3dc15b1e.camel@web.de/)** + +> commit b8dc7d0eea5a7709bb534f1b3ca70d2d7de0b42c introduced +> IPA_STATUS_SIZE as a replacement for the size of the removed struct +> ipa_status. sizeof(struct ipa_status) was sizeof(__le32[8]), use this +> as IPA_STATUS_SIZE. +> + +**[v1: net-next: liquidio: Use vzalloc()](http://lore.kernel.org/netdev/93b010824d9d92376e8d49b9eb396a0fa0c0ac80.1685216322.git.christophe.jaillet@wanadoo.fr/)** + +> Use vzalloc() instead of hand writing it with vmalloc()+memset(). +> This is less verbose. +> + +**[v2: net-next: net: dsa: mv88e6xxx: implement USXGMII mode for mv88e6393x](http://lore.kernel.org/netdev/20230527172024.9154-1-michal.smulski@ooma.com/)** + +> Enable USXGMII mode for mv88e6393x chips. Tested on Marvell 88E6191X. +> + +**[v2: net-next: netlink: specs: add ynl spec for ovs_flow](http://lore.kernel.org/netdev/20230527133107.68161-1-donald.hunter@gmail.com/)** + +> Add a ynl specification for ovs_flow. The spec is sufficient to dump ovs +> flows but some attrs have been left as binary blobs because ynl doesn't +> support C arrays in struct definitions yet. +> + +**[v1: net-next: net: phy: smsc: add WoL support to LAN8740/LAN8742 PHYs.](http://lore.kernel.org/netdev/1685151574-2752-1-git-send-email-Tristram.Ha@microchip.com/)** + +> Microchip LAN8740/LAN8742 PHYs support basic unicast, broadcast, and +> Magic Packet WoL. They have one pattern filter matching up to 128 bytes +> of frame data, which can be used to implement ARP or multicast WoL. +> + +**[v1: net: netlink: specs: correct types of legacy arrays](http://lore.kernel.org/netdev/20230526220653.65538-1-kuba@kernel.org/)** + +> ethtool has some attrs which dump multiple scalars into +> an attribute. The spec currently expects one attr per entry. 
+> + +**[v4: iproute2: vxlan: option printing](http://lore.kernel.org/netdev/20230526174141.5972-1-stephen@networkplumber.org/)** + +> This patchset makes printing of vxlan details more consistent. +> It also adds extra verbose output. The boolean options +> are now printed after all the non-boolean options. +> + +**[v1: net: tcp: deny tcp_disconnect() when threads are waiting](http://lore.kernel.org/netdev/20230526163458.2880232-1-edumazet@google.com/)** + +> Historically connect(AF_UNSPEC) has been abused by syzkaller +> and other fuzzers to trigger various bugs. +> +> A recent one triggers a divide-by-zero [1], and Paolo Abeni +> was able to diagnose the issue. +> + +**[v1: net: af_packet: do not use READ_ONCE() in packet_bind()](http://lore.kernel.org/netdev/20230526162320.5816-1-kuniyu@amazon.com/)** + +> Date: Fri, 26 May 2023 15:43:42 +0000 +> > A recent patch added READ_ONCE() in packet_bind() and packet_bind_spkt() +> > +> > This is better handled by reading pkt_sk(sk)->num later +> > in packet_do_bind() while appropriate lock is held. +> > +> > READ_ONCE() in writers are often an evidence of something being wrong. +> > +> > Fixes: 822b5a1c17df ("af_packet: Fix data-races of pkt_sk(sk)->num.") +> > +> + +**[v2: iproute2: Add ability to specify eBPF map pin path](http://lore.kernel.org/netdev/20230526150921.338906-1-mtottenh@akamai.com/)** + +> We have a use case where we have several different applications composed of +> sets of eBPF programs (programs that may be attached at the TC/XDP layers), +> that need to share maps and not conflict with each other. +> + +**[v1: net: usb: qmi_wwan: Set DTR quirk for BroadMobi BM818](http://lore.kernel.org/netdev/20230526-bm818-dtr-v1-1-64bbfa6ba8af@puri.sm/)** + +> BM818 is based on Qualcomm MDM9607 chipset. +> + +**[v1: net-next: devlink: Spelling corrections](http://lore.kernel.org/netdev/20230526-devlink-spelling-v1-1-9a3e36cdebc8@kernel.org/)** + +> Make some minor spelling corrections in comments.
+> +> Found by inspection. +> + +**[v1: bpf: netfilter: add BPF_NETFILTER bpf_attach_type](http://lore.kernel.org/netdev/20230526121124.3915-1-fw@strlen.de/)** + +> Andrii Nakryiko writes: +> +> And we currently don't have an attach type for NETLINK BPF link. +> Thankfully it's not too late to add it. I see that link_create() in +> kernel/bpf/syscall.c just bypasses attach_type check. We shouldn't +> have done that. Instead we need to add BPF_NETLINK attach type to enum +> bpf_attach_type. And wire all that properly throughout the kernel and +> libbpf itself. +> + +**[v1: net-next: net: dpaa2-mac: use correct interface to free mdiodev](http://lore.kernel.org/netdev/E1q2VsB-008QlZ-El@rmk-PC.armlinux.org.uk/)** + +> Rather than using put_device(&mdiodev->dev), use the proper interface +> provided to dispose of the mdiodev - that being mdio_device_free(). +> + +**[v1: net: rxrpc: Truncate UTS_RELEASE for rxrpc version](http://lore.kernel.org/netdev/654974.1685100894@warthog.procyon.org.uk/)** + +> UTS_RELEASE has a maximum length of 64 which can cause rxrpc_version to +> exceed the 65 byte message limit. +> +> Per the rx spec[1]: "If a server receives a packet with a type value of 13, +> and the client-initiated flag set, it should respond with a 65-byte payload +> containing a string that identifies the version of AFS software it is +> running." +> + +**[v1: net-next: net: pcs: add helpers to xpcs and lynx to manage mdiodev](http://lore.kernel.org/netdev/ZHCGZ8IgAAwr8bla@shell.armlinux.org.uk/)** + +> This morning, we have had two instances where the destruction of the +> MDIO device associated with XPCS and Lynx has been wrong. Rather than +> allowing this pattern of errors to continue, let's make it easier for +> driver authors to get this right by adding a helper. 
+> + +**[v2: net/sched: act_pedit: Parse L3 Header for L4 offset](http://lore.kernel.org/netdev/20230526095810.280474-1-mtottenh@akamai.com/)** + +> Instead of relying on skb->transport_header being set correctly, opt +> instead to parse the L3 header length out of the L3 headers for both +> IPv4/IPv6 when the Extended Layer Op for tcp/udp is used. This fixes a +> bug if GRO is disabled, when GRO is disabled skb->transport_header is +> set by __netif_receive_skb_core() to point to the L3 header, it's later +> fixed by the upper protocol layers, but act_pedit will receive the SKB +> before the fixups are completed. +> + +**[v1: net-next: support non-frag page for page_pool_alloc_frag()](http://lore.kernel.org/netdev/20230526092616.40355-1-linyunsheng@huawei.com/)** + +> In [1], there is a use case to use frag support in page +> pool to reduce memory usage, and it may request different +> frag size depending on the head/tail room space for +> xdp_frame/shinfo and mtu/packet size. When the requested +> frag size is large enough that a single page can not be +> split into more than one frag, using frag support only +> incurs a performance penalty because of the extra frag count +> handling for frag support. +> + +**[v3: Add motorcomm phy pad-driver-strength-cfg support](http://lore.kernel.org/netdev/20230526090502.29835-1-samin.guo@starfivetech.com/)** + +> The motorcomm phy (YT8531) supports the ability to adjust the drive +> strength of the rx_clk/rx_data, and the default strength may not be +> suitable for all boards. So add configurable options to better match +> the boards (e.g. StarFive VisionFive 2). +> +> The first patch adds a description of dt-binding, and the second patch adds +> YT8531's parsing and settings for pad-driver-strength-cfg. +> + +**[v7: net-next: Wangxun netdev features support](http://lore.kernel.org/netdev/20230526090230.71487-1-mengyuanlou@net-swift.com/)** + +> Implement tx_csum and rx_csum to support hardware checksum offload.
+> Implement ndo_vlan_rx_add_vid and ndo_vlan_rx_kill_vid. +> Implement ndo_set_features. +> Enable macros in netdev features which wangxun can support. +> + +**[v3: hv_netvsc: Allocate rx indirection table size dynamically](http://lore.kernel.org/netdev/1685080949-18316-1-git-send-email-shradhagupta@linux.microsoft.com/)** + +> Allocate the size of rx indirection table dynamically in netvsc +> from the value of size provided by OID_GEN_RECEIVE_SCALE_CAPABILITIES +> query instead of using a constant value of ITAB_NUM. +> + +**[v1: Truncate UTS_RELEASE for rxrpc version](http://lore.kernel.org/netdev/20230525211346.718562-1-Kenny.Ho@amd.com/)** + +> UTS_RELEASE has a maximum length of 64 which can cause rxrpc_version to +> exceed the 65 byte message limit. +> +> Per https://web.mit.edu/kolya/afs/rx/rx-spec +> "If a server receives a packet with a type value of 13, and the +> client-initiated flag set, it should respond with a 65-byte payload +> containing a string that identifies the version of AFS software it is +> running." +> + +#### Security hardening + +**[v2: checkpatch: Check for strcpy and strncpy too](http://lore.kernel.org/linux-hardening/20230526172508.gonna.793-kees@kernel.org/)** + +> Warn about strcpy(), strncpy(), and strlcpy(). Suggest strscpy() and +> include pointers to the open KSPP issues for each, which has further +> details and replacement procedures. +> + +**[v2: leds: as3645a: Replace strlcpy with strscpy](http://lore.kernel.org/linux-hardening/20230524144824.2360607-1-azeemshaikh38@gmail.com/)** + +> Part of a tree-wide effort to remove deprecated strlcpy()[1] and replace +> it with strscpy()[2]. No return values were used, so direct replacement +> is safe. +> + +**[v1: next: nfsd: Replace one-element array with flexible-array member](http://lore.kernel.org/linux-hardening/ZG1d51tGG4c97qqb@work/)** + +> One-element arrays are deprecated, and we are replacing them with +> flexible array members instead.
So, replace a one-element array +> with a flexible-array member in struct vbi_anc_data and refactor +> the rest of the code, accordingly. +> + +**[v1: next: media: pci: cx18-av-vbi: Replace one-element array with flexible-array member](http://lore.kernel.org/linux-hardening/ZG1YVji9thTLWeRm@work/)** + +> One-element arrays are deprecated, and we are replacing them with flexible +> array members instead. So, replace one-element arrays with flexible-array +> members in struct vbi_anc_data. +> + +**[v2: next: scsi: lpfc: Use struct_size() helper](http://lore.kernel.org/linux-hardening/ZG0fDdY%2FPPQ%2Fijlt@work/)** + +> Prefer struct_size() over open-coded versions of idiom: +> +> sizeof(struct-with-flex-array) + sizeof(typeof-flex-array-elements) * count +> + +**[v2: fscrypt: Replace 1-element array with flexible array](http://lore.kernel.org/linux-hardening/20230523165458.gonna.580-kees@kernel.org/)** + +> 1-element arrays are deprecated and are being replaced with C99 +> flexible arrays[1]. +> +> As sizes were being calculated with the extra byte intentionally, +> propagate the difference so there is no change in binary output. +> +> [1] https://github.com/KSPP/linux/issues/79 +> + +**[v1: next: vfio/ccw: Use struct_size() helper](http://lore.kernel.org/linux-hardening/f657276073630e806e69726a40ad1cc85101448a.1684805398.git.gustavoars@kernel.org/)** + +> Prefer struct_size() over open-coded versions. +> + +**[v1: next: vfio/ccw: Replace one-element array with flexible-array member](http://lore.kernel.org/linux-hardening/3c10549ebe1564eade68a2515bde233527376971.1684805398.git.gustavoars@kernel.org/)** + +> One-element arrays are deprecated, and we are replacing them with flexible +> array members instead. So, replace one-element array with flexible-array +> member in struct vfio_ccw_parent and refactor the rest of the code +> accordingly.
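The flexible-array conversions and the struct_size() entries all describe one idiom. Below is a minimal userspace sketch of it, assuming GCC or Clang for the overflow builtins. `struct demo_msg`, `demo_msg_size()` and `demo_msg_alloc()` are hypothetical names, not kernel code. `demo_msg_size()` mimics the open-coded `sizeof(struct) + sizeof(element) * count` formula that struct_size() replaces. Like the kernel helper, it saturates to SIZE_MAX on overflow, so an oversized request fails to allocate instead of wrapping around to a small buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* C99 flexible array member instead of the deprecated `uint32_t data[1]`. */
struct demo_msg {
	size_t count;
	uint32_t data[];	/* must be the last member */
};

/*
 * Hypothetical analogue of struct_size(p, data, count):
 * sizeof(struct demo_msg) + count * sizeof(uint32_t), saturating to
 * SIZE_MAX on overflow so that a later allocation fails instead of
 * returning a buffer that is too small.
 */
static size_t demo_msg_size(size_t count)
{
	size_t bytes;

	if (__builtin_mul_overflow(count, sizeof(uint32_t), &bytes) ||
	    __builtin_add_overflow(bytes, sizeof(struct demo_msg), &bytes))
		return SIZE_MAX;
	return bytes;
}

/* Allocate a zeroed message with room for `count` elements. */
static struct demo_msg *demo_msg_alloc(size_t count)
{
	struct demo_msg *m = calloc(1, demo_msg_size(count));

	if (m)
		m->count = count;
	return m;
}
```

With the old one-element form, `sizeof(struct demo_msg)` would already include one element, plus any padding after it. That is why some conversions have to account for the difference explicitly to keep the computed sizes unchanged, as the fscrypt entry does with its extra byte.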
+> + +**[v1: lkdtm/bugs: Switch from 1-element array to flexible array](http://lore.kernel.org/linux-hardening/20230522212949.never.283-kees@kernel.org/)** + +> The testing for ARRAY_BOUNDS just wants an uninstrumented array, +> and the proper flexible array definition is fine for that. +> + +**[v2: md/raid5: Convert stripe_head's "dev" to flexible array member](http://lore.kernel.org/linux-hardening/20230522212114.gonna.589-kees@kernel.org/)** + +> Replace old-style 1-element array of "dev" in struct stripe_head with +> modern C99 flexible array. In the future, we can additionally annotate +> it with the run-time size, found in the "disks" member. +> + +**[v1: overflow: Add struct_size_t() helper](http://lore.kernel.org/linux-hardening/20230522211810.never.421-kees@kernel.org/)** + +> While struct_size() is normally used in situations where the structure +> type already has a pointer instance, there are places where no variable +> is available. In the past, this has been worked around by using a typed +> NULL first argument, but this is a bit ugly. Add a helper to do this, +> and replace the handful of instances of the code pattern with it. +> + +#### Async I/O + +**[v2: io_uring: unlock sqd->lock before sq thread release CPU](http://lore.kernel.org/io-uring/20230525082626.577862-1-wenwen.chen@samsung.com/)** + +> The sq thread actively releases CPU resources by calling the +> cond_resched() and schedule() interfaces when it is idle. Therefore, +> more resources are available for other threads to run. +> + +#### Rust For Linux + +**[v2: scripts: read cfgs from Makefile for rust-analyzer](http://lore.kernel.org/rust-for-linux/20230520231701.46008-1-yakoyoku@gmail.com/)** + +> Both `core` and `alloc` had their `cfgs` missing in `rust-project.json`, +> to remedy this `generate_rust_analyzer.py` scans the Makefile from +> inside the `rust` directory for them to be added to a dictionary that +> each key corresponds to a crate and each value, to an array of `cfgs`.
+> + +#### BPF + +**[v1: bpf-next: bpf: replace open code with for allocated object check](http://lore.kernel.org/bpf/20230527122706.59315-1-danieltimlee@gmail.com/)** + +> From commit 282de143ead9 ("bpf: Introduce allocated objects support"), +> With this allocated object with BPF program, (PTR_TO_BTF_ID | MEM_ALLOC) +> has been a way of indicating to check the type is the allocated object. +> + +**[v1: bpf-next: bpf, vmtest: Build test_progs and friends as statically linked](http://lore.kernel.org/bpf/05b5dd79465be41ff8cf8b56b694118a0aa7ae12.1685140942.git.daniel@iogearbox.net/)** + +> With the specified TRUNNER_LDFLAGS out of vmtest to force static linking +> runners like test_progs/test_maps/etc work just fine. +> + +**[v6: RESEND: libbpf: kprobe.multi: Filter with available_filter_functions](http://lore.kernel.org/bpf/20230526155026.1419390-1-liu.yun@linux.dev/)** + +> When using regular expression matching with "kprobe multi", it scans all +> the functions under "/proc/kallsyms" that can be matched. However, not all +> of them can be traced by kprobe.multi. If any one of the functions fails +> to be traced, it will result in the failure of all functions. The best +> approach is to filter out the functions that cannot be traced to ensure +> proper tracking of the functions. +> + +**[v5: libbpf: kprobe.multi: Filter with available_filter_functions](http://lore.kernel.org/bpf/20230526122053.1373871-1-liu.yun@linux.dev/)** + +> When using regular expression matching with "kprobe multi", it scans all +> the functions under "/proc/kallsyms" that can be matched. However, not all +> of them can be traced by kprobe.multi. If any one of the functions fails +> to be traced, it will result in the failure of all functions. The best +> approach is to filter out the functions that cannot be traced to ensure +> proper tracking of the functions. 
+> + +**[v1: Type aware module allocator](http://lore.kernel.org/bpf/20230526051529.3387103-1-song@kernel.org/)** + +> This set implements the second part of module type aware allocator +> (module_alloc_type), which was discussed in [1]. This part contains the +> interface of the new allocator, as well as changes in x86 code to use the +> new allocator (modules, BPF, ftrace, kprobe). +> + +**[v1: dwarves: pahole: avoid adding same struct structure to two rb trees](http://lore.kernel.org/bpf/20230525235949.2978377-1-eddyz87@gmail.com/)** + +> This commit modifies resort_classes() to re-use 'structures__tree' and +> to reset 'rb_node' fields before adding structure instances to the +> tree for a second time. +> + +**[v1: bpf-next: selftests/bpf: Check whether to run selftest](http://lore.kernel.org/bpf/20230525232248.640465-1-deso@posteo.net/)** + +> The sockopt test invokes test__start_subtest and then unconditionally +> asserts the success. That means that even if deny-listed, any test will +> still run and potentially fail. +> Evaluate the return value of test__start_subtest() to achieve the +> desired behavior, as other tests do. +> + +**[v1: bpf: utilize table ID in bpf_fib_lookup helper](http://lore.kernel.org/bpf/20230505-bpf-add-tbid-fib-lookup-v1-0-fd99f7162e76@gmail.com/)** + +> This patchset adds the ability to specify a table ID to the +> `bpf_fib_lookup` BPF helper. +> +> A new `tbid` field is added to `struct bpf_fib_lookup`. +> When the `bpf_fib_lookup` helper is called with the +> `BPF_FIB_LOOKUP_DIRECT` flag and the `tbid` is set to an integer greater +> than 0, the `tbid` field will be interpreted as the table ID to use for +> the fib lookup. +> + +**[v1: bpf-next: libbpf: add netfilter link attach helper](http://lore.kernel.org/bpf/20230525110100.8212-1-fw@strlen.de/)** + +> When initial netfilter bpf program type support got added one +> suggestion was to extend libbpf with a helper to ease attachment +> of nf programs to the hook locations.
+> +> Add such a helper and a demo test case that attaches a dummy +> program to various combinations. +> + +**[v1: bpf-next: bpf: Export rx queue info for reuseport ebpf prog](http://lore.kernel.org/bpf/20230525033757.47483-1-jdamato@fastly.com/)** + +> BPF_PROG_TYPE_SK_REUSEPORT / sk_reuseport ebpf programs do not have +> access to the queue_mapping or napi_id of the incoming skb. Having +> this information can help ebpf progs determine which listen socket to +> select. +> + +**[v1: bpf-next: libbpf: change var type in datasec resize func](http://lore.kernel.org/bpf/20230525001323.8554-1-inwardvessel@gmail.com/)** + +> This changes a local variable type that stores a new array id to match +> the return type of btf__add_array(). +> + +**[v1: bpf-next: Relax checks for unprivileged bpf() commands](http://lore.kernel.org/bpf/20230524225421.1587859-1-andrii@kernel.org/)** + +> During last relaxation of bpf syscall's capabilities checks ([0]), the model +> of FD-based ownership was established: if process through whatever means got +> FD for some BPF object (map, prog, etc), it should be able to perform +> operations on this object without extra CAP_SYS_ADMIN or CAP_BPF capabilities. +> + +**[v1: bpf-next: Revamp bpf_attr and make it easier to evolve](http://lore.kernel.org/bpf/20230524210243.605832-1-andrii@kernel.org/)** + +> RFC patch set revamping anonymous substructs of union bpf_attr, which would +> allow nicer and more coherent evolution of bpf() syscall arguments, especially +> for commands like BPF_MAP_CREATE and BPF_PROG_LOAD. See patch #1 for +> justification and more details. Patch #2 demonstrates how straightforward it +> is to switch to new-style substructs in kernel code (and keep in mind that +> this is optional until we need some new field for a given command, so we can +> do it completely asynchronously from landing bpf_attr changes themselves).
+> Patch #3 shows also similar libbpf changes, except for libbpf single patches +> switches over entire libbpf code base to new-style substructs (except +> skel_internal.h, due to concerns that users might be reliant on outdated +> system-wide linux/bpf.h UAPI header). +> + +**[v3: bpf-next: libbpf: capability for resizing datasec maps](http://lore.kernel.org/bpf/20230524004537.18614-1-inwardvessel@gmail.com/)** + +> Due to the way the datasec maps like bss, data, rodata are memory +> mapped, they cannot be resized with bpf_map__set_value_size() like +> non-datasec maps can. This series offers a way to allow the resizing of +> datasec maps, by having the mapped regions resized as needed and also +> adjusting associated BTF info if possible. +> + +**[v3: dwarves: Support for new btf_type_tag encoding](http://lore.kernel.org/bpf/20230524001825.2688661-1-eddyz87@gmail.com/)** + +> In recent discussion in BPF mailing list ([1], look for Solution #2) +> participants agreed to add a new DWARF representation for +> "btf_type_tag" annotations. +> +> Existing representation is DW_TAG_LLVM_annotation object attached as a +> child to a DW_TAG_pointer_type. It means that "btf_type_tag" +> annotation is attached to a pointee type. +> + +**[v1: libbpf: kprobe.multi: Filter with blacklist and available_filter_functions](http://lore.kernel.org/bpf/20230523132547.94384-1-liu.yun@linux.dev/)** + +> When using regular expression matching with "kprobe multi", it scans all +> the functions under "/proc/kallsyms" that can be matched. However, not all +> of them can be traced by kprobe.multi. If any one of the functions fails +> to be traced, it will result in the failure of all functions. The best +> approach is to filter out the functions that cannot be traced to ensure +> proper tracking of the functions. 
+> + +**[v1: Bring back vmlinux.h generation](http://lore.kernel.org/bpf/20230522204047.800543-1-irogers@google.com/)** + +> Commit 760ebc45746b ("perf lock contention: Add empty 'struct rq' to +> satisfy libbpf 'runqueue' type verification") inadvertently created a +> declaration of 'struct rq' that conflicted with a generated +> vmlinux.h's: +> +> Fix the issue by moving the declaration to vmlinux.h. So this can't +> happen again, bring back build support for generating vmlinux.h then +> add build tests. +> + +### Related technology updates + +#### Qemu + +**[v2: target/riscv: Add RISC-V Virtual IRQs and IRQ filtering support](http://lore.kernel.org/qemu-devel/20230526162308.22892-1-rkanwal@rivosinc.com/)** + +> This series adds M and HS-mode virtual interrupt and IRQ filtering support. +> This allows inserting virtual interrupts from M/HS-mode into S/VS-mode +> using mvien/hvien and mvip/hvip csrs. IRQ filtering is a use case of +> this change, i.e. M-mode can stop delegating an interrupt to S-mode and +> instead enable it in MIE and receive those interrupts in M-mode and then +> selectively inject the interrupt using mvien and mvip. +> + +**[v5: hw/riscv/virt: pflash improvements](http://lore.kernel.org/qemu-devel/20230526121006.76388-1-sunilvl@ventanamicro.com/)** + +> This series improves the pflash usage in RISC-V virt machine with solutions to +> below issues. +> +> 1) Currently the first pflash is reserved for ROM/M-mode firmware code. But S-mode +> payload firmware like EDK2 needs both pflash devices to have separate code and variable +> store so that OS distros can keep the FW code as read-only. +> + +**[v3: target/riscv: Add support for PC-relative translation](http://lore.kernel.org/qemu-devel/20230526072124.298466-1-liweiwei@iscas.ac.cn/)** + +> This patchset tries to add support for PC-relative translation. +> +> The existence of CF_PCREL can improve performance with the guest +> kernel's address space randomization.
Each guest process maps libc.so +> (et al) at a different virtual address, and this allows those +> translations to be shared. +> + +**[v3: Add RISC-V KVM AIA Support](http://lore.kernel.org/qemu-devel/20230526062509.31682-1-yongxuan.wang@sifive.com/)** + +> This series adds support for KVM AIA in RISC-V architecture. +> +> In order to test these patches, we require Linux with KVM AIA support which can +> be found in the qemu_kvm_aia branch at https://github.com/yong-xuan/linux.git +> This kernel branch is based on the riscv_aia_v1 branch available at +> https://github.com/avpatel/linux.git, and it also includes two additional +> patches that fix a KVM AIA bug and reply to the query of KVM_CAP_IRQCHIP. +> + +**[v3: hw/riscv: virt: Assume M-mode FW in pflash0 only when "-bios none"](http://lore.kernel.org/qemu-devel/20230523102805.100160-1-sunilvl@ventanamicro.com/)** + +> Currently, virt machine supports two pflash instances each with +> 32MB size. However, the first pflash is always assumed to +> contain M-mode firmware and reset vector is set to this if +> enabled. Hence, for S-mode payloads like EDK2, only one pflash +> instance is available for use. This means both code and NV variables +> of EDK2 will need to use the same pflash. +> + +**[v3: target/riscv: Add Smrnmi support.](http://lore.kernel.org/qemu-devel/20230522131123.3498539-1-tommy.wu@sifive.com/)** + +> This patchset added support for Smrnmi Extension in RISC-V. +> +> RNMI also has higher priority than any other interrupts or exceptions +> and cannot be disabled by software. +> +> RNMI may be used to route to other devices such as Bus Error Unit or +> Watchdog Timer in the future. +> + +#### U-Boot + +**[v1: riscv: Initial support for Lichee PI 4A board](http://lore.kernel.org/u-boot/20230526124107.894-1-dlan@gentoo.org/)** + +> Sipeed's Lichee PI 4A board is based on T-HEAD's TH1520 SoC which consists of +> quad core XuanTie C910 CPU, plus one C906 CPU and one E902 CPU. 
+> +> In this series, the UART, basic device tree, CPU, PLIC are enabled, making it +> capable of running in serial console mode. +> + +**[v4: Add ethernet driver for StarFive JH7110 SoC](http://lore.kernel.org/u-boot/20230525093637.31364-1-yanhong.wang@starfivetech.com/)** + +> This series of patches is based on the latest branch/master, and +> adds ethernet support for the StarFive JH7110 RISC-V SoC. +> The series includes EEPROM, PHY and MAC drivers. The PHY model is +> YT8531 (from Motorcomm Inc), and the MAC version is dwmac-5.20 +> (from Synopsys DesignWare). +> + +**[v1: arch: riscv: jh7110: Correctly zero L2 LIM](http://lore.kernel.org/u-boot/1684668616-358043-1-git-send-email-ganboing@gmail.com/)** + +> Background information: +> JH7110 SPL runs in L2 LIM (2M in size mapped at 0x8000000). It +> consists of 16 0x20000 sized regions, each one can be used as +> either L2 cache way or SRAM (not both). From top to bottom, there are +> ways 0-15. The way 0 is always enabled, at most 0x1e0000 can be used. +> + +## 20230521: Issue 46 + +### Kernel updates + +#### RISC-V architecture support + +**[v1: tools/nolibc: autodetect stackprotector availability from compiler](http://lore.kernel.org/linux-riscv/20230521-nolibc-automatic-stack-protector-v1-0-dad6c80c51c1@weissschuh.net/)** + +> As suggested by Willy it is possible to detect the availability of +> stackprotector via preprocessor defines. +> Make use of that to simplify the code and interface of nolibc. +> + +**[v1: RISC-V: KVM: Redirect AMO load/store misaligned traps to guest](http://lore.kernel.org/linux-riscv/20230520150116.7451-1-waylingII@gmail.com/)** + +> The M-mode redirects an unhandled misaligned trap back +> to S-mode when not delegating it to VS-mode(hedeleg). +> However, KVM running in HS-mode terminates the VS-mode +> software when back from M-mode. +> The KVM should redirect the trap back to VS-mode, and +> let VS-mode trap handler decide the next step.
+> Here is a way to handle misaligned traps in KVM, +> not only directing them to VS-mode or terminate it. +> + +**[v1: perf parse-regs: Refactor arch related functions](http://lore.kernel.org/linux-riscv/20230520025537.1811986-1-leo.yan@linaro.org/)** + +> The register parsing have two levels: one level is under 'arch' folder, +> another level is under 'util' folder. A good design is 'arch' folder +> handles architecture specific operations and provides APIs for upper +> layer, on the other hand, 'util' folder should be general and simply +> calls APIs to talk to arch layer. +> + +**[v1: riscv: hibernation: Replace jalr with jr before suspend_restore_regs](http://lore.kernel.org/linux-riscv/20230519060854.214138-1-suagrfillet@gmail.com/)** + +> No need to link the x1/ra reg via jalr before suspend_restore_regs +> So it's better to replace jalr with jr. +> + +**[v2: Add Sipeed Lichee Pi 4A RISC-V board support](http://lore.kernel.org/linux-riscv/20230518184541.2627-1-jszhang@kernel.org/)** + +> Sipeed's Lichee Pi 4A development board uses Lichee Module 4A core +> module which is powered by T-HEAD's TH1520 SoC. Add minimal device +> tree files for the core module and the development board. +> + +**[v1: riscv: Allow disable vdso support](http://lore.kernel.org/linux-riscv/cover.1684430522.git.falcon@tinylab.org/)** + +> This is part of my tinylinux work for RISC-V, see related patchsets: +> +> * RISC-V: Enable dead code elimination, v3 [1] +> * tools/nolibc: riscv: Fix up compile error for rv32, v1 [2] +> * Add dead syscalls elimination support, RFC [3] +> + +**[v20: -next: riscv: Add vector ISA support](http://lore.kernel.org/linux-riscv/20230518161949.11203-1-andy.chiu@sifive.com/)** + +> This patchset is implemented based on vector 1.0 spec to add vector support +> in riscv Linux kernel. There are some assumptions for this implementations. 
+> + +**[v4: riscv: add Bouffalolab bl808 support](http://lore.kernel.org/linux-riscv/20230518152244.2178-1-jszhang@kernel.org/)** + +> This series adds Bouffalolab uart driver and basic devicetrees for +> Bouffalolab bl808 SoC and Sipeed M1s dock board. +> + +**[v1: riscv: s64ilp32: Running 32-bit Linux kernel on 64-bit supervisor mode](http://lore.kernel.org/linux-riscv/20230518131013.3366406-1-guoren@kernel.org/)** + +> This patch series adds s64ilp32 support to riscv. The term s64ilp32 +> means smode-xlen=64 and -mabi=ilp32 (ints, longs, and pointers are all +> 32-bit), i.e., running 32-bit Linux kernel on pure 64-bit supervisor +> mode. There have been many 64ilp32 abis existing, such as mips-n32 [1], +> arm-aarch64ilp32 [2], and x86-x32 [3], but they are all about userspace. +> Thus, this should be the first time running a 32-bit Linux kernel with +> the 64ilp32 ABI at supervisor mode (If not, correct me). +> + +**[v18: Microchip Soft IP corePWM driver](http://lore.kernel.org/linux-riscv/20230518-reactive-nursing-23b7fe093048@wendy/)** + +> Another version, although a lot smaller of a range-diff than previously! +> All you get this time is the one change requested by Uwe on v17, along +> with a rebase on -rc1. +> + +**[v6: Add JH7110 USB and USB PHY driver support](http://lore.kernel.org/linux-riscv/20230518112750.57924-1-minda.chen@starfivetech.com/)** + +> This patchset adds USB driver and USB PHY for the StarFive JH7110 SoC. +> USB work mode is peripheral and using USB 2.0 PHY in VisionFive 2 board. +> The patch has been tested on the VisionFive 2 board. 
+> + +**[v6: Add STG/ISP/VOUT clock and reset drivers for StarFive JH7110](http://lore.kernel.org/linux-riscv/20230518101234.143748-1-xingyu.wu@starfivetech.com/)** + +> This patch series are base on the basic JH7110 SYSCRG/AONCRG +> drivers and add new partial clock drivers and reset supports +> about System-Top-Group(STG), Image-Signal-Process(ISP) +> and Video-Output(VOUT) for the StarFive JH7110 RISC-V SoC. These +> clocks and resets could be used by DMA, VIN and Display modules. +> + +**[v1: dt-bindings: riscv: deprecate riscv,isa](http://lore.kernel.org/linux-riscv/20230518-thermos-sanitary-cf3fbc777ea1@wendy/)** + +> When the RISC-V dt-bindings were accepted upstream in Linux, the base +> ISA etc had yet to be ratified. By the ratification of the base ISA, +> incompatible changes had snuck into the specifications - for example the +> Zicsr and Zifencei extensions were spun out of the base ISA. +> + +**[v1: RISC-V KVM in-kernel AIA irqchip](http://lore.kernel.org/linux-riscv/20230517105135.1871868-1-apatel@ventanamicro.com/)** + +> This series adds in-kernel AIA irqchip which only trap-n-emulate IMSIC and +> APLIC MSI-mode for Guest. The APLIC MSI-mode trap-n-emulate is optional so +> KVM user space can emulate APLIC entirely in user space. +> + +**[v3: RISC-V: Enable dead code elimination](http://lore.kernel.org/linux-riscv/20230517082936.37563-1-falcon@tinylab.org/)** + +> Select CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION for RISC-V, allowing +> the user to enable dead code elimination. In order for this to work, +> ensure that we keep the alternative table by annotating them with KEEP. +> + +**[v3: perf vendor events riscv: add T-HEAD C9xx JSON file](http://lore.kernel.org/linux-riscv/IA1PR20MB4953B6C4CB711506CF542737BB7E9@IA1PR20MB4953.namprd20.prod.outlook.com/)** + +> These events are the max that c9xx series support.
+> Since T-HEAD let manufacturers decide whether events are usable, +> the final support of the perf events is determined by the pmu node +> of the soc dtb. +> + +**[v1: irq_work: consolidate arch_irq_work_raise prototypes](http://lore.kernel.org/linux-riscv/20230516200341.553413-1-arnd@kernel.org/)** + +> The prototype was hidden on x86, which causes a warning: +> +> kernel/irq_work.c:72:13: error: no previous prototype for 'arch_irq_work_raise' [-Werror=missing-prototypes] +> +> Fix this by providing it in only one place that is always visible. +> + +**[v1: perf: add T-HEAD C9xx series cpu support](http://lore.kernel.org/linux-riscv/IA1PR20MB49539201E93DE46A9A2A8E74BB799@IA1PR20MB4953.namprd20.prod.outlook.com/)** + +> The T-HEAD C9xx series cpu is a series of riscv CPU IP. As this IP was +> proposed before the current riscv event standard. It has a non-standard +> events encoding for perf events and unimplemented MARCH and MIMP CSR. +> This patch add these events to support C9xx cpus. +> + +#### Process Scheduling + +**[v1: RESEND: sched/nohz: Add HRTICK_BW for using cfs bandwidth with nohz_full](http://lore.kernel.org/lkml/20230518132038.3534728-1-pauld@redhat.com/)** + +> CFS bandwidth limits and NOHZ full don't play well together. Tasks +> can easily run well past their quotas before a remote tick does +> accounting. This leads to long, multi-period stalls before such +> tasks can run again. Use the hrtick mechanism to set a sched +> tick to fire at remaining_runtime in the future if we are on +> a nohz full cpu, if the task has quota and if we are likely to +> disable the tick (nr_running == 1). This allows for bandwidth +> accounting before tasks go too far over quota. +> + +**[v1: sched: core: Simplify cpuset_cpumask_can_shrink()](http://lore.kernel.org/lkml/20230518203416.3323-1-zeming@nfschina.com/)** + +> Remove useless intermediate variable "ret" and its initialization. +> Directly return dl_cpuset_cpumask_can_shrink() result.
+> + +**[v1: sched/rt: Print curr when RT throttling activated](http://lore.kernel.org/lkml/20230516122202.954313-1-alex@shruggie.ro/)** + +> We may meet the issue, that one RT thread occupied the cpu by 950ms/1s, +> The RT thread maybe is a business thread or other unknown thread. +> +> Currently, it only outputs the print "sched: RT throttling activated" +> when RT throttling happen. It is hard to know what is the RT thread, +> For further analysis, we need add more prints. +> + +**[v1: sched/fair: Introduce SIS_PAIR to wakeup task on local idle core first](http://lore.kernel.org/lkml/20230516011159.4552-1-yu.c.chen@intel.com/)** + +> The will-it-scale context_switch1 test case exposes the issue. The +> test platform has 2 x 56C/112T and 224 CPUs in total. To evaluate the +> C2C overhead within 1 LLC, will-it-scale was tested with 1 socket/node +> online, so there are 56C/112T CPUs when running will-it-scale. +> + +**[v3: sched: Consider CPU contention in frequency, EAS max util & load-balance busiest CPU selection](http://lore.kernel.org/lkml/20230515115735.296329-1-dietmar.eggemann@arm.com/)** + +> This is the implementation of the idea to factor in CPU runnable_avg +> into the CPU utilization getter functions (so called 'runnable +> boosting') as a way to consider CPU contention for: +> +> (a) CPU frequency +> (b) EAS' max util and +> (c) 'migrate_util' type load-balance busiest CPU selection. +> + +**[v1: sched/fair: Consider asymmetric scheduler groups in load balancer](http://lore.kernel.org/lkml/20230515114601.12737-1-huschle@linux.ibm.com/)** + +> The current load balancer implementation implies that scheduler groups, +> within the same scheduler domain, all host the same number of CPUs. +> +> This appears to be valid for non-s390 architectures. Nevertheless, s390 +> can actually have scheduler groups of unequal size. 
+> The current scheduler behavior causes some s390 configs to use SMT +> while some cores are still idle, leading to a performance degradation +> under certain levels of workload. +> + +**[GIT PULL: sched/urgent for v6.4-rc2](http://lore.kernel.org/lkml/20230514115312.GDZGDLqDPvR+M8m+1M@fat_crate.local/)** + +> please pull an urgent (oh well :)) sched fix for 6.4. +> +> Thx. +> + +#### Memory Management + +**[v21: splice: Kill ITER_PIPE](http://lore.kernel.org/linux-mm/20230520000049.2226926-1-dhowells@redhat.com/)** + +> I've split off splice patchset and moved the block patches to a separate +> branch (though they are dependent on this one). +> +> This patchset kills off ITER_PIPE to avoid a race between truncate, +> iov_iter_revert() on the pipe and an as-yet incomplete DMA to a bio with +> unpinned/unref'ed pages from an O_DIRECT splice read. This causes memory +> corruption[2]. Instead, we use filemap_splice_read(), which invokes the +> buffered file reading code and splices from the pagecache into the pipe; +> copy_splice_read(), which bulk-allocates a buffer, reads into it and then +> pushes the filled pages into the pipe; or handle it in filesystem-specific +> code. +> + +**[v2: change ->index to PAGE_SIZE for hugetlb pages](http://lore.kernel.org/linux-mm/20230519220142.212051-1-sidhartha.kumar@oracle.com/)** + +> This patchset adds new wrappers for hugetlb code to interact with the +> page cache. These wrappers calculate a linear page index as this is now +> what the page cache expects for hugetlb pages as well. +> + +**[v2: Optimize mremap during mutual alignment within PMD](http://lore.kernel.org/linux-mm/20230519190934.339332-1-joel@joelfernandes.org/)** + +> Here is v2 of the mremap start address optimization / fix for exec warning. +> +> 2. Fix issue with bogus return value found by Linus if we broke out of the +> above loop for the first PMD itself.
+> + +**[v1: mm: compaction: avoid GFP_NOFS ABBA deadlock](http://lore.kernel.org/linux-mm/20230519111359.40475-1-hannes@cmpxchg.org/)** + +> During stress testing with higher-order allocations, a deadlock +> scenario was observed in compaction: One GFP_NOFS allocation was +> sleeping on mm/compaction.c::too_many_isolated(), while all CPUs in +> the system were busy with compactors spinning on buffer locks held by +> the sleeping GFP_NOFS allocation. +> + +**[v4: memblock: Add flags and nid info in memblock debugfs](http://lore.kernel.org/linux-mm/20230519105321.333-1-ssawgyw@gmail.com/)** + +> Currently, the memblock debugfs can display the count of memblock_type and +> the base and end of the reg. However, when memblock_mark_*() or +> memblock_set_node() is executed on some range, the information in the +> existing debugfs cannot make it clear why the address is not consecutive. +> + +**[v1: mm,page_owner: mark page_owner_threshold helpers as static](http://lore.kernel.org/linux-mm/20230519092800.3772196-1-arnd@kernel.org/)** + +> The newly added functions have no prototype: +> +> mm/page_owner.c:748:5: error: no previous prototype for 'page_owner_threshold_get' [-Werror=missing-prototypes] +> mm/page_owner.c:754:5: error: no previous prototype for 'page_owner_threshold_set' [-Werror=missing-prototypes] +> + +**[v1: iov_iter: Add automatic-alloc for ITER_BVEC and use in direct_splice_read()](http://lore.kernel.org/linux-mm/1740264.1684482558@warthog.procyon.org.uk/)** + +> If it's a problem that direct_splice_read() always allocates as much memory as +> is asked for and that will fit into the pipe when less could be allocated in +> the case that, say, an O_DIRECT-read will hit a hole and do a short read or a +> socket will return less than was asked for, something like the attached +> modification to ITER_BVEC could be made. 
+> + +**[v4: mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8](http://lore.kernel.org/linux-mm/20230518173403.1150549-1-catalin.marinas@arm.com/)** + +> That's the fourth version of the series reducing the kmalloc() minimum +> alignment on arm64 to 8 (from 128). +> +> The first 10 patches decouple ARCH_KMALLOC_MINALIGN from +> ARCH_DMA_MINALIGN and, for arm64, it limits the kmalloc() caches to +> those aligned to the run-time probed cache_line_size(). The advantage on +> arm64 is that we gain the kmalloc-{64,192} caches. +> + +**[v1: mm: page_alloc: set sysctl_lowmem_reserve_ratio storage-class-specifier to static](http://lore.kernel.org/linux-mm/20230518141119.927074-1-trix@redhat.com/)** + +> smatch reports +> mm/page_alloc.c:247:5: warning: symbol +> 'sysctl_lowmem_reserve_ratio' was not declared. Should it be static? +> +> This variable is only used in its defining file, so it should be static +> + +**[v1: mm/page_owner: set page_owner_* storage-class-specifier to static](http://lore.kernel.org/linux-mm/20230518134718.926663-1-trix@redhat.com/)** + +> smatch reports +> mm/page_owner.c:739:30: warning: symbol +> 'page_owner_stack_operations' was not declared. Should it be static? +> mm/page_owner.c:748:5: warning: symbol +> 'page_owner_threshold_get' was not declared. Should it be static? +> mm/page_owner.c:754:5: warning: symbol +> 'page_owner_threshold_set' was not declared. Should it be static? +> + +**[v9: net-next: splice, net: Replace sendpage with sendmsg(MSG_SPLICE_PAGES), part 1](http://lore.kernel.org/linux-mm/20230518130713.1515729-1-dhowells@redhat.com/)** + +> Here's the first tranche of patches towards providing a MSG_SPLICE_PAGES +> internal sendmsg flag that is intended to replace the ->sendpage() op with +> calls to sendmsg(). MSG_SPLICE_PAGES is a hint that tells the protocol +> that it should splice the pages supplied if it can and copy them if not. 
+> + +#### File Systems + +**[v1: Create large folios in iomap buffered write path](http://lore.kernel.org/linux-fsdevel/20230520163603.1794256-1-willy@infradead.org/)** + +> Wang Yugui has a workload which would be improved by using large folios. +> Until now, we've only created large folios in the readahead path, +> but this workload writes without reading. The decision of what size +> folio to create is based purely on the size of the write() call (unlike +> readahead where we keep history and can choose to create larger folios +> based on that history even if individual reads are small). +> + +**[v1: cachefiles: Allow the cache to be non-root](http://lore.kernel.org/linux-fsdevel/1853230.1684516880@warthog.procyon.org.uk/)** + +> Set mode 0600 on files in the cache so that cachefilesd can run as an +> unprivileged user rather than leaving the files all with 0. Directories +> are already set to 0700. +> + +**[v2: bpf-next: Add O_PATH-based BPF_OBJ_PIN and BPF_OBJ_GET support](http://lore.kernel.org/linux-fsdevel/20230518215444.1418789-1-andrii@kernel.org/)** + +> Add ability to specify pinning location within BPF FS using O_PATH-based FDs, +> similar to openat() family of APIs. Patch #1 adds necessary kernel-side +> changes. Patch #2 exposes this through libbpf APIs. Patch #3 uses new mount +> APIs (fsopen, fsconfig, fsmount) to demonstrate how now it's possible to work +> with detach-mounted BPF FS using new BPF_OBJ_PIN and BPF_OBJ_GET +> functionality. +> + +**[v2: Documentation: add initial iomap kdoc](http://lore.kernel.org/linux-fsdevel/20230518150105.3160445-1-mcgrof@kernel.org/)** + +> To help with iomap adoption / porting I set out the goal to try to +> help improve the iomap documentation and get general guidance for +> filesystem conversions over from buffer-head in time for this year's +> LSFMM. The end results thanks to the review of Darrick, Christoph and +> others is on the kernelnewbies wiki [0].
+> + +**[v1: squashfs: don't include buffer_head.h](http://lore.kernel.org/linux-fsdevel/20230517071622.245151-1-hch@lst.de/)** + +> Squashfs has stopped using buffer heads in 93e72b3c612adcaca1 +> ("squashfs: migrate from ll_rw_block usage to BIO"). +> + +**[v1: gfs2/buffer folio changes](http://lore.kernel.org/linux-fsdevel/20230517032442.1135379-1-willy@infradead.org/)** + +> This kind of started off as a gfs2 patch series, then became entwined +> with buffer heads once I realised that gfs2 was the only remaining +> caller of __block_write_full_page(). For those not in the gfs2 world, +> the big point of this series is that block_write_full_page() should now +> handle large folios correctly. +> + +**[v4: memcontrol: support cgroup level OOM protection](http://lore.kernel.org/linux-fsdevel/20230517032032.76334-1-chengkaitao@didiglobal.com/)** + +> Establish a new OOM score algorithm, supports the cgroup level OOM +> protection mechanism. When a global/memcg oom event occurs, we treat +> all processes in the cgroup as a whole, and OOM killers need to select +> the process to kill based on the protection quota of the cgroup. +> + +**[v1: ACPI: APEI: EINJ: Add support for vendor defined error types](http://lore.kernel.org/linux-fsdevel/d10df9d4-8cc7-b6f0-4096-cd0805407744@amd.com/)** + +> Noted. The only checkpatch warning that was ignored was pertaining +> to the usage of S_IWUSR macro with debugfs_create_blob. Had noticed that a +> majority of einj module's debugfs files have been created with S_IRUSR and +> S_IWUSR macros. So used them to maintain uniformity. +> Will switch to octal permissions though. +> + +**[v1: procfs: consolidate arch_report_meminfo declaration](http://lore.kernel.org/linux-fsdevel/20230516195834.551901-1-arnd@kernel.org/)** + +> The arch_report_meminfo() function is provided by four architectures, +> with a __weak fallback in procfs itself.
On architectures that don't +> have a custom version, the __weak version causes a warning because +> of the missing prototype. +> + +**[v1: radix-tree: move declarations to header](http://lore.kernel.org/linux-fsdevel/20230516194212.548910-1-arnd@kernel.org/)** + +> The xarray.c file contains the only call to radix_tree_node_rcu_free(), +> and it comes with its own extern declaration for it. This means the +> function definition causes a missing-prototype warning: +> +> lib/radix-tree.c:288:6: error: no previous prototype for 'radix_tree_node_rcu_free' [-Werror=missing-prototypes] +> + +#### Network Devices + +**[v5: iproute2-next: ip-link: add support for nolocalbypass in vxlan](http://lore.kernel.org/netdev/20230521054948.22753-1-vladimir@nikishkin.pw/)** + +> Add userspace support for the [no]localbypass vxlan netlink +> attribute. With localbypass on (default), the vxlan driver processes +> the packets destined to the local machine by itself, bypassing the +> userspace network stack. With nolocalbypass the packets are always +> forwarded to the userspace network stack, so userspace programs, +> such as tcpdump have a chance to process them. +> + +**[v1: net-next: nfc: Switch i2c drivers back to use .probe()](http://lore.kernel.org/netdev/20230520172104.359597-1-u.kleine-koenig@pengutronix.de/)** + +> After commit b8a1a4cd5a98 ("i2c: Provide a temporary .probe_new() +> call-back type"), all drivers being converted to .probe_new() and then +> convert back to (the new) .probe() to be able to eventually drop +> .probe_new() from struct i2c_driver. +> + +**[v1: net-next: net: phylink: require supported_interfaces to be filled](http://lore.kernel.org/netdev/E1q0K1u-006EIP-ET@rmk-PC.armlinux.org.uk/)** + +> We have been requiring the supported_interfaces bitmap to be filled in +> by MAC drivers that have a mac_select_pcs() method. Now that all MAC +> drivers fill in the supported_interfaces bitmap, it is time to enforce +> this.
We have already required supported_interfaces to be set in order +> for optical SFPs to be configured in commit f81fa96d8a6c ("net: phylink: +> use phy_interface_t bitmaps for optical modules"). +> + +**[v1: net-next: net: sfp: add support for a couple of copper multi-rate modules](http://lore.kernel.org/netdev/E1q0JfS-006Dqc-8t@rmk-PC.armlinux.org.uk/)** + +> Add support for the Fiberstore SFP-10G-T and Walsun HXSX-ATRC-1 +> modules. Internally, the PCB silkscreen has what seems to be a part +> number of WT_502. Fiberstore use v2.2 whereas Walsun use v2.6. +> + +**[v1: net: macb: use correct __be32 and __be16 types](http://lore.kernel.org/netdev/20230519221942.53942-1-minhuadotchen@gmail.com/)** + +> This patch fixes the following sparse warnings. No functional changes. +> +> Use cpu_to_be16() and cpu_to_be32() to convert constants before comparing +> them with __be16 type of psrc/pdst and __be32 type of ip4src/ip4dst. +> Apply be16_to_cpu() in GEM_BFINS(). +> + +**[v7: virtio: pds_vdpa driver](http://lore.kernel.org/netdev/20230519215632.12343-1-shannon.nelson@amd.com/)** + +> This patchset implements a new module for the AMD/Pensando DSC that +> supports vDPA services on PDS Core VF devices. This code is based on +> and depends on include files from the pds_core driver described here[0]. +> The pds_core driver creates the auxiliary_bus devices that this module +> connects to, and this creates vdpa devices for use by the vdpa module. +> + +**[v2: can: esd_usb: More preparation before supporting esd CAN-USB/3](http://lore.kernel.org/netdev/20230519195600.420644-1-frank.jungclaus@esd.eu/)** + +> Apply another small batch of patches as preparation for adding support +> of the newly available esd CAN-USB/3 to esd_usb.c. 
+> + +**[v1: net-next: net/mlx5: Introduce SF direction](http://lore.kernel.org/netdev/20230519183044.19065-1-saeed@kernel.org/)** + +> Whenever multiple Virtual Network functions (VNFs) are used by Service +> Function Chaining (SFC), each packet is passing through all the VNFs, +> and each VNF is performing hairpin in order to pass the packet to the +> next function in the chain. +> + +**[v1: net: rtnetlink: not allow dev gro_max_size to exceed GRO_MAX_SIZE](http://lore.kernel.org/netdev/25a7b1b138e5ad3c926afce8cd4e08d8b7ef3af6.1684516568.git.lucien.xin@gmail.com/)** + +> In commit 0fe79f28bfaf ("net: allow gro_max_size to exceed 65536"), +> it limited GRO_MAX_SIZE to (8 * 65535) to avoid overflows, but also +> deleted the check of GRO_MAX_SIZE when setting the dev gro_max_size. +> + +**[v1: net-next: i40e: add PHY debug register dump](http://lore.kernel.org/netdev/20230519170208.2820484-1-anthony.l.nguyen@intel.com/)** + +> Implement ethtool register dump for some PHY registers in order to +> assist field debugging of link issues. +> + +**[v1: net-next:pull request: ice: allow matching on meta data](http://lore.kernel.org/netdev/20230519170018.2820322-1-anthony.l.nguyen@intel.com/)** + +> This patchset is intended to improve the usability of the switchdev +> slow path. Without matching on a meta data values slow path works +> based on VF's MAC addresses. It causes a problem when the VF wants +> to use more than one MAC address (e.g. when it is in trusted mode). +> + +**[v2: net-next: net: dsa: mv88e6xxx: add 88E6361 support](http://lore.kernel.org/netdev/20230519141303.245235-1-alexis.lothore@bootlin.com/)** + +> This series brings initial support for Marvell 88E6361 switch. +> +> MV88E6361 is a 8 ports switch with 5 integrated Gigabit PHYs and 3 +> 2.5Gigabit SerDes interfaces. 
It is in fact a new variant in the +> - port 0: MII, RMII, RGMII, 1000BaseX, 2500BaseX +> - port 3 to 7: triple speed internal phys +> - port 9 and 10: 1000BaseX, 2500BaseX +> + +**[v1: net-next: TCP splice improvements](http://lore.kernel.org/netdev/cover.1684501922.git.asml.silence@gmail.com/)** + +> The main part is in Patch 1, which optimises locking for successful +> blocking TCP splice read, following with a clean up in Patch 2. +> + +**[v1: net-next: net/tcp: refactor tcp_inet6_sk()](http://lore.kernel.org/netdev/16be6307909b25852744a67b2caf570efbb83c7f.1684502478.git.asml.silence@gmail.com/)** + +> Don't keep hand coded offset calculations and replace it with +> container_of(). It should be type safer and a bit less confusing. +> +> It also makes it with a macro instead of inline function to preserve +> constness, which was previously casted out like in case of +> tcp_v6_send_synack(). +> + +**[v1: net-next: net: phy: add helpers for comparing phy IDs](http://lore.kernel.org/netdev/E1pzzm3-006BZJ-Bi@rmk-PC.armlinux.org.uk/)** + +> There are several places which open code comparing PHY IDs. Provide a +> couple of helpers to assist with this, using a slightly simpler test +> than the original: +> +> - phy_id_compare() compares two arbitrary PHY IDs and a mask of the +> significant bits in the ID. +> - phydev_id_compare() compares the bound phydev with the specified +> PHY ID, using the bound driver's mask.
+> + +**[v4: net-next: Fine-Tune Flow Control and Speed Configurations in Microchip KSZ8xxx DSA Driver](http://lore.kernel.org/netdev/20230519124700.635041-1-o.rempel@pengutronix.de/)** + +> change v4: +> - instead of downstream/upstream use CPU-port and PHY-port +> - adjust comments +> - minor fixes +> + +**[v3: net: stmmac: compare p->des0 and p->des1 with __le32 type values](http://lore.kernel.org/netdev/20230519115030.74493-1-minhuadotchen@gmail.com/)** + +> Use cpu_to_le32 to convert the constants to __le32 type +> before comparing them with p->des0 and p->des1 (they are __le32 type) +> and to fix following sparse warnings: +> +> drivers/net/ethernet/stmicro/stmmac/dwxgmac2_descs.c:110:23: sparse: warning: restricted __le32 degrades to integer +> drivers/net/ethernet/stmicro/stmmac/dwxgmac2_descs.c:110:50: sparse: warning: restricted __le32 degrades to integer +> + +**[v1: [net-next] net: ipconfig: move ic_nameservers_fallback into #ifdef block](http://lore.kernel.org/netdev/20230519093250.4011881-1-arnd@kernel.org/)** + +> The new variable is only used when IPCONFIG_BOOTP is defined and otherwise +> causes a warning: +> +> net/ipv4/ipconfig.c:177:12: error: 'ic_nameservers_fallback' defined but not used [-Werror=unused-variable] +> +> Move it next to the user. +> + +**[v2: net-next: net: fec: turn on XDP features](http://lore.kernel.org/netdev/20230519014825.1659331-1-wei.fang@nxp.com/)** + +> The XDP features are supported since the commit 66c0e13ad236 +> ("drivers: net: turn on XDP features"). Currently, the fec +> driver supports NETDEV_XDP_ACT_BASIC, NETDEV_XDP_ACT_REDIRECT +> and NETDEV_XDP_ACT_NDO_XMIT. So turn on these XDP features +> for fec driver. 
+> + +**[v1: net: stmmac: use le32_to_cpu for p->des0 and p->des1](http://lore.kernel.org/netdev/20230519002522.3648-1-minhuadotchen@gmail.com/)** + +> Use le32_to_cpu for p->des0 and p->des1 to fix the +> following sparse warnings: +> +> drivers/net/ethernet/stmicro/stmmac/dwxgmac2_descs.c:110:23: sparse: warning: restricted __le32 degrades to integer +> drivers/net/ethernet/stmicro/stmmac/dwxgmac2_descs.c:110:50: sparse: warning: restricted __le32 degrades to integer +> + +**[v13: io_uring: add napi busy polling support](http://lore.kernel.org/netdev/20230518211751.3492982-1-shr@devkernel.io/)** + +> This adds the napi busy polling support in io_uring.c. It adds a new +> napi_list to the io_ring_ctx structure. This list contains the list of +> napi_id's that are currently enabled for busy polling. This list is +> used to determine which napi id's enabled busy polling. For faster +> access it also adds a hash table. +> + +**[v6: Enable multiple MCAN on AM62x](http://lore.kernel.org/netdev/20230518193613.15185-1-jm@ti.com/)** + +> On AM62x there are two MCANs in MCU domain. The MCANs in MCU domain +> were not enabled since there is no hardware interrupt routed to A53 +> GIC interrupt controller. Therefore A53 Linux cannot be interrupted +> by MCU MCANs. +> + +**[v1: bpf-next: xsk: multi-buffer support](http://lore.kernel.org/netdev/20230518180545.159100-1-maciej.fijalkowski@intel.com/)** + +> This series of patches add multi-buffer support for AF_XDP. XDP and +> various NIC drivers already have support for multi-buffer packets. With +> this patch set, programs using AF_XDP sockets can now also receive and +> transmit multi-buffer packets both in copy as well as zero-copy mode. +> ZC multi-buffer implementation is based on ice driver. +> + +**[v1: nf: netfilter: ipset: Add schedule point in call_ad().](http://lore.kernel.org/netdev/20230518173300.34531-1-kuniyu@amazon.com/)** + +> syzkaller found a repro that causes Hung Task [0] with ipset. 
The repro +> first creates an ipset and then tries to delete a large number of IPs +> from the ipset concurrently: +> +> IPSET_ATTR_IPADDR_IPV4: 172.20.20.187 +> IPSET_ATTR_CIDR: 2 +> + +**[v3: net: fec: add dma_wmb](http://lore.kernel.org/netdev/20230518150202.1920375-1-shenwei.wang@nxp.com/)** + +> Two dma_wmb() are added in the XDP TX path to ensure proper ordering of +> descriptor and buffer updates: +> 1. A dma_wmb() is added after updating the last BD to make sure +> the updates to rest of the descriptor are visible before +> transferring ownership to FEC. +> 2. A dma_wmb() is also added after updating the bdp to ensure these +> updates are visible before updating txq->bd.cur. +> 3. Start the xmit of the frame immediately right after configuring the +> tx descriptor. +> + +**[v1: bpf: Use call_rcu_hurry() with synchronize_rcu_mult()](http://lore.kernel.org/netdev/358bde93-4933-4305-ac42-4d6f10c97c08@paulmck-laptop/)** + +> The bpf_struct_ops_map_free() function must wait for both an RCU grace +> period and an RCU Tasks grace period, and so it passes call_rcu() and +> call_rcu_tasks() to synchronize_rcu_mult(). This works, but on ChromeOS +> and Android platforms call_rcu() can have lazy semantics, resulting in +> multi-second delays between call_rcu() invocation and invocation of the +> corresponding callback. +> + +**[GIT PULL: Networking for 6.4-rc3](http://lore.kernel.org/netdev/20230518132554.41223-1-pabeni@redhat.com/)** + +> The following changes since commit 6e27831b91a0bc572902eb065b374991c1ef452a: +> +> Merge tag 'net-6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net (2023-05-11 08:42:47 -0500) +> + +#### Security Hardening + +**[v1: Memory Mapping (VMA) protection using PKU - set 1](http://lore.kernel.org/linux-hardening/20230519011915.846407-1-jeffxu@chromium.org/)** + +> We're using PKU for in-process isolation to enforce control-flow integrity +> for a JIT compiler.
In our threat model, an attacker exploits a +> vulnerability and has arbitrary read/write access to the whole process +> space concurrently to other threads being executed. This attacker can +> manipulate some arguments to syscalls from some threads. +> + +**[v1: next: ALSA: mixart: Replace one-element arrays with simple object declarations](http://lore.kernel.org/linux-hardening/ZGVlcpuvx1rSOMP8@work/)** + +> One-element arrays are deprecated, and we are replacing them with flexible +> array members, instead. However, in this case it seems those one-element +> arrays have never actually been used as fake flexible arrays. +> + +**[v1: md/raid5: Convert stripe_head's "dev" to flexible array member](http://lore.kernel.org/linux-hardening/20230517233313.never.130-kees@kernel.org/)** + +> Replace old-style 1-element array of "dev" in struct stripe_head with +> modern C99 flexible array. In the future, we can additionally annotate +> it with the run-time size, found in the "disks" member. +> + +**[v1: kbuild: Enable -fstrict-flex-arrays=3](http://lore.kernel.org/linux-hardening/20230517232801.never.262-kees@kernel.org/)** + +> The -fstrict-flex-arrays=3 option is now available with the release +> of GCC 13[1] and Clang 16[2]. This feature instructs the compiler to +> treat only C99 flexible arrays as dynamically sized for the purposes of +> object size calculations. In other words, the ancient practice of using +> 1-element arrays, or the GNU extension of using 0-sized arrays, as a +> dynamically sized array is disabled. This allows CONFIG_UBSAN_BOUNDS, +> CONFIG_FORTIFY_SOURCE, and other object-size aware features to behave +> unambiguously in the face of trailing arrays: only C99 flexible arrays +> are considered to be dynamically sized. 
+> + +**[v1: pid: Replace struct pid 1-element array with flex-array](http://lore.kernel.org/linux-hardening/20230517225838.never.965-kees@kernel.org/)** + +> For pid namespaces, struct pid uses a dynamically sized array member, +> "numbers". This was implemented using the ancient 1-element fake flexible +> array, which has been deprecated for decades. Replace it with a C99 +> flexible array, refactor the array size calculations to use struct_size(), +> and address elements via indexes. Note that the static initializer (which +> defines a single element) works as-is, and requires no special handling. +> + +**[v1: next: scsi: lpfc: Use struct_size() helper](http://lore.kernel.org/linux-hardening/99e06733f5f35c6cd62e05f530b93107bfd03362.1684358315.git.gustavoars@kernel.org/)** + +> Prefer struct_size() over open-coded versions of idiom: +> +> sizeof(struct-with-flex-array) + sizeof(typeof-flex-array-elements) * count +> +> where count is the max number of items the flexible array is supposed to +> contain. +> + +**[v1: next: scsi: lpfc: Replace one-element array with flexible-array member](http://lore.kernel.org/linux-hardening/6c6dcab88524c14c47fd06b9332bd96162656db5.1684358315.git.gustavoars@kernel.org/)** + +> One-element arrays are deprecated, and we are replacing them with flexible +> array members instead. So, replace one-element arrays with flexible-array +> members in a couple of structures, and refactor the rest of the code, +> accordingly. +> + +**[v1: checkpatch: Check for strcpy and strncpy too](http://lore.kernel.org/linux-hardening/20230517201349.never.582-kees@kernel.org/)** + +> Warn about strcpy(), strncpy(), and strlcpy(). Suggest strscpy() and +> include pointers to the open KSPP issues for each, which has further +> details and replacement procedures. 
+> + +**[v2: Compiler Attributes: Add __counted_by macro](http://lore.kernel.org/linux-hardening/20230517190841.gonna.796-kees@kernel.org/)** + +> In an effort to annotate all flexible array members with their run-time +> size information, the "element_count" attribute is being introduced by +> Clang[1] and GCC[2] in future releases. This annotation will provide +> the CONFIG_UBSAN_BOUNDS and CONFIG_FORTIFY_SOURCE features the ability +> to perform run-time bounds checking on otherwise unknown-size flexible +> arrays. +> + +**[v1: next: media: venus: hfi_cmds: Replace fake flex-arrays with flexible-array members](http://lore.kernel.org/linux-hardening/ZGQrSQ%2FzHu+pk7WU@work/)** + +> One-element arrays are deprecated, and we are replacing them with flexible +> array members instead. So, replace one-element arrays with flexible-array +> members in multiple structures. +> + +**[v1: next: media: venus: hfi_cmds: Replace fake flex-array with flexible-array member](http://lore.kernel.org/linux-hardening/ZGQn63U4IeRUiJWb@work/)** + +> One-element arrays are deprecated, and we are replacing them with flexible +> array members instead. So, replace one-element arrays with flexible-array +> members in struct hfi_sys_set_resource_pkt, and refactor the rest of +> the code, accordingly. +> + +**[v1: next: media: venus: hfi_cmds: Use struct_size() helper](http://lore.kernel.org/linux-hardening/fd52d6ddce285474615e4bd96931ab12a0da8199.1684278538.git.gustavoars@kernel.org/)** + +> Prefer struct_size() over open-coded versions of idiom: +> +> sizeof(struct-with-flex-array) + sizeof(typeof-flex-array-elements) * count +> +> where count is the max number of items the flexible array is supposed to +> contain. 
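The open-coded idiom above can overflow when `count` is large or comes from untrusted input. The minimal sketch below shows what a struct_size()-style helper adds. It assumes a plain function instead of the kernel's type-generic macro. The multiplication and addition saturate to SIZE_MAX on overflow, as the kernel helper does, so a following allocation fails instead of returning a buffer that is too small. `my_struct_size` and `struct pkt` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: header + elem * count, saturating at SIZE_MAX
 * on overflow instead of wrapping around. */
static size_t my_struct_size(size_t header, size_t elem, size_t count)
{
    /* header + elem * count <= SIZE_MAX  <=>  count <= (SIZE_MAX - header) / elem */
    if (elem != 0 && count > (SIZE_MAX - header) / elem)
        return SIZE_MAX;
    return header + elem * count;
}

/* Hypothetical packet with a flexible array of 32-bit entries. */
struct pkt {
    uint32_t num;
    uint32_t entries[];
};
```

A caller would allocate with something like `malloc(my_struct_size(sizeof(struct pkt), sizeof(uint32_t), n))`.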
+> + +**[v1: next: media: venus: hfi_cmds: Replace one-element array with flexible-array member](http://lore.kernel.org/linux-hardening/e4b13d7b79d1477e775c6d4564f7b23c4cf967f2.1684278538.git.gustavoars@kernel.org/)** + +> One-element arrays are deprecated, and we are replacing them with flexible +> array members instead. So, replace one-element arrays with flexible-array +> members in struct hfi_session_set_buffers_pkt, and refactor the rest of +> the code, accordingly. +> + +**[v1: next: media: venus: Replace one-element arrays with flexible-array members](http://lore.kernel.org/linux-hardening/ZGPk3PpvYzjD1+0%2F@work/)** + +> One-element arrays are deprecated, and we are replacing them with flexible +> array members instead. So, replace one-element arrays with flexible-array +> members in multiple structures, and refactor the rest of the code, +> accordingly. +> + +**[v1: next: iavf: Replace one-element array with flexible-array member](http://lore.kernel.org/linux-hardening/ZGLR3H1OTgJfOdFP@work/)** + +> One-element arrays are deprecated, and we are replacing them with flexible +> array members instead. So, replace one-element array with flexible-array +> member in struct iavf_qvlist_info, and refactor the rest of the code, +> accordingly. +> + +**[v1: next: wifi: wil6210: fw: Replace zero-length arrays with DECLARE_FLEX_ARRAY() helper](http://lore.kernel.org/linux-hardening/ZGKHByxujJoygK+l@work/)** + +> Zero-length arrays are deprecated, and we are moving towards adopting +> C99 flexible-array members, instead. So, replace zero-length arrays +> declarations alone in structs with the new DECLARE_FLEX_ARRAY() +> helper macro. +> + +**[v1: next: wifi: wil6210: wmi: Replace zero-length array with DECLARE_FLEX_ARRAY() helper](http://lore.kernel.org/linux-hardening/ZGKHM+MWFsuqzTjm@work/)** + +> Zero-length arrays are deprecated, and we are moving towards adopting +> C99 flexible-array members, instead. 
So, replace zero-length arrays +> declarations alone in structs with the new DECLARE_FLEX_ARRAY() +> helper macro. +> + +**[v1: next: net: libwx: Replace zero-length array with flexible-array member](http://lore.kernel.org/linux-hardening/ZGKGwtsobVZecWa4@work/)** + +> Zero-length arrays as fake flexible arrays are deprecated, and we are +> moving towards adopting C99 flexible-array members instead. +> + +**[v1: next: mlxfw: Replace zero-length array with DECLARE_FLEX_ARRAY() helper](http://lore.kernel.org/linux-hardening/ZGKGiBxP0zHo6XSK@work/)** + +> Zero-length arrays are deprecated and we are moving towards adopting +> C99 flexible-array members, instead. So, replace zero-length arrays +> declarations alone in structs with the new DECLARE_FLEX_ARRAY() +> helper macro. +> + +#### Asynchronous IO + +**[v1: net-next: minor tcp io_uring zc optimisations](http://lore.kernel.org/io-uring/cover.1684166247.git.asml.silence@gmail.com/)** + +> Patch 1 is a simple cleanup, patch 2 gives removes 2 atomics from the +> io_uring zc TCP submission path, which yielded extra 0.5% for my +> throughput CPU bound tests based on liburing/examples/send-zerocopy.c +> + +**[v1: for-next: Enable IOU_F_TWQ_LAZY_WAKE for passthrough](http://lore.kernel.org/io-uring/cover.1684154817.git.asml.silence@gmail.com/)** + +> Let cmds to use IOU_F_TWQ_LAZY_WAKE and enable it for nvme passthrough. +> +> The result should be same as in test to the original IOU_F_TWQ_LAZY_WAKE [1] +> patchset, but for a quick test I took fio/t/io_uring with 4 threads each +> reading their own drive and all pinned to the same CPU to make it CPU +> bound and got +10% throughput improvement. +> + +#### Rust For Linux + +**[v1: Bindings for the workqueue](http://lore.kernel.org/rust-for-linux/20230517203119.3160435-1-aliceryhl@google.com/)** + +> This patchset contains bindings for the kernel workqueue.
+> +> One of the primary goals behind the design used in this patch is that we +> must support embedding the `work_struct` as a field in user-provided +> types, because this allows you to submit things to the workqueue without +> having to allocate, making the submission infallible. If we didn't have +> to support this, then the patch would be much simpler. One of the main +> things that make it complicated is that we must ensure that the function +> pointer in the `work_struct` is compatible with the struct it is +> contained within. +> + +**[v1: rust: networking and crypto abstractions](http://lore.kernel.org/rust-for-linux/010101881db036fb-2fb6981d-e0ef-4ad1-83c3-54d64b6d93b3-000000@us-west-2.amazonses.com/)** + +> This includes initial rust abstractions for networking and crypto. +> +> I've been working on in-kernel TLS 1.3 handshake in Rust on the top of +> this. Currently you can run simple TLS server code, which does a +> handshake, sets up kTLS (Kernel TLS offload) to read and write some +> bytes. +> + +#### BPF + +**[v9: bpf-next: bpf: Add socket destroy capability](http://lore.kernel.org/bpf/20230519225157.760788-1-aditi.ghag@isovalent.com/)** + +> This patch set adds the capability to destroy sockets in BPF. We plan to +> use the capability in Cilium to force client sockets to reconnect when +> their remote load-balancing backends are deleted. The other use case is +> on-the-fly policy enforcement where existing socket connections +> prevented by policies need to be terminated. 
+> + +**[v1: dwarves: Encoding function addresses using DECL_TAGs](http://lore.kernel.org/bpf/20230517161648.17582-1-alan.maguire@oracle.com/)** + +> As a means to continue the discussion in [1], which is +> concerned with finding the best long-term solution to +> having a BPF Type Format (BTF) representation of +> functions that is usable for tracing of edge cases, this +> proof-of-concept series is intended to explore one approach +> to adding information to help make tracing more accurate. +> + +**[v2: bpf-next: bpftool: specify XDP Hints ifname when loading program](http://lore.kernel.org/bpf/20230517160103.1088185-1-larysa.zaremba@intel.com/)** + +> Add ability to specify a network interface used to resolve +> XDP Hints kfuncs when loading program through bpftool. +> + +**[v1: bpf-next: selftests/bpf: add xdp_feature selftest for bond device](http://lore.kernel.org/bpf/64cb8f20e6491f5b971f8d3129335093c359aad7.1684329998.git.lorenzo@kernel.org/)** + +> Introduce selftests to check xdp_feature support for bond driver. +> + +**[v2: bpf-next: bpf: Show target_{obj,btf}_id for tracing link](http://lore.kernel.org/bpf/20230517103126.68372-1-laoar.shao@gmail.com/)** + +> The target_btf_id can help us understand which kernel function is +> linked by a tracing prog. The target_btf_id and target_obj_id have +> already been exposed to userspace, so we just need to show them. +> + +**[v1: selftests/bpf: Do not use sign-file as testcase](http://lore.kernel.org/bpf/88e3ab23029d726a2703adcf6af8356f7a2d3483.1684316821.git.legion@kernel.org/)** + +> The sign-file utility (from scripts/) is used in prog_tests/verify_pkcs7_sig.c, +> but the utility should not be called as a test. 
Executing this utility +> produces the following error: +> + +**[v1: support non-frag page for page_pool_alloc_frag()](http://lore.kernel.org/bpf/20230516124801.2465-1-linyunsheng@huawei.com/)** + +> In [1], there is a use case to use frag support in page +> pool to reduce memory usage, and it may request different +> frag size depending on the head/tail room space for +> xdp_frame/shinfo and mtu/packet size. When the requested +> frag size is large enough that a single page can not be +> split into more than one frag, using frag support only +> have performance penalty because of the extra frag count +> handling for frag support. +> + +**[v2: bpf-next: seltests/xsk: prepare for AF_XDP multi-buffer testing](http://lore.kernel.org/bpf/20230516103109.3066-1-magnus.karlsson@gmail.com/)** + +> Prepare the AF_XDP selftests test framework code for the upcoming +> multi-buffer support in AF_XDP. This so that the multi-buffer patch +> set does not become way too large. In that upcoming patch set, we are +> only including the multi-buffer tests together with any framework +> code that depends on the new options bit introduced in the AF_XDP +> multi-buffer implementation itself. +> + +**[v1: bpf-next: selftests/bpf: improve netcnt test robustness](http://lore.kernel.org/bpf/20230515204833.2832000-1-andrii@kernel.org/)** + +> Change netcnt to demand at least 10K packets, as we frequently see some +> stray packet arriving during the test in BPF CI. It seems more important +> to make sure we haven't lost any packet than enforcing exact number of +> packets. +> + +**[v1: bpf: samples/bpf: use canonical fallthrough pseudo-keyword in hbm.c](http://lore.kernel.org/bpf/20230515200207.2541162-1-andrii@kernel.org/)** + +> Rename now unsupported __fallthrough into fallthrough ([0]) in +> samples/bpf/hbm.c to fix samples/bpf compilation. 
+> +> [0] https://www.kernel.org/doc/html/latest/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through +> + +**[v2: iwl-net: ice: recycle/free all of the fragments from multi-buffer frame](http://lore.kernel.org/bpf/20230515135247.142105-1-maciej.fijalkowski@intel.com/)** + +> The ice driver caches next_to_clean value at the beginning of +> ice_clean_rx_irq() in order to remember the first buffer that has to be +> freed/recycled after main Rx processing loop. The end boundary is +> indicated by first descriptor of frame that Rx processing loop has ended +> its duties. Note that if mentioned loop ended in the middle of gathering +> multi-buffer frame, next_to_clean would be pointing to the descriptor in +> the middle of the frame BUT freeing/recycling stage will stop at the +> first descriptor. This means that next iteration of ice_clean_rx_irq() +> will miss the (first_desc, next_to_clean - 1) entries. +> + +**[v2: bpf-next: bpf: bpf trampoline improvements](http://lore.kernel.org/bpf/20230515130849.57502-1-laoar.shao@gmail.com/)** + +> When we run fexit bpf programs (e.g. attaching tcp_recvmsg) on our servers +> which were running old kernels, some of these servers crashed. Finally we +> figured out that it was caused by the same issue resolved by +> commit e21aa341785c ("bpf: Fix fexit trampoline."). After we backported +> that commit, the crash disappears. However new issues are introduced by +> that commit. This patchset fixes them. +> + +**[v1: bpf-next: bpf: btf: restore resolve_mode when popping the resolve stack](http://lore.kernel.org/bpf/20230515121521.30569-1-lmb@isovalent.com/)** + +> In commit 9b459804ff99 ("btf: fix resolving BTF_KIND_VAR after ARRAY, STRUCT, UNION, PTR") +> I fixed a bug that occurred during resolving of a DATASEC by strategically resetting +> resolve_mode. This fixes the immediate bug but leaves us open to future bugs where +> nested types have to be resolved. 
+> + +**[v1: Make fpobe + rethook immune to recursion](http://lore.kernel.org/bpf/20230515035215.Hx3AI5Kb65x5TpmiBhIKrdGS6XpIW09Y4phhBWXCDMg@z/)** + +> Current fprobe and rethook has some pitfalls and may introduce kernel stack recusion, especially in +> massive tracing scenario. +> +> For example, if (DEBUG_PREEMPT | TRACE_PREEMPT_TOGGLE) , preempt_count_{add, sub} can be traced via +> ftrace, if we happens to use fprobe + rethook based on ftrace to hook on those functions, +> recursion is introduced in functions like rethook_trampoline_handler and leads to kernel crash +> because of stack overflow. +> + +### Related Technology Updates + +#### Qemu + +**[v1: hw/riscv/opentitan: Correct QOM type/size of OpenTitanState](http://lore.kernel.org/qemu-devel/20230520054510.68822-1-philmd@linaro.org/)** + +> This series fix a QOM issue with the OpenTitanState +> structure, noticed while auditing QOM relations globally. +> + +**[v5: hw/riscv: qemu crash when NUMA nodes exceed available CPUs](http://lore.kernel.org/qemu-devel/20230519023758.1759434-1-yin.wang@intel.com/)** + +> Command "qemu-system-riscv64 -machine virt +> -m 2G -smp 1 -numa node,mem=1G -numa node,mem=1G" +> would trigger this problem.Backtrace with: +> #0 0x0000555555b5b1a4 in riscv_numa_get_default_cpu_node_id at ../hw/riscv/numa.c:211 +> #1 0x00005555558ce510 in machine_numa_finish_cpu_init at ../hw/core/machine.c:1230 +> #2 0x00005555558ce9d3 in machine_run_board_init at ../hw/core/machine.c:1346 +> #3 0x0000555555aaedc3 in qemu_init_board at ../softmmu/vl.c:2513 +> #4 0x0000555555aaf064 in qmp_x_exit_preconfig at ../softmmu/vl.c:2609 +> #5 0x0000555555ab1916 in qemu_init at ../softmmu/vl.c:3617 +> #6 0x000055555585463b in main at ../softmmu/main.c:47 +> This commit fixes the issue by adding parameter checks.
+> + +**[v1: Add RISC-V Virtual IRQs and IRQ filtering support](http://lore.kernel.org/qemu-devel/20230518113838.130084-1-rkanwal@rivosinc.com/)** + +> This series adds M and HS-mode virtual interrupt and IRQ filtering support. +> This allows inserting virtual interrupts from M/HS-mode into S/VS-mode +> using mvien/hvien and mvip/hvip csrs. IRQ filtering is a use case of +> this change, i-e M-mode can stop delegating an interrupt to S-mode and +> instead enable it in MIE and receive those interrupts in M-mode and then +> selectively inject the interrupt using mvien and mvip. +> + +**[v9: target/riscv: rework CPU extension validation](http://lore.kernel.org/qemu-devel/20230517135714.211809-1-dbarboza@ventanamicro.com/)** + +> In this version we have a change in patch 11. We're now firing a +> GUEST_ERROR if write_misa() fails and we need to rollback (i.e. not +> change MISA ext). +> + +#### U-Boot + +**[v2: riscv: setup per-hart stack earlier](http://lore.kernel.org/u-boot/1684650044-313122-1-git-send-email-ganboing@gmail.com/)** + +> Harts need to use per-hart stack before any function call, even if that +> function is a simple one. When the callee uses stack for register save/ +> restore, especially RA, if nested call, concurrent access by multiple +> harts on the same stack will cause data-race. +> + +**[v1: riscv: add backtrace support](http://lore.kernel.org/u-boot/20230515130322.516871-1-ben.dooks@sifive.com/)** + +> When debugging, it is useful to have a backtrace to find +> out what is in the call stack as the previous function (RA) +> may not have been the culprit. 
+> + +## 20230507: Issue 45 + +### Kernel Updates + +#### RISC-V Architecture Support + +**[v3: Allwinner R329/D1/R528/T113s SPI support](http://lore.kernel.org/linux-riscv/20230506232616.1792109-1-bigunclemax@gmail.com/)** + +> This series is attempt to revive previous work to add support for SPI +> controller which is used in newest Allwinner's SOCs R329/D1/R528/T113s +> https://lore.kernel.org/lkml/BYAPR20MB2472E8B10BFEF75E7950BBC0BCF79@BYAPR20MB2472.namprd20.prod.outlook.com/ +> + +**[v1: riscv: mm: use bitmap_zero() API](http://lore.kernel.org/linux-riscv/202305061711417142802@zte.com.cn/)** + +> bitmap_zero() is faster than bitmap_clear(), so use bitmap_zero() +> instead of bitmap_clear(). +> + +**[v1: RISC-V: KVM: use bitmap_zero() API](http://lore.kernel.org/linux-riscv/202305061710302032748@zte.com.cn/)** + +> bitmap_zero() is faster than bitmap_clear(), so use bitmap_zero() +> instead of bitmap_clear(). +> + +**[v3: Add TDM audio on StarFive JH7110](http://lore.kernel.org/linux-riscv/20230506090116.9206-1-walker.chen@starfivetech.com/)** + +> This patchset adds TDM audio driver for the StarFive JH7110 SoC. The +> first patch adds device tree binding for TDM module. The second patch +> adds tdm driver support for JH7110 SoC. The last patch adds device node +> of tdm and sound card to JH7110 dts. +> +> The series has been tested on the VisionFive 2 board by plugging an +> audio expansion board. +> +> For more information of audio expansion board, you can take a look +> at the following webpage: +> https://wiki.seeedstudio.com/ReSpeaker_2_Mics_Pi_HAT/ +> + +**[v1: perf build: Add system include paths to BPF builds](http://lore.kernel.org/linux-riscv/20230506021450.3499232-1-irogers@google.com/)** + +> There are insufficient headers in tools/include to satisfy building +> BPF programs and their header dependencies. Add the system include +> paths from the non-BPF clang compile so that these headers can be +> found.
+> +> This code was taken from: +> tools/testing/selftests/bpf/Makefile +> + +**[GIT PULL: RISC-V Patches for the 6.4 Merge Window, Part 2](http://lore.kernel.org/linux-riscv/mhng-b783c0bb-3d23-4767-9c69-a39f805a8544@palmer-ri-x1c9/)** + +> +> RISC-V Patches for the 6.4 Merge Window, Part 2 +> +> * Support for hibernation. +> * .rela.dyn has been moved to init. +> * A fix for the SBI probing to allow for implementation-defined +> behavior. +> * Various other fixes and cleanups throughout the tree. +> +> There are still a few minor build issues with drivers, but patches are on the +> lists. Aside from that things look good with a merge from Linus' master as of +> last night, I've got another test running now but I don't see anything scary. +> + +**[v1: riscv: Optimize memset](http://lore.kernel.org/linux-riscv/6d1cbe2e.3c31d.187eb14d990.Coremail.zhangfei@nj.iscas.ac.cn/)** + +> +> This patch has been optimized for memset data sizes less than 16 bytes. +> Compared to byte by byte storage, significant performance improvement has been achieved. +> + +**[v1: riscv: dts: allwinner: d1: Add SPI0 controller node](http://lore.kernel.org/linux-riscv/20230505074701.1030980-1-bigunclemax@gmail.com/)** + +> Some boards form the MangoPi family (MQ\MQ-Dual\MQ-R) may have +> an optional SPI flash that connects to the SPI0 controller. +> This controller is already supported by sun8i-h3-spi driver. +> So let's add its DT node. +> + +**[v2: RISC-V: Detect Ssqosid extension and handle sqoscfg CSR](http://lore.kernel.org/linux-riscv/20230430-riscv-cbqri-rfc-v2-v2-0-8e3725c4a473@baylibre.com/)** + +> This RFC series adds initial support for the Ssqosid extension and the +> sqoscfg CSR as specified in Chapter 2 of the RISC-V Capacity and +> Bandwidth Controller QoS Register Interface (CBQRI) specification [1]. +> +> QoS (Quality of Service) in this context is concerned with shared +> resources on an SoC such as cache capacity and memory bandwidth. 
Intel +> and AMD already have QoS features on x86, and there is an existing user +> interface in Linux: the resctrl virtual filesystem [2]. +> +> The sqoscfg CSR provides a mechanism by which a software workload (e.g. +> a process or a set of processes) can be associated with a resource +> control ID (RCID) and a monitoring counter ID (MCID) that accompanies +> each request made by the hart to shared resources like cache. CBQRI +> defines operations to configure resource usage limits, in the form of +> capacity or bandwidth, for an RCID. CBQRI also defines operations to +> configure counters to track the resource utilization of an MCID. +> +> The CBQRI spec is still in draft state and is undergoing review [3]. It +> is possible there will be changes to the Ssqosid extension and the CBQRI +> spec. For example, the CSR address for sqoscfg is not yet finalized. +> +> My goal for this RFC is to determine if the 2nd patch is an acceptable +> approach to handling sqoscfg when switching tasks. This RFC was tested +> against a QEMU branch that implements the Ssqosid extension [4]. A test +> driver [5] was used to set sqoscfg for the current process. This allows +> __switch_to_sqoscfg() to be tested without resctrl. +> +> This series is based on riscv/for-next at: +> +> b09313dd2e72 ("RISC-V: hwprobe: Explicity check for -1 in vdso init") +> + +**[v2: Split ptdesc from struct page](http://lore.kernel.org/linux-riscv/20230501192829.17086-1-vishal.moola@gmail.com/)** + +> The MM subsystem is trying to shrink struct page. This patchset +> introduces a memory descriptor for page table tracking - struct ptdesc. +> +> This patchset introduces ptdesc, splits ptdesc from struct page, and +> converts many callers of page table constructor/destructors to use ptdescs. +> +> Ptdesc is a foundation to further standardize page tables, and eventually +> allow for dynamic allocation of page tables independent of struct page. 
+> However, the use of pages for page table tracking is quite deeply +> ingrained and varied across archictectures, so there is still a lot of +> work to be done before that can happen. +> +> This is rebased on next-20230428. +> + +**[v3: riscv: allow case-insensitive ISA string parsing](http://lore.kernel.org/linux-riscv/tencent_E6911C8D71F5624E432A1AFDF86804C3B509@qq.com/)** + +> This patchset allows case-insensitive ISA string parsing, which is +> needed in the ACPI environment. As the RISC-V Hart Capabilities Table +> (RHCT) description in UEFI Forum ECR[1] shows the format of the ISA +> string is defined in the RISC-V unprivileged specification[2]. However, +> the RISC-V unprivileged specification defines the ISA naming strings are +> case-insensitive while the current ISA string parser in the kernel only +> accepts lowercase letters. In this case, the kernel should allow +> case-insensitive ISA string parsing. Moreover, this reason has been +> discussed in Conor's patch[3]. And I have also checked the current ISA +> string parsing in the recent ACPI support patch[4] will also call +> `riscv_fill_hwcap` function as DT we use now. +> +> The original motivation for my patch v1[5] is that some SoC generators +> will provide generated DT with illegal ISA string in dt-binding such as +> rocket-chip, which will even cause kernel panic in some cases as I +> mentioned in v1[5]. Now, the rocket-chip has been fixed in PR #3333[6]. +> However, when using some specific version of rocket-chip with +> illegal ISA string in DT, this patchset will also work for parsing +> uppercase letters correctly in DT, thus will have better compatibility. +> +> In summary, this patch not only works for case-insensitive ISA string +> parsing to meet the requirements in ECR[1] but also can be a workaround +> for some specific versions of rocket-chip. 
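Case-insensitive matching can be sketched with a hypothetical helper. It checks whether a single-letter extension appears in an ISA string such as `rv64imafdc` or `RV64IMAFDC_Zicsr`, folding case with tolower(). This is not the kernel's riscv_fill_hwcap() logic. The sketch scans only the single-letter part before the first `_` and deliberately ignores version numbers and the `g` shorthand.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical sketch: does `isa` (e.g. "rv64imafdc", "RV64IMAFDC_Zicsr")
 * contain the single-letter extension `ext`? Case-insensitive. Only the
 * letters before the first '_' are scanned; multi-letter extensions,
 * version numbers and the "g" shorthand are not handled. */
static bool isa_has_base_ext(const char *isa, char ext)
{
    size_t i;

    /* Require an "rv32" or "rv64" prefix, in any case. */
    if (strlen(isa) < 4 ||
        tolower((unsigned char)isa[0]) != 'r' ||
        tolower((unsigned char)isa[1]) != 'v')
        return false;
    if (strncmp(isa + 2, "32", 2) != 0 && strncmp(isa + 2, "64", 2) != 0)
        return false;

    for (i = 4; isa[i] != '\0' && isa[i] != '_'; i++) {
        if (tolower((unsigned char)isa[i]) == tolower((unsigned char)ext))
            return true;
    }
    return false;
}
```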
+> + +#### Process Scheduling + +**[v2: sched/debug: correct printing for rq->nr_uninterruptible](http://lore.kernel.org/lkml/20230506074253.44526-1-yanyan.yan@antgroup.com/)** + +> Commit e6fe3f422be1 ("sched: Make multiple runqueue task counters +> 32-bit") changed the type for rq->nr_uninterruptible from "unsigned +> long" to "unsigned int", but left wrong cast print to +> /sys/kernel/debug/sched/debug and to the console. +> +> For example, nr_uninterruptible's value is fffffff7 with type +> "unsigned int", (long)nr_uninterruptible shows 4294967287 while +> (int)nr_uninterruptible prints -9. So using int cast fixes wrong +> printing. +> + +**[v1: sched: core: Simplify init_sched_mm_cid()](http://lore.kernel.org/lkml/20230507023352.2784-1-kunyu@nfschina.com/)** + +> int mm_users variable definition move to variable usage location. +> + +**[v2: sched/deadline: cpuset: Rework DEADLINE bandwidth restoration](http://lore.kernel.org/lkml/20230503072228.115707-1-juri.lelli@redhat.com/)** + +> Qais reported [1] that iterating over all tasks when rebuilding root +> domains for finding out which ones are DEADLINE and need their bandwidth +> correctly restored on such root domains can be a costly operation (10+ +> ms delays on suspend-resume). He proposed we skip rebuilding root +> domains for certain operations, but that approach seemed arch specific +> and possibly prone to errors, as paths that ultimately trigger a rebuild +> might be quite convoluted (thanks Qais for spending time on this!). +> +> This is v2 of an alternative approach (v1 at [3]) to fix the problem.
+> + +**[v1: sched/numa: Disjoint set vma scan improvements](http://lore.kernel.org/lkml/cover.1683033105.git.raghavendra.kt@amd.com/)** + +> +> While this has improved significant system time overhead, there are corner +> cases, which genuinely needs some relaxation for e.g., concern raised by +> PeterZ where unfairness amongst the thread belonging to disjoint set of VMSs +> can potentially amplify the side effects of vma regions belonging to some of +> the tasks being left unscanned. +> +> With this patch I am seeing good improvement in numa01_THREAD_ALLOC case, +> but please note that with [1] there was a drastic decrease in system time when +> benchmarks run, this patch adds back some of the system time. +> + +**[v2: sched/topology: add for_each_numa_cpu() macro](http://lore.kernel.org/lkml/20230430171809.124686-1-yury.norov@gmail.com/)** + +> for_each_cpu() is widely used in kernel, and it's beneficial to create +> a NUMA-aware version of the macro. +> +> Recently added for_each_numa_hop_mask() works, but switching existing +> codebase to it is not an easy process. +> +> This series adds for_each_numa_cpu(), which is designed to be similar to +> the for_each_cpu(). It allows to convert existing code to NUMA-aware as +> simple as adding a hop iterator variable and passing it inside new macro. +> for_each_numa_cpu() takes care of the rest. +> +> At the moment, we have 2 users of NUMA-aware enumerators. 
One is +> Melanox's in-tree driver, and another is Intel's in-review driver: +> +> https://lore.kernel.org/lkml/20230216145455.661709-1-pawel.chmielewski@intel.com/ +> +> Both real-life examples follow the same pattern: +> +> for_each_numa_hop_mask(cpus, prev, node) { +> for_each_cpu_andnot(cpu, cpus, prev) { +> if (cnt++ == max_num) +> goto out; +> do_something(cpu); +> } +> prev = cpus; +> } +> +> With the new macro, it has a more standard look, like this: +> +> for_each_numa_cpu(cpu, hop, node, cpu_possible_mask) { +> if (cnt++ == max_num) +> break; +> do_something(cpu); +> } +> +> Straight conversion of existing for_each_cpu() codebase to NUMA-aware +> version with for_each_numa_hop_mask() is difficult because it doesn't +> take a user-provided cpu mask, and eventually ends up with open-coded +> double loop. With for_each_numa_cpu() it shouldn't be a brainteaser. +> Consider the NUMA-ignorant example: +> +> cpumask_t cpus = get_mask(); +> int cnt = 0, cpu; +> +> for_each_cpu(cpu, cpus) { +> if (cnt++ == max_num) +> break; +> do_something(cpu); +> } +> +> Converting it to NUMA-aware version would be as simple as: +> +> cpumask_t cpus = get_mask(); +> int node = get_node(); +> int cnt = 0, hop, cpu; +> +> for_each_numa_cpu(cpu, hop, node, cpus) { +> if (cnt++ == max_num) +> break; +> do_something(cpu); +> } +> +> The latter looks more verbose and avoids from open-coding that annoying +> double loop. Another advantage is that it works with a 'hop' parameter with +> the clear meaning of NUMA distance, and doesn't make people not familiar +> to enumerator internals bothering with current and previous masks machinery. +> + +#### Memory Management + +**[v1: filemap: Handle error return from __filemap_get_folio()](http://lore.kernel.org/linux-mm/20230506160415.2992089-1-willy@infradead.org/)** + +> Smatch reports that filemap_fault() was missed in the conversion of +> __filemap_get_folio() error returns from NULL to ERR_PTR.
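This class of bug arises when a caller still tests for NULL after the callee has switched to returning ERR_PTR() values. The minimal userspace sketch below uses hypothetical stand-ins `err_ptr()`, `ptr_err()` and `is_err()` for the kernel's ERR_PTR(), PTR_ERR() and IS_ERR(). Small negative errno values are encoded in the top 4095 addresses (MAX_ERRNO in the kernel), so an encoded error is non-NULL and passes straight through a `!p` check. `lookup()` and `read_entry()` are also hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno as a pointer in the last MAX_ERRNO addresses. */
static inline void *err_ptr(long error) { return (void *)(intptr_t)error; }
static inline long ptr_err(const void *ptr) { return (long)(intptr_t)ptr; }
static inline bool is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static int cache[4];

/* Hypothetical lookup: returns a valid pointer or an encoded -ENOENT,
 * never NULL. */
static int *lookup(int idx)
{
    if (idx < 0 || idx >= 4)
        return err_ptr(-ENOENT);
    return &cache[idx];
}

/* Correct caller: checks is_err(); a NULL check would miss the error
 * and dereference the encoded pointer. */
static long read_entry(int idx, int *out)
{
    int *p = lookup(idx);

    if (is_err(p))
        return ptr_err(p);
    *out = *p;
    return 0;
}
```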
+> + +**[v1: mm/gup: add missing gup_must_unshare() check to gup_huge_pgd()](http://lore.kernel.org/linux-mm/cb971ac8dd315df97058ea69442ecc007b9a364a.1683381545.git.lstoakes@gmail.com/)** + +> All other instances of gup_huge_pXd() perform the unshare check, so update +> the PGD-specific function to do so as well. +> +> While checking pgd_write() might seem unusual, this function already +> performs such a check via pgd_access_permitted() so this is in line with +> the existing implementation. +> + +**[v3: memcontrol: support cgroup level OOM protection](http://lore.kernel.org/linux-mm/20230506114948.6862-1-chengkaitao@didiglobal.com/)** + +> Establish a new OOM score algorithm, supports the cgroup level OOM +> protection mechanism. When an global/memcg oom event occurs, we treat +> all processes in the cgroup as a whole, and OOM killers need to select +> the process to kill based on the protection quota of the cgroup +> + +**[v1: RESEND: Make PCMCIA and QCOM_HIDMA depend on HAS_IOMEM](http://lore.kernel.org/linux-mm/20230506111628.712316-1-bhe@redhat.com/)** + +> This is suggested by Niklas when he reviewed patches related to s390 +> part: +> https://lore.kernel.org/all/d78edb587ecda0aa09ba80446d0f1883e391996d.camel@linux.ibm.com/T/#u +> +> v1 link: +> https://lore.kernel.org/all/20230216073403.451455-1-bhe@redhat.com/T/#u +> +> This resend v1 with Niklas and Arnd's ack tags added. +> + +**[v1: mbind.2: Clarify MPOL_MF_MOVE with MPOL_INTERLEAVE policy](http://lore.kernel.org/linux-mm/20230505194858.23539-1-mike.kravetz@oracle.com/)** + +> There was user confusion about specifying MPOL_MF_MOVE* with +> MPOL_INTERLEAVE policy [1]. Add clarification. 
+> +> [1] https://lore.kernel.org/linux-mm/20230501185836.GA85110@monkey/ +> + +**[v1: mm/hugetlb: revert use of page_cache_next_miss()](http://lore.kernel.org/linux-mm/20230505185301.534259-1-sidhartha.kumar@oracle.com/)** + +> As reported by Ackerley[1], the use of page_cache_next_miss() in +> hugetlbfs_fallocate() introduces a bug where a second fallocate() call to +> same offset fails with -EEXIST. Revert this change and go back to the +> previous method of using get from the page cache and then dropping the +> reference on success. +> +> hugetlbfs_pagecache_present() was also refactored to use +> page_cache_next_miss(), revert the usage there as well. +> +> User visible impacts include hugetlb fallocate incorrectly returning +> EEXIST if pages are already present in the file. In addition, hugetlb +> pages will not be included in core dumps if they need to be brought in via +> GUP. userfaultfd UFFDIO_COPY also uses this code and will not notice pages +> already present in the cache. It may try to allocate a new page and +> potentially return ENOMEM as opposed to EEXIST. +> + +**[v2: shmemfs stable directory cookies](http://lore.kernel.org/linux-mm/168331111400.20728.2327812215536431362.stgit@oracle-102.nfsv4bat.org/)** + +> The following series is for continued discussion of the need for +> and implementation of stable directory cookies for shmemfs/tmpfs. +> +> Based on one of Andrew's review comments, I've split this one patch +> into a series to (hopefully) reduce its complexity and make it +> easier to analyze the changes. +> +> Although the patch(es) have been passing functional tests for +> several weeks, there have been some reports of performance +> regressions that we still need to get to the bottom of. +> +> We might consider a simpler lseek/readdir implementation, as using +> an xarray is effective but a bit of overkill. 
I'd like to avoid a +> linked list implementation as that is known to have significant +> performance impact past a dozen or so list entries. +> + +**[v2: maple_tree: Make maple state reusable after mas_empty_area()](http://lore.kernel.org/linux-mm/20230505145829.74574-1-zhangpeng.00@bytedance.com/)** + +> Make mas->min and mas->max point to a node range instead of a leaf entry +> range. This allows mas to still be usable after mas_empty_area() returns. +> Users would get unexpected results from other operations on the maple +> state after calling the affected function. +> + +**[v1: sysctl: add config to make randomize_va_space RO](http://lore.kernel.org/linux-mm/20230504213002.56803-1-michael.mccracken@gmail.com/)** + +> Add config RO_RANDMAP_SYSCTL to set the mode of the randomize_va_space +> sysctl to 0444 to disallow all runtime changes. This will prevent +> accidental changing of this value by a root service. +> +> The config is disabled by default to avoid surprises. +> + +**[v9: mm/gup: disallow GUP writing to file-backed mappings by default](http://lore.kernel.org/linux-mm/cover.1683235180.git.lstoakes@gmail.com/)** + +> Writing to file-backed mappings which require folio dirty tracking using +> GUP is a fundamentally broken operation, as kernel write access to GUP +> mappings do not adhere to the semantics expected by a file system. +> +> A GUP caller uses the direct mapping to access the folio, which does not +> cause write notify to trigger, nor does it enforce that the caller marks +> the folio dirty. +> +> The problem arises when, after an initial write to the folio, writeback +> results in the folio being cleaned and then the caller, via the GUP +> interface, writes to the folio again. +> +> As a result of the use of this secondary, direct, mapping to the folio no +> write notify will occur, and if the caller does mark the folio dirty, this +> will be done so unexpectedly. +> +> For example, consider the following scenario:- +> +> 1. 
A folio is written to via GUP which write-faults the memory, notifying +> the file system and dirtying the folio. +> 2. Later, writeback is triggered, resulting in the folio being cleaned and +> the PTE being marked read-only. +> 3. The GUP caller writes to the folio, as it is mapped read/write via the +> direct mapping. +> 4. The GUP caller, now done with the page, unpins it and sets it dirty +> (though it does not have to). +> +> This change updates both the PUP FOLL_LONGTERM slow and fast APIs. As +> pin_user_pages_fast_only() does not exist, we can rely on a slightly +> imperfect whitelisting in the PUP-fast case and fall back to the slow case +> should this fail. +> + +**[v1: MDWE without inheritance](http://lore.kernel.org/linux-mm/20230504170942.822147-1-revest@chromium.org/)** + +> Joey recently introduced a Memory-Deny-Write-Executable (MDWE) prctl which tags +> current with a flag that prevents pages that were previously not executable from +> becoming executable. +> +> This tag always gets inherited by children tasks. (it's in MMF_INIT_MASK) +> +> At Google, we've been using a somewhat similar downstream patch for a few years +> now. To make the adoption of this feature easier, we've had it support a mode in +> which the W^X flag does not propagate to children. For example, this is handy if +> a C process which wants W^X protection suspects it could start children +> processes that would use a JIT. +> +> I'd like to align our features with the upstream prctl. This series proposes a +> new NO_INHERIT flag to the MDWE prctl to make this kind of adoption easier. It +> sets a different flag in current that is not in MMF_INIT_MASK and which does not +> propagate. +> +> As part of looking into MDWE, I also fixed a couple of things in the MDWE test. 
+> + +**[v1: mm: always respect QUEUE_FLAG_STABLE_WRITES on the block device](http://lore.kernel.org/linux-mm/20230504105624.9789-1-idryomov@gmail.com/)** + +> Commit 1cb039f3dc16 ("bdi: replace BDI_CAP_STABLE_WRITES with a queue +> and a sb flag") introduced a regression for the raw block device use +> case. Capturing QUEUE_FLAG_STABLE_WRITES flag in set_bdev_super() has +> the effect of respecting it only when there is a filesystem mounted on +> top of the block device. If a filesystem is not mounted, block devices +> that do integrity checking return sporadic checksum errors. +> +> Additionally, this commit made the corresponding sysfs knob writeable +> for debugging purposes. However, because QUEUE_FLAG_STABLE_WRITES flag +> is captured when the filesystem is mounted and isn't consulted after +> that anywhere outside of swap code, changing it doesn't take immediate +> effect even though dumping the knob shows the new value. With no way +> to dump SB_I_STABLE_WRITES flag, this is needlessly confusing. +> +> Resurrect the original stable writes behavior by changing +> folio_wait_stable() to account for the case of a raw block device and +> also: +> +> - for the case of a filesystem, test QUEUE_FLAG_STABLE_WRITES flag +> each time instead of capturing it in the superblock so that changes +> are reflected immediately (thus aligning with the case of a raw block +> device) +> - retain SB_I_STABLE_WRITES flag for filesystems that need stable +> writes independent of the underlying block device (currently just +> NFS) +> + +**[v1: [For stable 5.4] mm: migrate: buffer_migrate_page_norefs() fallback migrate not uptodate pages](http://lore.kernel.org/linux-mm/20230503163426.5538-2-findns94@gmail.com/)** + +> Recently we notice that ext4 filesystem occasionally fail to read +> metadata from disk and report error message, but the disk and block +> layer looks fine. After analyse, we lockon commit 88dbcbb3a484 +> ("blkdev: avoid migration stalls for blkdev pages"). 
It provide a +> migration method for the bdev, we could move page that has buffers +> without extra users now, but it will lock the buffers on the page, which +> breaks a lot of current filesystem's fragile metadata read operations, +> like ll_rw_block() for common usage and ext4_read_bh_lock() for ext4, +> these helpers just trylock the buffer and skip submit IO if it lock +> failed, many callers just wait_on_buffer() and conclude IO error if the +> buffer is not uptodate after buffer unlocked. +> +> This issue could be easily reproduced by add some delay just after +> buffer_migrate_lock_buffers() in __buffer_migrate_page() and do +> fsstress on ext4 filesystem. +> +> EXT4-fs error (device pmem1): __ext4_find_entry:1658: inode #73193: +> comm fsstress: reading directory lblock 0 +> EXT4-fs error (device pmem1): __ext4_find_entry:1658: inode #75334: +> comm fsstress: reading directory lblock 0 +> +> Something like ll_rw_block() should be used carefully and seems could +> only be safely used for the readahead case. So the best way is to fix +> the read operations in filesystem in the long run, but now let us avoid +> this issue first. This patch avoid this issue by fallback to migrate +> pages that are not uptodate like fallback_migrate_page(), those pages +> that has buffers may probably do read operation soon. +> + +**[v3: fs: implement multigrain timestamps](http://lore.kernel.org/linux-mm/20230503142037.153531-1-jlayton@kernel.org/)** + +> +> This is a follow-up of the patches I posted last week [1]. The main +> change in this set is that it no longer uses the lowest-order bit in the +> tv_nsec field, and instead uses one of the higher-order bits (#31, +> specifically) since they are otherwise unused. This change makes things +> much simpler, and we no longer need to twiddle s_time_gran for it. 
+> + +**[v13: cachestat: a new syscall for page cache state of files](http://lore.kernel.org/linux-mm/20230503013608.2431726-1-nphamcs@gmail.com/)** + +> +> This series of patches introduces a new system call, cachestat, that +> summarizes the page cache statistics (number of cached pages, dirty +> pages, pages marked for writeback, evicted pages etc.) of a file, in a +> specified range of bytes. It also include a selftest suite that tests some +> typical usage. Currently, the syscall is only wired in for x86 +> architecture. +> + +**[v1: fs: hugetlbfs: Set vma policy only when needed for allocating folio](http://lore.kernel.org/linux-mm/20230502235622.3652586-1-ackerleytng@google.com/)** + +> Calling hugetlb_set_vma_policy() later avoids setting the vma policy +> and then dropping it on a page cache hit. +> + +**[v5: bio: check return values of bio_add_page](http://lore.kernel.org/linux-mm/20230502101934.24901-1-johannes.thumshirn@wdc.com/)** + +> +> This series converts the callers of bio_add_page() which can easily use +> __bio_add_page() to using it and checks the return of bio_add_page() for +> callers that don't work on a freshly created bio. +> + +#### File systems + +**[v1: bpf-next: Introduce bpf iterators for file-system](http://lore.kernel.org/linux-fsdevel/20230507040107.3755166-1-houtao@huaweicloud.com/)** + +> +> The patchset attempts to provide more observability for the file-system +> as proposed in [0]. Compared to drgn [1], the bpf iterator for file-system +> has fewer dependencies (e.g., no need for vmlinux) and more accurate +> results. +> + +**[GIT PULL: Pipe FMODE_NOWAIT support](http://lore.kernel.org/linux-fsdevel/26aba1b5-8393-a20a-3ce9-f82425673f4d@kernel.dk/)** + +> Here's the revised edition of the FMODE_NOWAIT support for pipes, in +> which we just flag it as such supporting FMODE_NOWAIT unconditionally, +> but clear it if we ever end up using splice/vmsplice on the pipe.
The +> pipe read/write side is perfectly fine for nonblocking IO, however +> splice and vmsplice can potentially wait for IO with the pipe lock held. +> + +**[v6: Introduce block provisioning primitives](http://lore.kernel.org/linux-fsdevel/20230506062909.74601-1-sarthakkukreti@chromium.org/)** + +> This patch series covers iteration 6 of adding support for block +> provisioning requests. +> + +**[v1: fuse: add a new flag to allow shared mmap in FOPEN_DIRECT_IO mode](http://lore.kernel.org/linux-fsdevel/20230505081652.43008-1-hao.xu@linux.dev/)** + +> FOPEN_DIRECT_IO is usually set by fuse daemon to indicate need of strong +> coherency, e.g. network filesystems. Thus shared mmap is disabled since +> it leverages page cache and may write to it, which may cause +> inconsistence. But FOPEN_DIRECT_IO can be used not for coherency but to +> reduce memory footprint as well, e.g. reduce guest memory usage with +> virtiofs. Therefore, add a new flag FOPEN_DIRECT_IO_SHARED_MMAP to allow +> shared mmap for these cases. +> + +**[v1: -next: lsm: Change inode_setattr() to take struct](http://lore.kernel.org/linux-fsdevel/20230505081200.254449-1-xiujianfeng@huawei.com/)** + +> I am working on adding xattr/attr support for landlock [1], so we can +> control fs accesses such as chmod, chown, uptimes, setxattr, etc.. inside +> landlock sandbox. +> + +**[v3: dax: enable dax fault handler to report VM_FAULT_HWPOISON](http://lore.kernel.org/linux-fsdevel/20230505011747.956945-1-jane.chu@oracle.com/)** + +> When multiple processes mmap() a dax file, then at some point, +> a process issues a 'load' and consumes a hwpoison, the process +> receives a SIGBUS with si_code = BUS_MCEERR_AR and with si_lsb +> set for the poison scope. Soon after, any other process issues +> a 'load' to the poisoned page (that is unmapped from the kernel +> side by memory_failure), it receives a SIGBUS with +> si_code = BUS_ADRERR and without valid si_lsb. 
+> +> This is confusing to user, and is different from page fault due +> to poison in RAM memory, also some helpful information is lost. +> +> Channel dax backend driver's poison detection to the filesystem +> such that instead of reporting VM_FAULT_SIGBUS, it could report +> VM_FAULT_HWPOISON. +> + +**[v1: Supporting same fsid filesystems mounting on btrfs](http://lore.kernel.org/linux-fsdevel/20230504170708.787361-1-gpiccoli@igalia.com/)** + +> Currently, we cannot reliably mount same fsid filesystems even one at +> a time in btrfs, but if users want to mount them at the same time, it's +> pretty much impossible. Other filesystems like ext4 are capable of that. +> +> The goal is to allow systems with A/B partitioning scheme (like the +> Steam Deck console or various mobile devices) to be able to hold +> the same filesystem image in both partitions; it also allows to have +> block device level check for filesystem integrity - this is used in the +> Steam Deck image installation, to check if the current read-only image +> is pristine. A bit more details are provided in the following ML thread: +> +> https://lore.kernel.org/linux-btrfs/c702fe27-8da9-505b-6e27-713edacf723a@igalia.com/ +> +> The mechanism used to achieve it is based in the metadata_uuid feature, +> leveraging such code infrastructure for that. The patches are based on +> kernel 6.3 and were tested both in a virtual machine as well as in the +> Steam Deck. Comments, suggestions and overall feedback is greatly +> appreciated - thanks in advance! +> + +**[GIT PULL: sysctl changes for v6.4-rc4 v2](http://lore.kernel.org/linux-fsdevel/ZFKzZeAs5Mdfv5ha@bombadil.infradead.org/)** + +> +> As mentioned on my first pull request for sysctl-next, for v6.4-rc1 +> we're very close to being able to deprecating register_sysctl_paths(). +> I was going to assess the situation after the first week of the merge +> window. +> +> That time is now and things are looking good. 
We only have one stragglers +> on the patch which had already an ACK for so I'm picking this up here now and +> the last patch is the one that uses an axe. Some careful eyeballing would +> be appreciated by others. If this doesn't get properly reviewed I can also +> just hold off on this in my tree for the next merge window. Either way is +> fine by me. +> +> I have boot tested the last patch and 0-day build completed successfully. +> + +**[v1: block atomic writes](http://lore.kernel.org/linux-fsdevel/20230503183821.1473305-1-john.g.garry@oracle.com/)** + +> This series introduces a new proposal to implementing atomic writes in the +> kernel. +> +> This series takes the approach of adding a new "atomic" flag to each of +> pwritev2() and iocb->ki_flags - RWF_ATOMIC and IOCB_ATOMIC, respectively. +> When set, these indicate that we want the write issued "atomically". I +> have seen a similar flag for pwritev2() touted on the lists previously. +> +> Only direct IO is supported and for block devices and xfs. +> +> The atomic writes feature requires dedicated HW support, like +> SCSI WRITE_ATOMIC_16 command. +> +> The goal here is to provide an interface that allow applications use +> application-specific block sizes larger than logical block size +> reported by the storage device or larger than filesystem block size as +> reported by stat(). +> +> With this new interface, application blocks will never be torn or +> fractured. For a power fail, for each individual application block, all or +> none of the data to be written. A racing atomic write and read will mean +> that the read sees all the old data or all the new data, but never a mix +> of old and new. 
+> + +**[v4: fs: allow to mount beneath top mount](http://lore.kernel.org/linux-fsdevel/20230202-fs-move-mount-replace-v4-0-98f3d80d7eaa@kernel.org/)** + +> More common use-cases will just be things like: +> +> mount -t btrfs /dev/sdA /mnt +> mount -t xfs /dev/sdB --beneath /mnt +> umount /mnt +> +> after which we'll have updated from a btrfs filesystem to a xfs +> filesystem without ever revealing the underlying mountpoint. + +**[v24: xfs: online repair for fs summary counters with exclusive fsfreeze](http://lore.kernel.org/linux-fsdevel/168308293319.734377.10454919162350827812.stgit@frogsfrogsfrogs/)** + +> A longstanding deficiency in the online fs summary counter scrubbing +> code is that it hasn't any means to quiesce the incore percpu counters +> while it's running. There is no way to coordinate with other threads +> are reserving or freeing free space simultaneously, which leads to false +> error reports. Right now, if the discrepancy is large, we just sort of +> shrug and bail out with an incomplete flag, but this is lame. +> +> For repair activity, we actually /do/ need to stabilize the counters to +> get an accurate reading and install it in the percpu counter. To +> improve the former and enable the latter, allow the fscounters online +> fsck code to perform an exclusive mini-freeze on the filesystem. The +> exclusivity prevents userspace from thawing while we're running, and the +> mini-freeze means that we don't wait for the log to quiesce, which will +> make both speedier. +> + +**[v1: sysctl: death to register_sysctl_paths()](http://lore.kernel.org/linux-fsdevel/20230503023329.752123-1-mcgrof@kernel.org/)** + +> +> As mentioned on my first pull request for sysctl-next, for v6.4-rc1 +> we're very close to being able to deprecating register_sysctl_paths(). +> I was going to assess the situation after the first week of the merge +> window. +> +> That time is now and things are looking good. 
We only have one stragglers +> on the patch which had already an ACK for so I'm picking this up here now and +> the last patch is the one that uses an axe. Some careful eyeballing would +> be appreciated by others. If this doesn't get properly reviewed I can also +> just hold off on this in my tree for the next merge window. Either way is +> fine by me. +> +> I have boot tested the last patch and 0-day build is ongoing. You can give +> it a day for a warm fuzzy build test result. +> + +**[v1: Rework locking when rendering mountinfo cgroup paths](http://lore.kernel.org/linux-fsdevel/20230502133847.14570-1-mkoutny@suse.com/)** + +> Idea for these modification came up when css_set_lock seemed unneeded in +> cgroup_show_path. +> +> It's a delicate change, so the deciding factor was when cgroup_show_path popped +> up also in some profiles of frequent mountinfo readers. +> +> The idea is to trade the exclusive css_set_lock for the shared +> namespace_sem when rendering cgroup paths. Details are described more in +> individual commits. +> + +**[v2: Prepare for supporting more filesystems with fanotify](http://lore.kernel.org/linux-fsdevel/20230502124817.3070545-1-amir73il@gmail.com/)** + +> +> Following v2 incorporates a few fixes and ACKs from review of v1 [1]. +> +> While fanotify relaxes the requirements for filesystems to support +> reporting fid to require only the ->encode_fh() operation, there are +> currently no new filesystems that meet the relaxed requirements. +> +> Patches to add ->encode_fh() to overlay with default configuation +> are available on my github branch [2]. I will re-post them after +> this patch set will be approved. +> +> Based on the discussion on the UAPI alternatives, I kept the +> AT_HANDLE_FID UAPI, which seems the simplest of them all. 
+> +> There is an LTP test [3] that tests reporting fid from overlayfs, +> which also demonstrates the use of AT_HANDLE_FID for requesting a +> non-decodeable file handle by userspace and there is a man page +> draft [4] for the documentation of the AT_HANDLE_FID flags. +> + +**[v1: FUSE: add another flag to support shared mmap in FOPEN_DIRECT_IO mode](http://lore.kernel.org/linux-fsdevel/5683716d-9b1d-83d6-9dd1-a7ad3d05cbb1@linux.dev/)** + +> From discussion with Bernd, I get that FOPEN_DIRECT_IO is designed for +> those user cases where users want strong coherency like network +> filesystems, where one server serves multiple remote clients. And thus +> shared mmap is disabled since local page cache existence breaks this +> kind of coherency. +> +> But here our use case is one virtiofs daemon serve one guest vm, We use +> FOPEN_DIRECT_IO to reduce memory footprint not for coherency. So we +> expect shared mmap works in this case. Here I suggest/am implementing +> adding another flag to indicate this kind of cases----use +> FOPEN_DIRECT_IO not for coherency----so that shared mmap works. +> + +**[v1: Memory allocation profiling](http://lore.kernel.org/linux-fsdevel/20230501165450.15352-1-surenb@google.com/)** + +> Memory allocation profiling infrastructure provides a low overhead +> mechanism to make all kernel allocations in the system visible. It can be +> used to monitor memory usage, track memory hotspots, detect memory leaks, +> identify memory regressions. +> +> To keep the overhead to the minimum, we record only allocation sizes for +> every allocation in the codebase. With that information, if users are +> interested in more detailed context for a specific allocation, they can +> enable in-depth context tracking, which includes capturing the pid, tgid, +> task name, allocation size, timestamp and call stack for every allocation +> at the specified code location. 
+> + +**[v2: permit write-sealed memfd read-only shared mappings](http://lore.kernel.org/linux-fsdevel/cover.1682890156.git.lstoakes@gmail.com/)** + +> The man page for fcntl() describing memfd file seals states the following +> about F_SEAL_WRITE:- +> +> Furthermore, trying to create new shared, writable memory-mappings via +> mmap(2) will also fail with EPERM. +> +> With emphasis on _writable_. In turns out in fact that currently the kernel +> simply disallows _all_ new shared memory mappings for a memfd with +> F_SEAL_WRITE applied, rendering this documentation inaccurate. +> + +#### Network devices + +**[v2: Make iscsid-kernel communications namespace-aware](http://lore.kernel.org/netdev/20230506232930.195451-1-cleech@redhat.com/)** + +> This set of patches modifies the kernel iSCSI initiator communications +> so that they are namespace-aware. The goal is to allow multiple iSCSI +> daemon (iscsid) to run at once as long as they are in separate +> namespaces, and so that iscsid can run in containers. +> +> Container runtime environments seem to want to containerize their own +> components, and there have been complaints about the need to run iscsid +> from the host network namespace. There are still priviledged +> capabilities needed for iscsid, but these changes address the namespace +> issue. +> +> I've tested with iscsi_tcp and iser over rxe with an unmodified iscsid +> running in a podman container. +> +> Note that with iscsi_tcp, the connected socket will keep the network +> namespace alive after container exit. The namespace will exit once the +> connection terminates, and I'd recommend running with a iSCSI +> noop_out_timeout set to error out the connection after the routing has +> been removed. +> + +**[v1: net-next: net: openvswitch: Use struct_size()](http://lore.kernel.org/netdev/e7746fbbd62371d286081d5266e88bbe8d3fe9f0.1683388991.git.christophe.jaillet@wanadoo.fr/)** + +> Use struct_size() instead of hand writing it.
+> This is less verbose and more informative. +> + +**[v1: can: kvaser_usb_leaf: Implement CAN 2.0 raw DLC functionality.](http://lore.kernel.org/netdev/20230506105529.4023-1-carsten.schmidt-achim@t-online.de/)** + +**[v7: bpf-next: Introduce a new kfunc of bpf_task_under_cgroup](http://lore.kernel.org/netdev/20230506031545.35991-1-zhoufeng.zf@bytedance.com/)** + +> Trace sched related functions, such as enqueue_task_fair, it is necessary to +> specify a task instead of the current task which within a given cgroup. +> + +**[v1: virtio_net: set default mtu to 1500 when 'Device maximum MTU' bigger than 1500](http://lore.kernel.org/netdev/20230506021529.396812-1-chenh@yusur.tech/)** + +> When VIRTIO_NET_F_MTU(3) Device maximum MTU reporting is supported. +> If offered by the device, device advises driver about the value of its +> maximum MTU. If negotiated, the driver uses mtu as the maximum +> MTU value. But there the driver also uses it as default mtu, +> some devices may have a maximum MTU greater than 1500, this may +> cause some large packages to be discarded, so I changed the MTU to a more +> general 1500 when 'Device maximum MTU' bigger than 1500. +> + +**[v1: wifi: mwifiex: Use default @max_active for workqueues](http://lore.kernel.org/netdev/ZFWI3PpJXeXXnHzi@slm.duckdns.org/)** + +> These workqueues only host a single work item and thus doen't need explicit +> concurrency limit. Let's use the default @max_active. This doesn't cost +> anything and clearly expresses that @max_active doesn't matter. +> + +**[v1: wifi: iwlwifi: Use default @max_active for trans_pcie->rba.alloc_wq](http://lore.kernel.org/netdev/ZFWIpN7HN431MVSI@slm.duckdns.org/)** + +> trans_pcie->rba.alloc_wq only hosts a single work item and thus doesn't need +> explicit concurrency limit. Let's use the default @max_active. This doesn't +> cost anything and clearly expresses that @max_active doesn't matter. 
+> + +**[GIT PULL: Networking for v6.4-rc1](http://lore.kernel.org/netdev/20230505214917.1453870-1-kuba@kernel.org/)** + +> +> Current release - regressions: +> +> - sched: act_pedit: free pedit keys on bail from offset check +> +> Current release - new code bugs: +> +> - pds_core: +> - Kconfig fixes (DEBUGFS and AUXILIARY_BUS) +> - fix mutex double unlock in error path +> +> Previous releases - regressions: +> +> - sched: cls_api: remove block_cb from driver_list before freeing +> +> - nf_tables: fix ct untracked match breakage +> +> - eth: mtk_eth_soc: drop generic vlan rx offload +> +> - sched: flower: fix error handler on replace +> +> Previous releases - always broken: +> +> - tcp: fix skb_copy_ubufs() vs BIG TCP +> +> - ipv6: fix skb hash for some RST packets +> +> - af_packet: don't send zero-byte data in packet_sendmsg_spkt() +> +> - rxrpc: timeout handling fixes after moving client call connection +> to the I/O thread +> +> - ixgbe: fix panic during XDP_TX with > 64 CPUs +> +> - igc: RMW the SRRCTL register to prevent losing timestamp config +> +> - dsa: mt7530: fix corrupt frames using TRGMII on 40 MHz XTAL MT7621 +> +> - r8152: +> - fix flow control issue of RTL8156A +> - fix the poor throughput for 2.5G devices +> - move setting r8153b_rx_agg_chg_indicate() to fix coalescing +> - enable autosuspend +> +> - ncsi: clear Tx enable mode when handling a Config required AEN +> +> - octeontx2-pf: macsec: fixes for CN10KB ASIC rev +> +> Misc: +> +> - 9p: remove INET dependency +> + +**[v2: net-next: netfilter: nft_set_pipapo: Use struct_size()](http://lore.kernel.org/netdev/687973f7f0f77a456ee2ebabd75cec61cba2eb98.1683321933.git.christophe.jaillet@wanadoo.fr/)** + +> Use struct_size() instead of hand writing it. +> This is less verbose and more informative. 
+> + +**[v1: RDMA/mana_ib: Use v2 version of cfg_rx_steer_req to enable RX coalescing](http://lore.kernel.org/netdev/1683312708-24872-1-git-send-email-longli@linuxonhyperv.com/)** + +> With RX coalescing, one CQE entry can be used to indicate multiple packets +> on the receive queue. This saves processing time and PCI bandwidth over +> the CQ. +> + +**[v1: siw on tunnel devices](http://lore.kernel.org/netdev/168330051600.5953.11366152375575299483.stgit@oracle-102.nfsv4bat.org/)** + +> Chalk this one up to yet another crazy idea. +> +> At NFS testing events, we'd like to test NFS/RDMA over the event's +> private network. We can do that with iWARP using siw from guests. +> +> If the guest itself is on the VPN, that means siw's slave device +> is a tun device. Such devices have no MAC address. That breaks the +> RDMA core's ability to find the correct egress device for siw when +> given a source IP address. +> +> We've worked around this in the past with various software hacks, +> but we'd rather see full support for this capability in stock +> kernels. +> +> A direct and perhaps naive way to do that is to give loopback and +> tun devices their own artificial MAC addresses for this purpose. +> + +**[v1: iproute2-next: mptcp: add support for implicit flag](http://lore.kernel.org/netdev/1eaea070b52f2db1f310506ac49f4b5d51b5704c.1683294873.git.aclaudi@redhat.com/)** + +> Kernel supports implicit flag since commit d045b9eb95a9 ("mptcp: +> introduce implicit endpoints"), included in v5.18. +> +> Let's add support for displaying it to iproute2. 
+> +> Before this change: +> $ ip mptcp endpoint show +> 10.0.2.2 id 1 rawflags 10 +> +> After this change: +> $ ip mptcp endpoint show +> 10.0.2.2 id 1 implicit +> + +**[v1: ipsec: af_key: Reject optional tunnel/BEET mode templates in outbound policies](http://lore.kernel.org/netdev/46fcb205-989e-4ea7-463d-e72b85db9e71@strongswan.org/)** + +> xfrm_state_find() uses `encap_family` of the current template with +> the passed local and remote addresses to find a matching state. +> If an optional tunnel or BEET mode template is skipped in a mixed-family +> scenario, there could be a mismatch causing an out-of-bounds read as +> the addresses were not replaced to match the family of the next template. +> +> While there are theoretical use cases for optional templates in outbound +> policies, the only practical one is to skip IPComp states in inbound +> policies if uncompressed packets are received that are handled by an +> implicitly created IPIP state instead. +> + +**[v1: ipsec: xfrm: Reject optional tunnel/BEET mode templates in outbound policies](http://lore.kernel.org/netdev/5d5bf4d9-5b63-ae0d-2f65-770e911ea7d6@strongswan.org/)** + +> xfrm_state_find() uses `encap_family` of the current template with +> the passed local and remote addresses to find a matching state. +> If an optional tunnel or BEET mode template is skipped in a mixed-family +> scenario, there could be a mismatch causing an out-of-bounds read as +> the addresses were not replaced to match the family of the next template. +> +> While there are theoretical use cases for optional templates in outbound +> policies, the only practical one is to skip IPComp states in inbound +> policies if uncompressed packets are received that are handled by an +> implicitly created IPIP state instead. 
+> + +**[v1: net: socket: Use fdget() and fdput()](http://lore.kernel.org/netdev/202305051706416319733@zte.com.cn/)** + +> By using the fdget function, the socket object, can be quickly obtained +> from the process's file descriptor table without the need to obtain the +> file descriptor first before passing it as a parameter to the fget +> function. +> + +**[v2: Add motorcomm phy pad-driver-strength-cfg support](http://lore.kernel.org/netdev/20230505090558.2355-1-samin.guo@starfivetech.com/)** + +> The motorcomm phy (YT8531) supports the ability to adjust the drive +> strength of the rx_clk/rx_data, and the default strength may not be +> suitable for all boards. So add configurable options to better match +> the boards.(e.g. StarFive VisionFive 2) +> +> The first patch adds a description of dt-bingding, and the second patch adds +> YT8531's parsing and settings for pad-driver-strength-cfg. +> + +**[v6: net-next: TXGBE PHYLINK support](http://lore.kernel.org/netdev/20230505074228.84679-1-jiawenwu@trustnetic.com/)** + +> Implement I2C, SFP, GPIO and PHYLINK to setup TXGBE link. +> +> Because our I2C and PCS are based on Synopsys Designware IP-core, extend +> the i2c-designware and pcs-xpcs driver to realize our functions. +> + +**[v1: vhost_net: Use fdget() and fdput()](http://lore.kernel.org/netdev/202305051424047152799@zte.com.cn/)** + +> convert the fget()/fput() uses to fdget()/fdput(). +> + +**[v6: can: usb: f81604: add Fintek F81604 support](http://lore.kernel.org/netdev/20230505022317.22417-1-peter_hong@fintek.com.tw/)** + +> This patch adds support for Fintek USB to 2CAN controller. +> + +#### Security hardening + +**[v1: Hypervisor-Enforced Kernel Integrity](http://lore.kernel.org/linux-hardening/20230505152046.6575-1-mic@digikod.net/)** + +> This patch series is a proof-of-concept that implements new KVM features +> (extended page tracking, MBEC support, CR pinning) and defines a new API to +> protect guest VMs. No VMM (e.g., Qemu) modification is required.
+> +> The main idea being that kernel self-protection mechanisms should be delegated +> to a more privileged part of the system, hence the hypervisor. It is still the +> role of the guest kernel to request such restrictions according to its +> configuration. The high-level security guarantees provided by the hypervisor +> are semantically the same as a subset of those the kernel already enforces on +> itself (CR pinning hardening and memory page table protections), but with much +> higher guarantees. +> +> We'd like the mainline kernel to support such hardening features leveraging +> virtualization. We're looking for reviews and comments that can help mainline +> these two parts: the KVM implementation and the guest kernel API layer designed +> to support different hypervisors. The struct heki_hypervisor enables to plug in +> + +**[v1: Compiler Attributes: Add __counted_by macro](http://lore.kernel.org/linux-hardening/20230504181636.never.222-kees@kernel.org/)** + +> In an effort to annotate all flexible array members with their run-time +> size information, the "element_count" attribute is being introduced by +> Clang[1] and GCC[2] in future releases. This annotation will provide +> the CONFIG_UBSAN_BOUNDS and CONFIG_FORTIFY_SOURCE features the ability +> to perform run-time bounds checking on otherwise unknown-size flexible +> arrays. +> +> Even though the attribute is under development, we can start the +> annotation process in the kernel. This requires defining a macro for +> it, even if we have to change the name of the actual attribute later. +> Since it is likely that this attribute may change its name to "counted_by" +> in the future (to better align with a future total bytes "sized_by" +> attribute), name the wrapper macro "__counted_by", which also reads more +> clearly (and concisely) in structure definitions. 
+> +> [1] https://reviews.llvm.org/D148381 +> [2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108896 +> + +#### Async IO + +**[v1: io_uring: set plug tags for same file](http://lore.kernel.org/io-uring/20230504162427.1099469-1-kbusch@meta.com/)** + +> io_uring tries to optimize allocating tags by hinting to the plug how +> many it expects to need for a batch instead of allocating each tag +> individually. But io_uring submission queues may have a mix of many +> devices for io, so the number of io's counted may be overestimated. This +> can lead to allocating too many tags, which adds overhead to finding +> that many contiguous tags, freeing up the ones we didn't use, and may +> starve out other users that can actually use them. +> +> When starting a new batch of uring commands, count only commands that +> match the file descriptor of the first seen for this optimization. +> + +**[v4: io_uring: Pass the whole sqe to commands](http://lore.kernel.org/io-uring/20230504121856.904491-1-leitao@debian.org/)** + +> These three patches prepare for the sock support in the io_uring cmd, as +> described in the following RFC: +> +> Since the support linked above depends on other refactors, such as the sock +> ioctl() sock refactor, I would like to start integrating patches that have +> consensus and can bring value right now. This will also reduce the +> patchset size later. +> +> Regarding to these three patches, they are simple changes that turn +> io_uring cmd subsystem more flexible (by passing the whole SQE to the +> command), and cleaning up an unnecessary compile check. +> +> These patches were tested by creating a file system and mounting an NVME disk +> using ubdsrv/ublkb0. +> + +**[v12: io_uring: add napi busy polling support](http://lore.kernel.org/io-uring/20230502165332.2075091-1-shr@devkernel.io/)** + +> This adds the napi busy polling support in io_uring.c. It adds a new +> napi_list to the io_ring_ctx structure.
This list contains the list of +> napi_id's that are currently enabled for busy polling. This list is +> used to determine which napi id's enabled busy polling. For faster +> access it also adds a hash table. +> +> When a new napi id is added, the hash table is used to locate if +> the napi id has already been added. When processing the busy poll +> loop the list is used to process the individual elements. +> +> io-uring allows specifying two parameters: +> - busy poll timeout and +> - prefer busy poll to call of io_napi_busy_loop() +> This sets the above parameters for the ring. The settings are passed +> with a new structure io_uring_napi. +> +> There is also a corresponding liburing patch series, which enables this +> feature. The name of the series is "liburing: add add api for napi busy +> poll timeout". It also contains two programs to test this. +> +> Testing has shown that the round-trip times are reduced to 38us from +> 55us by enabling napi busy polling with a busy poll timeout of 100us. +> More detailed results are part of the commit message of the first +> patch. +> + +**[v1: io_uring: undeprecate epoll_ctl support](http://lore.kernel.org/io-uring/20230501185240.352642-1-info@bnoordhuis.nl/)** + +> Libuv recently started using it so there is at least one consumer now. +> + +**[v1: Rethinking splice](http://lore.kernel.org/io-uring/cover.1682701588.git.asml.silence@gmail.com/)** + +> IORING_OP_SPLICE has problems, many of them are fundamental and rooted +> in the uapi design, see the patch 8 description. This patchset introduces +> a different approach, which came from discussions about splices +> and fused commands and absorbed ideas from both of them. We remove +> reliance onto pipes and registering "spliced" buffers with data as an +> io_uring's registered buffer. Then the user can use it as a usual +> registered buffer, e.g. pass it to IORING_OP_WRITE_FIXED.
+> +> Once a buffer is released, it'll be returned back to the file it +> originated from via a callback. It's carried on on the level of the +> entire buffer rather than on per-page basis as with splice, which, +> as noted by Ming, will allow more optimisations. +> +> The communication with the target file is done by a new fops callback, +> however the end mean of getting a buffer might change. It also peels +> layers of code compared to splice requests, which helps it to be more +> flexible and support more cases. For instance, Ming has a case where +> it's beneficial for the target file to provide a buffer to be filled +> with read/recv/etc. requests and then returned back to the file. +> + +**[v1: io_uring attached nvme queue](http://lore.kernel.org/io-uring/20230429093925.133327-1-joshi.k@samsung.com/)** + +> This series shows one way to do what the title says. +> +> This puts up a more direct/lean path that enables +> - submission from io_uring SQE to NVMe SQE +> - completion from NVMe CQE to io_uring CQE +> Essentially cutting the hoops (involving request/bio) for nvme io path. +> +> Also, io_uring ring is not to be shared among application threads. +> Application is responsible for building the sharing (if it feels the +> need). This means ring-associated exclusive queue can do away with some +> synchronization costs that occur for shared queue. +> +> Primary objective is to amp up of efficiency of kernel io path further +> (towards PCIe gen N, N+1 hardware). +> And we are seeing some asks too [1]. +> + +#### Rust For Linux + +**[v2: rust: str: add conversion from `CStr` to `CString`](http://lore.kernel.org/rust-for-linux/20230503141016.683634-1-aliceryhl@google.com/)** + +> These methods can be used to copy the data in a temporary c string into +> a separate allocation, so that it can be accessed later even if the +> original is deallocated. +> +> The API in this change mirrors the standard library API for the `&str` +> and `String` types.
The `ToOwned` trait is not implemented because it +> assumes that allocations are infallible. +> + +**[v1: Rust null block driver](http://lore.kernel.org/rust-for-linux/20230503090708.2524310-1-nmi@metaspace.dk/)** + +> +> A null block driver is a good opportunity to evaluate Rust bindings for the +> block layer. It is a small and simple driver and thus should be simple to reason +> about. Further, the null block driver is not usually deployed in production +> environments. Thus, it should be fairly straight forward to review, and any +> potential issues are not going to bring down any production workloads. +> + +**[v1: rust: error: add ERESTARTSYS error code](http://lore.kernel.org/rust-for-linux/20230503083941.499090-1-aliceryhl@google.com/)** + +> This error code was probably excluded here originally because it never +> actually reaches user programs when a syscall returns it. However, from +> the perspective of a kernel driver, it is still a perfectly valid error +> type, that the driver might need to return. E.g., this can be necessary +> when a signal occurs during sleep. +> + +**[v1: rust: error: allow specifying error type on `Result`](http://lore.kernel.org/rust-for-linux/20230502124015.356001-1-aliceryhl@google.com/)** + +> Currently, if the `kernel::error::Result` type is in scope (which is +> often is, since it's in the kernel's prelude), you cannot write +> `Result` when you want to use a different error +> type than `kernel::error::Error`. +> +> To solve this we change the error type from being hard-coded to just +> being a default generic parameter. This still lets you write `Result` +> when you just want to use the `Error` error type, but also lets you +> write `Result` when necessary. 
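The "default generic parameter" idea from the `Result` patch above can be illustrated with a small, self-contained Rust sketch. This is not the kernel's code: `Error` below is a hypothetical stand-in for `kernel::error::Error`, and `OtherError`, `check`, `parse_len` and `try_lock` are made-up names used only for illustration.

```rust
/// Hypothetical stand-in for `kernel::error::Error`, which wraps a negative errno.
#[derive(Debug, PartialEq)]
pub struct Error(i32);

/// The error type is a defaulted generic parameter instead of being hard-coded,
/// so `Result<T>` still means `Result<T, Error>` and bare `Result` means `Result<(), Error>`.
pub type Result<T = (), E = Error> = core::result::Result<T, E>;

/// Hypothetical driver-specific error type.
#[derive(Debug, PartialEq)]
pub enum OtherError {
    Busy,
}

/// Both defaults applied: the return type is `Result<(), Error>`.
pub fn check(ok: bool) -> Result {
    if ok {
        Ok(())
    } else {
        Err(Error(-22)) // -EINVAL
    }
}

/// Only the error type is defaulted: the return type is `Result<usize, Error>`.
pub fn parse_len(x: i32) -> Result<usize> {
    if x < 0 {
        Err(Error(-22)) // -EINVAL
    } else {
        Ok(x as usize)
    }
}

/// Overrides the error type while the `Result` alias is still in scope.
pub fn try_lock(locked: bool) -> Result<(), OtherError> {
    if locked {
        Err(OtherError::Busy)
    } else {
        Ok(())
    }
}
```

Because the defaults only fill in omitted parameters, existing signatures such as `-> Result` and `-> Result<usize>` keep compiling unchanged, while a function that needs a different error type simply names it as the second parameter.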
+> + +#### BPF + +**[v2: bpf-next: bpftool: Support bpffs mountpoint as pin path for prog loadall](http://lore.kernel.org/bpf/1683342439-3677-1-git-send-email-yangpc@wangsu.com/)** + +> Currently, when using prog loadall, if the pin path is a bpffs +> mountpoint, bpffs will be repeatedly mounted to the parent directory +> of the bpffs mountpoint path. +> +> For example, +> $ bpftool prog loadall test.o /sys/fs/bpf +> currently bpffs will be repeatedly mounted to /sys/fs. +> + +**[v3: bpf-next: Dynptr Verifier Adjustments](http://lore.kernel.org/bpf/20230506013134.2492210-1-drosen@google.com/)** + +> These patches relax a few verifier requirements around dynptrs. +> Patches 1-3 are unchanged from v2, apart from rebasing +> Patch 4 is the same as in v1, see +> https://lore.kernel.org/bpf/CA+PiJmST4WUH061KaxJ4kRL=fqy3X6+Wgb2E2rrLT5OYjUzxfQ@mail.gmail.com/ +> Patch 5 adds a test for the change in Patch 4 +> + +**[v1: bpf: netdev: init the offload table earlier](http://lore.kernel.org/bpf/20230505215836.491485-1-kuba@kernel.org/)** + +> Some netdevices may get unregistered before late_initcall(), +> we have to move the hashtable init earlier. +> + +**[v1: bpf-next: RFC: bpf: query effective progs without cgroup_mutex](http://lore.kernel.org/bpf/20230505184550.1386802-1-sdf@google.com/)** + +> We're observing some stalls on the heavily loaded machines +> in the cgroup_bpf_prog_query path. This is likely due to +> being blocked on cgroup_mutex. +> +> IIUC, the cgroup_mutex is there mostly to protect the non-effective +> fields (cgrp->bpf.progs) which might be changed by the update path. +> For the BPF_F_QUERY_EFFECTIVE case, all we need is to rcu_dereference +> a bunch of pointers (and keep them around for consistency), so +> let's do it. +> +> Sending out as an RFC because it looks a bit ugly. It would also +> be nice to handle non-effective case locklessly as well, but it +> might require a larger rework. 
+> + +**[v3: bpf-next: Add precision propagation for subprogs and callbacks](http://lore.kernel.org/bpf/20230505043317.3629845-1-andrii@kernel.org/)** + +> +> This patch set teaches BPF verifier to support SCALAR precision +> backpropagation across multiple frames (for subprogram calls and callback +> simulations) and addresses most practical situations (SCALAR stack +> loads/stores using registers other than r10 being the last remaining +> limitation, though thankfully rarely used in practice). +> + +**[v4: bpf-next: bpf: Don't EFAULT for {g,s}setsockopt with wrong optlen](http://lore.kernel.org/bpf/20230504184349.3632259-1-sdf@google.com/)** + +> optval larger than PAGE_SIZE leads to EFAULT if the BPF program +> isn't careful enough. This is often overlooked and might break +> completely unrelated socket options. Instead of EFAULT, +> let's ignore BPF program buffer changes. See the first patch for +> more info. +> +> In addition, clearly document this corner case and reset optlen +> in our selftests (in case somebody copy-pastes from them). +> + +**[v3: net: bonding: add xdp_features support](http://lore.kernel.org/bpf/5969591cfc2336e45de08e1d272bdcee30942fb7.1683191281.git.lorenzo@kernel.org/)** + +> Introduce xdp_features support for bonding driver according to the slave +> devices attached to the master one. xdp_features is required whenever we +> want to xdp_redirect traffic into a bond device and then into selected +> slaves attached to it. +> + +**[v1: bpf-next: bpf_refcount followups (part 1)](http://lore.kernel.org/bpf/20230504053338.1778690-1-davemarchevsky@fb.com/)** + +> This series is the first of two (or more) followups to address issues in the +> bpf_refcount shared ownership implementation discovered by Kumar. +> Specifically, this series addresses the "bpf_refcount_acquire on non-owning ref +> in another tree" scenario described in [0], and does _not_ address issues +> raised in [1]. Further followups will address the other issues. 
+> + +**[v7: bpf-next: bpf: Add socket destroy capability](http://lore.kernel.org/bpf/20230503225351.3700208-1-aditi.ghag@isovalent.com/)** + +> This patch adds the capability to destroy sockets in BPF. We plan to use +> the capability in Cilium to force client sockets to reconnect when their +> remote load-balancing backends are deleted. The other use case is +> on-the-fly policy enforcement where existing socket connections prevented +> by policies need to be terminated. +> + +**[[RFC/PATCH] libbpf: Store zero fd to fd_array for loader kfunc relocation](http://lore.kernel.org/bpf/20230503172441.2138444-1-jolsa@kernel.org/)** + +> When moving some of the test kfuncs to bpf_testmod I hit an issue +> when some of the object's kfuncs are in module and some in vmlinux. +> +> The problem is that both vmlinux and module kfuncs get btf_fd_idx +> index into fd_array, but we store to it the BTF fd value only for +> module's kfunc. +> +> Then after the program is loaded we check if fd_array[btf_fd_idx] != 0 +> and close the fd. +> +> When the object has kfuncs from both vmlinux and module, the fd from +> fd_array[btf_fd_idx] from previous load will be there for vmlinux kfunc +> and we close unrelated fd (of the program we just loaded in my case). +> +> Not sure if there's easier way to clear the fd_array between the +> loads, but the change below seems to fix the issue for me. +> + +**[v1: bpf-next: Centralize BPF permission checks](http://lore.kernel.org/bpf/20230502230619.2592406-1-andrii@kernel.org/)** + +> This patch set refactors BPF subsystem permission checks for BPF maps and +> programs, localizes them in one place, and ensures all parts of BPF ecosystem +> (BPF verifier and JITs, and their supporting infra) use recorded effective +> capabilities, stored in respective bpf_map or bpf_prog structs, for further +> decision making. 
+> +> This allows for more explicit and centralized handling of BPF-related +> capabilities and makes for simpler further BPF permission model evolution, to +> be proposed and discussed in follow up patch sets. +> + +**[v1: bpf-next: bpf: Emit struct bpf_tcp_sock type in vmlinux BTF](http://lore.kernel.org/bpf/20230502180543.1832140-1-yhs@fb.com/)** + +> In one of our internal testing, we found a case where +> - uapi struct bpf_tcp_sock is in vmlinux.h where vmlinux.h is not +> generated from the testing kernel +> - struct bpf_tcp_sock is not in vmlinux BTF +> +> The above combination caused bpf load failure as the following +> memory access +> struct bpf_tcp_sock *tcp_sock = ...; +> ... tcp_sock->snd_cwnd ... +> needs CORE relocation but the relocation cannot be resolved since +> the kernel BTF does not have corresponding type. +> +> Similar to other previous cases (nf_conn___init, tcp6_sock, mctcp_sock, etc.), +> add the type to vmlinux BTF with BTF_EMIT_TYPE macro. +> + +**[v9: tracing: Add fprobe/tracepoint events](http://lore.kernel.org/bpf/168299383880.3242086.7182498102007986127.stgit@mhiramat.roam.corp.google.com/)** + +> With this fprobe events, we can continue to trace function entry/exit +> even if the CONFIG_KPROBES_ON_FTRACE is not available. Since +> CONFIG_KPROBES_ON_FTRACE requires the CONFIG_DYNAMIC_FTRACE_WITH_REGS, +> it is not available if the architecture only supports +> CONFIG_DYNAMIC_FTRACE_WITH_ARGS (e.g. arm64). And that means kprobe +> events can not probe function entry/exit effectively on such architecture. +> But this problem can be solved if the dynamic events supports fprobe events +> because fprobe events doesn't use kprobe but ftrace via fprobe. 
+> + +**[v3: bpf-next: Handle immediate reuse in bpf memory allocator](http://lore.kernel.org/bpf/20230429101215.111262-1-houtao@huaweicloud.com/)** + +> As discussed in v1, currently the freed objects in bpf memory allocator +> may be reused immediately by the new allocation, it introduces +> use-after-bpf-ma-free problem for non-preallocated hash map and makes +> lookup procedure return incorrect result. The immediate reuse also makes +> introducing new use case more difficult (e.g. qp-trie). +> +> The patch series tries to solve these problems by introducing +> BPF_MA_{REUSE|FREE}_AFTER_RCU_GP in bpf memory allocator. For +> REUSE_AFTER_GP, the freed objects are reused only after one RCU grace +> period and may be freed by bpf memory allocator after another +> RCU-tasks-trace grace period. So for bpf programs which care about reuse +> problem, these programs can use bpf_rcu_read_{lock,unlock}() to access +> these objects safely and for those which doesn't care, there will be +> safely use-after-bpf-ma-free because these objects have not been freed +> by bpf memory allocator. FREE_AFTER_GP behavior differently. Instead of +> making the freed elements being reusable after one RCU GP, it directly +> freed these elements back to slab after one RCU GP, so sleepable bpf +> program must use bpf_rcu_read_{lock,unlock}() to access elements +> allocated from FREE_AFTER_GP bpf memory allocator. +> +> Personally I prefer FREE_AFTER_RCU_GP because its implementation is much +> simpler compared with REUSE_AFTER_RCU and its memory usage is also better +> than REUSE_AFTER_GP. But its shortcoming is also obvious, so I want to get +> some feedback before putting in more effort. As usual, comments and +> suggestions are always welcome. + +### Ecosystem News + +#### Qemu + +**[v2: Add RISC-V KVM AIA Support](http://lore.kernel.org/qemu-devel/20230505113946.23433-1-yongxuan.wang@sifive.com/)** + +> This series adds support for KVM AIA in RISC-V architecture.
+> +> In order to test these patches, we require Linux with KVM AIA support which can +> be found in the qemu_kvm_aia branch at https://github.com/yong-xuan/linux.git +> This kernel branch is based on the riscv_aia_v1 branch available at +> https://github.com/avpatel/linux.git, and it also includes two additional +> patches that fix a KVM AIA bug and reply to the query of KVM_CAP_IRQCHIP. +> + +**[v1: riscv-to-apply queue](http://lore.kernel.org/qemu-devel/20230505010241.21812-1-alistair.francis@wdc.com/)** + +> +> First RISC-V PR for 8.1 +> +> * CPURISCVState related cleanup and simplification +> * Refactor Zicond and reuse in XVentanaCondOps +> * Fix invalid riscv,event-to-mhpmcounters entry +> * Support subsets of code size reduction extension +> * Fix itrigger when icount is used +> * Simplification for RVH related check and code style fix +> * Add signature dump function for spike to run ACT tests +> * Rework MISA writing +> * Fix mstatus.MPP related support +> * Use check for relationship between Zdinx/Zhinx{min} and Zfinx +> * Fix the H extension TVM trap +> * A large collection of mstatus sum changes and cleanups +> * Zero init APLIC internal state +> * Implement query-cpu-definitions +> * Restore the predicate() NULL check behavior +> * Fix Guest Physical Address Translation +> * Make sure an exception is raised if a pte is malformed +> * Add Ventana's Veyron V1 CPU +> + +**[v3: linux-user: Add /proc/cpuinfo handler for RISC-V](http://lore.kernel.org/qemu-devel/mvmednx301n.fsf@suse.de/)** + +**[v1: tcg/riscv: Support for Zba, Zbb, Zicond extensions](http://lore.kernel.org/qemu-devel/20230503085657.1814850-1-richard.henderson@linaro.org/)** + +> Based-on: 20230503070656.1746170-1-richard.henderson@linaro.org +> ("v4: tcg: Improve atomicity support") +> +> I've been vaguely following the __hw_probe syscall progress +> in the upstream kernel. 
The initial version only handled +> bog standard F+D and C extensions, which everything expects +> to be present anyway, which was disappointing. But at least +> the basis is there for proper extensions. +> +> In the meantime, probe via sigill. Tested with qemu-on-qemu. +> I understand the Ventana core has all of these, if you'd be +> so kind as to test. +> + +#### U-Boot + +**[v3: SPL NVMe support](http://lore.kernel.org/u-boot/20230504095327.2791676-1-mchitale@ventanamicro.com/)** + +> This patchset adds support to load images of the SPL's next booting stage from a NVMe device. +> + +**[v2: SPL NVme support](http://lore.kernel.org/u-boot/20230502161902.1339861-1-mchitale@ventanamicro.com/)** + +> This patchset adds support to load images of the SPL's next booting stage from a NVMe device. +> + +## 20230501: Issue 44 + +### Kernel News + +#### RISC-V Architecture Support + +**[v1: RISC-V: Export Zba, Zbb to usermode via hwprobe](http://lore.kernel.org/linux-riscv/20230428190609.3239486-1-evan@rivosinc.com/)** + +> This change detects the presence of Zba and Zbb extensions and exports +> them per-hart to userspace via the hwprobe mechanism. Glibc can then use +> these in setting up hwcaps-based library search paths. +> + +**[GIT PULL: RISC-V Patches for the 6.4 Merge Window, Part 1](http://lore.kernel.org/linux-riscv/mhng-57198db1-de34-4dca-be9f-989b1137503e@palmer-ri-x1c9/)** + +> RISC-V Patches for the 6.4 Merge Window, Part 1 +> +> * Support for runtime detection of the Svnapot extension. +> * Support for Zicboz when clearing pages. +> * We've moved to GENERIC_ENTRY. +> * Support for !MMU on rv32 systems. +> * The linear region is now mapped via huge pages. +> * Support for building relocatable kernels. +> * Support for the hwprobe interface. +> * Various fixes and cleanups throughout the tree.
+> + +**[v2: riscv: allow case-insensitive ISA string parsing](http://lore.kernel.org/linux-riscv/tencent_8492B68063042E768C758871A3171FBD2006@qq.com/)** + +> The original motivation for my patch v1[5] is that some SoC generators +> will provide generated DT with illegal ISA string in dt-binding such as +> rocket-chip, which will even cause kernel panic in some cases as I +> mentioned in v1[5]. Now, the rocket-chip has been fixed in PR #3333[6]. +> However, when using some specific version of rocket-chip with +> illegal ISA string in DT, this patchset will also work for parsing +> uppercase letters correctly in DT, thus will have better compatibility. +> + +**[v1: Limit the number of counter returned from SBI.](http://lore.kernel.org/linux-riscv/20230428110256.711352-1-v.v.mitrofanov@yadro.com/)** + +> Perf relies on reliability of SBI. If sth goes wrong the code trusts it. +> It happened due to some debug process that I passed more than +> RISCV_MAX_COUNTERS to perf from SBI. At the first glance there were +> bloating of kalloced variable pmu_ctr_list and counter mask recycle write. +> May be there were some other effects. But anyway it is better to add +> extra check. +> + +**[v1: -next: clk: sifive: Use devm_platform_ioremap_resource()](http://lore.kernel.org/linux-riscv/20230428070005.41192-1-yang.lee@linux.alibaba.com/)** + +> Convert platform_get_resource(),devm_ioremap_resource() to a single +> call to devm_platform_ioremap_resource(), as this is exactly what this +> function does. +> + +**[v2: RISC-V: Align SBI probe implementation with spec](http://lore.kernel.org/linux-riscv/20230427163626.101042-1-ajones@ventanamicro.com/)** + +> sbi_probe_extension() is specified with "Returns 0 if the given SBI +> extension ID (EID) is not available, or 1 if it is available unless +> defined as any other non-zero value by the implementation." +> Additionally, sbiret.value is a long. 
Fix the implementation to +> ensure any nonzero long value is considered a success, rather +> than only positive int values. +> + +**[v1: dt-bindings: riscv: explicitly mention assumption of Zicsr & Zifencei support](http://lore.kernel.org/linux-riscv/20230427-fence-blurred-c92fb69d4137@wendy/)** + +> The dt-binding was defined before the extraction of csr access and +> fence.i into their own extensions, and thus the presence of the I +> base extension implies Zicsr and Zifencei. +> There's no harm in adding them obviously, but for backwards +> compatibility with DTs that existed prior to that extraction, software +> is unable to differentiate between "i" and "i_zicsr_zifencei" without +> any further information. +> + +**[v1: RISC-V: KVM: Ensure SBI extension is enabled](http://lore.kernel.org/linux-riscv/20230426171328.69663-1-ajones@ventanamicro.com/)** + +> Ensure guests can't attempt to invoke SBI extension functions when the +> SBI extension's probe function has stated that the extension is not +> available. +> + +**[v1: Handle multi-letter extensions starting with caps in riscv,isa](http://lore.kernel.org/linux-riscv/20230426-satin-avenging-086d4e79a8dd@wendy/)** + +> Following on from [1] in which Yangyu reported kernel panics for a +> riscv,isa string containing "rv64ima_Zifencei", as the parser got +> confused by the capital letter, here's a small change to the parser to +> handle invalid extensions starting with capital & the removal of some +> inaccurate wording from the dt-binding. +> + +**[v1: dmaengine: xilinx: enable on RISC-V platform](http://lore.kernel.org/linux-riscv/20230426074248.19336-1-zong.li@sifive.com/)** + +> Enable the xilinx dmaengine driver on RISC-V platform. We have verified +> the CDMA on RISC-V platform, enable this configuration to allow build on +> RISC-V. 
+> + +**[v1: Allow case-insensitive RISC-V ISA string](http://lore.kernel.org/linux-riscv/tencent_1647475C9618C390BEC601BE2CC1206D0C07@qq.com/)** + +> According to RISC-V ISA specification, the ISA naming strings are +> case insensitive. The kernel docs require the riscv,isa string must +> be all lowercase to simplify parsing currently. However, this +> limitation is not consistent with RISC-V ISA Spec. +> + +**[v2: riscv: mm: Ensure prot of VM_WRITE and VM_EXEC must be readable](http://lore.kernel.org/linux-riscv/20230425102828.1616812-1-woodrow.shen@sifive.com/)** + +> Commit 8aeb7b17f04e ("RISC-V: Make mmap() with PROT_WRITE imply PROT_READ") +> allows riscv to use mmap with PROT_WRITE only, and meanwhile mmap with w+x +> is also permitted. However, when userspace tries to access this page with +> PROT_WRITE|PROT_EXEC, which causes infinite loop at load page fault as +> well as it triggers soft lockup. According to riscv privileged spec, +> "Writable pages must also be marked readable". The fix to drop the +> `PAGE_COPY_READ_EXEC` and then `PAGE_COPY_EXEC` would be just used instead. +> This aligns the other arches (i.e. arm64) for protection_map. +> + +**[v1: Expose the isa-string via the AT_BASE_PLATFORM aux vector](http://lore.kernel.org/linux-riscv/20230424194911.264850-1-heiko.stuebner@vrull.eu/)** + +> The hwprobing infrastructure was merged recently [0] and contains a +> mechanism to probe both extensions but also microarchitectural features +> on a per-core level of detail. +> + +**[v1: RESEND: dt-bindings: riscv: add sv57 mmu-type](http://lore.kernel.org/linux-riscv/20230424-rival-habitual-478567c516f0@spud/)** + +> Dumping the dtb from new versions of QEMU warns that sv57 is an +> undocumented mmu-type. The kernel has supported sv57 for about a year, +> so bring it into the fold.
+> + +**[v5: Add STG/ISP/VOUT clock and reset drivers for StarFive JH7110](http://lore.kernel.org/linux-riscv/20230424135409.6648-1-xingyu.wu@starfivetech.com/)** + +> This patch series is based on the basic JH7110 SYSCRG/AONCRG +> drivers and adds new partial clock drivers and reset support +> covering System-Top-Group(STG), Image-Signal-Process(ISP) +> and Video-Output(VOUT) for the StarFive JH7110 RISC-V SoC. These +> clocks and resets could be used by DMA, VIN and Display modules. +> + +**[v10: riscv: Allow to downgrade paging mode from the command line](http://lore.kernel.org/linux-riscv/20230424092313.178699-1-alexghiti@rivosinc.com/)** + +> This new version gets rid of the limitation that prevented KASAN kernels +> from using the newly introduced parameters. +> +> While looking into KASLR, I fell onto commit aacd149b6238 ("arm64: head: +> avoid relocating the kernel twice for KASLR"): it allows to use the fdt +> functions very early in the boot process with KASAN enabled by simply +> compiling a new version of those functions without instrumentation. +> + +**[v1: riscv: replace deprecated scall with ecall](http://lore.kernel.org/linux-riscv/20230423223210.126948-1-maskray@google.com/)** + +> scall is a deprecated alias for ecall. ecall is used in several places, +> so there is no assembler compatibility concern. +> + +#### Process Scheduling + +**[v2: sched/topology: add for_each_numa_cpu() macro](http://lore.kernel.org/lkml/20230430171809.124686-1-yury.norov@gmail.com/)** + +> for_each_cpu() is widely used in kernel, and it's beneficial to create +> a NUMA-aware version of the macro. +> +> Recently added for_each_numa_hop_mask() works, but switching existing +> codebase to it is not an easy process. +> + +**[v1: sched: core: Simplify sched_can_stop_tick()](http://lore.kernel.org/lkml/20230429002831.2875-1-zeming@nfschina.com/)** + +> Remove useless intermediate variable "fifo_nr_running".
+> + +**[v1: sched: add ttwu_migration counter](http://lore.kernel.org/lkml/20230425012234.15388-1-shijie@os.amperecomputing.com/)** + +> This patch adds the ttwu_migration counter to record the migrations. +> Put it at the end, do not break some tools. +> + +#### Memory Management + +**[v2: permit write-sealed memfd read-only shared mappings](http://lore.kernel.org/linux-mm/cover.1682890156.git.lstoakes@gmail.com/)** + +> The man page for fcntl() describing memfd file seals states the following +> about F_SEAL_WRITE:- +> +> Furthermore, trying to create new shared, writable memory-mappings via +> mmap(2) will also fail with EPERM. +> + +**[v1: mm/mmap/vma_merge: always check invariants](http://lore.kernel.org/linux-mm/df548a6ae3fa135eec3b446eb3dae8eb4227da97.1682885809.git.lstoakes@gmail.com/)** + +> We may still have inconsistent input parameters even if we choose not to +> merge and the vma_merge() invariant checks are useful for checking this +> with no production runtime cost (these are only relevant when +> CONFIG_DEBUG_VM is specified). +> + +**[v1: debugobjects,locking: Annotate __debug_object_init() wait type violation](http://lore.kernel.org/linux-mm/20230429100614.GA1489784@hirez.programming.kicks-ass.net/)** + +> On Tue, Apr 25, 2023 at 11:51:05PM +0800, Qi Zheng wrote: +> > I just tested the following code and +> > it can resolve the warning I encountered. :) +> + +**[v3: Reduce lock contention related with large folio](http://lore.kernel.org/linux-mm/20230429082759.1600796-1-fengwei.yin@intel.com/)** + +> yan tried to enable the large folio for anonymous mapping [1]. +> +> Unlike large folio for page cache which doesn't trigger frequent page +> allocation/free, large folio for anonymous mapping is allocated/freed +> more frequently. So large folio for anonymous mapping exposes some lock +> contention.
+> + +**[v3: migrate_pages: Avoid blocking for IO in MIGRATE_SYNC_LIGHT](http://lore.kernel.org/linux-mm/20230428135414.v3.1.Ia86ccac02a303154a0b8bc60567e7a95d34c96d3@changeid/)** + +> The MIGRATE_SYNC_LIGHT mode is intended to block for things that will +> finish quickly but not for things that will take a long time. Exactly +> how long is too long is not well defined, but waits of tens of +> milliseconds is likely non-ideal. +> + +**[v3: net-next/mm: page_pool: new approach for leak detection and shutdown phase](http://lore.kernel.org/linux-mm/168269854650.2191653.8465259808498269815.stgit@firesoul/)** + +> The page_pool (PP) workqueue calling page_pool_release_retry generate +> too many false-positive reports. Further more, these reports of +> page_pool shutdown still having inflight packets are not very helpful +> to track down the root-cause. +> + +**[v8: mm: shmem: support POSIX_FADV_[WILL|DONT]NEED for shmem files](http://lore.kernel.org/linux-mm/cover.1682598808.git.quic_charante@quicinc.com/)** + +> This patch aims to implement POSIX_FADV_WILLNEED and POSIX_FADV_DONTNEED +> advices to shmem files which can be helpful for the drivers who may want +> to manage the pages of shmem files on their own, like, that are created +> through shmem_file_setup[_with_mnt](). +> + +**[v2: memcg: OOM log improvements](http://lore.kernel.org/linux-mm/20230428132406.2540811-1-yosryahmed@google.com/)** + +> This short patch series brings back some cgroup v1 stats in OOM logs +> that were unnecessarily changed before. It also makes memcg OOM logs +> less reliant on printk() internals. +> + +**[v1: mm: Do not reclaim private data from pinned page](http://lore.kernel.org/linux-mm/20230428124140.30166-1-jack@suse.cz/)** + +> If the page is pinned, there's no point in trying to reclaim it. 
Furthermore if the page is from the page cache we don't want to reclaim +> fs-private data from the page because the pinning process may be writing +> to the page at any time and reclaiming fs private info on a dirty page +> can upset the filesystem (see link below). +> + +**[v1: mm: optimization on page allocation when CMA enabled](http://lore.kernel.org/linux-mm/1682679641-13652-1-git-send-email-zhaoyang.huang@unisoc.com/)** + +> Please note the following typical scenario introduced by commit 168676649: +> 12MB of free CMA pages 'help' GFP_MOVABLE allocations keep draining/fragmenting +> U&R page blocks until they shrink to 12MB without entering the slowpath, which goes +> against the current reclaim policy. This commit changes the criterion from a hard-coded '1/2' +> to a watermark check, which keeps U&R free pages around WMARK_LOW when +> falling back. +> + +**[v5: mm/gup: disallow GUP writing to file-backed mappings by default](http://lore.kernel.org/linux-mm/6b73e692c2929dc4613af711bdf92e2ec1956a66.1682638385.git.lstoakes@gmail.com/)** + +> Writing to file-backed mappings which require folio dirty tracking using +> GUP is a fundamentally broken operation, as kernel write access to GUP +> mappings do not adhere to the semantics expected by a file system. +> + +**[v3: Preserved-over-Kexec RAM](http://lore.kernel.org/linux-mm/1682554137-13938-1-git-send-email-anthony.yznaga@oracle.com/)** + +> Sending out this RFC in part to gauge community interest. +> This patchset implements preserved-over-kexec memory storage or PKRAM as a +> method for saving memory pages of the currently executing kernel so that +> they may be restored after kexec into a new kernel. The patches are adapted +> from an RFC patchset sent out in 2013 by Vladimir Davydov [1]. They +> introduce the PKRAM kernel API.
+> + +**[v2: Add support for sharing page tables across processes (Previously mshare)](http://lore.kernel.org/linux-mm/cover.1682453344.git.khalid.aziz@oracle.com/)** + +> This patch series adds a new flag to mmap() call - MAP_SHARED_PT. +> This flag can be specified along with MAP_SHARED by a process to +> hint to kernel that it wishes to share page table entries for this +> file mapping mmap region with other processes. Any other process +> that mmaps the same file with MAP_SHARED_PT flag can then share the +> same page table entries. Besides specifying MAP_SHARED_PT flag, the +> processes must map the files at a PMD aligned address with a size +> that is a multiple of PMD size and at the same virtual addresses. +> This last requirement of same virtual addresses can possibly be +> relaxed if that is the consensus. +> + +**[v4: shmem: Add user and group quota support for tmpfs](http://lore.kernel.org/linux-mm/20230426102008.2930932-1-cem@kernel.org/)** + +> Hello folks. +> +> This is the final version of the quota support from tmpfs, with all the issues +> addressed, and now including RwB tags on all patches, and should be ready for +> merge. Details are within each patch, and the original cover-letter below. +> + +**[v1: mm/oom_kill: system enters a state something like hang when running stress-ng](http://lore.kernel.org/linux-mm/20230426051030.112007-1-hui.wang@canonical.com/)** + +> When we run stress-ng on the UC (Ubuntu Core), the system will be in a +> state similar to hang. And we found if a testcase could introduce the +> oom (like stress-ng-bigheap, stress-ng-brk, ...) under the UC, it is +> highly possible that this testcase will make the system be in a state +> like hang. 
We had a discussion for this issue here: +> https://github.com/ColinIanKing/stress-ng/pull/270 +> + +**[v2: mm: compaction: optimize compact_memory to comply with the admin-guide](http://lore.kernel.org/linux-mm/tencent_DFF54DB2A60F3333F97D3F6B5441519B050A@qq.com/)** + +> For the /proc/sys/vm/compact_memory file, the admin-guide states: +> When 1 is written to the file, all zones are compacted such that free +> memory is available in contiguous blocks where possible. This can be +> important for example in the allocation of huge pages although processes +> will also directly compact memory as required +> + +**[v4: mm/page_alloc: add some comments to explain the possible hole in __pageblock_pfn_to_page()](http://lore.kernel.org/linux-mm/5c26368865e79c743a453dea48d30670b19d2e4f.1682425534.git.baolin.wang@linux.alibaba.com/)** + +> Now the __pageblock_pfn_to_page() is used by set_zone_contiguous(), which +> checks whether the given zone contains holes, and uses pfn_to_online_page() +> to validate if the start pfn is online and valid, as well as using pfn_valid() +> to validate the end pfn. +> + +**[GIT PULL: ext4 changes for the 6.4 merge window](http://lore.kernel.org/linux-mm/20230425041838.GA150312@mit.edu/)** + +> The following changes since commit e8d018dd0257f744ca50a729e3d042cf2ec9da65: +> +> Linux 6.3-rc3 (2023-03-19 13:27:55 -0700) +> +> are available in the Git repository at: +> +> https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git tags/ext4_for_linus +> + +**[v2: fs: multigrain timestamps](http://lore.kernel.org/linux-mm/20230424151104.175456-1-jlayton@kernel.org/)** + +> While I don't think we can practically optimize away ctime updates +> like we do with i_version, I do like the idea of using this scheme to +> indicate when we need to use a high-res timestamp. 
+> + +**[v4: of: fdt: Scan /memreserve/ last](http://lore.kernel.org/linux-mm/20230424113846.46382-1-tanure@linux.com/)** + +> Change the scanning /memreserve/ and /reserved-memory node order to fix +> Kernel panic on Khadas Vim3 Board. +> +> If /memreserve/ goes first, the memory is reserved, but nomap can't be +> applied to the region. So the memory won't be used by Linux, but it is +> still present in the linear map as normal memory, which allows +> speculation. Legitimate access to adjacent pages will cause the CPU +> to end up prefetching into them leading to Kernel panic. +> + +**[v1: string: use __builtin_memcpy() in strlcpy/strlcat](http://lore.kernel.org/linux-mm/20230424112313.3408363-1-glider@google.com/)** + +> lib/string.c is built with -ffreestanding, which prevents the compiler +> from replacing certain functions with calls to their library versions. +> + +**[v1: -v2: mm,unmap: avoid flushing TLB in batch if PTE is inaccessible](http://lore.kernel.org/linux-mm/20230424065408.188498-1-ying.huang@intel.com/)** + +> The version 1 of this patch was merged in mm-unstable branch. If you +> want to move that patch into mm-stable recently, it may be better to +> update that patch with this new version firstly. If you want to do +> that after v6.4-rc1, I will rebase this patch and resend it after +> v6.4-rc1 is released. +> + +**[RFC: allow building a kernel without buffer_heads](http://lore.kernel.org/linux-mm/20230424054926.26927-1-hch@lst.de/)** + +> after all the talk about removing buffer_heads, here is a series that +> shows how to build a kernel without buffer_heads. And how unrealistic +> it is to remove the entirely. 
+> + +**[v1: mmzone: Introduce for_each_populated_zone_pgdat()](http://lore.kernel.org/linux-mm/20230424030756.1795926-1-yajun.deng@linux.dev/)** + +> Instead of define an index and determining if the zone has memory, +> introduce for_each_populated_zone_pgdat() helper that can be used +> to iterate over each populated zone in pgdat, and convert the most +> obvious users to it. +> + +#### File Systems + +**[GIT PULL: iomap: new code for 6.4](http://lore.kernel.org/linux-fsdevel/20230427175543.GA59213@frogsfrogsfrogs/)** + +> Please pull this branch with changes for iomap for 6.4-rc1. The only +> changes for this cycle are the addition of tracepoints to the iomap +> directio code so that Ritesh (who is working on porting ext2 to iomap) +> can observe the io flows more easily. Dave will be sending you a pull +> request for xfs code for this cycle. +> + +**[v1: Prepare for supporting more filesystems with fanotify](http://lore.kernel.org/linux-fsdevel/20230425132223.2608226-1-amir73il@gmail.com/)** + +> This is the second part of the proposal to support fanotify reporing +> file ids on overlayfs. +> +> The first part [1] relaxes the requirements for filesystems to support +> reporting events with fid to require only the ->encode_fh() operation. +> + +**[GIT PULL: sysctl changes for v6.4-rc1](http://lore.kernel.org/linux-fsdevel/ZEcE6Ex20CwMfMKj@bombadil.infradead.org/)** + +> Note: given we *save* memory per each change move away from each +> deprecated call, I don't see a need to immediately *pause* all +> kernel/sysctl.c moves. Each replacement of a deprecated call saves +> us memory and likely more than a the simple empty entry when we move +> a kernel/syctl.c entry to its own file.
+> + +**[v1: inotify: Avoid reporting event with invalid wd](http://lore.kernel.org/linux-fsdevel/20230424163219.9250-1-jack@suse.cz/)** + +> When inotify_freeing_mark() races with inotify_handle_inode_event() it +> can happen that inotify_handle_inode_event() sees that i_mark->wd got +> already reset to -1 and reports this value to userspace which can +> confuse the inotify listener. Avoid the problem by validating that wd is +> sensible (and pretend the mark got removed before the event got +> generated otherwise). +> + +**[RFC: allow building a kernel without buffer_heads](http://lore.kernel.org/linux-fsdevel/20230424054926.26927-1-hch@lst.de/)** + +> after all the talk about removing buffer_heads, here is a series that +> shows how to build a kernel without buffer_heads. And how unrealistic +> it is to remove the entirely. +> +> Most of the series refactors some common code to make implementing direct +> I/O easier without use of the ->direct_IO method and the helpers based +> around it. It then switches buffered writes (but not writeback) for +> block devices to use iomap unconditionally, but still using buffer_heads. 
+> + +**[git pull: vfs.git misc pile](http://lore.kernel.org/linux-fsdevel/20230424042949.GM3390869@ZenIV/)** + +> The following changes since commit eeac8ede17557680855031c6f305ece2378af326: +> +> Linux 6.3-rc2 (2023-03-12 16:36:44 -0700) +> +> are available in the Git repository at: +> +> git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git tags/pull-misc +> +> for you to fetch changes up to 73bb5a9017b93093854c18eb7ca99c7061b16367: +> +> fs: Fix description of vfs_tmpfile() (2023-03-12 20:03:48 -0400) +> + +**[git pull: fget() whack-a-mole](http://lore.kernel.org/linux-fsdevel/20230424042529.GI3390869@ZenIV/)** + +> The following changes since commit fe15c26ee26efa11741a7b632e9f23b01aca4cc6: +> +> Linux 6.3-rc1 (2023-03-05 14:52:03 -0800) +> +> are available in the Git repository at: +> +> git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git tags/pull-fd +> +> for you to fetch changes up to 4a892c0fe4bb0546d68a89fa595bd22cb4be2576: +> +> fuse_dev_ioctl(): switch to fdget() (2023-04-20 22:55:35 -0400) +> +> fget() to fdget() conversions +> + +**[v1: blk: optimization for classic polling](http://lore.kernel.org/linux-fsdevel/3578876466-3733-1-git-send-email-nj.shetty@samsung.com/)** + +> This removes the dependency on interrupts to wake up task. Set task +> state as TASK_RUNNING, if need_resched() returns true, +> while polling for IO completion. +> Earlier, polling task used to sleep, relying on interrupt to wake it up. +> This made some IO take very long when interrupt-coalescing is enabled in +> NVMe. +> + +#### Network Devices + +**[v4: net: mvpp2: tai: add extts support](http://lore.kernel.org/netdev/20230430170656.137549-1-shmuel.h@siklu.com/)** + +> This patch series adds support for PTP event capture on the Aramda +> 80x0/70x0. This feature is mainly used by tools linux ts2phc(3) in order +> to synchronize a timestamping unit (like the mvpp2's TAI) and a system +> DPLL on the same PCB.
+> + +**[v1: net: virtio-net: allow usage of small vrings](http://lore.kernel.org/netdev/20230430131518.2708471-1-alvaro.karsz@solid-run.com/)** + +> At the moment, if a virtio network device uses vrings with less than +> MAX_SKB_FRAGS + 2 entries, the device won't be functional. +> +> The following condition vq->num_free >= 2 + MAX_SKB_FRAGS will always +> evaluate to false, leading to TX timeouts. +> + +**[v2: net: bonding: add xdp_features support](http://lore.kernel.org/netdev/e82117190648e1cbb2740be44de71a21351c5107.1682848658.git.lorenzo@kernel.org/)** + +> Introduce xdp_features support for bonding driver according to the slave +> devices attached to the master one. xdp_features is required whenever we +> want to xdp_redirect traffic into a bond device and then into selected +> slaves attached to it. +> + +**[v3: virtio_net: suppress cpu stall when free_unused_bufs](http://lore.kernel.org/netdev/1682783278-12819-1-git-send-email-wangwenliang.1995@bytedance.com/)** + +> For multi-queue and large ring-size use case, the following error +> occurred when free_unused_bufs: +> rcu: INFO: rcu_sched self-detected stall on CPU. +> + +**[v1: net: atlantic: Define aq_pm_ops conditionally on CONFIG_PM](http://lore.kernel.org/netdev/20230428214321.2678571-1-trix@redhat.com/)** + +> The only use of aq_pm_ops is conditional on CONFIG_PM. +> The definition of aq_pm_ops and its functions should also +> be conditional on CONFIG_PM. +> + +**[v1: igb: Define igb_pm_ops conditionally on CONFIG_PM](http://lore.kernel.org/netdev/20230428200009.2224348-1-trix@redhat.com/)** + +> The only use of igb_pm_ops is conditional on CONFIG_PM. +> The definition of igb_pm_ops should also be conditional on CONFIG_PM +> + +**[v4: bpf: Socket lookup BPF API from tc/xdp ingress does not respect VRF bindings.](http://lore.kernel.org/netdev/20230428083007.148364-1-gilad9366@gmail.com/)** + +> When calling socket lookup from L2 (tc, xdp), VRF boundaries aren't +> respected. 
This patchset fixes this by regarding the incoming device's +> VRF attachment when performing the socket lookups from tc/xdp. +> +> The first two patches are coding changes which factor out the tc helper's +> logic which was shared with cg/sk_skb (which operate correctly). +> + +**[v4: bpf-next: Introduce a new kfunc of bpf_task_under_cgroup](http://lore.kernel.org/netdev/20230428071737.43849-1-zhoufeng.zf@bytedance.com/)** + +> Trace sched related functions, such as enqueue_task_fair, it is necessary to +> specify a task instead of the current task which within a given cgroup. +> + +**[v4: net-next: Wangxun netdev features support](http://lore.kernel.org/netdev/20230428055709.66071-1-mengyuanlou@net-swift.com/)** + +> Implement tx_csum and rx_csum to support hardware checksum offload. +> Implement ndo_vlan_rx_add_vid and ndo_vlan_rx_kill_vid. +> Enable macros in netdev features which wangxun can support. +> + +**[v7: Create common DPLL configuration API](http://lore.kernel.org/netdev/20230428002009.2948020-1-vadfed@meta.com/)** + +> Implement common API for clock/DPLL configuration and status reporting. +> The API utilises netlink interface as transport for commands and event +> notifications. This API aim to extend current pin configuration and +> make it flexible and easy to cover special configurations. +> + +**[v2: can: bxcan: add support for single peripheral configuration](http://lore.kernel.org/netdev/20230427204540.3126234-1-dario.binacchi@amarulasolutions.com/)** + +> The series adds support for managing bxCAN controllers in single peripheral +> configuration. 
+> Unlike stm32f4 SOCs, where bxCAN controllers are only in dual peripheral +> configuration, stm32f7 SOCs contain three CAN peripherals, CAN1 and CAN2 +> in dual peripheral configuration and CAN3 in single peripheral +> + +**[v1: net-next: pds_core: add switchdev and tc for vlan offload](http://lore.kernel.org/netdev/20230427164546.31296-1-shannon.nelson@amd.com/)** + +> This is an RFC for adding to the pds_core driver some very simple support +> for VF representors and a tc command for offloading VF port vlans. +> + +**[v1: net: add xdp_features support for bonding driver](http://lore.kernel.org/netdev/cover.1682603719.git.lorenzo@kernel.org/)** + +> Introduce missing xdp_features support for bonding driver. xdp_features +> is required whenever we want to xdp_redirect traffic into a bond device +> and then into selected slaves attached to it. +> + +**[v1: net-next: net: tcp: make txhash use consistent for IPv4](http://lore.kernel.org/netdev/20230427134527.18127-1-atenart@kernel.org/)** + +> Series is divided in two parts. First two commits make the txhash (used +> for the skb hash in TCP) to be consistent for all IPv4/TCP packets (IPv6 +> doesn't have the same issue). Last two commits improve doc/comment +> hash-related parts. +> + +**[v1: mISDN: Use list_count_nodes()](http://lore.kernel.org/netdev/886a6fe86cfc3d787a2e3a5062ce8bd92323ed66.1682602766.git.christophe.jaillet@wanadoo.fr/)** + +> count_list_member() really looks the same as list_count_nodes(), so use the +> latter instead of hand writing it. +> +> The first one return an int and the other a size_t, but that should be +> fine. It is really unlikely that we get so many parties in a conference. +> + +**[v1: net: ice: block LAN in case of VF to VF offload](http://lore.kernel.org/netdev/20230427045711.1625449-1-michal.swiatkowski@linux.intel.com/)** + +> VF to VF traffic shouldn't go outside. To enforce it, set only the loopback +> enable bit in case of all ingress type rules added via the tc tool. 
+> + +**[v4: net-next: virtio_net: refactor xdp codes](http://lore.kernel.org/netdev/20230427030534.115066-1-xuanzhuo@linux.alibaba.com/)** + +> Due to historical reasons, the implementation of XDP in virtio-net is relatively +> chaotic. For example, the processing of XDP actions has two copies of similar +> code. Such as page, xdp_page processing, etc. +> + +**[v1: leds: introduce new LED hw control APIs](http://lore.kernel.org/netdev/20230427001541.18704-1-ansuelsmth@gmail.com/)** + +> This is a continue of [1]. It was decided to take a more gradual +> approach to implement LEDs support for switch and phy starting with +> basic support and then implementing the hw control part when we have all +> the prereq done. +> + +**[v1: net-next: wifi: ath10k: Use list_count_nodes()](http://lore.kernel.org/netdev/e6ec525c0c5057e97e33a63f8a4aa482e5c2da7f.1682541872.git.christophe.jaillet@wanadoo.fr/)** + +> ath10k_wmi_fw_stats_num_peers() and ath10k_wmi_fw_stats_num_vdevs() really +> look the same as list_count_nodes(), so use the latter instead of hand +> writing it. +> + +**[v1: net-next: wifi: ath11k: Use list_count_nodes()](http://lore.kernel.org/netdev/941484caae24b89d20524b1a5661dd1fd7025492.1682542084.git.christophe.jaillet@wanadoo.fr/)** + +> ath11k_wmi_fw_stats_num_vdevs() and ath11k_wmi_fw_stats_num_bcn() really +> look the same as list_count_nodes(), so use the latter instead of hand +> writing it. +> +> The first ones use list_for_each_entry() and the other list_for_each(), but +> they both count the number of nodes in the list. +> + +**[v2: net: dsa: mv88e6xxx: add mv88e6321 rsvd2cpu](http://lore.kernel.org/netdev/20230426202815.2991822-1-angelo@kernel-space.org/)** + +> Add rsvd2cpu capability for mv88e6321 model, to allow proper bpdu +> processing. 
+> + +**[v1: net-next: wifi: mwifiex: Use list_count_nodes()](http://lore.kernel.org/netdev/e77ed7f719787cb8836a93b6a6972f4147e40bc6.1682537509.git.christophe.jaillet@wanadoo.fr/)** + +> mwifiex_wmm_list_len() is the same as list_count_nodes(), so use the latter +> instead of hand writing it. +> +> Turn 'ba_stream_num' and 'ba_stream_max' in size_t to keep the same type +> as what is returned by list_count_nodes(). +> + +**[v4: New NDO methods ndo_hwtstamp_get/set](http://lore.kernel.org/netdev/20230426165835.443259-1-kory.maincent@bootlin.com/)** + +> You patch series work on my side with the macb MAC controller and this +> patch. +> I don't know if you are waiting for more reviews but it seems good enough +> to drop the RFC tag. +> + +**[v3: net: net/sched: act_mirred: Add carrier check](http://lore.kernel.org/netdev/20230426151940.639711-1-victor@mojatatu.com/)** + +> As you can see, it's administratively UP but operationally down. +> In this case, sending a packet to this port caused a nasty kernel hang (so +> nasty that we were unable to capture it). Aborting a transmit based on +> operational status (in addition to administrative status) fixes the issue. +> + +**[GIT PULL: Networking for 6.4](http://lore.kernel.org/netdev/20230426143118.53556-1-pabeni@redhat.com/)** + +> We have a few conflicts with your current tree, specifically: +> +> - between commits: +> +> dbb0ea153401 ("thermal: Use thermal_zone_device_type() accessor") +> +> the latter removed the code updated by the former, the resolution +> is deleting mlxsw_thermal_module_trips_reset() and +> mlxsw_thermal_module_trips_update(). +> + +**[v1: net-next: add driver support for Microchip LAN865X Rev.B0 Internal PHYs](http://lore.kernel.org/netdev/20230426114655.93672-1-Parthiban.Veerasooran@microchip.com/)** + +> The first patch updates the LAN867x PHY supported revision number to +> Rev.B1 and the second patch adds the support for Microchip LAN865X Rev.B0 +> 10BASE-T1S Internal PHYs. 
+> + +**[v2: net-next: Add support for VSC8531_02 PHY and DT RGMII tuning](http://lore.kernel.org/netdev/20230426104313.28950-1-harini.katakam@amd.com/)** + +> Add support for VSC8531_02 PHY ID. +> Also provide an option to tune RGMII delay value via devicetree. +> The default delays are retained in the driver. +> + +**[v1: bpf-next: net/smc: Introduce BPF injection capability](http://lore.kernel.org/netdev/1682501055-4736-1-git-send-email-alibuda@linux.alibaba.com/)** + +> This patches attempt to introduce BPF injection capability for SMC, +> and add selftest to ensure code stability. +> +> As we all know that the SMC protocol is not suitable for all scenarios, +> especially for short-lived. However, for most applications, they cannot +> guarantee that there are no such scenarios at all. Therefore, apps +> may need some specific strategies to decide shall we need to use SMC +> or not, for example, apps can limit the scope of the SMC to a specific +> IP address or port. +> + +**[v2: net/ncsi: clear Tx enable mode when handling a Config required AEN](http://lore.kernel.org/netdev/20230426081350.1214512-1-chou.cosmo@gmail.com/)** + +> ncsi_channel_is_tx() determines whether a given channel should be +> used for Tx or not. However, when reconfiguring the channel by +> handling a Configuration Required AEN, there is a misjudgment that +> the channel Tx has already been enabled, which results in the Enable +> Channel Network Tx command not being sent. +> + +**[v1: net: phy: aquantia: Add 10mbps support](http://lore.kernel.org/netdev/20230426081612.4123059-1-devangnayanbhai.vyas@amd.com/)** + +> This adds support for 10mbps speed in PHY device's +> "supported" field which helps in autonegotiating +> 10mbps link from PHY side where PHY supports the speed +> but not updated in PHY kernel framework. +> +> One such example is AQR113C PHY. 
+> + +**[v2: net-next: net: phy: hide the PHYLIB_LEDS knob](http://lore.kernel.org/netdev/d82489be8ed911c383c3447e9abf469995ccf39a.1682496488.git.pabeni@redhat.com/)** + +> commit 4bb7aac70b5d ("net: phy: fix circular LEDS_CLASS dependencies") +> solved a build failure, but introduces a new config knob with a default +> 'y' value: PHYLIB_LEDS. +> + +#### Security Hardening + +**[GIT PULL: flexible-array transformations for 6.4-rc1](http://lore.kernel.org/linux-hardening/ZEaNFzLag13mLxOL@work/)** + +> The following changes since commit fe15c26ee26efa11741a7b632e9f23b01aca4cc6: +> +> Linux 6.3-rc1 (2023-03-05 14:52:03 -0800) +> +> are available in the Git repository at: +> +> git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux.git tags/flex-array-transformations-6.4-rc1 +> +> for you to fetch changes up to 00168b415a60cec7558608efb4fc50f2a73daae2: +> + +#### Async IO + +**[v3: io_uring: Pass the whole sqe to commands](http://lore.kernel.org/io-uring/20230430143532.605367-1-leitao@debian.org/)** + +> These three patches prepare for the sock support in the io_uring cmd, as +> described in the following RFC: +> +> https://lore.kernel.org/lkml/20230406144330.1932798-1-leitao@debian.org/ +> +> Since the support linked above depends on other refactors, such as the sock +> ioctl() sock refactor[1], I would like to start integrating patches that have +> consensus and can bring value right now. This will also reduce the patchset +> size later. +> + +**[v1: Rethinking splice](http://lore.kernel.org/io-uring/cover.1682701588.git.asml.silence@gmail.com/)** + +> IORING_OP_SPLICE has problems, many of them are fundamental and rooted +> in the uapi design, see the patch 8 description. This patchset introduces +> a different approach, which came from discussions about splices +> and fused commands and absorbed ideas from both of them. We remove +> reliance onto pipes and registering "spliced" buffers with data as an +> io_uring's registered buffer.
Then the user can use it as a usual +> registered buffer, e.g. pass it to IORING_OP_WRITE_FIXED. +> + +**[v1: io_uring attached nvme queue](http://lore.kernel.org/io-uring/20230429093925.133327-1-joshi.k@samsung.com/)** + +> Also, io_uring ring is not to be shared among application threads. +> Application is responsible for building the sharing (if it feels the +> need). This means ring-associated exclusive queue can do away with some +> synchronization costs that occur for shared queue. +> + +**[v11: io_uring: add napi busy polling support](http://lore.kernel.org/io-uring/20230428181248.610605-1-shr@devkernel.io/)** + +> This adds the napi busy polling support in io_uring.c. It adds a new +> napi_list to the io_ring_ctx structure. This list contains the list of +> napi_id's that are currently enabled for busy polling. This list is +> used to determine which napi id's enabled busy polling. For faster +> access it also adds a hash table. +> + +**[v1: io_uring: Add io_uring_setup flag to pre-register ring fd and never install it](http://lore.kernel.org/io-uring/bc8f431bada371c183b95a83399628b605e978a3.1682699803.git.josh@joshtriplett.org/)** + +> With IORING_REGISTER_USE_REGISTERED_RING, an application can register +> the ring fd and use it via registered index rather than installed fd. +> This allows using a registered ring for everything *except* the initial +> mmap. +> + +**[v10: io_uring: add napi busy polling support](http://lore.kernel.org/io-uring/20230425181845.2813854-1-shr@devkernel.io/)** + +> This adds the napi busy polling support in io_uring.c. It adds a new +> napi_list to the io_ring_ctx structure. This list contains the list of +> napi_id's that are currently enabled for busy polling. This list is +> used to determine which napi id's enabled busy polling. For faster +> access it also adds a hash table. 
+> + +**[v9: liburing: add api for napi busy poll](http://lore.kernel.org/io-uring/20230425182054.2826621-1-shr@devkernel.io/)** + +> This adds two new api's to set/clear the napi busy poll settings. The two +> new functions are called: +> - io_uring_register_napi +> - io_uring_unregister_napi +> +> The patch series also contains the documentation for the two new functions +> and two example programs. The client program is called napi-busy-poll-client +> and the server program napi-busy-poll-server. The client measures the +> roundtrip times of requests. +> + +#### Rust For Linux + +**[v3: rust: helpers: sort includes alphabetically in rust/helpers.c](http://lore.kernel.org/rust-for-linux/20230426204923.16195-1-amiculas@cisco.com/)** + +> Sort the #include directives of rust/helpers.c alphabetically and add a +> comment specifying this. The reason for this is to improve readability +> and to be consistent with the other files with a similar approach within +> 'rust/'. +> + +**[v1: rust: Sort rust/helpers.c's #include directives](http://lore.kernel.org/rust-for-linux/20230426081715.40834-1-amiculas@cisco.com/)** + +> Sort the #include directives of rust/helpers.c alphabetically and add a +> comment specifying this. +> + +#### BPF + +**[v3: bpf-next: Handle immediate reuse in bpf memory allocator](http://lore.kernel.org/bpf/20230429101215.111262-1-houtao@huaweicloud.com/)** + +> As discussed in v1, currently the freed objects in bpf memory allocator +> may be reused immediately by the new allocation, it introduces +> use-after-bpf-ma-free problem for non-preallocated hash map and makes +> lookup procedure return incorrect result. The immediate reuse also makes +> introducing new use case more difficult (e.g. qp-trie). 
+> + +**[v1: bpf-next: libbpf: capability for resizing datasec maps](http://lore.kernel.org/bpf/20230428222754.183432-1-inwardvessel@gmail.com/)** + +> The thought behind this is to allow for use cases where a given datasec +> needs to scale to for example the number of CPU's present. A bpf program +> can have a global array in a custom data section with an initial length +> and before loading the bpf program, the array length could be extended to +> match the CPU count. The selftests included in this series perform this +> scaling to an arbitrary value to demonstrate how it can work. +> + +**[v1: x86/pie: Make kernel image's virtual address flexible](http://lore.kernel.org/bpf/cover.1682673542.git.houwenlong.hwl@antgroup.com/)** + +> These patches make the changes necessary to build the kernel as Position +> Independent Executable (PIE) on x86_64. A PIE kernel can be relocated +> below the top 2G of the virtual address space. And this patchset +> provides an example to allow kernel image to be relocated in top 512G of +> the address space. +> + +**[v1: bpf-next: selftests/bpf: Add fexit_sleep to DENYLIST.aarch64](http://lore.kernel.org/bpf/20230428034726.2593484-1-martin.lau@linux.dev/)** + +> It is reported that the fexit_sleep never returns in aarch64. +> The remaining tests cannot start. Put this test into DENYLIST.aarch64 +> for now so that other tests can continue to run in the CI. +> + +**[v2: bpf-next: libbpf: btf_dump_type_data_check_overflow needs to consider BTF_MEMBER_BITFIELD_SIZE](http://lore.kernel.org/bpf/20230428013638.1581263-1-martin.lau@linux.dev/)** + +> The reason is in btf_dump_type_data_check_overflow(). It does not use +> BTF_MEMBER_BITFIELD_SIZE from the struct's member (btf_member). Instead, +> it is using the enum size which is 4. It had been working till the recent +> commit 4e04143c869c ("fs_context: drop the unused lsm_flags member") +> removed an integer member which also removed the 4 bytes padding at the end +> of the fs_context. 
Missing this 4 bytes padding exposed this bug. +> In particular, when btf_dump_type_data_check_overflow() reaches +> the member 'phase', -E2BIG is returned. +> + +**[v2: bpf-next: selftests/bpf: test_progs can read test lists from file](http://lore.kernel.org/bpf/20230427225333.3506052-1-sveiss@meta.com/)** + +> BPF selftests have ALLOWLIST and DENYLIST files, used to control which +> tests are run in CI. These files are currently parsed by a shell +> script. [1] +> +> This patchset allows those files to be specified directly on the +> test_progs command line (eg, as -a @ALLOWLIST). +> + +**[v2: bpf-next: bpf: Don't EFAULT for {g,s}setsockopt with wrong optlen](http://lore.kernel.org/bpf/20230427200409.1785263-1-sdf@google.com/)** + +> optval larger than PAGE_SIZE leads to EFAULT if the BPF program +> isn't careful enough. This is often overlooked and might break +> completely unrelated socket options. Instead of EFAULT, +> let's ignore BPF program buffer changes. See the first patch for +> more info. +> + +**[v1: selftests/bpf: Do not use sign-file as testcase](http://lore.kernel.org/bpf/88e3ab23029d726a2703adcf6af8356f7a2d3483.1682607419.git.legion@kernel.org/)** + +> The sign-file utility (from scripts/) is used in prog_tests/verify_pkcs7_sig.c, +> but the utility should not be called as a test. Executing this utility +> produces the following error: +> +> selftests: /linux/tools/testing/selftests/bpf: urandom_read +> ok 16 selftests: /linux/tools/testing/selftests/bpf: urandom_read +> +> selftests: /linux/tools/testing/selftests/bpf: sign-file +> not ok 17 selftests: /linux/tools/testing/selftests/bpf: sign-file # exit=2 +> + +**[v4: bpf-next: bpftool: Dump map id instead of value for map_of_maps types](http://lore.kernel.org/bpf/20230427120313.43574-1-kuro@kuroa.me/)** + +> When using `bpftool map dump` with map_of_maps, it is usually +> more convenient to show the inner map id instead of raw value. 
+> +> We are changing the plain print behavior to show inner_map_id +> instead of hex value, this would help with quick look up of +> inner map with `bpftool map dump id `. +> To avoid disrupting scripted behavior, we will add a new +> `inner_map_id` field to json output instead of replacing value. +> + +**[v2: bpf-next: bpf: Make bpf_helper_defs.h c++ friendly](http://lore.kernel.org/bpf/20230426155357.4158846-1-sdf@google.com/)** + +> Compiling C++ BPF programs with existing bpf_helper_defs.h is not +> possible due to stricter C++ type conversions. C++ complains +> about (void *) type conversions: +> +> $ clang++ --include linux/types.h ./tools/lib/bpf/bpf_helper_defs.h +> + +**[v1: bpf-next: Add precision propagation for subprogs and callbacks](http://lore.kernel.org/bpf/20230425234911.2113352-1-andrii@kernel.org/)** + +> As more and more real-world BPF programs become more complex +> and increasingly use subprograms (both static and global), scalar precision +> tracking and its (previously weak) support for BPF subprograms (and callbacks +> as a special case of that) is becoming more and more of an issue and +> limitation. Couple that with increasing reliance on state equivalence (BPF +> open-coded iterators have a hard requirement for state equivalence to converge +> and successfully validate loops), and it becomes pretty critical to address +> this limitation and make precision tracking universally supported for BPF +> programs of any complexity and composition. +> + +**[v1: KEYS: Introduce user mode key and signature parsers](http://lore.kernel.org/bpf/20230425173557.724688-1-roberto.sassu@huaweicloud.com/)** + +> Support new key and signature formats with the same kernel component. +> +> Verify the authenticity of system data with newly supported data formats. +> +> Mitigate the risk of parsing arbitrary data in the kernel. 
+> + +**[v7: vhost: virtio core prepares for AF_XDP](http://lore.kernel.org/bpf/20230425073613.8839-1-xuanzhuo@linux.alibaba.com/)** + +> Now, virtio may can not work with DMA APIs when virtio features do not have +> VIRTIO_F_ACCESS_PLATFORM. +> +> 1. I tried to let DMA APIs return phy address by virtio-device. But DMA APIs just +> work with the "real" devices. +> 2. I tried to let xsk support callballs to get phy address from virtio-net +> driver as the dma address. But the maintainers of xsk may want to use dma-buf +> to replace the DMA APIs. I think that may be a larger effort. We will wait +> too long. +> + +**[v2: powerpc/bpf: populate extable entries only during the last pass](http://lore.kernel.org/bpf/20230425065829.18189-1-hbathini@linux.ibm.com/)** + +> Since commit 85e031154c7c ("powerpc/bpf: Perform complete extra passes +> to update addresses"), two additional passes are performed to avoid +> space and CPU time wastage on powerpc. But these extra passes led to +> WARN_ON_ONCE() hits in bpf_add_extable_entry() as extable entries are +> populated again, during the extra pass, without resetting the index. +> Fix it by resetting entry index before repopulating extable entries, +> if and when there is an additional pass. +> + +**[v1: bpf-next: selftests/bpf: avoid mark_all_scalars_precise() trigger in one of iter tests](http://lore.kernel.org/bpf/20230424235128.1941726-1-andrii@kernel.org/)** + +> For now, change the test to assume fixed size of passed in array. Once +> BPF verifier supports precision tracking across subprogram calls, these +> changes will be reverted as unnecessary. +> + +**[[RFC/PATCH bpf-next 00/20] bpf: Add multi uprobe link](http://lore.kernel.org/bpf/20230424160447.2005755-1-jolsa@kernel.org/)** + +> this patchset is adding support to attach multiple uprobes and usdt probes +> through new uprobe_multi link. +> +> The current uprobe is attached through the perf event and attaching many +> uprobes takes a lot of time because of that. 
+> + + **[v6: tracing: Add fprobe events](http://lore.kernel.org/bpf/168234755610.2210510.12133559313738141202.stgit@mhiramat.roam.corp.google.com/)** + +> Here is the 6th version of improve fprobe and add a basic fprobe event +> support for ftrace (tracefs) and perf. Here is the previous version. +> +> https://lore.kernel.org/all/168198993129.1795549.8306571027057356176.stgit@mhiramat.roam.corp.google.com/ +> + +**[v2: libbpf: Improve version handling when attaching uprobe](http://lore.kernel.org/bpf/ZEV%2FEzOM+TJomP66@eg/)** + +> This change fixes the handling of versions in elf_find_func_offset. +> In the previous implementation, we incorrectly assumed that the +> version information would be present in the string found in the +> string table. +> + +**[v1: bpf-next: xsk: Use pool->dma_pages to check for DMA](http://lore.kernel.org/bpf/20230423180157.93559-1-kal.conley@dectris.com/)** + +> Compare pool->dma_pages instead of pool->dma_pages_cnt to check for an +> active DMA mapping. pool->dma_pages needs to be read anyway to access +> the map so this compiles to more efficient code. +> + +### Related Technology Updates + +#### Qemu + +**[v1: target/riscv: RVV 1-fill tail element changes](http://lore.kernel.org/qemu-devel/20230427205708.246679-1-dbarboza@ventanamicro.com/)** + +> This series makes changes in vext_set_tail_elements_1s() to be a little +> nicer to the emulation. +> +> First patch makes the function a no-op when vta == 0. Aside from the +> logic simplification we also have a little performance boost. +> + +**[v2: hw/riscv: virt: Assume M-mode FW in pflash0 only when "-bios none"](http://lore.kernel.org/qemu-devel/20230425102545.162888-1-sunilvl@ventanamicro.com/)** + +> Currently, virt machine supports two pflash instances each with +> 32MB size. However, the first pflash is always assumed to +> contain M-mode firmware and reset vector is set to this if +> enabled. Hence, for S-mode payloads like EDK2, only one pflash +> instance is available for use.
This means both code and NV variables +> of EDK2 will need to use the same pflash. +> + +**[v3: hw/riscv/virt: Add a second UART for secure world](http://lore.kernel.org/qemu-devel/20230425073509.3618388-1-yong.li@intel.com/)** + +> The virt machine can have two UARTs and the second UART +> can be used by the secure payload, firmware or OS residing +> in secure world. Will include the UART device to FDT in a +> seperated patch. +> + +**[v1: Add RISC-V KVM AIA Support](http://lore.kernel.org/qemu-devel/20230424090716.15674-1-yongxuan.wang@sifive.com/)** + +> This series introduces support for KVM AIA in the RISC-V architecture. The +> implementation is refered to Anup's KVM AIA implementation in kvmtool +> (https://github.com/avpatel/kvmtool.git). To test these patches, a Linux kernel +> with KVM AIA support is required, which can be found in the qemu_kvm_aia branch +> at https://github.com/yong-xuan/linux.git. This kernel branch is based on the +> riscv_aia_v1 branch from https://github.com/avpatel/linux.git and includes two +> additional patches. +> + +#### U-Boot + +**[v3: Add ethernet driver for StarFive JH7110 SoC](http://lore.kernel.org/u-boot/20230428022515.29393-1-yanhong.wang@starfivetech.com/)** + +> This series of patches base on the latest branch/master,and +> adds ethernet support for the StarFive JH7110 RISC-V SoC. +> The series includes EEPROM, PHY and MAC drivers. The PHY model is +> YT8531 (from Motorcomm Inc), and the MAC version is dwmac-5.20 +> (from Synopsys DesignWare). +> + +**[v5: Add StarFive JH7110 PCIe drvier support](http://lore.kernel.org/u-boot/20230423105859.125764-1-minda.chen@starfivetech.com/)** + +> The PCIe driver depends on gpio, pinctrl, clk and reset driver to do init. +> The PCIe dts configuation includes all these setting. 
+> + + ## 20230423: Issue 43 + + ### Kernel Updates + + #### RISC-V Architecture Support + + **[v1: riscv: uprobes: Restore thread.bad_cause](http://lore.kernel.org/linux-riscv/1682214146-3756-1-git-send-email-yangtiezhu@loongson.cn/)** + +> thread.bad_cause is saved in arch_uprobe_pre_xol(), it should be restored +> in arch_uprobe_{post,abort}_xol() accordingly, otherwise the save operation +> is meaningless, this change is similar with x86 and powerpc. +> + +**[v1: dt-bindings: riscv: add sv57 mmu-type](http://lore.kernel.org/linux-riscv/20230421-voucher-ecology-7ddfdf801a71@spud/)** + +> Dumping the dtb from new versions of QEMU warns that sv57 is an +> undocumented mmu-type. The kernel has supported sv57 for about a year, +> so bring it into the fold. +> + +**[GIT PULL: KVM/riscv changes for 6.4](http://lore.kernel.org/linux-riscv/CAAhSdy2RLinG5Gx-sfOqrYDAT=xDa3WAk8r1jTu8ReO5Jo0LVA@mail.gmail.com/)** + +> We have the following KVM RISC-V changes for 6.4: +> 1) ONE_REG interface to enable/disable SBI extensions +> 2) Zbb extension for Guest/VM +> 3) AIA CSR virtualization +> 4) Few minor cleanups and fixes +> + +**[v17: Microchip Soft IP corePWM driver](http://lore.kernel.org/linux-riscv/20230421-neurology-trapezoid-b4fa29923a23@wendy/)** + +> Yet another version of this driver :) +> +> This time around I've implemented Uwe's simplified method for +> calculating the prescale & period_steps. For low values of prescale it +> makes for much worse approximations of the period, but as the period +> increases with respect to the that of the pwm's underlying clock there +> is mostly no different in the approximations. +> + +**[v1: riscv: mm: Ensure prot of VM_WRITE and VM_EXEC must be readable](http://lore.kernel.org/linux-riscv/20230421075111.1391952-1-woodrow.shen@sifive.com/)** + +> The commit 8aeb7b17f04e ("RISC-V: Make mmap() with PROT_WRITE imply PROT_READ") +> allows riscv to use mmap with PROT_WRITE only, and meanwhile mmap with w+x is +> also permitted.
However, when userspace tries to access this page with +> PROT_WRITE|PROT_EXEC, which causes infinite loop at load page fault as well as +> it triggers soft lockup. According to riscv privileged spec, +> "Writable pages must also be marked readable". The fix to drop the +> `PAGE_COPY_EXEC` and then `PAGE_COPY_READ_EXEC` should be just used instead. +> This aligns the other arches (i.e arm64) for protection_map. +> + +**[v3: Add JH7110 cpufreq support](http://lore.kernel.org/linux-riscv/20230421031431.23010-1-mason.huo@starfivetech.com/)** + +> The StarFive JH7110 SoC has four RISC-V cores, +> and it supports up to 4 cpu frequency loads. +> +> This patchset adds the compatible strings into the allowlist +> for supporting the generic cpufreq driver on JH7110 SoC. +> Also, it enables the axp15060 pmic for the cpu power source. +> + +**[v1: RISC-V: include cpufeature.h in cpufeature.c](http://lore.kernel.org/linux-riscv/20230420-wound-gizzard-2b2b589d9bea@spud/)** + +> Automation complains: +> warning: symbol '__pcpu_scope_misaligned_access_speed' was not declared. Should it be static? +> +> cpufeature.c doesn't actually include the header of the same name, as it +> had not previously used anything from it. +> The per-cpu variable is declared there, so include it to silence the +> complaints. +> + +**[v5: Add JH7110 USB and USB PHY driver support](http://lore.kernel.org/linux-riscv/20230420110052.3182-1-minda.chen@starfivetech.com/)** + +> This patchset adds USB driver and USB PHY for the StarFive JH7110 SoC. +> USB work mode is peripheral and using USB 2.0 PHY in VisionFive 2 board. +> The patch has been tested on the VisionFive 2 board. +> + +**[v3: Change PWM-controlled LED pin active mode and algorithm](http://lore.kernel.org/linux-riscv/20230420093457.18936-1-nylon.chen@sifive.com/)** + +> According to the circuit diagram of User LEDs - RGB described in the manual hifive-unleashed-a00.pdf[0] and hifive-unmatched-schematics-v3.pdf[1]. 
+> + +**[v2: Add TDM audio on StarFive JH7110](http://lore.kernel.org/linux-riscv/20230420024118.22677-1-walker.chen@starfivetech.com/)** + +> This patchset adds TDM audio driver for the StarFive JH7110 SoC. The +> first patch adds device tree binding for TDM module. The second patch +> adds the item for JH7110 audio board to the dt-binding of StarFive +> SoC-based boards. The third patch adds tdm driver support for JH7110 +> SoC. The last patch adds device node of tdm and sound card to JH7110 dts. +> + +**[v1: kvmtool: RISC-V CoVE support](http://lore.kernel.org/linux-riscv/20230419222350.3604274-1-atishp@rivosinc.com/)** + +> This series is an initial version of the support for running confidential VMs on +> riscv architecture. This is to get feedback on the proposed COVH, COVI and COVG +> extensions for running Confidential VMs on riscv. The specification is available +> here [0]. Make sure to build it to get the latest changes as it gets updated +> from time to time. +> + +**[v2: Add JH7110 AON PMU support](http://lore.kernel.org/linux-riscv/20230419034833.43243-1-changhuang.liang@starfivetech.com/)** + +> This patchset adds aon power domain driver for the StarFive JH7110 SoC. +> It is used to turn on/off dphy rx/tx power switch. The series has been +> tested on the VisionFive 2 board. +> + +**[v1: pwm: sifive: Simplify using devm_clk_get_prepared()](http://lore.kernel.org/linux-riscv/20230418202102.117658-1-u.kleine-koenig@pengutronix.de/)** + +> Instead of preparing the clk after it was requested and unpreparing in +> .probe()'s error path and .remove(), use devm_clk_get_prepared() which +> copes for unpreparing automatically. +> + +**[v1: Split ptdesc from struct page](http://lore.kernel.org/linux-riscv/20230417205048.15870-1-vishal.moola@gmail.com/)** + +> The MM subsystem is trying to shrink struct page. This patchset +> introduces a memory descriptor for page table tracking - struct ptdesc. 
+> +> This patchset introduces ptdesc, splits ptdesc from struct page, and +> converts many callers of page table constructor/destructors to use ptdescs. +> + +**[v1: tools/nolibc: add stackprotector support for more architectures](http://lore.kernel.org/linux-riscv/20230408-nolibc-stackprotector-archs-v1-0-271f5c859c71@weissschuh.net/)** + +> Add stackprotector support for all remaining architectures, except s390. +> +> On s390 the stackprotectors are not supported in "global" mode; only +> "sysreg" mode which is not suppored in nolibc. +> + +**[v1: RISC-V: Add steal-time support](http://lore.kernel.org/linux-riscv/20230417103402.798596-1-ajones@ventanamicro.com/)** + +> One frequently touted benefit of virtualization is the ability to +> consolidate machines, increasing resource utilization. It may even be +> desirable to overcommit, at the risk of one or more VCPUs having to wait. +> Hypervisors which have interfaces for guests to retrieve the amount of +> time each VCPU had to wait give observers within the guests ways to +> account for less progress than would otherwise be expected. The SBI STA +> extension proposal[1] provides a standard interface for guest VCPUs to +> retrieve the amount of time "stolen". +> + +**[v3: riscv: mm: execute local TLB flush after populating vmemmap](http://lore.kernel.org/linux-riscv/20230417060618.639395-1-vincent.chen@sifive.com/)** + +> The spare_init() calls memmap_populate() many times to create VA to PA +> mapping for the VMEMMAP area, where all "struct page" are located once +> CONFIG_SPARSEMEM_VMEMMAP is defined. These "struct page" are later +> initialized in the zone_sizes_init() function. However, during this +> process, no sfence.vma instruction is executed for this VMEMMAP area. +> This omission may cause the hart to fail to perform page table walk +> because some data related to the address translation is invisible to the +> hart. 
To solve this issue, the local_flush_tlb_kernel_range() is called +> right after the spare_init() to execute a sfence.vma instruction for the +> VMEMMAP area, ensuring that all data related to the address translation +> is visible to the hart. +> + +**[v1: riscv: dts: starfive: Add PMU controller node](http://lore.kernel.org/linux-riscv/20230417034728.2670-1-walker.chen@starfivetech.com/)** + +> Add the pmu controller node for the StarFive JH7110 SoC. The PMU needs +> to be used by other modules, e.g. VPU,ISP,etc. +> + +#### 进程调度 + +**[v2: net: net/sched: cls_api: Initialize miss_cookie_node when action miss is not used](http://lore.kernel.org/lkml/20230420183634.1139391-1-ivecera@redhat.com/)** + +> Function tcf_exts_init_ex() sets exts->miss_cookie_node ptr only +> when use_action_miss is true so it assumes in other case that +> the field is set to NULL by the caller. If not then the field +> contains garbage and subsequent tcf_exts_destroy() call results +> in a crash. +> Ensure that the field .miss_cookie_node pointer is NULL when +> use_action_miss parameter is false to avoid this potential scenario. +> + +**[v2: sched/topology: add for_each_numa_cpu() macro](http://lore.kernel.org/lkml/20230420051946.7463-1-yury.norov@gmail.com/)** + +> for_each_cpu() is widely used in kernel, and it's beneficial to create +> a NUMA-aware version of the macro. +> + +**[v1: net: sched: print jiffies when transmit queue time out](http://lore.kernel.org/lkml/20230419115632.738730-1-yajun.deng@linux.dev/)** + +> Although there is watchdog_timeo to let users know when the transmit queue +> begin stall, but dev_watchdog() is called with an interval. The jiffies +> will always be greater than watchdog_timeo. +> + +**[v1: drm/msm: Move cmdstream dumping out of sched kthread](http://lore.kernel.org/lkml/20230417225510.494951-1-robdclark@gmail.com/)** + +> This is something that can block for arbitrary amounts of time as +> userspace consumes from the FIFO. 
So we don't really want this to +> be in the fence signaling path. +> + +**[v1: sched/uclamp: Introduce SCHED_FLAG_RESET_UCLAMP_ON_FORK flag](http://lore.kernel.org/lkml/20230416213406.2966521-1-davidai@google.com/)** + +> A userspace service may manage uclamp dynamically for individual tasks and +> a child task will unintentionally inherit a pesudo-random uclamp setting. +> This could result in the child task being stuck with a static uclamp value +> that results in poor performance or poor power. +> + +**[GIT PULL: sched/urgent for v6.3-rc7](http://lore.kernel.org/lkml/20230416123412.GDZDvrRCv9VvvmXuPz@fat_crate.local/)** + +> pls pull an urgent scheduler fix for 6.3. +> +> Thx. +> + +#### 内存管理 + +**[v1: mm/gup: disallow GUP writing to file-backed mappings by default](http://lore.kernel.org/linux-mm/f86dc089b460c80805e321747b0898fd1efe93d7.1682168199.git.lstoakes@gmail.com/)** + +> It isn't safe to write to file-backed mappings as GUP does not ensure that +> the semantics associated with such a write are performed correctly, for +> instance filesystems which rely upon write-notify will not be correctly +> notified. +> + +**[v12: cachestat: a new syscall for page cache state of files](http://lore.kernel.org/linux-mm/20230421231421.2401346-1-nphamcs@gmail.com/)** + +> There is currently no good way to query the page cache statistics of large +> files and directory trees. There is mincore(), but it scales poorly: the +> kernel writes out a lot of bitmap data that userspace has to aggregate, +> when the user really does not care about per-page information in that +> case. The user also needs to mmap and unmap each file as it goes along, +> which can be quite slow as well. 
+> + +**[v2: migrate: Avoid unbounded blocks in MIGRATE_SYNC_LIGHT](http://lore.kernel.org/linux-mm/20230421221249.1616168-1-dianders@chromium.org/)** + +> This series is the result of discussion around my RFC patch [1] where +> I talked about completely removing the waits for the folio_lock in +> migrate_folio_unmap(). +> + +**[v1: shmem: add support for blocksize > PAGE_SIZE](http://lore.kernel.org/linux-mm/20230421214400.2836131-1-mcgrof@kernel.org/)** + +> This is an initial attempt to add support for block size > PAGE_SIZE for tmpfs. +> Why would you want this? It helps us experiment with higher order folio uses +> with fs APIS and helps us test out corner cases which would likely need +> to be accounted for sooner or later if and when filesystems enable support +> for this. Better review early and burn early than continue on in the wrong +> direction so looking for early feedback. +> + +**[v2: kasan: use internal prototypes matching gcc-13 builtins](http://lore.kernel.org/linux-mm/20230421205754.106794-1-arnd@kernel.org/)** + +> This now passes all randconfig builds on arm, arm64 and x86, but I have +> not tested it on the other architectures that support kasan, since they +> tend to fail randconfig builds in other ways. This might fail if any +> of the 32-bit architectures expect a 'long' instead of 'int' for the +> size argument. +> + +**[v1: block: simplify with PAGE_SECTORS_SHIFT](http://lore.kernel.org/linux-mm/20230421195807.2804512-1-mcgrof@kernel.org/)** + +> A bit of block drivers have their own incantations with +> PAGE_SHIFT - SECTOR_SHIFT. Just simplfy and use PAGE_SECTORS_SHIFT +> all over. +> + +**[v5: cgroup: eliminate atomic rstat flushing](http://lore.kernel.org/linux-mm/20230421174020.2994750-1-yosryahmed@google.com/)** + +> A previous patch series ([1] currently in mm-stable) changed most +> atomic rstat flushing contexts to become non-atomic. 
This was done to +> avoid an expensive operation that scales with # cgroups and # cpus to +> happen with irqs disabled and scheduling not permitted. There were two +> remaining atomic flushing contexts after that series. This series tries +> to eliminate them as well, eliminating atomic rstat flushing completely. +> + +**[v1: arm64: Also reset KASAN tag if page is not PG_mte_tagged](http://lore.kernel.org/linux-mm/20230420210945.2313627-1-pcc@google.com/)** + +> Consider the following sequence of events: +> +> 1) A page in a PROT_READ|PROT_WRITE VMA is faulted. +> 2) Page migration allocates a page with the KASAN allocator, +> causing it to receive a non-match-all tag, and uses it +> to replace the page faulted in 1. +> 3) The program uses mprotect() to enable PROT_MTE on the page faulted in 1. +> + +**[v4: bio: check return values of bio_add_page](http://lore.kernel.org/linux-mm/20230420100501.32981-1-jth@kernel.org/)** + +> We have two functions for adding a page to a bio, __bio_add_page() which is +> used to add a single page to a freshly created bio and bio_add_page() which is +> used to add a page to an existing bio. +> + +**[v1: shmem: restrict noswap option to initial user namespace](http://lore.kernel.org/linux-mm/20230420-faxen-advokat-40abb4c1a152@brauner/)** + +> Prevent tmpfs instances mounted in an unprivileged namespaces from +> evading accounting of locked memory by using the "noswap" mount option. +> + +**[v15: RESEND: Implement IOCTL to get and optionally clear info about PTEs](http://lore.kernel.org/linux-mm/20230420060156.895881-1-usama.anjum@collabora.com/)** + +> This syscall is used in Windows applications and games etc. This syscall is +> being emulated in pretty slow manner in userspace. Our purpose is to +> enhance the kernel such that we translate it efficiently in a better way. +> Currently some out of tree hack patches are being used to efficiently +> emulate it in some kernels. We intend to replace those with these patches. 
+> So the whole gaming on Linux can effectively get benefit from this. It +> means there would be tons of users of this code. +> + +**[v2: module: add debugging auto-load duplicate module support](http://lore.kernel.org/linux-mm/20230420003046.1604251-1-mcgrof@kernel.org/)** + +> The finit_module() system call can in the worst case use up to more than +> twice of a module's size in virtual memory. Duplicate finit_module() +> system calls are non fatal, however they unnecessarily strain virtual +> memory during bootup and in the worst case can cause a system to fail +> to boot. This is only known to currently be an issue on systems with +> larger number of CPUs. +> + +**[v15: Implement IOCTL to get and optionally clear info about PTEs](http://lore.kernel.org/linux-mm/20230419110716.4113627-1-usama.anjum@collabora.com/)** + +> This syscall is used in Windows applications and games etc. This syscall is +> being emulated in pretty slow manner in userspace. Our purpose is to +> enhance the kernel such that we translate it efficiently in a better way. +> Currently some out of tree hack patches are being used to efficiently +> emulate it in some kernels. We intend to replace those with these patches. +> So the whole gaming on Linux can effectively get benefit from this. It +> means there would be tons of users of this code. +> + +**[v1: mm/cma: mm/cma: retry allocation of dedicated area on EBUSY](http://lore.kernel.org/linux-mm/20230419083851.2555096-1-sergii.piatakov@globallogic.com/)** + +> Sometimes continuous page range can't be successfully allocated, because +> some pages in the range may not pass the isolation test. In this case, +> the CMA allocator gets an EBUSY error and retries allocation again (in +> the slightly shifted range). 
+> + +**[v1: printk: Enough to disable preemption in printk deferred context](http://lore.kernel.org/linux-mm/20230419074210.17646-1-pmladek@suse.com/)** + +> The comment above printk_deferred_enter()/exit() definition claims +> that it can be used only when interrupts are disabled. +> + +**[v1: mm: skip CMA pages when they are not available](http://lore.kernel.org/linux-mm/1681882824-17532-1-git-send-email-zhaoyang.huang@unisoc.com/)** + +> It is wasting of effort to reclaim CMA pages if they are not availabe +> for current context during direct reclaim. Skip them when under corresponding +> circumstance. +> + +**[v1: mm/mmap: Map MAP_STACK to VM_STACK](http://lore.kernel.org/linux-mm/20230418210230.3495922-1-longman@redhat.com/)** + +> One of the flags of mmap(2) is MAP_STACK to request a memory segment +> suitable for a process or thread stack. The kernel currently ignores +> this flags. Glibc uses MAP_STACK when mmapping a thread stack. However, +> selinux has an execstack check in selinux_file_mprotect() which disallows +> a stack VMA to be made executable. +> + +**[v1: mm: reliable huge page allocator](http://lore.kernel.org/linux-mm/20230418191313.268131-1-hannes@cmpxchg.org/)** + +> As memory capacity continues to grow, 4k TLB coverage has not been +> able to keep up. On Meta's 64G webservers, close to 20% of execution +> cycles are observed to be handling TLB misses when using 4k pages +> only. Huge pages are shifting from being a nice-to-have optimization +> for HPC workloads to becoming a necessity for common applications. 
+> + + #### File Systems + + **[v1: io_uring: add getdents support, take 2](http://lore.kernel.org/linux-fsdevel/20230422-uring-getdents-v1-0-14c1db36e98c@codewreck.org/)** + +> The new API does nothing that cannot be achieved with plain syscalls so +> it shouldn't be introducing any new problem, the only downside is that +> having the state in the file struct isn't very uring-ish and if a +> better solution is found later that will probably require duplicating +> some logic in a new flag... But that seems like it would likely be a +> distant future, and this version should be usable right away. +> + +**[v2: Support negative dentries on case-insensitive ext4 and f2fs](http://lore.kernel.org/linux-fsdevel/20230422000310.1802-1-krisman@suse.de/)** + +> This is the v2 of the negative dentry support on case-insensitive directories. +> It doesn't have any functional changes from v1, but it adds more context and a +> comment to the dentry->d_name access I'm doing in d_revalidate, documenting +> why (i understand) it is safe to do it without protecting from the parallell +> directory changes. +> + +**[GIT PULL: Turn single vector imports into ITER_UBUF](http://lore.kernel.org/linux-fsdevel/f16053ea-d3b8-a8a2-0178-3981fea5a656@kernel.dk/)** + +> This series turns singe vector imports into ITER_UBUF, rather than +> ITER_IOVEC. The former is more trivial to iterate and advance, and hence +> a bit more efficient. From some very unscientific testing, +> 60% of all +> iovec imports are single vector. +> + +**[GIT PULL: pipe: nonblocking rw for io_uring](http://lore.kernel.org/linux-fsdevel/20230421-seilbahn-vorpreschen-bd73ac3c88d7@brauner/)** + +> /* Summary */ +> This contains Jens' work to support FMODE_NOWAIT and thus IOCB_NOWAIT +> for pipes ensuring that all places can deal with non-blocking requests. +> +> To this end, pass down the information that this is a nonblocking +> request so that pipe locking, allocation, and buffer checking correctly +> deal with those.
+> + +**[v1: fs/coredump: open coredump file in O_WRONLY instead of O_RDWR](http://lore.kernel.org/linux-fsdevel/20230420120409.602576-1-vsementsov@yandex-team.ru/)** + +> This makes it possible to make stricter apparmor profile and don't +> allow the program to read any coredump in the system. +> + +**[v2: shmem: Add user and group quota support for tmpfs](http://lore.kernel.org/linux-fsdevel/20230420080359.2551150-1-cem@kernel.org/)** + +> This is the version 2 of the quota support from tmpfs addressing some issues +> discussed on V1 and a few extra things, details are within each patch. Original +> cover-letter below. +> + +**[v5: Introduce block provisioning primitives](http://lore.kernel.org/linux-fsdevel/20230420004850.297045-1-sarthakkukreti@chromium.org/)** + +> Next revision of adding support for block provisioning requests. +> + +**[v2: ext4: Handle error pointers being returned from __filemap_get_folio](http://lore.kernel.org/linux-fsdevel/20230419120923.3152939-1-willy@infradead.org/)** + +> Commit "mm: return an ERR_PTR from __filemap_get_folio" changed from +> returning NULL to returning an ERR_PTR(). This cannot be fixed in either +> the ext4 tree or the mm tree, so this patch should be applied as part +> of merging the two trees. +> + +**[v10: Implement copy offload support](http://lore.kernel.org/linux-fsdevel/20230419114320.13674-1-nj.shetty@samsung.com/)** + +> The patch series covers the points discussed in November 2021 virtual +> call [LSF/MM/BFP TOPIC] Storage: Copy Offload [0]. +> We have covered the initial agreed requirements in this patchset and +> further additional features suggested by community. +> Patchset borrows Mikulas's token based approach for 2 bdev +> implementation. +> + +**[v1: Backport several fuse patches for 6.1.y](http://lore.kernel.org/linux-fsdevel/20230419095518.51373-1-yb203166@antfin.com/)** + +> Antgroup is using 5.10.y in product environment, we found several patches are +> missing in 5.10.y tree. 
These patches are needed for us. So we backported them +> to 5.10.y. Also backport to 5.15.y and 6.1.y to prevent regression. +> + +**[v1: Backport several fuse patches for 5.15.y](http://lore.kernel.org/linux-fsdevel/20230419095424.51328-1-yb203166@antfin.com/)** + +> Antgroup is using 5.10.y in product environment, we found several patches are +> missing in 5.10.y tree. These patches are needed for us. So we backported them +> to 5.10.y. Also backport to 5.15.y and 6.1.y to prevent regression. +> + +**[v1: Backport several fuse patches to 5.10.y](http://lore.kernel.org/linux-fsdevel/20230419094844.51110-1-yb203166@antfin.com/)** + +> Antgroup is using 5.10.y in product environment, we found several patches are +> missing in 5.10.y tree. These patches are needed for us. So we backported them +> to 5.10.y. Also backport to 5.15.y and 6.1.y to prevent regression. +> + +**[v4: Introduce provisioning primitives for thinly provisioned storage](http://lore.kernel.org/linux-fsdevel/20230418221207.244685-1-sarthakkukreti@chromium.org/)** + +> This patch series is revision 4 of introducing a new mechanism to pass through provision requests on stacked thinly provisioned storage devices. See [1] for original cover letter. +> +> [1] https://lore.kernel.org/lkml/ZDnMl8A1B1+Tfn5S@redhat.com/T/#md4f20113c2242755747ae069f84be720a6751012 +> + +**[v3: bpf-next: FUSE BPF: A Stacked Filesystem Extension for FUSE](http://lore.kernel.org/linux-fsdevel/20230418014037.2412394-1-drosen@google.com/)** + +> These patches extend FUSE to be able to act as a stacked filesystem. This +> allows pure passthrough, where the fuse file system simply reflects the lower +> filesystem, and also allows optional pre and post filtering in BPF and/or the +> userspace daemon as needed. This can dramatically reduce or even eliminate +> transitions to and from userspace. 
+> + +**[v1: shmem: stable directory cookies](http://lore.kernel.org/linux-fsdevel/168175931561.2843.16288612382874559384.stgit@manet.1015granger.net/)** + +> The current cursor-based directory cookie mechanism doesn't work +> when a tmpfs filesystem is exported via NFS. This is because NFS +> clients do not open directories: each READDIR operation has to open +> the directory on the server, read it, then close it. The cursor +> state for that directory, being associated strictly with the opened +> struct file, is then discarded. +> + +**[v1: vfs: allow using kernel buffer during fiemap operation](http://lore.kernel.org/linux-fsdevel/bc30483b-7f9b-df4e-7143-8646aeb4b5a2@I-love.SAKURA.ne.jp/)** + +> syzbot is reporting circular locking dependency between ntfs_file_mmap() +> (which has mm->mmap_lock => ni->ni_lock => ni->file.run_lock dependency) +> and ntfs_fiemap() (which has ni->ni_lock => ni->file.run_lock => +> mm->mmap_lock dependency), for commit c4b929b85bdb ("vfs: vfs-level fiemap +> interface") implemented fiemap_fill_next_extent() using copy_to_user() +> where direct mm->mmap_lock dependency is inevitable. +> + +#### 网络设备 + +**[v5: net-next: net/smc: Introduce SMC-D-based OS internal communication acceleration](http://lore.kernel.org/netdev/1682252271-2544-1-git-send-email-guwen@linux.alibaba.com/)** + +> We found SMC-D can be used to accelerate OS internal communication, such as +> loopback or between two containers within the same OS instance. So this patch +> set provides a kind of SMC-D dummy device (we call it the SMC-D loopback device) +> to emulate an ISM device, so that SMC-D can also be used on architectures +> other than s390. The SMC-D loopback device are designed as a system global +> device, visible to all containers. +> + +**[v4: net-next: tsnep: XDP socket zero-copy support](http://lore.kernel.org/netdev/20230421194656.48063-1-gerhard@engleder-embedded.com/)** + +> Implement XDP socket zero-copy support for tsnep driver. 
I tried to +> follow existing drivers like igc as far as possible. But one main +> + +**[v3: net: netlink: Use copy_to_user() for optval in netlink_getsockopt().](http://lore.kernel.org/netdev/20230421185255.94606-1-kuniyu@amazon.com/)** + +> Brad Spencer provided a detailed report [0] that when calling getsockopt() +> for AF_NETLINK, some SOL_NETLINK options set only 1 byte even though such +> options require at least sizeof(int) as length. +> + +**[v5: bpf-next: bpf: add netfilter program type](http://lore.kernel.org/netdev/20230421170300.24115-1-fw@strlen.de/)** + +> Changes since last version: +> - rework test case in last patch wrt. ctx->skb dereference etc (Alexei) +> - pacify bpf ci tests, netfilter program type missed string translation +> in libbpf helper. +> + +**[v5: drivers/net/phy: add driver for Microchip LAN867x 10BASE-T1S PHY](http://lore.kernel.org/netdev/ZEK8Hvl0Zl%2F0NntI@debian/)** + +> This patch adds support for the Microchip LAN867x 10BASE-T1S family +> (LAN8670/1/2). The driver supports P2MP with PLCA. +> + +**[v2: can: virtio: Initial virtio CAN driver.](http://lore.kernel.org/netdev/20230421145653.12811-1-Mikhail.Golubev-Ciuchea@opensynergy.com/)** + +> This is version 3 of the driver after having gotten review comments. +> + +**[v1: net-next: net: dsa: MT7530, MT7531, and MT7988 improvements](http://lore.kernel.org/netdev/20230421143648.87889-1-arinc.unal@arinc9.com/)** + +> This patch series is focused on simplifying the code, and improving the +> logic of the support for MT7530, MT7531, and MT7988 SoC switches. +> +> There's also a fix for the switch on the MT7988 SoC. +> + +#### Asynchronous IO + +**[v1: io_uring: honor I/O nowait flag for read/write](http://lore.kernel.org/io-uring/20230421172822.8053-1-kch@nvidia.com/)** + +> When IO_URING_F_NONBLOCK is set on io_kiocb req->flag in io_write() or +> io_read() IOCB_NOWAIT is set for kiocb when passed it to the respective +> rw_iter callback. This sets REQ_NOWAIT for underlying I/O.
The result +> is low level driver always sees block layer request as REQ_NOWAIT even +> if user has submitted request with nowait = 0 e.g. fio nowait=0. +> + +**[v1: tools/io_uring: Add .gitignore](http://lore.kernel.org/io-uring/tencent_C8F457D8D10F44760333A1E1AC9B4B0C1507@qq.com/)** + +> Ignore {io_uring-bench,io_uring-cp}. +> + +**[v2: io_uring: Pass the whole sqe to commands](http://lore.kernel.org/io-uring/20230421114440.3343473-1-leitao@debian.org/)** + +> These three patches prepare for the sock support in the io_uring cmd, as +> described in the following RFC: +> +> https://lore.kernel.org/lkml/20230406144330.1932798-1-leitao@debian.org/ +> + +**[v1: test/file-verify.t: Don't run over mlock limit when run as non-root](http://lore.kernel.org/io-uring/20230420185728.4104-1-krisman@suse.de/)** + +> test/file-verify tries to get 2MB of pinned memory at once, which is +> higher than the default allowed for non-root users in older +> kernels (64kb before v5.16, nowadays 8mb). Skip the test for non-root +> users if the registration fails instead of failing the test. +> + +**[v1: Support for mapping SQ/CQ rings into huge page](http://lore.kernel.org/io-uring/20230419224805.693734-1-axboe@kernel.dk/)** + +> io_uring SQ/CQ rings are allocated by the kernel from contiguous, normal +> pages, and then the application mmap()'s the rings into userspace. This +> works fine, but does require contiguous pages to be available for the +> given SQ and CQ ring sizes. As uptime increases on a given system, so +> does memory fragmentation. Entropy is inevitable.
+> + +**[v1: io_uring: Pass whole sqe to commands](http://lore.kernel.org/io-uring/20230419102930.2979231-1-leitao@debian.org/)** + +> These two patches prepares for the sock support in the io_uring cmd, as +> described in the following RFC: +> +> https://lore.kernel.org/lkml/20230406144330.1932798-1-leitao@debian.org/ +> + +**[v1: io_uring: Optimization of buffered random write](http://lore.kernel.org/io-uring/20230419092233.56338-1-luhongfei@vivo.com/)** + +> The buffered random write performance of io_uring is poor +> due to the following reason: +> By default, when performing buffered random writes, io_sq_thread +> will call io_issue_sqe writes req, but due to the setting of +> IO_URING_F_NONBLOCK, req is executed asynchronously in iou-wrk, +> where io_wq_submit_work calls io_issue_sqe completes the write req, +> with issue_flag as IO_URING_F_UNLOCKED | IO_URING_F_IOWQ, +> which will reduce performance. +> This patch will determine whether this req is a buffered random write, +> and if so, io_sq_thread directly calls io_issue_sqe(req, 0) +> completes req instead of completing it asynchronously in iou wrk. +> + +**[v4: io_uring: add support for multishot timeouts](http://lore.kernel.org/io-uring/20230418225817.1905027-1-davidhwei@meta.com/)** + +> A multishot timeout submission will repeatedly generate completions with +> the IORING_CQE_F_MORE cflag set. +> + +**[v1: for-next: another round of rsrc refactoring](http://lore.kernel.org/io-uring/cover.1681822823.git.asml.silence@gmail.com/)** + +> The main part is Patch 3, which establishes 1:1 relation between +> struct io_rsrc_put and nodes, which removes io_rsrc_node_switch() / +> io_rsrc_node_switch_start() and all the additional complexity with +> pre allocations. Note, it doesn't change any guarantees as +> io_queue_rsrc_removal() was doing allocations anyway and could +> always fail. 
+> + +**[v1: liburing: io_uring sendto](http://lore.kernel.org/io-uring/20230415165821.791763-1-ammarfaizi2@gnuweeb.org/)** + +> There are two patches in this series. The first patch adds +> io_uring_prep_sendto() function. The second patch adds the +> manpage and CHANGELOG. +> + +#### Rust For Linux + +**[v1: v4.1: rust: lock: introduce `SpinLock`](http://lore.kernel.org/rust-for-linux/20230419174426.132207-1-wedsonaf@gmail.com/)** + +> This is the `spinlock_t` lock backend and allows Rust code to use the +> kernel spinlock idiomatically. +> + +**[v1: .gitattributes: set diff driver for Rust source code files](http://lore.kernel.org/rust-for-linux/20230418233048.335281-1-ojeda@kernel.org/)** + +> Git supports a builtin Rust diff driver [1] since v2.23.0 (2019). +> +> It improves the choice of hunk headers in some cases, such as +> + +**[v1: Rust 1.68.2 upgrade](http://lore.kernel.org/rust-for-linux/20230418214347.324156-1-ojeda@kernel.org/)** + +> This is the first upgrade to the Rust toolchain since the initial Rust +> merge, from 1.62.0 to 1.68.2 (i.e. the latest). +> + +#### BPF + +**[v4: bpf-next: bpftool: Show map IDs along with struct_ops links.](http://lore.kernel.org/bpf/20230421214131.352662-1-kuifeng@meta.com/)** + +> A new link type, BPF_LINK_TYPE_STRUCT_OPS, was added to attach +> struct_ops to links. (226bc6ae6405) It would be helpful for users to +> know which map is associated with the link. +> + +**[v1: bpf-next: selftests/bpf: verifier/prevent_map_lookup converted to inline assembly](http://lore.kernel.org/bpf/20230421204514.2450907-1-eddyz87@gmail.com/)** + +> Test verifier/prevent_map_lookup automatically converted to use inline assembly. +> +> This was a part of a series [1] but could not be applied because +> another patch from a series had to be withheld. +> + +**[v1: bpf-next: Second set of verifier/*.c migrated to inline assembly](http://lore.kernel.org/bpf/20230421174234.2391278-1-eddyz87@gmail.com/)** + +> This is a follow up for RFC [1].
It migrates a second batch of 23 +> verifier/*.c tests to inline assembly and use of ./test_progs for +> actual execution. Link to the first batch is [2]. +> + +**[v1: Dump map id instead of value for map_of_maps types](http://lore.kernel.org/bpf/20230421101154.23690-1-kuro@kuroa.me/)** + +> When using `bpftool map dump` in plain format, it is usually +> more convenient to show the inner map id instead of raw value. +> Changing this behavior would help with quick debugging with +> `bpftool`, without disruption scripted behavior. Since user +> could dump the inner map with id, but need to convert value. +> + +**[v2: bpf-next: Introduce a new kfunc of bpf_task_under_cgroup](http://lore.kernel.org/bpf/20230421090403.15515-1-zhoufeng.zf@bytedance.com/)** + +> Trace sched related functions, such as enqueue_task_fair, it is necessary to +> specify a task instead of the current task which within a given cgroup. +> + +**[v1: bpf-next: selftests/xsk: put MAP_HUGE_2MB in correct argument](http://lore.kernel.org/bpf/20230421062208.3772-1-magnus.karlsson@gmail.com/)** + +> Put the flag MAP_HUGE_2MB in the correct flags argument instead of the +> wrong offset argument. +> + +**[v3: bpf-next: net/smc: Introduce BPF injection capability](http://lore.kernel.org/bpf/1682051033-66125-1-git-send-email-alibuda@linux.alibaba.com/)** + +> This patches attempt to introduce BPF injection capability for SMC, +> and add selftest to ensure code stability. +> +> As we all know that the SMC protocol is not suitable for all scenarios, +> especially for short-lived. However, for most applications, they cannot +> guarantee that there are no such scenarios at all. Therefore, apps +> may need some specific strategies to decide shall we need to use SMC +> or not, for example, apps can limit the scope of the SMC to a specific +> IP address or port. 
+> + +**[v2: bpf: Socket lookup BPF API from tc/xdp ingress does not respect VRF bindings.](http://lore.kernel.org/bpf/20230420145041.508434-1-gilad9366@gmail.com/)** + +> When calling socket lookup from L2 (tc, xdp), VRF boundaries aren't +> respected. This patchset fixes this by regarding the incoming device's +> VRF attachment when performing the socket lookups from tc/xdp. +> + +**[v1: net-next: net: lan966x: Don't use xdp_frame when action is XDP_TX](http://lore.kernel.org/bpf/20230420121152.2737625-1-horatiu.vultur@microchip.com/)** + +> When the action of an xdp program was XDP_TX, lan966x was creating +> a xdp_frame and use this one to send the frame back. But it is also +> possible to send back the frame without needing a xdp_frame, because +> it possible to send it back using the page. +> And then once the frame is transmitted is possible to use directly +> page_pool_recycle_direct as lan966x is using page pools. +> This would save some CPU usage on this path. +> + +**[v5: tracing: Add fprobe events](http://lore.kernel.org/bpf/168198993129.1795549.8306571027057356176.stgit@mhiramat.roam.corp.google.com/)** + +> Here is the 5th version of improve fprobe and add a basic fprobe event +> support for ftrace (tracefs) and perf. Here is the previous version. +> + +**[v1: bpf-next: Introduce a new bpf helper of bpf_task_under_cgroup](http://lore.kernel.org/bpf/20230420072657.80324-1-zhoufeng.zf@bytedance.com/)** + +> Trace sched related functions, such as enqueue_task_fair, it is necessary to +> specify a task instead of the current task which within a given cgroup to a map. +> + +**[v2: bpf-next: Dynptr helpers](http://lore.kernel.org/bpf/20230420071414.570108-1-joannelkoong@gmail.com/)** + +> This patchset is the 3rd in the dynptr series. The 1st (dynptr +> fundamentals) can be found here [0] and the second (skb + xdp dynptrs) +> can be found here [1]. 
+> + +**[v2: bpf-next: Access variable length array relaxed for integer type](http://lore.kernel.org/bpf/20230420032735.27760-1-zhoufeng.zf@bytedance.com/)** + +> Add support for integer type of accessing variable length array. +> Add a selftest to check it. +> + +**[v1: bpf-next: bpftool: Replace "__fallthrough" by a comment to address merge conflict](http://lore.kernel.org/bpf/20230420003333.90901-1-quentin@isovalent.com/)** + +> The recent support for inline annotations in control flow graphs +> generated by bpftool introduced the usage of the "__fallthrough" macro +> in a switch/case block in btf_dumper.c. This change went through the +> bpf-next tree, but resulted in a merge conflict in linux-next, because +> this macro has been renamed "fallthrough" (no underscores) in the +> meantime. +> + +**[v1: bpf-next: bpf: handle another corner case in getsockopt](http://lore.kernel.org/bpf/20230418225343.553806-1-sdf@google.com/)** + +> Martin reports another case where getsockopt EFAULTs perfectly +> valid callers. Let's fix it and also replace EFAULT with +> pr_info_ratelimited. That should hopefully make this place +> less error prone. +> + +**[v2: vmlinux.lds.h: Discard .note.gnu.property section](http://lore.kernel.org/bpf/20230418214925.ay3jpf2zhw75kgmd@treble/)** + +> When tooling reads ELF notes, it assumes each note entry is aligned to +> the value listed in the .note section header's sh_addralign field. +> +> The kernel-created ELF notes in the .note.Linux and .note.Xen sections +> are aligned to 4 bytes. This causes the toolchain to set those +> sections' sh_addralign values to 4. +> + +**[v1: bpf-next: bpftool: Register struct_ops with a link.](http://lore.kernel.org/bpf/20230418200058.603169-1-kuifeng@meta.com/)** + +> You can include an optional path after specifying the object name for the +> 'struct_ops register' subcommand. 
+> +> Since the commit 226bc6ae6405 ("Merge branch 'Transit between BPF TCP +> congestion controls.'") has been accepted, it is now possible to create a +> link for a struct_ops. This can be done by defining a struct_ops in +> SEC(".struct_ops.link") to make libbpf returns a real link. If we don't pin +> the links before leaving bpftool, they will disappear. To instruct bpftool +> to pin the links in a directory with the names of the maps, we need to +> provide the path of that directory. +> + +**[v6: bpf-next: bpf: Add socket destroy capability](http://lore.kernel.org/bpf/20230418153148.2231644-1-aditi.ghag@isovalent.com/)** + +> This patch adds the capability to destroy sockets in BPF. We plan to use +> the capability in Cilium to force client sockets to reconnect when their +> remote load-balancing backends are deleted. The other use case is +> on-the-fly policy enforcement where existing socket connections prevented +> by policies need to be terminated. +> + +**[v2: bpf-next: XDP-hints: XDP kfunc metadata for driver igc](http://lore.kernel.org/bpf/168182460362.616355.14591423386485175723.stgit@firesoul/)** + +> Implement both RX hash and RX timestamp XDP hints kfunc metadata +> for driver igc. +> + +### Related Technology Updates + +#### Qemu + +**[v8: target/riscv: rework CPU extension validation](http://lore.kernel.org/qemu-devel/20230421132727.121462-1-dbarboza@ventanamicro.com/)** + +> This version dropped patch 12 from v7. Alistair mentioned that it would +> limit static CPUs needlessly, since there's nothing preventing a static +> CPU to allow for extension changes during runtime, and that misa-w is +> enough to prevent write_misa() during runtime. I agree. +> + +**[v1: hw/riscv: virt: Enable booting M-mode or S-mode FW from pflash0](http://lore.kernel.org/qemu-devel/20230421043353.125701-1-sunilvl@ventanamicro.com/)** + +> Currently, virt machine supports two pflash instances each with +> 32MB size.
However, the first pflash is always assumed to +> contain M-mode firmware and reset vector is set to this if +> enabled. Hence, for S-mode payloads like EDK2, only one pflash +> instance is available for use. This means both code and NV variables +> of EDK2 will need to use the same pflash. +> + +**[v3: riscv: Make sure an exception is raised if a pte is malformed](http://lore.kernel.org/qemu-devel/20230420150220.60919-1-alexghiti@rivosinc.com/)** + +> As per the specification, in 64-bit, if any of the pte reserved bits +> Memory Protection"). In addition, we must check the napot/pbmt bits are +> not set if those extensions are not active. +> + +**[v1: target/riscv: add Ventana's Veyron V1 CPU](http://lore.kernel.org/qemu-devel/20230418123624.16414-1-dbarboza@ventanamicro.com/)** + +> Add a virtual CPU for Ventana's first CPU named veyron-v1. It runs +> exclusively for the rv64 target. It's tested with the 'virt' board. +> + +**[v7: target/riscv: rework CPU extensions validation](http://lore.kernel.org/qemu-devel/20230417140013.58893-1-dbarboza@ventanamicro.com/)** + +> In this v7 we have three extra patches: +> +> - patch 4 [1] and 5 [2], both from Weiwei Li, addresses an issue that +> we're going to have with Zca and RVC if we push the priv spec +> disabling code to the end of validation. More details can be seen on +> [3]. Patch 5 commit message also has some context on it; +> + +**[v2: Add RISC-V vector cryptographic instruction set support](http://lore.kernel.org/qemu-devel/20230417135821.609964-1-lawrence.hunter@codethink.co.uk/)** + +> This patchset provides an implementation for Zvbb, Zvbc, Zvkned, Zvknh, Zvksh, +> Zvkg, and Zvksed of the draft RISC-V vector cryptography extensions as per the +> v20230407 version of the specification(1) (3206f07). This is an update to the +> patchset submitted to qemu-devel on Friday, 10 Mar 2023 16:03:01 +0000. 
+> + +**[v2: target/riscv: Restore the predicate() NULL check behavior](http://lore.kernel.org/qemu-devel/20230417043054.3125614-1-bmeng@tinylab.org/)** + +> When reading a non-existent CSR QEMU should raise illegal instruction +> exception, but currently it just exits due to the g_assert() check. +> +> This actually reverts commit 0ee342256af9205e7388efdf193a6d8f1ba1a617. +> Some comments are also added to indicate that predicate() must be +> provided for an implemented CSR. +> + +**[v1: riscv: implement Ssqosid extension and CBQRI controllers](http://lore.kernel.org/qemu-devel/20230416232050.4094820-1-dfustini@baylibre.com/)** + +> This RFC series implements the Ssqosid extension and the sqoscfg CSR as +> defined in the RISC-V Capacity and Bandwidth Controller QoS Register +> Interface (CBQRI) specification [1]. Quality of Service (QoS) in this +> context is concerned with shared resources on an SoC such as cache +> capacity and memory bandwidth. +> + +#### U-Boot + +**[v5: Add StarFive JH7110 PCIe drvier support](http://lore.kernel.org/u-boot/20230423105859.125764-1-minda.chen@starfivetech.com/)** + +> This patchset needs to apply after patchset in [1]. These PCIe series patches +> are based on the JH7110 RISC-V SoC and VisionFive V2 board. +> +> [1] https://patchwork.ozlabs.org/project/uboot/cover/20230329034224.26545-1-yanhong.wang@starfivetech.com +> + +**[v1: u-boot-riscv/master](http://lore.kernel.org/u-boot/ZEHbqoEXAB+BAtmo@ubuntu01/)** + +> The following changes since commit 5db4972a5bbdbf9e3af48ffc9bc4fec73b7b6a79: +> +> Merge tag 'u-boot-nand-20230417' of https://source.denx.de/u-boot/custodians/u-boot-nand-flash (2023-04-17 10:47:33 -0400) +> + +**[v1: riscv: visionfive2: use OF_BOARD_SETUP](http://lore.kernel.org/u-boot/20230419112801.GA1907@lst.de/)** + +> U-Boot already has a mechanism to fix up the DT before OS boot. +> This avoids the excessive duplication of data and work proposed +> by the explicit separation of 1.2a and 1.3b board revisions. 
It +> will also, to a good degree, improve the user experience, as +> pointed out by Matthias. +> + +## 20230416: Issue 42 + +### Kernel Updates + +#### RISC-V Architecture Support + +**[v18: -next: riscv: Add vector ISA support](http://lore.kernel.org/linux-riscv/20230414155843.12963-1-andy.chiu@sifive.com/)** + +> This patchset is implemented based on vector 1.0 spec to add vector support +> in riscv Linux kernel. There are some assumptions for this implementations. +> + +**[v1: riscv: mm: execute local TLB flush after populating vmemmap](http://lore.kernel.org/linux-riscv/20230414081605.471375-1-vincent.chen@sifive.com/)** + +> The sparse_init() calls vmemmap_populate() many times to create VA to PA +> mapping for the VMEMMAP area, where all "struct page" are located once +> CONFIG_SPARSEMEM_VMEMMAP is defined. These "struct page" are later +> initialized in the zone_sizes_init() function. However, during this +> process, no sfence.vma instruction is executed for this VMEMMAP area. +> This omission may cause the hart to fail to perform page table walk +> because some data related to the address translation is invisible to the +> hart. To solve this issue, the local_flush_tlb_kernel_range() is called +> right after the sparse_init() to execute a sfence.vma instruction for the +> VMEMMAP area, ensuring that all data related to the address translation +> is visible to the hart. +> + +**[v3: Add PLL clocks driver for StarFive JH7110 SoC](http://lore.kernel.org/linux-riscv/20230414024157.53203-1-xingyu.wu@starfivetech.com/)** + +> This patch series are to add PLL clocks driver and providers by writing +> and reading syscon registers for the StarFive JH7110 RISC-V SoC. And add +> documentation to describe StarFive System Controller(syscon) Registers.
+> + +**[v1: riscv: Allow userspace to directly access perf counters](http://lore.kernel.org/linux-riscv/20230413161725.195417-1-alexghiti@rivosinc.com/)** + +> riscv used to allow direct access to cycle/time/instret counters, +> bypassing the perf framework, this patchset intends to allow the user to +> mmap any counter when accessed through perf. But we can't break the +> existing behaviour so we introduce a sysctl perf_user_access like arm64 +> does, which defaults to the legacy mode described above. +> + +**[v8: Add non-coherent DMA support for AX45MP](http://lore.kernel.org/linux-riscv/20230412110900.69738-1-prabhakar.mahadev-lad.rj@bp.renesas.com/)** + +> On the Andes AX45MP core, cache coherency is a specification option so it +> may not be supported. In this case DMA will fail. To get around with this +> issue this patch series does the below: +> +> 1] Andes alternative ports is implemented as errata which checks if the IOCP +> is missing and only then applies to CMO errata. One vendor specific SBI EXT +> (ANDES_SBI_EXT_IOCP_SW_WORKAROUND) is implemented as part of errata. +> + +**[v4: Add JH7110 MIPI DPHY RX support](http://lore.kernel.org/linux-riscv/20230412084540.295411-1-changhuang.liang@starfivetech.com/)** + +> This patchset adds mipi dphy rx driver for the StarFive JH7110 SoC. +> It is used to transfer CSI camera data. The series has been tested on +> the VisionFive 2 board. +> + +**[v4: Add new partial clock and reset drivers for StarFive JH7110](http://lore.kernel.org/linux-riscv/20230411135558.44282-1-xingyu.wu@starfivetech.com/)** + +> This patch series are base on the basic JH7110 SYSCRG/AONCRG +> drivers and add new partial clock drivers and reset supports +> about System-Top-Group(STG), Image-Signal-Process(ISP) +> and Video-Output(VOUT) for the StarFive JH7110 RISC-V SoC. These +> clocks and resets could be used by DMA, VIN and Display modules.
+> + +**[v16: Microchip Soft IP corePWM driver](http://lore.kernel.org/linux-riscv/20230411-wizard-cautious-3c048db6b4d2@wendy/)** + +> Uwe & I had a long back and forth about period calculations on v13, +> my ultimate conclusion being that, after some testing of the "corrected" +> calculation in hardware, the original calculation was correct. +> I think we had gotten sucked into discussion the calculation of the +> period itself, when we were in fact trying to calculate a bound on the +> period instead. That discussion is here: +> https://lore.kernel.org/linux-pwm/Y+ow8tfAHo1yv1XL@wendy/ +> + +**[v1: Add JH7110 cpufreq support](http://lore.kernel.org/linux-riscv/20230411083257.16155-1-mason.huo@starfivetech.com/)** + +> The StarFive JH7110 SoC has four RISC-V cores, +> and it supports up to 4 cpu frequency loads. +> +> This patchset adds the compatible strings into the allowlist +> for supporting the generic cpufreq driver on JH7110 SoC. +> Also, it enables the axp15060 pmic for the cpu power source. +> + +**[v1: Add JH7110 DPHY PMU support](http://lore.kernel.org/linux-riscv/20230411064743.273388-1-changhuang.liang@starfivetech.com/)** + +> This patchset adds mipi dphy power domain driver for the StarFive JH7110 +> SoC. It is used to turn on dphy power switch. The series has been tested +> on the VisionFive 2 board. +> + +**[v4: -next: support allocating crashkernel above 4G explicitly on riscv](http://lore.kernel.org/linux-riscv/20230410130553.3226347-1-chenjiahao16@huawei.com/)** + +> On riscv, the current crash kernel allocation logic is trying to +> allocate within 32bit addressable memory region by default, if +> failed, try to allocate without 4G restriction.
+> + +**[v1: RISC-V: Detect Ssqosid extension and handle sqoscfg CSR](http://lore.kernel.org/linux-riscv/20230410043646.3138446-1-dfustini@baylibre.com/)** + +> This RFC series adds initial support for the Ssqosid extension and the +> sqoscfg CSR as specified in Chapter 2 of the RISC-V Capacity and +> Bandwidth Controller QoS Register Interface (CBQRI) specification [1]. +> + +**[v1: ata: Change email addresses in MAINTAINERS](http://lore.kernel.org/linux-riscv/20230410042646.124962-1-dlemoal@kernel.org/)** + +> Change my email addresses referenced in the MAINTAINERS file for the ata +> subsystem to dlemoal@kernel.org. While at it, also change other +> references (zonefs and k210 drivers) to the same address. +> + +**[v1: riscv: enable BUILDTIME_TABLE_SORT for !MMU](http://lore.kernel.org/linux-riscv/20230409164306.3801-1-jszhang@kernel.org/)** + +> BUILDTIME_TABLE_SORT works for !MMU as well, so enable it. +> + +#### Process Scheduling + +**[v2: sched/topology: add for_each_numa_cpu() macro](http://lore.kernel.org/lkml/20230415050617.324288-1-yury.norov@gmail.com/)** + +> for_each_cpu() is widely used in kernel, and it's beneficial to create +> a NUMA-aware version of the macro. +> +> Recently added for_each_numa_hop_mask() works, but switching existing +> codebase to it is not an easy process. +> + +**[v6: sched/numa: add per-process numa_balancing](http://lore.kernel.org/lkml/20230412140701.58337-1-ligang.bdlg@bytedance.com/)** + +> # Introduce +> Add PR_NUMA_BALANCING in prctl. +> +> A large number of page faults will cause performance loss when numa +> balancing is performing. Thus those processes which care about worst-case +> performance need numa balancing disabled. Others, on the contrary, allow a +> temporary performance loss in exchange for higher average performance, so +> enable numa balancing is better for them.
+> + +**[v1: sched/core: Make sched_dynamic_mutex static](http://lore.kernel.org/lkml/016987c1ec4649b74973a000e81c35e48ba6072e.1681277194.git.jpoimboe@kernel.org/)** + +> The sched_dynamic_mutex is only used within the file. Make it static. +> + +**[v1: sched: Rate limit migrations](http://lore.kernel.org/lkml/20230411214116.361016-1-mathieu.desnoyers@efficios.com/)** + +> This WIP patch rate-limits migrations to 32 migrations per 10ms window +> for each task. +> + +#### Memory Management + +**[v8: mm: process/cgroup ksm support](http://lore.kernel.org/linux-mm/20230415225913.3206647-1-shr@devkernel.io/)** + +> So far KSM can only be enabled by calling madvise for memory regions. To +> be able to use KSM for more workloads, KSM needs to have the ability to be +> enabled / disabled at the process / cgroup level. +> + +**[v5: Replace invocations of prandom_u32() with get_random_u32()](http://lore.kernel.org/linux-mm/20230415173549.5345-1-david.keisarschm@mail.huji.ac.il/)** + +> The security improvements for prandom_u32 done in commits c51f8f88d705 +> from October 2020 and d4150779e60f from May 2022 didn't handle the cases +> when prandom_bytes_state() and prandom_u32_state() are used. +> + +**[v1: mm: rename reclaim_pages() to reclaim_folios()](http://lore.kernel.org/linux-mm/20230415092716.61970-1-wangkefeng.wang@huawei.com/)** + +> As commit a83f0551f496 ("mm/vmscan: convert reclaim_pages() to use +> a folio") changes the arg from page_list to folio_list, but not +> the definition, let's correct it and rename it to reclaim_folios too. +> + +**[v2: mm: make arch_has_descending_max_zone_pfns() static](http://lore.kernel.org/linux-mm/20230415081904.969049-1-arnd@kernel.org/)** + +> clang produces a build failure on x86 for some randconfig builds +> after a change that moves around code to mm/mm_init.c: +> +> Cannot find symbol for section 2: .text.
+> mm/mm_init.o: failed +> + +**[v1: NFSD memory allocation optimizations](http://lore.kernel.org/linux-mm/168151777579.1588.7882383278745556830.stgit@klimt.1015granger.net/)** + +> I've found a few ways to optimize the release of pages in NFSD. +> Please let me know if I'm abusing the release_pages() and pagevec +> APIs. +> + +**[v1: mm/folio: Avoid special handling for order value 0 in folio_set_order](http://lore.kernel.org/linux-mm/20230414194832.973194-1-tsahu@linux.ibm.com/)** + +> folio_set_order(folio, 0); which is an abuse of folio_set_order as 0-order +> folio does not have any tail page to set order. folio->_folio_nr_pages is +> set to 0 for order 0 in folio_set_order. It is required because +> _folio_nr_pages overlapped with page->mapping and leaving it non zero +> caused "bad page" error while freeing gigantic hugepages. This was fixed in +> Commit ba9c1201beaa ("mm/hugetlb: clear compound_nr before freeing gigantic +> pages"). Also commit a01f43901cfb ("hugetlb: be sure to free demoted CMA +> pages to CMA") now explicitly clear page->mapping and hence we won't see +> the bad page error even if _folio_nr_pages remains unset. Also the order 0 +> folios are not supposed to call folio_set_order, So now we can get rid of +> folio_set_order(folio, 0) from hugetlb code path to clear the confusion. +> + +**[v4: modules/kmod: replace implementation with a sempahore](http://lore.kernel.org/linux-mm/20230414171644.2434448-1-mcgrof@kernel.org/)** + +> Changes on this v4: +> +> o Really add Matthew Wilcox' preferred tribal knowledge docs +> o Add all the pending tags +> + +**[v1: lib/percpu_counter, cpu/hotplug: Cure the cpu_dying_mask woes](http://lore.kernel.org/linux-mm/20230414162755.281993820@linutronix.de/)** + +> The cpu_dying_mask is not only undocumented but also to some extent a +> misnomer. Its purpose is to capture the last direction of a cpu_up() or +> cpu_down() operation taking eventual rollback operations into account.
+> + +**[v5: Introduce Copy-On-Write to Page Table](http://lore.kernel.org/linux-mm/20230414142341.354556-1-shiyn.lin@gmail.com/)** + +> This patch is primarily aimed at optimizing the memory usage of page +> table in processes with large address space, which can potentially lead +> to improved fork system call latency under certain conditions. +> However, we're planning to improve the fork latency in the future but +> not in this patch. +> + +**[v1: mm: page_alloc: Skip regions with hugetlbfs pages when allocating 1G pages](http://lore.kernel.org/linux-mm/20230414141429.pwgieuwluxwez3rj@techsingularity.net/)** + +> A bug was reported by Yuanxi Liu where allocating 1G pages at runtime is +> taking an excessive amount of time for large amounts of memory. Further +> testing allocating huge pages showed that the cost is linear i.e. if allocating +> 1G pages in batches of 10 then the time to allocate nr_hugepages from +> 10->20->30->etc increases linearly even though 10 pages are allocated at +> each step. Profiles indicated that much of the time is spent checking the +> validity within already existing huge pages and then attempting a migration +> that fails after isolating the range, draining pages and a whole lot of +> other useless work. +> + +**[v1: mm: page_alloc: Assume huge tail pages are valid when allocating contiguous pages](http://lore.kernel.org/linux-mm/20230414082222.idgw745cgcduzy37@techsingularity.net/)** + +> A bug was reported by Yuanxi Liu where allocating 1G pages at runtime is +> taking an excessive amount of time for large amounts of memory. Further +> testing allocating huge pages showed that the cost is linear i.e. if allocating +> 1G pages in batches of 10 then the time to allocate nr_hugepages from +> 10->20->30->etc increases linearly even though 10 pages are allocated at +> each step.
+> + +**[v3: module: avoid userspace pressure on unwanted allocations](http://lore.kernel.org/linux-mm/20230414050836.1984746-1-mcgrof@kernel.org/)** + +> This v3 series follows up on the second iteration of these patches [0]. This +> and other pending changes are available on 20230413-module-alloc-opts +> branch [1] which is based on modules-next. +> + +**[v2: mm: ksm: support hwpoison for ksm page](http://lore.kernel.org/linux-mm/20230414021741.2597273-1-xialonglong1@huawei.com/)** + +> Currently, ksm does not support hwpoison. As ksm is being used more widely +> for deduplication at the system level, container level, and process level, +> supporting hwpoison for ksm has become increasingly important. However, ksm +> pages were not processed by hwpoison in 2009 [1]. +> + +**[v1: migrate_pages: Never block waiting for the page lock](http://lore.kernel.org/linux-mm/20230413182313.RFC.1.Ia86ccac02a303154a0b8bc60567e7a95d34c96d3@changeid/)** + +> Currently when we try to do page migration and we're in "synchronous" +> mode (and not doing direct compaction) then we'll wait an infinite +> amount of time for a page lock. This does not appear to be a great +> idea. +> + +**[v1: Setting memory policy for restrictedmem file](http://lore.kernel.org/linux-mm/cover.1681430907.git.ackerleytng@google.com/)** + +> This patchset builds upon the memfd_restricted() system call that was +> discussed in the 'KVM: mm: fd-based approach for supporting KVM' patch +> series [1]. +> + +**[v1: change ->index to PAGE_SIZE for hugetlb pages](http://lore.kernel.org/linux-mm/20230413231452.84529-1-sidhartha.kumar@oracle.com/)** + +> This RFC patch series attempts to simplify the page cache code by removing +> special casing code for hugetlb pages. Normal pages in the page cache are +> indexed by PAGE_SIZE while hugetlb pages are indexed by their huge page +> size. This was previously tried but the xarray was not performant enough +> for the changes.
+> + +**[v2: -next: mm: hwpoison: support recovery from HugePage copy-on-write faults](http://lore.kernel.org/linux-mm/20230413131349.2524210-1-liushixin2@huawei.com/)** + +> Copy-on-write of hugetlb user pages with uncorrectable errors will result +> in a kernel crash. This is because the copy is performed in kernel mode +> and in general we cannot handle accessing memory with such errors while +> in kernel mode. Commit a873dfe1032a ("mm, hwpoison: try to recover from +> copy-on write faults") introduced the routine copy_user_highpage_mc() to +> gracefully handle copying of user pages with uncorrectable errors. However, +> the separate hugetlb copy-on-write code paths were not modified as part +> of commit a873dfe1032a. +> + +**[v6: Ignore non-LRU-based reclaim in memcg reclaim](http://lore.kernel.org/linux-mm/20230413104034.1086717-1-yosryahmed@google.com/)** + +> Upon running some proactive reclaim tests using memory.reclaim, we +> noticed some tests flaking where writing to memory.reclaim would be +> successful even though we did not reclaim the requested amount fully. +> Looking further into it, I discovered that *sometimes* we overestimate +> the number of reclaimed pages in memcg reclaim. +> + +**[v1: printk: Export console trace point for kcsan/kasan/kfence/kmsan](http://lore.kernel.org/linux-mm/20230413100859.1492323-1-quic_pkondeti@quicinc.com/)** + +> The console tracepoint is used by kcsan/kasan/kfence/kmsan test +> modules. Since this tracepoint is not exported, these modules iterate +> over all available tracepoints to find the console trace point. +> Export the trace point so that it can be directly used.
+> + +**[v7: ksm: support tracking KSM-placed zero-pages](http://lore.kernel.org/linux-mm/202304131346489021903@zte.com.cn/)** + +> The core idea of this patch set is to enable users to perceive the number +> of any pages merged by KSM, regardless of whether the use_zero_pages switch has +> been turned on, so that users can know how much free memory increase is +> really due to their madvise(MERGEABLE) actions. But the problem is, when +> enabling use_zero_pages, all empty pages will be merged with kernel zero +> pages instead of with each other as when use_zero_pages is disabled, and then +> these zero-pages are no longer monitored by KSM. +> + +**[v1: mm: hwpoison: coredump: support recovery from dump_user_range()](http://lore.kernel.org/linux-mm/20230413041336.26874-1-wangkefeng.wang@huawei.com/)** + +> The dump_user_range() is used to copy the user page to a coredump +> file, but if a hardware memory error occurred during copy, which is +> called from __kernel_write_iter() in dump_user_range(), it crashes. +> + +**[v1: selftests/mm: Replace obsolete memalign() with posix_memalign()](http://lore.kernel.org/linux-mm/20230413012751.4445-1-wangdeming@inspur.com/)** + +> memalign() is obsolete according to its manpage. +> +> Replace memalign() with posix_memalign(). +> + +**[v1: mm: huge_memory: Replace obsolete memalign() with posix_memalign()](http://lore.kernel.org/linux-mm/20230413011719.4355-1-wangdeming@inspur.com/)** + +> memalign() is obsolete according to its manpage. +> +> Replace memalign() with posix_memalign(). +> + +**[v2: mm: hugetlb_vmemmap: provide stronger vmemmap allocation guarantees](http://lore.kernel.org/linux-mm/20230412195939.1242462-1-pasha.tatashin@soleen.com/)** + +> HugeTLB pages have a struct page optimization where struct pages for tail +> pages are freed. However, when HugeTLB pages are destroyed, the memory for +> struct pages (vmemmap) needs to be allocated again.
+> + +**[v1: mm: hugetlb_vmemmap: provide stronger vmemmap allocaction gurantees](http://lore.kernel.org/linux-mm/20230412152337.1203254-1-pasha.tatashin@soleen.com/)** + +> HugeTLB pages have a struct page optimization where struct pages for tail +> pages are freed. However, when HugeTLB pages are destroyed, the memory for +> struct pages (vmemmap) needs to be allocated again. +> + +#### File Systems + +**[v1: fanotify: support watching filesystems and mounts inside userns](http://lore.kernel.org/linux-fsdevel/20230416060722.1912831-1-amir73il@gmail.com/)** + +> An unprivileged user is allowed to create an fanotify group and add +> inode marks, but not filesystem and mount marks. +> + +**[v2: fs/proc: add Kthread flag to /proc/$pid/status](http://lore.kernel.org/linux-fsdevel/20230416052404.2920-1-fullspring2018@gmail.com/)** + +> The commands `ps -ef` and `top -c` mark kernel threads with '[' +> and ']', but sometimes the result is not correct. +> The task->flags in /proc/$pid/stat is good, but we need to remember +> that the value of PF_KTHREAD is 0x00200000 and convert dec to hex. +> If we have no binary program or shell script which reads +> /proc/$pid/stat, we can see it directly with +> `cat /proc/$pid/status`. +> + +**[v1: Monitoring unmounted fs with fanotify](http://lore.kernel.org/linux-fsdevel/20230414182903.1852019-1-amir73il@gmail.com/)** + +> Followup on my quest to close the gap with inotify functionality, +> here is a proposal for FAN_UNMOUNT event. +> + +**[v2: Alter fcntl to handle int arguments correctly](http://lore.kernel.org/linux-fsdevel/20230414152459.816046-1-Luca.Vizzarro@arm.com/)** + +> According to the documentation of fcntl, some commands take an int as +> argument. In practice not all of them enforce this behaviour, as they +> instead accept a more permissive long and in most cases not even a +> range check is performed.
+> +> An issue could possibly arise from a combination of the handling of the +> varargs in user space and the ABI rules of the target, which may result +> in the top bits of an int argument being non-zero. +> + +**[v1: mm/filemap: allocate folios according to the blocksize](http://lore.kernel.org/linux-fsdevel/20230414134908.103932-1-hare@suse.de/)** + +> If the blocksize is larger than the pagesize, allocate folios +> with the correct order. +> + +**[v1: convert create_page_buffers to create_folio_buffers](http://lore.kernel.org/linux-fsdevel/20230414110821.21548-1-p.raghav@samsung.com/)** + +> One of the first kernel panics we hit when we try to increase the +> block size > 4k is inside create_page_buffers()[1]. Even though buffer.c +> functions do not support large folios (folios > PAGE_SIZE) at the moment, +> these changes are required when we want to remove that constraint. +> + +**[v3: Introduce provisioning primitives for thinly provisioned storage](http://lore.kernel.org/linux-fsdevel/20230414000219.92640-1-sarthakkukreti@chromium.org/)** + +> This patch series adds a mechanism to pass through provision requests on +> stacked thinly provisioned block devices. +> + +**[v1: fs/ntfs3: disable page fault during ntfs_fiemap()](http://lore.kernel.org/linux-fsdevel/f649c9c0-6c0c-dd0d-e3c9-f0c580a11cd9@I-love.SAKURA.ne.jp/)** + +> syzbot is reporting a circular locking dependency between ntfs_file_mmap() +> (which has mm->mmap_lock => ni->ni_lock dependency) and ntfs_fiemap() +> (which has ni->ni_lock => mm->mmap_lock dependency). +> + +**[v1: Backport several patches to 5.10.y](http://lore.kernel.org/linux-fsdevel/20230412041935.1556-1-yb203166@antfin.com/)** + +> Antgroup is using 5.10.y in a production environment, and we found several patches +> missing from the 5.10.y tree. These patches are needed for us.
So we backported them +> to 5.10.y +> + +**[v6: net-next: splice, net: Replace sendpage with sendmsg(MSG_SPLICE_PAGES), part 1](http://lore.kernel.org/linux-fsdevel/20230411160902.4134381-1-dhowells@redhat.com/)** + +> Here's the first tranche of patches towards providing a MSG_SPLICE_PAGES +> internal sendmsg flag that is intended to replace the ->sendpage() op with +> calls to sendmsg(). MSG_SPLICE_PAGES is a hint that tells the protocol +> that it should splice the pages supplied if it can and copy them if not. +> + +**[v1: [RESEND] fs: opportunistic high-res file timestamps](http://lore.kernel.org/linux-fsdevel/20230411143702.64495-1-jlayton@kernel.org/)** + +> (Apologies for the resend, but I didn't send this with a wide enough +> distribution list originally). +> + +**[v1: fs: opportunistic high-res file timestamps](http://lore.kernel.org/linux-fsdevel/20230411142708.62475-1-jlayton@kernel.org/)** + +> While I don't think we can practically optimize away ctime updates +> like we do with i_version, I do like the idea of using this scheme to +> indicate when we need to use a high-res timestamp. +> + +**[v1: fanotify: Enable FAN_REPORT_FID on more filesystem types](http://lore.kernel.org/linux-fsdevel/20230411124037.1629654-1-amir73il@gmail.com/)** + +> If kernel supports FAN_REPORT_ANY_FID, use this flag to allow testing +> also filesystems that do not support fsid or NFS file handles (e.g. fuse). +> + +**[v9: Implement copy offload support](http://lore.kernel.org/linux-fsdevel/20230411081041.5328-1-anuj20.g@samsung.com/)** + +> The patch series covers the points discussed in November 2021 virtual +> call [LSF/MM/BFP TOPIC] Storage: Copy Offload [0]. +> We have covered the initial agreed requirements in this patchset and +> further additional features suggested by community. +> Patchset borrows Mikulas's token based approach for 2 bdev +> implementation. 
+> + +**[v4: Providing mount in memfd_restricted() syscall](http://lore.kernel.org/linux-fsdevel/cover.1681176340.git.ackerleytng@google.com/)** + +> This patchset builds upon the memfd_restricted() system call that was +> discussed in the 'KVM: mm: fd-based approach for supporting KVM' patch +> series, at +> https://lore.kernel.org/lkml/20221202061347.1070246-1-chao.p.peng@linux.intel.com/T/ +> + +**[v2: sysv: don't call sb_bread() with pointers_lock held](http://lore.kernel.org/linux-fsdevel/38509ddd-51a2-70da-6564-9ded34b2f363@I-love.SAKURA.ne.jp/)** + +> syzbot is reporting sleep in atomic context in SysV filesystem [1], for +> sb_bread() is called with rw_spinlock held. +> +> A "write_lock(&pointers_lock) => read_lock(&pointers_lock) deadlock" bug +> and a "sb_bread() with write_lock(&pointers_lock)" bug were introduced by +> "Replace BKL for chain locking with sysvfs-private rwlock" in Linux 2.5.12. +> + +**[v1: blk: optimization for classic polling](http://lore.kernel.org/linux-fsdevel/3578876466-3733-1-git-send-email-nj.shetty@samsung.com/)** + +> This removes the dependency on interrupts to wake up the task. Set task +> state as TASK_RUNNING, if need_resched() returns true, +> while polling for IO completion. +> Earlier, the polling task used to sleep, relying on an interrupt to wake it up. +> This made some IO take very long when interrupt-coalescing is enabled in +> NVMe. +> + +#### Network Devices + +**[v1: brcmfmac: Demote some kernel errors to info](http://lore.kernel.org/netdev/20230416-brcmfmac-noise-v1-0-f0624e408761@marcan.st/)** + +> brcmfmac has some messages that are KERN_ERR even though they are +> harmless. This is spooking and confusing people, because they end up +> being the *only* kernel messages on their boot console with common +> error-only printk levels (at least on Apple Macs).
+> + +**[v1: net: virtio-net: reject small vring sizes](http://lore.kernel.org/netdev/20230416074607.292616-1-alvaro.karsz@solid-run.com/)** + +> Check vring size and fail probe if a transmit/receive vring size is +> smaller than MAX_SKB_FRAGS + 2. +> +> At the moment, any vring size is accepted. This is problematic because +> it may result in attempting to transmit a packet with more fragments +> than there are descriptors in the ring. +> + +**[v1: net-next: ethtool mm API improvements](http://lore.kernel.org/netdev/20230415173454.3970647-1-vladimir.oltean@nxp.com/)** + +> Currently the ethtool --set-mm API permits the existence of 2 +> configurations which don't make sense: +> +> - pmac-enabled false tx-enabled true +> - tx-enabled false verify-enabled true +> + +**[v1: net-next: Ocelot/Felix driver support for preemptible traffic classes](http://lore.kernel.org/netdev/20230415170551.3939607-1-vladimir.oltean@nxp.com/)** + +> The series "Add tc-mqprio and tc-taprio support for preemptible traffic +> classes" from: +> https://lore.kernel.org/netdev/20230220122343.1156614-1-vladimir.oltean@nxp.com/ +> +> was eventually submitted in a form without the support for the +> Ocelot/Felix switch driver. This patch set picks up that work again, +> and presents a fairly modified form compared to the original. +> + +**[v2: net: net/sched: clear actions pointer in miss cookie init fail](http://lore.kernel.org/netdev/20230415153309.241940-1-pctammela@mojatatu.com/)** + +> Palash reports a UAF when using a modified version of syzkaller[1]. +> +> When 'tcf_exts_miss_cookie_base_alloc()' fails in 'tcf_exts_init_ex()' +> a call to 'tcf_exts_destroy()' is made to free up the tcf_exts +> resources. +> In flower, a call to '__fl_put()' when 'tcf_exts_init_ex()' fails is made; +> Then calling 'tcf_exts_destroy()', which triggers an UAF since the +> already freed tcf_exts action pointer is lingering in the struct. 
+> + +**[v2: net-next: tsnep: XDP socket zero-copy support](http://lore.kernel.org/netdev/20230415144256.27884-1-gerhard@engleder-embedded.com/)** + +> Implement XDP socket zero-copy support for tsnep driver. I tried to +> follow existing drivers like igc as far as possible. But one main +> + +**[v2: net-next: r8169: use new macros from netdev_queues.h](http://lore.kernel.org/netdev/f07fd01b-b431-6d8d-bd14-d447dffd8e64@gmail.com/)** + +> Add one missing subqueue version of the macros, and use the new macros +> in r8169 to simplify the code. +> + +**[v6: net-next: XDP Rx HWTS metadata for stmmac driver](http://lore.kernel.org/netdev/20230415064503.3225835-1-yoong.siang.song@intel.com/)** + +> Implemented XDP receive hardware timestamp metadata for stmmac driver. +> +> This patchset is tested with tools/testing/selftests/bpf/xdp_hw_metadata. +> Below are the test steps and results. +> + +**[v1: net-next: sctp: add some missing peer_capables in sctp info dump](http://lore.kernel.org/netdev/cover.1681507192.git.lucien.xin@gmail.com/)** + +> The 1st patch removes the unused and obsolete hostname_address from +> sctp_association peer and also the bit from sctp_info peer_capables, +> and then reuses its bit for reconf_capable and use the higher +> available bit for intl_capable in the 2nd patch. +> + +**[v6: ip.7: Add "special and reserved addresses" section](http://lore.kernel.org/netdev/20230414184558.GB2557040@demorgan/)** + +> Break out the discussion of special and reserved IPv4 addresses into +> a subsection, formatted as a pair of definition lists, and briefly +> describing three cases in which Linux no longer treats addresses +> specially, where other systems do or did. 
+> + +**[v1: net-next: eth: mlx5: avoid iterator use outside of a loop](http://lore.kernel.org/netdev/20230414180729.198284-1-kuba@kernel.org/)** + +> Fix the following warning about risky iterator use: +> +> drivers/net/ethernet/mellanox/mlx5/core/eq.c:1010 mlx5_comp_irq_get_affinity_mask() warn: iterator used outside loop: 'eq' +> + +**[v1: ice: document RDMA devlink parameters](http://lore.kernel.org/netdev/20230414162614.571861-1-jacob.e.keller@intel.com/)** + +> Commit e523af4ee560 ("net/ice: Add support for enable_iwarp and enable_roce +> devlink param") added support for the enable_roce and enable_iwarp +> parameters in the ice driver. It didn't document these parameters in the +> ice devlink documentation file. Add this documentation, including a note +> about the mutual exclusion between the two modes. +> + +**[v1: net-next: net: skbuff: hide some bitfield members](http://lore.kernel.org/netdev/20230414160105.172125-1-kuba@kernel.org/)** + +> There is a number of protocol or subsystem specific fields +> in struct sk_buff which are only accessed by one subsystem. +> We can wrap them in ifdefs with minimal code impact. +> +> This gives us a better chance to save a 2B and a 4B holes +> resulting with the following savings (assuming a lucky +> kernel config): +> + +**[v2: net-next: ax25: exit linked-list searches earlier](http://lore.kernel.org/netdev/20230414143357.5523-1-peter@n8pjl.ca/)** + +> There's no need to loop until the end of the list if we have a result. +> +> Device callsigns are unique, so there can only be one dev returned from +> ax25_addr_ax25dev(). If not, there would be inconsistencies based on +> order of insertion, and refcount leaks. 
+> + +**[v1: net-next: selftests: openvswitch: add support for testing upcall interface](http://lore.kernel.org/netdev/20230414131750.4185160-1-aconole@redhat.com/)** + +> The existing selftest suite for openvswitch will work for regression +> testing the datapath feature bits, but won't test things like adding +> interfaces, or the upcall interface. Here, we add some additional +> test facilities. +> + +**[v1: net: wwan: Expose secondary AT port on DATA1](http://lore.kernel.org/netdev/20230414-rpmsg-wwan-secondary-at-port-v1-1-6d7307527911@nayarsystems.com/)** + +> Our use-case needs two AT ports available: +> one for running a ppp daemon, and another one for management. +> +> This patch enables a second AT port on DATA1. +> + +**[Re: v1: net: Add check for csum_start in skb_partial_csum_set()](http://lore.kernel.org/netdev/a30a8ffaa8dd4cb6a84103eecf0c3338@huawei.com/)** + +> Conceivably this can be added, though it is a bit complex for devices with variable-length link layer headers. And it would have to happen not only for packet sockets, but all users of virtio_net_hdr. +> + +**[v1: net-next: net: phy: add driver for MediaTek SoC built-in GE PHYs](http://lore.kernel.org/netdev/ZDihjfnzaZ1yh9cT@makrotopia.org/)** + +> Some of MediaTek's Filogic SoCs come with built-in Gigabit Ethernet +> PHYs which require calibration data from the SoC's efuse. +> Add support for these PHYs to the mediatek-ge driver if built for +> MediaTek's ARM64 SoCs. +> + +**[v2: net-next: virtio/vsock: support datagrams](http://lore.kernel.org/netdev/20230413-b4-vsock-dgram-v2-0-079cc7cee62e@bytedance.com/)** + +> This series introduces support for datagrams to virtio/vsock.
+> +> It is a spin-off (and smaller version) of this series from the summer: +> https://lore.kernel.org/all/cover.1660362668.git.bobby.eshleman@bytedance.com/ +> + +**[v1: Enable multiple MCAN on AM62x](http://lore.kernel.org/netdev/20230413223051.24455-1-jm@ti.com/)** + +> On AM62x there is one MCAN in MAIN domain and two in MCU domain. +> The MCANs in MCU domain were not enabled since there is no +> hardware interrupt routed to A53 GIC interrupt controller. +> Therefore A53 Linux cannot be interrupted by MCU MCANs. +> + +**[v1: net: Revert "net/mlx5: Enable management PF initialization"](http://lore.kernel.org/netdev/20230413222547.56901-1-kuba@kernel.org/)** + +> Paul reports that it causes a regression with IB on CX4 +> and FW 12.18.1000. In addition I think that the concept +> of "management PF" is not fully accepted and requires +> a discussion. +> + +**[v1: net-next: net: page_pool: add pages and released_pages counters](http://lore.kernel.org/netdev/a20f97acccce65d174f704eadbf685d0ce1201af.1681422222.git.lorenzo@kernel.org/)** + +> Introduce pages and released_pages counters to page_pool ethtool stats +> in order to track the number of allocated and released pages from the +> pool. +> + +**[GIT PULL: Networking for v6.3-rc7](http://lore.kernel.org/netdev/20230413213217.822550-1-kuba@kernel.org/)** + +> Including fixes from bpf, and bluetooth. +> +> Not all that quiet given spring celebrations, but "current" fixes +> are thinning out, which is encouraging. One outstanding regression +> in the mlx5 driver when using old FW, not blocking but we're pushing +> for a fix. +> + +**[v5: Add EMAC3 support for sa8540p-ride (devicetree/clk bits)](http://lore.kernel.org/netdev/20230413191541.1073027-1-ahalaney@redhat.com/)** + +> This is a forward port / upstream refactor of code delivered +> downstream by Qualcomm over at [0] to enable the DWMAC5 based +> implementation called EMAC3 on the sa8540p-ride dev board. 
+> + +**[v9: Another crack at a handshake upcall mechanism](http://lore.kernel.org/netdev/168141287044.157208.15120359741792569671.stgit@manet.1015granger.net/)** + +> Here is v9 of a series to add generic support for transport layer +> security handshake on behalf of kernel socket consumers (user space +> consumers use a security library directly, of course). +> + +**[v1: net-next: lib/win_minmax: export symbol of minmax_running_min](http://lore.kernel.org/netdev/20230413164726.59019-1-bobankhshen@gmail.com/)** + +> This commit exports the symbol of the function minmax_running_min +> to make it accessible to dynamically loaded modules. It can make +> this library more general, especially for congestion +> control algorithm modules that want to implement a windowed min +> filter. +> + +**[v1: staging: octeon: Convert to use phylink](http://lore.kernel.org/netdev/ZDgNexVTEfyGo77d@lenoch/)** + +> The purpose of these patches is to provide support for an SFP cage to the +> Octeon ethernet driver. +> + +**[v4: net-next: Add SCM_PIDFD and SO_PEERPIDFD](http://lore.kernel.org/netdev/20230413133355.350571-1-aleksandr.mikhalitsyn@canonical.com/)** + +> 1. Implement SCM_PIDFD, a new CMSG type analogous to SCM_CREDENTIALS, +> but it contains a pidfd instead of a plain pid, which allows programmers not +> to care about the PID reuse problem. +> +> 2. Add SO_PEERPIDFD, which allows getting the pidfd of the peer socket holder. +> This is a direct analog of SO_PEERCRED, which allows getting a plain PID. +> +> 3. Add SCM_PIDFD / SO_PEERPIDFD kselftest +> + +**[v2: bpf-next: bpf: add netfilter program type](http://lore.kernel.org/netdev/20230413133228.20790-1-fw@strlen.de/)** + +> The new program type is 'tracing style', i.e. there is no context +> access rewrite done by the verifier, the function argument (struct bpf_nf_ctx) +> isn't stable. +> There is no support for direct packet access; the dynptr API should be used +> instead.
+> + +**[v1: net-next: Support tunnel mode in mlx5 IPsec packet offload](http://lore.kernel.org/netdev/cover.1681388425.git.leonro@nvidia.com/)** + +> This series extends mlx5 to support tunnel mode in its IPsec packet +> offload implementation. +> + +**[v2: net: Finish up ->msg_control{,_user} split](http://lore.kernel.org/netdev/20230413114705.157046-1-kevin.brodsky@arm.com/)** + +> Commit 1f466e1f15cf ("net: cleanly handle kernel vs user buffers for +> ->msg_control") introduced the msg_control_user and +> msg_control_is_user fields in struct msghdr, to ensure that user +> pointers are represented as such. It also took care of converting most +> users of struct msghdr::msg_control where user pointers are involved. It +> did however miss a number of cases, and some code using msg_control +> inappropriately has also appeared in the meantime. +> + +**[v8: net/packet: support mergeable feature of virtio](http://lore.kernel.org/netdev/20230413114402.50225-1-amy.saq@antgroup.com/)** + +> Packet sockets, like tap, can be used as the backend for kernel vhost. +> In packet sockets, virtio net header size is currently hardcoded to be +> the size of struct virtio_net_hdr, which is 10 bytes; however, it is not +> always the case: some virtio features, such as mrg_rxbuf, need virtio +> net header to be 12-byte long. +> + +**[v5: net-next: Support MACsec VLAN](http://lore.kernel.org/netdev/20230413105622.32697-1-ehakim@nvidia.com/)** + +> This patch series introduces support for hardware (HW) offload MACsec +> devices with VLAN configuration. The patches address both scenarios +> where the VLAN header is both the inner and outer header for MACsec. +> + +**[v3: net: sched: sch_qfq: prevent slab-out-of-bounds in qfq_activate_agg](http://lore.kernel.org/netdev/ZDfbCsDa6oLKzsed@pr0lnx/)** + +> If the TCA_QFQ_LMAX value is not offered through nlattr, lmax is determined by the MTU value of the network device. +> The MTU of the loopback device can be set up to 2^31-1. 
+> As a result, it is possible to have an lmax value that exceeds QFQ_MIN_LMAX. +> + +**[v1: net-next: bridge: Add per-{Port, VLAN} neighbor suppression](http://lore.kernel.org/netdev/20230413095830.2182382-1-idosch@nvidia.com/)** + +> In order to minimize the flooding of ARP and ND messages in the VXLAN +> network, EVPN includes provisions [1] that allow participating VTEPs to +> suppress such messages in case they know the MAC-IP binding and can +> reply on behalf of the remote host. In Linux, the above is implemented +> in the bridge driver using a per-port option called "neigh_suppress" +> that was added in kernel version 4.15 [2]. +> + +#### Async IO + +**[v1: liburing: io_uring sendto](http://lore.kernel.org/io-uring/20230415165821.791763-1-ammarfaizi2@gnuweeb.org/)** + +> There are two patches in this series. The first patch adds the +> io_uring_prep_sendto() function. The second patch adds the +> manpage and CHANGELOG. +> + +**[v3: liburing: multishot timeout support](http://lore.kernel.org/io-uring/20230414225506.4108955-1-davidhwei@meta.com/)** + +> Changes on the liburing side to support multishot timeouts. +> + +**[v1: io_uring: complete request via task work in case of DEFER_TASKRUN](http://lore.kernel.org/io-uring/20230414075313.373263-1-ming.lei@redhat.com/)** + +> So far io_req_complete_post() only covers DEFER_TASKRUN by completing +> the request via task work when the request is completed from IOWQ. +> +> However, a uring command could be completed from any context, and if io +> uring is set up with DEFER_TASKRUN, the command is required to be +> completed from the current context, otherwise a wait on IORING_ENTER_GETEVENTS +> can't be woken up and may hang forever. +> + +**[v2: liburing: add multishot timeout support](http://lore.kernel.org/io-uring/20230412222931.1635706-1-davidhwei@meta.com/)** + +> Single change to sync the new IORING_TIMEOUT_MULTISHOT flag with the kernel. +> +> Mostly unit tests for multishot timeouts.
+> + +**[v1: io_uring/uring_cmd: take advantage of completion batching](http://lore.kernel.org/io-uring/bbcdf761-e6f2-c2c5-dfb7-4579124a8fd5@kernel.dk/)** + +> We know now what the completion context is for the uring_cmd completion +> handling, so use that to have io_req_task_complete() decide what the +> best way to complete the request is. This allows batching of the posted +> completions if we have multiple pending, rather than always doing them +> one-by-one. +> + +#### Rust For Linux + +**[v1: rust: init: broaden the blanket impl of `Init`](http://lore.kernel.org/rust-for-linux/20230413100157.740697-1-benno.lossin@proton.me/)** + +> This makes it possible to use `T` as an `impl Init<T, E>` for every error +> type `E` instead of just `Infallible`. +> + +**[v1: MAINTAINERS: add Benno Lossin as Rust reviewer](http://lore.kernel.org/rust-for-linux/20230412221823.830135-1-ojeda@kernel.org/)** + +> Benno has been involved with the Rust for Linux project for +> the better part of a year now. He has been working on solving +> the safe pinned initialization problem [1], which resulted in +> the pin-init API patch series [2] that allows reducing the +> need for `unsafe` code in the kernel. He is also working on +> the field projection RFC for Rust [3] to bring pin-init as +> a language feature. +> + +**[v1: v4.1: rust: lock: add `Guard::do_unlocked`](http://lore.kernel.org/rust-for-linux/20230412121431.41627-1-wedsonaf@gmail.com/)** + +> It releases the lock, executes some function provided by the caller, +> then reacquires the lock. This is preparation for the implementation of +> condvars, which will sleep between unlocking and relocking. +> + +**[v5: scripts: `make rust-analyzer` for out-of-tree modules](http://lore.kernel.org/rust-for-linux/20230411091714.130525-1-varmavinaym@gmail.com/)** + +> Adds support for out-of-tree Rust modules to use the `rust-analyzer` +> make target to generate the rust-project.json file.
+> +> The change involves adding an optional parameter `external_src` to the +> `generate_rust_analyzer.py` which expects the path to the out-of-tree +> module's source directory. When this parameter is passed, I have chosen +> not to add the non-core modules (samples and drivers) into the result +> since these are not expected to be used in third party modules. Related +> changes are also made to the Makefile and rust/Makefile allowing the +> `rust-analyzer` target to be used for out-of-tree modules as well. +> + +#### BPF + +**[v1: A new bpf map type for fuzzy matching key](http://lore.kernel.org/bpf/303b5895-319d-2bb7-9909-10fec3323df2@antgroup.com/)** + +> For supporting fuzzy matching in bpf map as described in the original +> question [0], we come up with a proposal that would like to have some +> advice or comments from bpf thread. Thanks a lot for all the feedback :) +> +> We plan to implement a new bpf map type, naming BPF_FM_MAP, standing for +> fuzzy matching map. +> The basic idea is implementing a trie-tree using map of map runtime +> structure. +> + +**[v2: bpf-next: Shared ownership for local kptrs](http://lore.kernel.org/bpf/20230415201811.343116-1-davemarchevsky@fb.com/)** + +> The above program will fail verification due to current owning / non-owning ref +> logic: after bpf_list_push_back, n is a non-owning reference and thus cannot be +> passed to bpf_rbtree_add. The only way to get an owning reference for the node +> that was added is to bpf_list_pop_{front,back} it. +> + +**[v2: libbpf: correct the macro KERNEL_VERSION for old kernel](http://lore.kernel.org/bpf/20230414084353.36545-1-songrui.771@bytedance.com/)** + +> The introduced header file linux/version.h in libbpf_probes.c may have a wrong macro KERNEL_VERSION for calculating LINUX_VERSION_CODE in some old kernel (Debian9,10). Below is a version info example from Debian 10. 
+> + +**[v1: vmlinux.lds.h: Discard .note.gnu.property section](http://lore.kernel.org/bpf/20230413185922.ufmollqlnlghwyvy@treble/)** + +> It looks like CONFIG_DEBUG_INFO_BTF is already (inadvertently) stripping +> it from vmlinux due to how GNU properties are merged by the linker (see +> "How GNU properties are merged" in the ld man page). +> + +**[v1: MAINTAINERS: make me a reviewer of VIRTIO CORE AND NET DRIVERS](http://lore.kernel.org/bpf/20230413071610.43659-1-xuanzhuo@linux.alibaba.com/)** + +> First of all, I personally love open source, linux and virtio. I have +> also participated in community work such as virtio for a long time. +> + +**[v1: net-next: bpf, net: Support redirecting to ifb with bpf](http://lore.kernel.org/bpf/20230413025350.79809-1-laoar.shao@gmail.com/)** + +> In our container environment, we are using EDT-bpf to limit the egress +> bandwidth. EDT-bpf can be used to limit egress only, but can't be used +> to limit ingress. Some of our users also want to limit the ingress +> bandwidth. +> + +**[v3: net: mana: Add support for jumbo frame](http://lore.kernel.org/bpf/1681334163-31084-1-git-send-email-haiyangz@microsoft.com/)** + +> The set adds support for jumbo frame, +> with some optimization for the RX path. +> + +**[v10: bpf: XDP-hints: API change for RX-hash kfunc bpf_xdp_metadata_rx_hash](http://lore.kernel.org/bpf/168132888942.340624.2449617439220153267.stgit@firesoul/)** + +> Current API for bpf_xdp_metadata_rx_hash() returns the raw RSS hash value, +> but doesn't provide information on the RSS hash type (part of 6.3-rc). +> +> This patchset proposal is to change the function call signature via adding +> a pointer value argument for providing the RSS hash type. 
+> + +**[v1: bpf-next: bpf: Handle NULL in bpf_local_storage_free.](http://lore.kernel.org/bpf/20230412171252.15635-1-alexei.starovoitov@gmail.com/)** + +> During OOM bpf_local_storage_alloc() may fail to allocate 'storage' and +> a call to bpf_local_storage_free() with a NULL pointer will cause a crash like: +> + +**[v6: bpf-next: xsk: Support UMEM chunk_size > PAGE_SIZE](http://lore.kernel.org/bpf/20230412162114.19389-1-kal.conley@dectris.com/)** + +> The main purpose of this patchset is to add AF_XDP support for UMEM +> chunk sizes > PAGE_SIZE. This is enabled for UMEMs backed by HugeTLB +> pages. +> + +**[v1: selftests/bpf: ignore pointer types check with clang](http://lore.kernel.org/bpf/20230412095912.188453-1-andrea.righi@canonical.com/)** + +> This is due to the fact that bpftool emits duplicate data types with +> + +**[v1: bpf-next: samples/bpf: sampleip: Replace PAGE_OFFSET with _text address](http://lore.kernel.org/bpf/tencent_A0E82E0BEE925285F8156D540731DF805F05@qq.com/)** + +> Macro PAGE_OFFSET(0xffff880000000000) in sampleip_user.c is inaccurate, +> for example, in aarch64 architecture, this value depends on the +> CONFIG_ARM64_VA_BITS compilation configuration, this value defaults to 48, +> the corresponding PAGE_OFFSET is 0xffff800000000000, if we use the value +> defined in sampleip_user.c, then all KSYMs obtained by sampleip are (user) +> + +**[v1: bpf-next: New BPF map and BTF security LSM hooks](http://lore.kernel.org/bpf/20230412043300.360803-1-andrii@kernel.org/)** + +> Add new LSM hooks, bpf_map_create_security and bpf_btf_load_security, which +> are meant to allow highly-granular LSM-based control over the usage of the BPF +> subsystem. Specifically, to control the creation of BPF maps and BTF data +> objects, which are fundamental building blocks of any modern BPF application.
+> + +**[v1: Smack modifications for: security: Allow all LSMs to provide xattrs for inode_init_security hook](http://lore.kernel.org/bpf/20230411172337.340518-1-roberto.sassu@huaweicloud.com/)** + +> Very very quick modification. Not tested. +> + +**[v1: bpf: lirc program type should not require SYS_CAP_ADMIN](http://lore.kernel.org/bpf/ZDWAcN6wfeXzipHz@gofer.mess.org/)** + +> Make it possible to load lirc program type with just CAP_BPF. +> + +**[v2: bpf-next: xsk: Elide base_addr comparison in xp_unaligned_validate_desc](http://lore.kernel.org/bpf/20230411130025.19704-1-kal.conley@dectris.com/)** + +> Remove redundant (base_addr >= pool->addrs_cnt) comparison from the +> conditional. +> + +**[v1: bpf-next: tools/resolve_btfids: Ignore libsubcmd](http://lore.kernel.org/bpf/tencent_D5422A55AFF3A307880D06AD42D559739708@qq.com/)** + +> Since commit af03299d8536("tools/resolve_btfids: Install subcmd headers") +> introduce subcmd headers directory, we should ignore it. +> + +**[v1: perf bperf: Avoid use after free via union](http://lore.kernel.org/bpf/20230411051718.267228-1-irogers@google.com/)** + +> If bperf sets leader_skel or follower_skel then it appears bpf_skel is +> set and can trigger the following use-after-free +> + +**[v1: bpf-next: xsk: Simplify xp_aligned_validate_desc implementation](http://lore.kernel.org/bpf/20230410121841.643254-1-kal.conley@dectris.com/)** + +> Perform the chunk boundary check like the page boundary check in +> xp_desc_crosses_non_contig_pg(). This simplifies the implementation and +> reduces the number of branches. +> + +**[v1: bpf-next: Dynptr convenience helpers](http://lore.kernel.org/bpf/20230409033431.3992432-1-joannelkoong@gmail.com/)** + +> This patchset is the 3rd in the dynptr series. The 1st (dynptr +> fundamentals) can be found here [0] and the second (skb + xdp dynptrs) +> can be found here [1]. 
+> + +**[v2: bpf-next: Introduce BPF_MA_REUSE_AFTER_RCU_GP](http://lore.kernel.org/bpf/20230408141846.1878768-1-houtao@huaweicloud.com/)** + +> As discussed in v1, currently the freed objects in bpf memory allocator +> may be reused immediately by the new allocation, it introduces +> use-after-bpf-ma-free problem for non-preallocated hash map and makes +> lookup procedure return incorrect result. The immediate reuse also makes +> introducing new use case more difficult (e.g. qp-trie). +> + +### Related Technology News + +#### Qemu + +**[v3: riscv: Add support for the Zfa extension](http://lore.kernel.org/qemu-devel/20230413155010.191051-1-christoph.muellner@vrull.eu/)** + +> This patch introduces the RISC-V Zfa extension, which introduces +> additional floating-point extensions: +> * fli (load-immediate) with pre-defined immediates +> * fminm/fmaxm (like fmin/fmax but with different NaN behaviour) +> * fround/froundmx (round to integer) +> * fcvtmod.w.d (Modular Convert-to-Integer) +> * fmv* to access high bits of float register bigger than XLEN +> * Quiet comparison instructions (fleq/fltq) +> + +**[v1: riscv: Raise an exception if pte reserved bits are not cleared](http://lore.kernel.org/qemu-devel/20230412091716.126601-1-alexghiti@rivosinc.com/)** + +> As per the specification, in 64-bit, if any of the pte reserved bits 60-54 +> is set, an exception should be triggered (see 4.4.1, "Addressing and Memory +> Protection"), so implement this behaviour in the address translation process.
+> + +**[v1: target/riscv: Add support for BF16 extensions](http://lore.kernel.org/qemu-devel/20230412023320.50706-1-liweiwei@iscas.ac.cn/)** + +> Specification for BF16 extensions can be found in: +> https://github.com/riscv/riscv-bfloat16 +> +> The port is available here: +> https://github.com/plctlab/plct-qemu/tree/plct-bf16-upstream +> + +**[v3: target/riscv: implement query-cpu-definitions](http://lore.kernel.org/qemu-devel/20230411183511.189632-1-dbarboza@ventanamicro.com/)** + +> In this v3 I removed patches 3 and 4 of v2. +> +> Patch 3 now implements a new type that the generic CPUs (any, rv32, +> rv64, x-rv128) were converted to. This type will be used by +> query-cpu-definitions to determine if a given cpu is static or not based +> on its type. This approach was suggested by Richard Henderson in the v2 +> review. +> + +**[v1: target/riscv: Restore the predicate() NULL check behavior](http://lore.kernel.org/qemu-devel/20230411090211.3039186-1-bmeng@tinylab.org/)** + +> When reading a non-existent CSR QEMU should raise illegal instruction +> exception, but currently it just exits due to the g_assert() check. +> +> This actually reverts commit 0ee342256af9205e7388efdf193a6d8f1ba1a617, +> Some comments are also added to indicate that predicate() must be +> provided for an implemented CSR. +> + +**[v1: target/riscv: Separate implicitly-enabled and explicitly-enabled extensions](http://lore.kernel.org/qemu-devel/20230410033526.31708-1-liweiwei@iscas.ac.cn/)** + +> The patch tries to separate the multi-letter extensions that may implicitly-enabled by misa.EXT from the explicitly-enabled cases, so that the misa.EXT can truely disabled by write_misa(). +> With this separation, the implicitly-enabled zve64d/f and zve32f extensions will no work if we clear misa.V. And clear misa.V will have no effect on the explicitly-enalbed zve64d/f and zve32f extensions. 
+> + +**[v1: target/riscv: Add support for PC-relative translation](http://lore.kernel.org/qemu-devel/20230409105306.28575-1-liweiwei@iscas.ac.cn/)** + +> This patchset tries to add support for PC-relative translation. +> +> The existence of CF_PCREL can improve performance with the guest +> kernel's address space randomization. Each guest process maps libc.so +> (et al) at a different virtual address, and this allows those +> translations to be shared. +> + +**[v1: target/riscv: Use check for relationship between Zdinx/Zhinx{min} and Zfinx](http://lore.kernel.org/qemu-devel/20230408135908.25269-1-liweiwei@iscas.ac.cn/)** + +> Zdinx/Zhinx{min} require Zfinx. And require relationship is usually done +> by check currently. +> + +#### U-Boot + +**[v4: Add StarFive JH7110 PCIe drvier support](http://lore.kernel.org/u-boot/20230411010209.76561-1-minda.chen@starfivetech.com/)** + +> This patchset needs to apply after patchset in [1]. These PCIe series patches +> are based on the JH7110 RISC-V SoC and VisionFive V2 board. +> +> [1] https://patchwork.ozlabs.org/project/uboot/cover/20230329034224.26545-1-yanhong.wang@starfivetech.com +> + +**[v1: riscv: Support riscv64 image type](http://lore.kernel.org/u-boot/20230410072718.3484-1-rick@andestech.com/)** + +> Allow U-Boot to load 32 or 64 bits RISC-V Kernel Image +> distinguishly. It helps to avoid someone maybe make a mistake +> to run 32-bit U-Boot to load 64-bit kernel. +> + +## 20230409: Issue 41 + +### Kernel News + +#### RISC-V Architecture Support + +**[v11: function_graph: Support recording and printing the return value of function](http://lore.kernel.org/linux-riscv/cover.1680954589.git.pengdonglin@sangfor.com.cn/)** + +> When using the function_graph tracer to analyze system call failures, +> it can be time-consuming to analyze the trace logs and locate the kernel +> function that first returns an error.
This change aims to simplify the +> process by recording the function return value to the 'retval' member of +> 'ftrace_graph_ent' and printing it when outputing the trace log. +> + +**[v1: Convert SiFive drivers from SOC_FOO dependencies to ARCH_FOO](http://lore.kernel.org/linux-riscv/20230406-undertake-stowing-50f45b90413a@spud/)** + +> RISC-V's SOC_FOO symbols for micro-archs are going away, and being +> replaced with the more common ARCH_FOO pattern that is used by other +> archs (and by vendors with a history outside of RISC-V). +> I kicked the conversion off by converting the Microchip RISC-V bits to +> use their replacement symbol, so here's round two: the various SiFive +> drivers. +> + +**[GIT PULL: RISC-V Devicetrees for v6.4](http://lore.kernel.org/linux-riscv/20230406-shank-impromptu-3d483bbc249f@spud/)** + +> Please pull some Devicetree updates for v6.4, mainly adding the base +> level of support for the StarFive VisionFive v2. +> I wanted to get an initial PR out before -rc6, but I may have another +> PR adding some of the peripherals (pmu, mmc) for the StarFive stuff +> that are already reviewed etc, but need a rebase on top of what +> actually got applied. Is that okay, or will the end of next week be +> too late for you? +> + +**[GIT PULL: RISC-V SoC drivers for v6.4](http://lore.kernel.org/linux-riscv/20230406-islamist-mop-81d651b8830d@spud/)** + +> Please pull some updates for the "otherwise unloved" RISC-V SoC drivers +> for v6.4! The bulk of this is my fixing my own driver, and there's a fix +> in here to make sure that we don't hit randconfig build issues once !MMU +> is enabled for 32-bit kernels. 
+> + +**[v3: -next: support allocating crashkernel above 4G explicitly on riscv](http://lore.kernel.org/linux-riscv/20230406220206.3067006-1-chenjiahao16@huawei.com/)** + +> On riscv, the current crash kernel allocation logic is trying to +> allocate within 32bit addressible memory region by default, if +> failed, try to allocate without 4G restriction. +> +> In need of saving DMA zone memory while allocating a relatively large +> crash kernel region, allocating the reserved memory top down in +> high memory, without overlapping the DMA zone, is a mature solution. +> Hence this patchset introduces the parameter option crashkernel=X,[high,low]. +> + +**[v1: Add JH7110 PCIe driver support](http://lore.kernel.org/linux-riscv/20230406111142.74410-1-minda.chen@starfivetech.com/)** + +> This patchset adds PCIe driver for the StarFive JH7110 SoC. +> The patch has been tested on the VisionFive 2 board. The test +> devices include M.2 NVMe SSD and Realtek 8169 Ethernet adapter. +> + +**[v7: StarFive's SYSCON support](http://lore.kernel.org/linux-riscv/20230406103308.1280860-1-william.qiu@starfivetech.com/)** + +> This patchset adds initial rudimentary support for the StarFive +> designware mobile storage host controller driver. And this driver will +> be used in StarFive's VisionFive 2 board. The main purpose of adding +> this driver is to accommodate the ultra-high speed mode of eMMC. +> + +**[v4: Add JH7110 USB and USB PHY driver support](http://lore.kernel.org/linux-riscv/20230406015216.27034-1-minda.chen@starfivetech.com/)** + +> This patchset adds USB driver and USB PHY for the StarFive JH7110 SoC. +> USB work mode is peripheral and using USB 2.0 PHY in VisionFive 2 board. +> The patch has been tested on the VisionFive 2 board. 
+> + +**[GIT PULL: Initial clk/reset support for JH7110 for v6.4](http://lore.kernel.org/linux-riscv/20230405-constant-dreamily-0128e071c665@spud/)** + +> Here's a PR for the StarFive JH7110 clk/reset bits since I'd like to +> take the DT this cycle & depend on the binding headers. +> +> I've picked up R-B tags from Emil on all that patches, despite him being +> listed as an author, as things have changed quite a lot since he was +> involved in writing things many months ago. +> + +**[v2: RISC-V: align ISA extension Kconfig help text with each other](http://lore.kernel.org/linux-riscv/20230405-pucker-cogwheel-3a999a94a2f2@wendy/)** + +> Other extensions only capitalise the first letter in the text visible +> in Kconfig menus, and provide a short comment about the extension's +> meaning. Do the same for Svnapot & Svpbmt. +> +> The precedent for capitalisation in the Kconfig text was set by Zicbom +> & sorta followed for Zicboz. The RVI styling used for multi-letter +> extensions only capitalises the first letter, so do the same here. +> If nothing else, my OCD likes it when the extensions follow a consistent +> pattern. +> + +**[v1: riscv: Adjust dependencies of HAVE_DYNAMIC_FTRACE selection](http://lore.kernel.org/linux-riscv/20230404-riscv-dynamic-ftrace-checks-clang-v1-1-0ce296b7d423@kernel.org/)** + +> When building allmodconfig with clang and its integrated assembler and +> linking with a version of GNU ld prior to 2.36, the following link error +> occurs: +> +> riscv64-linux-gnu-ld: .init.data has both ordered [`__patchable_function_entries' in init/main.o] and unordered [`.init_array.0' in kernel/trace/trace_benchmark.o] sections +> riscv64-linux-gnu-ld: final link failed: bad value +> + +**[v4: Add basic ACPI support for RISC-V](http://lore.kernel.org/linux-riscv/20230404182037.863533-1-sunilvl@ventanamicro.com/)** + +> This patch series enables the basic ACPI infrastructure for RISC-V. 
+> Supporting external interrupt controllers is in progress and hence it is +> tested using poll based HVC SBI console and RAM disk. +> +> The first patch in this series is one of the patch from Jisheng's +> series [1] which is not merged yet. This patch is required to support +> ACPI since efi_init() which gets called before sbi_init() can enable +> static branches and hits a panic. +> + +**[v4: RISC-V KVM virtualize AIA CSRs](http://lore.kernel.org/linux-riscv/20230404153452.2405681-1-apatel@ventanamicro.com/)** + +> The RISC-V AIA specification is now frozen as-per the RISC-V international +> process. The latest frozen specifcation can be found at: +> https://github.com/riscv/riscv-aia/releases/download/1.0-RC3/riscv-interrupts-1.0-RC3.pdf +> + +**[v5: irqchip/irq-sifive-plic: Add syscore callbacks for hibernation](http://lore.kernel.org/linux-riscv/20230404032908.89638-1-mason.huo@starfivetech.com/)** + +> The priority and enable registers of plic will be reset +> during hibernation power cycle in poweroff mode, +> add the syscore callbacks to save/restore those registers. +> + +**[v5: RISC-V KVM ONE_REG interface for SBI](http://lore.kernel.org/linux-riscv/20230403121527.2286489-1-apatel@ventanamicro.com/)** + +> This series first does few cleanups/fixes (PATCH1 to PATCH5) and adds +> ONE-REG interface for customizing the SBI interface visible to the +> Guest/VM. +> +> The testing of this series has been done with KVMTOOL changes in +> riscv_sbi_imp_v1 branch at: +> https://github.com/avpatel/kvmtool.git +> + +**[v1: riscv: entry: Save a0 prior syscall_enter_from_user_mode()](http://lore.kernel.org/linux-riscv/20230403065207.1070974-1-bjorn@kernel.org/)** + +> The RISC-V calling convention passes the first argument, and the +> return value in the a0 register. For this reason, the a0 register +> needs some extra care; When handling syscalls, the a0 register is +> saved into regs->orig_a0, so a0 can be properly restored for, +> e.g. interrupted syscalls. 
+> + +**[v1: riscv: Add static call implementation](http://lore.kernel.org/linux-riscv/tencent_A8A256967B654625AEE1DB222514B0613B07@qq.com/)** + +> Add the riscv static call implementation. For each key, a permanent +> trampoline is created which is the destination for all static calls +> for the given key. +> +> The trampoline has a direct jump which gets patched by static_call_update() +> when the destination function changes. +> + +**[v1: RISC-V: KVM: Allow Zbb extension for Guest/VM](http://lore.kernel.org/linux-riscv/20230401112730.2105240-1-apatel@ventanamicro.com/)** + +> We extend the KVM ISA extension ONE_REG interface to allow KVM +> user space to detect and enable Zbb extension for Guest/VM. +> + +**[v7: Basic clock, reset & device tree support for StarFive JH7110 RISC-V SoC](http://lore.kernel.org/linux-riscv/20230401111934.130844-1-hal.feng@starfivetech.com/)** + +> This patch series adds basic clock, reset & DT support for StarFive +> JH7110 SoC. +> +> @Stephen and @Conor, I have made this series start with the shared +> dt-bindings, so it will be easier to merge. +> + +**[v4: Use dma_default_coherent for devicetree default coherency](http://lore.kernel.org/linux-riscv/20230401091531.47412-1-jiaxun.yang@flygoat.com/)** + +> This series split out second half of my previous series +> "v1: MIPS DMA coherence fixes". +> +> It intends to use dma_default_coherent to determine the default coherency of +> devicetree probed devices instead of hardcoding it with Kconfig options. +> + +#### Process Scheduling + +**[v4: sched: Avoid unnecessary migrations within SMT domains](http://lore.kernel.org/lkml/20230406203148.19182-1-ricardo.neri-calderon@linux.intel.com/)** + +> This is v4 of this series. Previous versions can be found here [1], [2], +> and here [3]. To avoid duplication, I do not include the cover letter of +> the original submission. You can read it in [1].
+> + +**[v1: sched: Consider CPU contention in frequency & load-balance busiest CPU selection](http://lore.kernel.org/lkml/20230406155030.1989554-1-dietmar.eggemann@arm.com/)** + +> This is the implementation of the idea to factor in root cfs_rq +> runnable_avg as a way to consider CPU contention for CPU frequency and +> `migrate_util` type load-balance busiest CPU selection. +> + +**[v1: sched: rt: Simplify pick_task_rt()](http://lore.kernel.org/lkml/20230407192435.3390-1-kunyu@nfschina.com/)** + +> Remove useless intermediate variable "p" and its initialization. +> Directly return the next RT scheduling task obtained from +> _pick_next_task_rt(). +> + +**[v2: sched: rt: Simplify pick_next_rt_entity()](http://lore.kernel.org/lkml/20230407180952.2757-1-zeming@nfschina.com/)** + +> Remove useless intermediate variable "next" and its initialization. +> Directly return the next RT scheduling entity obtained from +> list_entry(). +> + +**[v1: sched/psi: set varaiable psi_cgroups_enabled storage-class-specifier to static](http://lore.kernel.org/lkml/20230405163602.1939400-1-trix@redhat.com/)** + +> smatch reports +> kernel/sched/psi.c:143:1: warning: symbol +> 'psi_cgroups_enabled' was not declared. Should it be static? +> +> This variable is only used in one file so should be static. +> + +**[v1: sched: rt: Optimization function 'pick_next_rt_entity'](http://lore.kernel.org/lkml/20230405232900.4019-1-zeming@nfschina.com/)** + +> The moral of this function is to obtain the next RT scheduling entity +> object,while 'list_entry' Implementation function of 'container_of' +> returns the next RT scheduling entity object (no new code should be +> added afterwards), directly returning 'list_entry' The execution result +> is sufficient. +> + +#### Memory Management + +**[v1: linux-next: delayacct: track delays from IRQ/SOFTIRQ](http://lore.kernel.org/linux-mm/202304081728353557233@zte.com.cn/)** + +> Delay accounting does not track the delay of IRQ/SOFTIRQ.
While +> IRQ/SOFTIRQ could have obvious impact on some workloads productivity, +> such as when workloads are running on system which is busy handling +> network IRQ/SOFTIRQ. +> + +**[v4: ACPI: APEI: handle synchronous exceptions with proper si_code](http://lore.kernel.org/linux-mm/20230408091359.31554-1-xueshuai@linux.alibaba.com/)** + +> changes since v3 by addressing comments from Xiaofei: +> - do a force kill for abnormal memofy failure error such as invalid PA, +> unexpected severity, OOM, etc +> - pcik up tested-by tag from Ma Wupeng +> + +**[v1: mm: introduce defer free for cma](http://lore.kernel.org/linux-mm/1680864131-4675-1-git-send-email-zhaoyang.huang@unisoc.com/)** + +> Continues page blocks are expensive for the system. Introducing defer free +> mechanism to buffer some which make the allocation easier. The shrinker will +> ensure the page block can be reclaimed when there is memory pressure. +> + +**[v5: net-next: splice, net: Replace sendpage with sendmsg(MSG_SPLICE_PAGES), part 1](http://lore.kernel.org/linux-mm/20230406094245.3633290-1-dhowells@redhat.com/)** + +> Here's the first tranche of patches towards providing a MSG_SPLICE_PAGES +> internal sendmsg flag that is intended to replace the ->sendpage() op with +> calls to sendmsg(). MSG_SPLICE is a hint that tells the protocol that it +> should splice the pages supplied if it can and copy them if not. +> + +**[v1: memcg: Default value setting in memcg-v1](http://lore.kernel.org/linux-mm/20230406091450.167779-1-shaun.tancheff@gmail.com/)** + +> Setting min, low and high values with memcg-v1 +> provides bennefits for users that are unable to update +> to memcg-v2. +> +> Setting min, low and high can be set in memcg-v1 +> to apply enough memory pressure to effective throttle +> filesystem I/O without hitting memcg oom. 
+> + +**[v12: Implement IOCTL to get and optionally clear info about PTEs](http://lore.kernel.org/linux-mm/20230406074005.1784728-1-usama.anjum@collabora.com/)** + +> *Changes in v12* +> - Update and other memory types to UFFD_FEATURE_WP_ASYNC +> - Rebaase on top of next-20230406 +> - Review updates +> + +**[v2: dma-buf/heaps: system_heap: Avoid DoS by limiting single allocations to half of all memory](http://lore.kernel.org/linux-mm/20230406000854.25764-1-jaewon31.kim@samsung.com/)** + +> Normal free:212600kB min:7664kB low:57100kB high:106536kB +> reserved_highatomic:4096KB active_anon:276kB inactive_anon:180kB +> active_file:1200kB inactive_file:0kB unevictable:2932kB +> writepending:0kB present:4109312kB managed:3689488kB mlocked:2932kB +> pagetables:13600kB bounce:0kB free_pcp:0kB local_pcp:0kB +> free_cma:200844kB +> Out of memory and no killable processes... +> Kernel panic - not syncing: System is deadlocked on memory +> + +**[v2: kmod: simplify with a semaphore](http://lore.kernel.org/linux-mm/20230405203505.1343562-1-mcgrof@kernel.org/)** + +> I split the semaphore simplification work out from my first patch series [0] +> because as although the changes came out of that effort, in the end this set +> of patches are slightly orthogonal to the goal behind that series and this +> ended up being mostly a cleanup with mild bike shedding exercise. +> + +**[v5: Ignore non-LRU-based reclaim in memcg reclaim](http://lore.kernel.org/linux-mm/20230405185427.1246289-1-yosryahmed@google.com/)** + +> Upon running some proactive reclaim tests using memory.reclaim, we +> noticed some tests flaking where writing to memory.reclaim would be +> successful even though we did not reclaim the requested amount fully. +> Looking further into it, I discovered that *sometimes* we over-report +> the number of reclaimed pages in memcg reclaim. 
+> + +**[v3: Expose GPU memory as coherently CPU accessible](http://lore.kernel.org/linux-mm/20230405180134.16932-1-ankita@nvidia.com/)** + +> NVIDIA's upcoming Grace Hopper Superchip provides a PCI-like device +> for the on-chip GPU that is the logical OS representation of the +> internal propritary cache coherent interconnect. +> + +**[v1: net-next: net: sunhme: move asm includes to below linux includes](http://lore.kernel.org/linux-mm/20230405-sunhme-includes-fix-v1-1-bf17cc5de20d@kernel.org/)** + +> A recent rearrangement of includes has lead to a problem on m68k +> as flagged by the kernel test robot. +> +> Resolve this by moving the block asm includes to below linux includes. +> A side effect i that non-Sparc asm includes are now immediately +> before Sparc asm includes, which seems nice. +> + +**[v1: mm, page_alloc: use check_pages_enabled static key to check tail pages](http://lore.kernel.org/linux-mm/20230405142840.11068-1-vbabka@suse.cz/)** + +> Commit 700d2e9a36b9 ("mm, page_alloc: reduce page alloc/free sanity +> checks") has introduced a new static key check_pages_enabled to control +> when struct pages are sanity checked during allocation and freeing. Mel +> Gorman suggested that free_tail_pages_check() could use this static key +> as well, instead of relying on CONFIG_DEBUG_VM. That makes sense, so do +> that. Also rename the function to free_tail_page_prepare() because it +> works on a single tail page and has a struct page preparation component +> as well as the optional checking component. +> Also remove some unnecessary unlikely() within static_branch_unlikely() +> statements that Mel pointed out for commit 700d2e9a36b9. 
+> + +**[v1: memcg-v1: Enable setting memory min, low, high](http://lore.kernel.org/linux-mm/20230405110107.127156-1-shaun.tancheff@gmail.com/)** + +> For users that are unable to update to memcg-v2 this +> provides a method where memcg-v1 can more effectively +> apply enough memory pressure to effectively throttle +> filesystem I/O or otherwise minimize being memcg oom +> killed at the expense of reduced performance. +> + +**[v2: module: avoid userspace pressure on unwanted allocations](http://lore.kernel.org/linux-mm/20230405022702.753323-1-mcgrof@kernel.org/)** + +> This v2 series follows up on the first iteration of these patches [0]. +> They have the following changes made: +> +> o Rolled in fix for an kmemleak issue reported by Jim Cromie +> o Dropped from this series all the semaphore & and simplifications +> on kmod.c as that should just be sent as a separate bike-shedding +> opporunity patch series and it does not in any way address the +> the unwanted allocations. +> o The rest of the feedback was just from Greg KH and I've addressed +> all his feedback. I decided to do away with the debug.c as a +> separate file and leave the #ifdef CONFIG_MODULE_DEBUG eyesore +> at the end of main.c. I guess it's not so bad there. +> o *Tons* of fixes and enhancements to my counters, including tons +> of documentation to help ensure we don't loose track of some of +> the tribal knowledge and so to help ensure we have references to +> what our accounting looks like. Those large wasted virtual memory +> allocations on a simple qemu idle boring boot are simply rediculous, I +> am quite baffled we had not spotted this before, and so it all reveals +> we have quite a bit of optimizations left to do to make loading modules +> an even more smoother experience at bootup. 
+> + +**[v2: regmap: Use mas_walk() instead of mas_find()](http://lore.kernel.org/linux-mm/20230403-regmap-maple-walk-fine-v2-1-c07371c8a867@kernel.org/)** + +> Liam recommends using mas_walk() instead of mas_find() for our use case so +> let's do that, it avoids some minor overhead associated with being able to +> restart the operation which we don't need since we do a simple search. +> + +**[v1: memcg v1: provide read access to memory.pressure_level](http://lore.kernel.org/linux-mm/20230404105900.2005-1-flosch@nutanix.com/)** + +> This is all fine as long as the subscribing process runs as root and is +> otherwise unconfined by further restrictions. However, if you add strict +> access controls such as selinux, the permission bits will be enforced, +> and opening memory.pressure_level for reading will fail, preventing the +> process from subscribing, even as root. +> + +**[v1: mm/madvise: Use vma_lookup() instead of find_vma()](http://lore.kernel.org/linux-mm/20230404094515.1883552-1-zhangpeng362@huawei.com/)** + +> Using vma_lookup() verifies the address is contained in the found vma. +> This results in easier to read the code. +> + +**[v1: m68k/mm: Use correct bit number in _PAGE_SWP_EXCLUSIVE comment](http://lore.kernel.org/linux-mm/20230404085636.121409-1-david@redhat.com/)** + +> As noticed by Geert, commit b5c88f21531c ("microblaze/mm: support +> __HAVE_ARCH_PTE_SWP_EXCLUSIVE") modified m68k code by accident. While +> replacing 0x080 by CF_PAGE_NOCACHE is correct, although it should have +> been part of commit ed4154067a08 ("m68k/mm: support +> __HAVE_ARCH_PTE_SWP_EXCLUSIVE"), replacing "bit 7" by "bit 24" in the +> comment was wrong. +> + +**[v2: LoongArch: Add kernel address sanitizer support](http://lore.kernel.org/linux-mm/20230404084148.744-1-zhangqing@loongson.cn/)** + +> Kernel Address Sanitizer (KASAN) is a dynamic memory safety error detector +> designed to find out-of-bounds and use-after-free bugs, Generic KASAN is +> supported on LoongArch now. 
+> +> 1/8 of kernel addresses reserved for shadow memory. But for LoongArch, +> There are a lot of holes between different segments and valid address +> space(256T available) is insufficient to map all these segments to kasan +> shadow memory with the common formula provided by kasan core, saying +> addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET +> + +**[v1: mm: check mapping addr is correct when dump page](http://lore.kernel.org/linux-mm/1680587425-4683-1-git-send-email-Xiaosong.Ma@unisoc.com/)** + +> when we debug with slub_debug_on, the following backtraces show dump_page +> will show wrong info when the bad page is non-NULL mapping and page->mapping +> is 0x80000000000 so do virt_addr valid check is needed when dump mapping page. +> + +**[v1: permit write-sealed memfd read-only shared mappings](http://lore.kernel.org/linux-mm/cover.1680560277.git.lstoakes@gmail.com/)** + +> This patch series is in two parts:- +> +> 1. Currently there are a number of places in the kernel where we assume +> VM_SHARED implies that a mapping is writable. Let's be slightly less +> strict and relax this restriction in the case that VM_MAYWRITE is not +> set. +> + +**[v1: mm-unstable: cgroup: eliminate atomic rstat](http://lore.kernel.org/linux-mm/20230403220337.443510-1-yosryahmed@google.com/)** + +> A previous patch series ([1] currently in mm-unstable) changed most +> atomic rstat flushing contexts to become non-atomic. This was done to +> avoid an expensive operation that scales with # cgroups and # cpus to +> happen with irqs disabled and scheduling not permitted. There were two +> remaining atomic flushing contexts after that series. This series tries +> to eliminate them as well, eliminating atomic rstat flushing completely. +> + +**[v3: Split a folio to any lower order folios](http://lore.kernel.org/linux-mm/20230403201839.4097845-1-zi.yan@sent.com/)** + +> File folio supports any order and people would like to support flexible orders +> for anonymous folio[1] too. 
Currently, split_huge_page() only splits a huge +> page to order-0 pages, but splitting to orders higher than 0 is also useful. +> This patchset adds support for splitting a huge page to any lower order pages +> and uses it during file folio truncate operations. +> + +**[v8: -next: Delay the initialization of zswap](http://lore.kernel.org/linux-mm/20230403121318.1876082-1-liushixin2@huawei.com/)** + +> In the initialization of zswap, about 18MB memory will be allocated for +> zswap_pool. Since some users may not use zswap, the zswap_pool is wasted. +> Save memory by delaying the initialization of zswap until enabled. +> + +#### File Systems + +**[v2: dax: enable dax fault handler to report VM_FAULT_HWPOISON](http://lore.kernel.org/linux-fsdevel/20230406230127.716716-1-jane.chu@oracle.com/)** + +> When dax fault handler fails to provision the fault page due to +> hwpoison, it returns VM_FAULT_SIGBUS which lead to a sigbus delivered +> to userspace with .si_code BUS_ADRERR. Channel dax backend driver's +> detection on hwpoison to the filesystem to provide the precise reason +> for the fault. +> + +**[v1: fsverity: reject FS_IOC_ENABLE_VERITY on mode 3 fds](http://lore.kernel.org/linux-fsdevel/20230406215106.235829-1-ebiggers@kernel.org/)** + +> Commit 56124d6c87fd ("fsverity: support enabling with tree block size < +> PAGE_SIZE") changed FS_IOC_ENABLE_VERITY to use __kernel_read() to read +> the file's data, instead of direct pagecache accesses. +> + +**[v1: shmem: stable directory cookies](http://lore.kernel.org/linux-fsdevel/168080987776.946167.3501480439542616457.stgit@manet.1015granger.net/)** + +> The current cursor-based directory cookie mechanism doesn't work +> when a tmpfs filesystem is exported via NFS. This is because NFS +> clients do not open directories: each READDIR operation has to open +> the directory on the server, read it, then close it. The cursor +> state for that directory, being associated strictly with the opened +> struct file, is then discarded.
+> + +**[v2: eventfd: use wait_event_interruptible_locked_irq() helper](http://lore.kernel.org/linux-fsdevel/tencent_F38839D00FE579A60A97BA24E86AF223DD05@qq.com/)** + +> wait_event_interruptible_locked_irq was introduced by commit 22c43c81a51e +> ("wait_event_interruptible_locked() interface"), but older code such as +> eventfd_{write,read} still uses the open code implementation. +> Inspired by commit 8120a8aadb20 +> ("fs/timerfd.c: make use of wait_event_interruptible_locked_irq()"), this +> patch replaces the open code implementation with a single macro call. +> + +**[v1: fsverity: use shash API instead of ahash API](http://lore.kernel.org/linux-fsdevel/20230406003714.94580-1-ebiggers@kernel.org/)** + +> The "ahash" API, like the other scatterlist-based crypto APIs such as +> "skcipher", comes with some well-known limitations. First, it can't +> easily be used with vmalloc addresses. Second, the request struct can't +> be allocated on the stack. This adds complexity and a possible failure +> point that needs to be worked around, e.g. using a mempool. +> + +**[v3: blksnap - block devices snapshots module](http://lore.kernel.org/linux-fsdevel/20230404140835.25166-1-sergei.shtepa@veeam.com/)** + +> I am happy to offer a modified version of the Block Devices Snapshots +> Module. It allows to create non-persistent snapshots of any block devices. +> The main purpose of such snapshots is to provide backups of block devices. +> See more in Documentation/block/blksnap.rst. +> + +**[v1: exfat: add sysfs interface](http://lore.kernel.org/linux-fsdevel/20230405084635.74680-1-frank.li@vivo.com/)** + +> Add sysfs interface to configure exfat related parameters. +> + +**[v1: fstests specific MAINTAINERS file](http://lore.kernel.org/linux-fsdevel/20230404171411.699655-1-zlang@kernel.org/)** + +> I think I might be mad to include that many mailing lists in this patchset... 
+>
+> As I explained in v1: , fstests covers more and more fs testing
+> thing, so we always get help from fs specific mailing list, due to they
+> learn about their features and bugs more. Besides that, some folks help
+> to review patches (relevant with them) more often. So I'd like to bring
+> in the similar way of linux/MAINTAINERS, records fs relevant mailing lists,
+> reviewers or supporters (or call co-maintainers). To recognize the
+> can be added in CC list of a patch.
+>
+
+**[v1: Avoid the mmap lock for fault-around](http://lore.kernel.org/linux-fsdevel/20230404135850.3673404-1-willy@infradead.org/)**
+
+> The linux-next tree currently contains patches (mostly from Suren)
+> which handle some page faults without the protection of the mmap lock.
+> This patchset adds the ability to handle page faults on parts of files
+> which are already in the page cache without taking the mmap lock.
+>
+
+**[v2: fuse: API for Checkpoint/Restore](http://lore.kernel.org/linux-fsdevel/20230403144517.347517-1-aleksandr.mikhalitsyn@canonical.com/)**
+
+> The main problem for CRIU is that we have to restore mount namespaces and memory mappings before the process tree.
+> It means that when CRIU is performing mount of fuse filesystem it can't use the original FUSE daemon from the
+> restorable process tree, but instead use a "fake daemon".
+>
+
+**[v1: shmem: Add user and group quota support for tmpfs](http://lore.kernel.org/linux-fsdevel/20230403084759.884681-1-cem@kernel.org/)**
+
+> so I'm taking over his work from where he left it of. This series is virtually
+> done, and he had updated it with comments from the last version, but, I'm
+> initially posting it as a RFC because it's been a while since he posted the
+> last version.
+> Most of what I did here was rebase his last work on top of current Linus's tree.
+>
+
+**[v1: blk: optimization for classic polling](http://lore.kernel.org/linux-fsdevel/3578876466-3733-1-git-send-email-nj.shetty@samsung.com/)**
+
+> This removes the dependency on interrupts to wake up task. Set task
+> state as TASK_RUNNING, if need_resched() returns true,
+> while polling for IO completion.
+> Earlier, polling task used to sleep, relying on interrupt to wake it up.
+> This made some IO take very long when interrupt-coalescing is enabled in
+> NVMe.
+>
+
+#### Network Devices
+
+**[v7: bpf: XDP-hints: API change for RX-hash kfunc bpf_xdp_metadata_rx_hash](http://lore.kernel.org/netdev/168098183268.96582.7852359418481981062.stgit@firesoul/)**
+
+> Current API for bpf_xdp_metadata_rx_hash() returns the raw RSS hash value,
+> but doesn't provide information on the RSS hash type (part of 6.3-rc).
+>
+> This patchset proposal is to change the function call signature via adding
+> a pointer value argument for providing the RSS hash type.
+>
+> Patchset also disables all bpf_printk's from xdp_hw_metadata program
+> that we expect driver developers to use.
+>
+
+**[v1: nft: main: Error out when combining -i/--interactive and -f/--file](http://lore.kernel.org/netdev/20230408181818.72264-1-pablo@netfilter.org/)**
+
+> These two options are mutually exclusive, display error in that case:
+>
+>     # nft -i -f test.nft
+>     Error: -i/--interactive and -f/--file options cannot be combined
+>
+
+**[v2: Add missing DSA properties for marvell switches](http://lore.kernel.org/netdev/20230408152801.2336041-1-andrew@lunn.ch/)**
+
+> The DSA core has become more picky about DT properties. This patchset
+> add missing properties and removes some unused ones, for iMX boards.
+>
+> Once all the missing properties are added, it should be possible to
+> simply phylink and the mv88e6xxx driver.
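The nft item above rejects `-i/--interactive` together with `-f/--file`. The check can be sketched as follows; this is a hypothetical standalone function, not nft's actual option parser, and it ignores combined short flags (`-if`) and the `--file=PATH` spelling.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch of the nft check: scan argv for -i/--interactive and
 * -f/--file and refuse the combination.
 * Returns 0 if the options are acceptable, -1 otherwise.
 */
static int check_exclusive_opts(int argc, char **argv)
{
    int interactive = 0, file = 0;

    for (int i = 1; i < argc; i++) {
        if (!strcmp(argv[i], "-i") || !strcmp(argv[i], "--interactive")) {
            interactive = 1;
        } else if (!strcmp(argv[i], "-f") || !strcmp(argv[i], "--file")) {
            file = 1;
            i++; /* skip the file name that follows -f */
        }
    }

    if (interactive && file) {
        fprintf(stderr, "Error: -i/--interactive and -f/--file options "
                        "cannot be combined\n");
        return -1;
    }
    return 0;
}
```

Checking after the whole command line has been scanned, rather than inside the loop, means the order of the two options does not matter.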
+>
+
+**[v4: net-next: Support MACsec VLAN](http://lore.kernel.org/netdev/20230408105735.22935-1-ehakim@nvidia.com/)**
+
+> This patch series introduces support for hardware (HW) offload MACsec
+> devices with VLAN configuration. The patches address both scenarios
+> where the VLAN header is both the inner and outer header for MACsec.
+>
+
+**[v1: net: ipv6: Add Kconfig option to set default value of accept_dad](http://lore.kernel.org/netdev/3072adab06f9c5f45cc72d2068d1aed0100436ff.1680941918.git.josh@joshtriplett.org/)**
+
+> The kernel already supports disabling Duplicate Address Detection (DAD)
+> by setting net.ipv6.conf.$interface.accept_dad to 0. However, for
+> interfaces available at boot time, the kernel brings up the interface
+> and sets up the link-local address before processing sysctls set on the
+> kernel command line; thus, setting
+> sysctl.net.ipv6.conf.default.accept_dad=0 on the kernel command line
+> does not suffice to affect such interfaces.
+>
+
+**[v1: Alternative, restart tx after tx used bit read](http://lore.kernel.org/netdev/20230407213349.8013-1-ingo.rohloff@lauterbach.com/)**
+
+> I am developing on a ZynqMP (Ultrascale+) SoC from AMD/Xilinx.
+> I have seen the same issue before commit 4298388574dae6168 ("net: macb:
+> restart tx after tx used bit read")
+>
+
+**[v2: net: mana: Add support for jumbo frame](http://lore.kernel.org/netdev/1680901196-20643-1-git-send-email-haiyangz@microsoft.com/)**
+
+> The set adds support for jumbo frame,
+> with some optimization for the RX path.
+>
+
+**[v2: wifi: brcmfmac: add Cypress 43439 SDIO ids](http://lore.kernel.org/netdev/20230407203752.128539-1-marex@denx.de/)**
+
+> Add SDIO ids for use with the muRata 1YN (Cypress CYW43439).
+> The odd thing about this is that the previous 1YN populated
+> on M.2 card for evaluation purposes had BRCM SDIO vendor ID,
+> while the chip populated on real hardware has a Cypress one.
+> The device ID also differs between the two devices. But they
+> are both 43439 otherwise, so add the IDs for both.
+>
+
+**[v1: net-next: gve: Unify duplicate GQ min pkt desc size constants](http://lore.kernel.org/netdev/20230407184830.309398-1-shailend@google.com/)**
+
+> The two constants accomplish the same thing.
+>
+
+**[v4: net-next: ice: allow matching on meta data](http://lore.kernel.org/netdev/20230407165219.2737504-1-michal.swiatkowski@linux.intel.com/)**
+
+> This patchset is intended to improve the usability of the switchdev
+> slow path. Without matching on a meta data values slow path works
+> based on VF's MAC addresses. It causes a problem when the VF wants
+> to use more than one MAC address (e.g. when it is in trusted mode).
+>
+
+**[v1: regmap: allow upshifting register addresses before performing operations](http://lore.kernel.org/netdev/20230407152604.105467-1-maxime.chevallier@bootlin.com/)**
+
+> Similar to the existing reg_downshift mechanism, that is used to
+> translate register addresses on busses that have a smaller address
+> stride, it's also possible to want to upshift register addresses.
+>
+
+**[v1: ARM64: dts: marvell: cn9310: Add missing phy-mode](http://lore.kernel.org/netdev/20230407151839.2320596-1-andrew@lunn.ch/)**
+
+> The DSA framework has got more picky about always having a phy-mode
+> for the CPU port. The SoC Ethernet is being configured to
+> 10gbase-r. Set the switch phy-mode based on this. Additionally, the
+> SoC Ethernet is using in-band signalling to determine the link speed,
+> so add same parameter to the switch.
+>
+
+**[v1: net-next: tools: ynl: throw a more meaningful exception if family not supported](http://lore.kernel.org/netdev/20230407145609.297525-1-kuba@kernel.org/)**
+
+> cli.py currently throws a pure KeyError if kernel doesn't support
+> a netlink family. Users who did not write ynl (hah) may waste
+> their time investigating what's wrong with the Python code.
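The regmap item above proposes an upshift counterpart to the existing `reg_downshift`. A minimal model of the address translation, assuming a single signed shift in which positive values shift the address down and negative values shift it up; `translate_reg()` is a hypothetical helper, not regmap's internal code.

```c
/*
 * Hypothetical model of regmap-style address shifting (not regmap's code):
 * a positive shift moves the address down, like the existing reg_downshift;
 * a negative shift moves it up, which is what the patch adds.
 */
static unsigned int translate_reg(unsigned int reg, int shift)
{
    if (shift > 0)
        return reg >> shift;  /* e.g. byte address -> word index */
    if (shift < 0)
        return reg << -shift; /* e.g. word index -> byte address */
    return reg;
}
```

With a shift of `-2`, register index 3 becomes address 12, as on a bus with a 4-byte register stride; a shift of `2` performs the reverse translation.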
+>
+
+**[v1: net-next: ax25: exit linked-list searches earlier](http://lore.kernel.org/netdev/20230407142042.11901-1-peter@n8pjl.ca/)**
+
+> There's no need to loop until the end of the list if we have a result.
+>
+> Device callsigns are unique, so there can only be one dev returned from
+> ax25_addr_ax25dev(). If not, there would be inconsistencies based on
+> order of insertion, and refcount leaks.
+>
+> Same reasoning for ax25_get_route() as above.
+>
+
+**[v1: net-next: DSA trace events](http://lore.kernel.org/netdev/20230407141451.133048-1-vladimir.oltean@nxp.com/)**
+
+> These are useful to debug refcounting issues on CPU and DSA ports, where
+> entries may remain lingering, or may be removed too soon, depending on
+> bugs in higher layers of the network stack.
+>
+
+**[v3: bpf-next: Add FOU support for externally controlled ipip devices](http://lore.kernel.org/netdev/cover.1680874078.git.cehrig@cloudflare.com/)**
+
+> This patch set adds support for using FOU or GUE encapsulation with
+> an ipip device operating in collect-metadata mode and a set of kfuncs
+> for controlling encap parameters exposed to a BPF tc-hook.
+>
+
+**[v2: net-next: net: ethernet: mtk_eth_soc: use be32 type to store be32 values](http://lore.kernel.org/netdev/20230401-mtk_eth_soc-sparse-v2-1-963becba3cb7@kernel.org/)**
+
+> n_addr is used to store be32 values,
+> so a sparse-friendly array of be32 to store these values.
+>
+
+**[v1: net-next: net: davicom: Make davicom drivers not depends on DM9000](http://lore.kernel.org/netdev/20230407094930.2633137-1-weiyongjun@huaweicloud.com/)**
+
+> All davicom drivers build need CONFIG_DM9000 is set, but this dependence
+> is not correctly since dm9051 can be build as module without dm9000, switch
+> to using CONFIG_NET_VENDOR_DAVICOM instead.
+>
+
+**[v4: net-next: sfc: add vDPA support for EF100 devices](http://lore.kernel.org/netdev/20230407081021.30952-1-gautam.dawar@amd.com/)**
+
+> This series adds the vdpa support for EF100 devices.
+> For now, only a network class of vdpa device is supported and
+> they can be created only on a VF. Each EF100 VF can have one
+> of the three function personalities (EF100, vDPA & None) at
+> any time with EF100 being the default. A VF's function personality
+> is changed to vDPA while creating the vdpa device using vdpa tool.
+>
+
+**[v2: net-next: qlcnic: check pci_reset_function result](http://lore.kernel.org/netdev/20230407071849.309516-1-den-plotnikov@yandex-team.ru/)**
+
+> Static code analyzer complains to unchecked return value.
+> The result of pci_reset_function() is unchecked.
+> Despite, the issue is on the FLR supported code path and in that
+> case reset can be done with pcie_flr(), the patch uses less invasive
+> approach by adding the result check of pci_reset_function().
+>
+
+**[v1: net/sched: sch_qfq: prevent slab-out-of-bounds in qfq_activate_agg](http://lore.kernel.org/netdev/ZC+Kgc7feqYy%2FGdw@pr0lnx/)**
+
+> If the TCA_QFQ_LMAX value is not offered through nlattr, lmax is determined by the MTU value of the network device.
+> The MTU of the loopback device can be set up to 2^31-1.
+> As a result, it is possible to have an lmax value that exceeds QFQ_MIN_LMAX.
+>
+
+**[v4: net-next: net: lockless stop/wake combo macros](http://lore.kernel.org/netdev/20230407012536.273382-1-kuba@kernel.org/)**
+
+> A lot of drivers follow the same scheme to stop / start queues
+> without introducing locks between xmit and NAPI tx completions.
+> I'm guessing they all copy'n'paste each other's code.
+> The original code dates back all the way to e1000 and Linux 2.6.19.
+>
+
+**[v1: bpf-next: bpf: ensure all memory is initialized in bpf_get_current_comm](http://lore.kernel.org/netdev/20230407001808.1622968-1-brho@google.com/)**
+
+> BPF helpers that take an ARG_PTR_TO_UNINIT_MEM must ensure that all of
+> the memory is set, including beyond the end of the string.
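The `bpf_get_current_comm` item above is about `ARG_PTR_TO_UNINIT_MEM`: the helper has to write every byte of the destination buffer, including the bytes after the terminating NUL, or the caller may later read uninitialized memory. A userspace sketch of such a copy follows; `copy_comm_padded()` is a hypothetical name, and the 16-byte buffer in the usage below only mirrors the size of a task's comm (`TASK_COMM_LEN`).

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy a NUL-terminated string into a fixed-size buffer, truncating if
 * needed, and zero every byte after the copied characters so that no
 * stale memory is left in dst. Returns the number of characters copied,
 * or -1 if size is 0.
 */
static long copy_comm_padded(char *dst, size_t size, const char *src)
{
    size_t len = 0;

    if (size == 0)
        return -1;
    while (len < size - 1 && src[len] != '\0') /* leave room for the NUL */
        len++;
    memcpy(dst, src, len);
    memset(dst + len, 0, size - len); /* terminator plus trailing padding */
    return (long)len;
}
```

Filling a 16-byte buffer with `'X'` and then copying `"bash"` into it leaves `"bash"` followed by twelve zero bytes, so nothing from the previous contents survives.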
+>
+
+**[v9: net-next: pds_core driver](http://lore.kernel.org/netdev/20230406234143.11318-1-shannon.nelson@amd.com/)**
+
+> This patchset implements a new driver for use with the AMD/Pensando
+> Distributed Services Card (DSC), intended to provide core configuration
+> services through the auxiliary_bus and through a couple of EXPORTed
+> functions for use initially in VFio and vDPA feature specific drivers.
+>
+
+**[v1: bpf-next: xsk: Elide base_addr comparison in xp_unaligned_validate_desc](http://lore.kernel.org/netdev/20230406212136.19716-1-kal.conley@dectris.com/)**
+
+> Remove redundant (base_addr >= pool->addrs_cnt) comparison from the
+> conditional.
+>
+> In particular, addr is computed as:
+>
+>     addr = base_addr + offset
+>
+> where base_addr and offset are stored as 48-bit and 16-bit unsigned
+> integers, respectively. The above sum cannot overflow u64 since
+> base_addr has a maximum value of 0x0000ffffffffffff and offset has a
+> maximum value of 0xffff (implying a maximum sum of 0x000100000000fffe).
+> Since overflow is impossible, it follows that addr >= base_addr.
+>
+
+**[v1: net-next: net: make SO_BUSY_POLL available to all users](http://lore.kernel.org/netdev/20230406194634.1804691-1-edumazet@google.com/)**
+
+> After commit 217f69743681 ("net: busy-poll: allow preemption
+> in sk_busy_loop()"), a thread willing to use busy polling
+> is not hurting other threads anymore in a non preempt kernel.
+>
+> I think it is safe to remove CAP_NET_ADMIN check.
+>
+
+**[RFC v4: net-next: net: Make MAC/PHY time stamping selectable](http://lore.kernel.org/netdev/20230406173308.401924-1-kory.maincent@bootlin.com/)**
+
+> Up until now, there was no way to let the user select the layer at
+> which time stamping occurs. The stack assumed that PHY time stamping
+> is always preferred, but some MAC/PHY combinations were buggy.
+>
+> This series aims to allow the user to select the desired layer
+> administratively.
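The overflow argument in the xsk item (`xp_unaligned_validate_desc`) can be checked mechanically. The sketch below assumes the unaligned-mode layout that item describes, with a 48-bit base address in the low bits of the descriptor address and a 16-bit offset above it (`OFFSET_SHIFT` mirrors that assumed layout; the helper name is hypothetical). The compiler itself confirms that the largest possible sum is `0x000100000000fffe`, so the addition never wraps and `addr >= base_addr` always holds.

```c
#include <stdint.h>

#define OFFSET_SHIFT  48                                  /* assumed: offset in the top 16 bits */
#define BASE_ADDR_MAX ((UINT64_C(1) << OFFSET_SHIFT) - 1) /* 0x0000ffffffffffff */
#define OFFSET_MAX    ((UINT64_C(1) << 16) - 1)           /* 0xffff */

/* The worst case from the item: the sum stays far below 2^64, so it cannot wrap. */
_Static_assert(BASE_ADDR_MAX + OFFSET_MAX == UINT64_C(0x000100000000fffe),
               "maximum sum quoted in the xsk item");
_Static_assert(BASE_ADDR_MAX + OFFSET_MAX > BASE_ADDR_MAX,
               "no wrap-around, hence addr >= base_addr");

/* Hypothetical helper: split a descriptor address and add the two parts. */
static uint64_t xsk_addr_from_desc(uint64_t desc_addr)
{
    uint64_t base_addr = desc_addr & BASE_ADDR_MAX;
    uint64_t offset = desc_addr >> OFFSET_SHIFT; /* at most OFFSET_MAX */

    return base_addr + offset;
}
```

Since `addr` can never be smaller than `base_addr`, any check that `addr` is below some bound also guarantees that `base_addr` is below it; a separate test on `base_addr` against the same bound (such as `pool->addrs_cnt`) would add nothing.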
+>
+
+**[v1: net-next: net: stmmac: dwmac-anarion: address issues flagged by sparse](http://lore.kernel.org/netdev/20230406-dwmac-anarion-sparse-v1-0-b0c866c8be9d@kernel.org/)**
+
+> Two minor enhancements to dwmac-anarion to address issues flagged by
+> sparse.
+>
+> 1. Always return struct anarion_gmac * from anarion_config_dt()
+> 2. Add __iomem annotation to register base
+>
+> No functional change intended.
+> Compile tested only.
+>
+
+**[v1: io_uring: Pass whole sqe to commands](http://lore.kernel.org/netdev/20230406165705.3161734-1-leitao@debian.org/)**
+
+> Currently uring CMD operation relies on having large SQEs, but future
+> operations might want to use normal SQE.
+>
+> The io_uring_cmd currently only saves the payload (cmd) part of the SQE,
+> but, for commands that use normal SQE size, it might be necessary to
+> access the initial SQE fields outside of the payload/cmd block. So,
+> saves the whole SQE other than just the pdu.
+>
+
+**[v1: bpf-next: net/smc: Introduce BPF injection capability](http://lore.kernel.org/netdev/1680795034-86384-1-git-send-email-alibuda@linux.alibaba.com/)**
+
+> This patches attempt to introduce BPF injection capability for SMC,
+> and add selftest to ensure code stability.
+>
+> As we all know that the SMC protocol is not suitable for all scenarios,
+> especially for short-lived. However, for most applications, they cannot
+> guarantee that there are no such scenarios at all. Therefore, apps
+> may need some specific strategies to decide shall we need to use SMC
+> or not, for example, apps can limit the scope of the SMC to a specific
+> IP address or port.
+>
+
+**[v1: add initial io_uring_cmd support for sockets](http://lore.kernel.org/netdev/20230406144330.1932798-1-leitao@debian.org/)**
+
+> This patchset creates the initial plumbing for a io_uring command for
+> sockets.
+>
+> For now, create two uring commands for sockets, SOCKET_URING_OP_SIOCOUTQ
+> and SOCKET_URING_OP_SIOCINQ. They are similar to ioctl operations
+> SIOCOUTQ and SIOCINQ. In fact, the code on the protocol side itself is
+> heavily based on the ioctl operations.
+>
+
+**[v1: next: wifi: mt76: Replace zero-length array with flexible-array member](http://lore.kernel.org/netdev/ZC7X7KCb+JEkPe5D@work/)**
+
+> Zero-length arrays are deprecated [1] and have to be replaced by C99
+> flexible-array members.
+>
+> This helps with the ongoing efforts to tighten the FORTIFY_SOURCE routines
+> on memcpy() and help to make progress towards globally enabling
+> -fstrict-flex-arrays=3 [2]
+>
+
+#### Security Hardening
+
+**[v2: Tab P11 features](http://lore.kernel.org/linux-hardening/20230406-topic-lenovo_features-v2-0-625d7cb4a944@linaro.org/)**
+
+**[v2: fortify: Add KUnit tests for runtime overflows](http://lore.kernel.org/linux-hardening/20230407191904.gonna.522-kees@kernel.org/)**
+
+> This series adds KUnit tests for the CONFIG_FORTIFY_SOURCE behavior of the
+> standard C string functions, and for the strcat() family of functions,
+> as those were updated during refactoring. Finally, fortification error
+> messages are improved to give more context for the failure condition.
+>
+
+**[v1: next: s390/fcx: Replace zero-length array with flexible-array member](http://lore.kernel.org/linux-hardening/ZC7XT5prvoE4Yunm@work/)**
+
+> Zero-length arrays are deprecated [1] and have to be replaced by C99
+> flexible-array members.
+>
+> This helps with the ongoing efforts to tighten the FORTIFY_SOURCE routines
+> on memcpy() and help to make progress towards globally enabling
+> -fstrict-flex-arrays=3 [2]
+>
+
+**[v1: next: s390/diag: Replace zero-length array with flexible-array member](http://lore.kernel.org/linux-hardening/ZC7XGpUtVhqlRLhH@work/)**
+
+> Zero-length arrays are deprecated [1] and have to be replaced by C99
+> flexible-array members.
+>
+> This helps with the ongoing efforts to tighten the FORTIFY_SOURCE routines
+> on memcpy() and help to make progress towards globally enabling
+> -fstrict-flex-arrays=3 [2]
+>
+
+**[v2: ubsan: Tighten UBSAN_BOUNDS on GCC](http://lore.kernel.org/linux-hardening/20230405022356.gonna.338-kees@kernel.org/)**
+
+> The use of -fsanitize=bounds on GCC will ignore some trailing arrays,
+> leaving a gap in coverage. Switch to using -fsanitize=bounds-strict to
+> match Clang's stricter behavior.
+>
+
+#### Asynchronous IO
+
+**[v2: optimise resheduling due to deferred tw](http://lore.kernel.org/io-uring/cover.1680782016.git.asml.silence@gmail.com/)**
+
+> io_uring extensively uses task_work, but when a task is waiting
+> every new queued task_work batch will try to wake it up and so
+> cause lots of scheduling activity. This series optimises it,
+> specifically applied for rw completions and send-zc notifications
+> for now, and will helpful for further optimisations.
+>
+
+**[v1: ublk: read any SQE values upfront](http://lore.kernel.org/io-uring/4ea9c4da-5eb8-c9b1-46de-93697291baa5@kernel.dk/)**
+
+> Since SQE memory is shared with userspace, we should only be reading it
+> once. We cannot read it multiple times, particularly when it's read once
+> for validation and then read again for the actual use.
+>
+
+#### Rust For Linux
+
+**[v7: Rust pin-init API for pinned initialization of structs](http://lore.kernel.org/rust-for-linux/20230408122429.1103522-1-y86-dev@protonmail.com/)**
+
+> This is the seventh version of the pin-init API. See [1] for v6.
+>
+> The tree at [2] contains these patches applied on top of 6.3-rc1.
+> The Rust-doc documentation of the pin-init API can be found at [3].
+>
+> These patches are a long way coming, since I held a presentation on
+> safe pinned initialization at Kangrejos [4]. And my discovery of this
+> problem was almost a year ago [5].
+>
+
+**[v1: Initial Rust V4L2 support](http://lore.kernel.org/rust-for-linux/20230406215615.122099-1-daniel.almeida@collabora.com/)**
+
+> media subsystem.
+>
+> It adds just enough support to write a clone of the virtio-camera
+> prototype written by my colleague, Dmitry Osipenko, available at [0].
+>
+> Basically, there's support for video_device_register,
+> v4l2_device_register and for some ioctls in v4l2_ioctl_ops. There is
+> also some initial vb2 support, alongside some wrappers for some types
+> found in videodev2.h.
+>
+
+**[v1: v6.1: rust: types: add `Opaque::pin_init`](http://lore.kernel.org/rust-for-linux/20230406065546.787669-1-y86-dev@protonmail.com/)**
+
+> Add support for pin-init in combination with `Opaque`, the `pin_init`
+> function initializes the contents via a user-supplied initializer for
+> `T`.
+>
+
+**[v2: rust: virtio: add virtio support](http://lore.kernel.org/rust-for-linux/20230405201416.395840-1-daniel.almeida@collabora.com/)**
+
+> This used to be a single patch, but I split it into two with the
+> addition of struct Scatterlist.
+>
+> Again a bit new with Rust submissions. I was told by Gary Guo to
+> rebase on top of rust-next, but it seems *very* behind?
+>
+
+#### BPF
+
+**[v2: bpf-next: Introduce BPF_MA_REUSE_AFTER_RCU_GP](http://lore.kernel.org/bpf/20230408141846.1878768-1-houtao@huaweicloud.com/)**
+
+> As discussed in v1, currently the freed objects in bpf memory allocator
+> may be reused immediately by the new allocation, it introduces
+> use-after-bpf-ma-free problem for non-preallocated hash map and makes
+> lookup procedure return incorrect result. The immediate reuse also makes
+> introducing new use case more difficult (e.g. qp-trie).
+>
+
+**[v1: bpf-next: selftests/bpf: Use PERF_COUNT_HW_CPU_CYCLES event for get_branch_snapshot](http://lore.kernel.org/bpf/20230407190130.2093736-1-song@kernel.org/)**
+
+> perf_event with type=PERF_TYPE_RAW and config=0x1b00 turned out to be not
+> reliable in ensuring LBR is active. Thus, test_progs:get_branch_snapshot is
+> not reliable in some systems. Replace it with PERF_COUNT_HW_CPU_CYCLES
+> event, which gives more consistent results.
+>
+
+**[v1: bpf-next: selftests/bpf: Prevent infinite loop in veristat when base file is too short](http://lore.kernel.org/bpf/20230407154125.896927-1-eddyz87@gmail.com/)**
+
+> The loop is caused by handle_comparison_mode() not checking if `base`
+> variable points to `fallback_stats` prior advancing joined results
+> using `base`.
+>
+
+**[v1: bpf-next: bpftool: set program type only if it differs from the desired one](http://lore.kernel.org/bpf/20230407081427.2621590-1-weiyongjun@huaweicloud.com/)**
+
+> After commit d6e6286a12e7 ("libbpf: disassociate section handler on explicit
+> bpf_program__set_type() call"), bpf_program__set_type() will force cleanup
+> the program's SEC() definition, this commit fixed the test helper but missed
+> the bpftool, which leads to bpftool prog autoattach broken as follows:
+>
+>     $ bpftool prog load spi-xfer-r1v1.o /sys/fs/bpf/test autoattach
+>     Program spi_xfer_r1v1 does not support autoattach, falling back to pinning
+>
+> This patch fix bpftool to set program type only if it differs.
+>
+
+**[v1: BPF: replace no-need function call with saved value](http://lore.kernel.org/bpf/20230407064837.32015-1-zhongjun@uniontech.com/)**
+
+> The var 'is_priv' is already there, needn't call bpf_capable()
+> again.
+> Applying this patch, to refine the codes making it robust and optimal.
+>
+
+**[v1: BPF: properly precedence of exclusive attr flags](http://lore.kernel.org/bpf/20230407054235.31726-1-zhongjun@uniontech.com/)**
+
+> BPF_F_STRICT_ALIGNMENT and BPF_F_ANY_ALIGNMENT are exclusive
+> flags. Intuitively the strict one should take higher precedence.
+> Applying this patch, make semantics of flags more properly.
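The alignment-flags item above argues that `BPF_F_STRICT_ALIGNMENT` should take precedence over `BPF_F_ANY_ALIGNMENT` when both are passed. A sketch of that resolution order follows; it uses hypothetical local constants and an illustrative enum rather than the real UAPI definitions or verifier code.

```c
/* Hypothetical stand-ins for the two load flags named in the item. */
#define F_STRICT_ALIGNMENT (1U << 0)
#define F_ANY_ALIGNMENT    (1U << 1)

enum align_mode { ALIGN_DEFAULT, ALIGN_STRICT, ALIGN_ANY };

/*
 * Resolve the two mutually exclusive flags with the precedence the item
 * argues for: test the strict flag first, so it wins when both are set.
 */
static enum align_mode resolve_alignment(unsigned int flags)
{
    if (flags & F_STRICT_ALIGNMENT)
        return ALIGN_STRICT;
    if (flags & F_ANY_ALIGNMENT)
        return ALIGN_ANY;
    return ALIGN_DEFAULT;
}
```

An alternative design is to reject the combination outright (for example with `-EINVAL`). Testing the strict flag first instead keeps the more conservative behaviour when a caller sets both flags.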
+>
+
+**[v1: BPF: replace low-entropy member with macro](http://lore.kernel.org/bpf/20230407033418.2295-1-zhongjun@uniontech.com/)**
+
+> The member orig_idx is a low-entropy once-init invariable data
+> member. It can be replace by a series of macros.
+> Replace this member by macros can save memory and cpu-time.
+>
+
+**[v4: bpf-next: BPF verifier rotating log](http://lore.kernel.org/bpf/20230406234205.323208-1-andrii@kernel.org/)**
+
+> This patch set changes BPF verifier log behavior to behave as a rotating log,
+> by default. If user-supplied log buffer is big enough to contain entire
+> verifier log output, there is no effective difference. But where previously
+> user supplied too small log buffer and would get -ENOSPC error result and the
+> beginning part of the verifier log, now there will be no error and user will
+> get ending part of verifier log filling up user-supplied log buffer. Which
+> is, in absolute majority of cases, is exactly what's useful, relevant, and
+> what users want and need, as the ending of the verifier log is containing
+> details of verifier failure and relevant state that got us to that failure. So
+> this rotating mode is made default, but for some niche advanced debugging
+> scenarios it's possible to request old behavior by specifying additional
+> BPF_LOG_FIXED (8) flag.
+>
+
+**[v2: bpf-next: bpf: Improve verifier for cond_op and spilled loop index variables](http://lore.kernel.org/bpf/20230406164450.1044952-1-yhs@fb.com/)**
+
+> LLVM commit [1] introduced hoistMinMax optimization like
+> (i < VIRTIO_MAX_SGS) && (i < out_sgs)
+> to
+> upper = MIN(VIRTIO_MAX_SGS, out_sgs)
+> ... i < upper ...
+> and caused the verification failure. Commit [2] workarounded the issue by
+> adding some bpf assembly code to prohibit the above optimization.
+> This patch improved verifier such that verification can succeed without
+> the above workaround.
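The hoistMinMax rewrite in the item above is a valid transformation because, for integers of the same type, `i < a && i < b` holds exactly when `i < min(a, b)`; the verification failure came from the verifier having to reason about the rewritten form, not from a change in meaning. The sketch below checks that equivalence exhaustively over a small range. `VIRTIO_MAX_SGS` becomes an ordinary parameter here, and the function names and local `MIN` are hypothetical.

```c
#ifndef MIN
#define MIN(a, b) ((a) < (b) ? (a) : (b)) /* local stand-in for the item's MIN() */
#endif

/* The loop condition before the rewrite: two separate comparisons. */
static int cond_two_compares(unsigned int i, unsigned int max_sgs,
                             unsigned int out_sgs)
{
    return (i < max_sgs) && (i < out_sgs);
}

/* After hoistMinMax: one comparison against a precomputed bound. */
static int cond_hoisted(unsigned int i, unsigned int max_sgs,
                        unsigned int out_sgs)
{
    unsigned int upper = MIN(max_sgs, out_sgs);

    return i < upper;
}

/* Compare both forms for all values below limit; returns 1 if they always agree. */
static int forms_agree(unsigned int limit)
{
    for (unsigned int i = 0; i < limit; i++)
        for (unsigned int a = 0; a < limit; a++)
            for (unsigned int b = 0; b < limit; b++)
                if (cond_two_compares(i, a, b) != cond_hoisted(i, a, b))
                    return 0;
    return 1;
}
```

For example, with `max_sgs = 8` and `out_sgs = 4`, both forms accept `i = 3` and reject `i = 5`; only the shape of the comparison differs.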
+>
+
+**[v4: bpf-next: xsk: Support UMEM chunk_size > PAGE_SIZE](http://lore.kernel.org/bpf/20230406131806.51332-1-kal.conley@dectris.com/)**
+
+> The main purpose of this patchset is to add AF_XDP support for UMEM
+> chunk sizes > PAGE_SIZE. This is enabled for UMEMs backed by HugeTLB
+> pages.
+>
+
+**[v1: powerpc/bpf: populate extable entries only during the last pass](http://lore.kernel.org/bpf/20230406073519.75059-1-hbathini@linux.ibm.com/)**
+
+> Since commit 85e031154c7c ("powerpc/bpf: Perform complete extra passes
+> to update addresses"), two additional passes are performed to avoid
+> space and CPU time wastage on powerpc. But these extra passes led to
+> WARN_ON_ONCE() hits in bpf_add_extable_entry(). Fix it by not adding
+> extable entries during the extra pass.
+>
+
+**[v1: BPF: make verifier 'misconfigured' errors more meaningful](http://lore.kernel.org/bpf/20230406014351.8984-1-zhongjun@uniontech.com/)**
+
+> There are too many so-called 'misconfigured' errors potentially
+> feed back to user-space, that make it very hard to judge on
+> a glance the reason a verification failure occurred.
+> This patch make those similar error outputs more sensitive and readible.
+>
+
+**[v1: Dynptr Verifier Adjustments](http://lore.kernel.org/bpf/20230406004018.1439952-1-drosen@google.com/)**
+
+> These patches relax a few verifier requirements around dynptrs.
+>
+> I was unable to test the patch in 0003 due to unrelated issues compiling the
+> bpf selftests, but did run an equivalent local test program.
+>
+
+**[v6: bpf-next: bpf: Support 64-bit pointers to kfuncs](http://lore.kernel.org/bpf/20230405213453.49756-1-iii@linux.ibm.com/)**
+
+> test_ksyms_module fails to emit a kfunc call targeting a module on
+> s390x, because the verifier stores the difference between kfunc
+> address and __bpf_call_base in bpf_insn.imm, which is s32, and modules
+> are roughly (1 << 42) bytes away from the kernel on s390x.
+>
+> Fix by keeping BTF id in bpf_insn.imm for BPF_PSEUDO_KFUNC_CALLs,
+> and storing the absolute address in bpf_kfunc_desc.
+>
+
+**[v2: bpf: selftests/bpf: Wait for receive in cg_storage_multi test](http://lore.kernel.org/bpf/20230405193354.1956209-1-zhuyifei@google.com/)**
+
+> In some cases the loopback latency might be large enough, causing
+> the assertion on invocations to be run before ingress prog getting
+> executed. The assertion would fail and the test would flake.
+>
+
+**[v6: Add ftrace direct call for arm64](http://lore.kernel.org/bpf/20230405180250.2046566-1-revest@chromium.org/)**
+
+> This series adds ftrace direct call support to arm64.
+> This makes BPF tracing programs (fentry/fexit/fmod_ret/lsm) work on arm64.
+>
+> It is meant to be taken by the arm64 tree but it depends on the
+> trace-direct-v6.3-rc3 tag of the linux-trace tree:
+> git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
+> That tag was created by Steven Rostedt so the arm64 tree can pull the prior work
+> this depends on. [1]
+>
+
+**[v1: bpf-next: bpf: add netfilter program type](http://lore.kernel.org/bpf/20230405161116.13565-1-fw@strlen.de/)**
+
+> Add minimal support to hook bpf programs to netfilter hooks, e.g.
+> PREROUTING or FORWARD.
+>
+> For this the most relevant parts for registering a netfilter
+> hook via the in-kernel api are exposed to userspace via bpf_link.
+>
+
+**[v3: bpf-next: bpftool: Add inline annotations when dumping program CFGs](http://lore.kernel.org/bpf/20230405132120.59886-1-quentin@isovalent.com/)**
+
+> This set contains some improvements for bpftool's "visual" program dump
+> option, which produces the control flow graph in a DOT format. The main
+> objective is to add support for inline annotations on such graphs, so that
+> we can have the C source code for the program showing up alongside the
+> instructions, when available. The last commits also make it possible to
+> display the line numbers or the bare opcodes in the graph, as supported by
+> regular program dumps.
+>
+
+**[v1: bpf-next: selftests: xsk: Disable IPv6 on VETH1](http://lore.kernel.org/bpf/20230405082905.6303-1-kal.conley@dectris.com/)**
+
+> This change fixes flakiness in the BIDIRECTIONAL test:
+>
+>     # [is_pkt_valid] expected length [60], got length [90]
+>     not ok 1 FAIL: SKB BUSY-POLL BIDIRECTIONAL
+>
+> When IPv6 is enabled, the interface will periodically send MLDv1 and
+> MLDv2 packets. These packets can cause the BIDIRECTIONAL test to fail
+> since it uses VETH0 for RX.
+>
+
+**[v1: bpf-next: Exceptions - 1/2](http://lore.kernel.org/bpf/20230405004239.1375399-1-memxor@gmail.com/)**
+
+> This series implements the bare minimum support for basic BPF
+> exceptions. This is a feature to allow programs to simply throw a
+> valueless exception within a BPF program to abort its execution.
+> Automatic cleanup of held resources and generation of landing pads to
+> unwind program state will be done in the part 2 set.
+>
+
+**[v1: bpf-next: bpf: Add a kfunc filter function to 'struct btf_kfunc_id_set'.](http://lore.kernel.org/bpf/20230404060959.2259448-1-martin.lau@linux.dev/)**
+
+> This set (https://lore.kernel.org/bpf/https://lore.kernel.org/bpf/500d452b-f9d5-d01f-d365-2949c4fd37ab@linux.dev/)
+> needs to limit bpf_sock_destroy kfunc to BPF_TRACE_ITER.
+> In the earlier reply, I thought of adding a BTF_KFUNC_HOOK_TRACING_ITER.
+>
+
+**[v1: bpf-next: bpf: Follow up to RCU enforcement in the verifier.](http://lore.kernel.org/bpf/20230404045029.82870-1-alexei.starovoitov@gmail.com/)**
+
+> The patch set is addressing a fallout from
+> commit 6fcd486b3a0a ("bpf: Refactor RCU enforcement in the verifier.")
+> It was too aggressive with PTR_UNTRUSTED marks.
+> Patches 1-6 are cleanup and adding verifier smartness to address real
+> use cases in bpf programs that broke with too aggressive PTR_UNTRUSTED.
+> The partial revert is done in patch 7 anyway. +> + +### 周边技术动态 + +#### Qemu + +**[v1: target/riscv: Mask the implicitly enabled extensions in isa_string based on priv version](http://lore.kernel.org/qemu-devel/20230407033014.40901-1-liweiwei@iscas.ac.cn/)** + +> Using implicitly enabled extensions such as Zca/Zcf/Zcd instead of their +> super extensions can simplify the extension related check. However, they +> may have higher priv version than their super extensions. So we should mask +> them in the isa_string based on priv version to make them invisible to user +> if the specified priv version is lower than their minimal priv version. +> + +**[v4: hw/riscv: Add ACT related support](http://lore.kernel.org/qemu-devel/20230405095720.75848-1-liweiwei@iscas.ac.cn/)** + +> ACT tests play an important role in riscv tests. This patch tries to +> add related support to run ACT tests. +> +> The port is available here: +> https://github.com/plctlab/plct-qemu/tree/plct-act-upstream-v2 +> + +**[riscv: g_assert for NULL predicate?](http://lore.kernel.org/qemu-devel/e9de7676-b669-4f4e-e3e0-e57fb58b7bd7@intel.com/)** + +> Recent commit 0ee342256af92 switches to g_assert() for the predicate() +> NULL check from returning RISCV_EXCP_ILLEGAL_INST. Qemu doesn't have +> predicate() for un-allocated CSRs, then a buggy userspace application +> reads CSR such as 0x4 causes qemu to exit, I don't think it's expected. +> +> .global _start +> +> .text +> _start: +> csrr t3, 0x4 +> + +#### U-Boot + +**[v1: riscv: Correct a comment in io.h](http://lore.kernel.org/u-boot/20230403033732.2812219-1-bmeng@tinylab.org/)** + +> Replace NDS32 with RISC-V in the comments. +> + +**[v1: riscv: Add a 64-bit image type](http://lore.kernel.org/u-boot/20230402202813.2341959-1-sjg@chromium.org/)** + +> At present it is not possible to know whether an image can be booted by +> a 32- or 64-bit bootloader. This means that U-Boot may attempt to boot +> the wrong image. 
This may cause a crash which might be hard to debug. +> + +## 20230402: Issue 40 + +### Kernel Updates + +#### RISC-V Architecture Support + +**[v1: RISC-V: KVM: Allow Zbb extension for Guest/VM](http://lore.kernel.org/linux-riscv/20230401112730.2105240-1-apatel@ventanamicro.com/)** + +> We extend the KVM ISA extension ONE_REG interface to allow KVM +> user space to detect and enable Zbb extension for Guest/VM. +> + +**[v7: Basic clock, reset & device tree support for StarFive JH7110 RISC-V SoC](http://lore.kernel.org/linux-riscv/20230401111934.130844-1-hal.feng@starfivetech.com/)** + +> This patch series adds basic clock, reset & DT support for StarFive +> JH7110 SoC. +> +> @Stephen and @Conor, I have made this series start with the shared +> dt-bindings, so it will be easier to merge. +> + +**[v4: Use dma_default_coherent for devicetree default coherency](http://lore.kernel.org/linux-riscv/20230401091531.47412-1-jiaxun.yang@flygoat.com/)** + +> This series split out second half of my previous series +> "v1: MIPS DMA coherence fixes". +> +> It intends to use dma_default_coherent to determine the default coherency of +> devicetree probed devices instead of hardcoding it with Kconfig options. +> + +**[v1: riscv: dts: nezha-d1: Add memory](http://lore.kernel.org/linux-riscv/20230331182727.4062790-1-evan@rivosinc.com/)** + +> Add memory info for the D1 Nezha, which seems to be required for it to +> boot with the stock firmware. Note that this hardcodes 1GB, which is +> not technically correct as they also make models with different amounts +> of RAM. Is the firmware supposed to populate this? +> + +**[v4: RISC-V KVM ONE_REG interface for SBI](http://lore.kernel.org/linux-riscv/20230331174542.2067560-1-apatel@ventanamicro.com/)** + +> This series first does few cleanups/fixes (PATCH1 to PATCH5) and adds +> ONE-REG interface for customizing the SBI interface visible to the +> Guest/VM.
+> +> The testing of this series has been done with KVMTOOL changes in +> riscv_sbi_imp_v1 branch at: +> https://github.com/avpatel/kvmtool.git +> + +**[v10: function_graph: Support recording and printing the return value of function](http://lore.kernel.org/linux-riscv/cover.1680265828.git.pengdonglin@sangfor.com.cn/)** + +> When using the function_graph tracer to analyze system call failures, +> it can be time-consuming to analyze the trace logs and locate the kernel +> function that first returns an error. This change aims to simplify the +> process by recording the function return value to the 'retval' member of +> 'ftrace_graph_ent' and printing it when outputting the trace log. +> + +**[v7: RISC-V non-coherent function pointer based CMO + non-coherent DMA support for AX45MP](http://lore.kernel.org/linux-riscv/20230330204217.47666-1-prabhakar.mahadev-lad.rj@bp.renesas.com/)** + +> On the Andes AX45MP core, cache coherency is a specification option so it +> may not be supported. In this case DMA will fail. To get around this +> issue this patch series does the below: +> +> 1] Andes alternative ports is implemented as errata which checks if the IOCP +> is missing and only then applies to CMO errata. One vendor specific SBI EXT +> (ANDES_SBI_EXT_IOCP_SW_WORKAROUND) is implemented as part of errata. +> + +**[v1: dt-bindings: move cache controller bindings to a cache directory](http://lore.kernel.org/linux-riscv/20230330173255.109731-1-conor@kernel.org/)** + +> There's a bunch of bindings for (mostly l2) cache controllers +> scattered to the four winds, move them to a common directory. +> I renamed the freescale l2cache.txt file, as while that might make sense +> when the parent dir is fsl, it's confusing after the move. +> The two Marvell bindings have had a "marvell," prefix added to match +> their compatibles.
+> + +**[v15: Microchip Soft IP corePWM driver](http://lore.kernel.org/linux-riscv/20230330071203.286972-1-conor.dooley@microchip.com/)** + +> Uwe & I had a long back and forth about period calculations on v13, +> my ultimate conclusion being that, after some testing of the "corrected" +> calculation in hardware, the original calculation was correct. +> I think we had gotten sucked into discussion the calculation of the +> period itself, when we were in fact trying to calculate a bound on the +> period instead. That discussion is here: +> https://lore.kernel.org/linux-pwm/Y+ow8tfAHo1yv1XL@wendy/ +> + +**[v8: RISC-V Hibernation Support](http://lore.kernel.org/linux-riscv/20230330064321.1008373-1-jeeheng.sia@starfivetech.com/)** + +> This series adds RISC-V Hibernation/suspend to disk support. +> Low level Arch functions were created to support hibernation. +> swsusp_arch_suspend() relies code from __cpu_suspend_enter() to write +> cpu state onto the stack, then calling swsusp_save() to save the memory +> image. +> + +**[v1: iommu: PGTABLE_LPAE is also for RISCV](http://lore.kernel.org/linux-riscv/20230330060105.29460-1-rdunlap@infradead.org/)** + +> On riscv64, linux-next-20233030 (and for several days earlier), +> there is a kconfig warning: +> +> WARNING: unmet direct dependencies detected for IOMMU_IO_PGTABLE_LPAE +> Depends on [n]: IOMMU_SUPPORT [=y] && (ARM || ARM64 || COMPILE_TEST [=n]) && !GENERIC_ATOMIC64 [=n] +> Selected by [y]: +> - IPMMU_VMSA [=y] && IOMMU_SUPPORT [=y] && (ARCH_RENESAS [=y] || COMPILE_TEST [=n]) && !GENERIC_ATOMIC64 [=n] +> + +**[v1: DT header disentangling, part 1](http://lore.kernel.org/linux-riscv/20230329-dt-cpu-header-cleanups-v1-0-581e2605fe47@kernel.org/)** + +> This is the first of a series of clean-ups to disentangle the DT +> includes. 
There's a decade plus old comment in of_device.h: +> +> #include /* temporary until merge */ +> + +**[v1: Add TDM audio on StarFive JH7110](http://lore.kernel.org/linux-riscv/20230329153320.31390-1-walker.chen@starfivetech.com/)** + +> This patchset adds TDM audio driver for the StarFive JH7110 SoC. The +> first patch adds device tree binding for TDM module. The second patch +> adds tdm driver support for JH7110 SoC. The last patch adds device node +> of tdm and sound card to JH7110 dts. +> + +**[v4: Implement GCM ghash using Zbc and Zbkb extensions](http://lore.kernel.org/linux-riscv/20230329140642.2186644-1-heiko.stuebner@vrull.eu/)** + +> This was originally part of my vector crypto series, but was part +> of a separate openssl merge request implementing GCM ghash as using +> non-vector extensions. +> + +**[v2: riscv: Dump user opcode bytes on fatal faults](http://lore.kernel.org/linux-riscv/20230329082950.726-1-cuiyunhui@bytedance.com/)** + +> We encountered such a problem that when the system starts to execute +> init, init exits unexpectedly with error message: "unhandled signal 4 +> code 0x1 ...". +> +> We are more curious about which instruction execution caused the +> exception. After dumping it through show_opcodes(), we found that it +> was caused by a floating-point instruction. +> + +**[v2: riscv: Introduce KASLR](http://lore.kernel.org/linux-riscv/20230329052926.69632-1-alexghiti@rivosinc.com/)** + +> The following KASLR implementation allows to randomize the kernel mapping: +> +> - virtually: we expect the bootloader to provide a seed in the device-tree +> - physically: only implemented in the EFI stub, it relies on the firmware to +> provide a seed using EFI_RNG_PROTOCOL. arm64 has a similar implementation +> hence the patch 3 factorizes KASLR related functions for riscv to take +> advantage. 
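For the KASLR entry above, the "virtually" half (turning a seed supplied by the bootloader into an offset for the kernel mapping) can be pictured as choosing one aligned slot inside a window. This is a minimal sketch under stated assumptions: the 2 MiB step, the 1 GiB window and the helper name `kaslr_offset_from_seed()` are illustrative and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants (assumptions, not from the patch):
 * randomize in 2 MiB steps inside a 1 GiB virtual window. */
#define KASLR_ALIGN  (2ULL << 20)
#define KASLR_WINDOW (1ULL << 30)

/* Hypothetical helper: map a seed to an aligned virtual offset such that
 * offset + kernel_size still fits inside the window. */
uint64_t kaslr_offset_from_seed(uint64_t seed, uint64_t kernel_size)
{
    assert(kernel_size > 0);
    if (kernel_size >= KASLR_WINDOW)
        return 0;                       /* no room: no randomization */

    /* Number of aligned start positions that still hold the whole image. */
    uint64_t nr_pos = (KASLR_WINDOW - kernel_size) / KASLR_ALIGN;
    if (nr_pos == 0)
        return 0;

    return (seed % nr_pos) * KASLR_ALIGN;
}
```

Reducing the seed modulo the slot count keeps every possible result aligned and in bounds, at the cost of a slight bias when the seed range is not a multiple of the slot count.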
+> + +**[v9: riscv: Allow to downgrade paging mode from the command line](http://lore.kernel.org/linux-riscv/20230329050951.66085-1-alexghiti@rivosinc.com/)** + +> This new version gets rid of the limitation that prevented KASAN kernels +> to use the newly introduced parameters. +> +> While looking into KASLR, I fell onto commit aacd149b6238 ("arm64: head: +> avoid relocating the kernel twice for KASLR"): it allows to use the fdt +> functions very early in the boot process with KASAN enabled by simply +> compiling a new version of those functions without instrumentation. +> + +**[v9: Introduce 64b relocatable kernel](http://lore.kernel.org/linux-riscv/20230329045329.64565-1-alexghiti@rivosinc.com/)** + +> After multiple attempts, this patchset is now based on the fact that the +> 64b kernel mapping was moved outside the linear mapping. +> +> The first patch allows to build relocatable kernels but is not selected +> by default. That patch is a requirement for KASLR. +> The second and third patches take advantage of an already existing powerpc +> script that checks relocations at compile-time, and uses it for riscv. +> + +**[v2: -next: support allocating crashkernel above 4G explicitly on riscv](http://lore.kernel.org/linux-riscv/20230328115150.2700016-1-chenjiahao16@huawei.com/)** + +> On riscv, the current crash kernel allocation logic is trying to +> allocate within 32bit addressable memory region by default, if +> failed, try to allocate without 4G restriction. +> +> In need of saving DMA zone memory while allocating a relatively large +> crash kernel region, allocating the reserved memory top down in +> high memory, without overlapping the DMA zone, is a mature solution. +> Hence this patchset introduces the parameter option crashkernel=X,[high,low].
+> + +**[v18: RISC-V IPI Improvements](http://lore.kernel.org/linux-riscv/20230328035223.1480939-1-apatel@ventanamicro.com/)** + +> This series aims to improve IPI support in Linux RISC-V in following ways: +> 1) Treat IPIs as normal per-CPU interrupts instead of having custom RISC-V +> specific hooks. This also makes Linux RISC-V IPI support aligned with +> other architectures. +> 2) Remote TLB flushes and icache flushes should prefer local IPIs instead +> of SBI calls whenever we have specialized hardware (such as RISC-V AIA +> IMSIC and RISC-V SWI) which allows S-mode software to directly inject +> IPIs without any assistance from M-mode runtime firmware. +> + +**[v17: -next: riscv: Add vector ISA support](http://lore.kernel.org/linux-riscv/20230327164941.20491-1-andy.chiu@sifive.com/)** + +> This patchset is implemented based on vector 1.0 spec to add vector support +> in riscv Linux kernel. There are some assumptions for this implementation. +> +> 1. We assume all harts have the same ISA in the system. +> 2. We disable vector in both kernel and user space [1] by default. Only +> enable a user's vector after an illegal instruction trap where it +> actually starts executing vector (the first-use trap [2]). +> 3. We detect "riscv,isa" to determine whether vector is supported or not. +> + +**[v1: dma-mapping: unify support for cache flushes](http://lore.kernel.org/linux-riscv/20230327121317.4081816-1-arnd@kernel.org/)** + +> After a long discussion about adding SoC specific semantics for when +> to flush caches in drivers/soc/ drivers that we determined to be +> fundamentally flawed[1], I volunteered to try to move that logic into +> architecture-independent code and make all existing architectures do +> the same thing. +> + +**[v1: riscv/fault: Dump user opcode bytes on fatal faults](http://lore.kernel.org/linux-riscv/20230327115642.1610-1-cuiyunhui@bytedance.com/)** + +> We encountered such a problem (logs are below).
We are more curious about +> which instruction execution caused the exception. After dumping it +> through show_opcodes(), we found that it was caused by a floating-point +> instruction. +> +> In this way, we found the problem: in the system bringup, it is +> precisely that we have not enabled the floating point function. +> + +#### Process Scheduling + +**[v1: sched: Introduce per-mm/cpu concurrency id state](http://lore.kernel.org/lkml/20230330230911.228720-1-mathieu.desnoyers@efficios.com/)** + +> Keep track of the currently allocated mm_cid for each mm/cpu rather than +> freeing them immediately. This eliminates most atomic ops when context +> switching back and forth between threads belonging to different memory +> spaces in multi-threaded scenarios (many processes, each with many +> threads). +> + +**[v1: sched/deadline: cpuset: Rework DEADLINE bandwidth restoration](http://lore.kernel.org/lkml/20230329125558.255239-1-juri.lelli@redhat.com/)** + +> Qais reported [1] that iterating over all tasks when rebuilding root +> domains for finding out which ones are DEADLINE and need their bandwidth +> correctly restored on such root domains can be a costly operation (10+ +> ms delays on suspend-resume). He proposed we skip rebuilding root +> domains for certain operations, but that approach seemed arch specific +> and possibly prone to errors, as paths that ultimately trigger a rebuild +> might be quite convoluted (thanks Qais for spending time on this!). +> + +**[v1: perf sched: sync task state macros with kernel](http://lore.kernel.org/lkml/20230329035203.6194-1-zegao2021@gmail.com/)** + +> commit 8ef9925b02c2 ("sched/debug: Add explicit TASK_PARKED printing") +> changes some task state macros, this patch makes perf-sched in sync +> + +**[v1: sched: EEVDF using latency-nice](http://lore.kernel.org/lkml/20230328092622.062917921@infradead.org/)** + +> Many changes since last time; most notably it now fully replaces CFS and uses +> lag based placement for migrations.
Smaller changes include: +> +> - uses scale_load_down() for avg_vruntime; I measured the max delta to be +> 44 +> bits on a system/cgroup based kernel build. +> - fixed a bunch of reweight / cgroup placement issues +> - adaptive placement strategy for smaller slices +> - rename se->lag to se->vlag +> + +**[v3: sched: print parent comm in sched_show_task()](http://lore.kernel.org/lkml/20230328034438.GA8421@didi-ThinkCentre-M930t-N000/)** + +> Knowing who the parent is might be useful for debugging. +> For example, we can sometimes resolve kernel hung tasks by stopping +> the person who begins those hung tasks. +> With the parent's name printed in sched_show_task(), +> it might be helpful to let people know which "service" should be operated. +> Also, we move the parent info to a following new line. +> It would be better to solve the situation when the task +> is not alive and we could not get information about the parent. +> + +**[v1: sched/cputime: Make cputime_adjust() more accurate](http://lore.kernel.org/lkml/20230328024827.12187-1-maxing.lan@bytedance.com/)** + +> In the current algorithm of cputime_adjust(), the accumulated stime and +> utime are used to divide the accumulated rtime. When the value is very +> large, it is easy for the stime or utime not to be updated. It can cause +> sys or user utilization to be zero for long time. +> + +**[v2: selftests: sched: Add more core schedule prctl calls](http://lore.kernel.org/lkml/20230327201855.121821-1-ivan.orlov0322@gmail.com/)** + +> The core sched kselftest makes prctl calls only with correct +> parameters. This patch will extend this test with more core +> schedule prctl calls with wrong parameters to increase code +> coverage. +> + +**[v1: sched: Introduce mm_cid runqueue cache](http://lore.kernel.org/lkml/20230327195318.137094-1-mathieu.desnoyers@efficios.com/)** + +> Introduce a per-runqueue cache containing { mm, mm_cid } entries. 
+> Keep track of the recently allocated mm_cid for each mm rather than +> freeing them immediately. This eliminates most atomic ops when +> context switching back and forth between threads belonging to +> + +**[v1: sched/fair: Make tg->load_avg per node](http://lore.kernel.org/lkml/20230327053955.GA570404@ziqianlu-desk2/)** + +> When using sysbench to benchmark Postgres in a single docker instance +> with sysbench's nr_threads set to nr_cpu, it is observed there are times +> update_cfs_group() and update_load_avg() shows noticeable overhead on +> cpus of one node of a 2sockets/112core/224cpu Intel Sapphire Rapids: +> + +**[GIT PULL: sched/urgent for v6.3-rc4](http://lore.kernel.org/lkml/20230326130354.GDZCBCum4r9MJ8thhi@fat_crate.local/)** + +> please pull an urgent sched fix for 6.3. +> +> Thx. +> + +**[v1: sched/topology: add for_each_numa_cpu() macro](http://lore.kernel.org/lkml/20230325185514.425745-1-yury.norov@gmail.com/)** + +> for_each_cpu() is widely used in kernel, and it's beneficial to create +> a NUMA-aware version of the macro. +> +> Recently added for_each_numa_hop_mask() works, but switching existing +> codebase to it is not an easy process. 
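For the cputime_adjust() entry earlier in this list, the baseline it describes (dividing the accumulated rtime between system and user time in the ratio stime:utime) can be sketched as below. This is a simplified sketch, not the kernel function: it leaves out the kernel's monotonicity handling against previously reported values, and it uses the GCC/Clang `unsigned __int128` extension as a stand-in for a 64x64/64 multiply-divide helper. The name `split_rtime()` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified split of accumulated runtime (rtime) into system and user
 * parts, proportional to the sampled stime:utime ratio. */
void split_rtime(uint64_t rtime, uint64_t stime, uint64_t utime,
                 uint64_t *out_stime, uint64_t *out_utime)
{
    assert(out_stime && out_utime);
    if (stime == 0) {                  /* no system time sampled */
        *out_stime = 0;
        *out_utime = rtime;
        return;
    }
    if (utime == 0) {                  /* no user time sampled */
        *out_stime = rtime;
        *out_utime = 0;
        return;
    }
    /* 128-bit intermediates: rtime * stime and stime + utime cannot overflow. */
    unsigned __int128 prod = (unsigned __int128)rtime * stime;
    uint64_t s = (uint64_t)(prod / ((unsigned __int128)stime + utime));
    *out_stime = s;
    *out_utime = rtime - s;            /* remainder is attributed to user time */
}
```

Because stime / (stime + utime) is at most 1, the scaled system time never exceeds rtime, so the subtraction for user time cannot wrap.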
+> + +#### Memory Management + +**[v3: Providing mount in memfd_restricted() syscall](http://lore.kernel.org/linux-mm/cover.1680306489.git.ackerleytng@google.com/)** + +> This patchset builds upon the memfd_restricted() system call that was +> discussed in the ‘KVM: mm: fd-based approach for supporting KVM’ patch +> series, at +> https://lore.kernel.org/lkml/20221202061347.1070246-1-chao.p.peng@linux.intel.com/T/ +> + +**[v3: splice, net: Replace sendpage with sendmsg(MSG_SPLICE_PAGES)](http://lore.kernel.org/linux-mm/20230331160914.1608208-1-dhowells@redhat.com/)** + +> I've been looking at how to make pipes handle the splicing in of multipage +> folios and also looking to see if I could implement a suggestion from Willy +> that pipe_buffers could perhaps hold a list of pages (which could make +> splicing simpler - an entire splice segment would go in a single +> pipe_buffer). +> + +**[v5: userfaultfd: convert userfaultfd functions to use folios](http://lore.kernel.org/linux-mm/20230331093937.945725-1-zhangpeng362@huawei.com/)** + +> This patch series converts several userfaultfd functions to use folios. +> +> Change log: +> + +**[v3: Ignore non-LRU-based reclaim in memcg reclaim](http://lore.kernel.org/linux-mm/20230331070818.2792558-1-yosryahmed@google.com/)** + +> Upon running some proactive reclaim tests using memory.reclaim, we +> noticed some tests flaking where writing to memory.reclaim would be +> successful even though we did not reclaim the requested amount fully. +> Looking further into it, I discovered that *sometimes* we over-report +> the number of reclaimed pages in memcg reclaim. +> + +**[v1: memcg: Set memory min, low, high values along with max](http://lore.kernel.org/linux-mm/20230330202232.355471-1-shaun.tancheff@gmail.com/)** + +> memcg-v1 does not expose memory min, low, and high. +> +> These values should be set to reasonable non-zero values +> when max is set. +> +> This patch sets them to 10%, 20% and 80% respective to max.
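The memcg entry above derives min, low and high from max at fixed ratios. A minimal sketch of that arithmetic, assuming byte-valued limits; the summary does not say how the patch rounds, so the integer arithmetic and the `memcg_limits` / `limits_from_max()` names are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

struct memcg_limits {            /* hypothetical container, not a kernel struct */
    uint64_t min, low, high, max;
};

/* Derive min/low/high as roughly 10%, 20% and 80% of max. */
struct memcg_limits limits_from_max(uint64_t max)
{
    struct memcg_limits l;
    l.max  = max;
    l.min  = max / 10;           /* 10% */
    l.low  = max / 5;            /* 20% */
    l.high = max - max / 5;      /* 80%, computed without risk of overflow */
    /* The ordering min <= low <= high <= max holds for every input. */
    assert(l.min <= l.low && l.low <= l.high && l.high <= l.max);
    return l;
}
```

Writing 80% as `max - max / 5` instead of `max / 10 * 8` keeps high >= low even for tiny values of max, where truncating division would otherwise invert the ordering.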
+> + +**[v3: memcg: avoid flushing stats atomically where possible](http://lore.kernel.org/linux-mm/20230330191801.1967435-1-yosryahmed@google.com/)** + +> rstat flushing is an expensive operation that scales with the number of +> cpus and the number of cgroups in the system. The purpose of this series +> is to minimize the contexts where we flush stats atomically. +> + +**[v1: selftests/mm: Split / Refactor userfault test](http://lore.kernel.org/linux-mm/20230330155707.3106228-1-peterx@redhat.com/)** + +> [Sorry for the test case bomb] +> +> This patchset splits userfaultfd.c into two tests: +> +> - uffd-stress: the "vanilla", old and powerful stress test +> - uffd-unit-tests: all the unit tests will be moved here +> +> This is on my todo list for a long time but I never did it for real. The +> uffd test is growing into a small and cute monster. I start to notice it's +> going harder to maintain such a test and make it useful. +> + +**[v2: bio: check return values of bio_add_page](http://lore.kernel.org/linux-mm/cover.1680172791.git.johannes.thumshirn@wdc.com/)** + +> We have two functions for adding a page to a bio, __bio_add_page() which is +> used to add a single page to a freshly created bio and bio_add_page() which is +> used to add a page to an existing bio. +> +> While __bio_add_page() is expected to succeed, bio_add_page() can fail. +> +> This series converts the callers of bio_add_page() which can easily use +> __bio_add_page() to using it and checks the return of bio_add_page() for +> callers that don't work on a freshly created bio. +> +> Lastly it marks bio_add_page() as __must_check so we don't have to go again +> and audit all callers. +> + +**[v1: mm: ksm: support hwpoison for ksm page](http://lore.kernel.org/linux-mm/20230330074501.205092-1-xialonglong1@huawei.com/)** + +> Currently, ksm does not support hwpoison. 
As ksm is being used more widely +> for deduplication at the system level, container level, and process level, +> supporting hwpoison for ksm has become increasingly important. However, ksm +> pages were not processed by hwpoison in 2009 [1]. +> + +**[v1: kmemleak-test: Optimize kmemleak_test.c build flow](http://lore.kernel.org/linux-mm/20230330060904.292975-1-gehao@kylinos.cn/)** + +> Now kmemleak-test.c is moved to samples directory, +> if CONFIG_DEBUG_KMEMLEAK_TEST=m,but CONFIG_SAMPLES +> is not set,it will be meaningless. +> +> So we will remove CONFIG_DEBUG_KMEMLEAK_TEST and +> add CONFIG_SAMPLE_KMEMLEAK which in samples directory +> to control kmemleak-test.c build or not +> + +**[v3: regmap: Add basic maple tree register cache](http://lore.kernel.org/linux-mm/20230325-regcache-maple-v3-0-23e271f93dc7@kernel.org/)** + +> The current state of the art for sparse register maps is the +> rbtree cache. This works well for most applications but isn't +> always ideal for sparser register maps since the rbtree can get +> deep, requiring a lot of walking. Fortunately the kernel has a +> data structure intended to address this very problem, the maple +> tree. Provide an initial implementation of a register cache +> based on the maple tree to start taking advantage of it. +> + +**[v12: Memory poison recovery in khugepaged collapsing](http://lore.kernel.org/linux-mm/20230329151121.949896-1-jiaqiyan@google.com/)** + +> Memory DIMMs are subject to multi-bit flips, i.e. memory errors. +> As memory size and density increase, the chances of and number of +> memory errors increase. The increasing size and density of server +> RAM in the data center and cloud have shown increased uncorrectable +> memory errors. There are already mechanisms in the kernel to recover +> from uncorrectable memory errors. This series of patches provides +> the recovery mechanism for the particular kernel agent khugepaged +> when it collapses memory pages. 
+> + +#### File Systems + +**[v1: fs: consolidate duplicate dt_type helpers](http://lore.kernel.org/linux-fsdevel/20230330104144.75547-1-jlayton@kernel.org/)** + +> There are three copies of the same dt_type helper sprinkled around the +> tree. Convert them to use the common fs_umode_to_dtype function instead, +> which has the added advantage of properly returning DT_UNKNOWN when +> given a mode that contains an unrecognized type. +> + +**[v2: fs: consolidate dt_type() helper definitions](http://lore.kernel.org/linux-fsdevel/20230330000157.297698-1-jlayton@kernel.org/)** + +> There are 4 functions named dt_type() in the kernel. There is also the +> S_DT macro in fs_types.h. +> +> Replace the S_DT macro with a static inline named dt_type, and have all +> of the existing copies call that instead. The v9fs helper is renamed to +> distinguish it from the others. +> + +**[v8: Implement copy offload support](http://lore.kernel.org/linux-fsdevel/20230327084103.21601-1-anuj20.g@samsung.com/)** + +> The patch series covers the points discussed in November 2021 virtual +> call [LSF/MM/BFP TOPIC] Storage: Copy Offload [0]. +> We have covered the initial agreed requirements in this patchset and +> further additional features suggested by community. +> Patchset borrows Mikulas's token based approach for 2 bdev +> implementation. +> + +**[v1: zonefs: Always invalidate last cache page on append write](http://lore.kernel.org/linux-fsdevel/20230329055823.1677193-1-damien.lemoal@opensource.wdc.com/)** + +> When a direct append write is executed, the append offset may correspond +> to the last page of an inode which might have been cached already by +> buffered reads, page faults with mmap-read or non-direct readahead. +> To ensure that the on-disk and cached data is consistent for such last +> cached page, make sure to always invalidate it in +> zonefs_file_dio_append(). This invalidation will always be a no-op when +> the device block size is equal to the page size (e.g. 4K).
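The two dt_type consolidation entries at the top of this list concern mapping an inode's mode bits to the `DT_*` values reported in directory entries. Below is a userspace sketch of a mapping that returns `DT_UNKNOWN` for unrecognized type bits. It uses glibc's `<dirent.h>` constants, and the name `mode_to_dtype()` is made up for illustration; it is not the kernel's fs_umode_to_dtype(). The static assertions check the Linux encoding that a shift-based macro relies on, namely that, for known types, `DT_x` equals the `S_IFx` bits shifted right by 12.

```c
#define _DEFAULT_SOURCE          /* exposes DT_* constants in glibc's <dirent.h> */
#include <assert.h>
#include <dirent.h>
#include <sys/stat.h>

/* On Linux, DT_* equals the file-type bits of the mode shifted right by 12. */
static_assert(DT_REG == (S_IFREG >> 12), "DT_REG encoding");
static_assert(DT_DIR == (S_IFDIR >> 12), "DT_DIR encoding");

/* Map the file-type bits of a mode to a DT_* value; any combination that
 * matches no known file type yields DT_UNKNOWN instead of a stray number. */
unsigned char mode_to_dtype(mode_t mode)
{
    switch (mode & S_IFMT) {
    case S_IFREG:  return DT_REG;
    case S_IFDIR:  return DT_DIR;
    case S_IFLNK:  return DT_LNK;
    case S_IFCHR:  return DT_CHR;
    case S_IFBLK:  return DT_BLK;
    case S_IFIFO:  return DT_FIFO;
    case S_IFSOCK: return DT_SOCK;
    default:       return DT_UNKNOWN;
    }
}
```

A plain `(mode & S_IFMT) >> 12` would return 3 for the invalid type bits `0030000`. The switch returns `DT_UNKNOWN` for them instead, which matches the advantage the first entry attributes to the common helper.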
+> + +#### Network Devices + +**[v1: bpf-next: Add FOU support for externally controlled ipip devices](http://lore.kernel.org/netdev/cover.1680379518.git.cehrig@cloudflare.com/)** + +> This patch set adds support for using FOU or GUE encapsulation with +> an ipip device operating in collect-metadata mode and a set of kfuncs +> for controlling encap parameters exposed to a BPF tc-hook. +> +> BPF tc-hooks allow us to read tunnel metadata (like remote IP addresses) +> in the ingress path of an externally controlled tunnel interface via +> the bpf_skb_get_tunnel_{key,opt} bpf-helpers. Packets can then be +> redirected to the same or a different externally controlled tunnel +> interface by overwriting metadata via the bpf_skb_set_tunnel_{key,opt} +> helpers and a call to bpf_redirect. This enables us to redirect packets +> between tunnel interfaces - and potentially change the encapsulation +> type - using only a single BPF program. +> + +**[v1: net-next: ice: lower CPU usage with GNSS](http://lore.kernel.org/netdev/20230401172659.38508-1-mschmidt@redhat.com/)** + +> This series lowers the CPU usage of the ice driver when using its +> provided /dev/gnss*. +> +> Intel engineers, in addition to reviewing the patches for correctness, +> please also consider my doubts expressed in the descriptions of patches +> 1 and 2. There may be better solutions possible. +> + +**[v1: net: ethernet: mtk_eth_soc: use be32 type to store be32 values](http://lore.kernel.org/netdev/20230401-mtk_eth_soc-sparse-v1-1-84e9fc7b8eab@kernel.org/)** + +> Perhaps there is a nicer way to handle this but the code +> calls for converting an array of host byte order 32bit values +> to big endian 32bit values: an ipv6 address to be pretty printed. +> +> Use a sparse-friendly array of be32 to store these values. +> +> Also make use of the cpu_to_be32_array helper rather +> than open coding the conversion.
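The mtk_eth_soc entry above converts an array of host-order 32-bit words, such as the four words of an IPv6 address, to big-endian order. Below is a userspace analogue of that conversion built on `htonl()`. The function name `to_be32_array()` is made up for this sketch and is not the kernel's cpu_to_be32_array().

```c
#include <arpa/inet.h>   /* htonl(): host to network (big-endian) byte order */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert `words` host-order 32-bit values to big-endian, one word at a time. */
void to_be32_array(uint32_t *dst, const uint32_t *src, size_t words)
{
    assert(words == 0 || (dst != NULL && src != NULL));
    for (size_t i = 0; i < words; i++)
        dst[i] = htonl(src[i]);
}
```

On a big-endian host, `htonl()` is the identity. On a little-endian host, it swaps bytes. Either way, the bytes of `dst` come out in network order, so the byte view in the checks below holds on any host.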
+> + +**[v3: Add EMAC3 support for sa8540p-ride (devicetree/clk bits)](http://lore.kernel.org/netdev/20230331215804.783439-1-ahalaney@redhat.com/)** + +> This is a forward port / upstream refactor of code delivered +> downstream by Qualcomm over at [0] to enable the DWMAC5 based +> implementation called EMAC3 on the sa8540p-ride dev board. +> + +**[v3: net-next: Add EMAC3 support for sa8540p-ride](http://lore.kernel.org/netdev/20230331214549.756660-1-ahalaney@redhat.com/)** + +> This is a forward port / upstream refactor of code delivered +> downstream by Qualcomm over at [0] to enable the DWMAC5 based +> implementation called EMAC3 on the sa8540p-ride dev board. +> + +**[v5: bpf: XDP-hints: API change for RX-hash kfunc bpf_xdp_metadata_rx_hash](http://lore.kernel.org/netdev/168028882260.4030852.1100965689789226162.stgit@firesoul/)** + +> Current API for bpf_xdp_metadata_rx_hash() returns the raw RSS hash value, +> but doesn't provide information on the RSS hash type (part of 6.3-rc). +> +> This patchset proposal is to change the function call signature via adding +> a pointer value argument for providing the RSS hash type. +> + +**[v1: iproute2-next: tc: m_tunnel_key: support code for "nofrag" tunnels](http://lore.kernel.org/netdev/c43213bed30edfa0d6fa1b084e4d48c26417edc9.1680281221.git.dcaratti@redhat.com/)** + +> add control plane for setting TCA_TUNNEL_KEY_NO_FRAG flag on +> act_tunnel_key actions. +> + +**[v1: net-next: mlxsw: Use static trip points for transceiver modules](http://lore.kernel.org/netdev/cover.1680272119.git.petrm@nvidia.com/)** + +> Ido Schimmel writes: +> +> See patch #1 for motivation and implementation details. +> +> Patches #2-#3 are simple cleanups as a result of the changes in the +> first patch. 
+> + +**[v1: iproute2: ip-xfrm: accept "allow" as action in ip xfrm policy setdefault](http://lore.kernel.org/netdev/dc8c3fcd81a212e47547ae59ee6857ce25048ddd.1680268153.git.sd@queasysnail.net/)** + +> The help text claims that setdefault takes ACTION values, ie block | +> allow. In reality, xfrm_str_to_policy takes block | accept. +> +> We could also fix that by changing the help text/manpage, but then +> it'd be frustrating to have multiple ACTION with similar values used +> in different subcommands. +> + +**[v1: net-next: net: phy: introduce phy_reg_field interface](http://lore.kernel.org/netdev/20230331123259.567627-1-radu-nicolae.pirea@oss.nxp.com/)** + +> Some PHYs can be heavily modified between revisions, and the addresses of +> the registers are changed and the register fields are moved from one +> register to another. +> +> To integrate more PHYs in the same driver with the same register fields, +> but these register fields were located in different registers at +> + +**[v1: net-next: ice: allow matching on metadata](http://lore.kernel.org/netdev/20230331105747.89612-1-michal.swiatkowski@linux.intel.com/)** + +> This patchset is intended to improve the usability of the switchdev +> slow path. Without matching on a metadata values slow path works +> based on VF's MAC addresses. It causes a problem when the VF wants +> to use more than one MAC address (e.g. when it is in trusted mode). +> + +**[v6: net-next: sfc: support unicast PTP](http://lore.kernel.org/netdev/20230331111404.17256-1-ihuguet@redhat.com/)** + +> Unicast PTP was not working with sfc NICs. +> +> The reason was that these NICs don't timestamp all incoming packets, +> but instead they only timestamp packets of the queues that are selected +> for that. Currently, only one RX queue is configured for timestamp: the +> RX queue of the PTP channel. The packets that are put in the PTP RX +> queue are selected according to firmware filters configured from the +> driver. 
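Returning to the ip-xfrm setdefault entry at the start of this group: the fix lets the parser accept the documented word "allow" as well as "accept". A minimal sketch of such a parser follows. The enum values and the name `parse_policy_action()` are hypothetical and are not iproute2's actual code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical action codes used only by this sketch. */
enum policy_action { ACTION_INVALID = -1, ACTION_BLOCK, ACTION_ACCEPT };

/* Parse a default-policy action; "allow" is treated as a synonym for
 * "accept" so that the help text's "block | allow" is also valid input. */
enum policy_action parse_policy_action(const char *s)
{
    assert(s != NULL);
    if (strcmp(s, "block") == 0)
        return ACTION_BLOCK;
    if (strcmp(s, "accept") == 0 || strcmp(s, "allow") == 0)
        return ACTION_ACCEPT;
    return ACTION_INVALID;
}
```

Keeping "accept" alongside "allow" preserves existing scripts while making the documented spelling work, which is the trade-off the entry argues for over changing the help text.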
+> + +**[v1: net-next: net: stmmac: publish actual MTU restriction](http://lore.kernel.org/netdev/20230331092344.268981-1-vinschen@redhat.com/)** + +> Apart from devices setting the max MTU value from device tree, +> the initialization functions in many drivers use a default value +> of JUMBO_LEN. +> +> However, that doesn't reflect reality. The stmmac_change_mtu +> function restricts the MTU to the size of a single queue in the TX +> FIFO. +> + +**[v1: net-next: net: stmmac: allow ethtool action on PCI devices if device is down](http://lore.kernel.org/netdev/20230331092341.268964-1-vinschen@redhat.com/)** + +> So far stmmac is only able to handle ethtool commands if the device +> is UP. However, PCI devices usually just have to be in the active +> state for ethtool commands. +> + +**[v2: net: dsa: mv88e6xxx: Reset mv88e6393x force WD event bit](http://lore.kernel.org/netdev/20230331084014.1144597-1-gustav.ekelund@axis.com/)** + +> The force watchdog event bit is not cleared during SW reset in the +> mv88e6393x switch. This is a different behavior compared to mv886390 which +> clears the force WD event bit as advertised. This causes a force WD event +> to be handled over and over again as the SW reset following the event never +> clears the force WD event bit. +> + +**[v1: qlcnic: check pci_reset_function result](http://lore.kernel.org/netdev/20230331080605.42961-1-den-plotnikov@yandex-team.ru/)** + +> Static code analyzer complains to unchecked return value. +> It seems that pci_reset_function return something meaningful +> only if "reset_methods" is set. +> Even if reset_methods isn't used check the return value to avoid +> possible bugs leading to undefined behavior in the future. +> + +**[v1: net: vsock/vmci: convert VMCI error code to -ENOMEM on send](http://lore.kernel.org/netdev/2c3aeeac-2fcb-16f6-41cd-c0ca4e6a6d3e@sberdevices.ru/)** + +> This adds conversion of VMCI specific error code to general -ENOMEM. 
It +> is needed, because af_vsock.c passes error value returned from transport +> to the user, which does not expect to get VMCI_ERROR_* values. +> + +**[v2: net: qrtr: Do not do DEL_SERVER broadcast after DEL_CLIENT](http://lore.kernel.org/netdev/1680248937-16617-1-git-send-email-quic_srichara@quicinc.com/)** + +> On the remote side, when QRTR socket is removed, af_qrtr will call +> qrtr_port_remove() which broadcasts the DEL_CLIENT packet to all neighbours +> including local NS. NS upon receiving the DEL_CLIENT packet, will remove +> the lookups associated with the node:port and broadcasts the DEL_SERVER +> packet. +> + +#### Security Hardening + +**[v1: LoongArch: Add kernel address sanitizer support](http://lore.kernel.org/linux-hardening/20230328111714.2056-1-zhangqing@loongson.cn/)** + +> 1/8 of kernel addresses reserved for shadow memory. But for LoongArch, +> there are a lot of holes between different segments and valid address +> space (256T available) is insufficient to map all these segments to kasan +> shadow memory with the common formula provided by kasan core, saying +> `(addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET` +> + +**[[RFC/RFT,V2] CFI: Add support for gcc CFI in aarch64](http://lore.kernel.org/linux-hardening/20230325085416.95191-1-ashimida.1990@gmail.com/)** + +> Based on Sami's patch[1], this patch makes the corresponding kernel +> configuration of CFI available when compiling the kernel with the gcc[2]. +> + +#### Async IO + +**[v6: io_uring/ublk: add generic IORING_OP_FUSED_CMD](http://lore.kernel.org/io-uring/20230330113630.1388860-1-ming.lei@redhat.com/)** + +> Hello Jens and Guys, +> +> Add generic fused command, which can include one primary command and multiple +> secondary requests. This command provides one safe way to share resource between +> primary command and secondary requests, and primary command is always +> completed after all secondary requests are done, and resource lifetime +> is bound with primary command.
+> + +**[v5: io_uring/ublk: add IORING_OP_FUSED_CMD](http://lore.kernel.org/io-uring/20230328150958.1253547-1-ming.lei@redhat.com/)** + +> Hello Jens, +> +> Add IORING_OP_FUSED_CMD, it is one special URING_CMD, the 1st SQE(primary) is +> one 64byte URING_CMD, and the 2nd 64byte SQE(secondary) is another normal +> 64byte OP. The primary command provides device/file io buffer and +> submits OP represented by the secondary SQE using the provided buffer. This way +> solves ublk zero copy problem easily, since io buffer shares same lifetime with +> the primary command. +> + +**[v1: io_uring/poll: clear single/double poll flags on poll arming](http://lore.kernel.org/io-uring/61e3fefd-0a99-5916-c049-9143d3342379@kernel.dk/)** + +> Unless we have at least one entry queued, then don't call into +> io_poll_remove_entries(). Normally this isn't possible, but if we +> retry poll then we can have ->nr_entries cleared again as we're +> setting it up. If this happens for a poll retry, then we'll still have +> at least REQ_F_SINGLE_POLL set. io_poll_remove_entries() then thinks +> it has entries to remove. +> + +#### Rust For Linux + +**[v4: Rust pin-init API for pinned initialization of structs](http://lore.kernel.org/rust-for-linux/20230331215053.585759-1-y86-dev@protonmail.com/)** + +> This is the fourth version of the pin-init API. See [1] for v3. +> +> The tree at [2] contains these patches applied on top of 6.3-rc1. +> The Rust-doc documentation of the pin-init API can be found at [3]. +> +> These patches are a long way coming, since I held a presentation on +> safe pinned initialization at Kangrejos [4]. And my discovery of this +> problem was almost a year ago [5]. +> + +**[v2: rust: error: Add missing wrappers to convert to/from kernel error codes](http://lore.kernel.org/rust-for-linux/20230224-rust-error-v2-0-3900319812da@asahilina.net/)** + +> This series is part of the set of dependencies for the drm/asahi +> Apple M1/M2 GPU driver. 
+> +> It adds a bunch of missing wrappers in kernel::error, which are useful +> to convert to/from kernel error codes. Since these will be used by many +> abstractions coming up soon, I think it makes sense to merge them as +> soon as possible instead of bundling them with the first user. Hence, +> they have allow() tags to silence dead code warnings. These can be +> removed as soon as the first user is in the kernel crate. +> + +**[v1: rust: Add uapi crate](http://lore.kernel.org/rust-for-linux/20230329-rust-uapi-v1-0-ee78f2933726@asahilina.net/)** + +> In general, direct bindgen bindings for C kernel APIs are not intended +> to be used by drivers outside of the `kernel` crate. However, some +> drivers do need to interact directly with UAPI definitions to implement +> userspace APIs. +> + +#### BPF + +**[v2: bpf-next: bpf: optimize hashmap lookups when key_size is divisible by 4](http://lore.kernel.org/bpf/20230401200602.3275-1-aspsk@isovalent.com/)** + +> The BPF hashmap uses the jhash() hash function. There is an optimized version +> of this hash function which may be used if hash size is a multiple of 4. Apply +> this optimization to the hashmap in a similar way as it is done in the bloom +> filter map. +> + +**[v3: bpf-next: Prepare veristat for packaging](http://lore.kernel.org/bpf/20230331222405.3468634-1-andrii@kernel.org/)** + +> This patch set relicenses veristat.c to dual GPL-2.0/BSD-2 license and +> prepares it to be mirrored to Github at libbpf/veristat repo. +> +> Few small issues in the source code are fixed, found during Github sync +> preparetion. +> + +**[v2: bpf-next: Enable RCU semantics for task kptrs](http://lore.kernel.org/bpf/20230331195733.699708-1-void@manifault.com/)** + +> In commit 22df776a9a86 ("tasks: Extract rcu_users out of union"), the +> 'refcount_t rcu_users' field was extracted out of a union with the +> 'struct rcu_head rcu' field. 
This allows us to use the field for +> refcounting struct task_struct with RCU protection, as the RCU callback +> no longer flips rcu_users to be nonzero after the callback is scheduled. +> + +**[v10: evm: Do HMAC of multiple per LSM xattrs for new inodes](http://lore.kernel.org/bpf/20230331123221.3273328-1-roberto.sassu@huaweicloud.com/)** + +> One of the major goals of LSM stacking is to run multiple LSMs side by side +> without interfering with each other. The ultimate decision will depend on +> individual LSM decision. +> + +**[v1: bpf-next: veristat: change guess for __sk_buff from CGROUP_SKB to SCHED_CLS](http://lore.kernel.org/bpf/20230330190115.3942962-1-andrii@kernel.org/)** + +> SCHED_CLS seems to be a better option as a default guess for freplace +> programs that have __sk_buff as a context type. +> + +**[[PATCH bpf RFC-V3 0/5] XDP-hints: API change for RX-hash kfunc bpf_xdp_metadata_rx_hash](http://lore.kernel.org/bpf/168019602958.3557870.9960387532660882277.stgit@firesoul/)** + +> Notice targeted 6.3-rc kernel via bpf git tree. +> +> Current API for bpf_xdp_metadata_rx_hash() returns the raw RSS hash value, +> but doesn't provide information on the RSS hash type (part of 6.3-rc). +> + +**[v5: bpf-next: bpf: Add socket destroy capability](http://lore.kernel.org/bpf/20230330151758.531170-1-aditi.ghag@isovalent.com/)** + +> This patch adds the capability to destroy sockets in BPF. We plan to use +> the capability in Cilium to force client sockets to reconnect when their +> remote load-balancing backends are deleted. The other use case is +> on-the-fly policy enforcement where existing socket connections prevented +> by policies need to be terminated. +> + +**[v2: bpf-next: kallsyms: move module-related functions under correct configs](http://lore.kernel.org/bpf/20230330102001.2183693-1-vmalik@redhat.com/)** + +> Functions for searching module kallsyms should have non-empty +> definitions only if CONFIG_MODULES=y and CONFIG_KALLSYMS=y. 
Until now, +> only CONFIG_MODULES check was used for many of these, which may have +> caused complilation errors on some configs. +> +> This patch moves all relevant functions under the correct configs. +> + +**[v1: bpf-next: bpf: Improve verifier for cond_op and spilled loop index variables](http://lore.kernel.org/bpf/20230330055600.86870-1-yhs@fb.com/)** + +> LLVM commit [1] introduced hoistMinMax optimization like +> (i < VIRTIO_MAX_SGS) && (i < out_sgs) +> to +> upper = MIN(VIRTIO_MAX_SGS, out_sgs) +> ... i < upper ... +> and caused the verification failure. Commit [2] workarounded the issue by +> adding some bpf assembly code to prohibit the above optimization. +> This patch improved verifier such that verification can succeed without +> the above workaround. +> + +**[v1: bpf-next: Teach verifier to determine necessary log buffer size](http://lore.kernel.org/bpf/20230330041642.1118787-1-andrii@kernel.org/)** + +> My imagination is failing me on how to succinctly name this feature and patch +> set, but the point here is to perform internal accounting of what should be +> the necessary size of user-supplied log buffer such as to fit entire log +> contents without truncation, thus avoiding -ENOSPC. +> + +**[v2: bpf-next: xsk: Support UMEM chunk_size > PAGE_SIZE](http://lore.kernel.org/bpf/20230329180502.1884307-1-kal.conley@dectris.com/)** + +> The main purpose of this patchset is to add AF_XDP support for UMEM +> chunk sizes > PAGE_SIZE. This is enabled for UMEMs backed by HugeTLB +> pages. +> + +**[[PATCH bpf RFC-V2 0/5] XDP-hints: API change for RX-hash kfunc bpf_xdp_metadata_rx_hash](http://lore.kernel.org/bpf/168010726310.3039990.2753040700813178259.stgit@firesoul/)** + +> Notice targeted 6.3-rc kernel via bpf git tree. +> +> Current API for bpf_xdp_metadata_rx_hash() returns the raw RSS hash value, +> but doesn't provide information on the RSS hash type (part of 6.3-rc). 
+> +> This patchset proposal is to use the return value from +> bpf_xdp_metadata_rx_hash() to provide the RSS hash type. +> + +**[v2: bpf-next: Allow BPF TCP CCs to write app_limited](http://lore.kernel.org/bpf/20230329073558.8136-1-bobankhshen@gmail.com/)** + +> This series allow BPF TCP CCs to write app_limited of struct +> tcp_sock. A built-in CC or one from a kernel module is already +> able to write to app_limited of struct tcp_sock. Until now, +> a BPF CC doesn't have write access to this member of struct +> tcp_sock. +> + +**[v2: bpf-next: BPF verifier rotating log](http://lore.kernel.org/bpf/20230328235610.3159943-1-andrii@kernel.org/)** + +> This patch set changes BPF verifier log behavior to behave as a rotating log, +> by default. If user-supplied log buffer is big enough to contain entire +> verifier log output, there is no effective difference. But where previously +> user supplied too small log buffer and would get -ENOSPC error result and the +> beginning part of the verifier log, now there will be no error and user will +> get ending part of verifier log filling up user-supplied log buffer. Which +> is, in absolute majority of cases, is exactly what's useful, relevant, and +> what users want and need, as the ending of the verifier log is containing +> details of verifier failure and relevant state that got us to that failure. So +> this rotating mode is made default, but for some niche advanced debugging +> scenarios it's possible to request old behavior by specifying additional +> BPF_LOG_FIXED (8) flag. +> + +**[v2: memcg: make rstat flushing irq and sleep](http://lore.kernel.org/bpf/20230328221644.803272-1-yosryahmed@google.com/)** + +> Patches 1 and 2 are cleanups requested during reviews of prior versions +> of this series. +> +> Patch 3 makes sure we never try to flush from within an irq context, and +> patch 4 adds a WARN_ON_ONCE() to make sure we catch any violations. 
+> + +**[v1: Allow BPF TCP CCs to write app_limited](http://lore.kernel.org/bpf/20230328132035.50839-1-bobankhshen@gmail.com/)** + +> This series allow BPF TCP CCs to write app_limited of struct +> tcp_sock. A built-in CC or one from a kernel module is already +> able to write to app_limited of struct tcp_sock. Until now, +> a BPF CC doesn't have write access to this member of struct +> tcp_sock. +> + +**[v2: bpf-next: selftests/bpf: Rewrite two infinite loops in bound check cases](http://lore.kernel.org/bpf/20230329011048.1721937-1-xukuohai@huaweicloud.com/)** + +> The two infinite loops in bound check cases added by commit +> increased the execution time of test_verifier from about 6 seconds to +> about 9 seconds. Rewrite these two infinite loops to finite loops to get +> rid of this extra time cost. +> + +**[v1: net-next: virtio_net: refactor xdp codes](http://lore.kernel.org/bpf/20230328120412.110114-1-xuanzhuo@linux.alibaba.com/)** + +> Due to historical reasons, the implementation of XDP in virtio-net is relatively +> chaotic. For example, the processing of XDP actions has two copies of similar +> code. Such as page, xdp_page processing, etc. +> +> The purpose of this patch set is to refactor these code. Reduce the difficulty +> of subsequent maintenance. Subsequent developers will not introduce new bugs +> because of some complex logical relationships. +> + +**[v1: net-next: bpf, net: support redirecting to ifb with bpf](http://lore.kernel.org/bpf/20230328115105.13553-1-laoar.shao@gmail.com/)** + +> In our container environment, we are using EDT-bpf to limit the egress +> bandwidth. EDT-bpf can be used to limit egress only, but can't be used +> to limit ingress. Some of our users also want to limit the ingress +> bandwidth. But after applying EDT-bpf, which is based on clsact qdisc, +> it is impossible to limit the ingress bandwidth currently, due to some +> reasons, +> 1). 
We can't add ingress qdisc +> The ingress qdisc can't coexist with clsact qdisc as clsact has both +> ingress and egress handler. So our traditional method to limit ingress +> bandwidth can't work any more. +> 2). We can't redirect ingress packet to ifb with bpf +> By trying to analyze if it is possible to redirect the ingress packet to +> ifb with a bpf program, we find that the ifb device is not supported by +> bpf redirect yet. +> + +**[v2: loongarch/bpf: Skip speculation barrier opcode, which caused ltp testcase bpf_prog02 to fail](http://lore.kernel.org/bpf/20230328071335.2664966-1-guodongtai@kylinos.cn/)** + +> Here just skip the opcode(BPF_ST | BPF_NOSPEC) that has no couterpart to the loongarch. +> +> To verify, use ltp testcase: +> +> Without this patch: +> $ ./bpf_prog02 +> ... ... +> bpf_common.c:123: TBROK: Failed verification: ??? (524) +> + +**[v1: bpf-next: verifier/xdp_direct_packet_access.c converted to inline assembly](http://lore.kernel.org/bpf/20230328020813.392560-1-eddyz87@gmail.com/)** + +> verifier/xdp_direct_packet_access.c automatically converted to inline +> assembly using [1]. +> +> This is a leftover from [2], the last patch in a batch was blocked by +> mail server for being too long. This patch-set splits it in two: +> - one to add migrated test to progs/ +> - one to remove old test from verifier/ +> + +**[v1: bpf: tcp: Use sock_gen_put instead of sock_put in bpf_iter_tcp](http://lore.kernel.org/bpf/20230328004232.2134233-1-martin.lau@linux.dev/)** + +> While reviewing the udp-iter batching patches, notice the bpf_iter_tcp +> calling sock_put() is incorrect. It should call sock_gen_put instead +> because bpf_iter_tcp is iterating the ehash table which has the +> req sk and tw sk. This patch replaces all sock_put with sock_gen_put +> in the bpf_iter_tcp codepath. 
+> +> ### Related Technology News + +#### Qemu + +**[v2: riscv: Add support for the Zfa extension](http://lore.kernel.org/qemu-devel/20230331182824.4104580-1-christoph.muellner@vrull.eu/)** + +> This patch introduces the RISC-V Zfa extension, which introduces +> additional floating-point extensions: +> * fli (load-immediate) with pre-defined immediates +> * fminm/fmaxm (like fmin/fmax but with different NaN behaviour) +> * fround/froundmx (round to integer) +> * fcvtmod.w.d (Modular Convert-to-Integer) +> * fmv* to access high bits of float register bigger than XLEN +> * Quiet comparison instructions (fleq/fltq) +> + +**[v1: target/riscv: Set opcode to env->bins for illegal/virtual instruction fault](http://lore.kernel.org/qemu-devel/20230330034636.44585-1-liweiwei@iscas.ac.cn/)** + +> decode_save_opc() will not work for generate_exception(), since 0 is passed +> to riscv_raise_exception() as pc in helper_raise_exception(), and bins will +> not be restored in this case. +> + +**[v6: target/riscv: rework CPU extensions validation](http://lore.kernel.org/qemu-devel/20230329200856.658733-1-dbarboza@ventanamicro.com/)** + +> This series contains changes proposed by Weiwei Li in v5. +> +> All patches are acked. +> + +#### U-Boot + +**[v2: Add ethernet driver for StarFive JH7110 SoC](http://lore.kernel.org/u-boot/20230329102720.25439-1-yanhong.wang@starfivetech.com/)** + +> This series adds ethernet support for the StarFive JH7110 RISC-V SoC. +> The series includes PHY and MAC drivers. The PHY model is +> YT8531 (from Motorcomm Inc), and the MAC version is dwmac-5.20 +> (from Synopsys DesignWare). +> +> The implementation of the phy driver is ported from linux, but it +> has been adjusted for the u-boot framework. +> + +**[v3: Add StarFive JH7110 PCIe drvier support](http://lore.kernel.org/u-boot/20230329100143.10724-1-minda.chen@starfivetech.com/)** + +> The PCIe driver depends on gpio, pinctrl, clk and reset driver to do init. +> The PCIe dts configuation includes all these setting.
+> +> The PCIe drivers codes has been tested on the VisionFive V2 boards. +> The test devices includes M.2 NVMe SSD and Realtek 8169 Ethernet adapter. +> + +**[v5: Basic StarFive JH7110 RISC-V SoC support](http://lore.kernel.org/u-boot/20230329034224.26545-1-yanhong.wang@starfivetech.com/)** + +> This series of patches base on the latest branch/master, and add support +> for the StarFive JH7110 RISC-V SoC and VisionFive V2 board. In order for +> this to be achieved, the respective DT nodes have been added, and the +> required defconfigs have been added to the boards' defconfig. What is more, +> the basic required DM drivers have been added, such as reset, clock, pinctrl, +> uart, ram etc. +> + +## 20230326: Issue 39 + +### Kernel News + +#### RISC-V Architecture Support + +**[v9: riscv: Use PUD/P4D/PGD pages for the linear mapping](http://lore.kernel.org/linux-riscv/20230324155421.271544-1-alexghiti@rivosinc.com/)** + +> This patchset intends to improve tlb utilization by using hugepages for +> the linear mapping. +> + +**[v7: function_graph: Support recording and printing the return value of function](http://lore.kernel.org/linux-riscv/20230324123731.3801920-1-pengdonglin@sangfor.com.cn/)** + +> When using the function_graph tracer to analyze system call failures, +> it can be time-consuming to analyze the trace logs and locate the kernel +> function that first returns an error. This change aims to simplify the +> process by recording the function return value to the 'retval' member of +> 'ftrace_graph_ent' and printing it when outputing the trace log. +> + +**[v1: RISC-V: convert new selectors of RISCV_ALTERNATIVE to dependencies](http://lore.kernel.org/linux-riscv/20230324121240.3594777-1-conor.dooley@microchip.com/)** + +> for-next contains two additional extensions that select +> RISCV_ALTERNATIVE. RISCV_ALTERNATIVE no longer needs to be selected by +> individual config options as it is now selected for !XIP_KERNEL builds +> by the top level RISCV option.
+> These extensions rely on the alternative framework, so convert the +> "select"s to "depends on"s instead. +> + +**[v1: RISC-V: align Svpbmt Kconfig help text with other extensions](http://lore.kernel.org/linux-riscv/20230324092840.3504267-1-conor.dooley@microchip.com/)** + +> Other extensions only capitalise the first letter in Kconfig text +> menus, and provide a short comment about the extension's meaning. +> Do the same for Svpbmt. +> While editing one of the lines, reformat the "spelling" of 64-bit. +> + +**[v4: -next: riscv: jump_label: Optimize the code size with compressed instruction](http://lore.kernel.org/linux-riscv/20230324082320.290410-1-guoren@kernel.org/)** + +> Reduce the size of the static branch instruction and prevent atomic +> update problems when CONFIG_RISCV_ISA_C=y. It also reduces the jump +> range from 1MB to 4KB, but 4KB is enough for the current riscv +> requirement. +> + +**[v11: -next: riscv: Add independent irq/softirq stacks](http://lore.kernel.org/linux-riscv/20230324071239.151677-1-guoren@kernel.org/)** + +> This patch series adds independent irq/softirq stacks to decrease the +> press of the thread stack. Also, add a thread STACK_SIZE config for +> users to adjust the proper size during compile time. +> + +**[v1: riscv: dts: starfive: jh7110: Correct the properties of S7 core](http://lore.kernel.org/linux-riscv/20230324064651.84670-1-hal.feng@starfivetech.com/)** + +> The S7 core has no L1 data cache and MMU, so delete some +> related properties. +> + +**[v8: riscv: Optimize function trace](http://lore.kernel.org/linux-riscv/20230324033342.3177979-1-suagrfillet@gmail.com/)** + +> The first 3 independent patches has been picked in the V7 version of +> this series, this version continues the following 4 patches. 
+> + +**[v8: Add Ethernet driver for StarFive JH7110 SoC](http://lore.kernel.org/linux-riscv/20230324022819.2324-1-samin.guo@starfivetech.com/)** + +> This series adds ethernet support for the StarFive JH7110 RISC-V SoC, +> which includes a dwmac-5.20 MAC driver (from Synopsys DesignWare). +> This series has been tested and works fine on VisionFive-2 v1.2A and +> v1.3B SBC boards. +> + +**[v4: Kconfig: introduce HAS_IOPORT option and select it as necessary](http://lore.kernel.org/linux-riscv/20230323163354.1454196-1-schnelle@linux.ibm.com/)** + +> We introduce a new HAS_IOPORT Kconfig option to indicate support for I/O +> Port access. In a future patch HAS_IOPORT=n will disable compilation of +> the I/O accessor functions inb()/outb() and friends on architectures +> which can not meaningfully support legacy I/O spaces such as s390. +> + +**[v16: -next: riscv: Add vector ISA support](http://lore.kernel.org/linux-riscv/20230323145924.4194-1-andy.chiu@sifive.com/)** + +> This patchset is implemented based on vector 1.0 spec to add vector support +> in riscv Linux kernel. There are some assumptions for this implementations. +> + +**[v2: riscv: export cpu/freq invariant to scheduler](http://lore.kernel.org/linux-riscv/20230323123924.3032174-1-suagrfillet@gmail.com/)** + +> RISC-V now manages CPU topology using arch_topology which provides +> CPU capacity and frequency related interfaces to access the cpu/freq +> invariant in possible heterogeneous or DVFS-enabled platforms. +> + +**[v7: RISC-V Hibernation Support](http://lore.kernel.org/linux-riscv/20230323045604.536099-1-jeeheng.sia@starfivetech.com/)** + +> This series adds RISC-V Hibernation/suspend to disk support. +> Low level Arch functions were created to support hibernation. +> swsusp_arch_suspend() relies code from __cpu_suspend_enter() to write +> cpu state onto the stack, then calling swsusp_save() to save the memory +> image. 
+> + +**[v1: RISC-V: KVM: Require alternatives](http://lore.kernel.org/linux-riscv/20230322192858.1189272-1-ajones@ventanamicro.com/)** + +> KVM makes use of riscv_has_extension_unlikely() to check for the +> svinval extension. riscv_has_extension_unlikely() is built on +> alternatives, which means KVM should ensure alternatives support +> is available. +> + +**[v1: riscv: require alternatives framework when selecting FPU support](http://lore.kernel.org/linux-riscv/20230322120907.2968494-1-Jason@zx2c4.com/)** + +> When moving switch_to's has_fpu() over to using riscv_has_extension_ +> likely() rather than static branchs, the FPU code gained a dependency on +> the alternatives framework. If CONFIG_RISCV_ALTERNATIVE isn't selected +> when CONFIG_FPU is, then has_fpu() returns false, and switch_to does not +> work as intended. So select CONFIG_RISCV_ALTERNATIVE when CONFIG_FPU is +> selected. +> + +**[v6: Add DMA driver for StarFive JH7110 SoC](http://lore.kernel.org/linux-riscv/20230322094820.24738-1-walker.chen@starfivetech.com/)** + +> This patch series adds dma support for the StarFive JH7110 RISC-V +> SoC. The first patch adds device tree binding. The second patch includes +> dma driver. The last patch adds device node of dma to JH7110 dts. +> + +**[v2: Enable I2S support for RK3588/RK3588S SoCs](http://lore.kernel.org/linux-riscv/20230321215624.78383-1-cristian.ciocaltea@collabora.com/)** + +> There are five I2S/PCM/TDM controllers and two I2S/PCM controllers embedded in +> the RK3588 and RK3588S SoCs. Furthermore, RK3588 provides four additional +> I2S/PCM/TDM controllers. +> + +**[v3: Use dma_default_coherent for devicetree default coherency](http://lore.kernel.org/linux-riscv/20230321110813.26808-1-jiaxun.yang@flygoat.com/)** + +> This series split out second half of my previous series +> "v1: MIPS DMA coherence fixes". 
+> +> It intends to use dma_default_coherent to determine the default coherency of +> devicetree probed devices instead of hardcoding it with Kconfig options. +> + +**[v6: hwmon: Add StarFive JH71X0 temperature sensor](http://lore.kernel.org/linux-riscv/20230321022644.107027-1-hal.feng@starfivetech.com/)** + +> This adds a driver for the temperature sensor on the JH7100 and JH7110, +> RISC-V SoCs by StarFive Technology Co. Ltd.. The JH7100 is used on the +> BeagleV Starlight board and StarFive VisionFive board. The JH7110 is +> used on the StarFive VisionFive 2 board. +> + +**[v2: Add timer driver for StarFive JH7110 RISC-V SoC](http://lore.kernel.org/linux-riscv/20230320135433.144832-1-xingyu.wu@starfivetech.com/)** + +> This patch serises are to add timer driver for the StarFive JH7110 +> RISC-V SoC. The first patch adds documentation to describe device +> tree bindings. The subsequent patch adds timer driver and support +> JH7110 SoC. The last patch adds device node about timer to JH7110 +> dts. +> + +**[v1: -next: support allocating crashkernel above 4G explicitly on riscv](http://lore.kernel.org/linux-riscv/20230320204244.1637821-1-chenjiahao16@huawei.com/)** + +> On riscv, the current crash kernel allocation logic is trying to +> allocate within 32bit addressible memory region by default, if +> failed, try to allocate without 4G restriction. +> + +**[v6: Basic clock, reset & device tree support for StarFive JH7110 RISC-V SoC](http://lore.kernel.org/linux-riscv/20230320103750.60295-1-hal.feng@starfivetech.com/)** + +> This patch series adds basic clock, reset & DT support for StarFive +> JH7110 SoC. +> +> You can simply review or test the patches at the link [1]. 
+> +> [1]: https://github.com/hal-feng/linux/commits/visionfive2-minimal +> + +**[v1: riscv: mm: execute local TLB flush after populating vmemmap](http://lore.kernel.org/linux-riscv/20230320065324.1045276-1-vincent.chen@sifive.com/)** + +> The vmemmap_populate() creates VA to PA mapping for the VMEMMAP area, where +> all "strcut page" are located once CONFIG_SPARSEMEM_VMEMMAP is defined. +> These "struct page" are later initialized in the zone_sizes_init() +> function. However, during this process, no sfence.vma instruction is +> executed for this VMEMMAP area. +> + +**[v1: Deduplicating RISCV cmpxchg.h macros](http://lore.kernel.org/linux-riscv/20230318080059.1109286-1-leobras@redhat.com/)** + +> While studying riscv's cmpxchg.h file, I got really interested in +> understanding how RISCV asm implemented the different versions of +> {cmp,}xchg. +> +> When I understood the pattern, it made sense for me to remove the +> duplications and create macros to make it easier to understand what exactly +> changes between the versions: Instruction sufixes & barriers. +> + +#### Process Scheduling + +**[v1: sched/topology: add for_each_numa_cpu() macro](http://lore.kernel.org/lkml/20230325185514.425745-1-yury.norov@gmail.com/)** + +> for_each_cpu() is widely used in kernel, and it's beneficial to create +> a NUMA-aware version of the macro. +> +> Recently added for_each_numa_hop_mask() works, but switching existing +> codebase to it is not an easy process. +> + +**[v2: sched/core: Reduce cost of sched_move_task when config autogroup](http://lore.kernel.org/lkml/20230321064459.39421-1-wuchi.zero@gmail.com/)** + +> Some sched_move_task calls are useless because that +> task_struct->sched_task_group maybe not changed (equals task_group +> of cpu_cgroup) when system enable autogroup. So do some checks in +> sched_move_task.
+> +> **[v1: sched: core: Optimize the structure of 'tg_cfs_schedulable_down' function](http://lore.kernel.org/lkml/20230319200255.3640-1-kunyu@nfschina.com/)** + +> Optimize if branches and define in the branch statement +> block parent_quota variable. +> + +#### Memory Management + +**[v1: regmap: Add basic maple tree register cache](http://lore.kernel.org/linux-mm/20230325-regcache-maple-v1-0-1c76916359fb@kernel.org/)** + +> The current state of the art for sparse register maps is the rbtree cache. +> This works well for most applications but isn't always ideal for sparser +> register maps since the rbtree can get deep, requiring a lot of walking. +> Fortunately the kernel has a data structure intended to address this very +> problem, the maple tree. Provide an initial implementation of a register +> cache based on the maple tree to start taking advantage of it. +> + +**[v3: userfaultfd: convert userfaultfd functions to use folios](http://lore.kernel.org/linux-mm/20230325065608.601391-1-zhangpeng362@huawei.com/)** + +> This patch series converts several userfaultfd functions to use folios. +> +> Change log: +> + +**[v7: -next: Delay the initialization of zswap](http://lore.kernel.org/linux-mm/20230325071420.2246461-1-liushixin2@huawei.com/)** + +> In the initialization of zswap, about 18MB memory will be allocated for +> zswap_pool. Since some users may not use zswap, the zswap_pool is wasted. +> Save memory by delaying the initialization of zswap until enabled. +> + +**[v9: tracing/user_events: Remote write ABI](http://lore.kernel.org/linux-mm/20230324223028.172-1-beaub@linux.microsoft.com/)** + +> As part of the discussions for user_events aligned with user space +> tracers, it was determined that user programs should register a aligned +> value to set or clear a bit when an event becomes enabled. Currently a +> shared page is being used that requires mmap(). Remove the shared page +> implementation and move to a user registered address implementation.
+> + +**[v1: mm/damon/sysfs: make more kobj_type structures constant](http://lore.kernel.org/linux-mm/20230324-b4-kobj_type-damon2-v1-1-48ddbf1c8fcf@weissschuh.net/)** + +> Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.") +> the driver core allows the usage of const struct kobj_type. +> +> Take advantage of this to constify the structure definition to prevent +> modification at runtime. +> + +**[v2: mm: Be less noisy during memory hotplug](http://lore.kernel.org/linux-mm/20230323174349.35990-1-krckatom@amazon.de/)** + +> Turn a pr_info() into a pr_debug() to prevent dmesg spamming on systems +> where memory hotplug is a frequent operation. +> + +**[v1: selftests/mm: Implement support for arm64 on va](http://lore.kernel.org/linux-mm/20230323105243.2807166-1-chaitanyas.prakash@arm.com/)** + +> The va_128TBswitch selftest is designed and implemented for PowerPC and +> x86 architectures which support a 128TB switch, up to 256TB of virtual +> address space and hugepage sizes of 16MB and 2MB respectively. Arm64 +> platforms on the other hand support a 256Tb switch, up to 4PB of virtual +> address space and a default hugepage size of 512MB when 64k pagesize is +> enabled. +> + +**[v8: convert read_kcore(), vread() to use iterators](http://lore.kernel.org/linux-mm/cover.1679566220.git.lstoakes@gmail.com/)** + +> While reviewing Baoquan's recent changes to permit vread() access to +> vm_map_ram regions of vmalloc allocations, Willy pointed out [1] that it +> would be nice to refactor vread() as a whole, since its only user is +> read_kcore() and the existing form of vread() necessitates the use of a +> bounce buffer. 
+> + +**[v1: Make rstat flushing IRQ and sleep friendly](http://lore.kernel.org/linux-mm/20230323040037.2389095-1-yosryahmed@google.com/)** + +> Currently, if rstat flushing is invoked using the irqsafe variant +> cgroup_rstat_flush_irqsafe(), we keep interrupts disabled and do not +> sleep for the entire flush operation, which is O(# cpus * # cgroups). +> This can be rather dangerous. +> + +**[v1: iov_iter: Add an iterator-of-iterators](http://lore.kernel.org/linux-mm/3416400.1679508945@warthog.procyon.org.uk/)** + +> Trond Myklebust wrote: +> +> > Add an enum iter_type for ITER_ITER ? :-) +> +> Well, you asked for it... It's actually fairly straightforward once +> ITER_PIPE is removed. +> + +**[v1: memcg v1: provide read access to memory.pressure_level](http://lore.kernel.org/linux-mm/20230322142525.162469-1-flosch@nutanix.com/)** + +> cgroups v1 has a unique way of setting up memory pressure notifications: +> the user opens "memory.pressure_level" of the cgroup they want to +> monitor for pressure, then open "cgroup.event_control" and write the fd +> (among other things) to that file. memory.pressure_level has no other +> use, specifically it does not support any read or write operations. +> Consequently, no handlers are provided, and the file ends up with +> permissions 000. However, to actually use the mechanism, the subscribing +> user must have read access to the file and open the fd for reading, see +> memcg_write_event_control(). 
+> +> **[v2: Providing mount in memfd_restricted() syscall](http://lore.kernel.org/linux-mm/cover.1679428901.git.ackerleytng@google.com/)** + +> This patchset builds upon the memfd_restricted() system call that was +> discussed in the 'KVM: mm: fd-based approach for supporting KVM' patch +> series, at +> https://lore.kernel.org/lkml/20221202061347.1070246-1-chao.p.peng@linux.intel.com/T/#m7e944d7892afdd1d62a03a287bd488c56e377b0c +> + +**[v1: MAINTAINERS: add myself as vmalloc reviewer](http://lore.kernel.org/linux-mm/55f663af6100c84a71a0065ac0ed22463aa340de.1679421959.git.lstoakes@gmail.com/)** + +> I have recently been involved in both reviewing and submitting patches to +> the vmalloc code in mm and would be willing and happy to help out with +> review going forward if it would be helpful! +> + +#### File Systems + +**[v6: ext4: Convert inode preallocation list to an rbtree](http://lore.kernel.org/linux-fsdevel/cover.1679731817.git.ojaswin@linux.ibm.com/)** + +> This patch series aim to improve the performance and scalability of +> inode preallocation by changing inode preallocation linked list to an +> rbtree. I've ran xfstests quick on this series and plan to run auto group +> as well to confirm we have no regressions. +> + +**[v2: Convert most of ext4 to folios](http://lore.kernel.org/linux-fsdevel/20230324180129.1220691-1-willy@infradead.org/)** + +> On top of next-20230321, this converts most of ext4 to use folios instead +> of pages. It does not enable large folios although it fixes some places +> that will need to be fixed before they can be enabled for ext4. It does +> not convert mballoc to use folios. write_begin() and write_end() still +> take a page parameter instead of a folio. +> + +**[v1: fsdax: force clear dirty mark if CoW](http://lore.kernel.org/linux-fsdevel/1679653680-2-1-git-send-email-ruansy.fnst@fujitsu.com/)** + +> XFS allows CoW on non-shared extents to combat fragmentation[1].
The +> old non-shared extent could be mwrited before, its dax entry is marked +> dirty. To be able to delete this entry, clear its dirty mark before +> invalidate_inode_pages2_range(). +> +> [1] https://lore.kernel.org/linux-xfs/20230321151339.GA11376@frogsfrogsfrogs/ +> + +**[v1: netfs: Pass a pointer to virt_to_page()](http://lore.kernel.org/linux-fsdevel/20230324102728.712018-1-linus.walleij@linaro.org/)** + +> Like the other calls in this function virt_to_page() expects +> a pointer, not an integer. +> +> However since many architectures implement virt_to_pfn() as +> a macro, this function becomes polymorphic and accepts both a +> (unsigned long) and a (void *). +> +> Fix this up with an explicit cast. +> + +**[v1: Legacy mount option "sloppy" support](http://lore.kernel.org/linux-fsdevel/167963629788.253682.5439077048343743982.stgit@donald.themaw.net/)** + +> There's been some recent discussion about support of the "sloppy" +> mount option. +> +> It's an option that people want to get rid of from time to time and +> when we do we get complaints and end up having to re-instate it. +> +> I think the (fairly) recent mount API changes are the best way to +> eliminate the need for this option over time. +> + +**[v1: vfs: handle sloppy option in fs context monolithic parser](http://lore.kernel.org/linux-fsdevel/167963635629.253682.12145104262169969353.stgit@donald.themaw.net/)** + +> The sloppy option doesn't make sense for fsconfig() and knowedge of how +> to handle this case needs to be present in the caller. It does make +> sense in the legacy options parser, generic_parse_monolithic(), so it +> should allow for it. +> + +**[v1: fs/buffer: adjust the order of might_sleep() in __getblk_gfp()](http://lore.kernel.org/linux-fsdevel/20230323093752.17461-1-gouhao@uniontech.com/)** + +> If 'bh' is found in cache, just return directly. +> might_sleep() is only required on slow paths. 
+> + +**[v1: fsdax: unshare: zero destination if srcmap is HOLE or UNWRITTEN](http://lore.kernel.org/linux-fsdevel/1679483469-2-1-git-send-email-ruansy.fnst@fujitsu.com/)** + +> unshare copies data from source to destination. But if the source is +> HOLE or UNWRITTEN extents, we should zero the destination, otherwise the +> result will be unexpectable. +> + +**[v1: fsdax: dedupe should compare the min of two iters' length](http://lore.kernel.org/linux-fsdevel/1679469958-2-1-git-send-email-ruansy.fnst@fujitsu.com/)** + +> In an dedupe corporation iter loop, the length of iomap_iter decreases +> because it implies the remaining length after each iteration. The +> compare function should use the min length of the current iters, not the +> total length. +> + +**[v1: splice: report related fsnotify events](http://lore.kernel.org/linux-fsdevel/20230322062519.409752-1-cccheng@synology.com/)** + +> The fsnotify ACCESS and MODIFY event are missing when manipulating a file +> with splice(2). +> + +**[v2: Add results of early memtest to /proc/meminfo](http://lore.kernel.org/linux-fsdevel/20230321103430.7130-1-tomas.mudrunka@gmail.com/)** + +> Currently the memtest results were only presented in dmesg. +> This adds /proc/meminfo entry which can be easily used by scripts. +> + +**[v1: fuse uring communication](http://lore.kernel.org/linux-fsdevel/20230321011047.3425786-1-bschubert@ddn.com/)** + +> This adds support for uring communication between kernel and +> userspace daemon using opcode the IORING_OP_URING_CMD. The basic +> appraoch was taken from ublk. The patches are in RFC state - +> I'm not sure about all decisions and some questions are marked +> with XXX. +> + +**[v1: Split a folio to any lower order folios](http://lore.kernel.org/linux-fsdevel/20230321004829.2012847-1-zi.yan@sent.com/)** + +> File folio supports any order and people would like to support flexible orders +> for anonymous folio[1] too. 
Currently, split_huge_page() only splits a huge +> page to order-0 pages, but splitting to orders higher than 0 is also useful. +> This patchset adds support for splitting a huge page to any lower order pages +> and uses it during folio truncate operations. +> + +**[v3: mm: memory-failure: Move memory failure sysctls to its own file](http://lore.kernel.org/linux-fsdevel/20230320074010.50875-1-wangkefeng.wang@huawei.com/)** + +> The sysctl_memory_failure_early_kill and memory_failure_recovery +> are only used in memory-failure.c, move them to its own file. +> + +**[v1: fs: allow to tuck mounts explicitly](http://lore.kernel.org/linux-fsdevel/20230202-fs-move-mount-replace-v1-0-9b73026d5f10@kernel.org/)** + +> Various distributions are adding or are in the process of adding support +> for system extensions and in the future configuration extensions through +> various tools. A more detailed explanation on system and configuration +> extensions can be found on the manpage which is listed below at [1]. +> + +**[v1: 5.10: xfs backports for 5.10.y (from v5.15.103)](http://lore.kernel.org/linux-fsdevel/20230318101529.1361673-1-amir73il@gmail.com/)** + +> Following backports catch up with recent 5.15.y xfs backports. +> +> Patches 1-3 are the backports from the previous 5.15 xfs backports +> round that Chandan requested for 5.4 [1]. +> +> Patches 4-14 are the SGID fixes that I collaborated with Leah [2]. +> Christian has reviewed the backports of his vfs patches to 5.10. +> + +**[v1: blk: optimization for classic polling](http://lore.kernel.org/linux-fsdevel/3578876466-3733-1-git-send-email-nj.shetty@samsung.com/)** + +> This removes the dependency on interrupts to wake up task. Set task +> state as TASK_RUNNING, if need_resched() returns true, +> while polling for IO completion. +> Earlier, polling task used to sleep, relying on interrupt to wake it up. +> This made some IO take very long when interrupt-coalescing is enabled in +> NVMe. 
+> + +#### Network Devices + +**[v1: net-next: Support MACsec VLAN](http://lore.kernel.org/netdev/20230326072636.3507-1-ehakim@nvidia.com/)** + +> Dear maintainers, +> +> This patch series introduces support for hardware (HW) offload MACsec +> devices with VLAN configuration. The patches address both scenarios +> where the VLAN header is both the inner and outer header for MACsec. +> + +**[v1: return errors other than -ENOMEM to socket](http://lore.kernel.org/netdev/97f19214-ba04-c47e-7486-72e8aa16c690@sberdevices.ru/)** + +> this patchset removes behaviour, where error code returned from any +> transport was always switched to ENOMEM. This works in the same way as +> patch from Bobby Eshleman: +> commit c43170b7e157 ("vsock: return errors other than -ENOMEM to socket"), +> but for receive calls. +> +> vsock_test suite is also updated. +> + +**[v5: net-next: allocate multiple skbuffs on tx](http://lore.kernel.org/netdev/b0d15942-65ba-3a32-ba8d-fed64332d8f6@sberdevices.ru/)** + +> This adds small optimization for tx path: instead of allocating single +> skbuff on every call to transport, allocate multiple skbuff's until +> credit space allows, thus trying to send as much as possible data without +> return to af_vsock.c. +> + +**[v19: net-next: vmxnet3: Add XDP support.](http://lore.kernel.org/netdev/20230325172828.24923-1-witu@nvidia.com/)** + +> The patch adds native-mode XDP support: XDP DROP, PASS, TX, and REDIRECT. +> +> Background: +> The vmxnet3 rx consists of three rings: ring0, ring1, and dataring. +> For r0 and r1, buffers at r0 are allocated using alloc_skb APIs and dma +> mapped to the ring's descriptor. If LRO is enabled and packet size larger +> than 3K, VMXNET3_MAX_SKB_BUF_SIZE, then r1 is used to mapped the rest of +> the buffer larger than VMXNET3_MAX_SKB_BUF_SIZE. Each buffer in r1 is +> allocated using alloc_page.
So for LRO packets, the payload will be in one +> buffer from r0 and multiple from r1, for non-LRO packets, only one +> descriptor in r0 is used for packet size less than 3k. +> + +**[v1: net: stmmac: don't reject VLANs when IFF_PROMISC is set](http://lore.kernel.org/netdev/20230325112815.3053288-1-vladimir.oltean@nxp.com/)** + +> First, take the case of a Linux bridge. If the kernel is compiled with +> CONFIG_BRIDGE_VLAN_FILTERING=y, then this bridge shall have a VLAN +> database. The bridge shall try to call vlan_add_vid() on its bridge +> ports for each VLAN in the VLAN table. It will do this irrespectively of +> whether that port is *currently* VLAN-aware or not. So it will do this +> even when the bridge was created with vlan_filtering 0. +> But the Linux bridge, in VLAN-unaware mode, configures its ports in +> promiscuous (IFF_PROMISC) mode, so that they accept packets with any +> MAC DA (a switch must do this in order to forward those packets which +> are not directly targeted to its MAC address). +> + +**[v1: driver core: class: mark the struct class for sysfs callbacks as constant](http://lore.kernel.org/netdev/20230325084537.3622280-1-gregkh@linuxfoundation.org/)** + +> struct class should never be modified in a sysfs callback as there is +> nothing in the structure to modify, and frankly, the structure is almost +> never used in a sysfs callback, so mark it as constant to allow struct +> class to be moved to read-only memory. +> +> While we are touching all class sysfs callbacks also mark the attribute +> as constant as it can not be modified. The bonding code still uses this +> structure so it can not be removed from the function callbacks. +> + +**[v2: net-next: tools: ynl: fill in some gaps of ethtool spec](http://lore.kernel.org/netdev/20230324225656.3999785-1-sdf@google.com/)** + +> I was trying to fill in the spec while exploring ethtool API for some +> related work. 
I don't think I'll have the patience to fill in the rest, +> so decided to share whatever I currently have. +> + +**[v1: net-next: net: phy: bcm7xxx: use devm_clk_get_optional_enabled to simplify the code](http://lore.kernel.org/netdev/5603487f-3b80-b7ec-dbd2-609fa8020e58@gmail.com/)** + +> Use devm_clk_get_optional_enabled to simplify the code. +> + +**[v4: net-next: ynl: add support for user headers and struct attrs](http://lore.kernel.org/netdev/20230324191900.21828-1-donald.hunter@gmail.com/)** + +> Add support for user headers and struct attrs to YNL. This patchset adds +> features to ynl and add a partial spec for openvswitch that demonstrates +> use of the features. +> + +**[v1: net-next: tools: ynl: default to treating enums as flags for mask generation](http://lore.kernel.org/netdev/20230324190356.2418748-1-kuba@kernel.org/)** + +> I was a bit too optimistic in commit bf51d27704c9 ("tools: ynl: fix +> get_mask utility routine"), not every mask we use is necessarily +> coming from an enum of type "flags". We also allow flipping an +> enum into flags on per-attribute basis. That's done by +> the 'enum-as-flags' property of an attribute. +> + +**[v6: net-next: pds_core driver](http://lore.kernel.org/netdev/20230324190243.27722-1-shannon.nelson@amd.com/)** + +> This patchset implements a new driver for use with the AMD/Pensando +> Distributed Services Card (DSC), intended to provide core configuration +> services through the auxiliary_bus and through a couple of EXPORTed +> functions for use initially in VFio and vDPA feature specific drivers. +> + +**[v1: net-next: selftests: tls: add a test for queuing data before setting the ULP](http://lore.kernel.org/netdev/20230324181757.2407412-1-kuba@kernel.org/)** + +> Other tests set up the connection fully on both ends before +> communicating any data. Add a test which will queue up TLS +> records to TCP before the TLS ULP is installed. 
+> + +**[v1: net-next: net: phy: move getting (R)MII refclock to phylib](http://lore.kernel.org/netdev/0c529488-0fd8-19e1-c5a9-9cf1fab78ed3@gmail.com/)** + +> Few PHY drivers (smsc, bcm7xxx, micrel) get and enable the (R)MII +> reference clock in their probe() callback. Move this common +> functionality to phylib, this allows to remove it from drivers. +> + +**[v1: net-next: net/core: add optional threading for backlog processing](http://lore.kernel.org/netdev/20230324171314.73537-1-nbd@nbd.name/)** + +> When dealing with few flows or an imbalance on CPU utilization, static RPS +> CPU assignment can be too inflexible. Add support for enabling threaded NAPI +> for backlog processing in order to allow the scheduler to better balance +> processing. This helps better spread the load across idle CPUs. +> + +**[v1: net: ice: make writes to /dev/gnssX synchronous](http://lore.kernel.org/netdev/20230324162056.200752-1-mschmidt@redhat.com/)** + +> The current ice driver's GNSS write implementation buffers writes and +> works through them asynchronously in a kthread. That's bad because: +> - The GNSS write_raw operation is supposed to be synchronous[1][2]. +> - There is no upper bound on the number of pending writes. +> Userspace can submit writes much faster than the driver can process, +> consuming unlimited amounts of kernel memory. +> + +**[v4: vdpa_sim: add support for user VA](http://lore.kernel.org/netdev/20230324153607.46836-1-sgarzare@redhat.com/)** + +> This series adds support for the use of user virtual addresses in the +> vDPA simulator devices. +> +> The main reason for this change is to lift the pinning of all guest memory. +> Especially with virtio devices implemented in software.
+> + +**[v1: net: vsock/loopback: use only sk_buff_head.lock to protect the packet queue](http://lore.kernel.org/netdev/20230324115450.11268-1-sgarzare@redhat.com/)** + +> pkt_list_lock was used before commit 71dc9ec9ac7d ("virtio/vsock: +> replace virtio_vsock_pkt with sk_buff") to protect the packet queue. +> After that commit we switched to sk_buff and we are using +> sk_buff_head.lock in almost every place to protect the packet queue +> except in vsock_loopback_work() when we call skb_queue_splice_init(). +> + +**[v1: wpan-next: ieee802154: Handle imited devices](http://lore.kernel.org/netdev/20230324110558.90707-1-miquel.raynal@bootlin.com/)** + +> As rightly pointed out by Alexander a few months ago, ca8210 devices +> will not support sending frames which are not pure datagrams (hardMAC +> wired to the softMAC layer). In order to not confuse users and clarify +> that scanning and beaconing is not supported on these devices, let's add +> a flag to prevent them to be used with the new APIs. +> + +**[v4: bpf-next: xsk: allow remap of fill and/or completion rings](http://lore.kernel.org/netdev/20230324100222.13434-1-nunog@fr24.com/)** + +> The remap of fill and completion rings was frowned upon as they +> control the usage of UMEM which does not support concurrent use. +> At the same time this would disallow the remap of these rings +> into another process. +> +> A possible use case is that the user wants to transfer the socket/ +> UMEM ownership to another process (via SYS_pidfd_getfd) and so +> would need to also remap these rings. 
+> + +**[v1: Introduce a generic regmap-based MDIO driver](http://lore.kernel.org/netdev/20230324093644.464704-1-maxime.chevallier@bootlin.com/)** + +> When the Altera TSE PCS driver was initially introduced, there were +> comments by Russell that the register layout looked very familiar to the +> existing Lynx PCS driver, the only difference being that the TSE PCS +> driver is memory-mapped whereas the Lynx PCS driver sits on an MDIO bus. +> + +**[[PATCH RESEND net-next 0/3] Constify a few sfp/phy fwnodes](http://lore.kernel.org/netdev/ZB1sBYQnqWbGoasq@shell.armlinux.org.uk/)** + +> This series constifies a bunch of fwnode_handle pointers that are only +> used to refer to but not modify the contents of the fwnode structures. +> + +**[v1: net-next: Constify a few sfp/phy fwnodes](http://lore.kernel.org/netdev/ZB1rNMAJ9oLr8myx@shell.armlinux.org.uk/)** + +> This series constifies a bunch of fwnode_handle pointers that are only +> used to refer to but not modify the contents of the fwnode structures. +> + +**[v2: net: dsa: b53: mdio: add support for BCM53134](http://lore.kernel.org/netdev/20230324084138.664285-1-noltari@gmail.com/)** + +> This is based on the initial work from Paul Geurts that was sent to the +> incorrect linux development lists and recipients. +> I've simplified his patches by adding BCM53134 to the is531x5() block since it +> seems that the switch doesn't need a special RGMII config. +> + +**[v1: next: rtlwifi: Replace fake flex-array with flex-array member](http://lore.kernel.org/netdev/ZBz4x+MWoI%2Ff65o1@work/)** + +> Zero-length arrays as fake flexible arrays are deprecated and we are +> moving towards adopting C99 flexible-array members instead.
+> + +**[v2: net-next: net: phy: Improved PHY error reporting in state machine](http://lore.kernel.org/netdev/20230323214559.3249977-1-f.fainelli@gmail.com/)** + +> When the PHY library calls phy_error() something bad has happened, and +> we halt the PHY state machine. Calling phy_error() from the main state +> machine however is not precise enough to know whether the issue is +> reading the link status or starting auto-negotiation. +> + +**[v3: net, refcount: Address dst_entry reference count scalability issues](http://lore.kernel.org/netdev/20230323102649.764958589@linutronix.de/)** + +> This is version 3 of this series. Version 2 can be found here: +> +> https://lore.kernel.org/lkml/20230307125358.772287565@linutronix.de +> +> Wangyang and Arjan reported a bottleneck in the networking code related to +> struct dst_entry::__refcnt. Performance tanks massively when concurrency on +> a dst_entry increases. +> + +**[v2: net-next: sfc: support TC decap rules](http://lore.kernel.org/netdev/cover.1679603051.git.ecree.xilinx@gmail.com/)** + +> This series adds support for offloading tunnel decapsulation TC rules to +> ef100 NICs, allowing matching encapsulated packets to be decapsulated in +> hardware and redirected to VFs. +> For now an encap match must be on precisely the following fields: +> ethertype (IPv4 or IPv6), source IP, destination IP, ipproto UDP, +> UDP destination port. This simplifies checking for overlaps in the +> driver; the hardware supports a wider range of match fields which +> future driver work may expose. +> + +**[v2: net: vmxnet3: use gro callback when UPT is enabled](http://lore.kernel.org/netdev/20230323200721.27622-1-doshir@vmware.com/)** + +> Currently, vmxnet3 uses GRO callback only if LRO is disabled. However, +> on smartNic based setups where UPT is supported, LRO can be enabled +> from guest VM but UPT devicve does not support LRO as of now. In such +> cases, there can be performance degradation as GRO is not being done. 
+> + +#### Rust For Linux + +**[v2: rust: macros: Allow specifying multiple module aliases](http://lore.kernel.org/rust-for-linux/20230224-rust-macros-v2-1-7396e8b7018d@asahilina.net/)** + +> Modules can (and usually do) have multiple alias tags, in order to +> specify multiple possible device matches for autoloading. Allow this by +> changing the alias ModuleInfo field to an `Option<Vec<String>>`. +> + +#### Security Hardening + +**[[RFC/RFT,V2 0/3] Add compiler support for Kernel Control Flow Integrity](http://lore.kernel.org/linux-hardening/20230325081117.93245-1-ashimida.1990@gmail.com/)** + +> This series of patches is mainly used to support the control flow +> integrity protection of the linux kernel [1], which is similar to +> -fsanitize=kcfi in clang 16.0 [2,3]. +> + +**[v1: next: uapi: net: ipv6: Replace fake flex-array with flex-array member](http://lore.kernel.org/linux-hardening/ZBy5bNygP5yxnE9k@work/)** + +> Zero-length arrays as fake flexible arrays are deprecated and we are +> moving towards adopting C99 flexible-array members instead. +> + +**[v1: next: wifi: rndis_wlan: Replace fake flex-array with flexible-array member](http://lore.kernel.org/linux-hardening/ZBtIbU77L9eXqa4j@work/)** + +> Zero-length arrays as fake flexible arrays are deprecated and we are +> moving towards adopting C99 flexible-array members instead. +> +> Address the following warning found with GCC-13 and +> -fstrict-flex-arrays=3 enabled: +> drivers/net/wireless/rndis_wlan.c:2902:23: warning: array subscript 0 is outside array bounds of ‘struct ndis_80211_auth_request[0]’ [-Warray-bounds=] +> + +#### Asynchronous IO + +**[v1: io_uring/rw: transform single vector readv/writev into ubuf](http://lore.kernel.org/io-uring/43cb1fb7-b30b-8df1-bba6-e50797d680c6@kernel.dk/)** + +> It's very common to have applications that use vectored reads or writes, +> even if they only pass in a single segment. Obviously they should be +> using read/write at that point, but...
+> +> Vectored IO comes with the downside of needing to retain iovec state, +> and hence they require and allocation and state copy if they end up +> getting deferred. Additionally, they also require extra cleanup when +> completed as the memory as the allocated state memory has to be freed. +> + +**[v1: liburing: add multishot timeout support](http://lore.kernel.org/io-uring/20230323233632.2376374-1-davidhwei@meta.com/)** + +> Single change to sync the new IORING_TIMEOUT_MULTISHOT flag with kernel. +> +> Mostly unit tests for multishot timeouts. +> + +**[v1: block/io_uring: pass in issue_flags for uring_cmd task_work handling](http://lore.kernel.org/io-uring/c56fc63e-7e6b-480e-dfdc-417b00802f11@kernel.dk/)** + +> io_uring_cmd_done() currently assumes that the uring_lock is held +> when invoked, and while it generally is, this is not guaranteed. +> Pass in the issue_flags associated with it, so that we have +> IO_URING_F_UNLOCKED available to be able to lock the CQ ring +> appropriately when completing events. +> + + +#### BPF + +**[v1: bpf-next: Don't invoke KPTR_REF destructor on NULL xchg](http://lore.kernel.org/bpf/20230325213144.486885-1-void@manifault.com/)** + +> When a map value is being freed, we loop over all of the fields of the +> corresponding BPF object and issue the appropriate cleanup calls +> corresponding to the field's type. If the field is a referenced kptr, we +> atomically xchg the value out of the map, and invoke the kptr's +> destructor on whatever was there before. +> + +**[v1: bpf-next: First set of verifier/*.c migrated to inline assembly](http://lore.kernel.org/bpf/20230325025524.144043-1-eddyz87@gmail.com/)** + +> This is a follow up for RFC [1]. It migrates a first batch of 38 +> verifier/*.c tests to inline assembly and use of ./test_progs for +> actual execution. The migration is done by a python script (see [2]). +> +> Each migrated verifier/xxx.c file is mapped to progs/verifier_xxx.c +> plus an entry in the prog_tests/verifier.c. 
One patch per each file. +> + +**[v1: bpf-next: libbpf: synchronize access to print function pointer](http://lore.kernel.org/bpf/20230325010845.46000-1-inwardvessel@gmail.com/)** + +> This patch prevents races on the print function pointer, allowing the +> libbpf_set_print() function to become thread safe. +> + +**[v2: bpf-next: veristat: add better support of freplace programs](http://lore.kernel.org/bpf/20230324232745.3959567-1-andrii@kernel.org/)** + +> Teach veristat how to deal with freplace BPF programs. As they can't be +> directly loaded by veristat without custom user-space part that sets correct +> target program FD, veristat always fails freplace programs. This patch set +> teaches veristat to guess target program type that will be inherited by +> freplace program itself, and subtitute it for BPF_PROG_TYPE_EXT (freplace) one +> for the purposes of BPF verification. +> + +**[v1: bpf-next: bpftool: Add inline annotations when dumping program CFGs](http://lore.kernel.org/bpf/20230324230209.161008-1-quentin@isovalent.com/)** + +> This set contains some improvements for bpftool's "visual" program dump +> option, which produces the control flow graph in a DOT format. The main +> objective is to add support for inline annotations on such graphs, so that +> we can have the C source code for the program showing up alongside the +> instructions, when available. The last commits also make it possible to +> display the line numbers or the bare opcodes in the graph, as supported by +> regular program dumps. +> + +**[v3: Add ftrace direct call for arm64](http://lore.kernel.org/bpf/20230324171451.2752302-1-revest@chromium.org/)** + +> This series adds ftrace direct call support to arm64. +> This makes BPF tracing programs (fentry/fexit/fmod_ret/lsm) work on arm64. 
+> + +**[v1: capability: test_deny_namespace breakage due to capability conversion to u64](http://lore.kernel.org/bpf/20230324123626.2177476-1-sashal@kernel.org/)** + +> Commit f122a08b197d ("capability: just use a 'u64' instead of a 'u32[2]' +> array") attempts to use BIT_LL() but actually wanted to use BIT_ULL(), +> fix it up to make the test compile and run again. +> + +**[v4: bpf-next: bpf-nex: Add socket destroy capability](http://lore.kernel.org/bpf/20230323200633.3175753-1-aditi.ghag@isovalent.com/)** + +> This patch adds the capability to destroy sockets in BPF. We plan to use +> the capability in Cilium to force client sockets to reconnect when their +> remote load-balancing backends are deleted. The other use case is +> on-the-fly policy enforcement where existing socket connections prevented +> by policies need to be terminated. +> + +**[v2: bpf-next: bpf: add bound tracking for BPF_MOD](http://lore.kernel.org/bpf/20230324045842.729719-1-xukuohai@huaweicloud.com/)** + +> dst_reg is marked as unknown when BPF_MOD instruction is verified, causing +> the following bpf prog to be incorrectly rejected. +> + +**[v12: bpf-next: Transit between BPF TCP congestion controls.](http://lore.kernel.org/bpf/20230323032405.3735486-1-kuifeng@meta.com/)** + +> Previously, BPF struct_ops didn't go off, as even when the user +> program creating it was terminated, none of these ever were pinned. +> For instance, the TCP congestion control subsystem indirectly +> maintains a reference count on the struct_ops of any registered BPF +> implemented algorithm. Thus, the algorithm won't be deactivated until +> someone deliberately unregisters it. For compatibility with other BPF +> programs, bpf_links have been created to work in coordination with +> struct_ops maps. This ensures that the registration and unregistration +> of these respective maps is carried out at the start and end of the +> bpf_link. 
+> + +**[v1: bpf-next: bpf: remember meta->iter info only for initialized iters](http://lore.kernel.org/bpf/20230322232502.836171-1-andrii@kernel.org/)** + +> For iter_new() functions iterator state's slot might not be yet +> initialized, in which case iter_get_spi() will return -ERANGE. This is +> expected and is handled properly. But for iter_next() and iter_destroy() +> cases iter slot is supposed to be initialized and correct, so -ERANGE is +> not possible. +> + +**[v3: bpf-next: bpf: Use bpf_mem_cache_alloc/free in bpf_local_storage](http://lore.kernel.org/bpf/20230322215246.1675516-1-martin.lau@linux.dev/)** + +> This set is a continuation of the effort in using +> bpf_mem_cache_alloc/free in bpf_local_storage [1] +> +> Major change is only using bpf_mem_alloc for task and cgrp storage +> while sk and inode stay with kzalloc/kfree. The details is +> in patch 2. +> +> [1]: https://lore.kernel.org/bpf/20230308065936.1550103-1-martin.lau@linux.dev/ +> + +**[v2: bpf-next: error checking where helpers call bpf_map_ops](http://lore.kernel.org/bpf/20230322194754.185781-1-inwardvessel@gmail.com/)** + +> Within bpf programs, the bpf helper functions can make inline calls to +> kernel functions. In this scenario there can be a disconnect between the +> register the kernel function writes a return value to and the register the +> bpf program uses to evaluate that return value. +> + +**[v3: bpf-next: XDP-hints kfuncs for Intel driver igc](http://lore.kernel.org/bpf/167950085059.2796265.16405349421776056766.stgit@firesoul/)** + +> Implemented XDP-hints metadata kfuncs for Intel driver igc. +> +> Primarily used the tool in tools/testing/selftests/bpf/ xdp_hw_metadata, +> when doing driver development of these features. Recommend other driver +> developers to do the same. In the process xdp_hw_metadata was updated to +> help assist development. I've documented my practical experience with igc +> and tool here[1]. 
+> + +**[v1: net-next: virtio_net: refactor xdp codes](http://lore.kernel.org/bpf/20230322030308.16046-1-xuanzhuo@linux.alibaba.com/)** + +> Due to historical reasons, the implementation of XDP in virtio-net is relatively +> chaotic. For example, the processing of XDP actions has two copies of similar +> code. Such as page, xdp_page processing, etc. +> + +**[v10: bpf-next: Transit between BPF TCP congestion controls.](http://lore.kernel.org/bpf/20230321232813.3376064-1-kuifeng@meta.com/)** + +> Major changes: +> +> - Create bpf_links in the kernel for BPF struct_ops to register and +> unregister it. +> +> - Enables switching between implementations of bpf-tcp-cc under a +> name instantly by replacing the backing struct_ops map of a +> bpf_link. +> + +**[v2: bpf-next: bpf: Support ksym detection in light skeleton.](http://lore.kernel.org/bpf/20230321203854.3035-1-alexei.starovoitov@gmail.com/)** + +> v1->v2: update denylist on s390 +> +> Patch 1: Cleanup internal libbpf names. +> Patch 2: Teach the verifier that rdonly_mem != NULL. +> Patch 3: Fix gen_loader to support ksym detection. +> Patch 4: Selftest and update denylist. +> + +**[v3: bpf-next: bpf-next: Add socket destroy capability](http://lore.kernel.org/bpf/20230321184541.1857363-1-aditi.ghag@isovalent.com/)** + +> This patch adds the capability to destroy sockets in BPF. We plan to use +> the capability in Cilium to force client sockets to reconnect when their +> remote load-balancing backends are deleted. The other use case is +> on-the-fly policy enforcement where existing socket connections prevented +> by policies need to be terminated. +> + +**[v2: bpf: xdp: bpf_xdp_metadata use EOPNOTSUPP for no driver support](http://lore.kernel.org/bpf/167940675120.2718408.8176058626864184420.stgit@firesoul/)** + +> When driver doesn't implement a bpf_xdp_metadata kfunc the fallback +> implementation returns EOPNOTSUPP, which indicate device driver doesn't +> implement this kfunc. 
+> + +**[v1: tracing: Refuse fprobe if RCU is not watching](http://lore.kernel.org/bpf/20230321020103.13494-1-laoar.shao@gmail.com/)** + +> It hits below warning on my test machine when running +> selftests/bpf/test_progs, +> + +**[v2: bpf-next: net: skbuff: skb bitfield compaction - bpf](http://lore.kernel.org/bpf/20230321014115.997841-1-kuba@kernel.org/)** + +> I'm trying to make more of the sk_buff bits optional. +> Move the BPF-accessed bits a little - because they must +> be at coding-time-constant offsets they must precede any +> optional bit. While at it clean up the naming a bit. +> + +### Related Technology Updates + +#### Qemu + +**[v4: for-8.1: target/riscv: rework CPU extensions validation](http://lore.kernel.org/qemu-devel/20230322222004.357013-1-dbarboza@ventanamicro.com/)** + +> In this version I simplified the logic used in write_misa() after +> reviews from Weiwei Li. The patch that handled RVV activation was +> removed, making RVV a regular MISA bit to activate/deactivate. +> + +**[v3: target/riscv: reduce overhead of MSTATUS_SUM change](http://lore.kernel.org/qemu-devel/20230322121240.232303-1-fei2.wu@intel.com/)** + +> Kernel needs to access user mode memory e.g. during syscalls, the window +> is usually opened up for a very limited time through MSTATUS.SUM, the +> overhead is too much if tlb_flush() gets called for every SUM change. +> + +**[v3: qemu: linux-user: Emulate /proc/cpuinfo output for riscv](http://lore.kernel.org/qemu-devel/324c2fd4-7044-0dd9-7ad9-b716fbefa5d9@gmail.com/)** + +> RISC-V does not expose all extensions via hwcaps, thus some userspace +> applications may want to query these via /proc/cpuinfo. +> + +## 20230319: Issue 38 + +### Kernel Updates + +#### RISC-V Architecture Support + +**[v1: Deduplicating RISCV cmpxchg.h macros](http://lore.kernel.org/linux-riscv/20230318080059.1109286-1-leobras@redhat.com/)** + +> While studying riscv's cmpxchg.h file, I got really interested in +> understanding how RISCV asm implemented the different versions of +> {cmp,}xchg.
+> + +**[v1: KVM: RISC-V: Retry fault if vma_lookup() results become invalid](http://lore.kernel.org/linux-riscv/20230317211106.1234484-1-dmatlack@google.com/)** + +> Read mmu_invalidate_seq before dropping the mmap_lock so that KVM can +> detect if the results of vma_lookup() (e.g. vma_shift) become stale +> before it acquires kvm->mmu_lock. This fixes a theoretical bug where a +> VMA could be changed by userspace after vma_lookup() and before KVM +> reads the mmu_invalidate_seq, causing KVM to install page table entries +> based on a (possibly) no-longer-valid vma_shift. +> + +**[v1: riscv: say disabling zicbom if no or bad riscv,cbom-block-size found](http://lore.kernel.org/linux-riscv/20230317134512.254627-1-ben.dooks@codethink.co.uk/)** + +> If Zicbom is present but there was no riscv,cbom-blocks-size property found +> during the cpu feeatures probe, or the cbom-block-size is not valid, then +> the extension will be disabled. Make the print explicitly say this is +> disabled to ensure that there is no confusion about what is being done. +> + +**[v15: -next: riscv: Add vector ISA support](http://lore.kernel.org/linux-riscv/20230317113538.10878-1-andy.chiu@sifive.com/)** + +> This patchset is implemented based on vector 1.0 spec to add vector support +> in riscv Linux kernel. There are some assumptions for this implementations. +> +> 1. We assume all harts has the same ISA in the system. +> 2. We disable vector in both kernel andy user space [1] by default. Only +> enable an user's vector after an illegal instruction trap where it +> actually starts executing vector (the first-use trap [2]). +> 3. We detect "riscv,isa" to determine whether vector is support or not. 
+> +> - [1] https://lore.kernel.org/all/20220921214439.1491510-17-stillson@rivosinc.com/ +> - [2] https://lore.kernel.org/all/73c0124c-4794-6e40-460c-b26df407f322@rivosinc.com/T/#u +> + +**[[PATCH AUTOSEL 4.14] riscv: Bump COMMAND_LINE_SIZE value to 1024](http://lore.kernel.org/linux-riscv/20230316163422.709087-1-sashal@kernel.org/)** + +> [ Upstream commit 61fc1ee8be26bc192d691932b0a67eabee45d12f ] +> +> Increase COMMAND_LINE_SIZE as the current default value is too low +> for syzbot kernel command line. +> +> There has been considerable discussion on this patch that has led to a +> larger patch set removing COMMAND_LINE_SIZE from the uapi headers on all +> ports. That's not quite done yet, but it's gotten far enough we're +> confident this is not a uABI change so this is safe. +> + +**[[PATCH AUTOSEL 5.4] riscv: Bump COMMAND_LINE_SIZE value to 1024](http://lore.kernel.org/linux-riscv/20230316163408.709028-1-sashal@kernel.org/)** + +> [ Upstream commit 61fc1ee8be26bc192d691932b0a67eabee45d12f ] +> +> Increase COMMAND_LINE_SIZE as the current default value is too low +> for syzbot kernel command line. +> +> There has been considerable discussion on this patch that has led to a +> larger patch set removing COMMAND_LINE_SIZE from the uapi headers on all +> ports. That's not quite done yet, but it's gotten far enough we're +> confident this is not a uABI change so this is safe. +> +> [Palmer: it's not uabi] +> + +**[[PATCH AUTOSEL 5.10] riscv: Bump COMMAND_LINE_SIZE value to 1024](http://lore.kernel.org/linux-riscv/20230316163401.708994-1-sashal@kernel.org/)** + +> [ Upstream commit 61fc1ee8be26bc192d691932b0a67eabee45d12f ] +> +> Increase COMMAND_LINE_SIZE as the current default value is too low +> for syzbot kernel command line. +> +> There has been considerable discussion on this patch that has led to a +> larger patch set removing COMMAND_LINE_SIZE from the uapi headers on all +> ports. 
That's not quite done yet, but it's gotten far enough we're +> confident this is not a uABI change so this is safe. +> +> [Palmer: it's not uabi] +> + +**[v8: riscv: Use PUD/P4D/PGD pages for the linear mapping](http://lore.kernel.org/linux-riscv/20230316131711.1284451-1-alexghiti@rivosinc.com/)** + +> This patchset intends to improve tlb utilization by using hugepages for +> the linear mapping. +> +> As reported by Anup in v6, when STRICT_KERNEL_RWX is enabled, we must +> take care of isolating the kernel text and rodata so that they are not +> mapped with a PUD mapping which would then assign wrong permissions to +> the whole region: it is achieved by introducing a new memblock API. +> + +**[v7: Add Ethernet driver for StarFive JH7110 SoC](http://lore.kernel.org/linux-riscv/20230316043714.24279-1-samin.guo@starfivetech.com/)** + +> This series adds ethernet support for the StarFive JH7110 RISC-V SoC. +> The series includes MAC driver. The MAC version is dwmac-5.20 (from +> Synopsys DesignWare). +> The series has been tested on the VisionFive-2-v1.2A and +> VisionFive-2-v1.3B board which equip with JH7110 SoC and works normally. +> +> For more information and support, you can visit RVspace wiki[1]. +> You can simply review or test the patches at the link [2]. +> + +**[v2: Add PLL clocks driver for StarFive JH7110 SoC](http://lore.kernel.org/linux-riscv/20230316030514.137427-1-xingyu.wu@starfivetech.com/)** + +> This patch serises are to add PLL clocks driver and providers by writing +> and reading syscon registers for the StarFive JH7110 RISC-V SoC. +> +> PLL are high speed, low jitter frequency synthesizers in JH7110. +> Each PLL clocks work in integer mode or fraction mode by some dividers, +> and the dividers are set in several syscon registers. 
+> The formula for calculating frequency is: +> Fvco = Fref * (NI + NF) / M / Q1 +> + +**[v4: function_graph: Support recording and printing the return value of function](http://lore.kernel.org/linux-riscv/20230315133911.958741-1-pengdonglin@sangfor.com.cn/)** + +> When using the function_graph tracer to analyze system call failures, +> it can be time-consuming to analyze the trace logs and locate the kernel +> function that first returns an error. This change aims to simplify the +> process by recording the function return value to the 'retval' member of +> 'ftrace_graph_ent' and printing it when outputing the trace log. +> + +**[v1: Enable I2S support for RK3588/RK3588S SoCs](http://lore.kernel.org/linux-riscv/20230315114806.3819515-1-cristian.ciocaltea@collabora.com/)** + +> There are five I2S/PCM/TDM controllers and two I2S/PCM controllers embedded +> in the RK3588 and RK3588S SoCs. Furthermore, RK3588 provides four additional +> I2S/PCM/TDM controllers. +> +> This patch series adds the required device tree nodes to support all the above. +> + +**[v3: Add JH7110 USB and USB PHY driver support](http://lore.kernel.org/linux-riscv/20230315104411.73614-1-minda.chen@starfivetech.com/)** + +> This patchset adds USB driver and USB PHY for the StarFive JH7110 SoC. +> USB work mode is peripheral and using USB 2.0 PHY in VisionFive 2 board. +> The patch has been tested on the VisionFive 2 board. +> + +**[v3: Add JH7110 MIPI DPHY RX support](http://lore.kernel.org/linux-riscv/20230315100421.133428-1-changhuang.liang@starfivetech.com/)** + +> This patchset adds mipi dphy rx driver for the StarFive JH7110 SoC. +> It is used to transfer CSI camera data. The series has been tested on +> the VisionFive 2 board. +> + +**[v1: Add PTP support for sama7g5](http://lore.kernel.org/linux-riscv/20230315095053.53969-1-durai.manickamkr@microchip.com/)** + +> This patch series is intended to add PTP capability to the GEM and +> EMAC for sama7g5. 
+> + +**[v2: perf tools riscv: Add support for riscv lookup_binutils_path](http://lore.kernel.org/linux-riscv/20230315051500.13064-1-p4ranlee@gmail.com/)** + +> Add RISC-V binutils path on lookup triplets. +> + +**[v2: mm: Stop alaising VM_FAULT_HINDEX_MASK in arch code](http://lore.kernel.org/linux-riscv/20230315030359.14162-1-palmer@rivosinc.com/)** + +> When reviewing +> +> I noticed that the arch-specific VM_FAULT flags used by arm and s390 +> alias with VM_FAULT_HINDEX_MASK. I'm not sure if it's possible to +> manifest this as a bug, but it certainly seems fragile. +> +> I'm including that original patch this time in the hope that makes it +> easier for folks to review. There were some boring conflicts so I +> figured I'd rebase rather than pinging again. +> + +**[v6: StarFive's SDIO/eMMC driver support](http://lore.kernel.org/linux-riscv/20230315034853.93677-1-william.qiu@starfivetech.com/)** + +> This patchset adds initial rudimentary support for the StarFive +> designware mobile storage host controller driver. And this driver will +> be used in StarFive's VisionFive 2 board. The main purpose of adding +> this driver is to accommodate the ultra-high speed mode of eMMC. +> + +**[v1: RISCV: CANAAN: Make K210_SYSCTL depend on CLK_K210](http://lore.kernel.org/linux-riscv/20230314211030.3953195-1-Mr.Bossman075@gmail.com/)** + +> CLK_K210 is no longer a dependency of SOC_CANAAN, +> but K210_SYSCTL depends on CLK_K210. This patch makes K210_SYSCTL +> depend on CLK_K210. Also fix whitespace errors. +> + +**[v5: Add watchdog driver for StarFive JH7100/JH7110 RISC-V SoCs](http://lore.kernel.org/linux-riscv/20230314132437.121534-1-xingyu.wu@starfivetech.com/)** + +> This patch serises are to add watchdog driver for the StarFive +> JH7100 and JH7110 RISC-V SoCs. The first patch adds docunmentation to +> describe device tree bindings. The subsequent patch adds watchdog driver +> and support JH7100/JH7110 SoCs. And the last patch adds watchdog node in +> the JH7100 dts. 
And the addition of JH7110 device tree node will be +> submitted after the JH7110 dts merge. This patchset is based on 6.3-rc1. +> +> The watchdog driver has been tested on the VisionFive 1 and VisionFive 2 +> boards which equip with JH7100 and JH7110 SoCs respectively and both +> works normally. +> + +**[v3: Add new partial clock and reset drivers for StarFive JH7110](http://lore.kernel.org/linux-riscv/20230314124404.117592-1-xingyu.wu@starfivetech.com/)** + +> This patch serises are to add new partial clock drivers and reset +> supports about System-Top-Group(STG), Image-Signal-Process(ISP) +> and Video-Output(VOUT) for the StarFive JH7110 RISC-V SoC. +> +> Patches 1 to 3 are about the System-Top-Group clock and reset +> generator(STGCRG) part. +> The first patch adds docunmentation to describe STG bindings, and +> the second patch adds support about STG resets. The last patch adds +> clock driver to support STG clocks for JH7110. +> + +**[v3: Kconfig: Introduce HAS_IOPORT config option](http://lore.kernel.org/linux-riscv/20230314121216.413434-1-schnelle@linux.ibm.com/)** + +> Hello Kernel Hackers, +> +> Some platforms such as s390 do not support PCI I/O spaces. On such platforms +> I/O space accessors like inb()/outb() are stubs that can never actually work. +> The way these stubs are implemented in asm-generic/io.h leads to compiler +> warnings because any use will be a NULL pointer access on these platforms. In +> a previous patch we tried handling this with a run-time warning on access. This +> approach however was rejected by Linus[0] with the argument that this really +> should be a compile-time check and, though a much more invasive change, we +> believe that is indeed the right approach. +> + +**[v6: RISC-V Hibernation Support](http://lore.kernel.org/linux-riscv/20230314050316.31701-1-jeeheng.sia@starfivetech.com/)** + +> This series adds RISC-V Hibernation/suspend to disk support. +> Low level Arch functions were created to support hibernation. 
+> swsusp_arch_suspend() relies code from __cpu_suspend_enter() to write +> cpu state onto the stack, then calling swsusp_save() to save the memory +> image. +> +**[v1: riscv: Handle zicsr/zifencei issues between clang and binutils](http://lore.kernel.org/linux-riscv/20230313-riscv-zicsr-zifencei-fiasco-v1-1-dd1b7840a551@kernel.org/)** +> +> There are two related issues that appear in certain combinations with +> clang and GNU binutils. +> +> The first occurs when a version of clang that supports zicsr or zifencei +> via '-march=' [1] (i.e, >= 17.x) is used in combination with a version +> of GNU binutils that do not recognize zicsr and zifencei in the +> '-march=' value (i.e., < 2.36): +> +**[v3: RISC-V: support some cryptography accelerations](http://lore.kernel.org/linux-riscv/20230313191302.580787-1-heiko.stuebner@vrull.eu/)** +> +> The base is v14 of the vector patchset but the first patches up to doing +> the Zbc-based GCM GHash can also run without those. Of course the vector- +> crypto extensions are also not ratified yet, hence the marking as RFC. +> +**[v5: Basic clock, reset & device tree support for StarFive JH7110 RISC-V SoC](http://lore.kernel.org/linux-riscv/20230311090733.56918-1-hal.feng@starfivetech.com/)** +> +> This patch series adds basic clock, reset & DT support for StarFive +> JH7110 SoC. +> +> You can simply review or test the patches at the link [1]. +> +> [1]: https://github.com/hal-feng/linux/commits/visionfive2-minimal +> +#### Process Scheduling + +**[v2: sched/fair: sanitize vruntime of entity being migrated](http://lore.kernel.org/lkml/20230317160810.107988-1-vincent.guittot@linaro.org/)** +> +> Commit 829c1651e9c4 ("sched/fair: sanitize vruntime of entity being placed") +> fixes an overflowing bug, but ignore a case that se->exec_start is reset +> after a migration.
+> + +**[v1: sched/core: Avoid selecting the task that is throttled to run when core-sched enable](http://lore.kernel.org/lkml/20230316081806.69544-1-jiahao.os@bytedance.com/)** + +> When {rt, cfs}_rq or dl task is throttled, since cookied tasks +> are not dequeued from the core tree, So sched_core_find() and +> sched_core_next() may return throttled task, which may +> cause throttled task to run on the CPU. +> + +**[v1: net/sched: use real_num_tx_queues in dev_watchdog()](http://lore.kernel.org/lkml/20230315183408.2723-1-praveen.kannoju@oracle.com/)** + +> Currently dev_watchdog() loops through num_tx_queues[Number of TX queues +> allocated at alloc_netdev_mq() time] instead of real_num_tx_queues +> [Number of TX queues currently active in device] to detect transmit +> queue time out. Make this efficient by using real_num_tx_queues. +> + +**[v1: sched/deadline: cpuset: Rework DEADLINE bandwidth restoration](http://lore.kernel.org/lkml/20230315121812.206079-1-juri.lelli@redhat.com/)** + +> Qais reported [1] that iterating over all tasks when rebuilding root +> domains for finding out which ones are DEADLINE and need their bandwidth +> correctly restored on such root domains can be a costly operation (10+ +> ms delays on suspend-resume). He proposed we skip rebuilding root +> domains for certain operations, but that approach seemed arch specific +> and possibly prone to errors, as paths that ultimately trigger a rebuild +> might be quite convoluted (thanks Qais for spending time on this!). +> + +**[v1: sched/rt: Reset sysctl_sched_rr_timeslice when it non-positive](http://lore.kernel.org/lkml/20230314031323.3638994-1-yajun.deng@linux.dev/)** + +> When sysctl_sched_rr_timeslice was set a non-positive number, only +> sched_rr_timeslice was reset to default, This behavior should let +> users know. +> +> So reset sysctl_sched_rr_timeslice at the same time when it +> non-positive. 
+> +**[v1: sched/fair: Don't balance migration disabled tasks](http://lore.kernel.org/lkml/20230313065759.39698-1-yangyicong@huawei.com/)** +> +> On load balance we didn't check whether the candidate task is migration +> disabled or not, this may hit the WARN_ON in set_task_cpu() since the +> migration disabled tasks are expected to run on their current CPU. +> +**[v1: sched/fair: scale vruntime delta on migration](http://lore.kernel.org/lkml/20230313021442.115425-1-mathieu.desnoyers@efficios.com/)** +> +> On migration, use the respective runqueue spread of the source and +> destination runqueues to scale the vruntime delta of the scheduling +> entity. +> +> The intent of this change is to prevent a task migrated from a very busy +> runqueue (with vruntime going fast) to a less busy runqueue (with a +> vruntime with a slower pace) to enqueue the migrated task far away at +> the very end of the runqueue, thus increasing the destination runqueue +> spread and preventing the enqueued task from being scheduled for a while +> until the vruntime reaches it. +> +#### Memory Management + +**[v1: convert read_kcore(), vread() to use iterators](http://lore.kernel.org/linux-mm/cover.1679183626.git.lstoakes@gmail.com/)** +> +> While reviewing Baoquan's recent changes to permit vread() access to +> vm_map_ram regions of vmalloc allocations, Willy pointed out [1] that it +> would be nice to refactor vread() as a whole, since its only user is +> read_kcore() and the existing form of vread() necessitates the use of a +> bounce buffer. +> +**[v8: Shadow stacks for userspace](http://lore.kernel.org/linux-mm/20230319001535.23210-1-rick.p.edgecombe@intel.com/)** +> +> This series implements Shadow Stacks for userspace using x86's Control-flow +> Enforcement Technology (CET). CET consists of two related security features: +> shadow stacks and indirect branch tracking. This series implements just the +> shadow stack part of this feature, and just for userspace.
+> + +**[v1: Refactor do_fault_around()](http://lore.kernel.org/linux-mm/cover.1679089214.git.lstoakes@gmail.com/)** + +> Refactor do_fault_around() to avoid bitwise tricks and arather difficult to +> follow logic. Additionally, prefer fault_around_pages to +> fault_around_bytes as the operations are performed at a base page +> granularity. +> + +**[v1: Add results of early memtest to /proc/meminfo](http://lore.kernel.org/linux-mm/CAH2-hcJicFJ0h76JzY2DoLNF+4Nk7vGtk8gQv8JWFikt6X-wfA@mail.gmail.com/)** + +> Currently the memtest results were only presented in dmesg. +> This adds /proc/meminfo entry which can be easily used by scripts. +> + +**[v1: mm/page_alloc: Make deferred page init free pages in MAX_ORDER blocks](http://lore.kernel.org/linux-mm/20230317153501.19807-1-kirill.shutemov@linux.intel.com/)** + +> Normal page init path frees pages during the boot in MAX_ORDER chunks, +> but deferred page init path does it in pageblock blocks. +> +> Change deferred page init path to work in MAX_ORDER blocks. +> + +**[v12: mm,kfence: decouple kfence from page granularity mapping judgement](http://lore.kernel.org/linux-mm/1679066974-690-1-git-send-email-quic_zhenhuah@quicinc.com/)** + +> Kfence only needs its pool to be mapped as page granularity, if it is +> inited early. Previous judgement was a bit over protected. From [1], Mark +> suggested to "just map the KFENCE region a page granularity". So I +> decouple it from judgement and do page granularity mapping for kfence +> pool only. Need to be noticed that late init of kfence pool still requires +> page granularity mapping. 
+> + +**[v3: ACPI: APEI: handle synchronous exceptions with proper si_code](http://lore.kernel.org/linux-mm/20230317072443.3189-1-xueshuai@linux.alibaba.com/)** + +> changes since v2 by addressing comments from Naoya: +> - rename mce_task_work to sync_task_work +> - drop ACPI_HEST_NOTIFY_MCE case in is_hest_sync_notify() +> - add steps to reproduce this problem in cover letter +> - Link: https://lore.kernel.org/lkml/1aa0ca90-d44c-aa99-1e2d-bd2ae610b088@linux.alibaba.com/T/#mb3dede6b7a6d189dc8de3cf9310071e38a192f8e +> + +**[v1: kvm: mmu: move the added page that exists in current lru list to its tail](http://lore.kernel.org/linux-mm/20230317064920.12700-1-jiangjianwen@uniontech.com/)** + +> If the added page existing in current lru list, it's better to move that +> page to the end of that list. This modification can prolong the lifecycle +> of activated page and decrease I/O requirements while memory is limited. +> + +**[v1: kfence, kcsan: avoid passing -g for tests](http://lore.kernel.org/linux-mm/20230316155104.594662-1-elver@google.com/)** + +> This is because `-g` defaults to the compiler debug info default. If the +> assembler does not support some of the directives used, the above errors +> occur. To fix, remove the explicit passing of `-g`. +> +> All these tests want is that stack traces print valid function names, +> and debug info is not required for that. I currently cannot recall why I +> added the explicit `-g`. +> + +**[v1: splice, net: Replace sendpage with sendmsg(MSG_SPLICE_PAGES)](http://lore.kernel.org/linux-mm/20230316152618.711970-1-dhowells@redhat.com/)** + +> [NOTE! This patchset is a work in progress and some modules will not +> compile with it.] 
+> +> I've been looking at how to make pipes handle the splicing in of multipage +> folios and also looking to see if I could implement a suggestion from Willy +> that pipe_buffers could perhaps hold a list of pages (which could make +> splicing simpler - an entire splice segment would go in a single +> pipe_buffer). +> + +**[v11: mm,kfence: decouple kfence from page granularity mapping judgement](http://lore.kernel.org/linux-mm/1678979429-25815-1-git-send-email-quic_zhenhuah@quicinc.com/)** + +> Kfence only needs its pool to be mapped as page granularity, if it is +> inited early. Previous judgement was a bit over protected. From [1], Mark +> suggested to "just map the KFENCE region a page granularity". So I +> decouple it from judgement and do page granularity mapping for kfence +> pool only. Need to be noticed that late init of kfence pool still requires +> page granularity mapping. +> + +**[v10: mm,kfence: decouple kfence from page granularity mapping judgement](http://lore.kernel.org/linux-mm/1678969110-11941-1-git-send-email-quic_zhenhuah@quicinc.com/)** + +> Kfence only needs its pool to be mapped as page granularity, if it is +> inited early. Previous judgement was a bit over protected. From [1], Mark +> suggested to "just map the KFENCE region a page granularity". So I +> decouple it from judgement and do page granularity mapping for kfence +> pool only. Need to be noticed that late init of kfence pool still requires +> page granularity mapping. +> + +**[v1: Additional selftests for restrictedmem](http://lore.kernel.org/linux-mm/cover.1678926164.git.ackerleytng@google.com/)** + +> This is a series containing additional selftests for restrictedmem, +> prepared to be used with the next iteration of the restrictedmem +> series after v10. +> +> restrictedmem v10 is available at +> https://lore.kernel.org/lkml/20221202061347.1070246-1-chao.p.peng@linux.intel.com/T/. 
+> + +**[v1: mm/thp: Rename TRANSPARENT_HUGEPAGE_NEVER_DAX to _UNSUPPORTED](http://lore.kernel.org/linux-mm/20230315171642.1244625-1-peterx@redhat.com/)** + +> TRANSPARENT_HUGEPAGE_NEVER_DAX has nothing to do with DAX. It's set when +> has_transparent_hugepage() returns false, checked in hugepage_vma_check() +> and will disable THP completely if false. Rename it to reflect its real +> purpose. +> + +**[v19: splice, block: Use page pinning and kill ITER_PIPE](http://lore.kernel.org/linux-mm/20230315163549.295454-1-dhowells@redhat.com/)** + +> The first half of this patchset kills off ITER_PIPE to avoid a race between +> truncate, iov_iter_revert() on the pipe and an as-yet incomplete DMA to a +> bio with unpinned/unref'ed pages from an O_DIRECT splice read. This causes +> memory corruption[2]. Instead, we use filemap_splice_read(), which invokes +> the buffered file reading code and splices from the pagecache into the +> pipe; direct_splice_read(), which bulk-allocates a buffer, reads into it +> and then pushes the filled pages into the pipe; or handle it in +> filesystem-specific code. +> + +**[v1: splice: Convert longs and some ints into ssize_t](http://lore.kernel.org/linux-mm/295324.1678898094@warthog.procyon.org.uk/)** + +> Christoph Hellwig wrote: +> +> > The (pre-existing) long here is odd given that ->splice_read +> > returns a ssize_t. This might be a good time to fix that up. +> +> Here's a patch to do that. I'm not sure yet that I've got all the places that +> need changing as there are a couple of function pointer-taking functions where +> the pointed-to function return value should be changed. +> + +**[v1: Randomized slab caches for kmalloc()](http://lore.kernel.org/linux-mm/20230315095459.186113-1-gongruiqi1@huawei.com/)** + +> When exploiting memory vulnerabilities, "heap spraying" is a common +> technique targeting those related to dynamic memory allocation (i.e. the +> "heap"), and it plays an important role in a successful exploitation. 
+> Basically, it is to overwrite the memory area of vulnerable object by +> triggering allocation in other subsystems or modules and therefore +> getting a reference to the targeted memory location. It's usable on +> various types of vulnerablity including use after free (UAF), heap out- +> of-bound write and etc. +> +**[v4: New page table range API](http://lore.kernel.org/linux-mm/20230315051444.3229621-1-willy@infradead.org/)** +> +> This patchset changes the API used by the MM to set up page table entries. +> The four APIs are: +> set_ptes(mm, addr, ptep, pte, nr) +> update_mmu_cache_range(vma, addr, ptep, nr) +> flush_dcache_folio(folio) +> flush_icache_pages(vma, page, nr) +> +> flush_dcache_folio() isn't technically new, but no architecture +> implemented it, so I've done that for you. The old APIs remain around +> but are mostly implemented by calling the new interfaces. +> +#### Security Hardening + +**[v1: next: drm/i915/uapi: Replace fake flex-array with flexible-array member](http://lore.kernel.org/linux-hardening/ZBSu2QsUJy31kjSE@work/)** +> +> Zero-length arrays as fake flexible arrays are deprecated and we are +> moving towards adopting C99 flexible-array members instead. +> +**[v1: next: wifi: carl9170: Replace fake flex-array with flexible-array member](http://lore.kernel.org/linux-hardening/ZBSl2M+aGIO1fnuG@work/)** +> +> Zero-length arrays as fake flexible arrays are deprecated and we are +> moving towards adopting C99 flexible-array members instead. +> +**[v1: next: uapi: target: Replace fake flex-array with flexible-array member](http://lore.kernel.org/linux-hardening/ZBSchMvTdl7VObKI@work/)** +> +> Zero-length arrays as fake flexible arrays are deprecated and we are +> moving towards adopting C99 flexible-array members instead. +> +> This helps with the ongoing efforts to tighten the FORTIFY_SOURCE +> routines on memcpy() and help us make progress towards globally +> enabling -fstrict-flex-arrays=3 [1].
+> +**[v1: mm/slub: reduce the calculation times of 'MAX_OBJS_PER_PAGE'](http://lore.kernel.org/linux-hardening/20230316012517.10479-1-gouhao@uniontech.com/)** +> +> when calling calc_slab_order(), 'slub_min_order' +> and 'size' are fixed values, if the condition of +> 'MAX_OBJS_PER_PAGE' is true, it will be returned from +> here every time. +> +> So we can calculate the condition of 'MAX_OBJS_PER_PAGE' +> before calling calculate_order(). +> +**[v5: x86_64: Improvements at compressed kernel stage](http://lore.kernel.org/linux-hardening/cover.1678785672.git.baskov@ispras.ru/)** +> +> This patchset is aimed +> * to improve UEFI compatibility of compressed kernel code for x86_64 +> * to setup proper memory access attributes for code and rodata sections +> * to implement W^X protection policy throughout the whole execution +> of compressed kernel for EFISTUB code path. +> +**[R: v1: Introduce per-interrupt kernel-stack randomization](http://lore.kernel.org/linux-hardening/414aee3992a54b6c933597bdbf9e0f71@intre.it/)** +> +> > -----Original Message----- +> > From: Jere Viikari +> > To: Ornaghi Davide +> > ; keescook@chromium.org; +> > paulmck@kernel.org; nsaenzju@redhat.com; peterz@infradead.org; +> > bigeasy@linutronix.de; frederic@kernel.org; linux-hardening@vger.kernel.org; +> > linux-kernel@vger.kernel.org +> > +> > I am concerned about the disclaimer. When I replied, I had also to remove all +> > other information to ensure that I did not violate the terms. +> > +> +**[R: v1: Introduce per-interrupt kernel-stack randomization](http://lore.kernel.org/linux-hardening/c2d598d5a11d4a29815a4eca63606159@intre.it/)** +> +> Davide Ornaghi +> Offensive Security Specialist & Intrusion Analyst +> +> T. +39 039 28.45.774 +39 039 96.34.717 +> Intré Security - a venture of Intré S.r.l.
+> www.intre.it +> +#### Asynchronous I/O + +**[v1: for-next: io_uring/kbuf: disallow mapping a badly aligned provided ring buffer](http://lore.kernel.org/io-uring/a0c3e328-badc-3f54-f7ff-b468a316a9d3@kernel.dk/)** +> +> On at least parisc, we have strict requirements on how we virtually map +> an address that is shared between the application and the kernel. On +> these platforms, IOU_PBUF_RING_MMAP should be used when setting up a +> shared ring buffer for provided buffers. If the application is mapping +> these pages and asking the kernel to pin+map them as well, then we have +> no control over what virtual address we get in the kernel. +> +**[[PATCH liburing for-next 0/2] fd msg-ring slot allocation tests](http://lore.kernel.org/io-uring/cover.1678968783.git.asml.silence@gmail.com/)** +> +> Add a helper for fd msg-ring passing a file and auto allocating the +> target index, and test it. +> +**[[v2 PATCH] io_uring: rsrc: Optimize return value variable 'ret'](http://lore.kernel.org/io-uring/20230317182538.3027-1-zeming@nfschina.com/)** +> +> The initialization assignment of the variable ret is changed to 0, only +> in 'goto fail;' Use the ret variable as the function return value. +> +**[v1: io_uring: rsrc: Optimize return value variable 'ret'](http://lore.kernel.org/io-uring/20230316181303.6583-1-zeming@nfschina.com/)** +> +> The function returns here and returns ret directly. It may look better. +> +**[v1: io_uring/sqpoll: Do not set PF_NO_SETAFFINITY on sqpoll threads](http://lore.kernel.org/io-uring/20230314183332.25834-1-mkoutny@suse.com/)** +> +> Users may specify a CPU where the sqpoll thread would run. This may +> conflict with cpuset operations because of strict PF_NO_SETAFFINITY +> requirement. That flag is unnecessary for polling "kernel" threads, see +> the reasoning in commit 01e68ce08a30 ("io_uring/io-wq: stop setting +> PF_NO_SETAFFINITY on io-wq workers"). Drop the flag on poll threads too.
+> + +**[v3: io_uring/ublk: add IORING_OP_FUSED_CMD](http://lore.kernel.org/io-uring/20230314125727.1731233-1-ming.lei@redhat.com/)** + +> Add IORING_OP_FUSED_CMD, it is one special URING_CMD, which has to +> be SQE128. The 1st SQE(master) is one 64byte URING_CMD, and the 2nd +> 64byte SQE(slave) is another normal 64byte OP. For any OP which needs +> to support slave OP, io_issue_defs[op].fused_slave needs to be set as 1, +> and its ->issue() can retrieve/import buffer from master request's +> fused_cmd_kbuf. The slave OP is actually submitted from kernel, part of +> this idea is from Xiaoguang's ublk ebpf patchset, but this patchset +> submits slave OP just like normal OP issued from userspace, that said, +> SQE order is kept, and batching handling is done too. +> + +#### Rust For Linux + +**[v1: Rust version of the VGEM driver](http://lore.kernel.org/rust-for-linux/20230317121213.93991-1-mcanal@igalia.com/)** + +> This is my first take on using the DRM Rust abstractions [1] to convert a DRM +> driver, written originally in C, to Rust. This patchset consists of a conversion +> of the vgem driver to a DRM Rust driver. This new driver has the exactly same +> functionalities of the original C driver, but takes advantages of all the Rust +> features. +> + +**[v1: Rust pin-init API for pinned initialization of structs](http://lore.kernel.org/rust-for-linux/Bk4Yd1TBtgoLg2g_c37V3c_Wt30FMS89z7LrjnfadhDquwG_0dUGz1c_9BlMDmymg0tCACBpmCw-wZxlg4Jl4W2gkorh5P78ePgSnJVR5cU=@protonmail.com/)** + +> This series adds the pin-init API for initializing pinned structs in-place. +> It reduces the need for `unsafe` and streamlines initialization of structs. +> +> The first patch adds a utility macro `quote!` for proc-macros. This macro +> converts the typed characters directly into Rust tokens that are the output +> of proc-macros. It is used by the pin-init API. 
+> + +#### BPF + +**[v8: bpf-next: Transit between BPF TCP congestion controls.](http://lore.kernel.org/bpf/20230318053144.1180301-1-kuifeng@meta.com/)** + +> Major changes: +> +> - Create bpf_links in the kernel for BPF struct_ops to register and +> unregister it. +> +> - Enables switching between implementations of bpf-tcp-cc under a +> name instantly by replacing the backing struct_ops map of a +> bpf_link. +> + +**[v1: bpf-next: error checking where helpers call bpf_map_ops](http://lore.kernel.org/bpf/20230318011324.203830-1-inwardvessel@gmail.com/)** + +> Within bpf programs, the bpf helper functions can make inline calls to +> kernel functions. In this scenario there can be a disconnect between the +> register the kernel function writes a return value to and the register the +> bpf program uses to evaluate that return value. +> + +**[v1: bpf-next: BPF verifier rotating log](http://lore.kernel.org/bpf/20230317220351.2970665-1-andrii@kernel.org/)** + +> This patch set changes BPF verifier log behavior to behave as a rotating log, +> by default. If user-supplied log buffer is big enough to contain entire +> verifier log output, there is no effective difference. But where previously +> user supplied too small log buffer and would get -ENOSPC error result and the +> beginning part of the verifier log, now there will be no error and user will +> get ending part of verifier log filling up user-supplied log buffer. +> + +**[v2: bpf-next: bpf: Add detection of kfuncs.](http://lore.kernel.org/bpf/20230317201920.62030-1-alexei.starovoitov@gmail.com/)** + +> Allow BPF programs detect at load time whether particular kfunc exists. +> +> Patch 1: Allow ld_imm64 to point to kfunc in the kernel. +> Patch 2: Fix relocation of kfunc in ld_imm64 insn when kfunc is in kernel module. +> Patch 3: Introduce bpf_ksym_exists() macro. +> Patch 4: selftest. +> +> NOTE: detection of kfuncs from light skeleton is not supported yet. 
+> + +**[v2: bpf-next: selftests/bpf: add --json-summary option to test_progs](http://lore.kernel.org/bpf/20230317163256.3809328-1-chantr4@gmail.com/)** + +> Currently, test_progs outputs all stdout/stderr as it runs, and when it +> is done, prints a summary. +> +> It is non-trivial for tooling to parse that output and extract meaningful +> information from it. +> +> This change adds a new option, `--json-summary`/`-J` that let the caller +> specify a file where `test_progs{,-no_alu32}` can write a summary of the +> run in a json format that can later be parsed by tooling. +> + +**[v1: usermode_driver: Add management library and API](http://lore.kernel.org/bpf/20230317145240.363908-1-roberto.sassu@huaweicloud.com/)** + +> A User Mode Driver (UMD) is a specialization of a User Mode Helper (UMH), +> which runs a user space process from a binary blob, and creates a +> bidirectional pipe, so that the kernel can make a request to that process, +> and the latter provides its response. It is currently used by bpfilter, +> although it does not seem to do any useful work. +> + +**[v1: bpf-next: XDP-hints kfuncs for Intel driver igc](http://lore.kernel.org/bpf/167906343576.2706833.17489167761084071890.stgit@firesoul/)** + +> Implemented XDP-hints metadata kfuncs for Intel driver igc. +> +> Primarily used the tool in tools/testing/selftests/bpf/ xdp_hw_metadata, +> when doing driver development of these features. Recommend other driver +> developers to do the same. In the process xdp_hw_metadata was updated to +> help assist development. I've documented my practical experience with igc +> and tool here[1]. 
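The usermode_driver entry above describes a request/response exchange between the kernel and a user-space process over a bidirectional pipe. The sketch below mimics only the shape of that protocol, entirely in user space, using Python's `subprocess`. Two things differ from a real UMD: the helper runs an interpreter on inline source rather than a binary blob, and the line-based "upper-case the request" driver is a hypothetical stand-in. It is not bpfilter and not the proposed UMD management API.

```python
import subprocess
import sys

# Hypothetical "driver": answers each newline-terminated request with one line.
DRIVER_SRC = r"""
import sys
for line in sys.stdin:
    sys.stdout.write(line.rstrip("\n").upper() + "\n")
    sys.stdout.flush()  # stdout is a pipe, so flush after every reply
"""


def start_driver() -> subprocess.Popen:
    """Spawn the helper with pipes on both stdin and stdout."""
    return subprocess.Popen(
        [sys.executable, "-c", DRIVER_SRC],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )


def request(proc: subprocess.Popen, msg: str) -> str:
    """Send one request and block until the matching response line arrives."""
    proc.stdin.write(msg + "\n")
    proc.stdin.flush()
    return proc.stdout.readline().rstrip("\n")


def stop_driver(proc: subprocess.Popen) -> int:
    """Closing stdin ends the helper's read loop; return its exit status."""
    proc.stdin.close()
    status = proc.wait()
    proc.stdout.close()
    return status
```

Each request is written and flushed before the caller reads exactly one reply line. This keeps the two pipes in lockstep and avoids deadlock.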
+> +> [1] https://github.com/xdp-project/xdp-project/blob/master/areas/hints/xdp_hints_kfuncs02_driver_igc.org +> + +**[v1: bpf-next: selftests/bpf: Filter out preempt_count_ functions from kprobe_multi bench](http://lore.kernel.org/bpf/20230317114832.13622-1-laoar.shao@gmail.com/)** + +> It's caused by bench test attaching kprobe_multi link to preempt_count_sub +> function, which is not executed in rcu safe context so the kprobe handler +> on top of it will trigger the rcu warning. +> +> Filtering out preempt_count_ functions from the bench test. +> + +**[v2: net: xdp: don't call notifiers during driver init](http://lore.kernel.org/bpf/20230316220234.598091-1-kuba@kernel.org/)** + +> Drivers will commonly perform feature setting during init, if they use +> the xdp_set_features_flag() helper they'll likely run into an ASSERT_RTNL() +> inside call_netdevice_notifiers_info(). +> + +**[v3: bpf-next: xdp: recycle Page Pool backed skbs built from XDP frames](http://lore.kernel.org/bpf/20230313214300.1043280-1-aleksander.lobakin@intel.com/)** + +> Yeah, I still remember that "Who needs cpumap nowadays" (c), but anyway. +> +> __xdp_build_skb_from_frame() missed the moment when the networking stack +> became able to recycle skb pages backed by a page_pool. This was making +> e.g. cpumap redirect even less effective than simple %XDP_PASS. veth was +> also affected in some scenarios. +> A lot of drivers use skb_mark_for_recycle() already, it's been almost +> two years and seems like there are no issues in using it in the generic +> code too. {__,}xdp_release_frame() can be then removed as it loses its +> last user. +> Page Pool becomes then zero-alloc (or almost) in the abovementioned +> cases, too. Other memory type models (who needs them at this point) +> have no changes. +> + +**[v2: bpf-next: Make struct bpf_cpumask RCU safe](http://lore.kernel.org/bpf/20230316054028.88924-1-void@manifault.com/)** + +> The struct bpf_cpumask type is currently not RCU safe.
It uses the +> bpf_mem_cache_{alloc,free}() APIs to allocate and release cpumasks, and +> those allocations may be reused before an RCU grace period has elapsed. +> + +**[v1: module/decompress: Never use kunmap() for local un-mappings](http://lore.kernel.org/bpf/20230315125256.22772-1-fmdefrancesco@gmail.com/)** + +> Use kunmap_local() to unmap pages locally mapped with kmap_local_page(). +> +> kunmap_local() must be called on the kernel virtual address returned by +> kmap_local_page(), differently from how we use kunmap() which instead +> expects the mapped page as its argument. +> +> In module_zstd_decompress() we currently map with kmap_local_page() and +> unmap with kunmap(). This breaks the code and so it should be fixed. +> + +**[v4: net-next: add some detailed data when reading softnet_stat](http://lore.kernel.org/bpf/20230315092041.35482-1-kerneljasonxing@gmail.com/)** + +> Adding more detailed display of softnet_data when catting +> /proc/net/softnet_stat, which could help users understand more about +> which can be the bottleneck and then tune. +> +> Based on what we've discussed in the previous mails, we could implement it +> in different ways, like put those display into separate sysfs file or add +> some tracepoints. Still I chose to touch the legacy file to print more +> useful data without changing some old data, say, length of backlog queues +> and time_squeeze. +> + +**[v1: tools/resolve_btfids: Add libsubcmd to .gitignore](http://lore.kernel.org/bpf/20230315054932.1639169-1-gthelen@google.com/)** + +> After building the kernel I see: +> $ git status -s +> ?? tools/bpf/resolve_btfids/libbpf/ +> +> Commit af03299d8536 ("tools/resolve_btfids: Install subcmd headers") +> started copying header files into +> tools/bpf/resolve_btfids/libsubcmd/include/subcmd. These *.h files are +> not covered by higher level wildcard gitignores.
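This note relates to the softnet_stat entry above. /proc/net/softnet_stat contains one row per CPU of hexadecimal counters. The first three columns are packets processed, packets dropped, and time_squeeze. The parser sketch below names only those three columns and keeps the rest raw, because later columns differ between kernel versions. The sample text is made up.

```python
def parse_softnet_stat(text: str) -> list:
    """Parse /proc/net/softnet_stat-style text: one row per CPU, hex fields."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        values = [int(field, 16) for field in fields]
        rows.append(
            {
                "processed": values[0],
                "dropped": values[1],
                "time_squeeze": values[2],
                "raw": values,  # remaining columns vary by kernel version
            }
        )
    return rows


sample = "0000a1b2 00000000 00000003 00000000\n00000010 00000001 00000000 00000000\n"
for cpu, row in enumerate(parse_softnet_stat(sample)):
    print(cpu, row["processed"], row["dropped"], row["time_squeeze"])
```

On a Linux host, the same function can be applied to `open("/proc/net/softnet_stat").read()`.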
+> + +**[v1: net-next: virtio_net: refactor xdp codes](http://lore.kernel.org/bpf/20230315041042.88138-1-xuanzhuo@linux.alibaba.com/)** + +> Due to historical reasons, the implementation of XDP in virtio-net is relatively +> chaotic. For example, the processing of XDP actions has two copies of similar +> code. Such as page, xdp_page processing, etc. +> +> The purpose of this patch set is to refactor these code. Reduce the difficulty +> of subsequent maintenance. Subsequent developers will not introduce new bugs +> because of some complex logical relationships. +> + +**[v2: dwarves: Support for new btf_type_tag encoding](http://lore.kernel.org/bpf/20230314230417.1507266-1-eddyz87@gmail.com/)** + +> In recent discussion in BPF mailing list ([1], look for Solution #2) +> participants agreed to add a new DWARF representation for +> "btf_type_tag" annotations. +> +> Existing representation is DW_TAG_LLVM_annotation object attached as a +> child to a DW_TAG_pointer_type. It means that "btf_type_tag" +> annotation is attached to a pointee type. +> + +**[v1: bpf/for-next: cgroup: Make current_cgns_cgroup_dfl() safe to call after exit_task_namespace()](http://lore.kernel.org/bpf/ZBDuVWiFj2jiz3i8@slm.duckdns.org/)** + +> 332ea1f697be ("bpf: Add bpf_cgroup_from_id() kfunc") added +> bpf_cgroup_from_id() which calls current_cgns_cgroup_dfl() through +> cgroup_get_from_id(). However, BPF programs may be attached to a point where +> current->nsproxy has already been cleared to NULL by exit_task_namespace() +> and calling bpf_cgroup_from_id() would cause an oops. 
+> + +**[v1: bpf-next: bpf: Allow helpers access ptr_to_btf_id.](http://lore.kernel.org/bpf/20230313235845.61029-1-alexei.starovoitov@gmail.com/)** + +> Allow code like: +> bpf_strncmp(task->comm, 16, "foo"); +> + +### Ecosystem News + +#### Qemu + +**[v3: for-8.1: target/riscv: rework CPU extensions validation](http://lore.kernel.org/qemu-devel/20230318200436.299464-1-dbarboza@ventanamicro.com/)** + +> This new version contains changes suggested by Weiwei Li. I've also +> reworked write_misa() to cover more cases. write_misa() is now able to +> properly enable RVG, RVV and RVE. +> +> A more in-depth description of what was attempted here can be found in +> [1]. Note that the current validation flow already prevents certain misa +> bits from being disabled (e.g. RVF) due to the presence of Z extensions +> that are already enabled in the hart, so I decided not to add extra +> logic to handle these cases. +> + +**[v1: disas/riscv: Add support for XThead* instructions](http://lore.kernel.org/qemu-devel/20230315133510.3511784-1-christoph.muellner@vrull.eu/)** + +> Support for emulating XThead* instruction has been added recently. +> This patch adds support for these instructions to the RISC-V disassembler.
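The QEMU entry above mentions write_misa() enabling RVG, RVV and RVE. In the misa CSR, each single-letter extension occupies bit (letter - 'A'). G is shorthand for IMAFD plus Zicsr and Zifencei, and Z extensions have no misa bits. The Python sketch below builds and sanity-checks such a mask in simplified form. The only dependency rule it checks, D requires F, is an illustrative subset and not QEMU's validation logic.

```python
def misa_bits(extensions: str) -> int:
    """Build a misa-style extension mask from single-letter extension names."""
    mask = 0
    for letter in extensions.upper():
        # "G" has no bit of its own: it expands to IMAFD (+ Zicsr, Zifencei,
        # which are Z extensions and therefore not represented in misa).
        letters = "IMAFD" if letter == "G" else letter
        for ext in letters:
            if not ("A" <= ext <= "Z"):
                raise ValueError(f"not a single-letter extension: {ext!r}")
            mask |= 1 << (ord(ext) - ord("A"))
    return mask


def has_ext(mask: int, letter: str) -> bool:
    """True if the bit for `letter` is set in the mask."""
    return bool(mask & (1 << (ord(letter) - ord("A"))))


def validate(mask: int) -> int:
    """Reject one illustrative invalid combination: D without F."""
    if has_ext(mask, "D") and not has_ext(mask, "F"):
        raise ValueError("D extension requires F")
    return mask


print(hex(validate(misa_bits("GCV"))))
```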
+> + +**[v1: riscv-to-apply queue](http://lore.kernel.org/qemu-devel/20230314063812.30450-1-alistair.francis@opensource.wdc.com/)** + +> The following changes since commit 284c52eec2d0a1b9c47f06c3eee46762c5fc0915: +> +> Merge tag 'win-socket-pull-request' of https://gitlab.com/marcandre.lureau/qemu into staging (2023-03-13 13:44:17 +0000) +> +> are available in the Git repository at: +> +> https://github.com/alistair23/qemu.git tags/pull-riscv-to-apply-20230314 +> +> for you to fetch changes up to 0d581506de803204c5a321100afa270573382932: +> + +#### Buildroot + +**[package/stress-ng: bump to version V0.15.04](http://lore.kernel.org/buildroot/20230312214625.C78EA87007@busybox.osuosl.org/)** + +> commit: https://git.buildroot.net/buildroot/commit/?id=00553ea186357fd3e2b3c89fa560e9711cc67472 +> branch: https://git.buildroot.net/buildroot/commit/?id=refs/heads/master +> +> This commit dropped the patch, included upstream in: +> https://github.com/ColinIanKing/stress-ng/commit/5d419c790e648c7a2f96f34ed1b93b326f725545 +> which was included in V0.14.04. +> +> Three patches are also introduced to fix build issues (all +> upstream not but not yet in version). +> +> Also, this new version now depends on BR2_TOOLCHAIN_HAS_SYNC_4. +> + +**[[branch/next] package/stress-ng: bump to version V0.15.04](http://lore.kernel.org/buildroot/20230312213225.AD3AF86FEC@busybox.osuosl.org/)** + +> commit: https://git.buildroot.net/buildroot/commit/?id=00553ea186357fd3e2b3c89fa560e9711cc67472 +> branch: https://git.buildroot.net/buildroot/commit/?id=refs/heads/next +> +> This commit dropped the patch, included upstream in: +> https://github.com/ColinIanKing/stress-ng/commit/5d419c790e648c7a2f96f34ed1b93b326f725545 +> which was included in V0.14.04. +> +> Three patches are also introduced to fix build issues (all +> upstream not but not yet in version). 
+> + +#### U-Boot + +**[v4: Basic StarFive JH7110 RISC-V SoC support](http://lore.kernel.org/u-boot/20230316025332.3297-1-yanhong.wang@starfivetech.com/)** + +> This series of patches base on the latest branch/master, and add support +> for the StarFive JH7110 RISC-V SoC and VisionFive V2 board. In order for +> this to be achieved, the respective DT nodes have been added, and the +> required defconfigs have been added to the boards' defconfig. What is more, +> the basic required DM drivers have been added, such as reset, clock, pinctrl, +> uart, ram etc. +> + +**["bootelf -p" loads every segment without checking its type](http://lore.kernel.org/u-boot/MA0P287MB0617180B862434E07B83A326B2BE9@MA0P287MB0617.INDP287.PROD.OUTLOOK.COM/)** + +> I am making a toy OS kernel on RISC-V platform. The kernel image I built is an ELF file that should be booted using U-Boot's bootelf command. However, I was getting such error when doing bootelf -p : +> +> Unhandled exception: Store/AMO access fault +> +> My kernel file is a 64-bit ELF file with the structure of: (the full readelf report is attached in the email) +> + +## 20230312: Issue 37 + +### Kernel News + +#### RISC-V Architecture Support + +**[v1: perf tools riscv: Add support for riscv lookup_binutils_path](http://lore.kernel.org/linux-riscv/20230311112122.28894-1-p4ranlee@gmail.com/)** + +> Add to know RISC-V binutils path. +> Secondarily, edit the code block with alphabetical order. +> + +**[v5: Basic clock, reset & device tree support for StarFive JH7110 RISC-V SoC](http://lore.kernel.org/linux-riscv/20230311090733.56918-1-hal.feng@starfivetech.com/)** + +> This patch series adds basic clock, reset & DT support for StarFive +> JH7110 SoC. +> +> You can simply review or test the patches at the link [1].
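The `bootelf -p` report above concerns segments being copied that were never meant to be loaded. An ELF loader driven by program headers normally copies only PT_LOAD segments. The sketch below shows that filtering for an ELF64 image of either byte order, using Python's `struct`. It is not U-Boot's code. It handles only ELF64 and validates nothing beyond the magic, class and data-encoding bytes.

```python
import struct

PT_LOAD = 1


def loadable_segments(elf: bytes) -> list:
    """Return (p_paddr, p_offset, p_filesz, p_memsz) for each PT_LOAD segment."""
    if elf[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if elf[4] != 2:  # EI_CLASS: 2 = ELFCLASS64
        raise ValueError("only ELF64 is handled in this sketch")
    if elf[5] not in (1, 2):  # EI_DATA: 1 = little-endian, 2 = big-endian
        raise ValueError("unknown data encoding")
    endian = "<" if elf[5] == 1 else ">"
    (e_phoff,) = struct.unpack_from(endian + "Q", elf, 32)
    e_phentsize, e_phnum = struct.unpack_from(endian + "HH", elf, 54)
    segments = []
    for i in range(e_phnum):
        # Elf64_Phdr: p_type, p_flags, p_offset, p_vaddr, p_paddr,
        #             p_filesz, p_memsz, p_align (56 bytes in total)
        (p_type, _flags, p_offset, _vaddr, p_paddr,
         p_filesz, p_memsz, _align) = struct.unpack_from(
            endian + "IIQQQQQQ", elf, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD:  # skip PT_NOTE, PT_GNU_STACK, PT_RISCV_ATTRIBUTES, ...
            segments.append((p_paddr, p_offset, p_filesz, p_memsz))
    return segments
```

When p_memsz is larger than p_filesz, a loader also zero-fills the remainder (the .bss part) after copying p_filesz bytes.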
+> +> [1]: https://github.com/hal-feng/linux/commits/visionfive2-minimal +> + +**[v6: Add support for stacked/parallel memories](http://lore.kernel.org/linux-riscv/20230310173217.3429788-1-amit.kumar-mahapatra@amd.com/)** + +> This patch is in the continuation to the discussions which happened on +> 'commit f89504300e94 ("spi: Stacked/parallel memories bindings")' for +> adding dt-binding support for stacked/parallel memories. +> +> This patch series updated the spi-nor, spi core and the spi drivers +> to add stacked and parallel memories support. +> + +**[v3: vdso: Improve cmd_vdso_check to check all dynamic relocations](http://lore.kernel.org/linux-riscv/20230310190750.3323802-1-maskray@google.com/)** + +> The actual intention is that no dynamic relocation exists. However, some +> GNU ld ports produce unneeded R_*_NONE. (If a port fails to determine +> the exact .rel[a].dyn size, the trailing zeros become R_*_NONE +> relocations. E.g. ld's powerpc port recently fixed +> https://sourceware.org/bugzilla/show_bug.cgi?id=29540) R_*_NONE are +> generally no-op in the dynamic loaders. So just ignore them. +> + +**[v1: riscv: relocate R_RISCV_CALL_PLT in kexec_file](http://lore.kernel.org/linux-riscv/20230310182726.GA25154@lst.de/)** + +> Depending on the toolchain (here: gcc-12, binutils-2.40) the +> relocation entries for function calls are no longer R_RISCV_CALL, but +> R_RISCV_CALL_PLT. When trying kexec_load_file on such kernels, it will +> fail with +> + +**[v1: riscv: Kconfig: enable SCHED_MC kconfig](http://lore.kernel.org/linux-riscv/20230310110336.970985-1-suagrfillet@gmail.com/)** + +> RISC-V now builds the sched domain based on the simple possible map. +> +> Enable SCHED_MC to make the building based on cpu_coregroup_mask() +> which also takes care of the NUMA and cores with LLC. 
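This note relates to the kexec_file entry above. R_RISCV_CALL and R_RISCV_CALL_PLT (relocation numbers 18 and 19 in the RISC-V psABI) both describe an AUIPC+JALR instruction pair, so a loader can patch them the same way. The Python sketch below shows that patching. It splits the PC-relative offset into a 20-bit upper part and a 12-bit lower part. The upper part is rounded by adding 0x800, which compensates for JALR sign-extending the lower part. This illustrates the arithmetic only and is not the kernel's kexec code.

```python
R_RISCV_CALL = 18
R_RISCV_CALL_PLT = 19


def patch_call_pair(auipc: int, jalr: int, pc: int, target: int, rtype: int):
    """Return (auipc, jalr) with immediates set so the pair jumps from pc to target."""
    if rtype not in (R_RISCV_CALL, R_RISCV_CALL_PLT):
        raise ValueError(f"unsupported relocation type {rtype}")
    offset = target - pc
    # AUIPC+JALR reach: a signed 32-bit range shifted by the rounding constant.
    if not -(1 << 31) - 0x800 <= offset < (1 << 31) - 0x800:
        raise OverflowError("call target out of range")
    hi = (offset + 0x800) >> 12  # rounded so the low part fits in signed 12 bits
    lo = offset - (hi << 12)     # always in [-2048, 2047]
    auipc = (auipc & 0xFFF) | ((hi & 0xFFFFF) << 12)  # imm[31:12]
    jalr = (jalr & 0xFFFFF) | ((lo & 0xFFF) << 20)    # imm[11:0] in bits 31:20
    return auipc, jalr
```

For example, `auipc ra, 0` (0x00000097) and `jalr ra, 0(ra)` (0x000080e7) patched for a call from 0x1000 to 0x2804 become 0x00002097 and 0x804080e7. The relocation type only gates which relocations are accepted; the encoding is identical for both.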
+> + +**[v7: riscv: Use PUD/P4D/PGD pages for the linear mapping](http://lore.kernel.org/linux-riscv/20230310094539.764357-1-alexghiti@rivosinc.com/)** + +> This patchset intends to improve tlb utilization by using hugepages for +> the linear mapping. +> +> As reported by Anup in v6, when STRICT_KERNEL_RWX is enabled, we must +> take care of isolating the kernel text and rodata so that they are not +> mapped with a PUD mapping which would then assign wrong permissions to +> the whole region: it is achieved by introducing a new memblock API. +> + +**[v2: RISC-V: mm: Support huge page in vmalloc_fault()](http://lore.kernel.org/linux-riscv/20230310075021.3919290-1-dylan@andestech.com/)** + +> Since RISC-V supports ioremap() with huge page (pud/pmd) mapping, +> However, vmalloc_fault() assumes that the vmalloc range is limited +> to pte mappings. To complete the vmalloc_fault() function by adding +> huge page support. +> + +**[v1: Convert users of SOC_MICROCHIP_POLARFIRE to ARCH_MICROCHIP_POLARFIRE](http://lore.kernel.org/linux-riscv/20230309204452.969574-1-conor@kernel.org/)** + +> RISC-V's SOC_FOO symbols for micro-archs are going away, and being +> replaced with the more common ARCH_FOO pattern that is used by other +> archs (and by vendors with a history outside of RISC-V). +> Kick the conversion off by converting the Microchip RISC-V bits to use +> their replacement symbol +> There are no dependencies here, everything can go via subsystem trees. +> We've already added the replacement symbols to RISC-V's Kconfig bits. 
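The linear-mapping and vmalloc_fault() entries above deal with leaf mappings at higher page-table levels. Taking Sv39 purely as an example, a virtual address splits into three 9-bit VPN indices above a 12-bit page offset. A leaf at level 2, 1 or 0 maps 1 GiB, 2 MiB or 4 KiB respectively. This is why a single huge leaf applies the same permissions to its whole region. The Python sketch below shows the split; the function names are our own.

```python
PAGE_SHIFT = 12  # 4 KiB base pages
LEVEL_BITS = 9   # 512 entries per Sv39 page-table page
LEVELS = 3       # Sv39: levels 2 (root), 1, 0


def vpn_indices(va: int) -> list:
    """Return [VPN[2], VPN[1], VPN[0]] for an Sv39 virtual address."""
    return [(va >> (PAGE_SHIFT + LEVEL_BITS * level)) & ((1 << LEVEL_BITS) - 1)
            for level in reversed(range(LEVELS))]


def leaf_size(level: int) -> int:
    """Bytes covered by a leaf PTE found at the given level."""
    return 1 << (PAGE_SHIFT + LEVEL_BITS * level)


def translate(va: int, leaf_pa_base: int, level: int) -> int:
    """Physical address for va, given the base of a leaf mapping at `level`."""
    size = leaf_size(level)
    if leaf_pa_base & (size - 1):
        raise ValueError("superpage base must be aligned to its size")
    return leaf_pa_base | (va & (size - 1))
```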
+> + +**[v1: riscv: Use READ_ONCE_NOCHECK in imprecise unwinding stack mode](http://lore.kernel.org/linux-riscv/20230308091639.602024-1-alexghiti@rivosinc.com/)** + +> When CONFIG_FRAME_POINTER is unset, the stack unwinding function +> walk_stackframe randomly reads the stack and then, when KASAN is enabled, +> + +**[v2: Add JH7110 USB driver support](http://lore.kernel.org/linux-riscv/20230308082800.3008-1-minda.chen@starfivetech.com/)** + +> This patchset adds USB driver for the StarFive JH7110 SoC. +> USB work mode is peripheral and using USB 2.0 PHY in VisionFive 2 board. +> The patch has been tested on the VisionFive 2 board. +> + +**[v5: RISC-V Hibernation Support](http://lore.kernel.org/linux-riscv/20230308080612.122398-1-jeeheng.sia@starfivetech.com/)** + +> This series adds RISC-V Hibernation/suspend to disk support. +> Low level Arch functions were created to support hibernation. +> swsusp_arch_suspend() relies code from __cpu_suspend_enter() to write +> cpu state onto the stack, then calling swsusp_save() to save the memory +> image. +> + +**[v14: riscv, mm: detect svnapot cpu support at runtime](http://lore.kernel.org/linux-riscv/20230308074853.4393-1-panqinglin00@gmail.com/)** + +> Svnapot is a RISC-V extension for marking contiguous 4K pages as a non-4K +> page. This patch set is for using Svnapot in hugetlb fs and huge vmap. +> +> This patchset adds a Kconfig item for using Svnapot in +> "Platform type"->"SVNAPOT extension support". Its default value is on, +> and people can set it off if they don't allow kernel to detect Svnapot +> hardware support and leverage it. +> + +**[v1: Revert "riscv: Set more data to cacheinfo"](http://lore.kernel.org/linux-riscv/20230308064734.512457-1-suagrfillet@gmail.com/)** + +> There are some duplicate cache attributes populations executed +> in both ci_leaf_init() and later cache_setup_properties(). 
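This note relates to the Svnapot entry above. A NAPOT PTE sets the N bit (bit 63) and encodes the region size in the low bits of its PPN. The only size defined so far is 64 KiB, which covers sixteen 4 KiB pages and is encoded as ppn[3:0] = 0b1000. The Python sketch below builds such a PTE value. Permission flags are passed in as raw bits, and the helper names are hypothetical.

```python
PTE_PPN_SHIFT = 10      # the PPN field starts at bit 10 of a Sv39/Sv48/Sv57 PTE
PTE_PPN_BITS = 44       # PPN occupies bits 53:10
PTE_N = 1 << 63         # Svnapot "N" bit
NAPOT_64K_ORDER = 4     # 64 KiB = 2**4 base pages of 4 KiB
NAPOT_64K_ENC = 0b1000  # ppn[3:0] encoding for a 64 KiB NAPOT region


def make_napot_64k_pte(ppn: int, flags: int) -> int:
    """Build a 64 KiB NAPOT PTE for a 64 KiB-aligned physical page number."""
    if ppn & ((1 << NAPOT_64K_ORDER) - 1):
        raise ValueError("PPN must be aligned to 16 pages (64 KiB)")
    if flags & ~0x3FF:
        raise ValueError("flags must fit in PTE bits 9:0")
    napot_ppn = ppn | NAPOT_64K_ENC  # ppn[3:0] was zero; store the size encoding
    return PTE_N | (napot_ppn << PTE_PPN_SHIFT) | flags


def is_napot_64k(pte: int) -> bool:
    """True if the PTE has N set and the 64 KiB size encoding in ppn[3:0]."""
    ppn = (pte >> PTE_PPN_SHIFT) & ((1 << PTE_PPN_BITS) - 1)
    return bool(pte & PTE_N) and (ppn & 0xF) == NAPOT_64K_ENC
```

During translation, hardware replaces ppn[3:0] with the corresponding virtual page number bits. This is why the physical region must be 64 KiB aligned.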
+> +> Revert the commit baf7cbd94b56 ("riscv: Set more data to cacheinfo") +> to setup only the level and type attributes at this early place. +> + +**[v4: irqchip/irq-sifive-plic: Add syscore callbacks for hibernation](http://lore.kernel.org/linux-riscv/20230308064643.24805-1-mason.huo@starfivetech.com/)** + +> The priority and enable registers of plic will be reset +> during hibernation power cycle in poweroff mode, +> add the syscore callbacks to save/restore those registers. +> + +**[v4: Add watchdog driver for StarFive JH7100/JH7110 RISC-V SoCs](http://lore.kernel.org/linux-riscv/20230308034036.99213-1-xingyu.wu@starfivetech.com/)** + +> This patch series is to add watchdog driver for the StarFive +> JH7100 and JH7110 RISC-V SoCs. The first patch adds documentation to +> describe device tree bindings. The subsequent patch adds watchdog driver +> and support JH7100/JH7110 SoCs. And the last patch adds watchdog node in +> the JH7100 dts. And the addition of JH7110 device tree node will be +> submitted after the JH7110 dts merge. This patchset is based on 6.3-rc1. +> + +**[mailbox,soc: mpfs: add support for fallible services (was v3: Hey Jassi, all,)](http://lore.kernel.org/linux-riscv/d7c3ec51-8493-444a-bdec-2a30b0a15bdc@spud/)** + +> On Tue, Mar 07, 2023 at 08:22:50PM +0000, Conor Dooley wrote: +> > +> +> I botched $subject, I blame copy pasting the branch-description from +> lore and not double checking the output of --cover-from-description=auto +> + +**[v17: RISC-V IPI Improvements](http://lore.kernel.org/linux-riscv/20230307173231.2189275-1-apatel@ventanamicro.com/)** + +> This series aims to improve IPI support in Linux RISC-V in following ways: +> 1) Treat IPIs as normal per-CPU interrupts instead of having custom RISC-V +> specific hooks. This also makes Linux RISC-V IPI support aligned with +> other architectures.
+> 2) Remote TLB flushes and icache flushes should prefer local IPIs instead +> of SBI calls whenever we have specialized hardware (such as RISC-V AIA +> IMSIC and RISC-V SWI) which allows S-mode software to directly inject +> IPIs without any assistance from M-mode runtime firmware. +> + +**[v5: Generic IPI sending tracepoint](http://lore.kernel.org/linux-riscv/20230307143558.294354-1-vschneid@redhat.com/)** + +> Detecting IPI *reception* is relatively easy, e.g. using +> trace_irq_handler_{entry,exit} or even just function-trace +> flush_smp_call_function_queue() for SMP calls. +> + +**[v1: RISC-V: enable rust](http://lore.kernel.org/linux-riscv/20230307102441.94417-1-conor.dooley@microchip.com/)** + +> After the authorship debacle on the RFC, I've tried to be even more +> careful this time around. Gary opted for a Co-developed-by in the replies +> of the RFC stuff, so I have given them one. +> I have added SoB's too, but if that is not okay Gary, then please scream +> loudly. +> +> As this is lifted from the state of the Rust-for-Linux tree, the commit +> messages from there cannot be preserved, so these patches have commit +> messages that I wrote. +> + +**[v5: StarFive's SDIO/eMMC driver support](http://lore.kernel.org/linux-riscv/20230307024646.10216-1-william.qiu@starfivetech.com/)** + +> This patchset adds initial rudimentary support for the StarFive +> designware mobile storage host controller driver. And this driver will +> be used in StarFive's VisionFive 2 board. The main purpose of adding +> this driver is to accommodate the ultra-high speed mode of eMMC. +> + +**[v1: RISC-V: Add basic support for the vector extension](http://lore.kernel.org/linux-riscv/20230306222321.1992900-1-conor@kernel.org/)** + +> I've started hitting this in CI while testing Andy's vector enablement +> series. I'm not entirely sure if there is more to do here, other than +> squeezing in the duplicate of what has been done for other extensions. 
+> + +**[v2: KVM: Refactor KVM stats macros and enable custom stat names](http://lore.kernel.org/linux-riscv/20230306190156.434452-1-dmatlack@google.com/)** + +> This series refactors the KVM stats macros to reduce duplication and +> adds the support for choosing custom names for stats. +> +> Custom name makes it possible to decouple the userspace-visible stat +> names from their internal representation in C. This can allow future +> commits to refactor the various stats structs without impacting +> userspace tools that read KVM stats. +> + +**[v5: spi: Add support for stacked/parallel memories](http://lore.kernel.org/linux-riscv/20230306172109.595464-1-amit.kumar-mahapatra@amd.com/)** + +> This patch is in the continuation to the discussions which happened on +> 'commit f89504300e94 ("spi: Stacked/parallel memories bindings")' for +> adding dt-binding support for stacked/parallel memories. +> +> This patch series updated the spi-nor, spi core and the spi drivers +> to add stacked and parallel memories support. +> + +**[v4: Add DMA driver for StarFive JH7110 SoC](http://lore.kernel.org/linux-riscv/20230306140430.28951-1-walker.chen@starfivetech.com/)** + +> This patch series adds dma support for the StarFive JH7110 RISC-V +> SoC. The first patch adds device tree binding. The second patch includes +> dma driver. The last patch adds device node of dma to JH7110 dts. +> +> The series has been tested on the VisionFive 2 board which equip with +> JH7110 SoC and works normally. +> + +**[v14: Microchip Soft IP corePWM driver](http://lore.kernel.org/linux-riscv/20230306094858.1614819-1-conor.dooley@microchip.com/)** + +> v14 is rebased on top of v6.3-rc1. +> +> Uwe & I had a long back and forth about period calculations on v13, +> my ultimate conclusion being that, after some testing of the "corrected" +> calculation in hardware, the original calculation was correct. 
+> I think we had gotten sucked into discussing the calculation of the +> period itself, when we were in fact trying to calculate a bound on the +> period instead. That discussion is here: +> https://lore.kernel.org/linux-pwm/Y+ow8tfAHo1yv1XL@wendy/ +> + +#### Process Scheduling + +**[v1: sched: EEVDF using latency-nice](http://lore.kernel.org/lkml/20230306132521.968182689@infradead.org/)** + +> Ever since looking at the latency-nice patches, I've wondered if EEVDF would +> not make more sense, and I did point Vincent at some older patches I had for +> that (which is where his augmented rbtree thing comes from). +> +> Also, since I really dislike the dual tree, I also figured we could dynamically +> switch between an augmented tree and not (and while I have code for that, +> that's not included in this posting because with the current results I don't +> think we actually need this). +> + +**[v2: sched/fair: sanitize vruntime of entity being migrated](http://lore.kernel.org/lkml/20230306132418.50389-1-zhangqiao22@huawei.com/)** + +> Commit 829c1651e9c4 ("sched/fair: sanitize vruntime of +> entity being placed") fixed an overflow bug, but ignored +> a case that se->exec_start is reset after a migration. +> + +**[v1: sched: push force idled core_pick task to another cpu](http://lore.kernel.org/lkml/1678106502-58189-1-git-send-email-CruzZhao@linux.alibaba.com/)** + +> When a task with the max priority of its rq is force +> idled because of unmatched cookie, we'd better to find +> a suitable cpu for it to run as soon as possible, which +> is idle and cookie matched. In order to achieve this +> goal, we push the task in sched_core_balance(), after +> steal_cookie_task(). +> + +#### Memory Management + +**[v4: mm: introduce Designated Movable Blocks](http://lore.kernel.org/linux-mm/20230311003855.645684-1-opendmb@gmail.com/)** + +> This is essentially a resubmission of v3 rebased with a +> rewritten cover letter to hopefully clarify the submission based +> on feedback and follow-on discussion.
The individual patches +> have not materially changed. +> + +**[v3: use canonical ftrace path whenever possible](http://lore.kernel.org/linux-mm/20230310192050.4096886-1-zwisler@kernel.org/)** + +> v2 here: +> https://lore.kernel.org/linux-trace-kernel/20230215223350.2658616-1-zwisler@google.com/ +> + +**[v4: mm: process/cgroup ksm support](http://lore.kernel.org/linux-mm/20230310182851.2579138-1-shr@devkernel.io/)** + +> So far KSM can only be enabled by calling madvise for memory regions. To +> be able to use KSM for more workloads, KSM needs to have the ability to be +> enabled / disabled at the process / cgroup level. +> + +**[v1: Using MAP_SHARE_VALIDATE in mmap without fd](http://lore.kernel.org/linux-mm/20230310171617.wqnqs42l2viwjsz5@archlinux/)** + +> I have a rather simple question about the MAP_SHARED_VALIDATE flag in mmap. +> When used without a file pointer, EINVAL is returned. Is there a reason for this? +> I researched a bit but could not find anything. I attached a simple patch that adds MAP_SHARE_VALIDATE to the flags switch and checks for invalid flags. +> + +**[v1: io-mapping: Don't disable preempt on RT in io_mapping_map_atomic_wc().](http://lore.kernel.org/linux-mm/20230310162905.O57Pj7hh@linutronix.de/)** + +> io_mapping_map_atomic_wc() disables preemption and pagefaults for historical +> reasons. The conversion to io_mapping_map_local_wc(), which only disables +> migration, cannot be done wholesale because quite some call sites need to be +> updated to accommodate with the changed semantics. +> + +**[v1: mm: memory-failure: correct HWPOISON_INJECT config](http://lore.kernel.org/linux-mm/20230310133843.76883-1-wangkefeng.wang@huawei.com/)** + +> Use IS_ENABLED(CONFIG_HWPOISON_INJECT) to check whether or not to +> enable HWPoison injector module. 
+> + +**[v4: mm,kfence: decouple kfence from page granularity mapping judgement](http://lore.kernel.org/linux-mm/1678440604-796-1-git-send-email-quic_zhenhuah@quicinc.com/)** + +> Kfence only needs its pool to be mapped as page granularity, previous +> judgement was a bit over protected. Decouple it from judgement and do +> page granularity mapping for kfence pool only [1]. +> +> To implement this, also relocate the kfence pool allocation before the +> linear mapping setting up, arm64_kfence_alloc_pool is to allocate phys +> addr, __kfence_pool is to be set after linear mapping set up. +> +> LINK: [1] https://lore.kernel.org/linux-arm-kernel/1675750519-1064-1-git-send-email-quic_zhenhuah@quicinc.com/T/ +> + +**[v2: Ignore non-LRU-based reclaim in memcg reclaim](http://lore.kernel.org/linux-mm/20230309093109.3039327-1-yosryahmed@google.com/)** + +> Upon running some proactive reclaim tests using memory.reclaim, we +> noticed some tests flaking where writing to memory.reclaim would be +> successful even though we did not reclaim the requested amount fully. +> Looking further into it, I discovered that *sometimes* we over-report +> the number of reclaimed pages in memcg reclaim. +> + +**[v17: splice, block: Use page pinning and kill ITER_PIPE](http://lore.kernel.org/linux-mm/20230308165251.2078898-1-dhowells@redhat.com/)** + +> The first half of this patchset kills off ITER_PIPE to avoid a race between +> truncate, iov_iter_revert() on the pipe and an as-yet incomplete DMA to a +> bio with unpinned/unref'ed pages from an O_DIRECT splice read. +> + +**[v16: splice, block: Use page pinning and kill ITER_PIPE](http://lore.kernel.org/linux-mm/20230308143754.1976726-1-dhowells@redhat.com/)** + +> The first half of this patchset kills off ITER_PIPE to avoid a race between +> truncate, iov_iter_revert() on the pipe and an as-yet incomplete DMA to a +> bio with unpinned/unref'ed pages from an O_DIRECT splice read. This causes +> memory corruption[2]. 
Instead, we use filemap_splice_read(), which invokes +> the buffered file reading code and splices from the pagecache into the +> pipe; direct_splice_read(), which bulk-allocates a buffer, reads into it +> and then pushes the filled pages into the pipe; or handle it in +> filesystem-specific code. +> + +**[v1: Prototype for direct map awareness in page allocator](http://lore.kernel.org/linux-mm/20230308094106.227365-1-rppt@kernel.org/)** + +> This is a third attempt to make page allocator aware of the direct map +> layout and allow grouping of the pages that must be unmapped from +> the direct map. +> + +**[v3: mm/damon/paddr: minor code improvement](http://lore.kernel.org/linux-mm/20230308083311.120951-1-wangkefeng.wang@huawei.com/)** + +> Unify folio_put() to make code more clear, and also fix minor issue in +> damon_pa_young(). +> + +**[v11: cachestat: a new syscall for page cache state of files](http://lore.kernel.org/linux-mm/20230308032748.609510-1-nphamcs@gmail.com/)** + +> There is currently no good way to query the page cache state of large +> file sets and directory trees. There is mincore(), but it scales poorly: +> the kernel writes out a lot of bitmap data that userspace has to +> aggregate, when the user really does not care about per-page information +> in that case. The user also needs to mmap and unmap each file as it goes +> along, which can be quite slow as well. +> + +**[v1: mm/slub: Reduce memory consumption in extreme scenarios](http://lore.kernel.org/linux-mm/20230307082811.120774-1-chenjun102@huawei.com/)** + +> If call kmalloc_node with NO __GFP_THISNODE and node[A] with no memory. +> Slub will alloc a slub page which is not belong to A, and put the page +> to kmem_cache_node[page_to_nid(page)]. The page can not be reused +> at next calling, because NULL will be get from get_partial(). +> That makes kmalloc_node consume more memory.
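The cachestat entry above contrasts the proposal with mincore(). mincore() fills one byte per page of a mapping, with the least significant bit set if the page is resident, and leaves the summing to user space. The Python sketch below shows that aggregation step over made-up residency vectors. It does not call mincore() itself, and its result keys are our own names, not cachestat's fields.

```python
def summarize_residency(vec: bytes, page_size: int = 4096) -> dict:
    """Reduce a mincore()-style vector (bit 0 of each byte = resident) to totals."""
    resident = sum(byte & 1 for byte in vec)  # other bits are reserved/undefined
    return {
        "pages": len(vec),
        "resident_pages": resident,
        "resident_bytes": resident * page_size,
    }


def summarize_tree(vectors: dict, page_size: int = 4096) -> dict:
    """Sum per-file summaries, as a user would for a whole directory tree."""
    total = {"pages": 0, "resident_pages": 0, "resident_bytes": 0}
    for vec in vectors.values():
        for key, value in summarize_residency(vec, page_size).items():
            total[key] += value
    return total
```

The per-page vector must still be produced and scanned for every file, even though only the totals are wanted. The cover letter identifies exactly this cost.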
+> + +**[v1: mm/oom_kill: don't kill exiting tasks in oom_kill_memcg_member](http://lore.kernel.org/linux-mm/20230307074808.235649-1-haifeng.xu@shopee.com/)** + +> If oom_group is set, oom_kill_process() invokes oom_kill_memcg_member() +> to kill all processes in the memcg. When scanning tasks in memcg, maybe +> the provided task is marked as oom victim. Also, some tasks are likely +> to release their address space. There is no need to kill the exiting tasks. +> +> In order to handle these tasks which may free memory in the future, add +> a function helper reap_task_will_free_mem() to mark it oom victim and +> queue it in oom reaper. +> + +**[v1: mm: rmap: merge HugeTLB mapcount logic with THPs](http://lore.kernel.org/linux-mm/20230306230004.1387007-1-jthoughton@google.com/)** + +> HugeTLB pages may soon support being mapped with PTEs. To allow for this +> case, merge HugeTLB's mapcount scheme with THP's. +> +> The first patch of this series comes from the HugeTLB high-granularity +> mapping series[1], though with some updates, as the original version +> was buggy[2] and incomplete. +> + +#### File Systems + +**[v3: sunrpc: simplify sysctl registrations](http://lore.kernel.org/linux-fsdevel/20230311233944.354858-1-mcgrof@kernel.org/)** + +> This is my v3 series to simplify sysctl registration for sunrpc. The +> first series was posted just yesterday [0] but 0-day found an issue with +> CONFIG_SUNRPC_DEBUG. After this fix I posted a fix for v2 [1] but alas +> 0-day then found an issue when CONFIG_SUNRPC_DEBUG is disabled. This +> fixes both cases... hopefully that's it. +> + +**[v2: mm: hugetlb: move hugetlb sysctls to its own file](http://lore.kernel.org/linux-fsdevel/20230311074734.123269-1-wangkefeng.wang@huawei.com/)** + +> This moves all hugetlb sysctls to its own file, also kill a +> useless hugetlb_treat_movable_handler() since commit d6cb41cc44c6 +> ("mm, hugetlb: remove hugepages_treat_as_movable sysctl").
+> + +**[v1: s390: simplify sysctl registration](http://lore.kernel.org/linux-fsdevel/20230310234525.3986352-1-mcgrof@kernel.org/)** + +> s390 is the last architecture and one of the last users of +> register_sysctl_table(). It was last because it had one use case +> with dynamic memory allocation and it just required a bit more +> thought. +> + +**[v1: arm: simplify two-level sysctl registration for ctl_isa_vars](http://lore.kernel.org/linux-fsdevel/20230310233521.3971907-1-mcgrof@kernel.org/)** + +> There is no need to declare two tables just to create directories; +> this can easily be done with a prefix path with register_sysctl(). +> +> Simplify this registration. +> + +**[v1: x86: simplify sysctl registrations](http://lore.kernel.org/linux-fsdevel/20230310233248.3965389-1-mcgrof@kernel.org/)** + +> These are trivial conversions to reduce more code and avoid API calls +> that we are deprecating [0]. +> +> [0] https://lore.kernel.org/all/20230310223947.3917711-1-mcgrof@kernel.org/T/#u +> + +**[v1: ppc: simplify sysctl registration](http://lore.kernel.org/linux-fsdevel/20230310232850.3960676-1-mcgrof@kernel.org/)** + +> We can simplify the way we do sysctl registration both by +> reducing the number of lines and also avoiding callers which +> could do recursion. The docs are being updated to help reflect +> this better [0]. +> +> [0] https://lore.kernel.org/all/20230310223947.3917711-1-mcgrof@kernel.org/T/#u +> + +**[v1: ia64: simplify one-level sysctl registration for kdump_ctl_table](http://lore.kernel.org/linux-fsdevel/20230310232416.3958751-1-mcgrof@kernel.org/)** + +> There is no need to declare an extra table just to create a directory; +> this can easily be done with a prefix path with register_sysctl(). +> +> Simplify this registration.
+> + +**[v1: misc filesystems: simplify sysctl registration](http://lore.kernel.org/linux-fsdevel/20230310231206.3952808-1-mcgrof@kernel.org/)** + +> This simplifies sysctl registration for a few misc filesystems according +> to our latest preference / guidance [0]. register_sysctl_table() incurs +> possible recursion and we can avoid that by dealing with flat +> directories with files in them, and having the subdirectories explicitly +> named with register_sysctl(). +> + +**[v1: xfs: simplify two-level sysctl registration for xfs_table](http://lore.kernel.org/linux-fsdevel/20230310230219.3948819-1-mcgrof@kernel.org/)** + +> There is no need to declare two tables just to create directories; +> this can easily be done with a prefix path with register_sysctl(). +> +> Simplify this registration. +> + +**[v1: proc_sysctl: enhance documentation](http://lore.kernel.org/linux-fsdevel/20230310223947.3917711-1-mcgrof@kernel.org/)** + +> Expand documentation to clarify: +> +> o that paths don't need to exist for the new API callers +> o that we *require* callers to keep the memory of +> the table around during the lifetime of the sysctls +> o annotate routines we are trying to deprecate and later remove +> + +**[git pull: common helper for kmap_local_page() users in local filesystems](http://lore.kernel.org/linux-fsdevel/20230310204431.GW3390869@ZenIV/)** + +> kmap_local_page() conversions in local filesystems keep running into +> kunmap_local()+put_page() combinations; we can keep inventing names +> for identical inline helpers, but it's getting rather inconvenient. I've added +> a trivial helper to linux/highmem.h instead. +> + +**[v2: mm: memory-failure: Move memory failure sysctls to its own file](http://lore.kernel.org/linux-fsdevel/20230310035709.16281-1-wangkefeng.wang@huawei.com/)** + +> The sysctl_memory_failure_early_kill and memory_failure_recovery +> are only used in memory-failure.c, so move them to their own file.
+> + +**[v1: filelocks: use mount idmapping for setlease permission check](http://lore.kernel.org/linux-fsdevel/20230309-generic_setlease-use-idmapping-v1-1-6c970395ac4d@kernel.org/)** + +> A user should be allowed to take out a lease via an idmapped mount if +> the fsuid matches the mapped uid of the inode. generic_setlease() is +> checking the unmapped inode uid, causing these operations to be denied. +> + +**[v11: Implement IOCTL to get and optionally clear info about PTEs](http://lore.kernel.org/linux-fsdevel/20230309135718.1490461-1-usama.anjum@collabora.com/)** + +> These patches are based on next-20230307 and UFFD_FEATURE_WP_UNPOPULATED +> patches from Peter. +> +> *Changes in v11* +> - Rebase on top of next-20230307 +> - Base patches on UFFD_FEATURE_WP_UNPOPULATED (https://lore.kernel.org/all/20230306213925.617814-1-peterx@redhat.com) +> - Do a lot of cosmetic changes and review updates +> - Remove ENGAGE_WP + ! GET operation as it can be performed with UFFDIO_WRITEPROTECT +> + +**[v5: epoll: use refcount to reduce ep_mutex contention](http://lore.kernel.org/linux-fsdevel/323de732635cc3513c1837c6cbb98f012174f994.1678312201.git.pabeni@redhat.com/)** + +> The application is multi-threaded, creates a new epoll entry for +> each incoming connection, and does not delete it before the +> connection shutdown - that is, before the connection's fd close(). +> +> Many different threads compete frequently for the epmutex lock, +> affecting the overall performance. +> + +**[v1: MAINTAINERS: repair a malformed T: entry in IDMAPPED MOUNTS](http://lore.kernel.org/linux-fsdevel/20230308143640.9811-1-lukas.bulwahn@gmail.com/)** + +> The T: entries shall be composed of a SCM tree type (git, hg, quilt, stgit +> or topgit) and location. +> +> Add the SCM tree type to the T: entry and reorder the file entries in +> alphabetical order. 
+> + +#### Network Devices + +**[v6: Create common DPLL/clock configuration API](http://lore.kernel.org/netdev/20230312022807.278528-1-vadfed@meta.com/)** + +> Implement common API for clock/DPLL configuration and status reporting. +> The API utilises netlink interface as transport for commands and event +> notifications. This API aims to extend current pin configuration and +> make it flexible and easy to cover special configurations. +> + +**[v2: net-next: net: dsa: mv88e6xxx: accelerate C45 scan](http://lore.kernel.org/netdev/20230311203132.156467-1-klaus.kudielka@gmail.com/)** + +> Starting with commit 1a136ca2e089 ("net: mdio: scan bus based on bus +> capabilities for C22 and C45"), mdiobus_scan_bus_c45() is being called on +> buses with MDIOBUS_NO_CAP. On a Turris Omnia (Armada 385, 88E6176 switch), +> this causes a significant increase in boot time, from 1.6 seconds to 6.3 +> seconds. The boot time stated here is until start of /init. +> + +**[v1: net: phy: smsc: bail out in lan87xx_read_status if genphy_read_status fails](http://lore.kernel.org/netdev/026aa4f2-36f5-1c10-ab9f-cdb17dda6ac4@gmail.com/)** + +> If genphy_read_status fails then further access to the PHY may result +> in unpredictable behavior. To prevent this, bail out immediately if +> genphy_read_status fails. +> + +**[v1: net-next: net: introduce budget_squeeze to help us tune rx behavior](http://lore.kernel.org/netdev/20230311163614.92296-1-kerneljasonxing@gmail.com/)** + +> When we encounter some performance issue and then get lost on how +> to tune the budget limit and time limit in net_rx_action() function, +> we can count the two cases separately to avoid the confusion. +> + +**[v1: net-next: net-sysfs: display two backlog queue len separately](http://lore.kernel.org/netdev/20230311151756.83302-1-kerneljasonxing@gmail.com/)** + +> When debugging, we sometimes need to know exactly which backlog queue +> has grown long enough to cause latency.
+> Thus, we can then separate the display of both. +> + +**[v1: net: wireless: wcn36xx: Add support for pronto-v3](http://lore.kernel.org/netdev/20230311150647.22935-1-sireeshkodali1@gmail.com/)** + +> Pronto-v3 is a WiFi remoteproc found on MSM8953 and other Qualcomm +> platforms. Support for booting the remoteproc has already been merged; +> however, due to a slight change in the register map between v2 and v3, +> the wcn36xx driver does not work on pronto-v3. This patch updates the +> register definitions to make wcn36xx work on pronto-v3 as well. +> + +**[v1: dt-bindings: pinctrl: ti-k3: Move k3.h to arch specific](http://lore.kernel.org/netdev/20230311131325.9750-1-nm@ti.com/)** + +> As discussed in [1], let's do some basic cleanups and move pin ctrl +> definitions to arch folder. +> +> Base: next-20230310 +> +> [1] https://lore.kernel.org/all/c4d53e9c-dac0-8ccc-dc86-faada324beba@linaro.org/ +> + +**[v1: nfc: trf7970a: mark OF related data as maybe unused](http://lore.kernel.org/netdev/20230311111328.251219-1-krzysztof.kozlowski@linaro.org/)** + +> The driver can be compile tested with !CONFIG_OF making certain data +> unused: +> +> drivers/nfc/trf7970a.c:2232:34: error: ‘trf7970a_of_match’ defined but not used [-Werror=unused-const-variable=] +> + +**[v2: RFC: rtw88: Add SDIO support](http://lore.kernel.org/netdev/20230310202922.2459680-1-martin.blumenstingl@googlemail.com/)** + +> Recently the rtw88 driver has gained locking support for the "slow" bus +> types (USB, SDIO) as part of USB support. Thanks to everyone who helped +> make this happen! +> +> Based on the USB work (especially the locking part and various +> bugfixes) this series adds support for SDIO based cards. It's the +> result of a collaboration between Jernej and myself. Neither of us has +> access to the rtw88 datasheets. All of our work is based on studying +> the RTL8822BS and RTL8822CS vendor drivers and trial and error.
+> + +**[v1: net: tunnels: annotate lockless accesses to dev->needed_headroom](http://lore.kernel.org/netdev/20230310191109.2384387-1-edumazet@google.com/)** + +> IP tunnels can apparently update dev->needed_headroom +> in their xmit path. +> +> This patch takes care of three tunnels xmit, and also the +> core LL_RESERVED_SPACE() and LL_RESERVED_SPACE_EXTRA() +> helpers. +> + +**[v8: Add support for NXP bluetooth chipsets](http://lore.kernel.org/netdev/20230310181921.1437890-1-neeraj.sanjaykale@nxp.com/)** + +> This patch adds a driver for NXP bluetooth chipsets. +> +> The driver is based on H4 protocol, and uses serdev APIs. It supports host +> to chip power save feature, which is signalled by the host by asserting +> break over UART TX lines, to put the chip into sleep state. +> + +**[v1: wifi: mt76: mt7921e: Set memory space enable in PCI_COMMAND if unset](http://lore.kernel.org/netdev/20230310170002.200-1-mario.limonciello@amd.com/)** + +> When the BIOS has been configured for Fast Boot, systems with mt7921e +> have non-functional wifi. Turning on Fast boot caused both bus master +> enable and memory space enable bits in PCI_COMMAND not to get configured. +> + +**[v8: iommu/dma: s390 DMA API conversion and optimized IOTLB flushing](http://lore.kernel.org/netdev/20230310-dma_iommu-v8-0-2347dfbed7af@linux.ibm.com/)** + +> This patch series converts s390's PCI support from its platform specific DMA +> API implementation in arch/s390/pci/pci_dma.c to the common DMA IOMMU layer. +> The conversion itself is done in patches 3-4 with patch 2 providing the final +> necessary IOMMU driver improvement to handle s390's special IOTLB flush +> out-of-resource indication in virtualized environments. Patches 1-2 may be +> applied independently. The conversion itself only touches the s390 IOMMU driver +> and s390 arch code moving over remaining functions from the s390 DMA API +> implementation. No changes to common code are necessary. 
+> + +**[v4: net: bnxt_en: reset PHC frequency in free-running mode](http://lore.kernel.org/netdev/20230310151356.678059-1-vadfed@meta.com/)** + +> When using a PHC in shared between multiple hosts, the previous +> frequency value may not be reset and could lead to host being unable to +> compensate the offset with timecounter adjustments. To avoid such state +> reset the hardware frequency of PHC to zero on init. Some refactoring is +> needed to make code readable. +> + +**[v1: net-next: net: Extend address label support](http://lore.kernel.org/netdev/cover.1678448186.git.petrm@nvidia.com/)** + +> IPv4 addresses can be tagged with label strings. Unlike IPv6 addrlabels, +> which are used for prioritization of IPv6 addresses, these "ip address +> labels" are simply tags that the userspace can assign to IP addresses +> arbitrarily. +> +> IPv4 has had support for these tags since before Linux was tracked in GIT. +> However it has never been possible to change the label after it is once +> defined. This limits usefulness of this feature. A userspace that wants to +> change a label might drop and recreate the address, but that disrupts +> routing and is just impractical. +> + +**[RFC: Adding Microchip's LAN865x 10BASE-T1S MAC-PHY driver support to Linux](http://lore.kernel.org/netdev/076fbcec-27e9-7dc2-14cb-4b0a9331b889@microchip.com/)** + +> I would like to add Microchip's LAN865x 10BASE-T1S MAC-PHY driver +> support to Linux kernel. +> (Product link: https://www.microchip.com/en-us/product/LAN8650) +> +> The LAN8650 combines a Media Access Controller (MAC) and an Ethernet PHY +> to access 10BASE‑T1S networks. The common standard Serial Peripheral +> Interface (SPI) is used so that the transfer of Ethernet packets and +> LAN8650 control/status commands are performed over a single, serial +> interface. 
+> + +#### Security Hardening + +**[v1: next: wifi: ath11k: Replace fake flex-array with flexible-array member](http://lore.kernel.org/linux-hardening/ZAe5L5DtmsQxzqRH@work/)** + +> Zero-length arrays as fake flexible arrays are deprecated and we are +> moving towards adopting C99 flexible-array members instead. +> + +**[v1: next: net/mlx4_en: Replace fake flex-array with flexible-array member](http://lore.kernel.org/linux-hardening/ZAZ8mNbphtPyZWM6@work/)** + +> Zero-length arrays as fake flexible arrays are deprecated and we are +> moving towards adopting C99 flexible-array members instead. +> +> Transform zero-length array into flexible-array member in struct +> mlx4_en_rx_desc. +> + +**[v1: next: netxen_nic: Replace fake flex-array with flexible-array member](http://lore.kernel.org/linux-hardening/ZAZ57I6WdQEwWh7v@work/)** + +> Zero-length arrays as fake flexible arrays are deprecated and we are +> moving towards adopting C99 flexible-array members instead. +> +> Transform zero-length array into flexible-array member in struct +> nx_cardrsp_rx_ctx_t.
+> + +**[v1: next: platform/chrome: Replace fake flexible arrays with flexible-array member](http://lore.kernel.org/linux-hardening/ZAZUGBmSLc5wg7AK@work/)** + +> Zero-length arrays as fake flexible arrays are deprecated and we are +> moving towards adopting C99 flexible-array members instead. +> +> Use the DECLARE_FLEX_ARRAY() helper macro to transform zero-length +> arrays in unions with flexible-array members. +> + +**[v1: next: rxrpc: Replace fake flex-array with flexible-array member](http://lore.kernel.org/linux-hardening/ZAZT11n4q5bBttW0@work/)** + +> Zero-length arrays as fake flexible arrays are deprecated and we are +> moving towards adopting C99 flexible-array members instead. +> +> Transform zero-length array into flexible-array member in struct +> rxrpc_ackpacket. +> + +**[v7: arm64: dts: qcom: sm6125: UFS and xiaomi-laurel-sprout support](http://lore.kernel.org/linux-hardening/20230306170817.3806-1-they@mint.lgbt/)** + +> Introduce Universal Flash Storage support on SM6125 and add support for the Xiaomi Mi A3 based on the former platform. +> + +**[v1: VT: Protect KD_FONT_OP_GET_TALL from unbound access](http://lore.kernel.org/linux-hardening/20230306094921.tik5ewne4ft6mfpo@begin/)** + +> In ioctl(KD_FONT_OP_GET_TALL), userland tells through op->height which +> vpitch should be used to copy over the font. In con_font_get, we were +> not checking that it is within the maximum height value, and thus +> userland could make the vc->vc_sw->con_font_get(vc, &font, vpitch); +> call possibly overflow the allocated max_font_size bytes, and the +> copy_to_user(op->data, font.data, c) call possibly read out of that +> allocated buffer. +> + +#### Async I/O + +**[v1: optimise local-tw task resheduling](http://lore.kernel.org/io-uring/cover.1678474375.git.asml.silence@gmail.com/)** + +> io_uring extensively uses task_work, but when a task is waiting +> for multiple CQEs it causes lots of rescheduling.
This series +> is an attempt to optimise it and be a base for future improvements. +> + +**[v1: io_uring/uring_cmd: ensure that device supports IOPOLL](http://lore.kernel.org/io-uring/2349df76-0acb-0a56-bda1-2cb05aa55151@kernel.dk/)** + +> It's possible for a file type to support uring commands, but not +> pollable ones. Hence before issuing one of those, we should check +> that it is supported and error out upfront if it isn't. +> + +**[v2: liburing: sendzc test improvements](http://lore.kernel.org/io-uring/cover.1677993039.git.asml.silence@gmail.com/)** + +> Add affinity, multithreading and the server, and also fix TPC +> performance issues +> + +#### Rust For Linux + +**[v3: scripts: `make rust-analyzer` for out-of-tree modules](http://lore.kernel.org/rust-for-linux/20230307144233.205819-1-varmavinaym@gmail.com/)** + +> Adds support for out-of-tree rust modules to use the `rust-analyzer` +> make target to generate the rust-project.json file. +> + +**[v1: Rust DRM subsystem abstractions (& preview AGX driver)](http://lore.kernel.org/rust-for-linux/20230307-rust-drm-v1-0-917ff5bc80a8@asahilina.net/)** + +> This is my first take on the Rust abstractions for the DRM +> subsystem. It includes the abstractions themselves, some minor +> prerequisite changes to the C side, as well as the drm-asahi GPU driver +> (for reference on how the abstractions are used, but not necessarily +> intended to land together). +> + +**[v1: rust: virtio: add virtio support](http://lore.kernel.org/rust-for-linux/20230307130332.53029-1-daniel.almeida@collabora.com/)** + +> This patch adds virtIO support to the rust crate. This includes the +> capability to create a virtIO driver (through the module_virtio_driver +> macro and the respective Driver trait) as well as initial virtqueue +> support. 
+> + +**[v1: scripts: rust-analyzer: Skip crate module directories](http://lore.kernel.org/rust-for-linux/20230307120736.75492-1-nmi@metaspace.dk/)** + +> When generating rust-analyzer configuration, skip module directories. This fixes +> an issue that occur if we have +> +> - drivers/block/driver.rs +> - drivers/block/driver_mod/mod.rs +> +> If `driver_mod` is a module of the crate `driver`, the directory `driver_mod` +> may not contain `Makefile`, and `generate_rust_analyzer.py` will fail. +> + +#### BPF + +**[v2: bpf-next: Support stashing local kptrs with bpf_kptr_xchg](http://lore.kernel.org/bpf/20230310230743.2320707-1-davemarchevsky@fb.com/)** + +> Local kptrs are kptrs allocated via bpf_obj_new with a type specified in program +> BTF. A BPF program which creates a local kptr has exclusive control of the +> lifetime of the kptr, and, prior to terminating, must: +> +> * free the kptr via bpf_obj_drop +> * If the kptr is a {list,rbtree} node, add the node to a {list, rbtree}, +> thereby passing control of the lifetime to the collection +> +> This series adds a third option: +> +> * stash the kptr in a map value using bpf_kptr_xchg +> + +**[v1: kernel/module: add documentation for try_module_get()](http://lore.kernel.org/bpf/20230310190457.3779415-1-mcgrof@kernel.org/)** + +> There is quite a bit of tribal knowledge around proper use of try_module_get() +> and requiring *somehow* the module to still exist to use this call in a way +> that is safe. Document this bit of tribal knowledge. To be clear, you should +> only use try_module_get() *iff* you are 100% sure the module already does +> exist and is not on its way out. +> + +**[v1: libbpf: Explicitly call write to append content to file](http://lore.kernel.org/bpf/20230310150216.922-1-patteliu@gmail.com/)** + +> Write data to fd by calling "vdprintf", in most implementations +> of the standard library, the data is finally written by the writev syscall. 
+> But "uprobe_events/kprobe_events" does not allow segmented writes, +> so switch the "append_to_file" function to explicit write() call. +> + +**[v1: dwarves: dwarves: improve BTF encoder comparison method](http://lore.kernel.org/bpf/1678459850-16140-1-git-send-email-alan.maguire@oracle.com/)** + +> Currently when looking for function prototype mismatches with a view +> to excluding inconsistent functions, we fall back to a comparison +> between parameter names when the name and number of parameters match. +> This is brittle, as it is sometimes the case that a function has +> multiple type-identical definitions which use different parameters. +> + +**[v1: dwarves: syscall functions in BTF](http://lore.kernel.org/bpf/ZAsBYpsBV0wvkhh0@krava/)** + +> hi, +> with latest pahole fixes we get rid of some syscall functions (with +> __x64_sys_ prefix) and it seems to fall down to 2 cases: +> +> - weak syscall functions generated in kernel/sys_ni.c prevent these syscalls +> to be generated in BTF. +> + +**[v1: bpf-next: bpf: ensure state checkpointing at iter_next() call sites](http://lore.kernel.org/bpf/20230310060149.625887-1-andrii@kernel.org/)** + +> State equivalence check and checkpointing performed in is_state_visited() +> employs certain heuristics to try to save memory by avoiding state checkpoints +> if not enough jumps and instructions happened since last checkpoint. This leads +> to unpredictability of whether a particular instruction will be checkpointed +> and how regularly. While normally this is not causing much problems (except +> inconveniences for predictable verifier tests, which we overcome with +> BPF_F_TEST_STATE_FREQ flag), turns out it's not the case for open-coded +> iterators. +> + +**[v6: bpf-next: Transit between BPF TCP congestion controls.](http://lore.kernel.org/bpf/20230310043812.3087672-1-kuifeng@meta.com/)** + +> Major changes: +> +> - Create bpf_links in the kernel for BPF struct_ops to register and +> unregister it. 
+> +> - Enables switching between implementations of bpf-tcp-cc under a +> name instantly by replacing the backing struct_ops map of a +> bpf_link. +> + +**[v1: bpf: take into account liveness when propagating precision](http://lore.kernel.org/bpf/20230309224131.57449-1-andrii@kernel.org/)** + +> When doing state comparison, if the old state has a register that is not +> marked as REG_LIVE_READ, then we just skip the comparison, regardless of +> the state of the corresponding register in the current state. This is because a +> register not marked REG_LIVE_READ is irrelevant for further program execution and +> correctness. All good here. +> + +**[v2: net-next:pull request: i40e: support XDP multi-buffer](http://lore.kernel.org/bpf/20230309212819.1198218-1-anthony.l.nguyen@intel.com/)** + +> Tirthendu Sarkar says: +> +> This patchset adds multi-buffer support for XDP. Tx side already has +> support for multi-buffer. This patchset focuses on Rx side. The last +> patch contains actual multi-buffer changes while the previous ones are +> preparatory patches. +> + +**[v2: enable bpf_prog_pack allocator for powerpc](http://lore.kernel.org/bpf/20230309180213.180263-1-hbathini@linux.ibm.com/)** + +> Most BPF programs are small, but they consume a page each. For systems +> with busy traffic and many BPF programs, this may also add significant +> pressure on instruction TLB. High iTLB pressure usually slows down the +> whole system causing visible performance degradation for production +> workloads. +> + +**[v2: bpf-next: selftests/bpf: use ifname instead of ifindex in XDP](http://lore.kernel.org/bpf/cover.1678382940.git.lorenzo@kernel.org/)** + +> Use interface name instead of interface index in XDP compliance test tool logs. +> Improve XDP compliance test tool error messages.
+> + +**[v3: security: Always enable integrity LSM](http://lore.kernel.org/bpf/20230309085433.1810314-1-roberto.sassu@huaweicloud.com/)** + +> Since the integrity (including IMA and EVM) functions are currently always +> called by the LSM infrastructure, and always after all LSMs, formalize +> these requirements by introducing a new LSM ordering called LSM_ORDER_LAST, +> and set it for the 'integrity' LSM (patch 1). +> + +**[v1: bpf-next: selftests/bpf: make BPF_CFLAGS stricter with -Wall](http://lore.kernel.org/bpf/20230309054015.4068562-1-andrii@kernel.org/)** + +> Make BPF-side compiler flags stricter by adding -Wall. Fix tons of small +> issues pointed out by compiler immediately after that. That includes newly +> added bpf_for(), bpf_for_each(), and bpf_repeat() macros. +> + +**[v1: Revert "libbpf: Poison strlcpy()"](http://lore.kernel.org/bpf/20230309004836.2808610-1-jesussanp@google.com/)** + +> It added the pragma poison directive to libbpf_internal.h to protect +> against accidental usage of strlcpy but ended up breaking the build for +> toolchains based on libcs which provide the strlcpy() declaration from +> string.h (e.g. uClibc-ng). The include order which causes the issue is: +> + +**[v4: bpf-next: bpf: Refactor release_regno searching logic](http://lore.kernel.org/bpf/20230309004504.1153898-1-davemarchevsky@fb.com/)** + +> Kfuncs marked KF_RELEASE indicate that they release some +> previously-acquired arg. The verifier assumes that such a function will +> only have one arg reg w/ ref_obj_id set, and that that arg is the one to +> be released. Multiple kfunc arg regs have ref_obj_id set is considered +> an invalid state. 
+> + +**[v3: net: ixgbe: Panic during XDP_TX with > 64 CPUs](http://lore.kernel.org/bpf/20230308220756.587317-1-jjh@daedalian.us/)** + +> In commit 'ixgbe: let the xdpdrv work with more than 64 cpus' +> (4fe815850bdc), support was added to allow XDP programs to run on systems +> with more than 64 CPUs by locking the XDP TX rings and indexing them +> using cpu % 64 (IXGBE_MAX_XDP_QS). +> +> Upon trying out this patch via the Intel 5.18.6 out-of-tree driver +> on a system with more than 64 cores, the kernel panicked with an +> array-index-out-of-bounds at the return in ixgbe_determine_xdp_ring in +> ixgbe.h, which means ixgbe_determine_xdp_q_idx was just returning the +> cpu instead of cpu % IXGBE_MAX_XDP_QS. +> + +**[v5: bpf-next: BPF open-coded iterators](http://lore.kernel.org/bpf/20230308184121.1165081-1-andrii@kernel.org/)** + +> Add support for open-coded (aka inline) iterators in the BPF world. This is the next +> evolution of gradually allowing more powerful and less restrictive looping and +> iteration capabilities to BPF programs. +> + +**[v3: bpf: xsk: Add missing overflow check in xdp_umem_reg](http://lore.kernel.org/bpf/20230308174013.1114745-1-kal.conley@dectris.com/)** + +> The number of chunks can overflow u32. Make sure to return -EINVAL on +> overflow. +> +> Also remove a redundant u32 cast assigning umem->npgs. +> + +**[v1: net-next: net: stmmac: call stmmac_finalize_xdp_rx() on a condition](http://lore.kernel.org/bpf/20230308162619.329372-1-lsahn@ooseel.net/)** + +> The current codebase calls the function whether or not the net device has XDP +> programs. So the finalize function is called every time the RX +> bottom half runs. It needs a few machine instructions for nothing +> in the case that XDP programs are not attached at all. +> + +**[v1: xsk: Add missing overflow check in xdp_umem_reg](http://lore.kernel.org/bpf/20230308105130.1113833-1-kal.conley@dectris.com/)** + +> The number of chunks can overflow u32.
Make sure to return -EINVAL on +> overflow. +> + +**[v2: bpf-next: bpf: Use bpf_mem_cache_alloc/free in bpf_local_storage](http://lore.kernel.org/bpf/20230308065936.1550103-1-martin.lau@linux.dev/)** + +> This set is to use bpf_mem_cache_alloc/free in bpf_local_storage. +> The primary motivation is to solve the deadlock/recursion issue +> when bpf_task_storage is used in a bpf tracing prog [1]. This set +> also comes with a micro-benchmark to test the storage creation. +> + +**[[PATCH net, stable v1 0/3] add checking sq is full inside xdp xmit](http://lore.kernel.org/bpf/20230308024935.91686-1-xuanzhuo@linux.alibaba.com/)** + +> If the queue of xdp xmit is not an independent queue, then when the xdp +> xmit used all the desc, the xmit from the __dev_queue_xmit() may encounter +> the following error. +> + +**[v4: net-next: udp: introduce __sk_mem_schedule() usage](http://lore.kernel.org/bpf/20230308021153.99777-1-kerneljasonxing@gmail.com/)** + +> Keep the accounting schema consistent across different protocols +> with __sk_mem_schedule(). Besides, it adjusts a little bit on how +> to calculate forward allocated memory compared to before. After +> applied this patch, we could avoid receive path scheduling extra +> amount of memory. +> + +**[v1: bpf-next: net: skbuff: skb bitfield compaction - bpf](http://lore.kernel.org/bpf/20230308003159.441580-1-kuba@kernel.org/)** + +> I'm trying to make more of the sk_buff bits optional. +> Move the BPF-accessed bits a little - because they must +> be at coding-time-constant offsets they must precede any +> optional bit. While at it clean up the naming a bit. +> + +**[[RFC/PATCHSET 0/9] perf record: Implement BPF sample filter (v4)](http://lore.kernel.org/bpf/20230307233309.3546160-1-namhyung@kernel.org/)** + +> There have been requests for more sophisticated perf event sample +> filtering based on the sample data. 
Recently the kernel added BPF +> programs can access perf sample data and this is the userspace part +> to enable such a filtering. +> +> This still has some rough edges and needs more improvements. But +> I'd like to share the current work and get some feedback for the +> directions and idea for further improvements. +> + +### 周边技术动态 + +#### Qemu + +**[v1: Add RISC-V vector cryptographic instruction set support](http://lore.kernel.org/qemu-devel/20230310160346.1193597-1-lawrence.hunter@codethink.co.uk/)** + +> NB: this is an update over the patch series submitted today (2023/03/10) at 09:11. It fixes some accidental mangling of commits 02, 04 and 08/45. +> +> This patchset provides an implementation for Zvkb, Zvkned, Zvknh, Zvksh, Zvkg, and Zvksed of the draft RISC-V vector cryptography extensions as per the 20230303 version of the specification(1) (1fcbb30). Please note that the Zvkt data-independent execution latency extension has not been implemented, and we would recommend not using these patches in an environment where timing attacks are an issue. +> + +**[v2: target/riscv: Add RVV registers to log](http://lore.kernel.org/qemu-devel/20230309135403.102703-1-ivan.klokov@syntacore.com/)** + +> Added QEMU option 'rvv' to add RISC-V RVV registers to log like regular regs. +> + +**[v5: target/riscv: refactor Zicond and reuse in XVentanaCondOps](http://lore.kernel.org/qemu-devel/20230307180708.302867-1-philipp.tomsich@vrull.eu/)** + +> After the original Zicond support was stuck/fell through the cracks on +> the mailing list at v3 (and a different implementation was merged in +> the meanwhile), we now refactor Zicond and then reuse it in +> XVentanaCondOps. +> + +**[v3: hw/riscv: Add ACT related support](http://lore.kernel.org/qemu-devel/20230307032915.10059-1-liweiwei@iscas.ac.cn/)** + +> ACT tests play an important role in riscv tests. This patch tries to +> add related support to run ACT tests. 
+> +> The port is available here: +> https://github.com/plctlab/plct-qemu/tree/plct-act-upstream-v3 +> + +**[v1: Sixth RISC-V PR for 8.0](http://lore.kernel.org/qemu-devel/20230306220259.7748-1-palmer@rivosinc.com/)** + +> The following changes since commit 2946e1af2704bf6584f57d4e3aec49d1d5f3ecc0: +> +> configure: Disable thread-safety warnings on macOS (2023-03-04 14:03:46 +0000) +> +> are available in the Git repository at: +> +> https://gitlab.com/palmer-dabbelt/qemu.git tags/pull-riscv-to-apply-20230306 +> + +**[v1: qemu: linux-user: Emulate /proc/cpuinfo output for riscv](http://lore.kernel.org/qemu-devel/167811752616.21558.7117682501860352029-0@git.sr.ht/)** + +> RISC-V does not expose all extensions via hwcaps, thus some userspace +> applications may want to query these via /proc/cpuinfo. +> +> Currently when querying this file the host's file is shown instead +> which is slightly confusing. Emulate a basic /proc/cpuinfo file +> with mmu info and an ISA string. +> + +## 20230305: Issue 36 + +### Kernel news + +#### RISC-V architecture support + +**[v1: dt-bindings: yamllint: Require a space after a comment '#'](http://lore.kernel.org/linux-riscv/20230303214223.49451-1-robh@kernel.org/)** + +> Enable yamllint to check the preferred commenting style of requiring a +> space after a comment character '#'. Fix the cases in the tree which +> have a warning with this enabled. +> + +**[v5: RISC-V: Don't check text_mutex during stop_machine](http://lore.kernel.org/linux-riscv/20230303143754.4005217-1-conor.dooley@microchip.com/)** + +> We're currently using stop_machine() to update ftrace & kprobes, which +> means that the thread that takes text_mutex during may not be the same +> as the thread that eventually patches the code. +> + +**[v3: Add basic ACPI support for RISC-V](http://lore.kernel.org/linux-riscv/20230303133647.845095-1-sunilvl@ventanamicro.com/)** + +> This patch series enables the basic ACPI infrastructure for RISC-V.
+> Supporting external interrupt controllers is in progress and hence it is +> tested using poll based HVC SBI console and RAM disk. +> + +**[GIT PULL: RISC-V Patches for the 6.3 Merge Window, Part 2](http://lore.kernel.org/linux-riscv/mhng-030bb7f3-2a9b-4061-8f41-e3e20c9b1671@palmer-ri-x1c9/)** + +> merged tag 'riscv-for-linus-6.3-mw1' +> The following changes since commit 01687e7c935ef70eca69ea2d468020bc93e898dc: +> +> Merge tag 'riscv-for-linus-6.3-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux (2023-02-25 11:14:08 -0800) +> + +**[v5: Add Ethernet driver for StarFive JH7110 SoC](http://lore.kernel.org/linux-riscv/20230303085928.4535-1-samin.guo@starfivetech.com/)** + +> This series adds ethernet support for the StarFive JH7110 RISC-V SoC. +> The series includes MAC driver. The MAC version is dwmac-5.20 (from +> Synopsys DesignWare). For more information and support, you can visit +> RVspace wiki[1]. +> +> You can simply review or test the patches at the link [2]. +> + +**[v3: lib/test_string.c: Add strncmp() tests](http://lore.kernel.org/linux-riscv/20230302071934.254111-1-bjorn@kernel.org/)** + +> The RISC-V strncmp() fails on some inputs, see the linked thread for +> more details. It turns out there were no strncmp() calls in the self +> tests, this adds one. +> +> Reported-by: Heiko Stübner +> + +**[v6: riscv: Use PUD/P4D/PGD pages for the linear mapping](http://lore.kernel.org/linux-riscv/20230301082552.274331-1-alexghiti@rivosinc.com/)** + +> This patchset intends to improve tlb utilization by using hugepages for +> the linear mapping. +> +> base-commit-tag: v6.2-rc7 +> + +**[v3: Add RISC-V 32 NOMMU support](http://lore.kernel.org/linux-riscv/20230301002657.352637-1-Mr.Bossman075@gmail.com/)** + +> This patch-set aims to add NOMMU support to RV32. +> Many people want to build simple emulators or HDL +> models of RISC-V this patch makes it possible to +> run linux on them. +> +> Yimin Gu is the original author of this set. 
+> + +**[v1: RISC-V: T-Head vector handling](http://lore.kernel.org/linux-riscv/20230228215435.3366914-1-heiko@sntech.de/)** + +> As is widely known the T-Head C9xx cores used for example in the +> Allwinner D1 implement an older non-ratified variant of the vector spec. +> +> While userspace will probably have a lot more problems implementing +> support for both, on the kernel side the needed changes are actually +> somewhat small'ish and can be handled via alternatives somewhat nicely. +> + +**[v8: riscv: Allow to downgrade paging mode from the command line](http://lore.kernel.org/linux-riscv/20230228154629.240541-1-alexghiti@rivosinc.com/)** + +> This new version gets rid of the limitation that prevented KASAN kernels +> to use the newly introduced parameters. +> +> While looking into KASLR, I fell onto commit aacd149b6238 ("arm64: head: +> avoid relocating the kernel twice for KASLR"): it allows to use the fdt +> functions very early in the boot process with KASAN enabled by simply +> compiling a new version of those functions without instrumentation. +> + +**[v1: riscv: support ELF format binaries in nommu mode](http://lore.kernel.org/linux-riscv/20230228135126.1686427-1-gerg@kernel.org/)** + +> The following changes add the ability to run ELF format binaries when +> running RISC-V in nommu mode. That support is actually part of the +> ELF-FDPIC loader, so these changes are all about making that work on +> RISC-V. +> + +**[v2: RISC-V: support some cryptography accelerations](http://lore.kernel.org/linux-riscv/20230228000544.2234136-1-heiko@sntech.de/)** + +> So this was my playground the last days. +> +> The base is v13 of the vector patchset but the first patches up to doing +> the Zbc-based GCM GHash can also run without those. Of course the vector- +> crypto extensions are also not ratified yet, hence the marking as RFC. 
+> +> As v13 of the vector patchset dropped the patches for in-kernel usage of +> vector instructions, I picked the ones from v12 over into this series +> for now. +> + +**[v5: hwmon: Add StarFive JH71X0 temperature sensor](http://lore.kernel.org/linux-riscv/20230227134125.120638-1-hal.feng@starfivetech.com/)** + +> This adds a driver for the temperature sensor on the JH7100 and JH7110, +> RISC-V SoCs by StarFive Technology Co. Ltd. The JH7100 is used on the +> BeagleV Starlight board and StarFive VisionFive board. The JH7110 is +> used on the StarFive VisionFive 2 board. +> + +**[v3: Add DMA driver for StarFive JH7110 SoC](http://lore.kernel.org/linux-riscv/20230227131042.16125-1-walker.chen@starfivetech.com/)** + +> This patch series adds dma support for the StarFive JH7110 RISC-V +> SoC. The first patch adds device tree binding. The second patch includes +> dma driver. The last patch adds device node of dma to JH7110 dts. +> +> The series has been tested on the VisionFive 2 board which is equipped with +> JH7110 SoC and works normally. +> + +**[v1: sched/doc: supplement CPU capacity with RISC-V](http://lore.kernel.org/linux-riscv/20230227105941.2749193-1-suagrfillet@gmail.com/)** + +> This commit 7d2078310cbf ("dt-bindings: arm: move cpu-capacity to a +> shared loation") updates some references about capacity-dmips-mhz +> property in this document. +> + +#### Process scheduling + +**[v2: sched/debug: Put sched/domains files under the verbose flag](http://lore.kernel.org/lkml/20230303183754.3076321-1-pauld@redhat.com/)** + +> The debug files under sched/domains can take a long time to regenerate, +> especially when updates are done one at a time. Move these files under +> the sched verbose debug flag. Allow changes to verbose to trigger +> generation of the files. This lets a user batch the updates but still +> have the information available. The detailed topology printk messages +> are also under verbose.
+ +**[v3: REBASE: sched/numa: Enhance vma scanning](http://lore.kernel.org/lkml/cover.1677672277.git.raghavendra.kt@amd.com/)** + +> The patchset proposes one of the enhancements to numa vma scanning +> suggested by Mel. This is continuation of [3]. +> +> Reposting the rebased patchset to akpm mm-unstable tree (March 1) +> +> Existing mechanism of scan period involves, scan period derived from +> per-thread stats. Process Adaptive autoNUMA [1] proposed to gather NUMA +> fault stats at per-process level to capture application behaviour better. +> + +**[v1: sched/core: Use do-while instead of for loop in set_nr_if_polling](http://lore.kernel.org/lkml/20230228161426.4508-1-ubizjak@gmail.com/)** + +> Use equivalent do-while loop instead of infinite for loop. +> +> There are no asm code changes. +> + +**[v3: sched/numa: Enhance vma scanning](http://lore.kernel.org/lkml/cover.1677557481.git.raghavendra.kt@amd.com/)** + +> The patchset proposes one of the enhancements to numa vma scanning +> suggested by Mel. This is continuation of [3]. +> +> Existing mechanism of scan period involves, scan period derived from +> per-thread stats. Process Adaptive autoNUMA [1] proposed to gather NUMA +> fault stats at per-process level to capture application behaviour better. +> + +#### Memory management + +**[v3: fold per-CPU vmstats remotely](http://lore.kernel.org/linux-mm/20230303195841.310844446@redhat.com/)** + +> By having vmstat_shepherd flush the per-CPU counters to the +> global counters from remote CPUs. +> +> This is done using cmpxchg to manipulate the counters, +> both CPU locally (via the account functions), +> and remotely (via cpu_vm_stats_fold). +> +> Thanks to Aaron Tomlin for diagnosing issue 1 and writing +> the initial patch series. +> + +**[v2: mm/damon/paddr: minor code improvement](http://lore.kernel.org/linux-mm/20230303084343.171958-1-wangkefeng.wang@huawei.com/)** + +> Unify folio_put() to make code more clear.
+> + +**[v2: dma-buf: system_heap: avoid reclaim for order 4](http://lore.kernel.org/linux-mm/20230303050332.10138-1-jaewon31.kim@samsung.com/)** + +> Using order 4 pages would be helpful for IOMMUs mapping, but trying to +> get order 4 pages could spend quite much time in the page allocation. +> From the perspective of responsiveness, the deterministic memory +> allocation speed, I think, is quite important. +> + +**[v1: mm: compaction: limit illegal input parameters of compact_memory interface](http://lore.kernel.org/linux-mm/202303030844412743985@zte.com.cn/)** + +> Available only when CONFIG_COMPACTION is set. When 1 is written to +> the file, all zones are compacted such that free memory is available +> in contiguous blocks where possible. +> But echo others-parameter > compact_memory, this function will be +> triggered by writing parameters to the interface. +> + +**[v1: tmpfs: add the option to disable swap](http://lore.kernel.org/linux-mm/20230302232758.888157-1-mcgrof@kernel.org/)** + +> After a couple of RFCs I think this is ready for PATCH form. Review +> is appreciated. Below the changes I also list the series of tests +> I performed to verify correctness. In short you either create a fs +> with swap or without, but if you can't change that option later. +> If we really wanted to, we could work on accepting this change on +> reconfigure (remount) but its not clear yet that is desirable so +> for now keep things simple. +> + +**[v1: mm: teach mincore_hugetlb about pte markers](http://lore.kernel.org/linux-mm/20230302222404.175303-1-jthoughton@google.com/)** + +> By checking huge_pte_none(), we incorrectly classify PTE markers as +> "present". Instead, check huge_pte_none_mostly(), classifying PTE +> markers the same as if the PTE were completely blank. +> +> PTE markers, unlike other kinds of swap entries, don't reference any +> physical page and don't indicate that a physical page was mapped +> previously. 
As such, treat them as non-present for the sake of +> mincore(). +> + +**[v1: mm/userfaultfd: propagate uffd-wp bit when PTE-mapping the huge zeropage](http://lore.kernel.org/linux-mm/20230302175423.589164-1-david@redhat.com/)** + +> Currently, we'd lose the userfaultfd-wp marker when PTE-mapping a huge +> zeropage, resulting in the next write faults in the PMD range +> not triggering uffd-wp events. +> +> Various actions (partial MADV_DONTNEED, partial mremap, partial munmap, +> partial mprotect) could trigger this. However, most importantly, +> un-protecting a single sub-page from the userfaultfd-wp handler when +> processing a uffd-wp event will PTE-map the shared huge zeropage and +> lose the uffd-wp bit for the remainder of the PMD. +> + +**[v1: -next: mm/damon/paddr: minor refactor of damon_pa_pageout()](http://lore.kernel.org/linux-mm/20230302144926.40012-1-wangkefeng.wang@huawei.com/)** + +> Omit two lines by converting if(!folio_isolate_lru()) to +> if(folio_isolate_lru()). +> + +**[v2: mm/debug_vm_pgtable: Replace pte_mkhuge() with arch_make_huge_pte()](http://lore.kernel.org/linux-mm/20230302114845.421674-1-anshuman.khandual@arm.com/)** + +> Since the following commit arch_make_huge_pte() should be used directly in +> generic memory subsystem as a platform provided page table helper, instead +> of pte_mkhuge(). Change hugetlb_basic_tests() to call arch_make_huge_pte() +> directly, and update its relevant documentation entry as required. +> + +**[v1: migrate_pages: silence gcc notes for mis-casting](http://lore.kernel.org/linux-mm/20230302012610.17055-1-ying.huang@intel.com/)** + +> The following GCC notes was reported for commit 64c8902ed441 +> ("migrate_pages: split unmap_and_move() to _unmap() and _move()"). +> + +**[v1: maple_tree: export symbol mas_preallocate()](http://lore.kernel.org/linux-mm/20230302011035.4928-1-dakr@redhat.com/)** + +> Fix missing EXPORT_SYMBOL_GPL() statement for mas_preallocate(). 
+> + +**[v3: kcov: improve documentation](http://lore.kernel.org/linux-mm/72be5c215c275f35891229b90622ed859f196a46.1677684837.git.andreyknvl@google.com/)** + +> Improve KCOV documentation: +> +> - Use KCOV instead of kcov, as the former is more widely-used. +> +> - Mention Clang in compiler requirements. +> +> - Use ``annotations`` for inline code. +> +> - Rework remote coverage collection documentation for better clarity. +> +> - Various smaller changes. +> + +**[v5: mm: ioremap: Convert architectures to take GENERIC_IOREMAP way](http://lore.kernel.org/linux-mm/20230301034247.136007-1-bhe@redhat.com/)** + +> Motivation and implementation: +> In this patchset, firstly introduce generic_ioremap_prot() and +> generic_iounmap() to extract the generic codes for GENERIC_IOREMAP. +> By taking GENERIC_IOREMAP method, the generic generic_ioremap_prot(), +> generic_iounmap(), and their generic wrapper ioremap_prot(), ioremap() +> and iounmap() are all visible and available to arch. Arch needs to +> provide wrapper functions to override the generic version if there's +> arch specific handling in its corresponding ioremap_prot(), ioremap() +> or iounmap(). With these changes, duplicated ioremap/iounmap() code under +> ARCH-es are removed, and the equivalent functionality is kept as before. +> + +**[v3: New page table range API](http://lore.kernel.org/linux-mm/20230228213738.272178-1-willy@infradead.org/)** + +> This patchset changes the API used by the MM to set up page table entries. +> The four APIs are: +> set_ptes(mm, addr, ptep, pte, nr) +> update_mmu_cache_range(vma, addr, ptep, nr) +> flush_dcache_folio(folio) +> flush_icache_pages(vma, page, nr) +> +> flush_dcache_folio() isn't technically new, but no architecture +> implemented it, so I've done that for you. The old APIs remain around +> but are mostly implemented by calling the new interfaces.
+> + +**[v1: tomoyo: replace tomoyo_round2() with kmalloc_size_roundup()](http://lore.kernel.org/linux-mm/20230228093556.19027-1-vbabka@suse.cz/)** + +> It seems tomoyo has had its own implementation of what +> kmalloc_size_roundup() does today. Remove the function tomoyo_round2() +> and replace it with kmalloc_size_roundup(). It provides more accurate +> results and doesn't contain a while loop. +> + +**[v2: bpf-next: mm/bpf/perf: Store build id in inode object](http://lore.kernel.org/linux-mm/20230228093206.821563-1-jolsa@kernel.org/)** + +> hi, +> this is RFC patchset for adding build id under inode's object. +> +> The main change to previous post [1] is to use inode object instead of file +> object for build id data. +> +> However.. ;-) while using inode as build id storage place saves some memory +> by keeping just one copy of the build id for all file instances, there seems +> to be another problem. +> + +**[v1: Ignore non-LRU-based reclaim in memcg reclaim](http://lore.kernel.org/linux-mm/20230228085002.2592473-1-yosryahmed@google.com/)** + +> Reclaimed pages through other means than LRU-based reclaim are tracked +> through reclaim_state in struct scan_control, which is stashed in +> current task_struct. These pages are added to the number of reclaimed +> pages through LRUs. For memcg reclaim, these pages generally cannot be +> linked to the memcg under reclaim and can cause an overestimated count +> of reclaimed pages. This short series tries to address that. +> + +**[v2: mm/uffd: UFFD_FEATURE_WP_UNPOPULATED](http://lore.kernel.org/linux-mm/20230227230044.1596744-1-peterx@redhat.com/)** + +> This is a new feature that controls how uffd-wp handles none ptes. When +> it's set, the kernel will handle anonymous memory the same way as file +> memory, by allowing the user to wr-protect unpopulated ptes. 
+> + +#### File systems + +**[v1: security: Move IMA and EVM to the LSM infrastructure](http://lore.kernel.org/linux-fsdevel/20230303181842.1087717-1-roberto.sassu@huaweicloud.com/)** + +> This patch set depends on: +> - https://lore.kernel.org/linux-integrity/20221201104125.919483-1-roberto.sassu@huaweicloud.com/ (there will be a v8 shortly) +> - https://lore.kernel.org/linux-security-module/20230217032625.678457-1-paul@paul-moore.com/ +> +> IMA and EVM are not effectively LSMs, especially due to the fact that in the +> past they could not provide a security blob while there is another LSM +> active. +> + +**[v1: folio_copy_tail](http://lore.kernel.org/linux-fsdevel/20230303064315.701090-1-willy@infradead.org/)** + +> I'm trying to make it easy & efficient for a filesystem to read its file +> tails into a folio. iomap's implementation was pretty good, but had +> some limitations (eg tails couldn't cross a page boundary). +> +> This should be an all-singing, all-dancing implementation which copies +> the correct part of the buffer into the correct part of the folio and +> zeroes the remainder of the folio. It should work with highmem, but +> the calculations are a bit tricky and I may have got something wrong. +> + +**[cifs test patch to make cifs use its own version of write_cache_pages()](http://lore.kernel.org/linux-fsdevel/522532.1677800499@warthog.procyon.org.uk/)** + +> Here's my patch to give cifs its own copy of write_cache_pages() so that the +> function pointer can be eliminated in case some sort of spectre thing is +> causing a slowdown. +> +> This goes on top of "cifs test patch to convert to using write_cache_pages()".
+> + +**[v1: sysctl: deprecate register_sysctl_paths()](http://lore.kernel.org/linux-fsdevel/20230302202826.776286-1-mcgrof@kernel.org/)** + +> As we trim down the insane kernel/sysctl.c large array and move +> sysctls out we're looking to optimize the way we do sysctl registrations +> so we deal with just flat entries so to make the registration code +> much easier to maintain and so it does not recurse. In dealing with +> some of these things it reminded us that we will eventually get to the +> point of just passing in the ARRAY_SIZE() we want, to get there we +> should strive to move away from the older callers that do need the +> recursion. +> + +**[v1: printk: serial: 8250: implement non-BKL console](http://lore.kernel.org/linux-fsdevel/87wn3zsz5x.fsf@jogness.linutronix.de/)** + +> Implement the necessary callbacks to allow the 8250 console driver +> to perform as a non-BKL console. Remove the implementation for the +> legacy console callback (write) and add implementations for the +> non-BKL consoles (write_atomic, write_thread, port_lock) and add +> CON_NO_BKL to the initial flags. +> + +**[v1: fs/ceph/mds_client: ignore responses for waiting requests](http://lore.kernel.org/linux-fsdevel/20230302130650.2209938-1-max.kellermann@ionos.com/)** + +> If a request is put on the waiting list, its submission is postponed +> until the session becomes ready (e.g. via `mdsc->waiting_for_map` or +> `session->s_waiting`). If a `CEPH_MSG_CLIENT_REPLY` happens to be +> received before `CEPH_MSG_MDS_MAP`, the request gets freed, and then +> this assertion fails: +> +> WARN_ON_ONCE(!list_empty(&req->r_wait)); +> + +**[v1: kernfs: Introduce separate rwsem to protect inode](http://lore.kernel.org/linux-fsdevel/20230302043203.1695051-1-imran.f.khan@oracle.com/)** + +> This change set is consolidating the changes discussed and/or mentioned +> in [1] and [2].
I have not received any feedback about any of the +> patches included in this change set, so I am rebasing them on current +> linux-next tip and bringing them all in one place. +> + +**[v1: userfaultfd: move unprivileged_userfaultfd sysctl to its own file](http://lore.kernel.org/linux-fsdevel/20230301100627.3505739-1-zhangpeng362@huawei.com/)** + +> The sysctl_unprivileged_userfaultfd is part of userfaultfd, move it to +> its own file. +> + +**[v1: erofs: support for mounting a single block device with multiple devices](http://lore.kernel.org/linux-fsdevel/20230301070417.13084-1-zhujia.zj@bytedance.com/)** + +> In order to support mounting multi-layer container image as a block +> device, add single block device with multiple devices feature for EROFS. +> +> In this mode, all meta/data contents will be mapped into one block address. +> User could directly mount the block device by EROFS. +> + +**[v2: hostfs: handle idmapped mounts](http://lore.kernel.org/linux-fsdevel/20230301015002.2402544-1-development@efficientek.com/)** + +> Let hostfs handle idmapped mounts. This allows to have the same hostfs +> mount appear in multiple locations with different id mappings. +> + +**[v1: GDB VFS utils](http://lore.kernel.org/linux-fsdevel/cover.1677631565.git.development@efficientek.com/)** + +> I've created a couple GDB convenience functions that I found useful when +> debugging some VFS issues and figure others might find them useful. For +> instance, they are useful in setting conditional breakpoints on VFS +> functions where you only care if the dentry path is a certain value. I +> took the opportunity to create a new "vfs" python module to give VFS +> related utilities a home. +> + +**[GIT PULL: xfs: moar new code for 6.3](http://lore.kernel.org/linux-fsdevel/167762780388.3622158.16184008545274432486.stg-ugh@magnolia/)** + +> Please pull this branch with changes for xfs for 6.3-rc1. This second +> pull request contains a fix for a deadlock in the allocator. 
It +> continues the slow march towards being able to offline AGs, and it +> refactors the interface to the xfs allocator to be less indirection +> happy. +> + +**[v2: splice: Prevent gifting of multipage folios](http://lore.kernel.org/linux-fsdevel/2740801.1677513063@warthog.procyon.org.uk/)** + +> Don't let parts of compound pages/multipage folios be gifted by (vm)splice +> into a pipe as the other end may only be expecting single-page gifts (fuse +> and virtio console for example). +> +> replace_page_cache_folio(), for example, will do the wrong thing if it +> tries to replace a single paged folio with a multipage folio. +> + +#### 网络设备 + +**[v1: nfc: change order inside nfc_se_io error path](http://lore.kernel.org/netdev/20230304164844.133931-1-pchelkin@ispras.ru/)** + +> cb_context should be freed on error paths in nfc_se_io as stated by commit +> 25ff6f8a5a3b ("nfc: fix memory leak of se_io context in nfc_genl_se_io"). +> +> Make the error path in nfc_se_io unwind everything in reverse order, i.e. +> free the cb_context after unlocking the device. +> + +**[v1: net: dsa: mt7530: move PLL setup out of port 6 pad configuration](http://lore.kernel.org/netdev/20230304125453.53476-1-arinc.unal@arinc9.com/)** + +> Move the PLL setup of the MT7530 switch out of the pad configuration of +> port 6 to mt7530_setup, after reset. +> +> This fixes the improper initialisation of the switch when only port 5 is +> used as a CPU port. +> + +**[v4: netdevice: use ifmap instead of plain fields](http://lore.kernel.org/netdev/20230304122432.265902-1-vincenzopalazzodev@gmail.com/)** + +> clean the code by using the ifmap instead of plain fields, +> and avoid code duplication. +> +> v4 with some build error that the 0 day bot found while +> compiling some drivers that I was not able to build on +> my machine. 
+> + +**[v2: net: dpaa2-mac: Get serdes only for backplane links](http://lore.kernel.org/netdev/20230304003159.1389573-1-sean.anderson@seco.com/)** + +> When commenting on what would become commit 085f1776fa03 ("net: dpaa2-mac: +> add backplane link mode support"), Ioana Ciornei said [1]: +> +> > ...DPMACs in TYPE_BACKPLANE can have both their PCS and SerDes managed +> > by Linux (since the firmware is not touching these). That being said, +> > DPMACs in TYPE_PHY (the type that is already supported in dpaa2-mac) can +> > also have their PCS managed by Linux (no interraction from the +> > firmware's part with the PCS, just the SerDes). +> + +**[v1: nf-next: netfilter: handle ipv6 jumbo packets properly for bridge ovs and tc](http://lore.kernel.org/netdev/cover.1677888566.git.lucien.xin@gmail.com/)** + +> Currently pskb_trim_rcsum() is always done on the RX path. However, IPv6 +> jumbo packets hide the real packet len in the Hop-by-hop option header, +> which should be parsed before doing the trim. +> + +**[v6: Another crack at a handshake upcall mechanism](http://lore.kernel.org/netdev/167786872946.7199.12490725847535629441.stgit@91.116.238.104.host.secureserver.net/)** + +> Here is v6 of a series to add generic support for transport layer +> security handshake on behalf of kernel socket consumers (user space +> consumers use a security library directly, of course). A summary of +> the purpose of these patches is archived here: +> +> https://lore.kernel.org/netdev/1DE06BB1-6BA9-4DB4-B2AA-07DE532963D6@oracle.com/ +> + +**[v1: bpf-next: selftests/bpf: use ifname instead of ifindex in XDP compliance test tool](http://lore.kernel.org/netdev/5d11c9163490126fdc391dacb122480e4c059e62.1677863821.git.lorenzo@kernel.org/)** + +> Rely on interface name instead of interface index in error messages or logs +> from XDP compliance test tool. +> Improve XDP compliance test tool error messages. 
+> + +**[v2: Up until now, there was no way to let the user select the layer at which time stamping occurs. The stack assumed that PHY time stamping is always preferred, but some MAC/PHY combinations were buggy.](http://lore.kernel.org/netdev/20230303164248.499286-1-kory.maincent@bootlin.com/)** + +> This series aims to allow the user to select the desired layer +> administratively. +> +> This patch is broken out for review, but it will eventually be +> squashed into Patch 3 after comments come in. +> + +**[v1: net: phylib: get rid of unnecessary locking](http://lore.kernel.org/netdev/E1pY8Pq-00D0sw-NY@rmk-PC.armlinux.org.uk/)** + +> The locking in phy_probe() and phy_remove() does very little to prevent +> any races with e.g. phy_attach_direct(), but instead causes lockdep ABBA +> warnings. Remove it. +> + +**[v1: netdevice: use ifmap isteand of plain fields](http://lore.kernel.org/netdev/20230303150818.132386-1-vincenzopalazzodev@gmail.com/)** + +> clean the code by using the ifmap instead of plain fields, +> and avoid code duplication. +> +> P.S: I'm giving credit to the author of the FIXME commit. +> + +**[v2: bpf-next: xdp: recycle Page Pool backed skbs built from XDP frames](http://lore.kernel.org/netdev/20230303133232.2546004-1-aleksander.lobakin@intel.com/)** + +> Yeah, I still remember that "Who needs cpumap nowadays" (c), but anyway. +> +> __xdp_build_skb_from_frame() missed the moment when the networking stack +> became able to recycle skb pages backed by a page_pool. This was making +> e.g. cpumap redirect even less effective than simple %XDP_PASS. veth was +> also affected in some scenarios. +> A lot of drivers use skb_mark_for_recycle() already, it's been almost +> two years and seems like there are no issues in using it in the generic +> code too. {__,}xdp_release_frame() can be then removed as it loses its +> last user.
+> + +**[v1: net-next: net: netfilter: Keep conntrack reference until IPsecv6 policy checks are done](http://lore.kernel.org/netdev/20230303094221.1501961-1-madhu.koriginja@nxp.com/)** + +> Keep the conntrack reference until policy checks have been performed for +> IPsec V6 NAT support. The reference needs to be dropped before a packet is +> queued to avoid having the conntrack module unloadable. +> +> V1-v2: added missing () in ip6_input.c in below condition +> if (!(ipprot->flags & INET6_PROTO_NOPOLICY)) +> V2-v3: replaced nf_reset with nf_reset_ct +> + +**[Re: v5: wwan: core: Support slicing in port TX flow of WWAN subsystem](http://lore.kernel.org/netdev/PSAPR03MB5653D7BAA0E5DDB2D03B341BF7B39@PSAPR03MB5653.apcprd03.prod.outlook.com/)** + +> I'm sorry to bother you, but I want to know whether my patch is accepted by the community. +> Because it seems to be a merge window, but the patch state still is "Not Applicable". Could you +> give me some suggestions about this patch state? +> + +**[v3: net-next: net/smc: Use percpu ref for wr tx reference](http://lore.kernel.org/netdev/20230303082115.449-1-KaiShen@linux.alibaba.com/)** + +> The refcount wr_tx_refcnt may cause cache thrashing problems among +> cores and we can use percpu ref to mitigate this issue here. We +> gain some performance improvement with percpu ref here on our +> customized smc-r version. Applying cache alignment may also mitigate +> this problem but it seems more reasonable to use percpu ref here. +> We can also replace wr_reg_refcnt with one percpu reference like +> wr_tx_refcnt. +> + +**[v5: bpf-next: bpf: Introduce kptr RCU.](http://lore.kernel.org/netdev/20230303041446.3630-1-alexei.starovoitov@gmail.com/)** + +> - make KF_RCU stronger and require that bpf program checks for NULL +> before passing such pointers into kfunc. The prog has to do that anyway +> to access fields and it aligns with BTF_TYPE_SAFE_RCU allowlist.
+> + +**[v2: net-next: Add tx push buf len param to ethtool](http://lore.kernel.org/netdev/20230302203045.4101652-1-shayagr@amazon.com/)** + +> Changed since v1: +> - Added the new ethtool param to generic netlink specs +> - Dropped dynamic advertisement of tx push buff support in ENA. +> The driver will advertise it for all platforms +> +> This patchset adds a new sub-configuration to ethtool get/set queue +> params (ethtool -g) called 'tx-push-buf-len'. +> + +**[v8: mac80211_hwsim: Add PMSR support](http://lore.kernel.org/netdev/20230302160310.923349-1-jaewan@google.com/)** + +> Dear Kernel maintainers, +> +> First of all, thank you for spending your precious time for reviewing +> my changes, and also sorry for my mistakes in previous patchsets. +> +> Let me propose series of CLs for adding PMSR support in the mac80211_hwsim. +> +> PMSR (peer measurement) is generalized measurement between STAs, +> and currently FTM (fine time measurement or flight time measurement) +> is the one and only measurement. +> + +**[v1: [net:netfilter]: Keep conntrack reference until IPsecv6 policy checks are done](http://lore.kernel.org/netdev/20230302112324.906365-1-madhu.koriginja@nxp.com/)** + +> Keep the conntrack reference until policy checks have been performed for +> IPsec V6 NAT support. The reference needs to be dropped before a packet is +> queued to avoid having the conntrack module unloadable. +> + +**[v2: linux-next: selftests: net: udpgso_bench_tx: Add test for IP fragmentation of UDP packets](http://lore.kernel.org/netdev/202303021838359696196@zte.com.cn/)** + +> The UDP GSO bench only tests the performance of userspace payload splitting +> and UDP GSO. But we are also concerned about the performance comparing with +> IP fragmentation and UDP GSO. In other words comparing IP fragmentation and +> segmentation. 
+> + +**[v2: net: stmmac: add to set device wake up flag when stmmac init phy](http://lore.kernel.org/netdev/20230302062143.181285-1-clementwei90@163.com/)** + +> When MAC is not support PMT, driver will check PHY's WoL capability +> and set device wakeup capability in stmmac_init_phy(). We can enable +> the WoL through ethtool, the driver would enable the device wake up +> flag. Now the device_may_wakeup() return true. +> + +**[v3: net: ice: copy last block omitted in ice_get_module_eeprom()](http://lore.kernel.org/netdev/20230301204707.2592337-1-poros@redhat.com/)** + +> ice_get_module_eeprom() is broken since commit e9c9692c8a81 ("ice: +> Reimplement module reads used by ethtool") In this refactor, +> ice_get_module_eeprom() reads the eeprom in blocks of size 8. +> But the condition that should protect the buffer overflow +> ignores the last block. The last block always contains zeros. +> + +**[v11: net-next: net: ethernet: mtk_eth_soc: various enhancements](http://lore.kernel.org/netdev/cover.1677699407.git.daniel@makrotopia.org/)** + +> This series brings a variety of fixes and enhancements for mtk_eth_soc, +> adds support for the MT7981 SoC and facilitates sharing the SGMII PCS +> code between mtk_eth_soc and mt7530. +> +> Note that this series depends on commit 697c3892d825 +> ("regmap: apply reg_base and reg_downshift for single register ops") to +> not break mt7530 pcs register access. +> + +**[v13: bpf-next: Add skb + xdp dynptrs](http://lore.kernel.org/netdev/20230301154953.641654-1-joannelkoong@gmail.com/)** + +> This patchset is the 2nd in the dynptr series. The 1st can be found here [0]. +> +> This patchset adds skb and xdp type dynptrs, which have two main benefits for +> packet parsing: +> * allowing operations on sizes that are not statically known at +> compile-time (eg variable-sized accesses). 
+> * more ergonomic and less brittle iteration through data (eg does not need +> manual if checking for being within bounds of data_end) +> + +**[v6: Add support for NXP bluetooth chipsets](http://lore.kernel.org/netdev/20230301154514.3292154-1-neeraj.sanjaykale@nxp.com/)** + +> This patch adds a driver for NXP bluetooth chipsets. +> +> The driver is based on H4 protocol, and uses serdev APIs. It supports host +> to chip power save feature, which is signalled by the host by asserting +> break over UART TX lines, to put the chip into sleep state. +> +> To support this feature, break_ctl has also been added to serdev-tty along +> with a new serdev API serdev_device_break_ctl(). +> + +**[v1: net: ieee802154: Prevent user from crashing the host](http://lore.kernel.org/netdev/20230301154450.547716-1-miquel.raynal@bootlin.com/)** + +> Avoid crashing the machine by checking +> info->attrs[NL802154_ATTR_SCAN_TYPE] presence before de-referencing it, +> which was the primary intent of the blamed patch. +> + +**[v1: [NETFILTER]: Keep conntrack reference until IPsecv6 policy checks are done](http://lore.kernel.org/netdev/20230301145534.421569-1-madhu.koriginja@nxp.com/)** + +> Keep the conntrack reference until policy checks have been performed for +> IPsec V6 NAT support. The reference needs to be dropped before a packet is +> queued to avoid having the conntrack module unloadable. +> + +**[v1: vsock: check error queue to set EPOLLERR](http://lore.kernel.org/netdev/76e7698d-890b-d14d-fa34-da5dd7dd13d8@sberdevices.ru/)** + +> EPOLLERR must be set not only when there is an error on the socket, but also +> when its error queue is not empty (maybe it contains some control +> messages). Without this patch 'poll()' won't detect data in the error queue. +> This patch is based on 'tcp_poll()'. +> + +**[v1: net: ionic: catch failure from devlink_alloc](http://lore.kernel.org/netdev/20230301013623.32226-1-shannon.nelson@amd.com/)** + +> Add a check for NULL on the alloc return.
If devlink_alloc() fails and +> we try to use devlink_priv() on the NULL return, the kernel gets very +> unhappy and panics. With this fix, the driver load will still fail, +> but at least it won't panic the kernel. +> + +**[v1: net: tls: avoid hanging tasks on the tx_lock](http://lore.kernel.org/netdev/20230301002857.2101894-1-kuba@kernel.org/)** + +> syzbot sent a hung task report and Eric explains that adversarial +> receiver may keep RWIN at 0 for a long time, so we are not guaranteed +> to make forward progress. Thread which took tx_lock and went to sleep +> may not release tx_lock for hours. Use interruptible sleep where +> possible and reschedule the work if it can't take the lock. +> + +**[v3: net-next: vsock: add support for sockmap](http://lore.kernel.org/netdev/20230227-vsock-sockmap-upstream-v3-0-7e7f4ce623ee@bytedance.com/)** + +> Add support for sockmap to vsock. +> +> We're testing usage of vsock as a way to redirect guest-local UDS +> requests to the host and this patch series greatly improves the +> performance of such a setup. +> +> Compared to copying packets via userspace, this improves throughput by +> 121% in basic testing. +> + +**[v4: bpf-next: net/smc: Introduce BPF injection capability](http://lore.kernel.org/netdev/1677602291-1666-1-git-send-email-alibuda@linux.alibaba.com/)** + +> This patches attempt to introduce BPF injection capability for SMC, +> and add selftest to ensure code stability. +> +> As we all know that the SMC protocol is not suitable for all scenarios, +> especially for short-lived. However, for most applications, they cannot +> guarantee that there are no such scenarios at all. Therefore, apps +> may need some specific strategies to decide shall we need to use SMC +> or not, for example, apps can limit the scope of the SMC to a specific +> IP address or port. 
+> + +#### Security Hardening + +**[v1: ubsan: Tighten UBSAN_BOUNDS on GCC](http://lore.kernel.org/linux-hardening/20230302225444.never.053-kees@kernel.org/)** + +> The use of -fsanitize=bounds on GCC will ignore some trailing arrays, +> leaving a gap in coverage. Switch to using -fsanitize=bounds-strict to +> match Clang's stricter behavior. +> + +**[v1: kheaders: Use array declaration instead of char](http://lore.kernel.org/linux-hardening/20230302224946.never.243-kees@kernel.org/)** + +> Under CONFIG_FORTIFY_SOURCE, memcpy() will check the size of destination +> and source buffers. Defining kernel_headers_data as "char" would trip +> this check. Since these addresses are treated as byte arrays, define +> them as arrays (as done everywhere else). +> + +#### Asynchronous IO + +**[v1: io_uring/poll: don't pass in wake func to io_init_poll_iocb()](http://lore.kernel.org/io-uring/f7f8fd3e-a810-d9d7-5433-32957e880652@kernel.dk/)** + +> We only use one, and it's io_poll_wake(). Hardwire that in the initial +> init, as well as in __io_queue_proc() if we're setting up for double +> poll. +> + +**[v1: io_uring: add IORING_OP_FUSED_CMD](http://lore.kernel.org/io-uring/20230301140611.163055-1-ming.lei@redhat.com/)** + +> Add IORING_OP_FUSED_CMD, it is one special URING_CMD, which has to +> be SQE128. The 1st SQE(master) is one 64byte URING_CMD, and the 2nd +> 64byte SQE(slave) is another normal 64byte OP. For any OP which needs +> to support slave OP, io_issue_defs[op].fused_slave needs to be set as 1, +> and its ->issue() can retrieve/import buffer from master request's +> fused_cmd_kbuf. +> + +**[v1: io_uring/poll: allow some retries for poll triggering spuriously](http://lore.kernel.org/io-uring/8a746fe0-dd72-568c-e601-19c9192c38fb@kernel.dk/)** + +> If we get woken spuriously when polling and fail the operation with +> -EAGAIN again, then we generally only allow polling again if data +> had been transferred at some point. This is indicated with +> REQ_F_PARTIAL_IO.
However, if the spurious poll triggers when the socket +> was originally empty, then we haven't transferred data yet and we will +> fail the poll re-arm. This either punts the socket to io-wq if it's +> blocking, or it fails the request with -EAGAIN if not. Neither condition +> is desirable, as the former will slow things down, while the latter +> will make the application confused. +> + +#### Rust For Linux + +**[v1: rust: sort uml documentation arch support table](http://lore.kernel.org/rust-for-linux/I0YeaNjTtc4Nh47ZLJfAs6rgfAc_QZxhynNfz-GQKssVZ1S2UI_cTScCkp9-oX-hPYVcP3EfF7N0HMB9iAlm1FcvOJagnQoLeHtiW3bGCgM=@bamelis.dev/)** + +> The arch_support table was not sorted alphabetically. +> Sorts the table properly. +> + +#### BPF + +**[v1: bpf-next: bpf: Use separate RCU callbacks for freeing selem](http://lore.kernel.org/bpf/20230303141542.300068-1-memxor@gmail.com/)** + +> Martin suggested that instead of using a byte in the hole (which he has +> a use for in his future patch) in bpf_local_storage_elem, we can +> dispatch a different call_rcu callback based on whether we need to free +> special fields in bpf_local_storage_elem data. The free path, described +> in commit 9db44fdd8105 ("bpf: Support kptrs in local storage maps"), +> only waits for call_rcu callbacks when there are special (kptrs, etc.) +> fields in the map value, hence it is necessary that we only access +> smap in this case. +> + +**[v1: cgroup: bpf: use cgroup_lock()/cgroup_unlock() wrappers](http://lore.kernel.org/bpf/20230303095310.238553-1-kamalesh.babulal@oracle.com/)** + +> Replace mutex_[un]lock() with cgroup_[un]lock() wrappers to stay +> consistent across cgroup core and other subsystem code, while +> operating on the cgroup_mutex. 
+> + +**[v2: bpf-next: libbpf: usdt arm arg parsing support](http://lore.kernel.org/bpf/20230303083706.3597-1-puranjay12@gmail.com/)** + +> Parsing of USDT arguments is architecture-specific; on arm it is +> relatively easy since registers used are r[0-10], fp, ip, sp, lr, +> pc. Format is slightly different compared to aarch64; forms are +> +> - "size @ [ reg, #offset ]" for dereferences, for example +> "-8 @ [ sp, #76 ]" ; " -4 @ [ sp ]" +> - "size @ reg" for register values; for example +> "-4@r0" +> - "size @ #value" for raw values; for example +> "-8@#1" +> +> Add support for parsing USDT arguments for ARM architecture. +> + +**[v3: bpf-next: Transit between BPF TCP congestion controls.](http://lore.kernel.org/bpf/20230303012122.852654-1-kuifeng@meta.com/)** + +> Previously, BPF struct_ops didn't go off, as even when the user +> program creating it was terminated, none of these ever were pinned. +> For instance, the TCP congestion control subsystem indirectly +> maintains a reference count on the struct_ops of any registered BPF +> implemented algorithm. +> + +**[v3: bpf-next: selftests/bpf: Add -Wuninitialized flag to bpf prog flags](http://lore.kernel.org/bpf/20230303005500.1614874-1-davemarchevsky@fb.com/)** + +> Per C99 standard [0], Section 6.7.8, Paragraph 10: +> +> If an object that has automatic storage duration is not initialized +> explicitly, its value is indeterminate. +> +> And in the same document, in appendix "J.2 Undefined behavior": +> +> The behavior is undefined in the following circumstances: +> [...] +> The value of an object with automatic storage duration is used while +> it is indeterminate (6.2.4, 6.7.8, 6.8). +> + +**[v2: bpf-next: bpf: Make bpf_get_current_[ancestor_]cgroup_id() available for all program types](http://lore.kernel.org/bpf/ZAD8QyoszMZiTzBY@slm.duckdns.org/)** + +> These helpers are safe to call from any context and there's no reason to +> restrict access to them. 
Remove them from bpf_trace and filter lists and add +> to bpf_base_func_proto() under perfmon_capable(). +> + +**[v2: bpf-next: bpf: add netfilter program type](http://lore.kernel.org/bpf/20230302172757.9548-1-fw@strlen.de/)** + +> Add minimal support to hook bpf programs to netfilter hooks, +> e.g. PREROUTING or FORWARD. +> +> For this the most relevant parts for registering a netfilter +> hook via the in-kernel api are exposed to userspace via bpf_link. +> +> The new program type is 'tracing style' and assumes skb dynptrs are used +> rather than 'direct packet access'. +> + +**[v4: bpf-next: Make uprobe attachment APK aware](http://lore.kernel.org/bpf/20230301212308.1839139-1-deso@posteo.net/)** + +> On Android, APKs (android packages; zip packages with somewhat +> prescriptive contents) are first class citizens in the system: the +> shared objects contained in them don't exist in unpacked form on the +> file system. Rather, they are mmaped directly from within the archive +> and the archive is also what the kernel is aware of. +> + +**[v1: bpf-next: selftests/bpf: support custom per-test flags and multiple expected messages](http://lore.kernel.org/bpf/20230301175417.3146070-1-eddyz87@gmail.com/)** + +> This patch allows specifying program flags and multiple verifier log +> messages for the test_loader kind of tests. For example: +> +> tools/testing/selftests/bpf/progs/foobar.c: +> +> SEC("tc") +> __success __log_level(7) +> __msg("first message") +> __msg("next message") +> __flag(BPF_F_ANY_ALIGNMENT) +> int buz(struct __sk_buff *skb) +> { ... } +> + +**[v1: Discard .note.gnu.property in vmlinux](http://lore.kernel.org/bpf/SY4P282MB108446E9ED9FB180AE717D5F9DAD9@SY4P282MB1084.AUSP282.PROD.OUTLOOK.COM/)** + +> When the kernel image is finally linked, all the notes are packed into a +> single .notes section, but these notes may have different alignments.
+> +> binutils above 2.32 adds a ".note.gnu.property" section to the compiled +> output, which is 4-byte aligned on 32-bit, but 8-byte aligned on 64-bit. +> At present, the notes generated by both the ELFNOTE macro and the VDSO +> linker script are 4-byte aligned. +> + +**[v1: bpf-next: libbpf: Use text error for btf_custom_path failures](http://lore.kernel.org/bpf/20230228142531.439324-1-9erthalion6@gmail.com/)** + +> Use libbpf_strerror_r to expand the error when failed to parse the btf +> file at btf_custom_path. It does not change a lot locally, but since the +> error will bubble up through a few layers, it may become quite +> confusing otherwise. +> + +**[v3: bpf-next: selftests/bpf: Set __BITS_PER_LONG if target is bpf for LoongArch](http://lore.kernel.org/bpf/1677585781-21628-1-git-send-email-yangtiezhu@loongson.cn/)** + +> If target is bpf, there is no __loongarch__ definition, __BITS_PER_LONG +> defaults to 32, __NR_nanosleep is not defined: +> +> #if defined(__ARCH_WANT_TIME32_SYSCALLS) || __BITS_PER_LONG != 32 +> #define __NR_nanosleep 101 +> __SC_3264(__NR_nanosleep, sys_nanosleep_time32, sys_nanosleep) +> #endif +> + +**[v2: bpf-next: Support defragmenting IPv(4|6) packets in BPF](http://lore.kernel.org/bpf/cover.1677526810.git.dxu@dxuuu.xyz/)** + +> In the context of a middlebox, fragmented packets are tricky to handle. +> The full 5-tuple of a packet is often only available in the first +> fragment which makes enforcing consistent policy difficult. There are +> really only two stateless options, neither of which are very nice: +> +> Enforce policy on first fragment and accept all subsequent fragments. +> + +**[v3: bpf-next: bpf: bpf memory usage](http://lore.kernel.org/bpf/20230227152032.12359-1-laoar.shao@gmail.com/)** + +> Currently we can't get bpf memory usage reliably. bpftool now shows the +> bpf memory footprint, which is difference with bpf memory usage. 
The +> + +### Ecosystem Updates + +#### Qemu + +**[v11: riscv: Allow user to set the satp mode](http://lore.kernel.org/qemu-devel/20230303131252.892893-1-alexghiti@rivosinc.com/)** + +> This introduces new properties to allow the user to set the satp mode, +> see patch 3 for full syntax. In addition, it prevents cpus from booting in a +> satp mode they do not support (see patch 4). +> + +**[v1: Risc-V CPU state by hart ID](http://lore.kernel.org/qemu-devel/20230303065055.915652-1-mchitale@ventanamicro.com/)** + +> Currently a Risc-V platform cannot realize multiple CPUs with non-contiguous +> hart IDs because the APLIC, IMSIC and ACLINT emulation code uses the +> contiguous logical CPU ID to fetch per CPU state. +> +> This patchset implements cpu_by_arch_id for Risc-V to get the CPU state +> by hart ID which may be sparse instead of the contiguous logical CPU id. +> + +**[v2: hw/riscv/virt.c: add cbo[mz]-block-size fdt properties](http://lore.kernel.org/qemu-devel/20230302091406.407824-1-dbarboza@ventanamicro.com/)** + +> Based-on: 20230224132536.552293-1-dbarboza@ventanamicro.com +> ("v8: riscv: Add support for Zicbo[m,z,p] instructions") +> +> This second version, which is still dependent on: +> + +**[v1: hw/riscv/virt.c: add cbom-block-size fdt property](http://lore.kernel.org/qemu-devel/20230301215902.375217-1-dbarboza@ventanamicro.com/)** + +> I'm sending this almost last minute patch as part of the work done in: +> + +#### Buildroot + +**[[branch/2022.11.x] package/wolfssl: disable assembly when not supported](http://lore.kernel.org/buildroot/20230228153524.51C5486CA7@busybox.osuosl.org/)** + +> commit: https://git.buildroot.net/buildroot/commit/?id=348b2e25df76b9a603639dc9a99c03442ab673e1 +> branch: https://git.buildroot.net/buildroot/commit/?id=refs/heads/2022.11.x +> +> wolfssl contains some assembly code and its configure.ac script +> enables the assembly code depending on the CPU architecture.
However, +> the detection logic is not sufficient and leads to using the assembly +> code in situation where it should not. +> +> Here are two examples: +> +> - As soon as the architecture is mips64/mips64el, it uses assembly +> code, but that assembly code is not mips64r6 compatible. +> +> - As soon as the architecture is RISC-V, it uses assembly code, but +> that assembly code uses multiplication instructions, without paying +> attention that the "M" extension may not be available in the RISC-V +> CPU instruction set. +> + +#### U-Boot + +**[v3: Basic StarFive JH7110 RISC-V SoC support](http://lore.kernel.org/u-boot/20230303032432.7837-1-yanhong.wang@starfivetech.com/)** + +> This series of patches base on the latest branch/master, and add support +> for the StarFive JH7110 RISC-V SoC and VisionFive V2 board. In order for +> this to be achieved, the respective DT nodes have been added, and the +> required defconfigs have been added to the boards' defconfig. What is more, +> the basic required DM drivers have been added, such as reset, clock, pinctrl, +> uart, ram etc. +> + +**[Question regarding U-boot MultiCore SMP](http://lore.kernel.org/u-boot/44c8dbba-961b-3fb1-1c3e-f196b7a95e20@sysgo.com/)** + +> I am working on the PolarFire RISC-V icicle kit and use u-boot to start +> my application. +> I configured the firmware to start u-boot on all harts (cores) and found +> out that u-boot uses a "HART lottery system" to decide which core/hart +> it runs on. +> In my special case I want u-boot to start on the first hart and the +> other harts shall wait for the interrupt. +> + +**[v1: Kconfig: Sort the BUILD_TARGET list](http://lore.kernel.org/u-boot/20230228062221.489088-1-marek.vasut+renesas@mailbox.org/)** + +> Sort the defaults list in BUILD_TARGET Kconfig option. No functional change. 
+> + +## 20230226: Issue 35 + +### Kernel Updates + +#### RISC-V Architecture Support + +* [v14: -next: riscv: Add vector ISA support](http://lore.kernel.org/linux-riscv/20230224170118.16766-1-andy.chiu@sifive.com/) + + This patchset is implemented based on vector 1.0 spec to add vector support + in riscv Linux kernel. There are some assumptions for this implementation. + +* [v6: RISC-V: Apply Zicboz to clear_page](http://lore.kernel.org/linux-riscv/20230224162631.405473-1-ajones@ventanamicro.com/) + + When the Zicboz extension is available we can more rapidly zero naturally + aligned Zicboz block sized chunks of memory. As pages are always page + aligned and are larger than any Zicboz block size will be, then + clear_page() appears to be a good candidate for the extension. + +* [v1: RESEND: RISC-V: enable rust](http://lore.kernel.org/linux-riscv/20230224135044.2882109-1-conor.dooley@microchip.com/) + + This is a somewhat blind (and maybe foolish) attempt at enabling Rust + for RISC-V. I've tested this on Icicle, and the modules seem to work. + +* [v1: RISC-V: mm: Support huge page in vmalloc_fault()](http://lore.kernel.org/linux-riscv/20230224104001.2743135-1-dylan@andestech.com/) + + RISC-V supports ioremap() with huge page (pud/pmd) mapping, but + vmalloc_fault() assumes that the vmalloc range is limited to pte + mappings. Add huge page support to complete the vmalloc_fault() function. + +* [v7: riscv: Allow to downgrade paging mode from the command line](http://lore.kernel.org/linux-riscv/20230224100218.1824569-1-alexghiti@rivosinc.com/) + + This new version gets rid of the limitation that prevented KASAN kernels + from using the newly introduced parameters. + +* [v2: RISC-V: Stop emitting attributes](http://lore.kernel.org/linux-riscv/20230223224605.6995-1-palmer@rivosinc.com/) + + The RISC-V ELF attributes don't contain any useful information. New + toolchains ignore them, but they frequently trip up various older/mixed + toolchains. So just turn them off.
+ +* [v1: RISC-V: avoid build issues for clang/llvm-17 with binutils 2.35](http://lore.kernel.org/linux-riscv/20230223220546.52879-1-conor@kernel.org/) + + Here's an attempted (interim?) fix for issues on v5.10 due to the + presence of zifencei & zicsr in object files. + +* [v2: Use dma_default_coherent for devicetree default coherency](http://lore.kernel.org/linux-riscv/20230223113644.23356-1-jiaxun.yang@flygoat.com/) + + This series split out second half of my previous series + "v1: MIPS DMA coherence fixes". + + It intends to use dma_default_coherent to determine the default coherency of + devicetree probed devices instead of hardcoding it with Kconfig options. + +* [v2: Add JH7110 MIPI DPHY RX support](http://lore.kernel.org/linux-riscv/20230223015952.201841-1-changhuang.liang@starfivetech.com/) + + This patchset adds mipi dphy rx driver for the StarFive JH7110 SoC. + It is used to transfer CSI camera data. The series has been tested on + the VisionFive 2 board. + +* [v1: MAINTAINERS: add missing clock driver coverage for Microchip FPGAs](http://lore.kernel.org/linux-riscv/20230222124610.257101-1-conor.dooley@microchip.com/) + + When the CCC support was added, the clock binding coverage was + converted to a regex in commit 71c8517e004b ("MAINTAINERS: update + polarfire soc clock binding"), but the coverage for the clock drivers + themselves was not updated. Rectify that now. + +* [v1: RESEND: scripts/gdb: add lx_current support for riscv](http://lore.kernel.org/linux-riscv/20230222093730.1826523-1-suagrfillet@gmail.com/) + + RISC-V uses the tp register to save the current task_struct address + as its current() defines. So lx_current() of riscv just returns the + dereference of the address cast via task_ptr_type. + +* [v17: -next: riscv: Add GENERIC_ENTRY support](http://lore.kernel.org/linux-riscv/20230222033021.983168-1-guoren@kernel.org/) + + The patches convert riscv to use the generic entry infrastructure from + kernel/entry/*. 
Some optimization for entry.S with new .macro and merge + ret_from_kernel_thread into ret_from_fork. + +* [v3: RISC-V Hardware Probing User Interface](http://lore.kernel.org/linux-riscv/20230221190858.3159617-1-evan@rivosinc.com/) + + There's been a bunch of off-list discussions about this, including at Plumbers. + + Instead this patch set takes a very different approach and provides a set + of key/value pairs that encode various bits about the system. + +* [v1: Add PLL clocks driver for StarFive JH7110](http://lore.kernel.org/linux-riscv/20230221141147.303642-1-xingyu.wu@starfivetech.com/) + + This patch series adds a PLL clocks driver and modifies + the system clock driver to depend on the PLL clocks driver for the + StarFive JH7110 RISC-V SoC. + +* [v2: Add DMA driver for StarFive JH7110 SoC](http://lore.kernel.org/linux-riscv/20230221140424.719-1-walker.chen@starfivetech.com/) + + This patch series adds dma support for the StarFive JH7110 RISC-V SoC. + The first patch adds device tree binding. The second patch includes dma + driver. The last patch adds device node of dma to JH7110 dts. + +* [v3: bpf-next: riscv, bpf: Add kfunc support for RV64](http://lore.kernel.org/linux-riscv/20230221140656.3480496-1-pulehui@huaweicloud.com/) + + This patch adds kernel function call support for RV64. Since the offset + from RV64 kernel and module functions to bpf programs is almost within + the range of s32, the current infrastructure of RV64 is already + sufficient for kfunc, so let's turn it on. + +* [v1: Add PTP support for sama7g5](http://lore.kernel.org/linux-riscv/20230221092104.730504-1-durai.manickamkr@microchip.com/) + + This patch series is intended to add PTP capability to the GEM and + EMAC for sama7g5.
+ +* [v2: Add new partial clock and reset drivers for StarFive JH7110](http://lore.kernel.org/linux-riscv/20230221083323.302471-1-xingyu.wu@starfivetech.com/) + + This patch series adds new partial clock drivers and reset + support for System-Top-Group(STG), Image-Signal-Process(ISP) + and Video-Output(VOUT) for the StarFive JH7110 RISC-V SoC. + +* [v4: Basic clock, reset & device tree support for StarFive JH7110 RISC-V SoC](http://lore.kernel.org/linux-riscv/20230221024645.127922-1-hal.feng@starfivetech.com/) + + This patch series adds basic clock, reset & DT support for StarFive + JH7110 SoC. Patch 17 depends on series [1] which provides pinctrl + dt-bindings. Patch 19 depends on series [2] which provides dt-bindings + of VisionFive 2 board and JH7110 SoC. + +* [v4: RISC-V Hibernation Support](http://lore.kernel.org/linux-riscv/20230221023523.1498500-1-jeeheng.sia@starfivetech.com/) + + This series adds RISC-V Hibernation/suspend to disk support. + Low level Arch functions were created to support hibernation. + +* [v3: Add watchdog driver for StarFive JH7110 RISC-V SoC](http://lore.kernel.org/linux-riscv/20230220081926.267695-1-xingyu.wu@starfivetech.com/) + + This patch series adds a watchdog driver for the StarFive JH7110 + RISC-V SoC. The first patch adds documentation to describe device + tree bindings. The subsequent patch adds the watchdog driver and supports the + JH7110 SoC. The device tree node will be submitted + after the JH7110 dts merge. This patchset is based on 6.2. + +#### Process Scheduling + +* [v1: net: net/sched: cls_api: Move call to tcf_exts_miss_cookie_base_destroy()](http://lore.kernel.org/lkml/20230224-cls_api-wunused-function-v1-1-12c77986dc2d@kernel.org/) + + Move the call to tcf_exts_miss_cookie_base_destroy() in + tcf_exts_destroy() out of the '#ifdef CONFIG_NET_CLS_ACT', so that it + always appears used to the compiler, while not changing any behavior + with any of the various configuration combinations.
+ +* [v3: sched/fair: Interleave cfs bandwidth timers for improved single thread performance at low utilization](http://lore.kernel.org/lkml/20230223185918.1500132-1-sshegde@linux.vnet.ibm.com/) + + CPU cfs bandwidth controller uses hrtimer. Currently there is no initial + value set. Hence all period timers would align at expiry. + This happens when there are multiple CPU cgroups. + +* [v1: kernel/sched/core.c: Modified prio_less().](http://lore.kernel.org/lkml/CAHOvCC7yjceArav9Ps0v1EP4CjfkrxbfXFgABK54cdFKNoE8iw@mail.gmail.com/) + + The sched_class structure is defined to be sorted by pointer size. + + This matches the sched class priority. + In the prio_less() function in kernel/sched/core.c, + the less value can be determined by pointer operation as follows. + +#### Memory Management + +* [v1: cifs: Improve use of filemap_get_folios_tag()](http://lore.kernel.org/linux-mm/2244151.1677251586@warthog.procyon.org.uk/) + + The inefficiency derived from filemap_get_folios_tag() get a batch of + contiguous folios in Vishal's change to afs that got copied into cifs can + be reduced by skipping over those folios that have been passed by the start + position rather than going through the process of locking, checking and + trying to write them. + +* [v1: mm: hugetlb_vmemmap: simplify hugetlb_vmemmap_init() a bit](http://lore.kernel.org/linux-mm/20230223065947.64134-1-songmuchun@bytedance.com/) + + The check of IS_ENABLED(CONFIG_PROC_SYSCTL) is unnecessary since + register_sysctl_init() will be empty in this case. So, there are + no warnings after removing the check. + +* [v1: [nvdimm][crash] pmem memmap dump support](http://lore.kernel.org/linux-mm/3c752fc2-b6a0-2975-ffec-dba3edcf4155@fujitsu.com/) + + This mail raises a pmem memmap dump requirement and possible solutions, but they are all still premature. + I really hope you can provide some feedback. + + pmem memmap can also be called pmem metadata here.
+ +* [v2: tmpfs: add the option to disable swap](http://lore.kernel.org/linux-mm/20230223024412.3522465-1-mcgrof@kernel.org/) + + This adds noswap support to tmpfs. This follows up the first RFC [0], + you can look at that link for details of the testing done. On this + v2 I've addressed the feedback provided by Matthew Wilcox and Yosry Ahmed. + +* [v1: RFC: mm: pagemap: add vma(VM_PFNMAP) support in pagemap_pte_hole()](http://lore.kernel.org/linux-mm/20230223024332.1337578-1-sunke@kylinos.cn/) + + pagemap currently does not support vma(FIXMAP); add support + in pagemap_pte_hole(). + +* [v2: mm: userfaultfd: refactor and add UFFDIO_CONTINUE_MODE_WP](http://lore.kernel.org/linux-mm/20230223005754.2700663-1-axelrasmussen@google.com/) + + The refactors are sorted by increasing controversial-ness, the idea being we + could drop some of the refactors if they are deemed not worth it. + +* [v2: mm/khugepaged: alloc_charge_hpage() take care of mem charge errors](http://lore.kernel.org/linux-mm/20230222195247.791227-1-peterx@redhat.com/) + + If memory charge failed, instead of returning the hpage but with an error, + allow the function to cleanup the folio properly, which is normally what a + function should do in this case - either return successfully, or return + with no side effect of partial runs with an indicated error. + +* [v1: swiotlb: mark swiotlb_memblock_alloc() as __init](http://lore.kernel.org/linux-mm/20230222070411.6186-1-rdunlap@infradead.org/) + + swiotlb_memblock_alloc() calls memblock_alloc(), which calls + (__init) memblock_alloc_try_nid(). However, swiotlb_memblock_alloc() + can be marked as __init since it is only called by swiotlb_init_remap(), + which is already marked as __init.
+ +* [v8: tracing/user_events: Remote write ABI](http://lore.kernel.org/linux-mm/20230221211143.574-1-beaub@linux.microsoft.com/) + + As part of the discussions for user_events aligned with user space + tracers, it was determined that user programs should register a aligned + value to set or clear a bit when an event becomes enabled. Currently a + shared page is being used that requires mmap(). Remove the shared page + implementation and move to a user registered address implementation. + +* [v1: dmapool: push new blocks in ascending order](http://lore.kernel.org/linux-mm/20230221165400.1595247-1-kbusch@meta.com/) + + Some users of the dmapool need their allocations to happen in ascending + order. The recent optimizations pushed the blocks in reverse order, so + restore the previous behavior by linking the next available block from + low-to-high. + +* [Sv: v1: mm/memcontrol: add memory.peak in cgroup root](http://lore.kernel.org/linux-mm/DB4PR02MB93344BAA949FA7E25E298C90FEA59@DB4PR02MB9334.eurprd02.prod.outlook.com/) + + Thanks for the quick response! I think we are just trying to get the same value that was available for us in cgroup v1 memory.max_usage_in_bytes. I guess this value also is incomplete for representing the system memory usage. Is it due the incompleteness that the memory.peak has been left out in the root of cgroup v2? + +* [v1: mm/hwpoison: convert TTU_IGNORE_HWPOISON to TTU_HWPOISON](http://lore.kernel.org/linux-mm/20230221085905.1465385-1-naoya.horiguchi@linux.dev/) + + After a memory error happens on a clean folio, a process unexpectedly + receives SIGBUS when it accesses to the error page. This SIGBUS killing + is pointless and simply degrades the level of RAS of the system, because + the clean folio can be dropped without any data lost on memory error + handling as we do for a clean pagecache. 
+ +* [v1: mm: slub: make kobj_type structure constant](http://lore.kernel.org/linux-mm/20230220-kobj_type-mm-slub-v1-1-5ae49b96d9aa@weissschuh.net/) + + Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.") + the driver core allows the usage of const struct kobj_type. + + Take advantage of this to constify the structure definition to prevent + modification at runtime. + +* [v1: mm/zsmalloc: Split zsdesc from struct page](http://lore.kernel.org/linux-mm/20230220132218.546369-1-42.hyeyoo@gmail.com/) + + The purpose of this series is to define a dedicated memory descriptor for zsmalloc, + instead of re-using various fields of struct page. This is part of the + effort to reduce the size of struct page to unsigned long and enable + dynamic allocation of memory descriptors. + +* [v2: Add tests for memblock_alloc_node()](http://lore.kernel.org/linux-mm/59d4745b-7b2-bf6-7b8-f6571d78d336@mail.polimi.it/) + + This test verifies that memblock_alloc_node() works as + expected, i.e. that it sets the correct NUMA node for the newly allocated + region. memblock_alloc_node() is called directly without using any + stub. The core check compares the requested NUMA node with the `nid` field inside the memblock_region structure. + +* [v10: cachestat: a new syscall for page cache state of files](http://lore.kernel.org/linux-mm/20230219073318.366189-1-nphamcs@gmail.com/) + + There is currently no good way to query the page cache state of large + file sets and directory trees. There is mincore(), but it scales poorly: + the kernel writes out a lot of bitmap data that userspace has to + aggregate, when the user really does not care about per-page information + in that case. The user also needs to mmap and unmap each file as it goes + along, which can be quite slow as well. + +#### File Systems + +* [v1: SSDFS: flash-friendly LFS file system for ZNS SSD](http://lore.kernel.org/linux-fsdevel/20230225010927.813929-1-slava@dubeyko.com/) + + I am completely aware that the patchset is big.
And I am open to any + advice on how I can split the patchset into reasonable portions with + the goal of introducing SSDFS for review. Even now, I excluded + the code of several subsystems to make the patchset slightly + smaller. Potentially, I could introduce SSDFS in smaller portions + with limited functionality. However, that could be confusing and make it + hard to understand how the declared goals are achieved by the implemented functionality. + +* [[RESEND v2 PATCH] init/do_mounts.c: add virtiofs root fs support](http://lore.kernel.org/linux-fsdevel/20230224143751.36863-1-david@ixit.cz/) + + Make it possible to boot directly from a virtiofs file system with tag + 'myfs' using the following kernel parameters: + + rootfstype=virtiofs root=myfs rw + + Booting directly from virtiofs makes it possible to use a directory on + the host as the root file system. This is convenient for testing and for + situations where manipulating disk image files is cumbersome. + +* [git pull: vfs.git misc bits](http://lore.kernel.org/linux-fsdevel/Y%2FgxyQA+yKJECwyp@ZenIV/) + + That should cover the rest of what I had in -next; I'd been sick for + several weeks, so a lot of pending stuff I hoped to put into -next + is going to miss this window ;-/ + + Al, off to deal with the remaining pile in the mailbox... + +* [GIT PULL: iomap: new code for 6.3](http://lore.kernel.org/linux-fsdevel/167703901677.1909640.1798642413122202835.stg-ugh@magnolia/) + + Please pull this branch with changes for iomap for 6.3-rc1. This is + mostly rearranging things to make life easier for gfs2, nothing all that + mind-blowing for this release. + + As usual, I did a test-merge with the main upstream branch as of a few + minutes ago, and didn't see any conflicts. Please let me know if you + encounter any problems. + +* [v1: Minor documentation clean-up in fs](http://lore.kernel.org/linux-fsdevel/20230220170210.15677-1-lukas.bulwahn@gmail.com/) + + Please pick this minor documentation clean-up in fs. It is not in the
It is not in the + Documentation directory, but I would consider these README files also some unsorted largely distributed kernel documentation. + + +* [v7: Implement copy offload support](http://lore.kernel.org/linux-fsdevel/20230220105336.3810-1-nj.shetty@samsung.com/) + + The patch series covers the points discussed in November 2021 virtual + call [LSF/MM/BFP TOPIC] Storage: Copy Offload [0]. + We have covered the initial agreed requirements in this patchset and + further additional features suggested by community. + Patchset borrows Mikulas's token based approach for 2 bdev + implementation. + +* [v1: blk: optimization for classic polling](http://lore.kernel.org/linux-fsdevel/3578876466-3733-1-git-send-email-nj.shetty@samsung.com/) + + This removes the dependency on interrupts to wake up task. Set task + state as TASK_RUNNING, if need_resched() returns true, + while polling for IO completion. + Earlier, polling task used to sleep, relying on interrupt to wake it up. + This made some IO take very long when interrupt-coalescing is enabled in + NVMe. + +#### 网络设备 + +* [v12: bpf-next: Add skb + xdp dynptrs](http://lore.kernel.org/netdev/20230226085120.3907863-1-joannelkoong@gmail.com/) + + This patchset is the 2nd in the dynptr series. The 1st can be found here [0]. + + When comparing the differences in runtime for packet parsing without dynptrs + vs. with dynptrs, there is no noticeable difference. Patch 9 contains more + details as well as examples of how to use skb and xdp dynptrs. + +* [v1: r8169: disable ASPM during NAPI poll](http://lore.kernel.org/netdev/af076f1f-a034-82e5-8f76-f3ec32a14eaa@gmail.com/) + + This is a rework of ideas from Kai-Heng on how to avoid the known + ASPM issues whilst still allowing for a maximum of ASPM-related power + savings. As a prerequisite some locking is added first. 
+ +* [v9: net-next: r8169: Temporarily disable ASPM on NAPI poll](http://lore.kernel.org/netdev/20230225034635.2220386-1-kai.heng.feng@canonical.com/) + + The series temporarily disables ASPM during NAPI poll, so the NIC can + "regain" the performance lost when ASPM is enabled. The idea comes from + the "dynamic ASPM" feature of Realtek's vendor driver. + + We have had the "dynamic ASPM" mechanism in the Ubuntu 22.04 LTS kernel for quite a + while, and AFAIK it hasn't introduced any regression so far. + +* [v1: iproute2: genl: print caps for all families](http://lore.kernel.org/netdev/20230225003754.1726760-1-kuba@kernel.org/) + + Back in 2006 kernel commit 334c29a64507 ("[GENETLINK]: Move + command capabilities to flags.") removed some attributes and + moved the capabilities to flags. Corresponding iproute2 + commit 26328fc3933f ("Add controller support for new features + exposed") added the ability to print those caps. + + Printing is gated on the version of the family, but we're checking + the version of each individual family rather than the control + family. The format of attributes in the control family + is dictated by the version of the control family alone. + +* [v3: Self-encapsulate the thermal zone device structure](http://lore.kernel.org/netdev/20230224210634.3994365-1-daniel.lezcano@linaro.org/) + + The exported thermal headers expose the thermal core structure while those + should be private to the framework. The initial idea was that thermal sensor + drivers use the thermal zone device structure pointer to pass it around from + the ops to the thermal framework API like a handler. + +* [v2: kernel: Clear workqueue to avoid use-after-free](http://lore.kernel.org/netdev/20230224195313.1877313-1-jiangzp@google.com/) + + After the hci_sync rework, cmd_sync_work was cleared when calling + hci_unregister_dev, but not when powering off the adapter. + Use-after-free errors happen when a work is still scheduled + when cmd is freed by __mgmt_power_off.
+ +* [v5: Another crack at a handshake upcall mechanism](http://lore.kernel.org/netdev/167726551328.5428.13732817493891677975.stgit@91.116.238.104.host.secureserver.net/) + + Here is v5 of a series to add generic support for transport layer + security handshakes on behalf of kernel socket consumers (user space consumers use a security library directly, of course). + +* [v1: net: avoid indirect memory pressure calls](http://lore.kernel.org/netdev/20230224184606.7101-1-fw@strlen.de/) + + There is a noticeable TCP performance regression (loopback or cross-netns), + seen with iperf3 -Z (sendfile mode) when generic retpolines are needed. + + With the SK_RECLAIM_THRESHOLD checks gone, calls to enter/leave + memory pressure happen much more often. + +* [v1: net: Regressions in Ocelot switch drivers](http://lore.kernel.org/netdev/20230224155235.512695-1-vladimir.oltean@nxp.com/) + + These are 3 patches which resolve a regression in the Seville driver, + one in the Felix driver and a generic one which affects any kernel + compiled with 2 Kconfig options enabled. All of them have in common my + lack of attention during review/testing. The patches touch the DSA, MFD + and MDIO drivers for Ocelot. I think it would be preferable if all + patches went through netdev (with Lee's Ack). + +* [v1: brcmfmac: pcie: Add 4359C0 firmware definition](http://lore.kernel.org/netdev/20230224-topic-brcm_tone-v1-1-333b0ac67934@linaro.org/) + + Some phones from around 2016, as well as other random devices, have + this chip called 43956 or 4359C0 or 43596A0, which is more or less + just a rev bump (v9) of the already-supported 4359. Add a corresponding + firmware definition to allow for choosing the correct blob. + +* [v1: net-next: packet: allow MSG_NOSIGNAL in recvmsg](http://lore.kernel.org/netdev/20230224071745.20717-1-equinox@diac24.net/) + + packet_recvmsg() whitelists a bunch of MSG_* flags, which notably does + not include MSG_NOSIGNAL.
Unfortunately, io_uring always sets + MSG_NOSIGNAL, meaning AF_PACKET sockets can't be used in io_uring recvmsg(). + +* [v1: linux-next: selftests: net: udpgso_bench_tx: Add test for IP fragmentation of UDP packets](http://lore.kernel.org/netdev/202302241438536013777@zte.com.cn/) + + The UDP GSO bench only tests the performance of userspace payload splitting + and UDP GSO. But we are also concerned about its performance compared with IP fragmentation. + +* [v2: net-next: sfc: support offloading TC VLAN push/pop actions to the MAE](http://lore.kernel.org/netdev/20230223235026.26066-1-edward.cree@amd.com/) + + EF100 can pop and/or push up to two VLAN tags. + +* [v1: net-next: ibmvnic: Assign XPS map to correct queue index](http://lore.kernel.org/netdev/20230223153944.44969-1-nnac123@linux.ibm.com/) + + When setting the XPS map value for TX queues, use the index of the + transmit queue. + Previously, the function was passing the index of the loop that iterates + over all queues (RX and TX). This was causing invalid XPS map values. + +* [v1: net: net/sched: act_connmark: handle errno on tcf_idr_check_alloc](http://lore.kernel.org/netdev/20230223141639.13491-1-pctammela@mojatatu.com/) + + Smatch reports that 'ci' can be used uninitialized. + The current code ignores the errno coming from tcf_idr_check_alloc, which + leads to incorrect usage of 'ci'. Handle the errno properly. + +* [[net PATCH v2] octeontx2-af: Unlock contexts in the queue context cache in case of fault detection](http://lore.kernel.org/netdev/20230223110125.2172509-1-saikrishnag@marvell.com/) + + NDC caches the contexts of frequently used queues (Rx and Tx queues). + Due to a HW erratum, when NDC detects a fault/poison while + accessing contexts it could go into an illegal state where a cache + line could get locked forever. To make sure all cache lines in NDC + are available for optimum performance upon fault/lock error/poison + errors, scan through all cache lines in NDC and clear the lock bit.
+ +* [v5: Bluetooth: NXP: Add protocol support for NXP Bluetooth chipsets](http://lore.kernel.org/netdev/20230223103614.4137309-4-neeraj.sanjaykale@nxp.com/) + + This adds a serdev-based driver for the NXP BT serial protocol, + running over H4, which can enable the built-in Bluetooth device + inside an NXP BT chip. + + This driver has a Power Save feature that puts the chip into sleep state + whenever there is no activity for 2000 ms, and wakes it up when any + activity is to be initiated over UART. + +* [v5: Add support for NXP bluetooth chipsets](http://lore.kernel.org/netdev/20230223103614.4137309-1-neeraj.sanjaykale@nxp.com/) + + This patch adds a driver for NXP bluetooth chipsets. + + The driver is based on the H4 protocol and uses serdev APIs. It supports + a host-to-chip power save feature, which the host signals by + asserting break over the UART TX line to put the chip into sleep state. + +* [[REGRESSION PATCH RFC] net: phy: don't resume PHY via MDIO when iface is not up](http://lore.kernel.org/netdev/20230223070519.2211-1-wsa+renesas@sang-engineering.com/) + + TLDR; Commit 96fb2077a517 ("net: phy: consider that suspend2ram may cut + off PHY power") caused regressions for us when resuming an interface + which is not up. It turns out the problem is another one; the above + commit only makes it visible. The attached patch is probably not the + right fix, but at least it proves my assumptions AFAICS. + +* [v1: net: add no-op for napi_busy_loop if CONFIG_NET_RX_BUSY_POLL=n](http://lore.kernel.org/netdev/20230223012258.1701175-1-jacob.e.keller@intel.com/) + + Commit 7db6b048da3b ("net: Commonize busy polling code to focus on napi_id + instead of socket") introduced napi_busy_loop and refactored sk_busy_loop + to call this new function.
The commit removed the no-op implementation of + sk_busy_loop in the #else block for CONFIG_NET_RX_BUSY_POLL, and placed the + declaration of napi_busy_loop inside the #ifdef block where sk_busy_loop used to + be declared. + +* [v2: net-next: mlx5 technical debt of hairpin params](http://lore.kernel.org/netdev/20230222230202.523667-1-saeed@kernel.org/) + + As previously discussed, this series switches hairpin configuration from debugfs to devlink + params. + + Per the discussion in [1], move the hairpin queues control (number and size) + from debugfs to devlink. + + [1] https://lore.kernel.org/all/20230111194608.7f15b9a1@kernel.org/ + +* [v1: 5.4/5.10: mac80211: mesh: embedd mesh_paths and mpp_paths into ieee80211_if_mesh](http://lore.kernel.org/netdev/20230222200301.254791-1-pchelkin@ispras.ru/) + + The null-ptr-deref problem fixed in the following patch is hit on older + branches. + + The patch initially failed to be backported into stable branches older + than 5.15 due to the spelling-fix commit ab4040df6efb ("mac80211: fix + some spelling mistakes"). + +* [v1: net: sunhme: Return an error when we are out of slots](http://lore.kernel.org/netdev/20230222170935.1820939-1-seanga2@gmail.com/) + + We only allocate enough space for four devices when the parent is a QFE. If + we couldn't find a spot (because five devices were created for whatever + reason), we would not return an error from probe(). Return ENODEV, which + was what we did before. + +* [v1: can: esd_usb: Improve code readability by means of replacing struct esd_usb_msg with a union](http://lore.kernel.org/netdev/20230222163754.3711766-1-frank.jungclaus@esd.eu/) + + As suggested by Vincent Mailhol, declare struct esd_usb_msg as a union + instead of a struct. Then replace all msg->msg.something constructs + that make use of esd_usb_msg with simpler and prettier-looking + msg->something variants.
+ +* [v2: gro: optimise redundant parsing of packets](http://lore.kernel.org/netdev/20230222145917.GA12590@debian/) + + The first commit frees up space in the GRO CB. The second commit reduces the + redundant parsing during the complete phase, using the freed CB space. + + In addition, the second commit contains a fix for a potential problem in BIG + TCP, which is detailed in the commit message itself. + +* [v3: octeontx2-pf: Use correct struct reference in test condition](http://lore.kernel.org/netdev/Y%2FYYkKddeHOt80cO@ubun2204.myguest.virtualbox.org/) + + Fix the typo/copy-paste error by replacing the struct variable name ah_esp_mask + with ah_esp_hdr. + Issue identified using the doublebitand.cocci Coccinelle semantic patch. + +* [[net PATCH v2] octeontx2-pf: Recalculate UDP checksum for ptp 1-step sync packet](http://lore.kernel.org/netdev/20230222113600.1965116-1-saikrishnag@marvell.com/) + + When checksum offload is disabled in the driver via ethtool, + the PTP 1-step sync packets contain an incorrect checksum, since + the stack calculates the checksum before the driver updates the + PTP timestamp field in the packet. This results in PTP packets + getting dropped at the other end. This patch fixes the issue by + re-calculating the UDP checksum after updating the PTP + timestamp field in the driver. + +* [v3: net-next: net: virtio_net: implement exact header length guest feature](http://lore.kernel.org/netdev/20230222080638.382211-1-jiri@resnulli.us/) + + The Virtio spec introduced a feature VIRTIO_NET_F_GUEST_HDRLEN which, + when set, indicates that the device benefits from knowing the exact + size of the header. For compatibility, to signal to the device that + the header is reliable, the driver also needs to set this feature. + Without this feature set by the driver, the device has to figure + out the header size itself.
+ +* [v3: net: stmmac: Premature loop termination check was ignored](http://lore.kernel.org/netdev/87y1oq5es0.fsf@henneberg-systemdesign.com/) + + The premature loop termination check makes sense only in case of the + jump to read_again, where the count may have been updated. But + read_again did not include the check. + +* [[net PATCH] octeontx2-af: Unlock contexts in the queue context cache in case of fault detection](http://lore.kernel.org/netdev/20230222065921.1852686-1-saikrishnag@marvell.com/) + + NDC caches the contexts of frequently used queues (Rx and Tx queues). + Due to a HW erratum, when NDC detects a fault/poison while + accessing contexts it could go into an illegal state where a cache + line could get locked forever. To make sure all cache lines in NDC + are available for optimum performance upon fault/lock error/poison + errors, scan through all cache lines in NDC and clear the lock bit. + +* [v11: bpf-next: Add skb + xdp dynptrs](http://lore.kernel.org/netdev/20230222060747.2562549-1-joannelkoong@gmail.com/) + + When comparing the differences in runtime for packet parsing without dynptrs + vs. with dynptrs, there is no noticeable difference. Patch 9 contains more + details as well as examples of how to use skb and xdp dynptrs. + +#### Security Hardening + +* [v1: next: usb: host: oxu210hp-hcd: Replace fake flex-array with flexible-array member](http://lore.kernel.org/linux-hardening/Y%2FgynI9Wv8RZTD8M@work/) + + Zero-length arrays as fake flexible arrays are deprecated and we are + moving towards adopting C99 flexible-array members instead. + + Transform the zero-length array into a flexible-array member in struct ehci_regs.
+ +* [GIT PULL: flexible-array transformations for 6.3-rc1](http://lore.kernel.org/linux-hardening/Y%2FfnjS5eHNauiUUR@work/) + + The following changes since commit 88603b6dc419445847923fcb7fe5080067a30f98: + + Linux 6.2-rc2 (2023-01-01 13:53:16 -0800) + + are available in the Git repository at: + + git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux.git tags/flex-array-transformations-6.3-rc1 + + for you to fetch changes up to b942a520d9e43bc31f0808d2f2267a1ddba75518: + + bcache: Replace zero-length arrays with DECLARE_FLEX_ARRAY() helper (2023-01-05 17:48:45 -0600) + + flexible-array transformations for 6.3-rc1 + + Please pull the following patches that transform zero-length arrays + in unions into flexible arrays. These patches have been baking in + linux-next for the whole development cycle. + +* [v1: wifi: iwlwifi: dvm: Add struct_group for struct iwl_keyinfo keys](http://lore.kernel.org/linux-hardening/20230218191056.never.374-kees@kernel.org/) + + Function iwlagn_send_sta_key() was trying to write across multiple + structure members in a single memcpy(). Add a struct group "keys" to + let the compiler see the intended bounds of the memcpy, which includes + the tkip keys as well. Silences a false-positive memcpy() run-time + warning: + + memcpy: detected field-spanning write (size 32) of single field "sta_cmd.key.key" at drivers/net/wireless/intel/iwlwifi/dvm/sta.c:1103 (size 16) + +#### Asynchronous IO + +* [v3: io_uring: Add KASAN support for alloc caches](http://lore.kernel.org/io-uring/20230223164353.2839177-1-leitao@debian.org/) + + This patchset enables KASAN for alloc cache buffers. These buffers are + used by the apoll and netmsg code paths. These buffers will now be poisoned + when not used, so if they are randomly touched, a KASAN warning will pop up.
+ +* [v1: for-next: io_uring: registered huge buffer optimisations](http://lore.kernel.org/io-uring/cover.1677041932.git.asml.silence@gmail.com/) + + Improve support for registered buffers consisting of huge pages by + keeping them as a single-element bvec instead of chunking them into + 4K pages. It improves performance quite a bit, cutting CPU cycles on + dma-mapping and promoting a more efficient use of hardware. + +* [v2: Add io_uring & ebpf based methods to implement zero-copy for ublk](http://lore.kernel.org/io-uring/20230222132534.114574-1-xiaoguang.wang@linux.alibaba.com/) + + Normally, userspace block device implementations need to copy data between + the kernel block layer's IO requests and the userspace block device's + daemon. For example, ublk and tcmu both have similar logic, but this + copying obviously consumes CPU resources, especially for large IO. + +* [v1: tools/io_uring: correctly set "ret" for sq_poll case](http://lore.kernel.org/io-uring/20230221073736.628851-1-ZiyangZhang@linux.alibaba.com/) + + For the sq_poll case, "ret" is neither initialized nor cleared/set. As a result, the + output of this test program is incorrect and we cannot even stop the + program by pressing CTRL-C. + + Reset "ret" to zero in each submission/completion round, and assign + "ret" to "this_reap". + +* [v1: liburing: test sends with huge pages](http://lore.kernel.org/io-uring/cover.1676941370.git.asml.silence@gmail.com/) + + Add huge page support to the zc send benchmark and huge page + tests in send-zerocopy.c. + +* [v1: liburing: test/buf-ring: add test for buf ring occupying exactly one page](http://lore.kernel.org/io-uring/20230218184618.70966-1-wlukowicz01@gmail.com/) + + This shows an issue with how the kernel calculates buffer ring sizes + during their registration. + + Allocate two pages, register a buf ring fully occupying the first one, + while protecting the second one to make sure it's not used. The + registration should succeed.
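The buf-ring test above relies on a guard-page layout. The C sketch below illustrates that layout and the size arithmetic behind it; it is not the liburing test itself. It assumes each ring entry mirrors the 16-byte `struct io_uring_buf` layout (`addr`, `len`, `bid`, `resv`), so `page_size / 16` entries fill exactly one page, and it makes the second page `PROT_NONE` so any overrun faults. The names `ring_buf_entry`, `ring_pages()` and `fill_one_page_ring()` are hypothetical, and the actual io_uring registration step is deliberately left out so the sketch builds without liburing.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS in strict C modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical mirror of struct io_uring_buf from <linux/io_uring.h>
 * (addr, len, bid, resv: 16 bytes per entry), redefined here so the
 * sketch builds without liburing or kernel headers. */
struct ring_buf_entry {
	uint64_t addr;
	uint32_t len;
	uint16_t bid;
	uint16_t resv;
};

static_assert(sizeof(struct ring_buf_entry) == 16,
	      "entry must match the assumed 16-byte uapi layout");

/* Pages needed to hold `entries` ring entries, rounded up to whole pages. */
size_t ring_pages(size_t entries, size_t page_size)
{
	size_t bytes = entries * sizeof(struct ring_buf_entry);

	return (bytes + page_size - 1) / page_size;
}

/* Map two pages, turn the second one into a PROT_NONE guard page, and fill
 * a ring that occupies exactly the first page. A write past the first page
 * would raise SIGSEGV. Returns the number of entries written, or 0 if the
 * mapping could not be set up. */
size_t fill_one_page_ring(void)
{
	size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
	size_t entries = page_size / sizeof(struct ring_buf_entry);
	char *mem = mmap(NULL, 2 * page_size, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (mem == MAP_FAILED)
		return 0;
	/* The second page becomes the guard page. */
	if (mprotect(mem + page_size, page_size, PROT_NONE) != 0) {
		munmap(mem, 2 * page_size);
		return 0;
	}

	struct ring_buf_entry *ring = (struct ring_buf_entry *)mem;
	for (size_t i = 0; i < entries; i++) {
		ring[i].addr = 0;
		ring[i].len = 0;
		ring[i].bid = (uint16_t)i;
		ring[i].resv = 0;
	}

	munmap(mem, 2 * page_size);
	return entries;
}
```

Called from a test's `main()`, `fill_one_page_ring()` returns `page_size / 16` (256 on 4 KiB pages) without faulting, and `ring_pages()` reports that such a ring needs exactly one page. A size computation that claimed a second page for it would touch the guard page, which is the kind of miscalculation the test is designed to catch.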
+ +#### Rust For Linux + +* [v1: rust: xarray: Add an abstraction for XArray](http://lore.kernel.org/rust-for-linux/20230224-rust-xarray-v1-1-80f0904ce5d3@asahilina.net/) + + The XArray is an abstract data type which behaves like a very large + array of pointers. Add a Rust abstraction for this data type. + + The initial implementation uses explicit locking on get operations and + returns a guard which blocks mutation, ensuring that the referenced + object remains alive. + +* [v1: rust: time: New module for timekeeping functions](http://lore.kernel.org/rust-for-linux/20230221-gpu-up-time-v1-1-bf8fe74b7f55@asahilina.net/) + + This module is intended to contain functions related to kernel + timekeeping and time. Initially, this just wraps ktime_get() and + ktime_get_boottime() and returns them as core::time::Duration instances. + This is useful for drivers that need to implement simple retry loops and + timeouts. + +#### BPF + +* [v3: bpf-next: Add support for kptrs in more BPF maps](http://lore.kernel.org/bpf/20230225154010.391965-1-memxor@gmail.com/) + + This set adds support for kptrs in percpu hashmaps, percpu LRU hashmaps, + and local storage maps (covering sk, cgrp, task, inode). + + Tests are expanded to test more existing maps at runtime and also test + the code path for the local storage maps (which is shared by all + implementations). + +* [v2: bpf-next: Add socket destroy capability](http://lore.kernel.org/bpf/20230223215311.926899-1-aditi.ghag@isovalent.com/) + + This patch adds the capability to destroy sockets in BPF. We plan to use + the capability in Cilium to force client sockets to reconnect when their + remote load-balancing backends are deleted. The other use case is + on-the-fly policy enforcement where existing socket connections prevented + by policies need to be terminated.
+ +* [v3: blk-ioprio: Introduce promote-to-rt policy](http://lore.kernel.org/bpf/20230223134852.3745349-1-houtao@huaweicloud.com/) + + Since commit a78418e6a04c ("block: Always initialize bio IO priority on + submit"), bio->bi_ioprio will never be IOPRIO_CLASS_NONE when calling + blkcg_set_ioprio(), so there will be no way to promote the io-priority + of one cgroup to IOPRIO_CLASS_RT, because bi_ioprio will always be + greater than or equal to IOPRIO_CLASS_RT. + +* [bpf: RFC for platform specific BPF helper addition](http://lore.kernel.org/bpf/0838bc96-c8a8-c326-a8f0-80240cf6b31a@linux.intel.com/) + + Some background first: on x86 platforms there is a free-running TSC + counter which can be used to generate extremely accurate profiling time + stamps. Currently this can be used by BPF programs via hooking into the perf + subsystem and reading the value there; however, this reduces the accuracy + due to the latency + jitter involved with the long execution chain, and also the + timebase gets converted into a value relative to the start of the execution of + the program, instead of an absolute system-level value. + +* [v2: bpf-next: Transit between BPF TCP congestion controls.](http://lore.kernel.org/bpf/20230223011238.12313-1-kuifeng@meta.com/) + + Previously, BPF struct_ops didn't go off, as even when the user + program creating it was terminated, none of these ever were pinned. + For instance, the TCP congestion control subsystem indirectly + maintains a reference count on the struct_ops of any registered BPF + implemented algorithm. Thus, the algorithm won't be deactivated until + someone deliberately unregisters it. + +* [[RFC/PATCHSET 0/8] perf record: Implement BPF sample filter (v3)](http://lore.kernel.org/bpf/20230222230141.1729048-1-namhyung@kernel.org/) + + There have been requests for more sophisticated perf event sample + filtering based on the sample data. Recently the kernel added support so that BPF
Recently the kernel added BPF + programs can access perf sample data and this is the userspace part + to enable such a filtering. + + This still has some rough edges and needs more improvements. But + I'd like to share the current work and get some feedback for the + directions and idea for further improvements. + +* [v2: bpf-next: bpf: bpf memory usage](http://lore.kernel.org/bpf/20230222014553.47744-1-laoar.shao@gmail.com/) + + Currently we can't get bpf memory usage reliably. bpftool now shows the + bpf memory footprint, which is difference with bpf memory usage. The + +* [v1: bpf-next: bpf: Add bpf_cgroup_from_id() kfunc](http://lore.kernel.org/bpf/Y%2FVA+jP0mB5cMZEz@slm.duckdns.org/) + + cgroup ID is an userspace-visible 64bit value uniquely identifying a given + cgroup. As the IDs are used widely, it's useful to be able to look up the + matching cgroups. Add bpf_cgroup_from_id(). + +* [v1: bpf: Add support for absolute value BPF timers](http://lore.kernel.org/bpf/20230221151846.2218217-1-tero.kristo@linux.intel.com/) + + Add a new flag BPF_F_TIMER_ABS that can be passed to bpf_timer_start() + to start an absolute value timer instead of the default relative value. + This makes the timer expire at an exact point in time, instead of a time + with latencies and jitter induced by both the BPF and timer subsystems. + This is useful e.g. in certain time sensitive profiling cases, where we + need a timer to expire at an exact point in time. + +* [v2: bpf-next: net/smc: Introduce BPF injection capability](http://lore.kernel.org/bpf/1676981919-64884-1-git-send-email-alibuda@linux.alibaba.com/) + + This PATCHes attempt to introduce BPF injection capability for SMC, + and add selftest to ensure code stability. + + As we all know that the SMC protocol is not suitable for all scenarios, + especially for short-lived. However, for most applications, they cannot + guarantee that there are no such scenarios at all. 
Therefore, apps + may need some specific strategies to decide shall we need to use SMC + or not, for example, apps can limit the scope of the SMC to a specific + IP address or port. + +* [v1: net-next: xsk: add linux/vmalloc.h to xsk.c](http://lore.kernel.org/bpf/20230221075140.46988-1-xuanzhuo@linux.alibaba.com/) + + Fix the failure of the compilation under the sh4. + + Because we introduced remap_vmalloc_range() earlier, this has caused + the compilation failure on the sh4 platform. So this introduction of the + header file of linux/vmalloc.h. + +* [v3: bpf-next: libbpf: allow users to set kprobe/uprobe attach mode](http://lore.kernel.org/bpf/20230221025347.389047-1-imagedong@tencent.com/) + + By default, libbpf will attach the kprobe/uprobe eBPF program in the + latest mode that supported by kernel. In this series, we add the support + to let users manually attach kprobe/uprobe in legacy/perf/link mode in + the 1th patch. + + And in the 2th patch, we split the testing 'attach_probe' into multi + subtests, as Andrii suggested. + + In the 3th patch, we add the testings for loading kprobe/uprobe in + +* [v1: bpf-next: libbpf: Document bpf_{btf,link,map,prog}_get_info_by_fd()](http://lore.kernel.org/bpf/20230220234958.764997-1-iii@linux.ibm.com/) + + Replace the short informal description with the proper doc comments. + +* [v1: bbpf: usdt arm arg parsing support](http://lore.kernel.org/bpf/20230220212233.13229-1-puranjay12@gmail.com/) + + Parsing of USDT arguments is architecture-specific; on arm it is + relatively easy since registers used are r[0-10], fp, ip, sp, lr, + pc. Format is slightly different compared to aarch64; forms are + + - "size @ [ reg, #offset ]" for dereferences, for example + "-8 @ [ sp, #76 ]" ; " -4 @ [ sp ]" + - "size @ reg" for register values; for example + "-4@r0" + - "size @ #value" for raw values; for example + "-8@#1" + + Add support for parsing USDT arguments for ARM architecture. 
+ +* [v1: bpf-next: bpf: Check for helper calls in check_subprogs()](http://lore.kernel.org/bpf/20230220163756.753713-1-iii@linux.ibm.com/) + + The condition src_reg != BPF_PSEUDO_CALL && imm == BPF_FUNC_tail_call + may be satisfied by a kfunc call. This would lead to unnecessarily + setting has_tail_call. Use src_reg == 0 instead. + +* [v2: bpf-next: bpf: Allow reads from uninit stack](http://lore.kernel.org/bpf/20230219200427.606541-1-eddyz87@gmail.com/) + + This patch-set modifies the BPF verifier to accept programs that read from + uninitialized stack locations, but only if executed in privileged mode. + This provides significant verification performance gains: 30% to 70% fewer + processed states for a large number of test programs. + +### Related Technologies + +#### Qemu + +* [v1: Fourth RISC-V PR for QEMU 8.0, Attempt 2](http://lore.kernel.org/qemu-devel/20230224185908.32706-1-palmer@rivosinc.com/) + + The following changes since commit 417296c8d8588f782018d01a317f88957e9786d6: + + tests/qtest/netdev-socket: Raise connection timeout to 60 seconds (2023-02-09 11:23:53 +0000) + + are available in the Git repository at: + + git@github.com:palmer-dabbelt/qemu.git tags/pull-riscv-to-apply-20230224 + + for you to fetch changes up to 8c89d50c10afdd98da82642ca5e9d7af4f1c18bd: + + target/riscv: Fix vslide1up.vf and vslide1down.vf (2023-02-23 14:21:34 -0800) + + Fourth RISC-V PR for QEMU 8.0, Attempt 2 + +* [v8: riscv: Add support for Zicbo[m,z,p] instructions](http://lore.kernel.org/qemu-devel/20230224132536.552293-1-dbarboza@ventanamicro.com/) + + This version has a change in patch 2, proposed by Weiwei Li, where we're + now triggering virt_instruction_fault before triggering illegal_insn + fault from S mode. + +* [v1: target/riscv: Add support for Svadu extension](http://lore.kernel.org/qemu-devel/20230224040852.37109-1-liweiwei@iscas.ac.cn/) + + This patchset adds support for the Svadu extension. It also fixes some relationships between the *envcfg fields and the Svpbmt/Sstc extensions.
+ + The specification for the Svadu extension can be found at: + + https://github.com/riscv/riscv-svadu + + The port is available here: + https://github.com/plctlab/plct-qemu/tree/plct-svadu-upstream + +* [v2: NUMA: Apply socket-NUMA-node boundary for aarch64 and RiscV machines](http://lore.kernel.org/qemu-devel/20230223081401.248835-1-gshan@redhat.com/) + + For arm64 and RiscV architecture, the driver (/base/arch_topology.c) is + used to populate the CPU topology in the Linux guest. It's required that + the CPUs in one socket can't span multiple NUMA nodes. Otherwise, the Linux + scheduling domain can't be sorted out, as the following warning message + indicates. To avoid the unexpected confusion, this series attempts to + reject such invalid configurations. + +* [v1: target/riscv/vector_helper.c: create vext_set_tail_elems_1s()](http://lore.kernel.org/qemu-devel/20230221184525.140704-1-dbarboza@ventanamicro.com/) + + Commit 752614cab8e6 ("target/riscv: rvv: Add tail agnostic for vector + load / store instructions") added code to set the tail elements to 1 at + the end of vext_ldst_stride(), vext_ldst_us(), vext_ldst_index() and + vext_ldff(). Aside from an env->vl versus an evl value being used in the + first loop, the code is being repeated 4 times. + +* [v1: target/riscv: Add support for Zicond extension](http://lore.kernel.org/qemu-devel/20230221091009.36545-1-liweiwei@iscas.ac.cn/) + + The spec can be found at https://github.com/riscv/riscv-zicond. + Two instructions are added: + - czero.eqz: Moves zero to a register rd, if the condition rs2 is + equal to zero, otherwise moves rs1 to rd. + - czero.nez: Moves zero to a register rd, if the condition rs2 is + nonzero, otherwise moves rs1 to rd. + +#### U-Boot + +* [v1: Add StarFive JH7110 PCIe drvier support](http://lore.kernel.org/u-boot/20230223105240.15180-1-minda.chen@starfivetech.com/) + + The PCIe driver depends on the gpio, pinctrl, clk and reset drivers to do init.
The PCIe dts configuration includes all these settings. + + The PCIe driver code has been tested on the VisionFive V2 boards. + The test devices include an M.2 NVMe SSD and a Realtek 8169 Ethernet adapter. + +* [Boot from 64-bit memory address?](http://lore.kernel.org/u-boot/BL3PR11MB5713975ADD19187E59776A2389AB9@BL3PR11MB5713.namprd11.prod.outlook.com/) + + Is it possible to boot from a DRAM memory address beyond the 32-bit boundary? I'm trying to configure a new RISC-V board which has 2GB of DRAM starting at offset 0x40_0000_0000. I started from the settings for an existing RISC-V board and made adjustments for my HW, but when I try to boot, I run into an "out of memory" error. + +* [v1: riscv: Support CONFIG_REMAKE_ELF](http://lore.kernel.org/u-boot/20230220060239.42279-1-samuel@sholland.org/) + + Add flags to tell objcopy what kind of ELF to create. + +## 20230219: Issue 34 + +### Kernel Updates + +#### RISC-V Architecture Support + +* [v1: Add dead syscalls elimination support](http://lore.kernel.org/linux-riscv/cover.1676594211.git.falcon@tinylab.org/) + + CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION allows eliminating dead code + and data; this patchset further eliminates dead syscalls which + are not used in the target system. + +* [v2: Add basic ACPI support for RISC-V](http://lore.kernel.org/linux-riscv/20230216182043.1946553-1-sunilvl@ventanamicro.com/) + + This patch series enables the basic ACPI infrastructure for RISC-V. + Supporting external interrupt controllers is in progress and hence it is + tested using a poll-based HVC SBI console and RAM disk. + +* [v1: dt-bindings: riscv: correct starfive visionfive 2 compatibles](http://lore.kernel.org/linux-riscv/20230216131511.3327943-1-conor.dooley@microchip.com/) + + Using "va" and "vb" doesn't match what's written on the board, or the + communications from StarFive. + Switching to using the silkscreened version number will ease confusion & + the risk of another spin of the board containing a "conflicting" version identifier.
+ +* [v3: RISC-V: Don't check text_mutex during stop_machine](http://lore.kernel.org/linux-riscv/20230215164317.727657-1-conor@kernel.org/) + + We're currently using stop_machine() to update ftrace, which means that + the thread that takes text_mutex during ftrace_prepare() may not be the + same as the thread that eventually patches the code. This isn't + actually a race because the lock is still held (preventing any other + concurrent accesses) and there is only one thread running during + stop_machine(), but it does trigger a lockdep failure. + +* [v1: riscv: Introduce KASLR](http://lore.kernel.org/linux-riscv/20230215145113.465558-1-alexghiti@rivosinc.com/) + + The following KASLR implementation allows randomizing the kernel mapping: + + - virtually: we expect the bootloader to provide a seed in the device-tree + - physically: only implemented in the EFI stub, it relies on the firmware to + provide a seed using EFI_RNG_PROTOCOL. arm64 has a similar implementation, + hence patch 3 factors out the KASLR-related functions so that riscv can reuse them. + +* [v1: riscv: Avoid enabling interrupts in die()](http://lore.kernel.org/linux-riscv/20230215144828.3370316-1-mnissler@rivosinc.com/) + + While working on something else, I noticed that the kernel would start + accepting interrupts again after crashing in an interrupt handler. Since + the kernel is already in an inconsistent state, enabling interrupts is + dangerous and opens up the risk of the kernel state deteriorating further. + +* [v8: Introduce 64b relocatable kernel](http://lore.kernel.org/linux-riscv/20230215143626.453491-1-alexghiti@rivosinc.com/) + + After multiple attempts, this patchset is now based on the fact that the + 64b kernel mapping was moved outside the linear mapping. + + The first patch allows building relocatable kernels but is not selected + by default. That patch is a requirement for KASLR.
+ +* [v1: bpf-next: Support bpf trampoline for RV64](http://lore.kernel.org/linux-riscv/20230215135205.1411105-1-pulehui@huaweicloud.com/) + + BPF trampoline is the critical infrastructure of the bpf + subsystem, acting as a mediator between kernel functions + and BPF programs. Numerous important features, such as + using eBPF programs for zero-overhead kernel introspection, + rely on this key component. + +* [v4: StarFive's SDIO/eMMC driver support](http://lore.kernel.org/linux-riscv/20230215113249.47727-1-william.qiu@starfivetech.com/) + + This patchset adds initial rudimentary support for the StarFive + designware mobile storage host controller driver. And this driver will + be used in StarFive's VisionFive 2 board. The main purpose of adding + this driver is to accommodate the ultra-high speed mode of eMMC. + +* [v1: MAINTAINERS: repair file entry for STARFIVE JH7110 MMC/SD/SDIO DRIVER](http://lore.kernel.org/linux-riscv/20230215080203.27445-1-lukas.bulwahn@gmail.com/) + + Commit bfde6b3869f5 ("mmc: starfive: Add sdio/emmc driver support") adds a + section in MAINTAINERS referring to the file drivers/mmc/dw_mmc-starfive.c, + but the file is actually located at drivers/mmc/host/dw_mmc-starfive.c. + +* [v1: RISC-V: Guard alternative asm macros with !LINKER_SCRIPT](http://lore.kernel.org/linux-riscv/20230214201358.10647-1-palmer@rivosinc.com/) + + Without this I get a handful of .macro related directives that trip up LD. + +* [v2: RISC-V: Enable dead code elimination](http://lore.kernel.org/linux-riscv/20230214150959.49088-1-falcon@tinylab.org/) + + Select CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION for RISC-V, allowing + the user to enable dead code elimination. In order for this to work, + ensure that we keep the alternative table by annotating it with KEEP.
+ +* [[PATCH v1 RFC Zisslpcfi 00/20] riscv control-flow integrity for U mode](http://lore.kernel.org/linux-riscv/20230213045351.3945824-1-debug@rivosinc.com/) + + I've been working on Linux support for shadow stack and landing pad + instructions on riscv for a while. + + These are still RFC quality. But at least they're in a shape which can + start a discussion and I can get some feedback. So I decided to send out the patches. + +* [v2: Add RISC-V 32 NOMMU support](http://lore.kernel.org/linux-riscv/20230212205506.1992714-1-Mr.Bossman075@gmail.com/) + + This patch-set aims to add NOMMU support to RV32. + Many people want to build simple emulators or HDL + models of RISC-V; this patch makes it possible to run Linux on them. + +* [v1: RISC-V: take text_mutex during alternative patching](http://lore.kernel.org/linux-riscv/20230212194735.491785-1-conor@kernel.org/) + + This issue was exposed by 702e64550b12 ("riscv: fpu: switch has_fpu() to + riscv_has_extension_likely()"), as it is the patching in has_fpu() that + triggers the splats in Guenter's report. + +#### Process Scheduling + +* [v1: sched: Consider task_struct::saved_state in wait_task_inactive().](http://lore.kernel.org/lkml/Y++UzubyNavLKFDP@linutronix.de/) + + wait_task_inactive() waits for a thread to unschedule in a certain task state. + + Check also for task_struct::saved_state if the desired match was not found in + task_struct::__state on PREEMPT_RT. If the state was found in saved_state, wait + until the task is idle and state is visible in task_struct::__state.
+ +* [v1: net: sched: sch: null pointer dereference in htb_offload_move_qdisc()](http://lore.kernel.org/lkml/20230216104939.3553390-1-alok.a.tiwari@oracle.com/) + + A possible null pointer dereference was detected by a static analyzer: + htb_destroy_class_offload() calls htb_find(), which can return NULL + for an invalid class id (moved_cl = htb_find(classid, sch)); + in that case it should not pass 'moved_cl' to htb_offload_move_qdisc(), and if 'moved_cl' is a NULL pointer it should return -EINVAL. + +* [v1: sched: sd_llc_id initialized](http://lore.kernel.org/lkml/20230215015435.100559-1-sunshouxin@chinatelecom.cn/) + + In my test, I use isolcpus to isolate CPUs for a specific purpose, + and then I noticed a different scenario when binding cores. + +* [v1: sched/fair: Interleave cfs bandwidth timers for improved single thread performance at low utilization](http://lore.kernel.org/lkml/9c57c92c-3e0c-b8c5-4be9-8f4df344a347@linux.vnet.ibm.com/) + + The CPU CFS bandwidth controller uses an hrtimer called the period timer. Quota is + refilled upon the timer expiry and re-started when there are running tasks + within the cgroup. Each cgroup has a separate period timer which manages + the period and quota for that cgroup. + +#### Memory Management + +* [v6: Shadow stacks for userspace](http://lore.kernel.org/linux-mm/20230218211433.26859-1-rick.p.edgecombe@intel.com/) + + This series implements Shadow Stacks for userspace using x86's Control-flow + Enforcement Technology (CET). CET consists of two related security features: + shadow stacks and indirect branch tracking. This series implements just the + shadow stack part of this feature, and just for userspace.
+ +* [v1: Add flag as THP allocation hint for memfd_restricted() syscall](http://lore.kernel.org/linux-mm/cover.1676680548.git.ackerleytng@google.com/) + + This patchset builds upon the memfd_restricted() system call that has + been discussed in the ‘KVM: mm: fd-based approach for supporting KVM’ + patch series, at + https://lore.kernel.org/lkml/20221202061347.1070246-1-chao.p.peng@linux.intel.com/T/#m7e944d7892afdd1d62a03a287bd488c56e377b0c + +* [v2: hugetlb: introduce HugeTLB high-granularity mapping](http://lore.kernel.org/linux-mm/20230218002819.1486479-1-jthoughton@google.com/) + + This series introduces the concept of HugeTLB high-granularity mapping + (HGM). This series teaches HugeTLB how to map HugeTLB pages at + high granularity, similar to how THPs can be PTE-mapped. + +* [v1: 5.15: of: reserved_mem: Have kmemleak ignore dynamically allocated reserved mem](http://lore.kernel.org/linux-mm/20230217200731.285514-1-isaacmanjarres@google.com/) + + commit ce4d9a1ea35ac5429e822c4106cb2859d5c71f3e upstream. + + Patch series "Fix kmemleak crashes when scanning CMA regions", v2. + +* [v2: drm-next: v1: DRM GPUVA Manager & Nouveau VM_BIND UAPI](http://lore.kernel.org/linux-mm/20230217134422.14116-1-dakr@redhat.com/) + + This patch series provides a new UAPI for the Nouveau driver in order to + support Vulkan features, such as sparse bindings and sparse residency. + + Furthermore, with the DRM GPUVA manager it provides a new DRM core feature to + keep track of GPU virtual address (VA) mappings in a more generic way. + +* [v5: mm/userfaultfd: Support WP on multiple VMAs](http://lore.kernel.org/linux-mm/20230217105558.832710-1-usama.anjum@collabora.com/) + + This is a simple use case where the user may or may not know if the memory + area has been divided into multiple VMAs. + + We need an implementation which doesn't disrupt the already present + users. So keeping things simple, stop going over all the VMAs if any one + of the VMAs hasn't been registered in WP mode.
While at it, remove the + un-needed error check as well. + +* [v1: mm-unstable: mm/kvm: lockless accessed bit harvest](http://lore.kernel.org/linux-mm/20230217041230.2417228-1-yuzhao@google.com/) + + This patchset RCU-protects KVM page tables and compare-and-exchanges + KVM PTEs with the accessed bit set by hardware. It significantly + improves the performance of guests when the host is under heavy memory pressure. + +* [RFC for new feature to move pages from one vma to another without split](http://lore.kernel.org/linux-mm/CA+EESO4uO84SSnBhArH4HvLNhaUQ5nZKNKXqxRCyjniNVjp0Aw@mail.gmail.com/) + + Requesting comments on a new feature which remaps pages from one + private anonymous mapping to another, without altering the vmas + involved. Two alternatives exist but both have drawbacks: + 1. userfaultfd ioctls allocate new pages, copy data and free the old + ones even when updates could be done in-place; + 2. mremap results in vma splitting in most of the cases due to 'pgoff' mismatch. + +* [v2: dm-crypt: allocate compound pages if possible](http://lore.kernel.org/linux-mm/alpine.LRH.2.21.2302161619430.5436@file01.intranet.prod.int.rdu2.redhat.com/) + + It was reported that allocating pages for the write buffer in dm-crypt + causes measurable overhead [1]. + + This patch changes dm-crypt to allocate compound pages if they are + available. If not, we fall back to the mempool. + + [1] https://listman.redhat.com/archives/dm-devel/2023-February/053284.html + +* [v2: kasan: call clear_page with a match-all tag instead of changing page tag](http://lore.kernel.org/linux-mm/20230216195924.3287772-1-pcc@google.com/) + + Instead of changing the page's tag solely in order to obtain a pointer + with a match-all tag and then changing it back again, just convert the + pointer that we get from kmap_atomic() into one with a match-all tag + before passing it to clear_page(). 
+ +* [v4: mm: ioremap: Convert architectures to take GENERIC_IOREMAP way](http://lore.kernel.org/linux-mm/20230216123419.461016-1-bhe@redhat.com/) + + Currently, many architectures haven't taken the standard GENERIC_IOREMAP + way to implement ioremap_prot(), iounmap(), and ioremap_xx(), but implement + these functions separately under each arch's folder. This causes a lot of duplicated ioremap() and iounmap() code. + +* [v1: mm, page_alloc: reduce page alloc/free sanity checks](http://lore.kernel.org/linux-mm/20230216095131.17336-1-vbabka@suse.cz/) + + Historically, we have performed sanity checks on all struct pages being + allocated or freed, making sure they have no unexpected page flags or + certain field values. This can detect insufficient cleanup and some + cases of use-after-free, although on its own it can't always identify + the culprit. The result is a warning and the "bad page" being leaked. + +#### File Systems + +* [v4: ext4: Convert inode preallocation list to an rbtree](http://lore.kernel.org/linux-fsdevel/cover.1676634592.git.ojaswin@linux.ibm.com/) + + This patch series aims to improve the performance and scalability of + inode preallocation by changing the inode preallocation linked list to an + rbtree. I've run xfstests quick on this series and plan to run the auto group + as well to confirm we have no regressions. + +* [GIT PULL: Fsnotify changes for 6.3-rc1](http://lore.kernel.org/linux-fsdevel/20230217112939.daimrvd7uivov5eu@quack3/) + + Since I'm on vacation next week, I'm sending my pull requests for the + merge window a bit earlier. Could you please pull from + + git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git fsnotify_for_v6.3-rc1 + + to get support for auditing decisions regarding fanotify permission events.
+ +* [[GIT PULL for-6.3] Make building the legacy dio code conditional](http://lore.kernel.org/linux-fsdevel/754b3cc0-c420-3257-9569-833c42f93808@kernel.dk/) + + The following changes since commit 2241ab53cbb5cdb08a6b2d4688feb13971058f65: + + Linux 6.2-rc5 (2023-01-21 16:27:01 -0800) + + are available in the Git repository at: + + git://git.kernel.dk/linux.git tags/for-6.3/dio-2023-02-16 + + for you to fetch changes up to 9636e650e16f6b01f0044f7662074958c23e4707: + + fs: build the legacy direct I/O code conditionally (2023-01-26 10:30:56 -0700) + +* [v2: eventfd: use wait_event_interruptible_locked_irq() helper](http://lore.kernel.org/linux-fsdevel/tencent_98334C552AB55C90FCE4523A327393DFF606@qq.com/) + + wait_event_interruptible_locked_irq was introduced by commit 22c43c81a51e + ("wait_event_interruptible_locked() interface"), but older code such as + eventfd_{write,read} still uses the open code implementation. + Inspired by commit 8120a8aadb20 + ("fs/timerfd.c: make use of wait_event_interruptible_locked_irq()"), this + patch replaces the open code implementation with a single macro call. 
+ +* [GIT PULL: i_version handling changes for v6.3](http://lore.kernel.org/linux-fsdevel/0d67a8a252ef22c6506f45761c2f7d1185a44190.camel@kernel.org/) + + The following changes since commit 948ef7bb70c4acaf74d87420ea3a1190862d4548: + + Merge tag 'modules-6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux (2023-01-24 18:19:44 -0800) + + are available in the Git repository at: + + https://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux.git tags/iversion-v6.3 + + for you to fetch changes up to 58a033c9a3e003e048a0431a296e58c6b363b02b: + + nfsd: remove fetch_iversion export operation (2023-01-26 07:00:06 -0500) + +* [v1: chardev: make kobj_type structures constant](http://lore.kernel.org/linux-fsdevel/20230216-kobj_type-chardev-v1-1-94e213b73e85@weissschuh.net/) + + Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.") + the driver core allows the usage of const struct kobj_type. + + Take advantage of this to constify the structure definitions to prevent + modification at runtime. + +* [v1: Revert boot-breaking changes in fs/](http://lore.kernel.org/linux-fsdevel/20230215-topic-next-20230214-revert-v1-0-c58cd87b9086@linaro.org/) + + next-20230213 introduced commit d9722a475711 ("splice: Do splice read from + a buffered file without using ITER_PIPE") which broke booting on any + Qualcomm ARM64 device I grabbed, dereferencing a null pointer in + generic_filesplice_read+0xf8/x598. Revert it (and its dependency) + (or accept better solutions should anybody come up with such) to make + them bootable again. + +* [v1: mm: userfaultfd: add UFFDIO_CONTINUE_MODE_WP to install WP PTEs](http://lore.kernel.org/linux-fsdevel/20230214215046.1187635-1-axelrasmussen@google.com/) + + UFFDIO_COPY already has UFFDIO_COPY_MODE_WP, so when installing a new + PTE to resolve a missing fault, one can install a write-protected one. + This is useful when using UFFDIO_REGISTER_MODE_{MISSING,WP} in combination. 
+ +* [v14: iov_iter: Improve page extraction (pin or just list)](http://lore.kernel.org/linux-fsdevel/20230214171330.2722188-1-dhowells@redhat.com/) + + Here are patches to provide support for extracting pages from an iov_iter + and to use this in the extraction functions in the block layer bio code. + +* [Attending LFS (was: v2: FUSE BPF: A Stacked Filesystem Extension for FUSE)](http://lore.kernel.org/linux-fsdevel/56d5ac0e-4c54-46b7-85d3-5de127562630@app.fastmail.com/) + + I wouldn't be able to get the travel funded by my employer, and I don't think I'm a suitable recipient for the Linux Foundation's travel fund. Therefore, I think it would make more sense for me to attend potentially relevant sessions remotely. + +* [v3: iov_iter: Adjust styling/location of new splice functions](http://lore.kernel.org/linux-fsdevel/20230214083710.2547248-1-dhowells@redhat.com/) + + Here are patches to make some changes that Christoph requested[1] to the new generic file splice functions that I implemented[2]. + + I've also worked the changes into the commits on my iov-extract + branch if that would be preferable, though that means Jens would need to + update his for-6.3/iov-extract again. + +* [v1: blk: optimization for classic polling](http://lore.kernel.org/linux-fsdevel/3578876466-3733-1-git-send-email-nj.shetty@samsung.com/) + + This removes the dependency on interrupts to wake up the task. Set the task + state to TASK_RUNNING if need_resched() returns true + while polling for IO completion. + Previously, the polling task used to sleep, relying on an interrupt to wake it up. + This made some IO take very long when interrupt coalescing is enabled in NVMe. + +#### Network Devices + +* [v1: net: fec: Allow turning off IRQ coalescing](http://lore.kernel.org/netdev/20230218214037.16977-1-richard@nod.at/) + + Setting tx/rx-frames or tx/rx-usecs to zero is currently possible but + has no effect. + Also, IRQ coalescing is always enabled on supported hardware.
+ +* [v1: wifi: iwlwifi: dvm: Add struct_group for struct iwl_keyinfo keys](http://lore.kernel.org/netdev/20230218191056.never.374-kees@kernel.org/) + + Function iwlagn_send_sta_key() was trying to write across multiple + structure members in a single memcpy(). Add a struct group "keys" to + let the compiler see the intended bounds of the memcpy, which includes + the tkip keys as well. + +* [v3: bpf-next: xdp: bpf_xdp_metadata use EOPNOTSUPP for no driver support](http://lore.kernel.org/netdev/167673444093.2179692.14745621008776172374.stgit@firesoul/) + + When a driver doesn't implement a bpf_xdp_metadata kfunc, the default + implementation returns EOPNOTSUPP, which indicates that the device driver doesn't + implement this kfunc. + +* [v2: net-next: net: phy: micrel: Add support for PTP_PF_PEROUT for lan8841](http://lore.kernel.org/netdev/20230218123038.2761383-1-horatiu.vultur@microchip.com/) + + Lan8841 has 10 GPIOs and it has 2 events (EVENT_A and EVENT_B). It is + possible to assign the 2 events to any of the GPIOs, but a GPIO can + have only 1 event at a time. + These events are used to generate periodic signals. It is possible to + configure the length, the start time and the period of the signal by + configuring the event. + +* [v1: mt76: mt7915: expose device tree match table](http://lore.kernel.org/netdev/20230218112946.3039855-1-lorenz@brun.one/) + + On MT7986 the WiFi driver currently does not get automatically loaded, + requiring manual modprobing because the device tree compatibles are not + exported into metadata. + + Add the missing MODULE_DEVICE_TABLE macro to fix this. + +* [v1: bnxt: avoid overflow in bnxt_get_nvram_directory()](http://lore.kernel.org/netdev/20230218095024.23193-1-korotkov.maxim.s@gmail.com/) + + The value of an arithmetic expression is subject + to possible overflow due to a failure to cast operands to a larger data + type before performing arithmetic. Use a macro for multiplication instead of the + operator to avoid overflow.
+ +* [v1: dt-bindings: net: dsa: mediatek,mt7530: change some descriptions to literal](http://lore.kernel.org/netdev/20230218072348.13089-1-arinc.unal@arinc9.com/) + + The line endings must be preserved on gpio-controller, io-supply, and + reset-gpios properties to look proper when the YAML file is parsed. + +* [v1: nf: netfilter: use skb len to match in length_mt6](http://lore.kernel.org/netdev/361acd69270a8c2746da5774644dda9147b407a1.1676676177.git.lucien.xin@gmail.com/) + + For IPv6 Jumbo packets, the ipv6_hdr(skb)->payload_len is always 0, + and its real payload_len ( > 65535) is saved in hbh exthdr. With 0 + length for the jumbo packets, the length match may fail. + +* [v3: net-next: pds_core driver](http://lore.kernel.org/netdev/20230217225558.19837-1-shannon.nelson@amd.com/) + + This patchset implements a new driver for use with the AMD/Pensando + Distributed Services Card (DSC), intended to provide core configuration + services through the auxiliary_bus for VFio and vDPA feature specific drivers. + +* [v13: net-next: net/sched: cls_api: Support hardware miss to tc action](http://lore.kernel.org/netdev/20230217223620.28508-1-paulb@nvidia.com/) + + This series adds support for hardware miss to instruct tc to continue execution + in a specific tc action instance on a filter's action list. The mlx5 driver patch + (besides the refactors) shows its usage instead of using just chain restore. + +* [v3: page_pool: add a comment explaining the fragment counter usage](http://lore.kernel.org/netdev/20230217222130.85205-1-ilias.apalodimas@linaro.org/) + + When reading the page_pool code, the first impression is that keeping + two separate counters, one being the page refcnt and the other being fragment pp_frag_count, is counter-intuitive. + +* [v6: intel-next: i40e: support XDP multi-buffer](http://lore.kernel.org/netdev/20230217191515.166819-1-tirthendu.sarkar@intel.com/) + + This patchset adds multi-buffer support for XDP. The Tx side already has + support for multi-buffer.
This patchset focuses on the Rx side. The last + patch contains the actual multi-buffer changes while the previous ones are + preparatory patches. + +* [v1: net-next: net: bcmgenet: Support wake-up from s2idle](http://lore.kernel.org/netdev/20230217183415.3300158-1-f.fainelli@gmail.com/) + + When we suspend into s2idle we also need to enable the interrupt line + that generates the MPD and HFB interrupts towards the host CPU interrupt + controller (typically the ARM GIC or MIPS L1) to make it exit s2idle. + +* [v1: net-next: scm: add user copy checks to put_cmsg()](http://lore.kernel.org/netdev/20230217182454.2432057-1-edumazet@google.com/) + + This is a follow-up to commit 2558b8039d05 ("net: use a bounce + buffer for copying skb->mark") + + x86 and powerpc define user_access_begin, meaning + that they are not able to perform user copy checks + when using user_write_access_begin() / unsafe_copy_to_user() and friends [1] + +* [v3: net-next: net: lan966x: Use automatic selection of VCAP rule actionset](http://lore.kernel.org/netdev/20230217132831.2508465-1-horatiu.vultur@microchip.com/) + + Since commit 81e164c4aec5 ("net: microchip: sparx5: Add automatic + selection of VCAP rule actionset") the VCAP API has the capability to + automatically select the actionset based on the actions that are attached + to the rule. So it is no longer necessary to hardcode the actionset in the + driver; therefore, it is OK to remove it. + +* [v2: net-next: net: default_rps_mask follow-up](http://lore.kernel.org/netdev/cover.1676635317.git.pabeni@redhat.com/) + + The first patch makes the setting per network namespace. In the common case, once + proper isolation is in place in the main namespace, forwarding + to/from each child netns will always happen on the desired CPUs. + +* [v1: net-next: net: virtio_net: implement exact header length guest feature](http://lore.kernel.org/netdev/20230217121547.3958716-1-jiri@resnulli.us/) + + virtio_net_hdr_from_skb() fills up hdr_len to skb_headlen(skb).
+ + The virtio spec introduced a feature, VIRTIO_NET_F_GUEST_HDRLEN, which when + set indicates that the driver provides the exact size of the header. + +* [v2: bpf-next: xdp: bpf_xdp_metadata use NODEV for no device support](http://lore.kernel.org/netdev/167663589722.1933643.15760680115820248363.stgit@firesoul/) + + With our XDP-hints kfunc approach, where individual drivers overload the + default implementation, it can be hard for API users to determine + whether or not the current device driver has this kfunc available. + +* [v2: net-next: add ethtool categorized statistics](http://lore.kernel.org/netdev/20230217110211.433505-1-rakesh.sankaranarayanan@microchip.com/) + + The patch series contains the following changes: + - add categorized ethtool statistics for Microchip KSZ series switches, + support "eth-mac", "eth-phy", "eth-ctrl", "rmon" parameters with + the ethtool statistics command. MIB parameter indices are the same for all + KSZ family switches except KSZ8830. So, functions can be re-used + across all KSZ Families (except KSZ8830) and LAN937x series. Create + separate functions for KSZ8830 with their MIB parameters. + - Remove the num_alus member from the ksz_chip_data structure since it is unused + +* [v2: net/core: add optional threading for rps backlog processing](http://lore.kernel.org/netdev/20230217100606.1234-1-nbd@nbd.name/) + + When dealing with few flows or an imbalance in CPU utilization, static RPS + CPU assignment can be too inflexible. Add support for enabling threaded NAPI + for RPS backlog processing in order to allow the scheduler to better balance + processing. This helps better spread the load across idle CPUs.
+ +* [v1: wifi: rtl8xxxu: add LEDS_CLASS dependency](http://lore.kernel.org/netdev/20230217095910.2480356-1-arnd@kernel.org/) + + rtl8xxxu now unconditionally uses LEDS_CLASS, so a Kconfig dependency + is required to avoid link errors: + + aarch64-linux-ld: drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.o: in function `rtl8xxxu_disconnect': + rtl8xxxu_core.c:(.text+0x730): undefined reference to `led_classdev_unregister' + +* [v1: bpf-next: selftests/bpf: run mptcp in a dedicated netns](http://lore.kernel.org/netdev/20230217082607.3309391-1-liuhangbin@gmail.com/) + + The current mptcp test is run in the init netns. If the user or default + system config disabled mptcp, the test will fail. Let's run the mptcp + test in a dedicated netns to avoid depending on a non-default kernel mptcp setting. + +* [v1: Rework MAC drivers EEE support](http://lore.kernel.org/netdev/20230217034230.1249661-1-andrew@lunn.ch/) + + phy_init_eee() is supposed to be called once auto-neg has been + completed to determine if EEE should be used with the current link + mode. The MAC hardware should then be configured to either enable or + disable EEE. Many drivers get this wrong, calling phy_init_eee() once, + or only in the ethtool set_eee callback. + +* [v3: net-next: net/mlx5e: Add GBP VxLAN HW offload support](http://lore.kernel.org/netdev/20230217033925.160195-1-gavinl@nvidia.com/) + + This patch series adds HW offloading support for TC flows with VxLAN GBP encap/decap. + +* [v1: net-next: net: phy: Read EEE abilities when using .features](http://lore.kernel.org/netdev/20230217031520.1249198-1-andrew@lunn.ch/) + + A PHY driver can use a static integer value to indicate what link mode + features it supports, i.e., its abilities. This is the old way, but + useful when dynamically determining the device's features does not + work, e.g. support of fibre.
+ +* [v1: net-next: Add additional phydev locks](http://lore.kernel.org/netdev/20230217030714.1249009-1-andrew@lunn.ch/) + + The phydev lock should be held when accessing members of phydev, or + calling into the driver. Some of the phy_ethtool_ functions are + missing locks. Add them. To avoid deadlock, the marvell driver is + modified since it calls one of the functions which take locks, which + would result in a deadlock. + +* [v1: net-next: Add tc-mqprio and tc-taprio support for preemptible traffic classes](http://lore.kernel.org/netdev/20230216232126.3402975-1-vladimir.oltean@nxp.com/) + + The last RFC in August 2022 contained a proposal for the UAPI of both + TSN standards which together form Frame Preemption (802.1Q and 802.3): + https://patchwork.kernel.org/project/netdevbpf/cover/20220816222920.1952936-1-vladimir.oltean@nxp.com/ + +* [v10: bpf-next: Add skb + xdp dynptrs](http://lore.kernel.org/netdev/20230216225524.1192789-1-joannelkoong@gmail.com/) + + When comparing the differences in runtime for packet parsing without dynptrs + vs. with dynptrs, there is no noticeable difference. Patch 9 contains more + details as well as examples of how to use skb and xdp dynptrs. + +* [v3: can: esd_usb: Some more preparation for supporting esd CAN-USB/3](http://lore.kernel.org/netdev/20230216190450.3901254-1-frank.jungclaus@esd.eu/) + + Another small batch of patches to be seen as preparation for adding + support for the newly available esd CAN-USB/3 to esd_usb.c. + + Due to some unresolved questions, adding support for + CAN_CTRLMODE_BERR_REPORTING has been postponed to one of the future + patches. + +#### Security Hardening + +* [v3: smb3: Replace smb2pdu 1-element arrays with flex-arrays](http://lore.kernel.org/linux-hardening/20230218002436.give.204-kees@kernel.org/) + + The kernel is globally removing the ambiguous 0-length and 1-element + arrays in favor of flexible arrays, so that we can gain both compile-time + and run-time array bounds checking[1].
+ +* [v1: wifi: brcmfmac: p2p: Introduce generic flexible array frame member](http://lore.kernel.org/linux-hardening/20230215224110.never.022-kees@kernel.org/) + + Silence a run-time memcpy() false-positive warning when processing + management frames: + + memcpy: detected field-spanning write (size 27) of single field "&mgmt_frame->u" at drivers/net/wireless/broadcom/brcm80211/brcmfmac/p2p.c:1469 (size 26) + +* [v1: cifs: Replace remaining 1-element arrays](http://lore.kernel.org/linux-hardening/20230215000945.never.734-kees@kernel.org/) + + The kernel is globally removing the ambiguous 0-length and 1-element + arrays in favor of flexible arrays, so that we can gain both compile-time + and run-time array bounds checking[1]. + +* [v1: cifs: Convert struct fealist away from 1-element array](http://lore.kernel.org/linux-hardening/20230215000832.never.591-kees@kernel.org/) + + The kernel is globally removing the ambiguous 0-length and 1-element + arrays in favor of flexible arrays, so that we can gain both compile-time + and run-time array bounds checking[1]. + +#### Asynchronous IO + +* [v1: Cache tctx cancelation state in the ctx](http://lore.kernel.org/io-uring/20230217155600.157041-1-axboe@kernel.dk/) + + One of the more expensive parts of io_req_local_work_add() is that it + has to pull in the remote task tctx to check for the very unlikely event + that we are in a cancelation state. + +* [[GIT PULL for-6.3] Switch io_uring to ITER_UBUF](http://lore.kernel.org/io-uring/7ec9c3d0-1028-4d58-8ef1-0cce3083696c@kernel.dk/) + + Since we now have ITER_UBUF available, switch to using it for single + ranges as it's more efficient than ITER_IOVEC for that. + +* [v2: io_uring: Adjust mapping wrt architecture aliasing requirements](http://lore.kernel.org/io-uring/Y+3kwh8BokobVl6o@p100/) + + Some architectures have memory cache aliasing requirements (e.g. parisc) + if memory is shared between userspace and kernel.
This patch fixes the + kernel to return an aliased address when asked by userspace via mmap(). + +* [v1: Add io_uring & ebpf based methods to implement zero-copy for ublk](http://lore.kernel.org/io-uring/20230215004122.28917-1-xiaoguang.wang@linux.alibaba.com/) + + Normally, userspace block device implementations need to copy data between + the kernel block layer's io requests and the userspace block device's userspace + daemon; for example, ublk and tcmu both have similar logic. But this + operation obviously consumes cpu resources, especially for large io. + +* [v1: test/fsnotify: Skip fsnotify test if sys/fanotify.h not available](http://lore.kernel.org/io-uring/20230214164613.2844230-1-alviro.iskandar@gnuweeb.org/) + + Fix build on Termux (Android). Most Android devices don't have `<sys/fanotify.h>` + on Termux. Skip the test if it's not available. + +#### Rust For Linux + +* [GIT PULL: Rust for 6.3](http://lore.kernel.org/rust-for-linux/20230212183249.162376-1-ojeda@kernel.org/) + + A new set of features for the Rust support. + + By the time you pick this, these commits will have been in linux-next + for quite a while. No conflicts expected. No changes to the C side. + +#### BPF + +* [v1: dwarves: dwarves: change BTF encoding skip logic for functions](http://lore.kernel.org/bpf/1676675433-10583-1-git-send-email-alan.maguire@oracle.com/) + + It has been observed [1] that the recent dwarves changes + that skip BTF encoding for functions that have optimized-out + parameters are too aggressive, leading to missing kfuncs + which generate warnings and a BPF selftest failure. + +* [v1: bpf-next: libbpf: Make uprobe attachment APK aware](http://lore.kernel.org/bpf/20230217191908.1000004-1-deso@posteo.net/) + + On Android, APKs (android packages; zip packages with somewhat + prescriptive contents) are first-class citizens in the system: the + shared objects contained in them don't exist in unpacked form on the + file system.
Rather, they are mmapped directly from within the archive + and the archive is also what the kernel is aware of. + +* [v1: bpf-next: bpf: Tidy up verifier checking](http://lore.kernel.org/bpf/20230217005451.2438147-1-joannelkoong@gmail.com/) + + This change refactors check_mem_access() to check against the base type of + the register, and uses switch case checking instead of if / else if + checks. This change also uses the existing clear_caller_saved_regs() + function for resetting caller saved regs in check_helper_call(). + +* [v1: bpf-next: Allow reads from uninit stack](http://lore.kernel.org/bpf/20230216183606.2483834-1-eddyz87@gmail.com/) + + This patch-set modifies the BPF verifier to accept programs that read from + uninitialized stack locations, but only if executed in privileged mode. + This provides significant verification performance gains: 30% to 70% fewer + processed states for a large number of test programs. + +* [v1: intel-net: ice: xsk: disable txq irq before flushing hw](http://lore.kernel.org/bpf/20230216122839.6878-1-maciej.fijalkowski@intel.com/) + + ice_qp_dis() intends to stop a given queue pair that is a target of xsk + pool attach/detach. One of the steps is to disable interrupts on these + queues. It is currently broken in that the txq irq is turned off + *after* the HW flush, which in turn has no effect. + +* [v4: net-next: xsk: support use vaddr as ring](http://lore.kernel.org/bpf/20230216083047.93525-1-xuanzhuo@linux.alibaba.com/) + + When we try to start AF_XDP on machines that have been running for a long time, + memory fragmentation may leave insufficient contiguous physical memory, + which causes the start to fail. + +* [v2: bpf-next: bpf: Only allocate one bpf_mem_cache for bpf_cpumask_ma](http://lore.kernel.org/bpf/20230216024821.2202916-1-houtao@huaweicloud.com/) + + The size of bpf_cpumask is fixed, so there is no need to allocate many + bpf_mem_caches for bpf_cpumask_ma; just one bpf_mem_cache is enough.
+ Also add comments for bpf_mem_alloc_init() in bpf_mem_alloc.h to prevent + future misuse. + +* [v1: bpf: xsk: check IFF_UP earlier in Tx path](http://lore.kernel.org/bpf/20230215143309.13145-1-maciej.fijalkowski@intel.com/) + + Xsk Tx can be triggered via either sendmsg() or poll() syscalls. These + two paths share a call to the common function xsk_xmit(), which has two + sanity checks within. + +* [v1: libbbpf/bpftool: Support 32-bit Architectures.](http://lore.kernel.org/bpf/CANk7y0joRFw2F4iAuN9r-dWWMvOmbFZz_J4rhGhgVFjdnxPTYw@mail.gmail.com/) + + The BPF selftests fail to compile on 32-bit architectures as the skeleton + generated by bpftool doesn’t take into consideration the size difference of + variables on 32-bit/64-bit architectures. + +* [v1: bpf-next: bpf: Introduce kptr_rcu.](http://lore.kernel.org/bpf/20230215065812.7551-1-alexei.starovoitov@gmail.com/) + + The __kptr_ref turned out to be too limited, since any "trusted" pointer access + requires bpf_kptr_xchg(), which is impractical when the same pointer needs + to be dereferenced by multiple cpus. + The __kptr "untrusted"-only access isn't very useful in practice. + Rename __kptr to __kptr_untrusted with the eventual goal of deprecating it, + and rename __kptr_ref to __kptr, since that appears to be the more common use of kptrs. + Introduce __kptr_rcu that can be directly dereferenced and used similarly + to native kernel C code. + +* [v2: bpf-next: Improvements for BPF_ST tracking by verifier](http://lore.kernel.org/bpf/20230214232030.1502829-1-eddyz87@gmail.com/) + + This patch-set is a part of preparation work for the -mcpu=v4 option for + the BPF C compiler (discussed in [1]). Among other things, -mcpu=v4 should + enable generation of the BPF_ST instruction by the compiler.
+ +* [v1: bpf-next: Transit between BPF TCP congestion controls.](http://lore.kernel.org/bpf/20230214221718.503964-1-kuifeng@meta.com/) + + Previously, BPF struct_ops didn't go away even after the user + program that created them terminated, although none of them were ever pinned. + For instance, the TCP congestion control subsystem indirectly + maintains a reference count on the struct_ops of any registered + BPF-implemented algorithm. Thus, the algorithm won't be deactivated until + someone deliberately unregisters it. + +* [v3: bpf-next: bpf: Refactor release_regno searching logic](http://lore.kernel.org/bpf/20230214190551.2264057-1-davemarchevsky@fb.com/) + + Currently the ref_obj_id and OBJ_RELEASE searching is done in the code + that examines each individual arg (check_func_arg for helpers and the + check_kfunc_args inner loop for kfuncs). This patch pulls out this + searching to occur before individual arg type handling, resulting in a + cleaner separation of logic and shared logic between kfuncs and helpers. + +* [Attending LFS (was: v2: FUSE BPF: A Stacked Filesystem Extension for FUSE)](http://lore.kernel.org/bpf/56d5ac0e-4c54-46b7-85d3-5de127562630@app.fastmail.com/) + + I wouldn't be able to get the travel funded by my employer, and I don't think I'm a suitable recipient for the Linux Foundation's travel fund. Therefore, I think it would make more sense for me to attend potentially relevant sessions remotely. + +* [v2: bpf-next: selftests/bpf: Cross-compile bpftool](http://lore.kernel.org/bpf/20230214161253.183458-1-bjorn@kernel.org/) + + When the BPF selftests are cross-compiled, only a host version of + bpftool is built. This version of bpftool is used on the host side to + generate various intermediates, e.g., skeletons.
+ +* [v2: LoongArch: BPF: Use 4 instructions for function address in JIT](http://lore.kernel.org/bpf/20230214152633.2265699-1-hengqi.chen@gmail.com/) + + The issue can be reproduced by running the "inline simple bpf_loop call" + verifier test. + + This is because we are emitting 2-4 instructions for 64-bit immediate moves. + During the first pass of JIT, the placeholder address is zero, emitting two + instructions for it. In the extra pass, the function address is in XKVRANGE, + emitting four instructions for it. This changes the instruction indexes in the JIT context. + +* [[RFC/PATCHSET 0/7] perf record: Implement BPF sample filter (v1)](http://lore.kernel.org/bpf/20230214050452.26390-1-namhyung@kernel.org/) + + There have been requests for more sophisticated perf event sample + filtering based on the sample data. Recently the kernel added support for BPF + programs to access perf sample data, and this is the userspace part + to enable such filtering. + +* [v6: bpf-next: BPF rbtree next-gen datastructure](http://lore.kernel.org/bpf/20230214004017.2534011-1-davemarchevsky@fb.com/) + + This series adds an rbtree datastructure following the "next-gen + datastructure" precedent set by the recently-added linked-list [0]. This is + a reimplementation of the previous rbtree RFC [1] to use kfunc + kptr + instead of adding a new map type. + +* [Proposal for patch - Extend bpftool prog run to accept cpu and flags options](http://lore.kernel.org/bpf/CH2PR21MB14309C209861239DB568C3C1FADD9@CH2PR21MB1430.namprd21.prod.outlook.com/) + + The existing bpf_test_run_opts structure exposes additional fields including "flags" and "cpu". I propose extending bpftool prog run to accept options to set these additional fields. + +### Related Technology News + +#### Qemu + +* [v6: riscv: Add support for Zicbo[m,z,p] instructions](http://lore.kernel.org/qemu-devel/20230217203445.51077-1-dbarboza@ventanamicro.com/) + + This new version contains a change in patch 2 based on Richard's + feedback in v5 [1].
+ +* [v1: Fourth RISC-V PR for QEMU 8.0](http://lore.kernel.org/qemu-devel/20230217175203.19510-1-palmer@rivosinc.com/) + + The following changes since commit 417296c8d8588f782018d01a317f88957e9786d6: + + tests/qtest/netdev-socket: Raise connection timeout to 60 seconds (2023-02-09 11:23:53 +0000) + + are available in the Git repository at: + + https://github.com/palmer-dabbelt/qemu.git tags/pull-riscv-to-apply-20230217 + + for you to fetch changes up to e8c0697d79ef05aa5aefb1121dfede59855556b4: + + target/riscv: Fix vslide1up.vf and vslide1down.vf (2023-02-16 08:10:40 -0800) + +#### Buildroot + +* [board/visionfive2: add link to documentation](http://lore.kernel.org/buildroot/20230212205037.0B3D685B14@busybox.osuosl.org/) + + commit: https://git.buildroot.net/buildroot/commit/?id=8f48b3983cdb32dfcd59e7e549c8eaa1503fe342 + branch: https://git.buildroot.net/buildroot/commit/?id=refs/heads/master + + Add a link to RVspace Documentation Center, which did not exist + when readme.txt was first submitted. It provides datasheet, quick + start, schematics, and so on. + +#### U-Boot + +* [v1: u-boot-riscv/master](http://lore.kernel.org/u-boot/Y+9vIoBzKIo0XKva@ubuntu01/) + + The following changes since commit faac9dee8e0629326dc122f4624fc4897e3f38b0: + + Prepare v2023.04-rc2 (2023-02-13 18:39:15 -0500) + + are available in the Git repository at: + + https://source.denx.de/u-boot/custodians/u-boot-riscv.git + + for you to fetch changes up to 7574b6476afc1fd76816be6567458f6ca4f44234: + + riscv: binman: Add help message for missing blobs (2023-02-17 19:07:48 +0800) + +* [v1: riscv: binman: Add help message for missing blobs](http://lore.kernel.org/u-boot/20230216011945.4833-1-rick@andestech.com/) + + Add the 'missing-msg' for more detailed output + on missing system firmware. 
+ +* [Please pull u-boot-dm into -next](http://lore.kernel.org/u-boot/CAPnjgZ1-mw_DJr1Db-4iuNXgv7o_9CPZ9KgoCK9DBDfvKVptKQ@mail.gmail.com/) + + This is for the -next branch + + https://source.denx.de/u-boot/custodians/u-boot-dm/-/pipelines/15198 + + The following changes since commit faac9dee8e0629326dc122f4624fc4897e3f38b0: + + Prepare v2023.04-rc2 (2023-02-13 18:39:15 -0500) + + are available in the Git repository at: + + git://git.denx.de/u-boot-dm.git tags/dm-next-valentine + + for you to fetch changes up to 9a8a27a76ad7ab51f19c7f019d7cdac8a3f9f3c9: + + dm: test: Add a test for the various migration combinations + (2023-02-14 09:43:27 -0700) + +* [v3: doc: arch: Add document for RISC-V architecture](http://lore.kernel.org/u-boot/20230214101851.11648-1-peterlin@andestech.com/) + + This patch adds a brief introduction to the RISC-V architecture and + the typical boot process used on a variety of RISC-V platforms. + +* [v5: dm: Move to new driver model schema for device tree tags](http://lore.kernel.org/u-boot/20230213155641.1208774-1-sjg@chromium.org/) + + Now that a new schema has been accepted upstream, press it into service in U-Boot. + +* [v3: RFC: Migrate to split config](http://lore.kernel.org/u-boot/20230212231638.1134219-1-sjg@chromium.org/) + + U-Boot uses an SPL prefix on CONFIG options to indicate when an option + relates to SPL. For example, while CONFIG_TEXT_BASE is the text base for + U-Boot proper, CONFIG_SPL_TEXT_BASE is the text base for SPL. + +## 20230212: Issue 33 + +### Kernel News + +#### RISC-V Architecture Support + +* [v1: RISC-V: add a spin_shadow_stack declaration](http://lore.kernel.org/linux-riscv/20230210185945.915806-1-conor@kernel.org/) + + The patchwork automation reported a sparse complaint that + spin_shadow_stack was not declared and should be static: + ../arch/riscv/kernel/traps.c:335:15: warning: symbol 'spin_shadow_stack' was not declared. Should it be static?
+ +* [v3: irqchip/irq-sifive-plic: Add syscore callbacks for hibernation](http://lore.kernel.org/linux-riscv/20230210100122.80255-1-mason.huo@starfivetech.com/) + + The priority and enable registers of the plic will be reset + during a hibernation power cycle in poweroff mode, so + add the syscore callbacks to save/restore those registers. + +* [v1: Add JH7110 MIPI DPHY RX support](http://lore.kernel.org/linux-riscv/20230210061713.6449-1-changhuang.liang@starfivetech.com/) + + This patchset adds the power mipi dphy rx driver for the StarFive JH7110 SoC. + It is used to transfer CSI camera data. The series has been tested on + the VisionFive 2 board. + +* [v1: clocksource/drivers/riscv: Refuse to probe on T-Head](http://lore.kernel.org/linux-riscv/20230209232302.25658-1-palmer@rivosinc.com/) + + As of d9f15a9de44a ("Revert "clocksource/drivers/riscv: Events are + stopped during CPU suspend""), this driver no longer functions correctly + for the T-Head firmware. That shouldn't impact any users, as we've got + a functioning driver that's higher priority, but let's just be safe and + ban it from probing at all. + +* [v4: RISC-V: Apply Zicboz to clear_page](http://lore.kernel.org/linux-riscv/20230209152628.129914-1-ajones@ventanamicro.com/) + + When the Zicboz extension is available, we can more rapidly zero naturally + aligned, Zicboz-block-sized chunks of memory. Since pages are always + page-aligned and larger than any Zicboz block size will be, + clear_page() appears to be a good candidate for the extension. + +* [v5: Basic pinctrl support for StarFive JH7110 RISC-V SoC](http://lore.kernel.org/linux-riscv/20230209143702.44408-1-hal.feng@starfivetech.com/) + + This patch series adds basic pinctrl support for the StarFive JH7110 SoC. + +* [v13: riscv, mm: detect svnapot cpu support at runtime](http://lore.kernel.org/linux-riscv/20230209131647.17245-1-panqinglin00@gmail.com/) + + Svnapot is a RISC-V extension for marking contiguous 4K pages as a non-4K + page.
This patch set is for using Svnapot in hugetlb fs and huge vmap. + +* [v1: riscv: hwcap: Don't alphabetize ISA extension IDs](http://lore.kernel.org/linux-riscv/20230209123636.123537-1-ajones@ventanamicro.com/) + + While the comment above the ISA extension ID definitions says + "Entries are sorted alphabetically.", this stopped being good + advice with commit d8a3d8a75206 ("riscv: hwcap: make ISA extension + ids can be used in asm"), as we now use macros instead of enums. + +* [v12: riscv, mm: detect svnapot cpu support at runtime](http://lore.kernel.org/linux-riscv/20230209035343.15282-1-panqinglin00@gmail.com/) + + Svnapot is a RISC-V extension for marking contiguous 4K pages as a non-4K + page. This patch set is for using Svnapot in hugetlb fs and huge vmap. + +* [v1: riscv: dts: nezha-d1: add gpio-line-names](http://lore.kernel.org/linux-riscv/20230208014504.18899-1-twoerner@gmail.com/) + + Add descriptive names so users can associate specific lines with their + respective pins on the 40-pin header according to the schematics found at: + + http://dl.linux-sunxi.org/D1/D1_Nezha_development_board_schematic_diagram_20210224.pdf + +* [GIT PULL: KVM/riscv changes for 6.3](http://lore.kernel.org/linux-riscv/CAAhSdy25NgCY23u=icRgcZpEZzNgJkyEN92KEVL8D-SvUwTBXg@mail.gmail.com/) + + We have the following KVM RISC-V changes for 6.3: + 1) Fix wrong usage of PGDIR_SIZE to check page sizes + 2) Fix privilege mode setting in kvm_riscv_vcpu_trap_redirect() + 3) Redirect illegal instruction traps to guest + 4) SBI PMU support for guest + +* [v6: KVM perf support](http://lore.kernel.org/linux-riscv/20230207095529.1787260-1-atishp@rivosinc.com/) + + This series extends perf support for KVM. The KVM implementation relies + on the SBI PMU extension and trap-and-emulate handling of hpmcounter CSRs. + The KVM implementation exposes the virtual counters to the guest and internally + manages the counters using kernel perf counters.
+ +* [v1: RISC-V: support some cryptography accelerations](http://lore.kernel.org/linux-riscv/20230206225846.1381789-1-heiko@sntech.de/) + + So this was my playground over the last few days. + + The base is v13 of the vector patchset, but the first patches up to doing + the Zbc-based GCM GHash can also run without it. Of course, the + vector-crypto extensions are not ratified yet either, hence the RFC marking. + +* [v2: RISC-V Hardware Probing User Interface](http://lore.kernel.org/linux-riscv/20230206201455.1790329-1-evan@rivosinc.com/) + + These are very much up for discussion, as it's a pretty big new user + interface and it's quite a bit different from how we've historically + done things: this isn't just providing an ISA string to userspace, this + has its own format for providing information to userspace. + +* [v1: Add DMA driver for StarFive JH7110 SoC](http://lore.kernel.org/linux-riscv/20230206113811.23133-1-walker.chen@starfivetech.com/) + + This patch series adds dma support for the StarFive JH7110 RISC-V SoC. + The first patch adds the device tree binding. The second patch includes the dma + driver. The last patch adds the dma device node to the JH7110 dts. + +#### Process Scheduling + +* [v3: sched/fair: sanitize vruntime of entity being placed](http://lore.kernel.org/lkml/20230209193107.1432770-1-rkagan@amazon.de/) + + When a scheduling entity is placed onto a cfs_rq, its vruntime is pulled + to the base level (around cfs_rq->min_vruntime), so that the entity + doesn't gain an extra boost when placed backwards. + + However, if the entity being placed hasn't run for a long time, its + vruntime may get too far behind (e.g. while the cfs_rq was executing a + low-weight hog), which can invert the vruntime comparison due to s64 overflow. + +* [v1: livepatch,sched: Add livepatch task switching to cond_resched()](http://lore.kernel.org/lkml/cover.1675969869.git.jpoimboe@kernel.org/) + + Fix patching stalls caused by busy kthreads.
+ +* [v1: sched: show cpu number when sched_show_task](http://lore.kernel.org/lkml/20230208124655.2592560-1-peng.fan@oss.nxp.com/) + + It would be helpful to show the cpu number when dumping a task. For example, + during system suspend, we could tell which cpu the process that failed to + freeze was running on. + +* [v1: sched: sd_llc_id initialized](http://lore.kernel.org/lkml/20230207103636.13783-1-sunshouxin@chinatelecom.cn/) + + In my test, I used isolcpus to isolate specific cpus, + and then noticed different behavior when binding to cores. + +* [v3: sched: Introduce classes of tasks for load balance](http://lore.kernel.org/lkml/20230207051105.11575-1-ricardo.neri-calderon@linux.intel.com/) + + This is the third version of this patchset. Previous versions can be found + here [1] and here [2]. For brevity, I did not include the cover letter + from the original posting. You can read it here [1]. + +* [v3: sched: pick_next_rt_entity(): check list_entry](http://lore.kernel.org/lkml/20230128-list-entry-null-check-sched-v3-1-b1a71bd1ac6b@diag.uniroma1.it/) + + Commit 326587b84078 ("sched: fix goto retry in pick_next_task_rt()") + removed any path which could make pick_next_rt_entity() return NULL. + However, BUG_ON(!rt_se) in _pick_next_task_rt() (the only caller of + pick_next_rt_entity()) still checks the error condition, which can + never happen, since list_entry() never returns NULL. + +* [v2: sched/deadline: Add more reschedule cases to prio_changed_dl()](http://lore.kernel.org/lkml/20230206140612.701871-1-vschneid@redhat.com/) + + On that kernel, it is quite easy to trigger using rt-tests's deadline_test + [1] with the test running on isolated CPUs (this reduces the chance of + something unrelated setting TIF_NEED_RESCHED on the idle tasks, making the + issue even more obvious as the hung task detector chimes in).
+ +#### Memory Management + +* [GIT PULL: memblock: Revert "mm: Always release pages to the buddy allocator in memblock_free_late()."](http://lore.kernel.org/linux-mm/Y+dqPRXSqoP1x7u5@kernel.org/) + + The following changes since commit 4ec5183ec48656cec489c49f989c508b68b518e3: + + Linux 6.2-rc7 (2023-02-05 13:13:28 -0800) + + are available in the Git repository at: + + https://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock tags/fixes-2023-02-11 + +* [v1: New arch interfaces for manipulating multiple pages](http://lore.kernel.org/linux-mm/20230211033948.891959-1-willy@infradead.org/) + + Here's my latest draft of a new set of page table manipulation APIs. I've + only done alpha, arc and x86 (other than x86, I'm going alphabetically). + Before I go much further, some feedback might be a good idea. Or if + someone wants to volunteer to do their architecture ;-) + +* [v1: psi: reduce min window size to 50ms](http://lore.kernel.org/linux-mm/8b7a3270fe253de1cd2b71473e29394409b2a0f7.1676067791.git.quic_sudaraja@quicinc.com/) + + Some systems require much finer-grained tracking of memory + pressure using the PSI mechanism. Reduce the minimum + allowable window size to 50ms to increase the sampling rate + of the PSI monitor for a much faster response and reaction to memory + pressure in the system. With a 50ms window size, the smallest + resolution of memory pressure that can be tracked is now 5ms. + +* [v1: mm: add tracepoints to ksm](http://lore.kernel.org/linux-mm/20230210214645.2720847-1-shr@devkernel.io/) + + This adds the following tracepoints to ksm: + - start / stop scan + - ksm enter / exit + - merge a page + - merge a page with ksm + - remove a page + - remove a rmap item + + This patch has been split off from the RFC patch series "mm: + process/cgroup ksm support".
+ +* [v2: bpf-next: bpf, mm: introduce cgroup.memory=nobpf](http://lore.kernel.org/linux-mm/20230210154734.4416-1-laoar.shao@gmail.com/) + + The bpf memory accounting has some known problems in container + environments: + + - The container memory usage is not consistent if there are pinned bpf + programs + After the container restarts, the leftover bpf programs won't be charged + to the new generation, so the memory usage of the container is not + consistent. This issue can be resolved by introducing selectable + memcg, but we don't have an agreement on the solution yet. + +* [v1: mm/memcg: Skip high limit check in root memcg](http://lore.kernel.org/linux-mm/20230210094550.5125-1-haifeng.xu@shopee.com/) + + The high limit check walks the memory usage from the given memcg up to the root memcg. + However, there is no limit on the root memcg, so checking it makes no sense + and we can skip it. + +* [v2: fold per-CPU vmstats remotely](http://lore.kernel.org/linux-mm/20230209150150.380060673@redhat.com/) + + This is done using cmpxchg to manipulate the counters, + both CPU locally (via the account functions), + and remotely (via cpu_vm_stats_fold). + + Thanks to Aaron Tomlin for diagnosing issue 1 and writing + the initial patch series. + +* [v1: Writeback handling of pinned pages](http://lore.kernel.org/linux-mm/20230209121046.25360-1-jack@suse.cz/) + + Since we are slowly getting into a state where folios used as buffers for + [R]DMA are detectable by folio_maybe_dma_pinned(), I figured it is time we also + address the original problems filesystems had with these pages [1], namely + that page/folio private data can get reclaimed from the page while it is being + written to by the DMA and also that page contents can be modified while the + page is under writeback.
+ +* [v13: iov_iter: Improve page extraction (pin or just list)](http://lore.kernel.org/linux-mm/20230209102954.528942-1-dhowells@redhat.com/) + + Here are patches to provide support for extracting pages from an iov_iter + and to use this in the extraction functions in the block layer bio code. + +* [v2: mm/page_alloc: optimize find_suitable_fallback() and fallbacks array](http://lore.kernel.org/linux-mm/20230209101144.496144-1-yajun.deng@linux.dev/) + + There is no need to execute the next loop if it does not return in the first + loop. So add a break at the end of the loop. + + At the same time, add !migratetype_is_mergeable() before the loop and + reduce the first index size from MIGRATE_TYPES to MIGRATE_PCPTYPES in the + fallbacks array. + +* [v1: mm/page_alloc: optimize the loop in find_suitable_fallback()](http://lore.kernel.org/linux-mm/20230209024435.3392916-1-yajun.deng@linux.dev/) + + There is no need to execute the next loop if it does not return in the first + loop. So add a break at the end of the loop. + + There are only three rows in fallbacks, so reduce the first index size + from MIGRATE_TYPES to MIGRATE_PCPTYPES. + +* [v1: Revert "slub: force on no_hash_pointers when slub_debug is enabled"](http://lore.kernel.org/linux-mm/20230208194712.never.999-kees@kernel.org/) + + Linking no_hash_pointers() to slub_debug has had a chilling effect + on using slub_debug features for security hardening, since system + builders are forced to choose between redzoning and heap address location + exposures. Instead, just require the "no_hash_pointers" boot param + to be used to expose pointers during slub_debug reports.
+ +* [v1: Prevent ->map_pages from sleeping](http://lore.kernel.org/linux-mm/20230208145335.307287-1-willy@infradead.org/) + + In preparation for a larger patch series which will handle (some, easy) + page faults protected only by RCU, change the two filesystems which have + sleeping locks to not take them, and hold the RCU lock around calls to + ->map_pages to prevent other filesystems from adding sleeping locks. + +* [v1: Memory access profiler(IBS) driven NUMA balancing](http://lore.kernel.org/linux-mm/20230208073533.715-1-bharata@amd.com/) + + Some hardware platforms can provide information about memory accesses + that can be used to do optimal page and task placement on NUMA + systems. AMD processors have a hardware facility called + Instruction-Based Sampling (IBS) that can be used to gather specific metrics + related to instruction fetch and execution activity. + +* [v1: mm/damon/sysfs: make kobj_type structures constant](http://lore.kernel.org/linux-mm/20230207-kobj_type-damon-v1-1-9d4fea6a465b@weissschuh.net/) + + Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.") + the driver core allows the usage of const struct kobj_type. + + Take advantage of this to constify the structure definitions to prevent + modification at runtime. + +* [v1: mm: kfence: export kfence_enabled as global variables](http://lore.kernel.org/linux-mm/1675750519-1064-1-git-send-email-quic_zhenhuah@quicinc.com/) + + Export the variable to make it easier to determine whether kfence is enabled + at runtime. It should be more precise than checking the kernel config + "CONFIG_KFENCE". + +* [v1: tmpfs: add the option to disable swap](http://lore.kernel.org/linux-mm/20230207025259.2522793-1-mcgrof@kernel.org/) + + Many folks suggest that using tmpfs is not great because it can use swap. + That's not a good reason to *not* use tmpfs; what's missing is just + the option to disable it. This adds that option + and also lets users experiment with it.
+ +#### File Systems + +* [v1: io_uring: add IORING_OP_READ[WRITE]_SPLICE_BUF](http://lore.kernel.org/linux-fsdevel/20230210153212.733006-1-ming.lei@redhat.com/) + + Add two OPs whose buffer is retrieved via kernel splice, for supporting + fuse/ublk zero copy. + + The 1st patch enhances direct pipe & splice for moving pages in the kernel, + so that the two added OPs can't be misused, avoiding a potential security hole. + +* [v1: zonefs: make kobj_type structure constant](http://lore.kernel.org/linux-fsdevel/20230210-kobj_type-zonefs-v1-1-9a9c5b40e037@weissschuh.net/) + + Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.") + the driver core allows the usage of const struct kobj_type. + + Take advantage of this to constify the structure definition to prevent + modification at runtime. + +* [v1: Add the test_dummy_encryption key on-demand](http://lore.kernel.org/linux-fsdevel/20230208062107.199831-1-ebiggers@kernel.org/) + + This series eliminates the call to fscrypt_destroy_keyring() from + __put_super(), which is causing confusion because it looks like (but + actually isn't) a sleep-in-atomic bug. See the thread "block: sleeping + in atomic warnings", i.e. + https://lore.kernel.org/linux-fsdevel/CAHk-=wg6ohuyrmLJYTfEpDbp2Jwnef54gkcpZ3-BYgy4C6UxRQ@mail.gmail.com + and its responses. + +* [v12: iov_iter: Improve page extraction (pin or just list)](http://lore.kernel.org/linux-fsdevel/20230207171305.3716974-1-dhowells@redhat.com/) + + Here are patches to provide support for extracting pages from an iov_iter + and to use this in the extraction functions in the block layer bio code. + +* [v4: Introduce Copy-On-Write to Page Table](http://lore.kernel.org/linux-fsdevel/20230207035139.272707-1-shiyn.lin@gmail.com/) + + [RUN] vmsplice() + unmap in child ...
with hugetlb (2048 kB) + not ok 33 No leak from parent into child + + See more information about the anon cow hugetlb tests: + https://patchwork.kernel.org/project/linux-mm/patch/20220927110120.106906-5-david@redhat.com/ + +* [v1: vfs: Delay root FS switch after UMH completion](http://lore.kernel.org/linux-fsdevel/20230206171032.12801-1-mkoutny@suse.com/) + + We want to make sure no UMHs started with an old root survive into the + world with the new root (they may fail when it is not expected). + Therefore, insert a wait for existing UMHs to terminate (this assumes UMH + runtime is finite). + +* [v1: blk: optimization for classic polling](http://lore.kernel.org/linux-fsdevel/3578876466-3733-1-git-send-email-nj.shetty@samsung.com/) + + This removes the dependency on interrupts to wake up the task. Set the task + state to TASK_RUNNING if need_resched() returns true + while polling for IO completion. + +#### Network Devices + +* [v1: net-next: sock_map: dump socket map id via diag](http://lore.kernel.org/netdev/20230211201954.256230-1-xiyou.wangcong@gmail.com/) + + Currently there is no way to know from outside which sockmap a socket has been added + to, especially since a socket can be added to multiple + sockmaps. We could dump this via socket diag. + +* [v2: net-next: net: dsa: mt7530: add support for changing DSA master](http://lore.kernel.org/netdev/20230211184101.651462-1-richard@routerhints.com/) + + Add support for changing the master of a port on the MT7530 DSA subdriver. + +* [v5: net: ethernet: mtk_eth_soc: various enhancements](http://lore.kernel.org/netdev/cover.1676128246.git.daniel@makrotopia.org/) + + This series brings a variety of fixes and enhancements for mtk_eth_soc, + adds support for the MT7981 SoC and facilitates sharing the SGMII PCS + code between mtk_eth_soc and mt7530.
+
+* [v3: net-next: net: wwan: tmi: PCIe driver for MediaTek M.2 modem](http://lore.kernel.org/netdev/20230211083732.193650-1-yanchao.yang@mediatek.com/)
+
+  TMI (T-series Modem Interface) is the PCIe host device driver for MediaTek's
+  modem. The driver uses the WWAN framework infrastructure to create
+  control ports and network interfaces for data transactions.
+
+* [v1: net-next: net: Kconfig.debug: wrap socket refcnt debug into an option](http://lore.kernel.org/netdev/20230211065153.54116-1-kerneljasonxing@gmail.com/)
+
+  Since commit 463c84b97f24 ("[NET]: Introduce inet_connection_sock")
+  commented out the definition of SOCK_REFCNT_DEBUG and another patch
+  later deleted it, the only way to enable it is to define it manually
+  somewhere. Wrapping it into an option in Kconfig.debug makes it much
+  clearer and easier for developers who want to use it.
+
+* [v1: net: make kobj_type structures constant](http://lore.kernel.org/netdev/20230211-kobj_type-net-v1-0-e3bdaa5d8a78@weissschuh.net/)
+
+  Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.")
+  the driver core allows the usage of const struct kobj_type.
+
+  Take advantage of this to constify the structure definitions to prevent
+  modification at runtime.
+
+* [v4: net-next: ionic: on-chip descriptors](http://lore.kernel.org/netdev/20230211005017.48134-1-shannon.nelson@amd.com/)
+
+  We start with a couple of house-keeping patches that were originally
+  presented for 'net', then we add support for on-chip descriptor rings
+  for tx-push, as well as adding support for rx-push.
+
+* [v1: net-next: selftests: forwarding: add a test for MAC Merge layer](http://lore.kernel.org/netdev/20230210221243.228932-1-vladimir.oltean@nxp.com/)
+
+  The MAC Merge layer (IEEE 802.3-2018 clause 99) does all the heavy
+  lifting for Frame Preemption (IEEE 802.1Q-2018 clause 6.7.2), a TSN
+  feature for minimizing latency.
+
+  Preemptible traffic is different on the wire from normal traffic in
+  incompatible ways.
+
+* [v1: iproute2-next: tc: m_ct: add support for helper](http://lore.kernel.org/netdev/ab1e6bfbefff74b2b4fe230162b198c38cf5b394.1676065393.git.lucien.xin@gmail.com/)
+
+  This patch adds setup and dump support for the helper in the tc ct action
+  in userspace; the kernel support was added in:
+
+  https://lore.kernel.org/netdev/cover.1667766782.git.lucien.xin@gmail.com/
+
+* [v1: net-next: net/sched: transition actions to pcpu stats and rcu](http://lore.kernel.org/netdev/20230210202725.446422-1-pctammela@mojatatu.com/)
+
+  Following the work done for act_pedit[0], transition the remaining tc
+  actions to percpu stats and rcu, whenever possible.
+  Percpu stats make updating the action stats very cheap, while combining
+  them with rcu action parameters makes it possible to get rid of the per
+  action lock in the datapath.
+
+* [v1: net: net/sched: act_ctinfo: use percpu stats](http://lore.kernel.org/netdev/20230210200824.444856-1-pctammela@mojatatu.com/)
+
+  The tc action act_ctinfo was using shared stats; fix it to use percpu stats,
+  since bstats_update() must be called with locks or with a percpu pointer argument.
+
+* [v4: spi: Add support for stacked/parallel memories](http://lore.kernel.org/netdev/20230210193647.4159467-1-amit.kumar-mahapatra@amd.com/)
+
+  This patch continues the discussion on
+  'commit f89504300e94 ("spi: Stacked/parallel memories bindings")' about
+  adding dt-binding support for stacked/parallel memories.
+
+* [v1: net-next: net: ipa: determine GSI register offsets differently](http://lore.kernel.org/netdev/20230210193655.460225-1-elder@linaro.org/)
+
+  This series changes the way GSI register offsets are specified, using
+  the "reg" mechanism currently used for IPA registers. A follow-on
+  series will extend this work so fields within GSI registers are also
+  specified this way.
+
+* [v1: bpf-next: net: lan966x: set xdp_features flag](http://lore.kernel.org/netdev/01f4412f28899d97b0054c9c1a63694201301b42.1676055718.git.lorenzo@kernel.org/)
+
+  Set the xdp_features netdevice flag if the lan966x NIC supports XDP mode.
+
+* [v1: net-next: net: pcs: tse: port to pcs-lynx](http://lore.kernel.org/netdev/20230210190949.1115836-1-maxime.chevallier@bootlin.com/)
+
+  When submitting the initial driver for the Altera TSE PCS, Russell King
+  noted that the register layout for the TSE PCS is very similar to the
+  Lynx PCS, the main difference being that TSE PCS's register space is
+  memory-mapped, whereas Lynx's is exposed over MDIO.
+
+* [v1: net: ethernet: efct Add x3 ethernet driver](http://lore.kernel.org/netdev/20230210130321.2898-1-h.jain@amd.com/)
+
+  This patch series adds a new Ethernet network driver for the Alveo X3522[1].
+  X3 is a low-latency NIC designed to deliver the lowest possible
+  latency. It accelerates a range of diverse trading strategies
+  and financial applications.
+
+  [1] https://www.xilinx.com/x3
+
+* [v1: net-next: devlink: don't allow to change net namespace for FW_ACTIVATE reload action](http://lore.kernel.org/netdev/20230210115827.3099567-1-jiri@resnulli.us/)
+
+  Changing the network namespace only makes sense during the re-init reload
+  action; it is not applicable to FW activation. So check whether the user
+  passed an ATTR requesting a network namespace change, and forbid it.
+
+* [v5: Introduce ICSSG based ethernet Driver](http://lore.kernel.org/netdev/20230210114957.2667963-1-danishanwar@ti.com/)
+
+  The Programmable Real-time Unit and Industrial Communication Subsystem
+  Gigabit (PRU_ICSSG) is a low-latency microcontroller subsystem in the TI
+  SoCs. This subsystem is intended for use cases such as implementing
+  custom peripheral interfaces, offloading tasks from the other
+  processor cores of the SoC, etc.
+
+* [v1: b43legacy: Add checking for null for ssb_get_devtypedata(dev)](http://lore.kernel.org/netdev/20230210111228.370513-1-n.petrova@fintech.ru/)
+
+  The function ssb_get_devtypedata(dev) may return NULL (the subsequent
+  B43legacy_WARN_ON(!wl) call handles errors, including the NULL case).
+  Therefore, a check is added before calling b43legacy_wireless_exit(),
+  which is expected to dereference the argument holding this value.
+
+* [v1: net-next: net: micrel: Add PHC support for lan8841](http://lore.kernel.org/netdev/20230210102701.703569-1-horatiu.vultur@microchip.com/)
+
+  Add support for PHC and timestamping operations for the lan8841 PHY.
+  PTP 1-step and 2-step modes are supported, over Ethernet and UDP, for both
+  IPv4 and IPv6.
+
+* [v1: net-next: nfp: ethtool: supplement nfp link modes supported](http://lore.kernel.org/netdev/20230210095319.603867-1-simon.horman@corigine.com/)
+
+  Add support for the following modes to the nfp driver:
+
+      NFP_MEDIA_10GBASE_LR
+      NFP_MEDIA_25GBASE_LR
+      NFP_MEDIA_25GBASE_ER
+
+  These modes are supported by the hardware, and
+  support for them was recently added to the firmware.
+
+* [v1: bpf: selftests/bpf: enable mptcp before testing](http://lore.kernel.org/netdev/20230210093205.1378597-1-liuhangbin@gmail.com/)
+
+  Some distros may not enable mptcp by default. Enable it before starting the
+  mptcp server. To use the {read/write}_int_sysctl() functions, I moved
+  them to test_progs.c.
+
+* [v1: bpf-next: selftests/bpf: Cross-compile bpftool](http://lore.kernel.org/netdev/20230210084326.1802597-1-bjorn@kernel.org/)
+
+  When the BPF selftests are cross-compiled, only a host version of
+  bpftool is built. This version of bpftool is used to generate various
+  intermediates, e.g., skeletons.
+
+* [v1: net/usb: kalmia: Don't pass act_len in usb_bulk_msg error path](http://lore.kernel.org/netdev/2f74aab82a40e4c11c91ccba40f5b620f6cb209c.camel@gmail.com/)
+
+  syzbot reported that act_len in kalmia_send_init_packet() is
+  uninitialized when passing it to the first usb_bulk_msg error path. Jiri
+  Pirko noted that it's pointless to pass it in the error path, and that
+  the value that would be printed in the second error path would be the
+  value of act_len from the first call to usb_bulk_msg.[1]
+
+* [v1: net-next: xsk: support use vaddr as ring](http://lore.kernel.org/netdev/20230210021232.108211-1-xuanzhuo@linux.alibaba.com/)
+
+  When we try to start AF_XDP on machines that have been running for a long
+  time, memory fragmentation means there may not be enough contiguous
+  physical memory, which causes the start to fail.
+
+* [v1: iproute2-next: iplink: support IPv4 BIG TCP](http://lore.kernel.org/netdev/cover.1675985919.git.lucien.xin@gmail.com/)
+
+  Patch 1 fixes some typos in the documentation, and Patch 2 adds two
+  attributes to allow userspace to enable IPv4 BIG TCP.
+
+#### Security Hardening
+
+* [v1: ASoC: Intel: Skylake: Replace 1-element array with flex-array](http://lore.kernel.org/linux-hardening/20230210051447.never.204-kees@kernel.org/)
+
+  The kernel is globally removing the ambiguous 0-length and 1-element
+  arrays in favor of flexible arrays, so that we can gain both compile-time
+  and run-time array bounds checking[1]. In this instance, struct
+  skl_cpr_cfg contains struct skl_cpr_gtw_cfg, which defined "config_data"
+  as a 1-element array.
+
+* [v1: bpf: Deprecate "data" member of bpf_lpm_trie_key](http://lore.kernel.org/linux-hardening/20230209192337.never.690-kees@kernel.org/)
+
+  The kernel is globally removing the ambiguous 0-length and 1-element
+  arrays in favor of flexible arrays, so that we can gain both compile-time
+  and run-time array bounds checking[1].
+
+* [v1: RDMA/cma: Distinguish between sockaddr_in and sockaddr_in6 by size](http://lore.kernel.org/linux-hardening/20230208232549.never.139-kees@kernel.org/)
+
+  Clang can do some aggressive inlining, which provides it with greater
+  visibility into the sizes of various objects that are passed into
+  helpers. Specifically, compare_netdev_and_ip() can see through the type
+  given to the "sa" argument, which means it can generate code for "struct
+  sockaddr_in" that would have been passed to ipv6_addr_cmp() (which expects
+  to operate on the larger "struct sockaddr_in6"), which would result in a
+  compile-time buffer overflow condition detected by memcmp().
+
+* [v1: randstruct: disable Clang 15 support](http://lore.kernel.org/linux-hardening/20230208065133.220589-1-ebiggers@kernel.org/)
+
+  The randstruct support released in Clang 15 is unsafe to use due to a
+  bug that can cause miscompilations: "-frandomize-layout-seed
+  inconsistently randomizes all-function-pointers structs"
+  (https://github.com/llvm/llvm-project/issues/60349). It has been fixed
+  on the Clang 16 release branch, so add a Clang version check.
+
+* [v3: next: scsi: smartpqi: Replace one-element array with flexible-array member](http://lore.kernel.org/linux-hardening/Y+LJz%2Fr6+UeLqnV3@work/)
+
+  One-element arrays are deprecated, and we are replacing them with flexible
+  array members instead. So, replace the one-element array with a
+  flexible-array member in struct report_log_lun_list.
+
+* [v2: pstore/blk: Export a method to implemente panic_write()](http://lore.kernel.org/linux-hardening/20230206061813.44506-1-victor@allwinnertech.com/)
+
+  panic_write() is needed to write the pstore frontend message to block
+  devices on panic. This provides a way to register panic_write() when
+  the "best_effort" mode is used to register the pstore blk backend.
+
+* [v1: media: imx-jpeg: Bounds check sizeimage access](http://lore.kernel.org/linux-hardening/20230204183804.never.323-kees@kernel.org/)
+
+  The call to mxc_jpeg_get_plane_size() from mxc_jpeg_dec_irq() sets the
+  plane_no argument to 1.
+
+  Silence the warning by bounds checking comp_planes for future robustness.
+
+* [v1: scsi: mpi3mr: Replace 1-element array with flex-array](http://lore.kernel.org/linux-hardening/20230204183715.never.937-kees@kernel.org/)
+
+  Nothing else defined MPI3_NVME_ENCAP_CMD_MAX, so the "command"
+  buffer was being defined as a fake flexible array of size 1. Replace
+  this with a proper flex array. Avoids this GCC 13 warning under
+  -fstrict-flex-arrays=3:
+
+* [v1: usb: host: xhci: mvebu: Iterate over array indexes instead of using pointer math](http://lore.kernel.org/linux-hardening/20230204183651.never.663-kees@kernel.org/)
+
+  Walking the dram->cs array with pointer math was seen by the compiler as
+  accessing beyond the first array item. Instead, use the array index
+  directly. This also allows run-time bounds checking under CONFIG_UBSAN_BOUNDS.
+
+* [v1: btrfs: sysfs: Handle NULL return values](http://lore.kernel.org/linux-hardening/20230204183510.never.909-kees@kernel.org/)
+
+  Each of to_fs_info(), discard_to_fs_info(), and to_space_info() can
+  return NULL. Check for this so that no calculations can be
+  performed on NULL pointers.
+
+* [v1: jfs: Use unsigned variable for length calculations](http://lore.kernel.org/linux-hardening/20230204183355.never.877-kees@kernel.org/)
+
+  To avoid confusing the compiler about possible negative sizes, switch
+  "ssize", which can never be negative, from int to u32.
+
+* [v1: bpf: Replace bpf_lpm_trie_key 0-length array with flexible array](http://lore.kernel.org/linux-hardening/20230204183241.never.481-kees@kernel.org/)
+
+  This includes fixing the selftest, which was incorrectly using a
+  variable-length struct as a header, as identified earlier[1]. Avoid this
+  by explicitly including the prefixlen member instead of struct
+  bpf_lpm_trie_key.
+
+  [1] https://lore.kernel.org/all/202206281009.4332AA33@keescook/
+
+#### Async I/O
+
+* [v8: io_uring: add napi busy polling support](http://lore.kernel.org/io-uring/20230209230144.465620-1-shr@devkernel.io/)
+
+  This adds napi busy polling support in io_uring.c. It adds a new
+  napi_list to the io_ring_ctx structure, which contains the
+  napi_ids that are currently enabled for busy polling. This list is
+  used to determine which napi_ids have busy polling enabled. For faster
+  access, it also adds a hash table.
+
+* [v1: for-next: io_uring: mark task TASK_RUNNING before handling resume/task work](http://lore.kernel.org/io-uring/fdbc0707-ace4-d565-402a-4927fe0b9947@kernel.dk/)
+
+  Just like for task_work, set the task state to TASK_RUNNING before doing
+  any potential resume work. We're not holding any locks at this point,
+  but we may have already set the task state to TASK_INTERRUPTIBLE in
+  preparation for going to sleep waiting for events.
+
+* [v7: liburing: add api for napi busy poll](http://lore.kernel.org/io-uring/20230205002424.102422-1-shr@devkernel.io/)
+
+  The patch series also contains the documentation for the two new functions
+  and two example programs. The client program is called napi-busy-poll-client
+  and the server program napi-busy-poll-server. The client measures the
+  round-trip times of requests.
+
+#### Rust For Linux
+
+* [v1: rust: allow to use INIT_STACK_ALL_ZERO](http://lore.kernel.org/rust-for-linux/20230210172203.101331-1-andrea.righi@canonical.com/)
+
+  This flag should be dropped in clang-17, but at the moment it seems more
+  reasonable to add it to the bindgen CFLAGS to prevent the error above.
+
+  This way we can enable CONFIG_INIT_STACK_ALL_ZERO with CONFIG_RUST
+  without triggering any build error.
+
+#### BPF
+
+* [v4: bpf-next: BPF rbtree next-gen datastructure](http://lore.kernel.org/bpf/20230209174144.3280955-1-davemarchevsky@fb.com/)
+
+  This series adds an rbtree data structure following the "next-gen
+  datastructure" precedent set by the recently added linked list [0]. This is
+  a reimplementation of the previous rbtree RFC [1] to use kfunc + kptr
+  instead of adding a new map type.
+
+* [v1: bpf-next: tools/resolve_btfids: Pass HOSTCFLAGS as EXTRA_CFLAGS to prepare targets](http://lore.kernel.org/bpf/20230209143735.4112845-1-jolsa@kernel.org/)
+
+  Thorsten reported a build issue with a command line that defined extra
+  HOSTCFLAGS, which were not passed to the 'prepare' targets but were
+  used to build resolve_btfids objects.
+
+* [v1: bpf-next: bpf: add --skip_encoding_btf_inconsistent_proto, --btf_gen_optimized to pahole flags for v1.25](http://lore.kernel.org/bpf/1675949331-27935-1-git-send-email-alan.maguire@oracle.com/)
+
+  v1.25 of pahole supports filtering out functions with multiple
+  inconsistent function prototypes or optimized-out parameters
+  from the BTF representation. These present problems because
+  there is no additional info in BTF saying which inconsistent
+  prototype matches which function instance to help guide
+  attachment, and functions with optimized-out parameters can
+  lead to incorrect assumptions about register contents.
+
+* [v1: dwarves: btf_encoder: ensure elf function representation is fully initialized](http://lore.kernel.org/bpf/1675896868-26339-1-git-send-email-alan.maguire@oracle.com/)
+
+  New fields in the BTF encoder state's ELF function representation (used
+  to support saving a function and adding it later) need to be
+  initialized. There is no need to set parameter names to NULL, as
+  got_parameter_names guards their use.
+
+* [v1: bpf-next: sfc: move xdp_features configuration in efx_pci_probe_post_io()](http://lore.kernel.org/bpf/9bd31c9a29bcf406ab90a249a28fc328e5578fd1.1675875404.git.lorenzo@kernel.org/)
+
+  Move the xdp_features configuration from efx_pci_probe() to
+  efx_pci_probe_post_io(), since that is where all the other basic netdev
+  features are initialised.
+
+* [v1: nf-next: bpf, netfilter: minimal support for bpf progs](http://lore.kernel.org/bpf/20230208160307.27534-1-fw@strlen.de/)
+
+  Add minimal support to hook bpf programs to netfilter hooks,
+  e.g. PREROUTING or FORWARD.
+
+  Hooking is currently possible for all supported protocols, i.e. the
+  arp, bridge, ip, ip6 and inet (both IPv4/IPv6) pseudo-families.
+
+* [v2: bpf-next: samples: bpf: syscall_tp: Add syscall openat2 enter/exit tracepoint](http://lore.kernel.org/bpf/tencent_9381CB1A158ED7ADD12C4406034E21A3AC07@qq.com/)
+
+  Commit fe3300897cbf ("samples: bpf: fix syscall_tp due to unused syscall")
+  added tracepoints for the openat() syscall; this patch adds support for openat2().
+
+* [v2: Add ftrace direct call for arm64](http://lore.kernel.org/bpf/20230207182135.2671106-1-revest@chromium.org/)
+
+  This series adds ftrace direct call support to arm64.
+  This makes BPF tracing programs (fentry/fexit/fmod_ret/lsm) work on arm64.
+
+  It is meant to apply on top of the arm64 tree, which contains Mark Rutland's
+  series on CALL_OPS [1] under the for-next/ftrace tag.
+
+* [v4: bpf-next: libbpf: Add sample_period to creation options](http://lore.kernel.org/bpf/20230207081916.3398417-1-arilou@gmail.com/)
+
+  Add an option to set when the perf buffer should wake up; by default the
+  perf buffer is signaled for every event that is pushed to it.
+
+  With a high event throughput, it is more efficient to wake up
+  only once X events are ready to be read.
+
+* [v1: samples: bpf: syscall_tp: Add syscall openat2 enter/exit tracepoint](http://lore.kernel.org/bpf/tencent_FB3E886D062242FF59A997492A3BAF2BA308@qq.com/)
+
+  Commit fe3300897cbf ("samples: bpf: fix syscall_tp due to unused syscall")
+  added tracepoints for the openat() syscall; this patch adds support for openat2().
+
+* [[RFC/PATCH 0/3] perf lock contention: Track lock owner (v2)](http://lore.kernel.org/bpf/20230207002403.63590-1-namhyung@kernel.org/)
+
+  When there are many lock contentions in the system, people sometimes
+  want to know who caused the contention, IOW who owns the locks.
+
+  This patchset adds a -o/--lock-owner option to track the owner info
+  if it's available. Right now, it supports mutex and rwsem, as they
+  have owner fields in themselves. Please see patch 2 for the details.
+
+* [v2: bpf-next: net: add missing xdp_features description](http://lore.kernel.org/bpf/7878544903d855b49e838c9d59f715bde0b5e63b.1675705948.git.lorenzo@kernel.org/)
+
+  Add the missing xdp_features field description to the struct net_device
+  documentation. This patch fixes the following warning:
+
+      ./include/linux/netdevice.h:2375: warning: Function parameter or member 'xdp_features' not described in 'net_device'
+
+* [v1: bpf-next: selftests: bpf: Use BTF map in sk_assign](http://lore.kernel.org/bpf/4ebd4e68dec83863c51a9114e6507524c8feafb7.1675698070.git.fmaurer@redhat.com/)
+
+  The sk_assign selftest uses tc to load the BPF object file for the test. If
+  tc is linked against libbpf 1.0+, this test fails because the BPF file
+  uses the legacy maps section. This approach is considered legacy by libbpf
+  and tc (see examples/bpf/README in the iproute2 repo).
+
+* [v1: bpf-next: samples: bpf: Add macro SYSCALL() for aarch64](http://lore.kernel.org/bpf/tencent_9E0636426959DE97692A50AF79A3D9888B08@qq.com/)
+
+  On arm64, the __SYSCALL() macro in arch/arm64/kernel/sys.c adds an
+  __arm64_ prefix, so the samples should support it on aarch64. The following
+  is the output of the bpftrace command:
+
+      $ sudo bpftrace -l | grep sys_write
+      ...
+      kprobe:__arm64_sys_write
+      kprobe:__arm64_sys_writev
+      ...
+
+* [v1: net-next: NXP ENETC AF_XDP zero-copy sockets](http://lore.kernel.org/bpf/20230206100837.451300-1-vladimir.oltean@nxp.com/)
+
+  This is an RFC because there are a few things I'm not 100% certain about.
+  I've tested this with the xdpsock test application, but I don't have very
+  detailed knowledge about the internals of AF_XDP sockets.
+
+  Patches where I'd appreciate if people took a look are 02/11, 05/11,
+
+### Related Technology News
+
+#### Qemu
+
+* [v1: target/riscv: avoid env_archcpu() in cpu_get_tb_cpu_state()](http://lore.kernel.org/qemu-devel/20230210123836.506286-1-dbarboza@ventanamicro.com/)
+
+  We have a RISCVCPU *cpu pointer available at the start of the function.
+
+* [v1: target/riscv: Smepmp: Skip applying default rules when address matches](http://lore.kernel.org/qemu-devel/20230209055206.229392-1-hchauhan@ventanamicro.com/)
+
+  When MSECCFG.MML is set, after the address range is checked in PMP, the
+  default permissions are applied if the requested permissions do not match
+  those programmed in PMP. This should only happen when no matching
+  address is found.
+
+* [v1: MAINTAINERS: Add some RISC-V reviewers](http://lore.kernel.org/qemu-devel/20230209003308.738237-1-alistair.francis@opensource.wdc.com/)
+
+  This patch adds some active RISC-V members as reviewers to the
+  MAINTAINERS file.
+
+* [v1: hw/riscv: virt: Simplify virt_{get,set}_aclint()](http://lore.kernel.org/qemu-devel/20230206085007.3618715-1-bmeng@tinylab.org/)
+
+  There is no need to declare an intermediate "MachineState *ms".
+
+* [v1: configure: normalize riscv* cpu types too](http://lore.kernel.org/qemu-devel/20230204112502.2558739-1-mjt@msgid.tls.msk.ru/)
+
+  For most CPU types out there, ./configure normalizes all
+  variations into a base form plus, optionally, variations,
+  to find the proper arch-specific code.
+
+#### Buildroot
+
+* [package/wolfssl: disable assembly when not supported](http://lore.kernel.org/buildroot/20230207213844.DCD8484378@busybox.osuosl.org/)
+
+  commit: https://git.buildroot.net/buildroot/commit/?id=d8dc5315eb712eca0a5cbf793a6714a47ab6e57e
+  branch: https://git.buildroot.net/buildroot/commit/?id=refs/heads/master
+
+  wolfssl contains some assembly code, and its configure.ac script
+  enables the assembly code depending on the CPU architecture. However,
+  the detection logic is not sufficient and leads to the assembly code
+  being used in situations where it should not be.
+
+* [package/python-spake2: new package](http://lore.kernel.org/buildroot/20230207132558.94BFC84150@busybox.osuosl.org/)
+
+  commit: https://git.buildroot.net/buildroot/commit/?id=9aaef2a07780a200512ccadb2381c559c4ffd8e6
+  branch: https://git.buildroot.net/buildroot/commit/?id=refs/heads/master
+
+  SPAKE2 password-authenticated key exchange (in pure Python).
+
+* [package/python-hkdf: new package](http://lore.kernel.org/buildroot/20230207115045.1C17C840AE@busybox.osuosl.org/)
+
+  commit: https://git.buildroot.net/buildroot/commit/?id=433ce2966f787248d5b8a62c46634e2f86250f8e
+  branch: https://git.buildroot.net/buildroot/commit/?id=refs/heads/master
+
+  HMAC-based Extract-and-Expand Key Derivation Function (HKDF).
+
+* [package/rdma-core: new package](http://lore.kernel.org/buildroot/20230205125237.A6202837ED@busybox.osuosl.org/)
+
+  commit: https://git.buildroot.net/buildroot/commit/?id=ea47e177f093d7378e8e8e1f50d6f4e3fce0a088
+  branch: https://git.buildroot.net/buildroot/commit/?id=refs/heads/master
+
+#### U-Boot
+
+* [v1: riscv: add sbi v0.2 or later support](http://lore.kernel.org/u-boot/CAJ8bkywohgE_njAYmLJUUDpHW2+R=NYPSqp1G0rCksrCHCdpDw@mail.gmail.com/)
+
+  Add the rfence and IPI extensions for SBI v0.2 or later, and add SBI v0.2
+  or later support to sbi_ipi. This lifts the sbi_ipi limitation that the
+  number of cores must be less than or equal to xlen.
+
+* [v1: semihosting: use assembly conduit functions](http://lore.kernel.org/u-boot/20230207152105.2167641-1-andre.przywara@arm.com/)
+
+  To trigger the actual semihosting action in the debugger, we used a
+  carefully constructed inline assembly sequence. This was motivated by
+  the trigger being really just a single instruction, so originally it
+  could be neatly inlined by the compiler.
+  However, we now have a separate function anyway, so inlining no longer
+  happens. On top of that, the inline assembly was really fragile and
+  hard to read.
+
+* [v3: RFC: Migrate to split config](http://lore.kernel.org/u-boot/20230206190550.1692420-1-sjg@chromium.org/)
+
+  U-Boot uses an SPL prefix on CONFIG options to indicate when an option
+  relates to SPL. For example, while CONFIG_TEXT_BASE is the text base for
+  U-Boot proper, CONFIG_SPL_TEXT_BASE is the text base for SPL.
+
+## 20230205: Issue 32
+
+### Kernel News
+
+#### RISC-V Architecture Support
+
+* [v5: KVM perf support](http://lore.kernel.org/linux-riscv/20230205011515.1284674-1-atishp@rivosinc.com/)
+
+  This series extends perf support for KVM. The KVM implementation relies
+  on the SBI PMU extension and trap-and-emulate of hpmcounter CSRs.
+  The KVM implementation exposes the virtual counters to the guest and internally
+  manages the counters using kernel perf counters.
+
+* [v4: Basic pinctrl support for StarFive JH7110 RISC-V SoC](http://lore.kernel.org/linux-riscv/20230203141801.59083-1-hal.feng@starfivetech.com/)
+
+  This patch series adds basic pinctrl support for the StarFive JH7110 SoC.
+
+* [v3: StarFive's SDIO/eMMC driver support](http://lore.kernel.org/linux-riscv/20230203081913.81968-1-william.qiu@starfivetech.com/)
+
+  This patchset adds initial rudimentary support for the StarFive
+  DesignWare mobile storage host controller driver, which will
+  be used on StarFive's VisionFive 2 board. The main purpose of adding
+  this driver is to accommodate the ultra-high speed mode of eMMC.
+
+* [v4: RISC-V kasan rework](http://lore.kernel.org/linux-riscv/20230203075232.274282-1-alexghiti@rivosinc.com/)
+
+  As described in patch 2, our current kasan implementation is intricate,
+  so I tried to simplify it and mimic what arm64/x86 are doing.
+
+* [v1: Documentation: RISC-V: Define Xlinuxs{s,m}aia](http://lore.kernel.org/linux-riscv/20230203001201.14770-1-palmer@rivosinc.com/)
+
+  The AIA specification was only partially frozen, but provides no way to
+  refer to the subset of behavior that has been frozen. It seems like
+  there's not a whole lot of interest in the non-frozen behavior, so let's
+  just define an extension that consists only of the frozen behavior.
+
+* [v1: RISC-V: Only provide the single-letter extensions in HWCAP](http://lore.kernel.org/linux-riscv/20230202233832.11036-1-palmer@rivosinc.com/)
+
+  The recent refactoring led to us leaking some HWCAP bits to userspace
+  that didn't make much sense. With any luck we'll have a better scheme
+  soon, but for now just mask off those bits to avoid polluting userspace.
+
+* [v3: spi: Add support for stacked/parallel memories](http://lore.kernel.org/linux-riscv/20230202152258.512973-1-amit.kumar-mahapatra@amd.com/)
+
+  This patch continues the discussion on
+  'commit f89504300e94 ("spi: Stacked/parallel memories bindings")' about
+  adding dt-binding support for stacked/parallel memories.
+
+* [v1: RESEND: dt-bindings: timer: sifive,clint: add comaptibles for T-Head's C9xx](http://lore.kernel.org/linux-riscv/20230202072814.319903-1-uwu@icenowy.me/)
+
+  The T-Head C906/C910 CLINT is not compatible with SiFive's (nor even
+  with the upcoming ACLINT spec) because it lacks the mtime register.
+
+* [v1: clocksource: riscv: Patch riscv_clock_next_event() jump before first use](http://lore.kernel.org/linux-riscv/512FC581-4097-4433-9C3D-CBCB7CD61954@rivosinc.com/)
+
+  A static key is used to select between SBI and Sstc timer usage in
+  riscv_clock_next_event(), but currently its direction is only resolved
+  after cpuhp_setup_state() is called (which sets the next event).
+
+* [v1: riscv: disable generation of unwind tables](http://lore.kernel.org/linux-riscv/mvmzg9xybqu.fsf@suse.de/)
+
+  GCC 13 will enable -fasynchronous-unwind-tables by default on riscv. In
+  the kernel, we don't have any use for unwind tables yet, so disable them.
+  More importantly, the .eh_frame section brings relocations
+  (R_RISCV_32_PCREL, R_RISCV_SET{6,8,16}, R_RISCV_SUB{6,8,16}) into modules
+  that we are not prepared to handle.
+
+* [v3: riscv: mm: hugetlb: Enable ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP](http://lore.kernel.org/linux-riscv/20230201015259.3222524-1-guoren@kernel.org/)
+
+  Add HVO support for RISC-V; see commit 6be24bed9da3 ("mm: hugetlb:
+  introduce a new config HUGETLB_PAGE_FREE_VMEMMAP"). This patch is
+  similar to commit 1e63ac088f20 ("arm64: mm: hugetlb: enable
+  HUGETLB_PAGE_FREE_VMEMMAP for arm64"), and riscv's motivation is the same as arm64's.
+
+* [v4: riscv: Allow to downgrade paging mode from the command line](http://lore.kernel.org/linux-riscv/20230131151115.1972740-1-alexghiti@rivosinc.com/)
+
+  This new version gets rid of the limitation that prevented KASAN kernels
+  from using the newly introduced parameters.
+
+  While looking into KASLR, I came across commit aacd149b6238 ("arm64: head:
+  avoid relocating the kernel twice for KASLR"): it allows the fdt
+  functions to be used very early in the boot process with KASAN enabled, simply by
+  compiling a new version of those functions without instrumentation.
+
+* [v1: Add basic ACPI support for RISC-V](http://lore.kernel.org/linux-riscv/20230130182225.2471414-1-sunilvl@ventanamicro.com/)
+
+  This patch series enables the basic ACPI infrastructure for RISC-V.
+  Support for external interrupt controllers is in progress, hence it is
+  tested using the polling-based HVC SBI console and a RAM disk.
+
+* [v3: RISC-V: Apply Zicboz to clear_page](http://lore.kernel.org/linux-riscv/20230130120128.1349464-1-ajones@ventanamicro.com/)
+
+  When the Zicboz extension is available, we can more rapidly zero naturally
+  aligned, Zicboz-block-sized chunks of memory. As pages are always page
+  aligned and are larger than any Zicboz block size will be,
+  clear_page() appears to be a good candidate for the extension.
+
+* [v2: Change PWM-controlled LED pin active mode and algorithm](http://lore.kernel.org/linux-riscv/20230130093229.27489-1-nylon.chen@sifive.com/)
+
+  According to the circuit diagram of the User LEDs (RGB) described in the
+  manuals hifive-unleashed-a00.pdf[0] and hifive-unmatched-schematics-v3.pdf[1],
+  the PWM behavior is active-high.
+
+* [v2: riscv: mm: Implement pmdp_collapse_flush for THP](http://lore.kernel.org/linux-riscv/20230130074815.1694055-1-mchitale@ventanamicro.com/)
+
+  When THP is enabled, 4K pages are collapsed into a single huge
+  page using the generic pmdp_collapse_flush(), which in turn
+  uses flush_tlb_range() to shoot down stale TLB entries.
+
+* [v2: mm, arch: add generic implementation of pfn_valid() for FLATMEM](http://lore.kernel.org/linux-riscv/20230129124235.209895-1-rppt@kernel.org/)
+
+  Every architecture that supports the FLATMEM memory model defines its own
+  version of pfn_valid() that essentially compares a pfn to max_mapnr.
+
+* [v1: riscv: Add header include guards to insn.h](http://lore.kernel.org/linux-riscv/20230129094242.282620-1-liaochang1@huawei.com/)
+
+  Add header include guards to insn.h to prevent repeated declarations of
+  any identifiers in insn.h.
+ +* [v1: riscv: support arch_has_hw_pte_young()](http://lore.kernel.org/linux-riscv/20230129064956.143664-1-tjytimi@163.com/) + + arch_has_hw_pte_young() returns false for riscv by default. When it is + false, the page table walk is mostly skipped for MGLRU reclaim. It + also causes a useless step in __wp_page_copy_user(). + +#### Process Scheduling + +* [v1: sched/isolation: Prep work for pcp cache draining isolation](http://lore.kernel.org/lkml/20230203232409.163847-1-frederic@kernel.org/) + + For reference: https://lore.kernel.org/lkml/20230125073502.743446-1-leobras@redhat.com/ + And the latest proposal: https://lore.kernel.org/lkml/Y90mZQhW89HtYfT9@dhcp22.suse.cz/ + +* [v1: cpu,sched: Mark arch_cpu_idle_dead() __noreturn](http://lore.kernel.org/lkml/cover.1675461757.git.jpoimboe@kernel.org/) + + These are some minor changes to enable the __noreturn attribute for + arch_cpu_idle_dead(). (If there are no objections, I can merge the + entire set through the tip tree.) + + Until recently [1], in Xen, when a previously offlined CPU was brought + back online, it unexpectedly resumed execution where it left off in the + middle of the idle loop by returning from play_dead() and its caller + arch_cpu_idle_dead(). + +* [v1: sched/deadline: Add more reschedule cases to prio_changed_dl()](http://lore.kernel.org/lkml/20230202182854.3696665-1-vschneid@redhat.com/) + + On that kernel, it is quite easy to trigger using rt-tests's deadline_test + [1] with the test running on isolated CPUs (this reduces the chance of + something unrelated setting TIF_NEED_RESCHED on the idle tasks, making the + issue even more obvious as the hung task detector chimes in).
+ +* [v1: kernel/sched/core: adjust rt_priority accordingly when prio is changed](http://lore.kernel.org/lkml/1675245680-2811-1-git-send-email-chensong_2000@189.cn/) + + When a high-priority process is acquiring an rtmutex held by a + low-priority process, the latter's priority will be boosted by calling + rt_mutex_setprio->__setscheduler_prio. + +* [v2: sched/numa: Enhance vma scanning](http://lore.kernel.org/lkml/cover.1675159422.git.raghavendra.kt@amd.com/) + + The patchset proposes one of the enhancements to numa vma scanning + suggested by Mel. This is a continuation of [2]. Though I have removed + RFC, I do think some parts need more feedback and refinement. + + The existing mechanism derives the scan period from + per-thread stats. Process Adaptive autoNUMA [1] proposed gathering NUMA + fault stats at the per-process level to better capture application behaviour. + +* [v1: sched: Consider capacity for certain load balancing decisions](http://lore.kernel.org/lkml/20230201012032.2874481-1-xii@google.com/) + + After load balancing was split into different scenarios, CPU capacity + is ignored for the "migrate_task" case, which means a thread can stay + on a softirq-heavy CPU for an extended amount of time. + +* [v2: sched: pick_next_rt_entity(): checked list_entry](http://lore.kernel.org/lkml/20230128-list-entry-null-check-sched-v2-1-d8e010cce91b@diag.uniroma1.it/) + + Commit 326587b84078 ("sched: fix goto retry in pick_next_task_rt()") + removed any path which could make pick_next_rt_entity() return NULL. + However, BUG_ON(!rt_se) in _pick_next_task_rt() (the only caller of + pick_next_rt_entity()) still checks the error condition, which can + never happen, since list_entry() never returns NULL. + +#### Memory Management + +* [v1: bpf-next: bpf, mm: introduce cgroup.memory=nobpf](http://lore.kernel.org/linux-mm/20230205065805.19598-1-laoar.shao@gmail.com/) + + So let's give the user an option to disable bpf memory accounting.
+ + The idea of "cgroup.memory=nobpf" originally came from Tejun [1]. + + [1] https://lwn.net/ml/linux-mm/YxjOawzlgE458ezL@slm.duckdns.org/ + +* [v9: cachestat: a new syscall for page cache state of files](http://lore.kernel.org/linux-mm/20230203190413.2559707-1-nphamcs@gmail.com/) + + There is currently no good way to query the page cache state of large + file sets and directory trees. There is mincore(), but it scales poorly: + the kernel writes out a lot of bitmap data that userspace has to + aggregate, when the user really does not care about per-page information + in that case. + +* [v3: folio based filemap_map_pages()](http://lore.kernel.org/linux-mm/20230203131636.1648662-1-fengwei.yin@intel.com/) + + The current filemap_map_pages() uses page granularity even when + the underlying folio is a large folio. Making it use folio-based + granularity allows batched refcount, rmap and mm counter + updates, which brings a performance gain. + +* [v1: mm/page_alloc: reduce fallbacks to (MIGRATE_PCPTYPES - 1)](http://lore.kernel.org/linux-mm/20230203100132.1627787-1-yajun.deng@linux.dev/) + + Commit 1dd214b8f21c ("mm: page_alloc: avoid merging non-fallbackable + pageblocks with others") removed MIGRATE_CMA and MIGRATE_ISOLATE from the + fallbacks list, so there is no need to add an element at the end of every type. + +* [v7: shoot lazy tlbs (lazy tlb refcount scalability improvement)](http://lore.kernel.org/linux-mm/20230203071837.1136453-1-npiggin@gmail.com/) + + This series improves the scalability of context switching between user and + kernel threads on large systems with a threaded process spread across a lot of CPUs. + +* [v1: Ignore non-LRU-based reclaim in memcg reclaim](http://lore.kernel.org/linux-mm/20230202233229.3895713-1-yosryahmed@google.com/) + + Pages reclaimed through means other than LRU-based reclaim are tracked + through reclaim_state in struct scan_control, which is stashed in the + current task_struct.
These pages are added to the number of reclaimed + pages through LRUs. + +* [v1: mm: memcontrol: don't account swap failures not due to cgroup limits](http://lore.kernel.org/linux-mm/20230202155626.1829121-1-hannes@cmpxchg.org/) + + Upon closer examination, this is an ARM64 machine that doesn't support + swapping out THPs. In that case, the first get_swap_page() fails, and + the kernel falls back to splitting the THP and swapping the 4k + constituents one by one. /proc/vmstat confirms this with a high rate + of thp_swpout_fallback events. + +* [v2: Introduce cmpxchg128() -- aka. the demise of cmpxchg_double()](http://lore.kernel.org/linux-mm/20230202145030.223740842@infradead.org/) + + Since Linus hated on cmpxchg_double(), a few patches to get rid of it, as + proposed here: + + https://lkml.kernel.org/r/Y2U3WdU61FvYlpUh@hirez.programming.kicks-ass.net + + These patches are based on 6.2.0-rc6 + cryptodev-2.6, but also apply to next/master. + + Available here: + + git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git core/wip-u128 + +* [v10: Implement IOCTL to get and/or the clear info about PTEs](http://lore.kernel.org/linux-mm/20230202112915.867409-1-usama.anjum@collabora.com/) + + Historically, soft-dirty PTE bit tracking has been used in the CRIU + project. The procfs interface is enough for finding the soft-dirty bit + status and clearing the soft-dirty bit of all the pages of a process. + We have the use case where we need to track the soft-dirty PTE bit for + only specific pages on-demand. We need this tracking and clear mechanism + of a region of memory while the process is running to emulate the + getWriteWatch() syscall of Windows. + +* [v1: mm: introduce entrance for root_mem_cgroup's current](http://lore.kernel.org/linux-mm/1675312377-4782-1-git-send-email-zhaoyang.huang@unisoc.com/) + + Introducing memory.root_current for the memory charges on root_mem_cgroup. 
+ +* [v1: mm/bpf/perf: Store build id in file object](http://lore.kernel.org/linux-mm/20230201135737.800527-1-jolsa@kernel.org/) + + This RFC patchset adds a new CONFIG_FILE_BUILD_ID option, which adds a + build id object pointer to the file object when enabled. The build id is + read/populated when the file is mmap-ed. + +* [v4: mm/vmalloc: replace BUG_ON to a simple if statement](http://lore.kernel.org/linux-mm/20230201115142.GA7772@min-iamroot/) + + As per the coding standards, in the event of an abnormal condition that + should not occur under normal circumstances, the kernel should attempt + recovery and proceed with execution, rather than halting the machine. + +* [v4: mm/vmalloc.c: allow vread() to read out vm_map_ram areas](http://lore.kernel.org/linux-mm/20230201091339.61761-1-bhe@redhat.com/) + + Stephen reported that vread() skips vm_map_ram areas when reading out + /proc/kcore with the drgn utility. + +* [v4: mm: hwposion: support recovery from ksm_might_need_to_copy()](http://lore.kernel.org/linux-mm/20230201074433.96641-1-wangkefeng.wang@huawei.com/) + + When the kernel copies a page in ksm_might_need_to_copy() but runs + into an uncorrectable error, it will crash since the poisoned page is + consumed by the kernel; this is similar to the issue recently fixed by + copy-on-write poison recovery.
+ +* [v1: kasan: use %zd format for printing size_t](http://lore.kernel.org/linux-mm/20230201071312.2224452-1-arnd@kernel.org/) + + The size_t type depends on the architecture, so %lu does not work + on most 32-bit ones: + + In file included from include/kunit/assert.h:13, + from include/kunit/test.h:12, + from mm/kasan/report.c:12: + mm/kasan/report.c: In function 'describe_object_addr': + include/linux/kern_levels.h:5:25: error: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'size_t' {aka 'unsigned int'} [-Werror=format=] + mm/kasan/report.c:270:9: note: in expansion of macro 'pr_err' + 270 | pr_err("The buggy address is located %d bytes %s of\n" + | ^ + +* [v1: mm/khugepaged: skip shmem with armed userfaultfd](http://lore.kernel.org/linux-mm/20230201034137.2463113-1-stevensd@google.com/) + + Collapsing memory in a vma that has an armed userfaultfd results in + zero-filling any missing pages, which breaks user-space paging for those + filled pages. Avoid khugepaged bypassing userfaultfd by not collapsing + pages in shmem reached via scanning a vma with an armed userfaultfd if + doing so would zero-fill any pages. + +* [v1: mm: move FOLL_PIN debug accounting under CONFIG_DEBUG_VM](http://lore.kernel.org/linux-mm/54b0b07a-c178-9ffe-b5af-088f3c21696c@kernel.dk/) + + which wasn't there before. The node page state counters are percpu, but + with a very low threshold. On my setup, every 108th update ends up + needing to punt to two atomic_long_add() calls, which is causing the regression above. + +* [v1: mm,page_alloc,cma: configurable CMA utilization](http://lore.kernel.org/linux-mm/20230131071052.GB19285@hu-sbhattip-lv.qualcomm.com/) + + Commit 16867664936e ("mm,page_alloc,cma: conditionally prefer cma pageblocks for movable allocations") + added support for using CMA pages when more than 50% of the total free pages in + the zone are free CMA pages.
+ +* [v1: mm/gup: Add folio to list when folio_isolate_lru() succeed](http://lore.kernel.org/linux-mm/20230131063206.28820-1-Kuan-Ying.Lee@mediatek.com/) + + If we call folio_isolate_lru() successfully, we will get + return value 0. We need to add this folio to the movable_pages_list. + +* [v2: mm-unstable: Convert a couple migrate functions to use folios](http://lore.kernel.org/linux-mm/20230130214352.40538-1-vishal.moola@gmail.com/) + + This patch set introduces folio_movable_ops() and converts 3 functions + in mm/migrate.c to use folios. It also introduces + folio_get_nontail_page() for folio conversions which may want to + distinguish between head and tail pages. + +#### File Systems + +* [v2: Support negative dentries on case-insensitive ext4 and f2fs](http://lore.kernel.org/linux-fsdevel/20230203210039.16289-1-krisman@suse.de/) + + This patchset enables negative dentries for case-insensitive directories + in ext4/f2fs. It solves the corner cases for this feature, including + those already tested by fstests (generic/556). It also fixes a + bug in the existing implementation where old negative + dentries are left behind after a directory is converted to case-insensitive. + +* [v2: fsdax: dax_unshare_iter() should return a valid length](http://lore.kernel.org/linux-fsdevel/1675388906-50-1-git-send-email-ruansy.fnst@fujitsu.com/) + + copy_mc_to_kernel() returns 0 if it executes successfully. + In that case, the return value should be set to the length it copied. + +* [v1: RESEND: pipe: avoid creating empty pipe buffers](http://lore.kernel.org/linux-fsdevel/20230131121127.466443-1-wiktorg@google.com/) + + pipe_write cannot be called on notification pipes, so + post_one_notification cannot race it. + The locking and the second pipe_full check are thus redundant. + +* [v9: DEPT(Dependency Tracker)](http://lore.kernel.org/linux-fsdevel/1675154394-25598-1-git-send-email-max.byungchul.park@gmail.com/) + + Nevertheless, I apologize for the lack of documentation. I promise to add it
I promise to add it + before it gets needed to use DEPT's APIs by users. For now, you can use + DEPT just with CONFIG_DEPT on. + +* [GIT PULL: iov_iter: Improve page extraction (pin or just list)](http://lore.kernel.org/linux-fsdevel/3351099.1675077249@warthog.procyon.org.uk/) + + Could you consider pulling this patchset into the block tree? I think that + Al's fears wrt to pinned pages being removed from page tables causing deadlock + have been answered. Granted, there is still the issue of how to handle + vmsplice and a bunch of other places to fix, not least skbuff handling. + +* [v11: iov_iter: Improve page extraction (pin or just list)](http://lore.kernel.org/linux-fsdevel/20230130074129.28120-1-naresh.kamboju@linaro.org/) + + Build test pass on arm, arm64, i386, mips, parisc, powerpc, riscv, s390, sh, + sparc and x86_64. + Boot and LTP smoke pass on qemu-arm64, qemu-armv7, qemu-i386 and qemu-x86_64. + +* [v4: RESEND: fs: coredump: using preprocessor directives for dump_emit_page](http://lore.kernel.org/linux-fsdevel/20230130013347.17654-1-xiehongyu1@kylinos.cn/) + + When CONFIG_COREDUMP is set and CONFIG_ELF_CORE is not, you'll get warnings + like: + fs/coredump.c:841:12: error: ‘dump_emit_page’ defined but not used + [-Werror=unused-function] + 841 | static int dump_emit_page(struct coredump_params *cprm, struct + page *page) + +* [v1: fscrypt: Copy the memcg information to the ciphertext page](http://lore.kernel.org/linux-fsdevel/20230129121851.2248378-1-willy@infradead.org/) + + Both f2fs and ext4 end up passing the ciphertext page to + wbc_account_cgroup_owner(). At the moment, the ciphertext page appears + to belong to no cgroup, so it is accounted to the root_mem_cgroup instead of whatever cgroup the original page was in. + +* [v1: blk: optimization for classic polling](http://lore.kernel.org/linux-fsdevel/3578876466-3733-1-git-send-email-nj.shetty@samsung.com/) + + This removes the dependency on interrupts to wake up task. 
Set task + state as TASK_RUNNING, if need_resched() returns true, + while polling for IO completion. + Earlier, polling task used to sleep, relying on interrupt to wake it up. + This made some IO take very long when interrupt-coalescing is enabled in NVMe. + +#### 网络设备 + +* [v2: net-next: add support for per action hw stats](http://lore.kernel.org/netdev/20230205135525.27760-1-ozsh@nvidia.com/) + + This series provides the platform to query per action stats for in_hw flows. + + The first four patches are preparation patches with no functionality change. + The fifth patch re-uses the existing flow action stats api to query action + stats for both classifier and action dumps. + The rest of the patches add per action stats support to the Mellanox driver. + +* [v1: net-next: net: move more duplicate code of ovs and tc conntrack into nf_conntrack_ovs](http://lore.kernel.org/netdev/cover.1675548023.git.lucien.xin@gmail.com/) + + We've moved some duplicate code into nf_nat_ovs in: + + "net: eliminate the duplicate code in the ct nat functions of ovs and tc" + +* [v3: net-next: tuntap: correctly initialize socket uid](http://lore.kernel.org/netdev/20230131-tuntap-sk-uid-v3-0-81188b909685@diag.uniroma1.it/) + + sock_init_data() assumes that the `struct socket` passed in input is + contained in a `struct socket_alloc` allocated with sock_alloc(). + However, tap_open() and tun_chr_open() pass a `struct socket` embedded + in a `struct tap_queue` and `struct tun_file` respectively, both + allocated with sk_alloc(). + This causes a type confusion when issuing a container_of() with + SOCK_INODE() in sock_init_data() which results in assigning a wrong + sk_uid to the `struct sock` in input. 
+ +* [v1: net-next: vxlan: Add MDB support](http://lore.kernel.org/netdev/20230204170801.3897900-1-idosch@nvidia.com/) + + This patchset implements MDB support in the VXLAN driver, allowing it to + selectively forward IP multicast traffic to VTEPs with interested + receivers instead of flooding it to all the VTEPs as BUM. + +* [v1: net-next:pull request: implement devlink reload in ice](http://lore.kernel.org/netdev/20230203211456.705649-1-anthony.l.nguyen@intel.com/) + + Michal Swiatkowski says: + + This is part of the changes done in patchset [0]. Resource management is + a somewhat controversial part, so I split it into two patchsets. + + This is the first one, covering the refactoring and the implementation of the reload API call. + +* [v1: firmware: qcom_scm: Move qcom_scm.h to include/linux/firmware/qcom/](http://lore.kernel.org/netdev/20230203210956.3580811-1-quic_eberman@quicinc.com/) + + Move include/linux/qcom_scm.h to include/linux/firmware/qcom/qcom_scm.h. + This moves one of the few remaining Qualcomm-specific headers into a more + appropriate subdirectory under include/. + +* [v1: net-next: ionic: rx buffers and on-chip descriptors](http://lore.kernel.org/netdev/20230203210016.36606-1-shannon.nelson@amd.com/) + + We start with a couple of housekeeping patches that were + originally presented for 'net', then we add support for on-chip + descriptor rings and Rx buffer page caching. + +* [v2: 9p/client: don't assume signal_pending() clears on recalc_sigpending()](http://lore.kernel.org/netdev/9422b998-5bab-85cc-5416-3bb5cf6dd853@kernel.dk/) + + signal_pending() really means that an exit to userspace is required to + clear the condition, as it could be either an actual signal, or it could + be TWA_SIGNAL-based task_work that needs processing. The 9p client + does a recalc_sigpending() to take care of the former, but that still + leaves TWA_SIGNAL task_work. The result is that if we do have TWA_SIGNAL
The result is that if we do have TWA_SIGNAL + task_work pending, then we'll sit in a tight loop spinning as + signal_pending() remains true even after recalc_sigpending(). + +* [v11: nvme-tcp receive offloads](http://lore.kernel.org/netdev/20230203132705.627232-1-aaptel@nvidia.com/) + + Here is the next iteration of our nvme-tcp receive offload series. + + The main changes are in patch 3 (netlink). + + Rebased on top of today net-next + + The changes are also available through git: + + Repo: https://github.com/aaptel/linux.git branch nvme-rx-offload-v11 + Web: https://github.com/aaptel/linux/tree/nvme-rx-offload-v11 + + The NVMeTCP offload was presented in netdev 0x16 (video now available): + - https://netdevconf.info/0x16/session.html?NVMeTCP-Offload-%E2%80%93-Implementation-and-Performance-Gains + - https://youtu.be/W74TR-SNgi4 + +* [v1: atm: eni: replace DPRINTK macro with pr_debug()](http://lore.kernel.org/netdev/00f95478-c9cc-1f4b-820e-d427a9113418@icloud.com/) + + The macro DPRINTK is in use in lots of different source files, varying in + their implementation. One of those files is drivers/atm/eni.c. + + Replacing them with pr_debug() and their counterparts makes it more + consistent and easier to read. + +* [v1: Bluetooth: Make sure LE create conn cancel is sent when timeout](http://lore.kernel.org/netdev/20230203173900.1.I9ca803e2f809e339da43c103860118e7381e4871@changeid/) + + When sending LE create conn command, we set a timer with a duration of + HCI_LE_CONN_TIMEOUT before timing out and calling + create_le_conn_complete. Additionally, when receiving the command + complete, we also set a timer with the same duration to call le_conn_timeout. 
+ +* [v1: Bluetooth: Free potentially unfreed SCO connection](http://lore.kernel.org/netdev/20230203173024.1.Ieb6662276f3bd3d79e9134ab04523d584c300c45@changeid/) + + When it happens, hci_cs_setup_sync_conn won't be able to obtain the + reference to the SCO connection, so it will be stuck and potentially hinder subsequent connections to the same device. + + This patch prevents that by also deleting the SCO connection if it is + still not established when the corresponding ACL connection is deleted. + +* [v3: net-next: Wangxun interrupt and RxTx support](http://lore.kernel.org/netdev/20230203091135.3294377-1-jiawenwu@trustnetic.com/) + + Configure interrupt, setup RxTx ring, support to receive and transmit packets. + +* [v1: net: ethernet: mtk_eth_soc: various enhancements](http://lore.kernel.org/netdev/cover.1675407169.git.daniel@makrotopia.org/) + + This series brings a variety of fixes and enhancements for mtk_eth_soc, + adds support for the MT7981 SoC and facilitates sharing the SGMII PCS + code between mtk_eth_soc and mt7530. + +* [v7: io_uring: add napi busy polling support](http://lore.kernel.org/netdev/20230203060850.3060238-1-shr@devkernel.io/) + + This adds the napi busy polling support in io_uring.c. It adds a new + napi_list to the io_ring_ctx structure. This list contains the list of + napi_id's that are currently enabled for busy polling. This list is + used to determine which napi id's enabled busy polling. For faster + access it also adds a hash table. + +* [v1: next: wifi: mwifiex: Replace one-element array with flexible-array member](http://lore.kernel.org/netdev/Y9xkjXeElSEQ0FPY@work/) + + One-element arrays are deprecated, and we are replacing them with flexible + array members instead. So, replace one-element array with flexible-array + member in struct mwifiex_ie_types_rates_param_set. 
+ +* [v1: next: wifi: mwifiex: Replace one-element arrays with flexible-array members](http://lore.kernel.org/netdev/Y9xkECG3uTZ6T1dN@work/) + + One-element arrays are deprecated, and we are replacing them with flexible + array members instead. So, replace one-element arrays with flexible-array + members in multiple structures. + +* [v2: net-next: net: page_pool: use in_softirq() instead](http://lore.kernel.org/netdev/20230203011612.194701-1-dqfext@gmail.com/) + + We use BH context only for synchronization, so we don't care if it's + actually serving softirq or not. + + As a side note, in the case of threaded NAPI, in_serving_softirq() will + return false because it's in process context with BH off, making + page_pool_recycle_in_cache() unreachable. + +#### Security Hardening + +* [v1: media: imx-jpeg: Bounds check sizeimage access](http://lore.kernel.org/linux-hardening/20230204183804.never.323-kees@kernel.org/) + + The call to mxc_jpeg_get_plane_size() from mxc_jpeg_dec_irq() sets the + plane_no argument to 1. + +* [v1: scsi: mpi3mr: Replace 1-element array with flex-array](http://lore.kernel.org/linux-hardening/20230204183715.never.937-kees@kernel.org/) + + Nothing else defined MPI3_NVME_ENCAP_CMD_MAX, so the "command" + buffer was being defined as a fake flexible array of size 1. Replace + this with a proper flex array. + +* [v1: USB: ene_usb6250: Allocate enough memory for full object](http://lore.kernel.org/linux-hardening/20230204183546.never.849-kees@kernel.org/) + + The allocation of PageBuffer is 512 bytes in size, but the dereferencing + of struct ms_bootblock_idi (also size 512) happens at a calculated offset + within the allocation, which means the object could potentially extend + beyond the end of the allocation. Avoid this case by just allocating + enough space to catch any accesses beyond the end.
+ +* [v1: btrfs: sysfs: Handle NULL return values](http://lore.kernel.org/linux-hardening/20230204183510.never.909-kees@kernel.org/) + + Each of to_fs_info(), discard_to_fs_info(), and to_space_info() can + return NULL values. Check for these so it's not possible to perform + calculations against NULL pointers. + +* [v1: bpf: Replace bpf_lpm_trie_key 0-length array with flexible array](http://lore.kernel.org/linux-hardening/20230204183241.never.481-kees@kernel.org/) + + Replace deprecated 0-length array in struct bpf_lpm_trie_key with flexible array. + + This includes fixing the selftest which was incorrectly using a + variable length struct as a header, identified earlier[1]. Avoid this + by just explicitly including the prefixlen member instead of struct + bpf_lpm_trie_key. + + [1] https://lore.kernel.org/all/202206281009.4332AA33@keescook/ + +* [v2: lm85: Bounds check to_sensor_dev_attr()->index usage](http://lore.kernel.org/linux-hardening/20230203223250.gonna.713-kees@kernel.org/) + + The index into various register arrays was not bounds checked. Provide a + simple wrapper to bounds check the index, adding robustness in the face + of memory corruption, unexpected index manipulation, etc. + +* [v1: randstruct: temporarily disable clang support](http://lore.kernel.org/linux-hardening/20230203194201.92015-1-ebiggers@kernel.org/) + + Randstruct with clang is currently unsafe to use in any clang release + that supports it, due to a clang bug that is causing miscompilations: + "-frandomize-layout-seed inconsistently randomizes all-function-pointers + structs" (https://github.com/llvm/llvm-project/issues/60349). Disable + it temporarily until the bug is fixed and the fix is released in a clang + version that can be checked for. 
+ +* [v1: uaccess: Add minimum bounds check on kernel buffer size](http://lore.kernel.org/linux-hardening/20230203193523.never.667-kees@kernel.org/) + + While there is logic about the difference between ksize and usize, + copy_struct_from_user() didn't check the size of the destination buffer + (when it was known) against ksize. Add this check so there is an upper + bounds check on the possible memset() call; otherwise, lower bounds + checks made by callers will trigger bounds warnings under -Warray-bounds. + +* [v2: arm64: Support Clang UBSAN trap codes for better reporting](http://lore.kernel.org/linux-hardening/20230203173946.gonna.972-kees@kernel.org/) + + When building with CONFIG_UBSAN_TRAP=y on arm64, Clang encodes the UBSAN + check (handler) type in the ESR. Extract this and actually report these + traps as coming from the specific UBSAN check that tripped. + +* [v1: pstore/blk: Export a method to implemente panic_write()](http://lore.kernel.org/linux-hardening/20230203113515.93540-1-victor@allwinnertech.com/) + + panic_write() is needed to write the pstore frontend message + to blk devices on panic. This adds a way to register panic_write when + the "best_effort" mode is used to register the pstore blk-backend. + +* [v1: next: xen: Replace one-element array with flexible-array member](http://lore.kernel.org/linux-hardening/Y9xjN6Wa3VslgXeX@work/) + + One-element arrays are deprecated, and we are replacing them with flexible + array members instead. So, replace one-element array with flexible-array + member in struct xen_page_directory. + + This helps with the ongoing efforts to tighten the FORTIFY_SOURCE + routines on memcpy() and helps us make progress towards globally + enabling -fstrict-flex-arrays=3 [1]. + +* [v1: next: xfs: Replace one-element arrays with flexible-array members](http://lore.kernel.org/linux-hardening/Y9xiYmVLRIKdpJcC@work/) + + One-element arrays are deprecated, and we are replacing them with flexible + array members instead.
So, replace one-element arrays with flexible-array + members in the structures xfs_attr_leaf_name_local and + xfs_attr_leaf_name_remote. + +* [v2: 4.14: Backport oops_limit to 4.14](http://lore.kernel.org/linux-hardening/20230203003354.85691-1-ebiggers@kernel.org/) + + This series backports the patchset + "exit: Put an upper limit on how often we can oops" + (https://lore.kernel.org/linux-mm/20221117233838.give.484-kees@kernel.org/T/#u) + to 4.14, as recommended at + https://googleprojectzero.blogspot.com/2023/01/exploiting-null-dereferences-in-linux.html + +* [v2: 4.19: Backport oops_limit to 4.19](http://lore.kernel.org/linux-hardening/20230203002717.49198-1-ebiggers@kernel.org/) + + This series backports the patchset + "exit: Put an upper limit on how often we can oops" + (https://lore.kernel.org/linux-mm/20221117233838.give.484-kees@kernel.org/T/#u) + to 4.19, as recommended at + https://googleprojectzero.blogspot.com/2023/01/exploiting-null-dereferences-in-linux.html + +* [v1: 5.4: Backport oops_limit to 5.4](http://lore.kernel.org/linux-hardening/20230202044255.128815-1-ebiggers@kernel.org/) + + This series backports the patchset + "exit: Put an upper limit on how often we can oops" + (https://lore.kernel.org/linux-mm/20221117233838.give.484-kees@kernel.org/T/#u) + to 5.4, as recommended at + https://googleprojectzero.blogspot.com/2023/01/exploiting-null-dereferences-in-linux.html + This follows the backports to 5.10 and 5.15, which have already been released. + +* [v1: use canonical ftrace path whenever possible](http://lore.kernel.org/linux-hardening/20230130181915.1113313-1-zwisler@google.com/) + + The canonical location for the tracefs filesystem is at /sys/kernel/tracing. + + But, from Documentation/trace/ftrace.rst: + + Before 4.1, all ftrace tracing control files were within the debugfs + file system, which is typically located at /sys/kernel/debug/tracing.
+ +#### Asynchronous IO + +* [v7: liburing: add api for napi busy poll](http://lore.kernel.org/io-uring/20230205002424.102422-1-shr@devkernel.io/) + + This adds two new APIs to set/clear the NAPI busy poll settings. The two + new functions are called: + - io_uring_register_napi + - io_uring_unregister_napi + + The patch series also contains the documentation for the two new functions + and two example programs. The client program is called napi-busy-poll-client + and the server program napi-busy-poll-server. The client measures the + roundtrip times of requests. + +* [v2: io_uring,audit: don't log IORING_OP_MADVISE](http://lore.kernel.org/io-uring/b5dfdcd541115c86dbc774aa9dd502c964849c5f.1675282642.git.rgb@redhat.com/) + + fadvise and madvise both provide hints for caching or access patterns for + files and memory, respectively. Skip them. + +* [GIT PULL: Upgrade to clang-17 (for liburing's CI)](http://lore.kernel.org/io-uring/a9aac5c7-425d-8011-3c7c-c08dfd7d7c2f@gnuweeb.org/) + + clang-17 is now available. Upgrade the clang version in liburing's + CI to clang-17. + + Two prep patches to address `-Wextra-semi-stmt` warnings: + + - Remove unnecessary semicolon (Alviro) + + - Wrap the CHECK() macro with a do-while statement (Alviro) + +#### Rust For Linux + +* [v1: rust: sync: Arc: Implement Debug and Display](http://lore.kernel.org/rust-for-linux/20230201232244.212908-1-boqun.feng@gmail.com/) + + I found that our Arc doesn't implement `Debug` or `Display` when I tried + to play with them, so add these implementations. + + Wedson, I know that you are considering getting rid of `ArcBorrow`, so + patch #3 may have some conflicts with what you may be working on.
+ +* [v3: rust: MAINTAINERS: Add the zulip link](http://lore.kernel.org/rust-for-linux/20230201184525.272909-1-boqun.feng@gmail.com/) + + The Zulip organization "rust-for-linux" was created 2 years ago[1] and has + proven to be a great place for Rust-related discussion, therefore + add the information to the MAINTAINERS file so that newcomers have more + options to find guidance and help. + +* [v1: rust: add this_module macro](http://lore.kernel.org/rust-for-linux/20230131130841.318301-1-yakoyoku@gmail.com/) + + Adds a Rust equivalent to the handy THIS_MODULE macro from C. + +#### BPF + +* [v2: bpf-next: Add support for tracing programs in BPF_PROG_RUN](http://lore.kernel.org/bpf/20230203182812.20657-1-grantseltzer@gmail.com/) + + This patch changes how BPF_PROG_RUN treats tracing + (fentry/fexit) programs. Previously, only a return value was injected + and the actual program was not run. The new behavior mirrors that of + running raw tracepoint BPF programs, which actually runs the + instructions of the program via `bpf_prog_run()`. + +* [v1: uapi: add missing ip/ipv6 header dependencies for linux/stddef.h](http://lore.kernel.org/bpf/20230203160448.1314205-1-herton@redhat.com/) + + Since commit 58e0be1ef6118 ("net: use struct_group to copy ip/ipv6 + header addresses"), ip and ipv6 headers started to use the __struct_group + definition, which is defined in include/uapi/linux/stddef.h. However, + linux/stddef.h isn't explicitly included in include/uapi/linux/{ip,ipv6}.h, + +* [v3: bpf-next: Document kfunc lifecycle / stability expectations](http://lore.kernel.org/bpf/20230203155727.793518-1-void@manifault.com/) + + This is v3 of the proposal for documenting BPF kfunc lifecycle and + stability. + +* [v1: bpf-next: libbpf: allow users to set kprobe/uprobe attach mode](http://lore.kernel.org/bpf/20230203031742.1730761-1-imagedong@tencent.com/) + + By default, libbpf will attach the kprobe/uprobe eBPF program in the + latest mode supported by the kernel. In this series, we add support
In this series, we add support + for letting users manually attach kprobe/uprobe programs in legacy or perf mode in the + first patch. + + In the second patch, we add selftests for it. + +* [v2: perf lock contention: Improve aggr x filter combination](http://lore.kernel.org/bpf/20230203021324.143540-1-namhyung@kernel.org/) + + The callstack filter can be useful to debug lock issues, but it has a + limitation: it only works with caller aggregation mode (which is the + default setting). IOW, it cannot filter by callstack when showing tasks + or lock addresses/names. + +* [v1: bpf-next: selftests/bpf: Initialize tc in xdp_synproxy](http://lore.kernel.org/bpf/20230202235335.3403781-1-iii@linux.ibm.com/) + + xdp_synproxy/xdp fails in CI with: + + Error: bpf_tc_hook_create: File exists + + The XDP version of the test should not be calling bpf_tc_hook_create(); + the reason it's happening anyway is that if we don't specify --tc on the + command line, the tc variable remains uninitialized. + +* [v1: tools/resolve_btfids: Tidy HOST_OVERRIDES](http://lore.kernel.org/bpf/20230202224253.40283-1-irogers@google.com/) + + Don't set EXTRA_CFLAGS to HOSTCFLAGS; ensure CROSS_COMPILE isn't + passed through. + + This patch is based on top of: + https://lore.kernel.org/bpf/20230202112839.1131892-1-jolsa@kernel.org/ + +* [v1: net: virtio-net: Keep stop() to follow mirror sequence of open()](http://lore.kernel.org/bpf/20230202163516.12559-1-parav@nvidia.com/) + + The commit cited in the Fixes tag frees rxq xdp info while RQ NAPI is + still enabled and packet processing may be ongoing. + + Follow the mirror sequence of open() in the stop() callback. + This ensures that when rxq info is unregistered, no rx + packet processing is ongoing.
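The "mirror sequence" rule in the virtio-net stop() item means teardown undoes setup in exactly the reverse order. The sketch below models that with two hypothetical steps, `rxq_info_reg` and `napi_enable` (illustrative names, not virtio-net's actual functions, and assuming open() registers rxq info before enabling NAPI). Because stop() walks the same step table backwards, NAPI is disabled before the rxq info is unregistered, so no RX processing can still be running at that point.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical per-queue setup steps, listed in open() order. */
enum demo_step { STEP_RXQ_INFO_REG, STEP_NAPI_ENABLE, STEP_COUNT };

static const char *const step_names[STEP_COUNT] = {
	[STEP_RXQ_INFO_REG] = "rxq_info_reg",
	[STEP_NAPI_ENABLE] = "napi_enable",
};

/* Records the order in which operations ran, e.g. "do:x undo:x ". */
static char trace[128];

static void record(const char *what, const char *name)
{
	/* Guard the fixed-size buffer: entry plus ':' plus ' ' plus NUL. */
	assert(strlen(trace) + strlen(what) + strlen(name) + 2 < sizeof(trace));
	strcat(trace, what);
	strcat(trace, ":");
	strcat(trace, name);
	strcat(trace, " ");
}

/* open(): run the setup steps front to back. */
static void demo_open(void)
{
	for (int i = 0; i < STEP_COUNT; i++)
		record("do", step_names[i]);
}

/* stop(): mirror open() by undoing the steps back to front, so NAPI is
 * disabled (no more RX processing) before rxq info is unregistered. */
static void demo_stop(void)
{
	for (int i = STEP_COUNT - 1; i >= 0; i--)
		record("undo", step_names[i]);
}
```

Driving both directions from one table is just one way to keep open() and stop() from drifting apart; a real driver would more likely use explicit calls or goto-based unwinding.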
+ +* [v1: bpf-next: tools/resolve_btfids: Compile resolve_btfids as host program](http://lore.kernel.org/bpf/20230202112839.1131892-1-jolsa@kernel.org/) + + Compile resolve_btfids as a host program so + we can avoid cross-compile issues as reported by Nathan. + + Also, we no longer need HOST_OVERRIDES for the BINARY target, + just for 'prepare' targets. + +* [v1: virtio-net: support AF_XDP zero copy](http://lore.kernel.org/bpf/20230202110058.130695-1-xuanzhuo@linux.alibaba.com/) + + XDP socket (AF_XDP) is an excellent kernel-bypass network framework. The zero + copy feature of xsk (XDP socket) needs to be supported by the driver. The + performance of zero copy is very good. mlx5 and Intel ixgbe already support + this feature. This patch set allows virtio-net to support xsk's zerocopy xmit feature. + +* [v2: bpf-next: libbpf: Add wakeup_events to creation options](http://lore.kernel.org/bpf/20230202062549.632425-1-arilou@gmail.com/) + + Add an option to set when the perf buffer should wake up; by default, the + perf buffer becomes signaled for every event that is pushed to it. + +* [v1: virtio-net: close() to follow mirror of open()](http://lore.kernel.org/bpf/20230202050038.3187-1-parav@nvidia.com/) + + These two small patches make the ndo_close() callback follow + the mirror sequence of the ndo_open() callback. This makes the code easier to audit + and also ensures that xdp rxq info is not unregistered while NAPI on RXQ is ongoing. + +* [v1: bpf-next: bpf, mm: bpf memory usage](http://lore.kernel.org/bpf/20230202014158.19616-1-laoar.shao@gmail.com/) + + Currently we can't get bpf memory usage reliably. bpftool now shows the + bpf memory footprint, which differs from bpf memory usage. + +* [v2: tools/resolve_btfids: Tidy host CFLAGS forcing](http://lore.kernel.org/bpf/20230201213743.44674-1-irogers@google.com/) + + Avoid passing CROSS_COMPILE to submakes and ensure CFLAGS is forced to + HOSTCFLAGS for submake builds.
This fixes problems with cross + compilation. + + Tidy to avoid unnecessarily modifying/exporting CFLAGS, and make the override for + prepare and build clearer. + +* [v3: Documentation/bpf: Document API stability expectations for kfuncs](http://lore.kernel.org/bpf/20230201174449.94650-1-toke@redhat.com/) + + Following up on the discussion at the BPF office hours (and subsequent + discussion), this patch adds a description of API stability expectations + for kfuncs. The goal here is to manage user expectations about what kind of + stability can be expected for kfuncs exposed by the kernel. + +* [v1: Add ftrace direct call for arm64](http://lore.kernel.org/bpf/20230201163420.1579014-1-revest@chromium.org/) + + This series adds ftrace direct call support to arm64. + This makes BPF tracing programs (fentry/fexit/fmod_ret/lsm) work on arm64. + + It is meant to apply on top of the arm64 tree, which contains Mark Rutland's + series on CALL_OPS [1] under the for-next/ftrace tag. + +* [v1: bpf-next: bpf: Replace BPF_ALU and BPF_JMP with BPF_ALU32 and BPF_JMP64](http://lore.kernel.org/bpf/1675254998-4951-1-git-send-email-yangtiezhu@loongson.cn/) + + The intention of this patchset is to make the code more readable, + with no functional changes; it is based on bpf-next. + + If this patchset makes no sense, please ignore it and sorry for that. + +* [v5: bpf-next: xdp: introduce xdp-feature support](http://lore.kernel.org/bpf/cover.1675245257.git.lorenzo@kernel.org/) + + Introduce the capability to export the XDP features supported by the NIC. + Introduce an XDP compliance test tool (xdp_features) to check that the features + exported by the NIC match the real features supported by the driver. + Allow XDP_REDIRECT of non-linear XDP frames into a devmap.
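The BPF_ALU/BPF_JMP renaming item in this list is described as having no functional changes. A small sketch of why that can hold: an eBPF instruction's class lives in the low three bits of its opcode byte, so new names that expand to the same values leave every encoded instruction untouched. The class values below match the kernel's UAPI headers (`linux/bpf_common.h`, `linux/bpf.h`). The `BPF_ALU32`/`BPF_JMP64` aliases are a hypothetical rendering of the proposal, not code taken from the patchset.

```c
#include <assert.h>
#include <stdint.h>

/* Instruction class = low 3 bits of the opcode byte. These values match
 * the kernel UAPI headers (linux/bpf_common.h and linux/bpf.h). */
#define BPF_CLASS(code) ((code) & 0x07)
#define BPF_ALU   0x04 /* 32-bit arithmetic */
#define BPF_JMP   0x05 /* jumps; conditional ones compare 64-bit operands */
#define BPF_JMP32 0x06 /* conditional jumps comparing 32-bit operands */
#define BPF_ALU64 0x07 /* 64-bit arithmetic */

/* Hypothetical aliases in the spirit of the patchset: new spellings that
 * expand to the same class values, so no encoded instruction changes. */
#define BPF_ALU32 BPF_ALU
#define BPF_JMP64 BPF_JMP

_Static_assert(BPF_ALU32 == 0x04 && BPF_JMP64 == 0x05,
	       "aliases must not change the encoding");

/* True if the opcode belongs to the 32-bit ALU class. */
static int is_alu32(uint8_t opcode)
{
	return BPF_CLASS(opcode) == BPF_ALU32;
}
```

Since the aliases are pure preprocessor renames, any readability gain comes only from the names: `BPF_ALU32` makes the 32-bit width explicit next to `BPF_ALU64`.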
+ +* [v1: bpf-next: ice: add XDP mbuf support](http://lore.kernel.org/bpf/20230131204506.219292-1-maciej.fijalkowski@intel.com/) + + Although this work started as an effort to add multi-buffer XDP support + to the ice driver, as usual it turned out that some other side issues needed to be addressed, so let me give you an overview. + +* [v3: bpf-next: BPF rbtree next-gen datastructure](http://lore.kernel.org/bpf/20230131180016.3368305-1-davemarchevsky@fb.com/) + + This series adds an rbtree data structure following the "next-gen + datastructure" precedent set by the recently added linked list [0]. This is + a reimplementation of the previous rbtree RFC [1] to use kfunc + kptr + instead of adding a new map type. + +* [v2: bpf-next: bpf: Refactor release_regno searching logic](http://lore.kernel.org/bpf/20230131171038.2648165-1-davemarchevsky@fb.com/) + + Kfuncs marked KF_RELEASE indicate that they release some + previously-acquired arg. The verifier assumes that such a function will + only have one arg reg w/ ref_obj_id set, and that that arg is the one to + be released. Having multiple kfunc arg regs with ref_obj_id set is considered + an invalid state. + +* [v1: dwarves: dwarves: sync with libbpf-1.1](http://lore.kernel.org/bpf/1675169241-32559-1-git-send-email-alan.maguire@oracle.com/) + + This will pull in the following BTF dedup improvements: + + de048b6 libbpf: Resolve enum fwd as full enum64 and vice versa + f3c51fe libbpf: Btf dedup identical struct test needs check for nested structs/arrays + +* [v2: net-next: vsock: add support for sockmap](http://lore.kernel.org/bpf/20230118-support-vsock-sockmap-connectible-v2-0-58ffafde0965@bytedance.com/) + + Add support for sockmap to vsock. + + We're testing usage of vsock as a way to redirect guest-local UDS requests to + the host, and this patch series greatly improves the performance of such a setup.
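The release_regno item in this list states an invariant that can be expressed as a simple search. Among a KF_RELEASE kfunc's argument registers, exactly one should carry a nonzero `ref_obj_id`, and more than one is an invalid state. The sketch below models that rule with a hypothetical `struct arg_reg` and hypothetical error codes. It is a simplified illustration, not the verifier's actual data structures or code.

```c
#include <assert.h>

#define ERR_NO_REF        (-1) /* hypothetical: no arg holds a reference */
#define ERR_MULTIPLE_REFS (-2) /* hypothetical: invalid, more than one does */

/* Hypothetical, simplified model of an argument register. */
struct arg_reg {
	int ref_obj_id; /* nonzero if the register holds an acquired reference */
};

/* Return the index of the single argument with ref_obj_id set (the one to
 * release), ERR_NO_REF if none is set, or ERR_MULTIPLE_REFS if several are. */
static int find_release_arg(const struct arg_reg *args, int nargs)
{
	int found = ERR_NO_REF;

	for (int i = 0; i < nargs; i++) {
		if (!args[i].ref_obj_id)
			continue;
		if (found >= 0)
			return ERR_MULTIPLE_REFS;
		found = i;
	}
	return found;
}
```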
+ +* [v3: net: ixgbe: allow to increase MTU to 3K with XDP enabled](http://lore.kernel.org/bpf/20230131032357.34029-1-kerneljasonxing@gmail.com/) + + Recently I encountered a case where I could not increase the MTU size + directly from 1500 to a much bigger value with XDP enabled if the + server is equipped with an IXGBE card, which happened on thousands of + servers in our production environment. After applying this patch, + we can set the maximum MTU size to 3K. + +* [v1: bpf-next: selftests/bpf: Try to address xdp_metadata crashes](http://lore.kernel.org/bpf/20230130215137.3473320-1-sdf@google.com/) + + Commit e04ce9f4040b ("selftests/bpf: Make crashes more debuggable in + test_progs") hasn't uncovered anything interesting besides + confirming that the test passes successfully, but crashes eventually [0]. + +* [v1: bpf: add bpf_link support for BPF_NETFILTER programs](http://lore.kernel.org/bpf/20230130150432.24924-1-fw@strlen.de/) + + Doesn't apply, doesn't work -- there is no BPF_NETFILTER program type. + + nf_hook_run_bpf() (the C function that creates the program context and + calls the real bpf prog) would be "updated" to use the bpf dispatcher to + avoid the indirect call overhead. + + Does that seem ok to you? I'd ignore the bpf dispatcher for now and would work on the needed verifier changes first. + +### Related Technology News + +#### Qemu + +* [v10: riscv: Allow user to set the satp mode](http://lore.kernel.org/qemu-devel/20230203055812.257458-1-alexghiti@rivosinc.com/) + + This introduces new properties to allow the user to set the satp mode; + see patch 3 for the full syntax. In addition, it prevents CPUs from booting in a satp mode they do not support (see patch 4). + +* [v10: hw/riscv: handle kernel_entry high bits with 32bit CPUs](http://lore.kernel.org/qemu-devel/20230202135810.1657792-1-dbarboza@ventanamicro.com/) + + This new version removed the translate_fn() from patch 1 because it + wasn't removing the sign-extension for pentry as we thought it would.
+ A more detailed explanation is given in the commit msg of patch 1. + + We're now retrieving the 'lowaddr' value from load_elf_ram_sym() and + using it when we're running a 32-bit CPU. This worked with a 32-bit 'virt' machine booting with the -kernel option. + +* [v1: Add RISC-V vector cryptography extensions](http://lore.kernel.org/qemu-devel/20230202124230.295997-1-lawrence.hunter@codethink.co.uk/) + + This patch series introduces an implementation of the six instruction sets + of the draft RISC-V vector cryptography extensions specification. + + This patch set implements the instruction sets as per the 20221202 + version of the specification (1). We plan to update to the latest spec + once it has stabilised. + +* [v1: Add basic ACPI support for risc-v virt](http://lore.kernel.org/qemu-devel/20230202045223.2594627-1-sunilvl@ventanamicro.com/) + + This series adds basic ACPI support for the RISC-V virt machine. + Currently only the INTC interrupt controller specification is approved by the + UEFI forum. External interrupt controller support in ACPI is in progress. + + The basic infrastructure changes are mostly leveraged from ARM. + +* [v1: target/riscv: Add RVV registers to log](http://lore.kernel.org/qemu-devel/20230201142454.109260-1-ivan.klokov@syntacore.com/) + + Add a QEMU option 'rvv' to log RISC-V RVV registers like regular regs. + +* [v2: target/riscv: set tval for triggered watchpoints](http://lore.kernel.org/qemu-devel/20230131170955.752743-1-geomatsi@gmail.com/) + + According to the privileged spec, if [sm]tval is written with a nonzero + value when a breakpoint exception occurs, then [sm]tval will contain + the faulting virtual address. Set tval to the hit address when a breakpoint exception is triggered by a hardware watchpoint. + +#### U-Boot + +* [v2: Migrate to split config](http://lore.kernel.org/u-boot/20230204002619.938387-1-sjg@chromium.org/) + + U-Boot uses an SPL prefix on CONFIG options to indicate when an option + relates to SPL.
For example, while CONFIG_TEXT_BASE is the text base for + U-Boot proper, CONFIG_SPL_TEXT_BASE is the text base for SPL. + +* [v1: RFC: Migrate to split config](http://lore.kernel.org/u-boot/20230131152702.249197-1-sjg@chromium.org/) + + U-Boot uses an SPL prefix on CONFIG options to indicate when an option + relates to SPL. For example, while CONFIG_TEXT_BASE is the text base for + U-Boot proper, CONFIG_SPL_TEXT_BASE is the text base for SPL. + + Within the code it is possible to do things like CONFIG_VAL(TEXT_BASE) to + get that value. It returns the appropriate option, depending on the phase being built. + +* [v2: riscv: cpu: ax25: Simplify cache enabling logic in harts_early_init()](http://lore.kernel.org/u-boot/20230131094034.12423-1-peterlin@andestech.com/) + + This patch improves the cache enabling operation in harts_early_init(), + moves the CSR definition to include/asm/arch-andes/csr.h, and drops + unnecessary i/d-cache disable functions from cleanup_before_linux(). + +## 20230129: Issue 31 + +### Kernel News + +#### RISC-V Architecture Support + +* [v2: mm, arch: add generic implementation of pfn_valid() for FLATMEM](http://lore.kernel.org/linux-riscv/20230129124235.209895-1-rppt@kernel.org/) + + Every architecture that supports the FLATMEM memory model defines its own + version of pfn_valid() that essentially compares a pfn to max_mapnr. + +* [v1: riscv: Add header include guards to insn.h](http://lore.kernel.org/linux-riscv/20230129094242.282620-1-liaochang1@huawei.com/) + + Add header include guards to insn.h to prevent repeated declarations of + any identifiers in insn.h. + +* [v1: riscv: support arch_has_hw_pte_young()](http://lore.kernel.org/linux-riscv/20230129064956.143664-1-tjytimi@163.com/) + + arch_has_hw_pte_young() is false for riscv by default. If it's + false, the page table walk is mostly skipped for MGLRU reclaim. It + also causes a useless step in __wp_page_copy_user().
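The FLATMEM pfn_valid() item at the top of this list says each architecture's version essentially compares a pfn to max_mapnr. Under FLATMEM, `mem_map` is a single array indexed by `pfn - ARCH_PFN_OFFSET`, so the check reduces to a range test. The sketch below assumes illustrative values: RAM starting at PFN 0x80000 (physical address 0x80000000 with 4 KiB pages) and 1 GiB of memory. The `demo_` names are hypothetical, and this is not the generic implementation proposed in the series.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only: in the kernel, ARCH_PFN_OFFSET comes from the
 * architecture and max_mapnr is set during memory initialization. */
#define DEMO_PFN_OFFSET 0x80000UL                /* 0x80000000 >> 12 */
static unsigned long demo_max_mapnr = 0x40000UL; /* 1 GiB of 4 KiB pages */

/* FLATMEM keeps one mem_map array indexed by (pfn - PFN offset), so a
 * pfn is valid exactly when that index lies inside the array. */
static bool demo_pfn_valid(unsigned long pfn)
{
	return pfn >= DEMO_PFN_OFFSET &&
	       (pfn - DEMO_PFN_OFFSET) < demo_max_mapnr;
}
```

The explicit `pfn >= DEMO_PFN_OFFSET` test matters: without it, a pfn below the offset would wrap around in the unsigned subtraction and only be rejected by luck of the resulting magnitude.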
+ +* [v5: riscv: improve boot time isa extensions handling](http://lore.kernel.org/linux-riscv/20230128172856.3814-1-jszhang@kernel.org/) + + Generally, riscv ISA extensions are fixed for any specific hardware + platform, so a hart's features won't change after booting. This + characteristic makes it straightforward to use a static branch to check + whether a specific ISA extension is supported, which optimizes performance. + +* [v2: RISC-V KVM virtualize AIA CSRs](http://lore.kernel.org/linux-riscv/20230128072737.2995881-1-apatel@ventanamicro.com/) + + The RISC-V AIA specification is now frozen as per the RISC-V International + process. The latest frozen specification can be found at: + https://github.com/riscv/riscv-aia/releases/download/1.0-RC1/riscv-interrupts-1.0-RC1.pdf + +* [v3: KVM perf support](http://lore.kernel.org/linux-riscv/20230127182558.2416400-1-atishp@rivosinc.com/) + + This series extends perf support for KVM. The KVM implementation relies + on the SBI PMU extension and trap-and-emulate of hpmcounter CSRs. + The KVM implementation exposes the virtual counters to the guest and internally + manages the counters using kernel perf counters. + +* [v2: RISC-V: KVM: Redirect illegal instruction traps to guest](http://lore.kernel.org/linux-riscv/20230127112934.2749592-1-apatel@ventanamicro.com/) + + M-mode redirects an unhandled illegal instruction trap back + to S-mode. However, KVM running in HS-mode terminates the VS-mode + software when it receives an illegal instruction trap. + +* [v1: hwrng: starfive - Enable compile testing](http://lore.kernel.org/linux-riscv/Y9OveVKTkX8cRhyP@gondor.apana.org.au/) + + Enable compile testing for jh7110. Also remove the dependency on + HW_RANDOM. + +* [v3: RISC-V Hibernation Support](http://lore.kernel.org/linux-riscv/20230127091051.1465278-1-jeeheng.sia@starfivetech.com/) + + This series adds RISC-V hibernation/suspend-to-disk support. + Low-level arch functions were created to support hibernation.
+ swsusp_arch_suspend() relies on code from __cpu_suspend_enter() to write + the CPU state onto the stack, then calls swsusp_save() to save the memory + image. + +* [v2: -next: riscv: mm: hugetlb: Enable ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP](http://lore.kernel.org/linux-riscv/20230127050421.1920048-1-guoren@kernel.org/) + + Add HVO support for RISC-V; see commit 6be24bed9da3 ("mm: hugetlb: + introduce a new config HUGETLB_PAGE_FREE_VMEMMAP"). This patch is + similar to commit 1e63ac088f20 ("arm64: mm: hugetlb: enable + HUGETLB_PAGE_FREE_VMEMMAP for arm64"), and riscv's motivation is the + same as arm64's. The current riscv code is ready to enable HVO after the fixup in + commit d33deda095d3 ("riscv/mm: hugepage's PG_dcache_clean flag + is only set in head page"). + +* [v2: KVM: Add a common API for range-based TLB invalidation](http://lore.kernel.org/linux-riscv/20230126184025.2294823-1-dmatlack@google.com/) + + This series introduces a common API for performing range-based TLB + invalidation. + This series is based on patches 29-33 from (2.), but I made some further cleanups after looking at it a second time. + + Tested on x86_64 and ARM64 using KVM selftests. + +* [GIT PULL: RISC-V Devicetrees for v6.3](http://lore.kernel.org/linux-riscv/Y9LP+Za1h0fkBa58@spud/) + + DT stuff here for v6.3! I was kinda hoping to have a VisionFive 2 DT + for you, but alas no. + The changelog looks a bit odd since it's filled with un-reviewed + commits of my own, but they went as a PR to Palmer & are in riscv's + for-next too: + https://lore.kernel.org/all/167225428483.14530.3368527680488639805.b4-ty@rivosinc.com/ + They might also pop up as part of the Allwinner DT PR, if the D1 stuff + lands for v6.3, which I hope does happen!
+ +* [GIT PULL: RISC-V SoC drivers for v6.3](http://lore.kernel.org/linux-riscv/Y9LNIm9pkr+Owv%2Fe@spud/) + + I'm sending this one perhaps earlier than needed given there's going + to be an -rc8 this time around, just in case something about the PMU + driver isn't to your liking. It'd be nice if there were a subsystem for + these power management units as I wasn't sure if the API usage was + correct. Heiko, who has experience from the Rockchip driver, reviewed + it, so I am happy with that. + +* [v15: -next: riscv: Add GENERIC_ENTRY support](http://lore.kernel.org/linux-riscv/20230126172516.1580058-1-guoren@kernel.org/) + + The patches convert riscv to use the generic entry infrastructure from + kernel/entry/*. They also optimize entry.S with a new .macro and merge + ret_from_kernel_thread into ret_from_fork. + +* [v1: riscv: kprobe: Optimize kprobe with accurate atomicity](http://lore.kernel.org/linux-riscv/20230126161559.1467374-1-guoren@kernel.org/) + + The previous implementation was based on the stop_machine mechanism, + which reduced the speed of arm/disarm_kprobe. Using a minimal-size ebreak + instruction gives accurate atomicity. + + This patch removes the patch_text of riscv, which is based on + stop_machine. Then riscv only retains patch_text_nosync, and developers + need to be more careful in dealing with patch_text atomicity. + +* [v2: Allwinner D1 power domain support](http://lore.kernel.org/linux-riscv/20230126063419.15971-1-samuel@sholland.org/) + + This series adds support for the power controller found in D1 and other + recent Allwinner SoCs. There is no first-party documentation, but there + are a couple of vendor drivers for different hardware revisions[1][2], + and the register definitions were easy to verify empirically.
+ +* [v5: riscv: Allwinner D1/D1s platform support](http://lore.kernel.org/linux-riscv/20230126045738.47903-1-samuel@sholland.org/) + + This series adds the Kconfig/defconfig plumbing and devicetrees for a + range of Allwinner D1 and D1s-based boards. Many features are already + enabled, including USB, Ethernet, and WiFi. + + This version drops all boards/nodes with missing YAML bindings, so at + least some support can get merged for v6.3. + +* [v13: -next: riscv: Add vector ISA support](http://lore.kernel.org/linux-riscv/20230125142056.18356-1-andy.chiu@sifive.com/) + + This patchset is based on the vector 1.0 spec and adds vector support + to the riscv Linux kernel. There are some assumptions for this implementation. + +* [v1: riscv: mm: Implement pmdp_collapse_flush for THP](http://lore.kernel.org/linux-riscv/20230125125512.2494577-1-mchitale@ventanamicro.com/) + + When THP is enabled, 4K pages are collapsed into a single huge + page using the generic pmdp_collapse_flush(), which will further + use flush_tlb_range() to shoot down stale TLB entries. Unfortunately, + the generic pmdp_collapse_flush() only invalidates cached leaf PTEs + using address-specific SFENCEs, which results in repetitive (or + unpredictable) page faults on RISC-V implementations that cache non-leaf PTEs. + +* [v3: RISC-V kasan rework](http://lore.kernel.org/linux-riscv/20230125082333.1577572-1-alexghiti@rivosinc.com/) + + As described in patch 2, our current kasan implementation is intricate, + so I tried to simplify the implementation and mimic what arm64/x86 are doing. + +* [v5: riscv: Use PUD/P4D/PGD pages for the linear mapping](http://lore.kernel.org/linux-riscv/20230125081214.1576313-1-alexghiti@rivosinc.com/) + + This patchset intends to improve TLB utilization by using hugepages for + the linear mapping.
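Using larger pages for the linear mapping, as in the PUD/P4D/PGD item above, comes down to choosing the largest page size for each chunk. That size must satisfy two conditions: both the physical and the virtual address are aligned to it, and the remaining size still covers it. Below is a minimal sketch, assuming an Sv39-style hierarchy (4 KiB, 2 MiB, 1 GiB) and a hypothetical helper name. Sv48/Sv57 would add the larger P4D/PGD sizes to the front of the candidate list. This is an illustration of the selection rule, not the patchset's code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed Sv39-style page sizes: base page, PMD-level, PUD-level. */
#define SZ_4K (UINT64_C(1) << 12)
#define SZ_2M (UINT64_C(1) << 21)
#define SZ_1G (UINT64_C(1) << 30)

/* Pick the largest mapping size such that both the physical and the
 * virtual address are aligned to it and the remaining size covers it;
 * fall back to base pages otherwise. */
static uint64_t best_map_size(uint64_t pa, uint64_t va, uint64_t size)
{
	static const uint64_t sizes[] = { SZ_1G, SZ_2M }; /* largest first */

	for (unsigned i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
		uint64_t s = sizes[i];

		if (!(pa & (s - 1)) && !(va & (s - 1)) && size >= s)
			return s;
	}
	return SZ_4K;
}
```

Fewer, larger leaf entries mean fewer TLB entries are needed to cover the same linear map, which is the stated motivation of the series.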
+ +* [v2: dt-bindings: Introduce dual-link panels & panel-vendors](http://lore.kernel.org/linux-riscv/20230124101238.4542-1-a-bhatia1@ti.com/) + + The third patch introduces a dt-binding for generic dual-link LVDS + panels. These panels do not have any documented constraints, except for + their timing characteristics. Further, these panels have 2 pixel-sinks. + +* [v3: resend: riscv: Allow to downgrade paging mode from the command line](http://lore.kernel.org/linux-riscv/20230123105135.814154-1-alexghiti@rivosinc.com/) + + Add 2 early command line parameters that allow downgrading the satp mode + (using the same naming as x86): + - "no5lvl": use a 4-level page table (down from sv57 to sv48) + - "no4lvl": use a 3-level page table (down from sv57/sv48 to sv39) + + Note that going through the device tree to get the kernel command line + works with ACPI too, since the EFI stub creates a device tree anyway with + the command line. + +* [v2: RISC-V: Apply Zicboz to clear_page](http://lore.kernel.org/linux-riscv/20230122191328.1193885-1-ajones@ventanamicro.com/) + + When the Zicboz extension is available, we can more rapidly zero naturally + aligned, Zicboz-block-sized chunks of memory. As pages are always page + aligned and are larger than any Zicboz block size will be, + clear_page() appears to be a good candidate for the extension. + +#### Process Scheduling + +* [v1: net: sched: sch: Bounds check priority](http://lore.kernel.org/lkml/20230127224036.never.561-kees@kernel.org/) + + Nothing was explicitly bounds checking the priority index used to access + clpriop[]. WARN and bail out early if it's pathological. + +* [v1: sched/fair: sanitize vruntime of entity being placed](http://lore.kernel.org/lkml/20230127163230.3339408-1-rkagan@amazon.de/) + + When a scheduling entity is placed onto cfs_rq, its vruntime is pulled + to the base level (around cfs_rq->min_vruntime), so that the entity + doesn't gain an extra boost when placed backwards.
+ However, if the entity being placed wasn't executed for a long time, its + vruntime may get too far behind (e.g. while cfs_rq was executing a + low-weight hog), which can invert the vruntime comparison due to s64 + overflow. This results in the entity being placed with its original + vruntime way forwards, so that it will effectively never get to the CPU. + +* [v3: sched: Store restrict_cpus_allowed_ptr() call state](http://lore.kernel.org/lkml/20230127015527.466367-1-longman@redhat.com/) + + The user_cpus_ptr field was originally added by commit b90ca8badbd1 + ("sched: Introduce task_struct::user_cpus_ptr to track requested + affinity"). It was used only by the arm64 arch due to possible asymmetric + CPU setups. + + Since commit 8f9ea86fdf99 ("sched: Always preserve the user requested + cpumask"), task_struct::user_cpus_ptr is repurposed to store the user + requested CPU affinity specified in sched_setaffinity(). + +* [v1: sched/rt: Add a comment for the existence of task_is_realtime()](http://lore.kernel.org/lkml/20230123042729.30268-1-dave@stgolabs.net/) + + ... such that users don't wonder about it when we have rt_task(). + +* [GIT PULL: sched/urgent for v6.2-rc6](http://lore.kernel.org/lkml/Y80pqpsa%2Ff2eEcYP@zn.tnic/) + + Please pull a couple of urgent scheduler fixes for 6.2. + + Thx. + +#### Memory Management + +* [v2: Some small improvements for memblock.](http://lore.kernel.org/linux-mm/20230129090034.12310-1-zhangpeng.00@bytedance.com/) + + Some small optimizations for memblock. + +* [v1: -next: memory tier: release the new_memtier in find_create_memory_tier()](http://lore.kernel.org/linux-mm/20230129040651.1329208-1-tongtiangen@huawei.com/) + + In find_create_memory_tier(), if registering the device fails, we should + release new_memtier from the tier list and put the device instead of the memtier.
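The memory-tier fix above follows a common kernel error-path pattern. Once an object is reference-counted (as an embedded `struct device` is after initialization), a failed registration should unlink the object and drop the reference, letting the release callback do the freeing rather than freeing it directly. The sketch below models that pattern under stated assumptions: a hypothetical `struct tier`, a plain integer refcount, and an integer standing in for the tier list. It illustrates the pattern and is not the code in find_create_memory_tier().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted object, loosely modeled on an object with an
 * embedded device: the release callback frees it on the last put. */
struct tier {
	int refcount;
	bool on_list;
};

static int releases; /* counts release-callback invocations */

static void tier_release(struct tier *t)
{
	releases++;
	free(t);
}

static void tier_put(struct tier *t)
{
	if (--t->refcount == 0)
		tier_release(t);
}

/* Stand-in for a registration step that may fail. */
static int register_tier(struct tier *t, bool fail)
{
	(void)t;
	return fail ? -1 : 0;
}

/* Create a tier, add it to the (modeled) list, then register it. On
 * failure: unlink it first, then drop the reference so the release
 * callback performs the free; never free() it directly here. */
static struct tier *create_tier(bool fail_register, int *list_len)
{
	struct tier *t = calloc(1, sizeof(*t));

	if (!t)
		return NULL;
	t->refcount = 1;
	t->on_list = true;
	(*list_len)++;

	if (register_tier(t, fail_register)) {
		t->on_list = false;
		(*list_len)--;
		tier_put(t);
		return NULL;
	}
	return t;
}
```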
+ +* [v2: mm/migrate: Continue to migrate for non-hugetlb folios](http://lore.kernel.org/linux-mm/20230129033910.1327277-1-chenwandun@huawei.com/) + + migrate_hugetlbs() returns -ENOMEM when there are not enough hugetlb pages; + however, there may be free non-hugetlb folios available, + so continue migrating non-hugetlb folios. + +* [v1: mm/migrate: Continue to migrate for small pages](http://lore.kernel.org/linux-mm/20230129025404.1262745-1-chenwandun@huawei.com/) + + migrate_hugetlbs() returns -ENOMEM when there are not enough huge pages; + however, there may still be free small pages, so continue + migrating small pages. + +* [v4: kasan: infer allocation size by scanning metadata](http://lore.kernel.org/linux-mm/20230129021437.18812-1-Kuan-Ying.Lee@mediatek.com/) + + Make KASAN scan metadata to infer the requested allocation size instead of + printing cache->object_size. + + This patch fixes confusing slab-out-of-bounds reports as reported in: + + https://bugzilla.kernel.org/show_bug.cgi?id=216457 + +* [v1: mm/swapfile: add cond_resched() in get_swap_pages()](http://lore.kernel.org/linux-mm/20230128094757.1060525-1-xialonglong1@huawei.com/) + + The softlockup still occurs in get_swap_pages() under memory pressure. + The setup has 64 CPU cores, 64GB of memory, and 28 zram devices; the disksize of each + zram device is 50MB, with the same priority as si. Use the stress-ng tool + to increase memory pressure, causing the system to OOM frequently. + +* [v3: Add overflow checks for several syscalls](http://lore.kernel.org/linux-mm/20230128063229.989058-1-mawupeng1@huawei.com/) + + While testing mlock, we hit a problem when the len passed to mlock is ULONG_MAX. + The return value of mlock is zero.
But nothing will be locked since the + len in do_mlock overflows to zero due to the following code in mlock: + + len = PAGE_ALIGN(len + (offset_in_page(start))); + +* [v2: Per-VMA locks](http://lore.kernel.org/linux-mm/20230127194110.533103-1-surenb@google.com/) + + Previous version: + +* [v8: DEPT(Dependency Tracker)](http://lore.kernel.org/linux-mm/1674782358-25542-1-git-send-email-max.byungchul.park@gmail.com/) + + Nevertheless, I apologize for the lack of documentation. I promise to add it + before users need DEPT's APIs. For now, you can use + DEPT just by turning CONFIG_DEPT on. + +* [v1: ipc/shm: Introduce new do_vma_munmap() to munmap](http://lore.kernel.org/linux-mm/20230126212049.980501-1-Liam.Howlett@oracle.com/) + + The shm already has the vma iterator in position for a write. + do_vmi_munmap() searches for the correct position and aligns the write, + so it is not the right function to use in this case. + +* [v1: mm: Add memcpy_from_file_folio()](http://lore.kernel.org/linux-mm/20230126201552.1681588-1-willy@infradead.org/) + + This is the equivalent of memcpy_from_page(). It differs in that it + takes the position in a file instead of an offset in a folio, it accepts + the total number of bytes to be copied (instead of the number of bytes + to be copied from this folio), and it returns how many bytes were copied + from the folio, rather than making the caller calculate that and then + checking if the caller got it right. + +* [v1: Convert writepage_t to use a folio](http://lore.kernel.org/linux-mm/20230126201255.1681189-1-willy@infradead.org/) + + Against next-20230125. More folioisation. I split out the mpage + work from everything else because it completely dominated the patch, + but some implementations I just converted outright.
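The mlock overflow in the "Add overflow checks for several syscalls" item can be made concrete. With a page-aligned `start` and `len == ULONG_MAX`, adding `offset_in_page(start)` and then rounding up in `PAGE_ALIGN()` wraps past zero, so the aligned length becomes 0 and nothing is locked. The sketch below reproduces the arithmetic with locally defined 4 KiB page macros, prefixed `PG_` to avoid clashing with any system header. It also adds a hypothetical checked variant that returns -EINVAL; the actual check in the patch may differ.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/* Local stand-ins for the kernel's 4 KiB page helpers. */
#define PG_SIZE 4096UL
#define PG_MASK (~(PG_SIZE - 1))
#define PG_ALIGN(x) (((x) + PG_SIZE - 1) & PG_MASK)       /* like PAGE_ALIGN() */
#define PG_OFFSET(addr) ((unsigned long)(addr) & ~PG_MASK) /* like offset_in_page() */

/* The computation quoted in the item, with no overflow check. */
static unsigned long mlock_len_unchecked(unsigned long start, unsigned long len)
{
	return PG_ALIGN(len + PG_OFFSET(start));
}

/* Hypothetical checked variant: refuse lengths whose page-aligned size
 * cannot be represented, instead of letting them wrap to 0. */
static int mlock_len_checked(unsigned long start, unsigned long len,
			     unsigned long *out)
{
	unsigned long off = PG_OFFSET(start);

	/* len + off + (PG_SIZE - 1) must not exceed ULONG_MAX. */
	if (len > ULONG_MAX - off - (PG_SIZE - 1))
		return -EINVAL;
	*out = PG_ALIGN(len + off);
	return 0;
}
```

For example, `mlock_len_unchecked(0x1000, ULONG_MAX)` evaluates to 0, while the checked variant rejects the same input.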
+ +* [v1: highmem: Round down the address passed to kunmap_flush_on_unmap()](http://lore.kernel.org/linux-mm/20230126200727.1680362-1-willy@infradead.org/) + + We already round down the address in kunmap_local_indexed(), which is + the other implementation of __kunmap_local(). The only implementation + of kunmap_flush_on_unmap() is PA-RISC, which expects a page-aligned + address. This may be causing PA-RISC to flush the wrong addresses + currently. + +* [v4: introduce vm_flags modifier functions](http://lore.kernel.org/linux-mm/20230126193752.297968-1-surenb@google.com/) + + This patchset was originally published as a part of per-VMA locking [1] and + was split out after a suggestion that it's viable on its own and to facilitate + the review process. It is now a prerequisite for the next version of the per-VMA + lock patchset, which reuses vm_flags modifier functions to lock the VMA when + vm_flags are being updated. + +* [v8: cachestat: a new syscall for page cache state of files](http://lore.kernel.org/linux-mm/20230126175356.1582123-1-nphamcs@gmail.com/) + + There is currently no good way to query the page cache state of large + file sets and directory trees. There is mincore(), but it scales poorly: + the kernel writes out a lot of bitmap data that userspace has to + aggregate, when the user really does not care about per-page information + in that case. The user also needs to mmap and unmap each file as it goes + along, which can be quite slow as well. + +* [v1: mm/highmem: Align-down to page the address for kunmap_flush_on_unmap()](http://lore.kernel.org/linux-mm/20230126143346.12086-1-fmdefrancesco@gmail.com/) + + If ARCH_HAS_FLUSH_ON_KUNMAP is defined (PA-RISC case), __kunmap_local() + calls kunmap_flush_on_unmap(). The latter currently flushes the wrong + address (as confirmed by Matthew Wilcox and Helge Deller). Al Viro + proposed calling kunmap_flush_on_unmap() on a page-aligned (rounded-down) + address in order to fix this issue.
Consensus has been reached on this + solution. + +* [v11: iov_iter: Improve page extraction (pin or just list)](http://lore.kernel.org/linux-mm/20230126141626.2809643-1-dhowells@redhat.com/) + + Here are patches to provide support for extracting pages from an iov_iter + and to use this in the extraction functions in the block layer bio code. + +* [v1: iov_iter: Use __bitwise with the extraction_flags](http://lore.kernel.org/linux-mm/2638928.1674729230@warthog.procyon.org.uk/) + + Interestingly, things like __be32 are __bitwise. I wonder if that actually + makes sense or if it was just convenient to stop people doing arithmetic on + them. I guess doing AND/OR/XOR on them isn't a problem provided both + arguments are appropriately byte-swapped. + +* [v2: mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups](http://lore.kernel.org/linux-mm/20230125225358.2576151-1-zokeefe@google.com/) + + This was for use by MADV_COLLAPSE file/shmem codepaths, where MADV_COLLAPSE + might identify a pte-mapped hugepage, only to have khugepaged race in, free + the pte table, and clear the pmd. + +* [v10: iov_iter: Improve page extraction (pin or just list)](http://lore.kernel.org/linux-mm/20230125210657.2335748-1-dhowells@redhat.com/) + + Here are patches to provide support for extracting pages from an iov_iter + and to use this in the extraction functions in the block layer bio code. + +* [v2: nvdimm: Support sizeof(struct page) > MAX_STRUCT_PAGE_SIZE](http://lore.kernel.org/linux-mm/167467815773.463042.7022545814443036382.stgit@dwillia2-xfh.jf.intel.com/) + + Commit 6e9f05dc66f9 ("libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE") + + ...updated MAX_STRUCT_PAGE_SIZE to account for sizeof(struct page) + potentially doubling in the case of CONFIG_KMSAN=y. Unfortunately, this + doubles the amount of capacity stolen from user-addressable capacity for + everyone, regardless of whether they are using the debug option.
Revert + that change, mandate that MAX_STRUCT_PAGE_SIZE never exceed 64, but + allow debug scenarios to proceed with creating debug-sized page maps + via a compile option. + +* [v2: mm/madvise: add vmstat statistics for madvise_[cold|pageout]](http://lore.kernel.org/linux-mm/20230125005457.4139289-1-minchan@kernel.org/) + + madvise LRU manipulation APIs need to scan address ranges to find + present pages in the page table and provide advice hints for them. + + Like the pg[scan/steal] counts in vmstat, madvise_pg[scanned/hinted] + show the proactive reclaim efficiency, so this patch adds those + two statistics to vmstat. + +* [v1: Revert "mm: kmemleak: alloc gray object for reserved region with direct map"](http://lore.kernel.org/linux-mm/20230124230254.295589-1-isaacmanjarres@google.com/) + + Kmemleak operates by periodically scanning memory regions for pointers + to allocated memory blocks to determine if they are leaked or not. + However, reserved memory regions can be used for DMA transactions + between a device and a CPU, and thus, wouldn't contain pointers to + allocated memory blocks, making them inappropriate for kmemleak to + scan. Thus, revert this commit. + +* [v2: mm: kasan: reset page tags properly with sampling](http://lore.kernel.org/linux-mm/5dbd866714b4839069e2d8469ac45b60953db290.1674592780.git.andreyknvl@google.com/) + + The implementation of page_alloc poisoning sampling assumed that + tag_clear_highpage() resets page tags for __GFP_ZEROTAGS allocations. + However, this is no longer the case since commit 70c248aca9e7 + ("mm: kasan: Skip unpoisoning of user pages"). + + This leads to kernel crashes when MTE-enabled userspace mappings are + used with Hardware Tag-Based KASAN enabled.
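Both kunmap_flush_on_unmap() items in this section come down to one operation. An address that may point into the middle of a page must be rounded down to the page start before it is handed to a routine that expects a page-aligned address. Below is a minimal sketch, assuming 4 KiB pages. `flush_page_at()` and `demo_kunmap_local()` are hypothetical stand-ins, not the kernel's highmem functions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 4 KiB pages for illustration. */
#define PG_SIZE ((uintptr_t)4096)
#define PG_MASK (~(PG_SIZE - 1))

/* Round an address inside a page down to the start of that page. */
static uintptr_t page_align_down(uintptr_t addr)
{
	return addr & PG_MASK;
}

/* Hypothetical flush hook that, like PA-RISC's kunmap_flush_on_unmap(),
 * requires a page-aligned argument; returns the page it would flush. */
static uintptr_t flush_page_at(uintptr_t page_addr)
{
	assert((page_addr & ~PG_MASK) == 0);
	return page_addr;
}

/* Unmap path: the caller may pass a pointer into the middle of the
 * mapped page, so align it down before flushing. */
static uintptr_t demo_kunmap_local(const void *addr)
{
	return flush_page_at(page_align_down((uintptr_t)addr));
}
```

Passing an unaligned address straight to `flush_page_at()` would trip its assertion. That mirrors the bug described above, where the unaligned address led to flushing the wrong range.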
+ +#### File systems + +* [v4: pipe: use __pipe_{lock,unlock} instead of spinlock](http://lore.kernel.org/linux-fsdevel/20230129060452.7380-1-zhanghongchen@loongson.cn/) + + Using a spinlock in pipe_{read,write} costs too much time; IMO, + pipe->{head,tail} can be protected by __pipe_{lock,unlock}. + On the other hand, we can use __pipe_{lock,unlock} to protect + the pipe->{head,tail} in pipe_resize_ring and + post_one_notification. + +* [v1: fscrypt: support decrypting data from large folios](http://lore.kernel.org/linux-fsdevel/20230127224202.355629-1-ebiggers@kernel.org/) + + Try to make the filesystem-level decryption functions in fs/crypto/ + aware of large folios. This includes making fscrypt_decrypt_bio() + support the case where the bio contains large folios, and making + fscrypt_decrypt_pagecache_blocks() take a folio instead of a page. + +* [v1: fsverity: support verifying data from large folios](http://lore.kernel.org/linux-fsdevel/20230127221529.299560-1-ebiggers@kernel.org/) + + Try to make fs/verity/verify.c aware of large folios. This includes + making fsverity_verify_bio() support the case where the bio contains + large folios, and adding a function fsverity_verify_folio(), which is the + equivalent of fsverity_verify_page(). + +* [v1: multiblock allocator improvements](http://lore.kernel.org/linux-fsdevel/cover.1674822311.git.ojaswin@linux.ibm.com/) + + This patchset intends to improve some of the shortcomings of the mb allocator + that we had noticed while running various tests and workloads on a + POWERPC machine with 64k block size. + +* [v1: Convert most of ext4 to folios](http://lore.kernel.org/linux-fsdevel/20230126202415.1682629-1-willy@infradead.org/) + + This, on top of a number of patches currently in next and a few patches + sent to the mailing lists earlier today, converts most of ext4 to use + folios instead of pages. It does not add support for large folios. + It does not convert mballoc to use folios.
write_begin() and write_end() + still take a page parameter instead of a folio. + +* [v1: fs: gracefully handle ->get_block not mapping bh in __mpage_writepage](http://lore.kernel.org/linux-fsdevel/20230126085155.26395-1-jack@suse.cz/) + + When a filesystem's ->get_block function does not map the buffer head when + called from __mpage_writepage(), the function will happily go and pass + a bogus bdev and block number to bio allocation routines, which leads to + crashes sooner or later. E.g. UDF can do this because it doesn't want to + allocate blocks from ->writepages callbacks. + +* [v1: proc: Add allowlist for procfs files](http://lore.kernel.org/linux-fsdevel/cover.1674660533.git.legion@kernel.org/) + + The patch expands the subset= option. If proc is mounted with the + subset=allowlist option, the /proc/allowlist file will appear. This file + contains the filenames and directories that are allowed for this + mountpoint. By default, /proc/allowlist contains only its own name. + +* [v2: udf: Unify aops](http://lore.kernel.org/linux-fsdevel/20230125093914.24627-1-jack@suse.cz/) + + This patch series makes UDF use the same address_space_operations for both + normal and in-ICB files, as switching aops on live files is prone to races as + spotted by syzbot. When already dealing with this code, switch the readpage, + writepage, and in-ICB expanding functions from using kmap_atomic() to use + kmap_local_page(). + +#### Network devices + +* [v1: net-next: netlink: provide an ability to set default extack message](http://lore.kernel.org/netdev/d4843760219f20367c27472f084bd8aa729cf321.1674995155.git.leon@kernel.org/) + + In a common netdev pattern, the extack pointer is forwarded to the drivers + to be filled with an error message. However, the caller can easily + overwrite the filled message. + + Instead of adding multiple "if (!extack->_msg)" checks before any + NL_SET_ERR_MSG() call that follows the call to the driver, let's + add a new macro to common code.
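The extack item above describes setting a default message only when a driver has not already provided one. The sketch below models that ordering outside the kernel. It is simplified and hypothetical: struct ext_ack, SET_ERR_MSG, and SET_ERR_MSG_WEAK are illustrative names, not the kernel's netlink API, and the upstream macro may be named differently.

```c
#include <stddef.h>

/* Simplified, hypothetical stand-in for struct netlink_ext_ack. */
struct ext_ack {
    const char *_msg;
};

/* Unconditionally set the message (later calls overwrite earlier ones). */
#define SET_ERR_MSG(extack, msg) do {        \
        struct ext_ack *ea_ = (extack);      \
        if (ea_)                             \
            ea_->_msg = (msg);               \
    } while (0)

/* "Weak" variant: only fill in a default if nothing was set yet. */
#define SET_ERR_MSG_WEAK(extack, msg) do {   \
        struct ext_ack *ea_ = (extack);      \
        if (ea_ && !ea_->_msg)               \
            ea_->_msg = (msg);               \
    } while (0)

static const char DRIVER_MSG[] = "driver: unsupported MTU";
static const char CORE_MSG[] = "failed to configure device";

/* Hypothetical driver callback that fails with a precise message. */
static int driver_op(struct ext_ack *extack)
{
    SET_ERR_MSG(extack, DRIVER_MSG);
    return -1;
}

/* Hypothetical core code: add a generic message only as a fallback,
 * so the driver's more specific message is not overwritten. */
static int core_op(struct ext_ack *extack)
{
    int err = driver_op(extack);

    if (err)
        SET_ERR_MSG_WEAK(extack, CORE_MSG);
    return err;
}
```

Because the weak variant checks `_msg` first, the core code can call it unconditionally after any driver call. This replaces the scattered `if (!extack->_msg)` checks that the patch wants to avoid.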
+ +* [v6: net-next: net/sched: cls_api: Support hardware miss to tc action](http://lore.kernel.org/netdev/20230129101613.17201-1-paulb@nvidia.com/) + + This series adds support for hardware miss to instruct tc to continue execution + in a specific tc action instance on a filter's action list. The mlx5 driver patch + (besides the refactors) shows its usage instead of using just chain restore. + + Currently a filter's action list must be executed all together or + not at all, as drivers are only able to tell tc to continue executing from a + specific tc chain, and not a specific filter/action. + +* [v1: vhost-scsi: convert sysfs snprintf and sprintf to sysfs_emit](http://lore.kernel.org/netdev/20230129091145.2837-1-liubo03@inspur.com/) + + Follow the advice of Documentation/filesystems/sysfs.rst: + show() should only use sysfs_emit() or sysfs_emit_at() + when formatting the value to be returned to user space. + +* [v1: net-next: net/tls: tls_is_tx_ready() checked list_entry](http://lore.kernel.org/netdev/20230128-list-entry-null-check-tls-v1-1-525bbfe6f0d0@diag.uniroma1.it/) + + tls_is_tx_ready() checks that list_first_entry() does not return NULL. + This condition can never happen. For empty lists, list_first_entry() + returns the list_entry() of the head, which is a type confusion. + Use list_first_entry_or_null(), which returns NULL in case of empty lists. + +* [v1: can: etas_es58x: do not send disable channel command if device is unplugged](http://lore.kernel.org/netdev/20230128133815.1796221-1-mailhol.vincent@wanadoo.fr/) + + When turning the network interface down, es58x_stop() is called and + will send a command to the ES58x device to disable the channel + cf. es58x_ops::disable_channel().
+ + However, if the device gets unplugged while the network interface is + still up, es58x_ops::disable_channel() will obviously fail to send the + URB command and the driver emits the error message below: + + es58x_submit_urb: USB send urb failure: -ENODEV + + Check the USB device state before sending the disable channel command + in order to silence the above error message. + +* [v1: net: ethernet: mtk_eth_soc: disable hardware DSA untagging for second MAC](http://lore.kernel.org/netdev/20230128094232.2451947-1-arinc.unal@arinc9.com/) + + According to my tests on MT7621AT and MT7623NI SoCs, hardware DSA untagging + won't work on the second MAC. Therefore, disable this feature when the + second MAC of the MT7621 and MT7623 SoCs is being used. + +* [v1: net-next: sh: checksum: add missing linux/uaccess.h include](http://lore.kernel.org/netdev/20230128073108.1603095-1-kuba@kernel.org/) + + SuperH does not include uaccess.h, even though it calls access_ok(). + +* [v1: net-next: net: phy: motorcomm: change the phy id of yt8521 and yt8531s to lowercase](http://lore.kernel.org/netdev/20230128063558.5850-2-Frank.Sae@motor-comm.com/) + + The phy id is usually defined in lower case. + +* [v1: vhost/vdpa: Add MSI translation tables to iommu for software-managed MSI](http://lore.kernel.org/netdev/20230128031740.166743-1-sunnanyong@huawei.com/) + + Once the IOMMU domain is enabled for a device, the MSI + translation tables have to be there for software-managed MSI. + Otherwise, a platform with software-managed MSI and without an + irq bypass function cannot get a correct memory write event + from PCIe and will not get IRQs. + The solution is to obtain the MSI phy base address from + the IOMMU reserved region and set it as the IOMMU MSI cookie; + the translation tables will then be created when the IRQ is requested.
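The net/tls tls_is_tx_ready() item above depends on a property of kernel-style circular lists. The head is itself a list node, so calling list_first_entry() on an empty list returns container_of() of the head instead of NULL. The standalone sketch below re-implements simplified versions of the <linux/list.h> macros to show the difference. These are not the kernel's exact definitions, and struct rec is a hypothetical entry type.

```c
#include <stddef.h>

/* Simplified circular doubly linked list modeled on <linux/list.h>. */
struct list_head {
    struct list_head *next, *prev;
};

#define LIST_HEAD_INIT(name) { &(name), &(name) }

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

#define list_entry(ptr, type, member) container_of(ptr, type, member)

/* Never NULL: on an empty list, head->next == head, so this yields
 * container_of() of the head itself (the "type confusion"). */
#define list_first_entry(head, type, member) \
    list_entry((head)->next, type, member)

static int list_empty(const struct list_head *head)
{
    return head->next == head;
}

/* Safe variant: NULL when the list is empty. */
#define list_first_entry_or_null(head, type, member) \
    (list_empty(head) ? NULL : list_first_entry(head, type, member))

static void list_add_tail(struct list_head *item, struct list_head *head)
{
    item->prev = head->prev;
    item->next = head;
    head->prev->next = item;
    head->prev = item;
}

/* Hypothetical entry type standing in for a TLS record. */
struct rec {
    int id;
    struct list_head list;
};
```

In this model, a NULL check on the result of list_first_entry() can never fire. The empty case has to be detected with list_empty() or by using the _or_null variant, which is what the patch switches to.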
+ +* [v1: ixgbe: Panic during XDP_TX with > 64 CPUs](http://lore.kernel.org/netdev/20230128011213.150171-1-jjh@daedalian.us/) + + In commit 'ixgbe: let the xdpdrv work with more than 64 cpus' + (4fe815850bdc8d4cc94e06fe1de069424a895826), support was added to allow + XDP programs to run on systems with more than 64 CPUs by locking the + XDP TX rings and indexing them using cpu % 64 (IXGBE_MAX_XDP_QS). + + Upon trying out this patch via the Intel 5.18.6 out-of-tree driver + on a system with more than 64 cores, the kernel panicked with an + array-index-out-of-bounds at the return in ixgbe_determine_xdp_ring in + ixgbe.h, which means ixgbe_determine_xdp_q_idx was just returning the + cpu instead of cpu % IXGBE_MAX_XDP_QS. + +* [v1: Bluetooth: hci_conn: Refactor hci_bind_bis() since it always succeeds](http://lore.kernel.org/netdev/20230128005150.never.909-kees@kernel.org/) + + The compiler thinks "conn" might be NULL after a call to hci_bind_bis(), + which cannot happen. Avoid any confusion by just making it not return a + value since it cannot fail. Fixes the warnings seen with GCC 13: + + In function 'arch_atomic_dec_and_test', + inlined from 'atomic_dec_and_test' at ../include/linux/atomic/atomic-instrumented.h:576:9, + inlined from 'hci_conn_drop' at ../include/net/bluetooth/hci_core.h:1391:6, + inlined from 'hci_connect_bis' at ../net/bluetooth/hci_conn.c:2124:3: + ../arch/x86/include/asm/rmwcc.h:37:9: warning: array subscript 0 is outside array bounds of 'atomic_t[0]' [-Warray-bounds=] + 37 | asm volatile (fullop CC_SET(cc) \ + | ^ + + ... + In function 'hci_connect_bis': + cc1: note: source object is likely at address zero + +* [v1: net: ethernet: mtk_eth_soc: Avoid truncating allocation](http://lore.kernel.org/netdev/20230127223853.never.014-kees@kernel.org/) + + There doesn't appear to be a reason to truncate the allocation used for + flow_info, so do a full allocation and remove the unused empty struct.
+ GCC does not like having a reference to an object that has been + partially allocated, as bounds checking may become impossible when + such an object is passed to other code. + +* [v1: net: dsa: microchip: ptp: add one more PTP dependency](http://lore.kernel.org/netdev/20230127221323.2522421-1-arnd@kernel.org/) + + When only NET_DSA_MICROCHIP_KSZ8863_SMI is built-in but + PTP is a loadable module, the ksz_ptp support still causes + a link failure: + + ld.lld-16: error: undefined symbol: ptp_clock_index + >>> referenced by ksz_ptp.c + >>> drivers/net/dsa/microchip/ksz_ptp.o:(ksz_get_ts_info) in archive vmlinux.a + + Add the same dependency here that exists with the KSZ9477_I2C + and KSZ_SPI drivers. + +* [v2: net-next: ibmvnic: Toggle between queue types in affinity mapping](http://lore.kernel.org/netdev/20230127214358.318152-1-nnac123@linux.ibm.com/) + + Previously, ibmvnic IRQs were assigned to CPU numbers by assigning all + the IRQs for transmit queues and then assigning all the IRQs for receive + queues. With multi-threaded processors, in a heavy RX or TX environment, + physical cores would either be overloaded or underutilized (due to the + IRQ assignment algorithm). This approach is sub-optimal because IRQs for + the same subprocess (RX or TX) would be bound to adjacent CPU numbers, + meaning they were more likely to be contending for the same core. + +* [v3: net-next: net/sched: transition act_pedit to rcu and percpu stats](http://lore.kernel.org/netdev/20230127192752.3643015-1-pctammela@mojatatu.com/) + + The software pedit action didn't get the same love as some of the + other actions, and it's still using spinlocks and shared stats. + Transition the action to RCU and percpu stats, which improves the + action's performance dramatically. + +* [v9: bpf-next: Add skb + xdp dynptrs](http://lore.kernel.org/netdev/20230127191703.3864860-1-joannelkoong@gmail.com/) + + This patchset is the 2nd in the dynptr series. The 1st can be found here [0].
+ + When comparing the differences in runtime for packet parsing without dynptrs + vs. with dynptrs, there is no noticeable difference. Patch 5 contains more + details as well as examples of how to use skb and xdp dynptrs. + +* [v1: net-next: gve: Introduce a way to disable queue formats](http://lore.kernel.org/netdev/20230127190744.3721063-1-jeroendb@google.com/) + + The device is capable of simultaneously supporting multiple + queue formats. With this change, the driver can deliberately pick a queue format. + +* [v5: net-next: Allow offloading of UDP NEW connections via act_ct](http://lore.kernel.org/netdev/20230127183845.597861-1-vladbu@nvidia.com/) + + Currently only bidirectional established connections can be offloaded + via act_ct. This approach allows a lot of assumptions to be hardcoded into + the act_ct, flow_table and flow_offload intermediate layer code. + +* [v1: selftests: net: udpgso_bench_tx: Introduce exponential back-off retries](http://lore.kernel.org/netdev/20230127181625.286546-1-andrei.gherzan@canonical.com/) + + The tx and rx test programs are used in a couple of test scripts including + "udpgro_bench.sh". Taking this as an example, when the rx/tx programs + are invoked in succession, there is a chance that the rx one is not ready to + accept socket connections. + +* [v3: Introduce STM32 system bus](http://lore.kernel.org/netdev/20230127164040.1047583-1-gatien.chevallier@foss.st.com/) + + Document the STM32 System Bus. This bus is intended to control firewall + access for the peripherals connected to it. + + For every peripheral, the bus checks the firewall registers to see + if the peripheral is configured as non-secure. If the peripheral + is configured as secure, the node is marked populated, so the + device won't be probed. + +* [v15: net-next: vmxnet3: Add XDP support.](http://lore.kernel.org/netdev/20230127163027.60672-1-u9012063@gmail.com/) + + The patch adds native-mode XDP support: XDP DROP, PASS, TX, and REDIRECT.
+ + Background: + The vmxnet3 rx consists of three rings: ring0, ring1, and dataring. + For r0 and r1, buffers at r0 are allocated using alloc_skb APIs and DMA + mapped to the ring's descriptor. If LRO is enabled and the packet size is larger + than 3K, VMXNET3_MAX_SKB_BUF_SIZE, then r1 is used to map the rest of + the buffer larger than VMXNET3_MAX_SKB_BUF_SIZE. + +* [RFC v1: bpf-next: selftests/bpf: xdp_hw_metadata clear metadata when -EOPNOTSUPP](http://lore.kernel.org/netdev/167482734243.892262.18210955230092032606.stgit@firesoul/) + + The AF_XDP userspace part of xdp_hw_metadata sees non-zero as a signal of + the availability of rx_timestamp and rx_hash in the data_meta area. The + kernel-side BPF-prog code doesn't initialize these members when the kernel + returns an error, e.g. -EOPNOTSUPP. This memory area is not guaranteed to + be zeroed, and can contain garbage/previous values, which will be read + and interpreted by the AF_XDP userspace side. + +* [v1: net-next: Adding Sparx5 ES2 VCAP support](http://lore.kernel.org/netdev/20230127130830.1481526-1-steen.hegelund@microchip.com/) + + This provides the Egress Stage 2 (ES2) VCAP (Versatile Content-Aware + Processor) support for the Sparx5 platform. + + The ES2 VCAP is an Egress Access Control VCAP that uses frame keyfields and + previously classified keyfields to apply e.g. policing, trapping or + mirroring to frames. + +* [v2: net: ixgbe: allow to increase MTU to some extent with XDP enabled](http://lore.kernel.org/netdev/20230127122018.2839-1-kerneljasonxing@gmail.com/) + + I encountered one case where I cannot increase the MTU size directly + from 1500 to 2000 with XDP enabled if the server is equipped with + an IXGBE card, which happened on thousands of servers in a production environment.
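The ixgbe XDP_TX panic reported earlier in this section comes down to ring indexing. With more than 64 CPUs, several CPUs must share one XDP TX ring, so the ring index has to be reduced into range. The following is a minimal sketch that uses the IXGBE_MAX_XDP_QS value quoted in the report. xdp_q_idx() and xdp_q_idx_buggy() are hypothetical helpers, not the driver's code.

```c
/* Limit quoted in the report: at most 64 XDP TX queues. */
#define IXGBE_MAX_XDP_QS 64

/* Hypothetical helper: fold any CPU number into a valid ring index. With
 * more than IXGBE_MAX_XDP_QS CPUs, several CPUs share a ring (hence the
 * ring locking mentioned in the report). */
static unsigned int xdp_q_idx(unsigned int cpu)
{
    return cpu % IXGBE_MAX_XDP_QS;
}

/* What the report observed: the CPU number returned unchanged, which is
 * out of bounds for a 64-entry ring array once cpu >= 64. */
static unsigned int xdp_q_idx_buggy(unsigned int cpu)
{
    return cpu;
}
```

A lookup such as `rings[xdp_q_idx(cpu)]` always stays in bounds. In contrast, `rings[xdp_q_idx_buggy(cpu)]` reproduces the reported array-index-out-of-bounds error on CPU 64 and above. The per-ring locking that sharing requires is omitted here.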
+ +* [v1: pull request for net-next: batman-adv 2023-01-27](http://lore.kernel.org/netdev/20230127102133.700173-1-sw@simonwunderlich.de/) + + The following changes since commit 88603b6dc419445847923fcb7fe5080067a30f98: + + Linux 6.2-rc2 (2023-01-01 13:53:16 -0800) + + are available in the Git repository at: + + git://git.open-mesh.org/linux-merge.git tags/batadv-next-pullrequest-20230127 + + for you to fetch changes up to 0c4061c0d0e2c381ffe4d8b7c62ea69ad8132071: + + batman-adv: tvlv: prepare for tvlv enabled multicast packet type (2023-01-21 19:01:59 +0100) + +* [v1: page_pool: add a comment explaining the fragment counter usage](http://lore.kernel.org/netdev/20230127101627.891614-1-ilias.apalodimas@linaro.org/) + + When reading the page_pool code, the first impression is that keeping + two separate counters, one being the page refcnt and the other being + the fragment counter pp_frag_count, is counter-intuitive. + + However, without that fragment counter we don't know when to reliably + destroy or sync the outstanding DMA mappings. So let's add a comment + explaining this part. + +* [v1: net-next: net: netlink: recommend policy range validation](http://lore.kernel.org/netdev/20230127084506.09f280619d64.I5dece85f06efa8ab0f474ca77df9e26d3553d4ab@changeid/) + + For large ranges (outside of s16), the documentation currently + recommends open-coding the validation, but it's better to use + the NLA_POLICY_FULL_RANGE() or NLA_POLICY_FULL_RANGE_SIGNED() + policy validation instead; recommend that. + +* [v1: net-next: net: bcmgenet: Add a check for oversized packets](http://lore.kernel.org/netdev/20230127000819.3934-1-f.fainelli@gmail.com/) + + Occasionally we may get oversized packets from the hardware which + exceed the nominal 2KiB buffer size we allocate SKBs with. Add an early + check which drops the packet to avoid invoking skb_over_panic() and move + on to processing the next packet.
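The bcmgenet item above adds an early length check so that an oversized frame is dropped and the receive loop continues, instead of overflowing a 2 KiB buffer. The following is a minimal sketch of that control flow. RX_BUF_LENGTH, struct rx_stats, and rx_poll() are hypothetical names, and descriptor lengths are modeled as a plain array.

```c
#include <stddef.h>

/* Nominal per-packet buffer size quoted in the bcmgenet item (2 KiB). */
#define RX_BUF_LENGTH 2048

struct rx_stats {
    unsigned long packets;
    unsigned long dropped_oversize;
};

/* Hypothetical receive loop: lens[] holds the lengths reported by the
 * hardware descriptors. Instead of copying an oversized frame into a
 * 2 KiB buffer (which in the kernel would end in skb_over_panic()),
 * count it as dropped and continue with the next descriptor. */
static void rx_poll(const size_t *lens, size_t n, struct rx_stats *st)
{
    for (size_t i = 0; i < n; i++) {
        if (lens[i] > RX_BUF_LENGTH) {
            st->dropped_oversize++;
            continue;
        }
        st->packets++;
    }
}
```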
+ +#### Security hardening + +* [v1: scsi: aacraid: Allocate cmd_priv with scsicmd](http://lore.kernel.org/linux-hardening/20230128000409.never.976-kees@kernel.org/) + + The aac_priv() helper assumes that the private cmd area immediately + follows struct scsi_cmnd. Allocate this space as part of scsicmd, + else there is a risk of heap overflow. + +* [v1: regulator: max77802: Bounds check regulator id against opmode](http://lore.kernel.org/linux-hardening/20230127225203.never.864-kees@kernel.org/) + + Explicitly bounds-check the id before accessing the opmode array. Seen + with GCC 13: + + ../drivers/regulator/max77802-regulator.c: In function 'max77802_enable': + ../drivers/regulator/max77802-regulator.c:217:29: warning: array subscript [0, 41] is outside array bounds of 'unsigned int[42]' [-Warray-bounds=] + 217 | if (max77802->opmode[id] == MAX77802_OFF_PWRREQ) + | ^ + + ../drivers/regulator/max77802-regulator.c:62:22: note: while referencing 'opmode' + 62 | unsigned int opmode[MAX77802_REG_MAX]; + | ^ + +* [v1: ASoC: kirkwood: Iterate over array indexes instead of using pointer math](http://lore.kernel.org/linux-hardening/20230127224128.never.410-kees@kernel.org/) + + Walking the dram->cs array was seen as accesses beyond the first array + item by the compiler. Instead, use the array index directly. This allows + for run-time bounds checking under CONFIG_UBSAN_BOUNDS as well. + +* [v1: scripts/dtc: Replace 0-length arrays with flexible arrays](http://lore.kernel.org/linux-hardening/20230127224101.never.746-kees@kernel.org/) + + Replace the 0-length array with a C99 flexible array.
Seen with GCC 13 + under -fstrict-flex-arrays: + + In file included from ../lib/fdt_ro.c:2: + ../lib/../scripts/dtc/libfdt/fdt_ro.c: In function 'fdt_get_name': + ../lib/../scripts/dtc/libfdt/fdt_ro.c:319:24: warning: 'strrchr' reading 1 or more bytes from a region of size 0 [-Wstringop-overread] + 319 | leaf = strrchr(nameptr, '/'); + | ^ + +* [v1: coda: Avoid partial allocation of sig_inputArgs](http://lore.kernel.org/linux-hardening/20230127223921.never.882-kees@kernel.org/) + + GCC does not like having a partially allocated object, since it cannot + reason about it for bounds checking when it is passed to other code. + Instead, fully allocate sig_inputArgs. + +* [v1: iommufd: Add top-level bounds check on kernel buffer size](http://lore.kernel.org/linux-hardening/20230127223816.never.413-kees@kernel.org/) + + While the op->size assignments are already bounds-checked at static + initializer time, these limits aren't aggregated and tracked when doing + later variable range checking under -Warray-bounds. Help the compiler + see that we know what we're talking about, and we'll never ask to + write more than sizeof(ucmd.cmd) bytes during the memset() inside + copy_struct_from_user(). + +* [v1: lm85: Bounds check to_sensor_dev_attr()->index usage](http://lore.kernel.org/linux-hardening/20230127223744.never.113-kees@kernel.org/) + + The index into various register arrays was not bounds-checked. Add checking. + +* [v2: ACPICA: Replace fake flexible arrays with flexible array members](http://lore.kernel.org/linux-hardening/20230127191621.gonna.262-kees@kernel.org/) + + One-element arrays (and multi-element arrays being treated as + dynamically sized) are deprecated[1] and are being replaced with + flexible array members in support of the ongoing efforts to tighten the + FORTIFY_SOURCE routines on memcpy(), correctly instrument array indexing + with UBSAN_BOUNDS, and to globally enable -fstrict-flex-arrays=3.
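Several items in this section, including dtc and ACPICA, replace zero-length or one-element trailing arrays with C99 flexible array members. The following is a minimal sketch of the two declaration styles and of allocating the flexible version. struct old_buf, struct buf, and buf_alloc() are hypothetical. buf_alloc() only imitates the overflow-checked sizing that the kernel's struct_size() helper provides.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Deprecated style: a one-element "fake" flexible array. sizeof() already
 * counts one element, and under -fstrict-flex-arrays=3 the compiler treats
 * data[1] as a real 1-element array, so data[1] and beyond look out of
 * bounds. */
struct old_buf {
    size_t count;
    int data[1];
};

/* C99 flexible array member: adds no storage to sizeof(); the element
 * count is decided at allocation time. */
struct buf {
    size_t count;
    int data[];
};

/* Allocate a struct buf with room for n zeroed elements. */
static struct buf *buf_alloc(size_t n)
{
    struct buf *b;

    /* Reject element counts whose byte size would overflow size_t. */
    if (n > (SIZE_MAX - sizeof(*b)) / sizeof(b->data[0]))
        return NULL;
    b = malloc(sizeof(*b) + n * sizeof(b->data[0]));
    if (!b)
        return NULL;
    b->count = n;
    memset(b->data, 0, n * sizeof(b->data[0]));
    return b;
}
```

With the flexible member, the allocation size is exactly the header plus n elements. Bounds-checking tools can then reason about `data[]` from the allocation alone, without guessing whether `data[1]` is meant to be a variable-length tail.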
+ +* [v1: powerpc/rtas: Replace one-element arrays with flexible arrays](http://lore.kernel.org/linux-hardening/20230127085023.271674-1-ajd@linux.ibm.com/) + + Using a one-element array as a fake flexible array is deprecated. + + Replace the one-element fake flexible arrays in rtas-types.h with C99 standard + flexible array members instead. + + This helps us move towards enabling -fstrict-flex-arrays=3 in the future. + +* [v1: x86: enable Data Operand Independent Timing Mode](http://lore.kernel.org/linux-hardening/20230125012801.362496-1-ebiggers@kernel.org/) + + According to documentation that Intel published recently [1], Intel CPUs + based on the Ice Lake and later microarchitectures don't guarantee "data + operand independent timing" by default. I.e., instruction execution + times may depend on the values of data operated on. + +* [v1: 5.10: Backport oops_limit to 5.10](http://lore.kernel.org/linux-hardening/20230124193004.206841-1-ebiggers@kernel.org/) + + This series backports the patchset + "exit: Put an upper limit on how often we can oops" + (https://lore.kernel.org/linux-mm/20221117233838.give.484-kees@kernel.org/T/#u) + to 5.10, as recommended at + https://googleprojectzero.blogspot.com/2023/01/exploiting-null-dereferences-in-linux.html + +* [v1: 5.15: Backport oops_limit to 5.15](http://lore.kernel.org/linux-hardening/20230124185110.143857-1-ebiggers@kernel.org/) + + This series backports the patchset + "exit: Put an upper limit on how often we can oops" + (https://lore.kernel.org/linux-mm/20221117233838.give.484-kees@kernel.org/T/#u) + to 5.15, as recommended at + https://googleprojectzero.blogspot.com/2023/01/exploiting-null-dereferences-in-linux.html + +#### Asynchronous I/O + +* [v1: liburing: liburing: patches for drain bug](http://lore.kernel.org/io-uring/20230127111133.2551653-1-dylany@meta.com/) + + Two patches for the drain bug I just sent a patch for. Patch 1 definitely + fails, but I am sending patch 2 just in case, as it exercises some more code paths.
+ +* [v1: io_uring: always prep_async for drain requests](http://lore.kernel.org/io-uring/20230127105911.2420061-1-dylany@meta.com/) + + Drain requests all go through io_drain_req, which has a quick exit in case + there is nothing pending (i.e. the drain is not useful). In that case it can + issue the request immediately. + + However, for safety, it queues it through task work. + The problem is that in this case the request is run asynchronously, but the async work has not been prepared through io_req_prep_async. + +* [v1: io_uring: handle TIF_NOTIFY_RESUME when checking for task_work](http://lore.kernel.org/io-uring/be6fa09b-8a27-412b-52af-1cd3bc896ad4@kernel.dk/) + + If TIF_NOTIFY_RESUME is set, then we need to call resume_user_mode_work() + for PF_IO_WORKER threads. They never return to usermode, hence never get + a chance to process any items that are marked by this flag. Most notably, + this includes the final put of files, but also any throttling markers set by block cgroups. + +* [v1: io_uring: initialize count variable to 0](http://lore.kernel.org/io-uring/20230124125805.630359-1-trix@redhat.com/) + + The clang build fails with + io_uring/io_uring.c:1240:3: error: variable 'count' is uninitialized + when used here [-Werror,-Wuninitialized] + count += handle_tw_list(node, &ctx, &uring_locked, &fake); + ^ + + The commit listed in the Fixes: tag removed the initialization of count. + +* [v1: liburing: deferred tw msg_ring tests](http://lore.kernel.org/io-uring/cover.1674523156.git.asml.silence@gmail.com/) + + Add a regression test for a recent null deref regression with + disabled deferred ring and cover a couple more deferred tw cases.
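The clang failure in the "initialize count variable to 0" item above is the classic uninitialized-accumulator bug: `count += ...` reads a value that was never set. The following is a minimal sketch of the corrected pattern. struct tw_node, handle_batch(), and run_all() are hypothetical stand-ins, not io_uring's task_work code.

```c
#include <stddef.h>

/* Hypothetical stand-in for a task_work list node. */
struct tw_node {
    struct tw_node *next;
};

/* Run up to max items starting at *pos, advance *pos past them, and
 * return how many were run. */
static unsigned int handle_batch(struct tw_node **pos, unsigned int max)
{
    unsigned int n = 0;

    while (*pos && n < max) {
        *pos = (*pos)->next;   /* "running" an item just skips it here */
        n++;
    }
    return n;
}

/* Drain the whole list in batches and return the total item count. */
static unsigned int run_all(struct tw_node *list, unsigned int batch)
{
    unsigned int count = 0;   /* the initialization the fix restores */

    if (batch == 0)
        batch = 1;
    while (list)
        count += handle_batch(&list, batch);
    return count;
}
```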
+ +* [v1: for-next: normal tw optimisation + refactoring](http://lore.kernel.org/io-uring/cover.1674484266.git.asml.silence@gmail.com/) + + 1-5 are random refactoring patches + 6 is a prep patch, which also helps to inline handle_tw_list + 7 returns a link tw run optimisation for normal tw + +* [v2: io_uring/net: cache provided buffer group value for multishot receives](http://lore.kernel.org/io-uring/f1a1ba93-1adf-63fa-6f0f-f3182f165841@kernel.dk/) + + If we're using ring provided buffers with multishot receive, and we end + up doing an io-wq based issue at some point that also needs to select + a buffer, we'll lose the initially assigned buffer group, as + io_ring_buffer_select() correctly clears the buffer group list because the + issue isn't serialized by the ctx uring_lock. This is fine for normal + receives, as the request puts the buffer and finishes, but for multishot, + we will re-arm and do further receives. + +#### Rust For Linux + +* [v2: rust: MAINTAINERS: Add the zulip link](http://lore.kernel.org/rust-for-linux/20230128072258.3384037-1-boqun.feng@gmail.com/) + + The Zulip organization "rust-for-linux" was created about 2 years + ago[1] and has proven to be a great place for Rust-related discussion, + so add the information to the MAINTAINERS file so that newcomers have + more options to find guidance and help. + + [1]: https://lore.kernel.org/rust-for-linux/CANiq72=xVaMQkgCA9rspjV8bhWDGqAn4x78B0_4U1WBJYj1PiA@mail.gmail.com/ + +* [v1: Rust enablement for AArch64](http://lore.kernel.org/rust-for-linux/20230125163739.3798252-1-Jamie.Cunliffe@arm.com/) + + The first patch is from Miguel's tree to enable Rust support for + AArch64. This has been tested with the Rust samples, and the generated code has also been manually inspected.
+ +* [v1: x86/insn_decoder_test: allow longer symbol-names](http://lore.kernel.org/rust-for-linux/320c4dba-9919-404b-8a26-a8af16be1845@app.fastmail.com/) + + Increase the allowed line-length of the insn-decoder-test to 4k to allow + for symbol-names longer than 256 characters. + + The insn-decoder-test takes objdump output as input, which may contain + symbol-names as instruction arguments. + +#### BPF + +* [v1: bpf-next: bpf: Build-time assert that cpumask offset is zero](http://lore.kernel.org/bpf/20230128141537.100777-1-void@manifault.com/) + + The first element of a struct bpf_cpumask is a cpumask_t. This is done + to allow struct bpf_cpumask to be cast to a struct cpumask. If this + element were ever moved to another field, any BPF program passing a + struct bpf_cpumask * to a kfunc expecting a const struct cpumask * would + immediately fail to load. Add a build-time assertion so this + assumption is captured and verified. + +* [v4: bpf-next: xdp: introduce xdp-feature support](http://lore.kernel.org/bpf/cover.1674913191.git.lorenzo@kernel.org/) + + Introduce the capability to export the XDP features supported by the NIC. + Introduce an XDP compliance test tool (xdp_features) to check that the features + exported by the NIC match the real features supported by the driver. + Allow XDP_REDIRECT of non-linear XDP frames into a devmap. + Export XDP features for each XDP-capable driver. + Extend the libbpf netlink implementation to support the netlink_generic protocol. + +* [v1: bpf-next: selftest/bpf: Make crashes more debuggable in test_progs](http://lore.kernel.org/bpf/20230127215705.1254316-1-sdf@google.com/) + + Reset stdio before printing the verbose log of the SIGSEGV'ed test. + Otherwise, it's hard to understand what's going on in cases like [0].
+ + 0: https://github.com/kernel-patches/bpf/actions/runs/4019879316/jobs/6907358876 + +* [v1: bpf-next: Add support for tracing programs in BPF_PROG_RUN](http://lore.kernel.org/bpf/20230127214353.628551-1-grantseltzer@gmail.com/) + + This patch changes how BPF_PROG_RUN treats tracing + (fentry/fexit) programs. Previously, only a return value was injected + and the actual program was not run. The new behavior mirrors that of + running raw tracepoint BPF programs, which actually runs the + instructions of the program via `bpf_prog_run()`. + +* [v1: bpf-next: New benchmark for hashmap lookups](http://lore.kernel.org/bpf/20230127181457.21389-1-aspsk@isovalent.com/) + + Add a new benchmark for hashmap lookups and fix several typos. See individual + commits for descriptions. + + One thing to mention here is that in commit 3 I've patched bench so that now + command line options can be reused by different benchmarks. + +* [v2: bpf-next: selftests/bpf: Properly enable hwtstamp in xdp_hw_metadata](http://lore.kernel.org/bpf/20230126225030.510629-1-sdf@google.com/) + + The existing timestamping_enable() is a no-op because it applies + to the socket-related path that we are not verifying here anymore. + +* [v1: perf lock contention: Add -S/--callstack-filter option](http://lore.kernel.org/bpf/20230126000936.3017683-1-namhyung@kernel.org/) + + The -S/--callstack-filter option limits the displayed entries to those having the given + string in the callstack (not only in the caller in the output). + + The following example shows lock contention results if the callstack + has the 'net' substring somewhere. + +* [v3: bpf-next: Enable bpf_setsockopt() on ktls enabled sockets.](http://lore.kernel.org/bpf/20230125201608.908230-1-kuifeng@meta.com/) + + This patchset implements a change to bpf_setsockopt() which allows + ktls-enabled sockets to be used with the SOL_TCP level.
This is + necessary as when ktls is enabled, it changes the function pointer of + setsockopt of the socket, which bpf_setsockopt() checks in order to + make sure that the socket is a TCP socket. Checking sk_protocol + instead of the function pointer will ensure that bpf_setsockopt() with + the SOL_TCP level still works on sockets with ktls enabled. + +* [v4: bpf-next: Enable struct_ops programs to be sleepable](http://lore.kernel.org/bpf/20230125164735.785732-1-void@manifault.com/) + + This is part 4 of https://lore.kernel.org/bpf/20230123232228.646563-1-void@manifault.com/ + + Part 3: https://lore.kernel.org/all/20230125050359.339273-1-void@manifault.com/ + Part 2: https://lore.kernel.org/all/20230124160802.1122124-1-void@manifault.com/ + +* [v2: net: xdp: execute xdp_do_flush() before napi_complete_done()](http://lore.kernel.org/bpf/20230125074901.2737-1-magnus.karlsson@gmail.com/) + + Make sure that xdp_do_flush() is always executed before + napi_complete_done(). This is important for two reasons. First, a + redirect to an XSKMAP assumes that a call to xdp_do_redirect() from + napi context X on CPU Y will be followed by an xdp_do_flush() from the + same napi context and CPU. This is not guaranteed if the + napi_complete_done() is executed before xdp_do_flush(), as it tells + the napi logic that it is fine to schedule napi context X on another CPU. + +* [v1: bpf-next: bpftool: disable bpfilter kernel config checks](http://lore.kernel.org/bpf/20230125025516.5603-1-chethan.suresh@sony.com/) + + We've experienced issues with bpfilter similar to the ones below: + https://github.com/moby/moby/issues/43755 + https://lore.kernel.org/bpf/CAADnVQJ5MxGkq=ng214aYoH-NmZ1gjoS=ZTY1eU-Fag4RwZjdg@mail.gmail.com/ + + Considering the current development status of bpfilter, + disable bpfilter kernel config checks in bpftool feature.
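The bpf_cpumask item at the start of this section relies on struct cpumask being the first member, so that a `struct bpf_cpumask *` can be cast to a `struct cpumask *`. The following is a C11 sketch of a build-time check for that layout. Both structs are simplified, hypothetical stand-ins. The kernel provides BUILD_BUG_ON() and static_assert() for such checks, and the patch's exact form may differ.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */

/* Simplified stand-ins; the real types live in the kernel. */
struct cpumask {
    unsigned long bits[4];
};

struct bpf_cpumask {
    struct cpumask cpumask;  /* must stay the first member */
    int usage;               /* hypothetical reference count field */
};

/* Build-time check: if cpumask is ever moved away from offset 0,
 * compilation fails instead of BPF programs failing at load time. */
static_assert(offsetof(struct bpf_cpumask, cpumask) == 0,
              "cpumask must be the first member of struct bpf_cpumask");

/* Because the offset is zero, this cast yields a valid pointer to the
 * embedded cpumask. */
static const struct cpumask *as_cpumask(const struct bpf_cpumask *m)
{
    return (const struct cpumask *)m;
}
```

The assertion turns a silent layout assumption into a compile error. This is cheaper to diagnose than a verifier rejection at program load time.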
+ +* [v1: tracing: Have bpf and perf reuse the tracefs TRACE_EVENT macros](http://lore.kernel.org/bpf/20230124202238.563854686@goodmis.org/) + + When reviewing Linyu Yuan's patches[1], where the change was to move most + of the macros from perf and bpf into stages, I realized that the macros + that make up perf and bpf events are duplicated from the tracefs + macros that were moved into the stages directory. One reason to move + them into that directory was to remove duplicate code. + +### Related technology news + +#### Qemu + +* [v8: riscv: Allow user to set the satp mode](http://lore.kernel.org/qemu-devel/20230125162010.1615787-1-alexghiti@rivosinc.com/) + + This introduces new properties to allow the user to set the satp mode; + see patch 3 for the full syntax. In addition, it prevents CPUs from booting in a + satp mode they do not support (see patch 4). + +* [v1: hw/riscv: boot: Don't use CSRs if they are disabled](http://lore.kernel.org/qemu-devel/20230123035754.75553-1-alistair.francis@opensource.wdc.com/) + + If the CSRs and CSR instructions are disabled because the Zicsr + extension isn't enabled, then we want to make sure we don't run any CSR + instructions in the boot ROM. + + This patch removes the CSR instructions from the reset-vec if the + extension isn't enabled. We replace the instruction with a NOP instead. + +#### U-Boot + +* [v2: dm: Move to new driver model schema for device tree tags](http://lore.kernel.org/u-boot/20230129012652.83432-1-sjg@chromium.org/) + + Now that a new schema has been accepted upstream, press it into service in + U-Boot. + +* [v1: elf: add Elf64_Sym](http://lore.kernel.org/u-boot/20230122190453.45033-1-kalle.wachsmuth@gmail.com/) + + Required because it is used as Elf_Sym in tools/prelink-riscv.inc. I assume people have + been using an OS-supplied elf.h, but macOS doesn't have that.

  Taken from
  https://github.com/torvalds/linux/blob/v6.1/include/uapi/linux/elf.h

* [v2: net: sun8i-emac: Allwinner D1 Support](http://lore.kernel.org/u-boot/20230122225107.62464-1-samuel@sholland.org/)

  D1 is a RISC-V SoC containing an EMAC compatible with the A64 EMAC. In a
  very roundabout way, this series finishes adding support for the D1 EMAC:
  patch 4 resolves a compiler warning when building the driver for RISC-V.
  The rest of the series is just cleanup requested by Jagan.

## 20230122: Issue 30

### Kernel updates

#### RISC-V architecture support

* [v2: Upstream kvx Linux port](http://lore.kernel.org/linux-riscv/20230120141002.2442-1-ysionneau@kalray.eu/)

  This patch series adds support for the kv3-1 CPU architecture of the kvx family,
  found in Kalray's Coolidge (aka MPPA3-80) SoC.

  This is an RFC: kvx support is not yet upstream in gcc/binutils,
  so this patch series cannot be merged into Linux for now.

* [v1: Add new partial clock and reset drivers for StarFive JH7110](http://lore.kernel.org/linux-riscv/20230120024445.244345-1-xingyu.wu@starfivetech.com/)

  This patch series adds new partial clock drivers and reset
  support for the System-Top-Group (STG), Image-Signal-Process (ISP)
  and Video-Output (VOUT) blocks of the StarFive JH7110 RISC-V SoC.

* [v4: riscv: elf: add .riscv.attributes parsing](http://lore.kernel.org/linux-riscv/20230119221833.3629409-1-vineetg@rivosinc.com/)

  This implements the ELF loader hook to parse the RISC-V-specific
  .riscv.attributes section. Compilers (gcc/llvm) insert this section
  with build-related information, such as -march, organized as
  tag/value attribute pairs.

* [v2: spi: Add support for stacked/parallel memories](http://lore.kernel.org/linux-riscv/20230119185342.2093323-1-amit.kumar-mahapatra@amd.com/)

  This patch continues the discussion on
  'commit f89504300e94 ("spi: Stacked/parallel memories bindings")' about
  adding dt-binding support for stacked/parallel memories.

* [v1: riscv: uapi: Lie about having futex()](http://lore.kernel.org/linux-riscv/20230119193924.21186-1-palmer@rivosinc.com/)

  Without this, libstdc++ correctly detects the lack of a futex() syscall
  on rv32 and uses a fallback that doesn't work because it depends on
  64-bit atomics.

* [v1: KVM: Add a common API for range-based TLB invalidation](http://lore.kernel.org/linux-riscv/20230119173559.2517103-1-dmatlack@google.com/)

  This series introduces a common API for performing range-based TLB
  invalidation. The API is then used to supplant
  kvm_arch_flush_remote_tlbs_memslot() and to pave the way for two other
  patch series.

* [v4: Generic IPI sending tracepoint](http://lore.kernel.org/linux-riscv/20230119143619.2733236-1-vschneid@redhat.com/)

  Detecting IPI *reception* is relatively easy, e.g. by using
  trace_irq_handler_{entry,exit} or even just by function-tracing
  flush_smp_call_function_queue() for SMP calls.

  Figuring out their *origin* is trickier, because there is no generic tracepoint tied
  to e.g. smp_call_function():

  * As far as I am aware, x86 has no tracepoint tied to sending IPIs, only to receiving them (cf. trace_call_function{_single}_entry()).

* [v4: JH7110 PMU Support](http://lore.kernel.org/linux-riscv/20230119094447.21939-1-walker.chen@starfivetech.com/)

  This patchset adds a PMU (Power Management Unit) controller driver for the
  StarFive JH7110 SoC. To meet low-power requirements, the PMU is
  designed to include multiple PM domains that can power-gate
  selected IP blocks, saving power by reducing leakage current.

* [v3: riscv: Dump faulting instructions in oops handler](http://lore.kernel.org/linux-riscv/20230119074738.708301-1-bjorn@kernel.org/)

  RISC-V does not dump faulting instructions in the oops handler. This
  series adds "Code:" dumps to the oops output, together with
  scripts/decodecode support.

* [v1: Add RISC-V 32 NOMMU support](http://lore.kernel.org/linux-riscv/20230119052642.1112171-1-Mr.Bossman075@gmail.com/)

  This patch set aims to add NOMMU support to RV32.
  Many people want to build simple emulators or HDL
  models of RISC-V, and this patch makes it possible to run Linux on them.

* [v2: PATCH: riscv: Introduce system suspend support](http://lore.kernel.org/linux-riscv/20230118180338.6484-1-ajones@ventanamicro.com/)

  When booting with an OpenSBI that includes the RFC series[1] implementing the
  draft proposal for SBI system suspend[2], we can add system suspend support to
  Linux. This support implements "suspend-to-RAM": when a
  kernel is built with CONFIG_SUSPEND, 'echo mem > /sys/power/state'
  initiates a suspension.

* [v5: Introduce __xchg, non-atomic xchg](http://lore.kernel.org/linux-riscv/20230118153529.57695-1-andrzej.hajda@intel.com/)

  There are many places where it can be used, and I have chosen only
  some of them. If necessary, I can provide a cocci script to detect others (though not all).

* [v4: Add Ethernet driver for StarFive JH7110 SoC](http://lore.kernel.org/linux-riscv/20230118061701.30047-1-yanhong.wang@starfivetech.com/)

  This series adds Ethernet support for the StarFive JH7110 RISC-V SoC. The series
  includes the MAC driver. The MAC version is dwmac-5.20 (from Synopsys DesignWare).
  For more information and support, see the RVspace wiki[1].

* [v5: hwrng: starfive: Add driver for TRNG module](http://lore.kernel.org/linux-riscv/20230117015445.32500-1-jiajie.ho@starfivetech.com/)

  This patch series adds kernel support for the StarFive JH7110 hardware
  random number generator.
  The first 2 patches add the binding docs and the device
  driver for this module. Patch 3 adds the devicetree entry for the VisionFive 2 SoC.

* [v1: riscv: alternative: proceed one more instruction for auipc/jalr pair](http://lore.kernel.org/linux-riscv/20230115162811.3146-1-jszhang@kernel.org/)

  If we patched an auipc + jalr pair, we should advance by one more
  instruction. Andrew pointed out: "There's not a problem now, since
  we're only adding a fixup for jal, not jalr, but we should
  future-proof this and there's no reason to revisit an already fixed-up
  instruction anyway."

* [v4: riscv: improve boot time isa extensions handling](http://lore.kernel.org/linux-riscv/20230115154953.831-1-jszhang@kernel.org/)

  Generally, RISC-V ISA extensions are fixed for any specific hardware
  platform, so a hart's features won't change after booting. This
  characteristic makes it straightforward to use a static branch to check
  whether a specific ISA extension is supported, which improves performance.

#### Process scheduling

* [v1: RESEND: sched: cpumask: improve on cpumask_local_spread() locality](http://lore.kernel.org/lkml/20230121042436.2661843-1-yury.norov@gmail.com/)

  This has significant performance implications on NUMA machines, for example
  when NUMA-aware memory allocation is used together with NUMA-aware IRQ affinity hints.

* [v2: sched: Store restrict_cpus_allowed_ptr() call state](http://lore.kernel.org/lkml/20230121021749.55313-1-longman@redhat.com/)

  The user_cpus_ptr field was originally added by commit b90ca8badbd1
  ("sched: Introduce task_struct::user_cpus_ptr to track requested
  affinity"). It was used only on arm64, because of possible asymmetric CPU setups.

* [v2: sched: cpuset: Don't rebuild sched domains on suspend-resume](http://lore.kernel.org/lkml/20230120194822.962958-1-qyousef@layalina.io/)

  Commit f9a25f776d78 ("cpusets: Rebuild root domain deadline accounting information")
  enabled rebuilding sched domains on cpuset and hotplug operations to
  correct deadline accounting.

* [v2: sched/debug: Put sched/domains files under the verbose flag](http://lore.kernel.org/lkml/20230120163330.1334128-1-pauld@redhat.com/)

  The debug files under sched/domains can take a long time to regenerate,
  especially when updates are done one at a time. Move these files under
  the sched verbose debug flag, and allow changes to verbose to trigger
  generation of the files.

* [v4: sched/fair: unlink misfit task from cpu overutilized](http://lore.kernel.org/lkml/20230119174244.2059628-1-vincent.guittot@linaro.org/)

  Once uclamp_min is taken into account, the 1:1 relation between task misfit
  and CPU overutilized no longer holds: a task with a small util_avg may
  not fit a high-capacity CPU because of its uclamp_min constraint.

* [v2: sched: print parent comm in sched_show_task()](http://lore.kernel.org/lkml/20230119110642.GA6463@didi-ThinkCentre-M930t-N000/)

  Knowing the parent can be useful for debugging.
  For example, we can sometimes resolve kernel hung tasks by stopping
  whoever started those hung tasks.
  Printing the parent's name in sched_show_task()
  helps people identify which "service" should be acted on.

* [v1: sched: Pass flags to cpufreq governor for RT tasks](http://lore.kernel.org/lkml/CAKns5cVijC_o13H7UM7WS2ckexP2y1aYJviqNcKeCE-y_2mcXQ@mail.gmail.com/)

  Right now only CFS tasks can pass flags to the cpufreq governor;
  RT tasks cannot. This limits the cpufreq governor's ability to handle
  RT tasks when it needs to. Passing flags for RT tasks increases
  the flexibility of the cpufreq governor.

* [v1: sched/numa: Enhance vma scanning](http://lore.kernel.org/lkml/20230116022508.9ll5S8Dns4XZ2BB0GB_N7d_2xSQRlFyoWuvqInl536w@z/)

  The patchset proposes one of the enhancements to NUMA VMA scanning
  suggested by Mel.

  In the existing mechanism, the scan period is derived from
  per-thread stats. Process Adaptive autoNUMA [1] proposed gathering NUMA
  fault stats at the per-process level to better capture application behaviour.

  During that discussion, Mel proposed several ideas to enhance
  the current NUMA balancing.

#### Memory management

* [v1: mm-unstable: lib/Kconfig.debug: do not enable DEBUG_PREEMPT by default](http://lore.kernel.org/linux-mm/20230121033942.350387-1-42.hyeyoo@gmail.com/)

  In workloads that frequently perform this_cpu operations,
  enabling DEBUG_PREEMPT may significantly increase
  runtime overhead because the __this_cpu_preempt_check()
  function is invoked so often.

* [v1: Convert a couple migrate functions to use folios](http://lore.kernel.org/linux-mm/20230121005622.57808-1-vishal.moola@gmail.com/)

  This patch set introduces folio_movable_ops() and converts 3 functions
  in mm/migrate.c to use folios.

* [v2: drivers/base/memory: Use array to show memory block state](http://lore.kernel.org/linux-mm/20230120233814.368803-1-gshan@redhat.com/)

  Use an array to show the memory block state in
  '/sys/devices/system/memory/memoryX/state', which simplifies the code. In addition, WARN_ON()
  is removed, since the condition can be caught from the returned value,
  "ERROR-UNKNOWN-%ld\n". As Greg mentioned, a system reboot caused by WARN_ON() is definitely unexpected.

* [[RFC RESEND PATCH 0/2] Add support for sharing page tables across processes (Previously mshare)](http://lore.kernel.org/linux-mm/20230120160816.AydRnPHkAimCUUAJa06mi8Hyi_bW0rS-DsXXwfFNLyo@z/)

  Memory pages shared between processes require a page table entry
  (PTE) for each process.
  Each of these PTEs consumes some
  memory, and as long as the number of mappings being maintained is
  small enough, the space consumed by page tables is not
  objectionable.

* [v1: memcpy_from_folio()](http://lore.kernel.org/linux-mm/Y8qr8c3+SJLGWhUo@casper.infradead.org/)

  I think this is probably the best option. We could have a loop that
  kmaps each page in the folio, but that seems like excessive complexity.
  I'm happy to have highmem systems be less efficient, since they are
  anyway. Another potential concern is that folios can be quite
  large, and having preemption disabled while we copy 2MB of data
  might be a bad thing.

* [v1: ASoC: SOF: sof-audio: prepare_widgets: Check swidget for NULL on sink failure](http://lore.kernel.org/linux-mm/20230120102125.30653-1-peter.ujfalusi@linux.intel.com/)

  If the swidget is NULL, we skip preparing the widget and jump to
  handling the sink path of the widget.
  If the prepare fails in this case, we would undo the prepare, but the swidget
  is NULL (we skipped the prepare for the widget).

  To avoid a NULL pointer dereference in this case, we must check swidget
  against NULL once again.

* [v2: Introduce per NUMA node memory error statistics](http://lore.kernel.org/linux-mm/20230120034622.2698268-1-jiaqiyan@google.com/)

  As noted in the RFC for Kernel Support of Memory Error Detection [1], one advantage
  of software-based scanning over a hardware patrol scrubber is the ability
  to make statistics visible to system administrators.

* [v1: linux-next: mm/hugetlb: replace get_hwpoison_huge_page() with get_hwpoison_hugetlb_folio() when !CONFIG_HUGETLBFS](http://lore.kernel.org/linux-mm/202301201036092738081@zte.com.cn/)

  When CONFIG_HUGETLBFS is not set, there are two problems. One
  is the implicit declaration of function get_hwpoison_hugetlb_folio();
  the other is that get_hwpoison_huge_page() is defined but not used.
  Fix both by defining get_hwpoison_hugetlb_folio() instead of
  get_hwpoison_huge_page() when !CONFIG_HUGETLB_PAGE.

* [v5: Shadow stacks for userspace](http://lore.kernel.org/linux-mm/20230119212317.8324-1-rick.p.edgecombe@intel.com/)

  This series implements shadow stacks for userspace using x86's Control-flow
  Enforcement Technology (CET). CET consists of two related security features:
  shadow stacks and indirect branch tracking. This series implements only the
  shadow stack part of this feature, and only for userspace.

* [v1: convert hugetlb fault functions to folios](http://lore.kernel.org/linux-mm/20230119211446.54165-1-sidhartha.kumar@oracle.com/)

  This series converts the hugetlb page faulting functions to operate on
  folios. These include hugetlb_no_page(), hugetlb_wp(),
  copy_hugetlb_page_range(), and hugetlb_mcopy_atomic_pte().

* [v2: mm: In-kernel support for memory-deny-write-execute (MDWE)](http://lore.kernel.org/linux-mm/20230119160344.54358-1-joey.gouly@arm.com/)

  This is v2 of the MDWE patchset.

* [v1: iov_iter: Add a function to extract a page list from an iterator](http://lore.kernel.org/linux-mm/20230119152926.2899954-1-dhowells@redhat.com/)

  Add a function, iov_iter_extract_pages(), to extract a list of pages from
  an iterator. The pages may be returned with a reference added, with a pin
  added, or with neither, depending on the type of iterator and the direction of
  transfer. The caller must pass FOLL_READ_FROM_MEM or FOLL_WRITE_TO_MEM
  in gup_flags to indicate how the iterator contents are to be used.

* [v3: mm/hugetlb: convert get_hwpoison_huge_page() to folios](http://lore.kernel.org/linux-mm/20230119011057.91349-1-sidhartha.kumar@oracle.com/)

  Straightforward conversion of get_hwpoison_huge_page() to
  get_hwpoison_hugetlb_folio().
  This removes two references to a head page in
  memory-failure.c.

#### File systems

* [v7: iov_iter: Improve page extraction (ref, pin or just list)](http://lore.kernel.org/linux-fsdevel/20230120175556.3556978-1-dhowells@redhat.com/)

  Here are patches that add support for extracting pages from an iov_iter,
  plus a patch that uses the primary extraction function in the block layer bio code.

* [v3: Composefs: an opportunistically sharing verified image filesystem](http://lore.kernel.org/linux-fsdevel/cover.1674227308.git.alexl@redhat.com/)

  Giuseppe Scrivano and I have recently been working on a new project we
  call composefs. This is the first time we have proposed it publicly, and
  we would like some feedback on it.

* [v1: Revert "gfs2: stop using generic_writepages in gfs2_ail1_start_one"](http://lore.kernel.org/linux-fsdevel/20230120141150.1278819-1-agruenba@redhat.com/)

  Commit b2b0a5e97855 switched from generic_writepages() to
  filemap_fdatawrite_wbc() in gfs2_ail1_start_one() as part of
  replacing ->writepage() with ->writepages() and eventually eliminating
  the former. Function gfs2_ail1_start_one() is called from
  gfs2_log_flush(), our main function for flushing the filesystem log.

* [v2: fs/aio: obey min_nr when doing wakeups](http://lore.kernel.org/linux-fsdevel/20230120140347.2133611-1-kent.overstreet@linux.dev/)

  I've been observing workloads where IPIs due to wakeups in
  aio_complete() account for 15% of total CPU time in the profile. Most of those
  wakeups are unnecessary when completion batching is used in io_getevents().

* [v1: shmem: support idmapped mounts for tmpfs](http://lore.kernel.org/linux-fsdevel/20230120094346.3182328-1-gscrivan@redhat.com/)

  This patch enables idmapped mounts for tmpfs when CONFIG_SHMEM is defined.
  Since all the dedicated helpers for this functionality already exist, this
  patch just passes the idmap argument down from the VFS methods to the relevant helpers.

* [v1: RESEND: fs/namespace: defer free_mount from namespace_unlock](http://lore.kernel.org/linux-fsdevel/20230119211455.498968-1-echanude@redhat.com/)

  With the following patch, namespace_unlock queues up the resources that
  need to be released and defers the operation through call_rcu, returning without
  waiting for the grace period.

* [v3: fs/aio: Replace kmap{,_atomic}() with kmap_local_page()](http://lore.kernel.org/linux-fsdevel/20230119162055.20944-1-fmdefrancesco@gmail.com/)

  kmap() and kmap_atomic() are being deprecated in favor of
  kmap_local_page().

  With kmap_local_page(), the mappings are per thread and CPU local, can take
  page faults, and can be created from any context (including interrupts).
  It is faster than kmap() in kernels with HIGHMEM enabled.

* [v1: dax: use switch statement over chained ifs](http://lore.kernel.org/linux-fsdevel/CAPOgqxF_xEgKspetRJ=wq1_qSG3h8mkyXC58TXkUvx0agzEm_A@mail.gmail.com/)

  This patch uses a switch statement for pe_order, which improves
  readability and may slightly improve performance on some platforms. Also
  for readability, it recognizes that `PAGE_SHIFT - PAGE_SHIFT` is a constant and uses 0 in its place.

* [v1: fs: Use CHECK_DATA_CORRUPTION() when kernel bugs are detected](http://lore.kernel.org/linux-fsdevel/20230116191425.458864-1-jannh@google.com/)

  Currently, filp_close() and generic_shutdown_super() use printk() to log
  messages when bugs are detected. This is problematic because infrastructure
  like syzkaller has no idea that such a message indicates a bug.
  In addition, some people explicitly want their kernels to BUG() when kernel
  data corruption has been detected (CONFIG_BUG_ON_DATA_CORRUPTION).

* [v3: ext4: Convert inode preallocation list to an rbtree](http://lore.kernel.org/linux-fsdevel/20230116080216.249195-1-ojaswin@linux.ibm.com/)

  This patch series aims to improve the performance and scalability of
  inode preallocation by changing the inode preallocation linked list to an
  rbtree. I've run the xfstests quick group on this series and plan to run the auto group
  as well to confirm there are no regressions.

* [v2: eventfd: use a generic helper instead of an open coded wait_event](http://lore.kernel.org/linux-fsdevel/tencent_B0E8F40B6620BFE2E79CAA06EAADA085C907@qq.com/)

  Use wait_event_interruptible_locked_irq() in eventfd_{write,read} to
  avoid the longer, open-coded equivalent.

* [v1: blk: optimization for classic polling](http://lore.kernel.org/linux-fsdevel/3578876466-3733-1-git-send-email-nj.shetty@samsung.com/)

  This removes the dependency on interrupts to wake up the task. While polling
  for IO completion, set the task state to TASK_RUNNING if need_resched()
  returns true.
  Previously, the polling task slept and relied on an interrupt to wake it up.
  This made some IO take very long when interrupt coalescing was enabled in NVMe.

#### Security hardening

* [v1: gcc-plugins: Reorganize gimple includes for GCC 13](http://lore.kernel.org/linux-hardening/20230118202355.never.520-kees@kernel.org/)

  Starting with GCC 13, the gimple-iterator.h header must be included before
  gimple-fold.h. Reorganize the gimple headers so that they work for all GCC
  versions.

* [v3: kunit: memcpy: Split slow memcpy tests into MEMCPY_SLOW_KUNIT_TEST](http://lore.kernel.org/linux-hardening/20230118200653.give.574-kees@kernel.org/)

  Since the long memcpy tests may stall a system for tens of seconds
  in virtualized architecture environments, split those tests off under
  CONFIG_MEMCPY_SLOW_KUNIT_TEST so that they can be disabled separately.

  Reviewed-and-tested-by: Guenter Roeck

* [v2: KVM: x86: Replace 0-length arrays with flexible arrays](http://lore.kernel.org/linux-hardening/20230118195905.gonna.693-kees@kernel.org/)

  Zero-length arrays are deprecated[1]. Replace the 0-length arrays in the "data"
  union of struct kvm_nested_state with flexible arrays. (How are the
  sizes of these arrays verified?) Detected with GCC 13, using
  -fstrict-flex-arrays=3.

#### Async I/O

* [v1: io_uring/poll: don't reissue in case of poll race on multishot request](http://lore.kernel.org/io-uring/8997c26b-c498-166d-d130-2caca08a3abb@kernel.dk/)

  A previous commit fixed a poll race that can occur, but the fix only
  applies to multishot requests. For a multishot request, we can safely
  ignore a spurious wakeup, because we never leave the waitqueue in the first place.

* [v1: for-next: random for-next patches](http://lore.kernel.org/io-uring/cover.1673887636.git.asml.silence@gmail.com/)

  Patch 1/5 brings back an old, lost optimisation;
  the others are small cleanups.

* [v1: liburing: test lazy poll wq activation](http://lore.kernel.org/io-uring/cover.1673886955.git.asml.silence@gmail.com/)

  Some tests around DEFER_TASKRUN and lazy poll activation, with
  3/3 specifically testing the case where the feature is disabled.

* [v1: io_uring: make io_sqpoll_wait_sq return void](http://lore.kernel.org/io-uring/20230115071519.554282-1-quanfafu@gmail.com/)

  Change the return type to void, since the function always returns 0, so there is no need
  to check it in the io_uring_enter syscall.

#### Rust For Linux

* [v1: scripts: `make rust-analyzer` for out-of-tree modules](http://lore.kernel.org/rust-for-linux/20230118160220.776302-1-varmavinaym@gmail.com/)

  Adds support for out-of-tree Rust modules to use the `rust-analyzer`
  make target to generate the rust-project.json file.

### Related technology updates

#### Qemu

* [v1: riscv-to-apply queue](http://lore.kernel.org/qemu-devel/20230120073913.1028407-1-alistair.francis@opensource.wdc.com/)

  The following changes since commit 239b8b0699a222fd21da1c5fdeba0a2456085a47:

  Merge tag 'trivial-branch-for-8.0-pull-request' of https://gitlab.com/laurent_vivier/qemu into staging (2023-01-19 15:05:29 +0000)

#### U-Boot

* [v2: spl: spl_nor: add alternative SPL_LOAD_IMAGE_METHOD](http://lore.kernel.org/u-boot/20230119152822.1214202-1-dev@kicherer.org/)

  Add a second SPL_LOAD_IMAGE_METHOD, BOOT_DEVICE_NOR2, to enable booting
  from an alternative NOR address when loading from the first address
  fails, e.g. if no valid header is found.

* [v2: Basic StarFive JH7110 RISC-V SoC support](http://lore.kernel.org/u-boot/20230118081132.31403-1-yanhong.wang@starfivetech.com/)

  This series of patches is based on the latest branch/master and adds support
  for the StarFive JH7110 RISC-V SoC and the VisionFive V2 board. To achieve
  this, the respective DT nodes have been added, and the
  required defconfigs have been added to the boards' defconfig. In addition,
  the basic required DM drivers have been added, such as reset, clock, pinctrl,
  uart, ram etc.

* [v1: event: Correct dependencies on the EVENT framework](http://lore.kernel.org/u-boot/20230116191207.151545-1-trini@konsulko.com/)

  The event framework is just that: a framework. Enabling it by itself
  does nothing, so we shouldn't ask the user about it. Reword the option and
  help text (and correct the typos in them). This also applies to
  DM_EVENT and EVENT_DYNAMIC. When EVENT is selected, only EVENT_DEBUG and
  CMD_EVENT should be visible for the user to select.
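
The `spl_nor` entry above describes a simple fallback. SPL tries the first NOR address and, if no valid image header is found there, tries an alternative address instead. The C sketch below shows only that control flow, under stated assumptions. Flash is simulated as a byte array, and the image uses a made-up two-field header. `struct img_header`, `IMG_MAGIC`, `load_from()` and `load_with_fallback()` are hypothetical names invented for illustration. They are not U-Boot's SPL loader API or its real image formats.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header and magic value, made up for this sketch;
 * U-Boot's real image formats (legacy uImage, FIT) are not modelled. */
#define IMG_MAGIC 0x53504c31u /* "SPL1" */

struct img_header {
    uint32_t magic; /* must equal IMG_MAGIC for a valid image */
    uint32_t size;  /* payload size in bytes, following the header */
};

/* Try to read a valid header at `offset` in a simulated flash region.
 * Returns 0 and fills *out on success, or -1 if the header is missing,
 * out of bounds, or describes a payload that does not fit. */
int load_from(const uint8_t *flash, size_t flash_len, size_t offset,
              struct img_header *out)
{
    struct img_header h;

    if (offset > flash_len || flash_len - offset < sizeof h)
        return -1;
    memcpy(&h, flash + offset, sizeof h); /* avoids unaligned reads */
    if (h.magic != IMG_MAGIC || h.size > flash_len - offset - sizeof h)
        return -1;
    *out = h;
    return 0;
}

/* Try the primary address first, then the alternative one (the
 * NOR -> NOR2 ordering from the patch summary). Returns the offset that
 * held a valid image, or -1 if neither did. */
long load_with_fallback(const uint8_t *flash, size_t flash_len,
                        size_t primary, size_t alternate,
                        struct img_header *out)
{
    if (load_from(flash, flash_len, primary, out) == 0)
        return (long)primary;
    if (load_from(flash, flash_len, alternate, out) == 0)
        return (long)alternate;
    return -1;
}
```

Checking the primary location first leaves the normal boot path unchanged. In this sketch, the fallback costs one extra header read, and only when the first image is absent or corrupt.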

## 20230115: Issue 29

### Kernel updates

#### RISC-V architecture support

* [v1: Zbb + fast-unaligned string optimization](http://lore.kernel.org/linux-riscv/20230113212351.3534769-1-heiko@sntech.de/)

  For this, it uses Palmer's series for hw-feature probing, which reads
  this property from firmware (devicetree), because the performance of unaligned
  accesses is an implementation detail of the relevant CPU core.

* [v5: Zbb string optimizations](http://lore.kernel.org/linux-riscv/20230113212301.3534711-1-heiko@sntech.de/)

  This series still aims to allow optimized string functions for specific
  extensions. The previous approach of using an inline base function to hold
  the alternative calls caused issues in a number of places.

* [v1: RISC-V: move some stray __RISCV_INSN_FUNCS definitions from kprobes](http://lore.kernel.org/linux-riscv/20230113211955.3534431-1-heiko@sntech.de/)

  The __RISCV_INSN_FUNCS macro originally declared riscv_insn_is_* functions inside
  the kprobes implementation. This was moved into a central header in
  commit ec5f90877516 ("RISC-V: Move riscv_insn_is_* macros into a common header").

  However, I overlooked two of them, so fix that. FENCE itself is
  an instruction defined directly by its own opcode, while the generated
  riscv_insn_is_system function covers all instructions defined under the SYSTEM opcode.

* [v5: dt-bindings: riscv: add SBI PMU event mappings](http://lore.kernel.org/linux-riscv/20230113205435.122712-1-conor@kernel.org/)

  The SBI PMU extension requires the firmware to be aware of the
  event-to-counter/mhpmevent mappings supported by the hardware. OpenSBI may use
  DeviceTree to describe the PMU mappings. This binding is currently
  described in Markdown in OpenSBI (since v1.0, in Dec 2021) and has been used by QEMU since v7.2.0.

* [v1: mm-unstable: mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap PTEs](http://lore.kernel.org/linux-riscv/20230113171026.582290-1-david@redhat.com/)

  This is the follow-up to [1]:
  v2: mm: COW fixes part 3: reliable GUP R/W FOLL_GET of anonymous pages

  Having implemented __HAVE_ARCH_PTE_SWP_EXCLUSIVE on the most prominent
  enterprise architectures, implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all
  remaining architectures that support swap PTEs.

* [v1: riscv: Add "Code:", and decodecode support](http://lore.kernel.org/linux-riscv/20230113144552.138081-1-bjorn@kernel.org/)

  From: Björn Töpel

  RISC-V does not have "Code:" dumps in the Oops output. This series
  adds them, together with scripts/decodecode support.

* [v1: irqchip/irq-sifive-plic: Add syscore callbacks for hibernation](http://lore.kernel.org/linux-riscv/20230113094216.116036-1-mason.huo@starfivetech.com/)

  The priority and enable registers of the PLIC are reset
  during the hibernation power cycle in poweroff mode, so add syscore callbacks to save/restore those registers.

* [v1: Change PWM-controlled LED pin active mode and algorithm](http://lore.kernel.org/linux-riscv/20230113083115.2590-1-nylon.chen@sifive.com/)

  According to the User LEDs - RGB circuit diagram in the
  manual hifive-unmatched-schematics-v3.pdf[0], the PWM is active-high.

  According to the description of PWM for pwmcmp in the SiFive FU740-C000
  Manual[1], the PWM algorithm is (PW) pulse active time = (D) duty * (T) period[2].
  The `frac` variable is the pulse "inactive" time, so it needs to be inverted.

* [v2: riscv: elf: add .riscv.attributes parsing](http://lore.kernel.org/linux-riscv/20230112210622.2337254-1-vineetg@rivosinc.com/)

  This implements the ELF loader hook to parse the RISC-V-specific
  .riscv.attributes section. Compilers (gcc/llvm) insert this section
  with build-related information, such as -march, organized as
  tag/value attribute pairs.

  It identifies the various attribute tags (and corresponding values) currently specified in the psABI specification.

* [v1: RISC-V KVM virtualize AIA CSRs](http://lore.kernel.org/linux-riscv/20230112140304.1830648-1-apatel@ventanamicro.com/)

  This series implements the first phase of AIA virtualization, which targets
  virtualizing the AIA CSRs. It also provides a foundation for the second
  phase of AIA virtualization, which will target an in-kernel AIA irqchip
  (including both IMSIC and APLIC).

  The first two patches are shared with the "Linux RISC-V AIA Support"
  series, which adds AIA driver support.

* [v14: -next: riscv: Add GENERIC_ENTRY support](http://lore.kernel.org/linux-riscv/20230112095848.1464404-1-guoren@kernel.org/)

  The patches convert riscv to use the generic entry infrastructure from
  kernel/entry/*. They also optimize entry.S with a new .macro and merge
  ret_from_kernel_thread into ret_from_fork.

  Patches 1 and 2 prepare for generic entry; patches 3-7 are the main part of the generic entry work.

  Everything was tested with rv64, rv32, and rv64 with a 32-bit rootfs, and all tests passed.

* [v7: -next: riscv: Optimize function trace](http://lore.kernel.org/linux-riscv/20230112090603.1295340-1-guoren@kernel.org/)

  The previous ftrace detour implementation fc76b8b8011 ("riscv: Using
  PATCHABLE_FUNCTION_ENTRY instead of MCOUNT") contains three problems.

  This series adds DYNAMIC_FTRACE_WITH_DIRECT_CALLS support for RISC-V.
  SAMPLE_FTRACE_DIRECT and SAMPLE_FTRACE_DIRECT_MULTI are also included
  as samples for testing the DIRECT_CALLS-related interfaces.

* [v4: hwrng: starfive: Add driver for TRNG module](http://lore.kernel.org/linux-riscv/20230112043812.150393-1-jiajie.ho@starfivetech.com/)

  This patch series adds kernel support for the StarFive hardware random
  number generator. The first 2 patches add the binding docs and the device driver for
  this module. Patch 3 adds the devicetree entry for the VisionFive 2 SoC.

* [v3: riscv: improve boot time isa extensions handling](http://lore.kernel.org/linux-riscv/20230111171027.2392-1-jszhang@kernel.org/)

  Generally, RISC-V ISA extensions are fixed for any specific hardware
  platform, so a hart's features won't change after booting. This
  characteristic makes it straightforward to use a static branch to check
  whether a specific ISA extension is supported, which improves performance.

* [v3: PCI: microchip: Partition address translations](http://lore.kernel.org/linux-riscv/20230111125323.1911373-1-daire.mcnamara@microchip.com/)

  Microchip PolarFire SoC is a 64-bit device and has DDR starting at
  Coreplex via an FPGA fabric. The AXI connections between the Coreplex and
  the fabric are 64-bit, and the AXI connections between the fabric and the
  rootport are 32-bit. For the CPU Coreplex to act as an AXI master to the
  PCIe devices, and for the PCIe devices to act as bus masters to DDR at these
  base addresses, the fabric can be customised to add/remove offsets for bits
  customer's design.

* [v2: Add a devicetree for the Aldec PolarFire SoC TySoM](http://lore.kernel.org/linux-riscv/20230111124106.2417152-1-conor.dooley@microchip.com/)

  The board has 32 GB of DDR, but the DT I have access to maps only a small
  part of it. I tried accessing more DDR, but that was not possible
  with the FPGA design as it stands. I'd rather have the devicetree
  match what the vendor ships, so I left the design/DDR as-is.

* [v2: RISC-V Hibernation Support](http://lore.kernel.org/linux-riscv/20230109062407.3235-1-jeeheng.sia@starfivetech.com/)

  This series adds RISC-V hibernation (suspend-to-disk) support. Low-level arch functions were created to support hibernation.

* [v13: -next: riscv: Add GENERIC_ENTRY support](http://lore.kernel.org/linux-riscv/20230107113838.3969149-1-guoren@kernel.org/)

  The patches convert riscv to use the generic entry infrastructure from
  kernel/entry/*.
  They also optimize entry.S with a new .macro and merge
  ret_from_kernel_thread into ret_from_fork.

#### Process scheduling

* [v3: sched/fair: unlink misfit task from cpu overutilized](http://lore.kernel.org/lkml/20230113134056.257691-1-vincent.guittot@linaro.org/)

  Once uclamp_min is taken into account, the 1:1 relation between task misfit
  and CPU overutilized no longer holds: a task with a small util_avg
  may not fit a high-capacity CPU because of its uclamp_min constraint.

* [v4: sched/fair: limit sched slice duration](http://lore.kernel.org/lkml/20230113133613.257342-1-vincent.guittot@linaro.org/)

  In the presence of many small-weight tasks, such as sched_idle tasks, normal-
  or high-weight tasks can see their ideal runtime (sched_slice) increase
  to hundreds of ms, whereas it normally stays below sysctl_sched_latency.

  Such a long sched_slice can significantly delay the release of resources,
  because tasks can wait hundreds of ms for their next running slot just
  because of idle tasks queued on the rq.

* [v1: sched: print parent comm in sched_show_task()](http://lore.kernel.org/lkml/20230113105413.GA30243@didi-ThinkCentre-M930t-N000/)

  Knowing the parent can be useful for debugging.
  For example, we can sometimes resolve kernel hung tasks by stopping
  whoever started those hung tasks.
  Printing the parent's name in sched_show_task()
  helps people identify which "service" should be acted on. The parent info is also moved to a new line of its own.

* [v1: sched/idle: Make idle poll dynamic per-cpu](http://lore.kernel.org/lkml/20230112162426.217522-1-bristot@kernel.org/)

  idle=poll is frequently used on ultra-low-latency systems. Examples of
  such systems are high-performance trading and 5G NVRAM. The performance
  gain comes from avoiding the idle driver machinery and keeping the
  CPU always in an active state, which avoids (odd) hardware heuristics that are outside the OS's control.
+ + +* [v1: net: sched: disallow noqueue for qdisc classes](http://lore.kernel.org/lkml/20230109163906.706000-1-fred@cloudflare.com/) + + While experimenting with applying noqueue to a classful queue discipline, + + Fix this by not allowing classes to be assigned to the noqueue + discipline. Linux TC Notes states that classes cannot be set to + the noqueue discipline. [1] Let's enforce that here. + +#### Memory Management + +* [v1: memory pressure detection in VMs using PSI mechanism for dynamically inflating/deflating VM memory](http://lore.kernel.org/linux-mm/DS0PR02MB90787835F5B9CB9771A20329C4C09@DS0PR02MB9078.namprd02.prod.outlook.com/) + + We're from the Linux memory team here at Qualcomm. We are currently devising a VM memory resizing feature where we dynamically inflate or deflate the Linux VM based on ongoing memory demands in the VM. We wanted to propose a few details about this userspace daemon in the form of an RFC and to hear the upstream community's opinion. + +* [v5: selftest/vm: add mremap expand merge offset test](http://lore.kernel.org/linux-mm/8ff3ba3cadc0b6c1b2688ae5c851bf73aa062d57.1673701836.git.lstoakes@gmail.com/) + + Add a test to assert that we can mremap() and expand a mapping starting + from an offset within an existing mapping. We unmap the last page in a 3 + page mapping to ensure that the remap should always succeed, before + remapping from the 2nd page. + +* [v3: mm-unstable: continue hugetlb folio conversion](http://lore.kernel.org/linux-mm/20230113223057.173292-1-sidhartha.kumar@oracle.com/) + + This series continues the conversion of core hugetlb functions to use + folios. This series converts many helper functions in the hugetlb fault + path. This is in preparation for another series to convert the hugetlb + fault code paths to operate on folios.
+ + +* [v4: Secure prandom_u32 invocations](http://lore.kernel.org/linux-mm/cover.1673470326.git.david.keisarschm@mail.huji.ac.il/) + + The security improvements for prandom_u32 done in commits c51f8f88d705 + from October 2020 and d4150779e60f from May 2022 didn't handle the cases + when prandom_bytes_state() and prandom_u32_state() are used. + +* [v5: scripts/gdb: add mm introspection utils](http://lore.kernel.org/linux-mm/20230113175151.22278-1-dmitrii.bundin.a@gmail.com/) + + This command provides a way to traverse the entire page hierarchy for a + given virtual address on x86. In addition to QEMU's info + tlb/info mem commands, it provides complete information about the paging structure for an arbitrary virtual address. It supports 4KB/2MB/1GB pages and 5-level paging. + +* [v1: mm: populate multiple PTEs if file page is large folio](http://lore.kernel.org/linux-mm/20230113163538.23412-1-fengwei.yin@intel.com/) + + The number of page faults can be reduced by populating PTEs in batches. + A batch of populated PTEs is not allowed to cross: + - page table boundaries + - vma range + - large folio size + - fault_around_bytes + +* [v1: Add tests for memblock_alloc_node()](http://lore.kernel.org/linux-mm/0c3fdce6-3180-89c6-ba9e-77b7e98a5691@mail.polimi.it/) + + These tests are aimed at verifying that memblock_alloc_node() works as expected, i.e. that it sets the + correct NUMA node for the newly allocated region. memblock_alloc_node() is mimicked by executing + the already implemented test function run_memblock_alloc_try_nid() and by setting the flags used + internally by memblock_alloc_node(). The core check is between the requested NUMA node and the + `nid` field inside the memblock_region structure. These two must be equal for the test to succeed.
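  The "mm: populate multiple PTEs if file page is large folio" entry in this list names four limits on a batch. Those limits amount to intersecting a few address ranges. The sketch below is a simplified model, not the patch's code. It assumes 4 KiB pages, page tables that each cover 2 MiB (as with 4-level x86-64 paging), and a power-of-two fault_around_bytes window aligned down around the faulting address. The helper names are hypothetical.

  ```c
  #include <stdint.h>

  #define PAGE_SHIFT 12                   /* assumed 4 KiB pages */
  #define PT_SPAN    ((uintptr_t)1 << 21) /* assumed: one page table maps 2 MiB */

  /* Shrink [*start, *end) to its intersection with [lo, hi). */
  static void clamp_range(uintptr_t *start, uintptr_t *end,
                          uintptr_t lo, uintptr_t hi)
  {
      if (*start < lo)
          *start = lo;
      if (*end > hi)
          *end = hi;
  }

  /* Number of PTEs that may be populated for a fault at addr.
   * addr must lie inside both the VMA and the folio, and
   * fault_around_bytes must be a power of two. */
  static unsigned long batch_nr_ptes(uintptr_t addr,
                                     uintptr_t vma_start, uintptr_t vma_end,
                                     uintptr_t folio_start, uintptr_t folio_size,
                                     uintptr_t fault_around_bytes)
  {
      uintptr_t start = addr & ~(fault_around_bytes - 1); /* fault_around_bytes */
      uintptr_t end = start + fault_around_bytes;
      uintptr_t pt_start = addr & ~(PT_SPAN - 1);

      clamp_range(&start, &end, pt_start, pt_start + PT_SPAN);          /* page table */
      clamp_range(&start, &end, vma_start, vma_end);                    /* vma range */
      clamp_range(&start, &end, folio_start, folio_start + folio_size); /* large folio */
      return (unsigned long)((end - start) >> PAGE_SHIFT);
  }
  ```

  For example, take a fault at 0x10005000 with a 64 KiB window, a large VMA, and a 16 KiB folio at 0x10004000. The folio bounds the batch, which yields 4 PTEs.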
+ + +* [v3: mm/page_ext: Do not allocate space for page_ext->flags if not needed](http://lore.kernel.org/linux-mm/20230113154253.92480-1-pasha.tatashin@soleen.com/) + + There is an 8-byte page_ext->flags field allocated per page whenever + CONFIG_PAGE_EXTENSION is enabled. However, not every user of page_ext + uses flags. Therefore, check whether the flags field is needed by at least one + user and, if so, allocate space for it. + +* [v3: 0/6: Discard __GFP_ATOMIC](http://lore.kernel.org/linux-mm/20230113111217.14134-1-mgorman@techsingularity.net/) + + This replaces the "Discard __GFP_ATOMIC v2" series in mm-unstable. There + are changelog and patch replacements that make -fix patches impractical. + +* [v1: Some small improvements for memblock.](http://lore.kernel.org/linux-mm/20230113082659.65276-1-zhangpeng.00@bytedance.com/) + + I found some small optimizations while reading the code of memblock. + Please help to review. Thanks. + +* [v3: mm/vmalloc.c: allow vread() to read out vm_map_ram areas](http://lore.kernel.org/linux-mm/20230113031921.64716-1-bhe@redhat.com/) + + The normal vmalloc API uses struct vmap_area to manage the allocated virtual + kernel area, and associates a vm_struct with it to store more + information and pass it out. However, an area reserved through the vm_map_ram() + interface doesn't allocate a vm_struct to associate with. So the current code in vread() skips the vm_map_ram area through the 'if (!va->vm)' conditional check. + +* [v1: mm-unstable: convert hugepage memory failure functions to folios](http://lore.kernel.org/linux-mm/20230112204608.80136-1-sidhartha.kumar@oracle.com/) + + This series contains a 1:1 straightforward page to folio conversion for + memory failure functions which deal with huge pages. I renamed a few + functions to fit with how other folio operating functions are named.
+ + +* [v2: bpf-next: mm, bpf: Add BPF into /proc/meminfo](http://lore.kernel.org/linux-mm/20230112155326.26902-1-laoar.shao@gmail.com/) + + Currently there's no way to get BPF memory usage; we can only + estimate the usage with bpftool or memcg, neither of which is reliable. + +* [v1: shmem: Convert shmem_write_end() to use a folio](http://lore.kernel.org/linux-mm/20230112131031.1209553-1-willy@infradead.org/) + + Use a folio internally to shmem_write_end(), which saves a number of + calls to compound_head() and lets us get rid of the custom code to + zero out the rest of a THP and supports folios of arbitrary size. + +* [v1: -next: mm: madvise: use vm_normal_folio() in madvise_free_pte_range()](http://lore.kernel.org/linux-mm/20230112124028.16964-1-wangkefeng.wang@huawei.com/) + + There is already a vm_normal_folio(); use it to make + madvise_free_pte_range() only use a folio. + +* [v1: zsmalloc: turn chain size config option into UL constant](http://lore.kernel.org/linux-mm/20230112071443.1933880-1-senozhatsky@chromium.org/) + + This fixes + + >> mm/zsmalloc.c:122:59: warning: right shift count >= width of type [-Wshift-count-overflow] + + and + + >> mm/zsmalloc.c:224:28: error: variably modified 'size_class' at file scope + 224 | struct size_class *size_class[ZS_SIZE_CLASSES]; + +* [v1: Get rid of tail page fields](http://lore.kernel.org/linux-mm/20230111142915.1001531-1-willy@infradead.org/) + + Continue the shrinkage of the struct page definition by getting rid of + the 'first tail page' and 'second tail page' fields. I originally did + this patch set before Hugh's rewrite of the subpages_mapcount, so it + needed substantial updates; hope I didn't miss anything. + +#### File Systems + +* [v3: exfat: handle unreconized benign secondary entries](http://lore.kernel.org/linux-fsdevel/20230114041900.4458-1-linkinjeon@kernel.org/) + + The Sony PXW-Z280 camera adds vendor allocation entries to the directory of + pictures.
+ Currently, Linux exfat does not support them, so the file is + not visible. This patch handles vendor extension and allocation entries + as unrecognized benign secondary entries. As described in the specification, + they are recognized but ignored, and when a directory entry set is deleted, the associated cluster allocations are removed along with the benign secondary directory entries. + +* [v3: vfs: provide automatic kernel freeze / resume](http://lore.kernel.org/linux-fsdevel/20230114003409.1168311-1-mcgrof@kernel.org/) + + Darrick J. Wong poked me about the status of the fs freeze work; he's + right, it's been too long since the last spin. The last v2 attempt happened + in April 2021 [0]; this just takes the feedback from Christoph and spins it + again. I've only done basic build tests on x86_64, and haven't yet run-time + tested the stuff, but given the size of this set it's better to review early + before getting stuck on details. So this is what I've ended up with so far. + +* [v1: lockref: stop doing cpu_relax in the cmpxchg loop](http://lore.kernel.org/linux-fsdevel/20230113184447.1707316-1-mjguzik@gmail.com/) + + On the x86-64 architecture even a failing cmpxchg grants exclusive + access to the cacheline, making it preferable to retry the failed op + immediately instead of stalling with the pause instruction. + +* [v2: Composefs: an opportunistically sharing verified image filesystem](http://lore.kernel.org/linux-fsdevel/cover.1673623253.git.alexl@redhat.com/) + + Giuseppe Scrivano and I have recently been working on a new project we + call composefs. This is the first time we propose this publicly and + we would like some feedback on it. + + At its core, composefs is a way to construct and use read-only images + that are used similarly to how you would use e.g. loop-back mounted + squashfs images. On top of this, composefs has two fundamental features.
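  The "lockref: stop doing cpu_relax in the cmpxchg loop" entry in this list can be illustrated with C11 atomics. This is a hypothetical toy model, not the kernel's struct lockref. One 64-bit word holds a lock in its low 32 bits and a reference count in its high 32 bits. The lockless fast path retries a failed compare-and-swap immediately, with no pause between attempts. The retry bound is also an assumption.

  ```c
  #include <stdatomic.h>
  #include <stdbool.h>
  #include <stdint.h>

  /* Hypothetical layout: low 32 bits = lock (0 means unlocked),
   * high 32 bits = reference count. */
  typedef struct {
      _Atomic uint64_t lock_count;
  } toy_lockref;

  #define TOY_COUNT_ONE   ((uint64_t)1 << 32)
  #define TOY_MAX_RETRIES 100   /* assumed bound on fast-path attempts */

  static void toy_lockref_init(toy_lockref *ref, uint32_t count)
  {
      atomic_init(&ref->lock_count, (uint64_t)count << 32);
  }

  /* Lockless fast path: take a reference only while the lock is free.
   * Returns false when the caller should fall back to taking the lock. */
  static bool toy_lockref_get(toy_lockref *ref)
  {
      uint64_t old = atomic_load_explicit(&ref->lock_count, memory_order_relaxed);

      for (int i = 0; i < TOY_MAX_RETRIES; i++) {
          if ((uint32_t)old != 0)   /* lock held: leave the fast path */
              return false;
          /* On failure, 'old' is reloaded with the current value, so the
           * loop retries at once, with no cpu_relax()/pause in between. */
          if (atomic_compare_exchange_weak(&ref->lock_count, &old,
                                           old + TOY_COUNT_ONE))
              return true;
      }
      return false;
  }

  static uint32_t toy_lockref_count(toy_lockref *ref)
  {
      return (uint32_t)(atomic_load(&ref->lock_count) >> 32);
  }
  ```

  Retrying at once follows the entry's reasoning: on x86-64, a failed cmpxchg already grants exclusive access to the cacheline, so the next attempt can proceed immediately.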
+ + +* [v1: fs: finish conversion to mnt_idmap](http://lore.kernel.org/linux-fsdevel/20230113-fs-idmapped-mnt_idmap-conversion-v1-0-fc84fa7eba67@kernel.org/) + + This series converts all places that currently still pass around a plain + namespace attached to a mount to passing around a separate type, eliminating + all bugs that can arise from conflating filesystem and mount idmappings. + After this series nothing will have changed semantically. + +* [v3: RESEND: coredump: Use vmsplice_to_pipe() for pipes in dump_emit_page()](http://lore.kernel.org/linux-fsdevel/20230112224348.5384-1-yepeilin.cs@gmail.com/) + + Tested by dumping a 32-GByte core into a simple handler that splice()s + from stdin to disk in a loop, PIPE_DEF_BUFFERS (16) pages at a time. + +* [v6: Implement copy offload support](http://lore.kernel.org/linux-fsdevel/20230112115908.23662-1-nj.shetty@samsung.com/) + + The patch series covers the points discussed in the November 2021 virtual + call [LSF/MM/BFP TOPIC] Storage: Copy Offload [0]. + We have covered the initial agreed requirements in this patchset, plus additional features suggested by the community. + +* [v3: RESEND: nsfs: add compat ioctl handler](http://lore.kernel.org/linux-fsdevel/20221214-nsfs-ioctl-compat-v3-1-dce2d26e1fec@weissschuh.net/) + + As all parameters and return values of the ioctls have the same + representation on both 32bit and 64bit, we can reuse the normal ioctl + handler for the compat handler via compat_ptr_ioctl(). + + All nsfs ioctls return a plain "int" file descriptor, which is a signed 4-byte integer type on both 32bit and 64bit.
+ + +* [v5: iov_iter: Add extraction helpers](http://lore.kernel.org/linux-fsdevel/167344725490.2425628.13771289553670112965.stgit@warthog.procyon.org.uk/) + + Here are patches to clean up some use of READ/WRITE and ITER_SOURCE/DEST, + patches to provide support for extracting pages from an iov_iter, and a + patch to use the primary extraction function in the block layer bio code. Could + you take a look? + +* [v2: erofs: support page cache sharing between EROFS images in fscache mode](http://lore.kernel.org/linux-fsdevel/20230111083158.23462-1-jefflexu@linux.alibaba.com/) + + Changes since RFC: + - patch 2: allocate an anonymous file (realfile) when a file is opened, + rather than allocating a single anonymous file for each blob at mount + time + - patch 7: add a 'sharecache' mount option to control whether page cache + sharing is enabled + +* [v4: Introduce daemon failover mechanism to recover from crashing](http://lore.kernel.org/linux-fsdevel/20230111052515.53941-1-zhujia.zj@bytedance.com/) + + In ondemand read mode, if the user daemon closes an anonymous fd (e.g. the daemon + crashes), subsequent read and inflight requests based on that fd will + return -EIO. + Even if the above-mentioned case is tolerable for some individual users, + when it happens in a real cloud service production environment, such IO + errors will be passed to cloud service users and impact their jobs. + +* [v1: proc: introduce proc_statfs()](http://lore.kernel.org/linux-fsdevel/20230110152003.1118777-1-chao@kernel.org/) + + Introduce proc_statfs() to replace simple_statfs(), so that + f_bsize queried from statfs() is consistent with the value we set in s_blocksize. + +* [v1: Reduce zonefs memory usage](http://lore.kernel.org/linux-fsdevel/20230110130830.246019-1-damien.lemoal@opensource.wdc.com/) + + This series improves memory usage by switching to dynamically + allocated inodes and dentries, similarly to regular file systems.
+ This + drastically reduces the memory consumption of zonefs when the file + system is mounted. E.g., for a 26 TB SMR HDD with over 95000 zones, + memory usage is decreased from about 130 MB down to a little over 5 MB. + +* [v1: fs: kill old ms_* flags for internal sb](http://lore.kernel.org/linux-fsdevel/20230110022554.1186499-1-mcgrof@kernel.org/) + + David had started the sb flag split for internal flags through + commit e462ec50cb5 ("VFS: Differentiate mount flags (MS_*) from internal + superblock flags"), but it seems we just never removed the old flag usage. + +* [v2: fs/aio: Replace kmap{,_atomic}() with kmap_local_page()](http://lore.kernel.org/linux-fsdevel/20230109175629.9482-1-fmdefrancesco@gmail.com/) + + The use of kmap_local_page() in fs/aio.c is "safe" in the sense that the + code doesn't hand the returned kernel virtual addresses to other threads + and there are no nestings which should be handled with the stack based + (LIFO) mappings/un-mappings order. + +* [v2: fs/sysv: Replace kmap() with kmap_local_page()](http://lore.kernel.org/linux-fsdevel/20230109170639.19757-1-fmdefrancesco@gmail.com/) + + kmap() is deprecated in favor of kmap_local_page(). + + There are two main problems with kmap(): (1) It comes with an overhead as + the mapping space is restricted and protected by a global lock for + synchronization, and (2) it also requires global TLB invalidation when the + kmap’s pool wraps and it might block when the mapping space is fully + utilized until a slot becomes available. + +* [v1: Checkpoint Support for Syscall User Dispatch](http://lore.kernel.org/linux-fsdevel/20230109153348.5625-1-gregory.price@memverge.com/) + + Syscall user dispatch makes it possible to cleanly intercept system + calls from user-land. However, most transparent checkpoint software + presently leverages some combination of ptrace and system call + injection to place software in a ready-to-checkpoint state.
+ + +* [v7: Implement IOCTL to get and/or the clear info about PTEs](http://lore.kernel.org/linux-fsdevel/20230109064519.3555250-1-usama.anjum@collabora.com/) + + Stop using the soft-dirty flags for finding which pages have been + written to. It is too delicate and wrong as it shows more soft-dirty + pages than the actual soft-dirty pages. There is no interest in + correcting it [A][B] as this is how the feature was written years ago. + It shouldn't be updated to change its behaviour. Peter Xu has suggested using the async version of the UFFD WP [C] as it is based inherently on the PTEs. + +* [v7: DEPT(Dependency Tracker)](http://lore.kernel.org/linux-fsdevel/1673235231-30302-1-git-send-email-byungchul.park@lge.com/) + + I've been developing a tool for detecting deadlock possibilities by + tracking wait/event rather than lock(?) acquisition order to try to + cover all synchronization mechanisms. It's based on v6.2-rc2. + + https://github.com/lgebyungchulpark/linux-dept/commits/dept2.0_on_v6.2-rc2 + +#### Network Devices + +* [v1: net: usb: sr9700: Handle negative len](http://lore.kernel.org/netdev/20230114182326.30479-1-szymon.heidrich@gmail.com/) + + The packet len, computed as the difference between the length word extracted from + skb data and four, may be negative. In such a case, + processing of the buffer should be interrupted rather than + setting sr_skb->len to an unexpectedly large value (due to the cast + from signed to unsigned integer) and passing sr_skb to + usbnet_skb_return. + +* [v6: net-next: net: ethernet: mtk_wed: introduce reset support](http://lore.kernel.org/netdev/cover.1673715298.git.lorenzo@kernel.org/) + + Introduce proper reset integration between ethernet and wlan drivers in order + to schedule a wlan driver reset when the ethernet/wed driver is resetting. + Introduce mtk_hw_reset_monitor work in order to detect possible DMA hangs.
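  The "net: usb: sr9700: Handle negative len" entry in this list describes a signed length that becomes huge once it is stored in an unsigned field. The following is a minimal sketch of the check. It assumes the 16-bit length word has already been read from the buffer. The helper name is hypothetical, and this is not the driver's code.

  ```c
  #include <stdbool.h>
  #include <stdint.h>

  /* Hypothetical helper: payload length = length word - 4.
   * A bogus length word smaller than 4 gives a negative int. Stored in
   * an unsigned 32-bit length, it would become huge (-2 -> 4294967294),
   * so report failure and let the caller stop processing the buffer. */
  static bool payload_len_from_word(uint16_t len_word, int *out_len)
  {
      int len = (int)len_word - 4;

      if (len < 0)
          return false;
      *out_len = len;
      return true;
  }
  ```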
+ + +* [v2: bpf-next: xdp: introduce xdp-feature support](http://lore.kernel.org/netdev/cover.1673710866.git.lorenzo@kernel.org/) + + Introduce the capability to export the XDP features supported by the NIC. + Introduce an XDP compliance test tool (xdp_features) to check that the features + exported by the NIC match the real features supported by the driver. + Allow XDP_REDIRECT of non-linear XDP frames into a devmap. + +* [v4: net-next: Add support for two classes of VCAP rules](http://lore.kernel.org/netdev/20230114134242.3737446-1-steen.hegelund@microchip.com/) + + For this to work, the VCAP Lookups must be enabled from boot, so that + "internal" clients like PTP can add rules that are always active. + + When the TC tool adds a flower filter, the VCAP rule corresponding to this filter will be disabled (kept in memory) until a TC matchall filter creates a link from chain 0 to the chain (lookup) where the flower filter was added. + +* [v2: net: tcp: avoid the lookup process failing to get sk in ehash table](http://lore.kernel.org/netdev/20230114132705.78400-1-kerneljasonxing@gmail.com/) + + While one cpu is looking up the right socket in the ehash + table, another cpu has just deleted the request socket and is about + to add (or is adding) the big socket to the table. This means that + we could miss both of them, even though the chance is small. + +* [v2: net-next: unix: Improve locking scheme in unix_show_fdinfo()](http://lore.kernel.org/netdev/c6c7084c-56c7-cd37-befe-df718e080597@ya.ru/) + + After switching to TCP_ESTABLISHED or TCP_LISTEN sk_state, alive SOCK_STREAM + and SOCK_SEQPACKET sockets can't change it anymore (since commit 3ff8bff704f4 + "unix: Fix race in SOCK_SEQPACKET's unix_dgram_sendmsg()"). + + Thus, we do not need to take the lock here.
+ + +* [v1: net-next: net: support ipv4 big tcp](http://lore.kernel.org/netdev/cover.1673666803.git.lucien.xin@gmail.com/) + + Unlike IPv6, the IPv4 tot_len field is only 16 bits long, and the IPv4 header + has no exthdrs (options) to carry the BIG TCP packet's length. To keep + it simple, as David and Paolo suggested, we set the IPv4 tot_len to 0 to + indicate that this might be a BIG TCP packet and use skb->len as the real IPv4 total length. + +* [v1: net-next: Small packet processing handling changes](http://lore.kernel.org/netdev/20230113223619.162405-1-parav@nvidia.com/) + + These two changes improve small packet handling. + + Patch summary: + patch-1 fixes the length check by considering the Ethernet 60B frame size + patch-2 avoids code duplication by reusing the existing buffer free helper + +* [v10: net-next: virtio/vsock: replace virtio_vsock_pkt with sk_buff](http://lore.kernel.org/netdev/20230113222137.2490173-1-bobby.eshleman@bytedance.com/) + + This commit changes virtio/vsock to use sk_buff instead of + virtio_vsock_pkt. Beyond better conforming to other net code, using + sk_buff allows vsock to use sk_buff-dependent features in the future + (such as sockmap) and improves throughput. + +* [v2: net-next: Allow offloading of UDP NEW connections via act_ct](http://lore.kernel.org/netdev/20230113165548.2692720-1-vladbu@nvidia.com/) + + With all the necessary infrastructure in place, modify act_ct to offload + UDP NEW as a unidirectional connection. Pass reply direction traffic to CT + and promote the connection to bidirectional when the UDP connection state + changes to "assured". Rely on the refresh mechanism to propagate the connection + state change to supporting drivers. + +* [v3: net-next: net: dsa: mv88e6xxx: Enable PTP receive for mv88e6390](http://lore.kernel.org/netdev/20230113151258.196828-1-kurt@linutronix.de/) + + The switch receives management traffic such as STP and LLDP. However, PTP + messages are not received, only transmitted.
+ + Ideally, the switch would trap all PTP messages to the management CPU. This + particular switch has a PTP block which identifies PTP messages and traps them + to a dedicated port. There is a register to program this destination. This is + not used at the moment. + +* [v1: 5.10: mt76: move mt76_init_tx_queue in common code](http://lore.kernel.org/netdev/20230113150445.39286-1-n.zhandarovich@fintech.ru/) + + My apologies, I should have explained my reasoning better. + + My issue with the 5.10 version of mt7615_init_tx_queues() in drivers/net/wireless/mediatek/mt76/mt7615/dma.c is that the return value of the final call to mt7615_init_tx_queue() is not taken into account + when returning the result of mt7615_init_tx_queues(). So, if the last mt7615_init_tx_queue() fails (due to memory issues, for instance), the parent function will still erroneously return 0. + +* [v1: ARM: imx: make Ethernet refclock configurable](http://lore.kernel.org/netdev/20230113142718.3038265-1-o.rempel@pengutronix.de/) + + Most i.MX SoC variants have a configurable FEC/Ethernet reference clock + used by the RMII specification. This functionality is located in the + general purpose registers (GRPx) and until now was not implemented as part of the SoC clock tree. + +* [v1: wireless/at76c50x-usb.c: Use devm_kmalloc replaces kmalloc](http://lore.kernel.org/netdev/20230113141231.71892-1-sensor1010@163.com/) + + Use devm_kmalloc to replace kmalloc. + +* [v2: net-next: net: use kmem_cache_free_bulk in kfree_skb_list](http://lore.kernel.org/netdev/167361788585.531803.686364041841425360.stgit@firesoul/) + + The kfree_skb_list function walks SKBs (via skb->next) and frees them + individually to the SLUB/SLAB allocator (kmem_cache). It is more + efficient to bulk free them via the kmem_cache_free_bulk API. + + The netstack NAPI fastpath already uses the kmem_cache bulk alloc and free APIs for SKBs.
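  The kfree_skb_list change replaces one free call per SKB with batched frees. The following is a userspace sketch of that pattern under stated assumptions. struct buf stands in for sk_buff, and bulk_free() stands in for kmem_cache_free_bulk() (here it simply calls free() on each pointer). The batch size of 16 and all names are hypothetical.

  ```c
  #include <stddef.h>
  #include <stdlib.h>

  #define BULK_SIZE 16   /* assumed batch size */

  struct buf {
      struct buf *next;   /* singly linked, like skb->next */
  };

  static size_t bulk_calls;   /* number of batched free operations */

  /* Stand-in for kmem_cache_free_bulk(): release n objects in one call. */
  static void bulk_free(size_t n, void **objs)
  {
      for (size_t i = 0; i < n; i++)
          free(objs[i]);
      bulk_calls++;
  }

  /* Walk the list, collect pointers, and free them BULK_SIZE at a time. */
  static void free_buf_list(struct buf *head)
  {
      void *batch[BULK_SIZE];
      size_t n = 0;

      while (head) {
          struct buf *next = head->next;   /* read before head may be freed */

          batch[n++] = head;
          if (n == BULK_SIZE) {
              bulk_free(n, batch);
              n = 0;
          }
          head = next;
      }
      if (n)
          bulk_free(n, batch);   /* flush the final partial batch */
  }
  ```

  Freeing a 40-element list this way takes three bulk calls (16 + 16 + 8) instead of 40 individual ones.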
+ + +* [v1: wireless/at76c50x-usb.c: Use devm_kzalloc replaces kmalloc](http://lore.kernel.org/netdev/20230113133503.58336-1-sensor1010@163.com/) + + Use devm_kzalloc to replace kmalloc. + +* [v3: bpf-next: bpf: Add ipip6 and ip6ip decap support for bpf_skb_adjust_room()](http://lore.kernel.org/netdev/cover.1673574419.git.william.xuanziyang@huawei.com/) + + Add ipip6 and ip6ip decap support for bpf_skb_adjust_room(). + The main use case is using cls_bpf on the ingress hook to decapsulate + IPv4 over IPv6 and IPv6 over IPv4 tunnel packets. + + Also add ipip6 and ip6ip decap testcases to verify that + bpf_skb_adjust_room() correctly decapsulates ipip6 and ip6ip tunnel packets. + +* [v4: net-next: virtio-net: support multi buffer xdp](http://lore.kernel.org/netdev/20230113080016.45505-1-hengqi@linux.alibaba.com/) + + Currently, virtio net only supports xdp for single-buffer packets + or linearized multi-buffer packets. This patchset supports xdp for + multi-buffer packets, so a larger MTU can be used if xdp sets xdp.frags. This does not affect single buffer handling. + +* [v1: net-next: r8169: reset bus if NIC isn't accessible after tx timeout](http://lore.kernel.org/netdev/85f2b5e5-ea85-3a84-1a5e-c4f84897ac04@gmail.com/) + + ASPM issues may result in the NIC not being accessible any longer. + In this case disabling ASPM may not work. Therefore detect this case + by checking whether register reads return + ~0, and try to make the + NIC accessible again by resetting the secondary bus. + +* [v2: net: octeontx2-pf: Avoid use of GFP_KERNEL in atomic context](http://lore.kernel.org/netdev/20230113061902.6061-1-gakula@marvell.com/) + + Using GFP_KERNEL in a preemption-disabled context causes a warning + when CONFIG_DEBUG_ATOMIC_SLEEP is enabled. + + To avoid using GFP_ATOMIC for memory allocation, disable preemption + after all memory allocation is done.
+ + Fixes: 4af1b64f80fb ("octeontx2-pf: Fix lmtst ID used in aura free") + +* [v1: net: sched: gred: prevent races when adding offloads to stats](http://lore.kernel.org/netdev/20230113044137.1383067-1-kuba@kernel.org/) + + Naresh reports seeing a warning that gred is calling + u64_stats_update_begin() with preemption enabled. + Arnd points out it's coming from _bstats_update(). + + We should be holding the qdisc lock when writing + to stats, since they are also updated from the datapath. + +* [v2: Add eqos and fec support for imx93](http://lore.kernel.org/netdev/20230113033347.264135-1-xiaoning.wang@nxp.com/) + + This patchset adds imx93 support to the dwmac-imx glue driver. + There are some changes to the GPR implementation. + It also adds fec and eqos nodes to the imx93 dts. + +* [v1: net-next: add some vf fault detect patch for hns](http://lore.kernel.org/netdev/20230113020829.48451-1-lanhao@huawei.com/) + + Currently the hns3 driver supports the vf fault detect feature. Patch #1 + adds hns3 vf fault detect cap bit support. Patch #2 adds vf fault processing in hns3 ras. + +* [v3: net: sch_htb: Avoid grafting on htb_destroy_class_offload when destroying htb](http://lore.kernel.org/netdev/20230113005528.302625-1-rrameshbabu@nvidia.com/) + + Peek at the old qdisc and graft only when deleting a leaf class in the htb, + rather than when deleting the htb itself. Do not peek at the qdisc of the + netdev queue when destroying the htb. The caller may already have grafted a new qdisc that is not part of the htb structure being destroyed. + +#### Security Hardening + +* [v3: firmware: coreboot: Check size of table entry and split memcpy](http://lore.kernel.org/linux-hardening/20230112230312.give.446-kees@kernel.org/) + + The memcpy() of the data following a coreboot_table_entry couldn't + be evaluated by the compiler under CONFIG_FORTIFY_SOURCE. To make it + easier to reason about, add an explicit flexible array member to struct + coreboot_device so the entire entry can be copied at once.
+ Additionally, validate the sizes before copying. + +* [v3: kmod: harden user namespaces with new kernel.ns_modules_allowed sysctl](http://lore.kernel.org/linux-hardening/20230112131911.7684-1-vegard.nossum@oracle.com/) + + This mitigation obviously offers no protection if the vulnerable module is + already loaded, but for many of these exploits the vast majority of users + will never actually load or use these modules on purpose; in other words, + for the vast majority of users, this would block exploits for the above list of vulnerabilities. + +* [v1: pstore/ram: Rework logic for detecting ramoops](http://lore.kernel.org/linux-hardening/1673428065-22356-1-git-send-email-quic_mojha@quicinc.com/) + + The reserved memory region for ramoops is assumed to be at a fixed + and known location when read from the devicetree. This is not desirable + in environments where it is preferred for the region to be dynamically + allocated at runtime, as opposed to being fixed at compile time. + +* [v1: next: x86/fpu: Replace zero-length array with flexible-array member](http://lore.kernel.org/linux-hardening/Y7zCFpa2XNs%2Fo9YQ@work/) + + Zero-length arrays are deprecated[1] and we are moving towards + adopting C99 flexible-array members instead. So, replace the zero-length + array declaration in struct xregs_state with a flex-array member. + +* [v1: next: RDMA/erdma: Replace zero-length arrays with flexible-array members](http://lore.kernel.org/linux-hardening/Y7zCBqwC1LtabRJ9@work/) + + Zero-length arrays are deprecated[1] and we are moving towards + adopting C99 flexible-array members instead. So, replace zero-length + arrays, in a couple of structures, with flex-array members. + +* [v1: next: nvmem: u-boot-env: replace zero-length array with flexible-array member](http://lore.kernel.org/linux-hardening/Y7zB+s2AC6O+CRR+@work/) + + Zero-length arrays are deprecated[1] and we are moving towards + adopting C99 flexible-array members instead. So, replace the zero-length
So, replace zero-length + array declaration in struct u_boot_env_image_broadcom with flex-array member. + +* [v1: next: habanalabs: Replace zero-length arrays with flexible-array members](http://lore.kernel.org/linux-hardening/Y7zB4z5cxpFkPXKV@work/) + + Zero-length arrays are deprecated[1] and we are moving towards + adopting C99 flexible-array members instead. So, replace zero-length + arrays in a couple of structures with flex-array members. + +* [v1: next: cifs: Replace zero-length arrays with flexible-array members](http://lore.kernel.org/linux-hardening/Y7zBtCZ%2FeRJCpjBf@work/) + + Zero-length arrays are deprecated[1] and we are moving towards + adopting C99 flexible-array members instead. So, replace zero-length + arrays in a couple of structures with flex-array members. + + This helps with the ongoing efforts to tighten the FORTIFY_SOURCE + routines on memcpy() and help us make progress towards globally + enabling -fstrict-flex-arrays=3 [2]. + +* [v6: arm64: dts: qcom: sm6125: UFS and xiaomi-laurel-sprout support](http://lore.kernel.org/linux-hardening/20230108195336.388349-1-they@mint.lgbt/) + + Introduce Universal Flash Storage support on SM6125 and add support for the Xiaomi Mi A3 based on the former platform. Uses the name xiaomi-laurel-sprout instead of the official codename (laurel_sprout) due to naming limitations in the kernel. + +#### 异步 IO + +* [v1: liburing: liburing.map: Export `io_uring_{enable_rings,register_restrictions}`](http://lore.kernel.org/io-uring/20230114035405.429608-1-ammar.faizi@intel.com/) + + When adding these two functions, Stefano didn't add + io_uring_enable_rings() and io_uring_register_restrictions() to + liburing.map. It causes a linking problem. Add them to liburing.map. 
+ + +* [v1: io_uring: Add NULL checks for current->io_uring](http://lore.kernel.org/io-uring/20230111101907.600820-1-baijiaju1990@gmail.com/) + + As described in a previous commit 998b30c3948e, current->io_uring could + be NULL, and thus a NULL check is required for this variable. + +* [v1: io_uring/poll: add hash if ready poll request can't complete inline](http://lore.kernel.org/io-uring/559d2a90-25c5-626c-c643-25a86cf15e6a@kernel.dk/) + + If we don't, then we may lose access to it completely, leading to a + request leak. This will eventually stall the ring exit process as well. + + Fixes: 49f1c68e048f ("io_uring: optimise submission side poll_refs") + +#### Rust For Linux + +* [v3: scripts: Exclude Rust CUs with pahole](http://lore.kernel.org/rust-for-linux/20230111152050.559334-1-yakoyoku@gmail.com/) + + Version 1.24 of pahole has the capability to exclude compilation units + (CUs) of specific languages [1][2]. As of this writing, Rust is not + supported by pahole, and if it's used with a build that has + BTF debugging enabled, it results in malformed kernel and module + binaries [3]. So it's better for pahole to exclude Rust CUs until + support for it arrives. + +* [v1: rust: print: avoid evaluating arguments in `pr_*` macros in `unsafe` blocks](http://lore.kernel.org/rust-for-linux/20230109204912.539790-1-ojeda@kernel.org/) + + At the moment it is possible to perform unsafe operations in + the arguments of `pr_*` macros since they are evaluated inside + an `unsafe` block: + + let x = &10u32 as *const u32; + pr_info!("{}", *x); + + In other words, this is a soundness issue. + + Fix it so that it requires an explicit `unsafe` block. + +* [Fwd: v1: bpf: scripts: Exclude Rust CUs with pahole](http://lore.kernel.org/rust-for-linux/0ca4ad02-af27-0d1f-8750-1ff6b34e8d2a@gmail.com/) + + I see, I was making a dependency on `auto.conf` in `pahole-flags.sh` but + the former gets generated after the latter is called, so that's the + reason behind the `grep` errors.
+ Sent a new version of the patch. + +* [v2: kbuild: rust: move rust/target.json to scripts/](http://lore.kernel.org/rust-for-linux/20230107094545.3384745-1-masahiroy@kernel.org/) + + scripts/ is a better place to generate files used treewide. + + With target.json moved to scripts/, you do not need to add target.json + to no-clean-files or MRPROPER_FILES. + + 'make clean' does not visit scripts/, but 'make mrproper' does. + +#### BPF + +* [v1: bpf: Add CONFIG_BPF_HELPER_STRICT](http://lore.kernel.org/bpf/SJ0PR04MB7248C599DE6F006F94997CF180C39@SJ0PR04MB7248.namprd04.prod.outlook.com/) + + In a container environment, eBPF helpers could be used maliciously to + leak information, cause DoS, or even escape from containers. + CONFIG_BPF_HELPER_STRICT is intended as a mitigation for this. + Related Link: https://rolandorange.zone/report.html + +* [v1: bpftool: Always disable stack protection for clang](http://lore.kernel.org/bpf/74cd9d2e-6052-312a-241e-2b514a75c92c@applied-asynchrony.com/) + + When the clang toolchain has stack protection enabled in order to be consistent + with gcc (which just happens to be the case on Gentoo), the bpftool build fails + +* [v1: tools/resolve_btfids: Install subcmd headers](http://lore.kernel.org/bpf/20230112004024.1934601-1-irogers@google.com/) + + Previously tools/lib/subcmd was added to the include path; switch to + installing the headers and then including from that directory. This + avoids dependencies on headers internal to tools/lib/subcmd. Add the + missing subcmd directory to the affected #include. + +* [v7: bpf-next: xdp: hints via kfuncs](http://lore.kernel.org/bpf/20230112003230.3779451-1-sdf@google.com/) + + Please see the first patch in the series for the overall + design and use-cases.
+ + See the following email from Toke for the per-packet metadata overhead: + https://lore.kernel.org/bpf/20221206024554.3826186-1-sdf@google.com/T/#m49d48ea08d525ec88360c7d14c4d34fb0e45e798 + +* [v1: Add and use run_command_strbuf](http://lore.kernel.org/bpf/20230110222003.1591436-1-irogers@google.com/) + + It is commonly useful to run a command using "/bin/sh -c" (like popen) + and to place the output in a string. Move strbuf to libapi, add a new + run_command that places output in a strbuf, then use it in help and + llvm in perf. Some small strbuf efficiency improvements are + included. Whilst adding a new function would increase lines of code, + by sharing two similar usages in perf llvm and perf help, the overall + lines of code are moderately reduced. + +* [v1: bpf-next: bpftool: Add missing quotes to libbpf bootstrap submake vars](http://lore.kernel.org/bpf/20230110014504.3120711-1-james.hilliard1@gmail.com/) + + When passing compiler variables like CC=$(HOSTCC) to a submake + we must ensure the variable is quoted in order to handle cases + where $(HOSTCC) may be multiple binaries. + + For example, when using ccache, $HOSTCC may be: + "/usr/bin/ccache /usr/bin/gcc" + + If we pass CC without quotes like CC=$(HOSTCC), only the first + "/usr/bin/ccache" part will be assigned to the CC variable, which + will cause an error due to dropping the "/usr/bin/gcc" part of + the variable in the submake invocation. + +* [v1: Assume libbpf 1.0 in build](http://lore.kernel.org/bpf/20230109203424.1157561-1-irogers@google.com/) + + Rather than build a binary that would fail at runtime, it is + preferable to just build libbpf statically and link against + that. The static version is in the kernel tools tree and newer than 1.0. + + These patches change the libbpf test to only pass when at least + version 1.0 is installed, then remove the conditional build and feature logic.
+ +* [v1: bpf-next: bpf: Do not allow to load sleepable BPF_TRACE_RAW_TP program](http://lore.kernel.org/bpf/20230109143716.2332415-1-jolsa@kernel.org/) + + Currently any tracing program can be loaded as sleepable, but BPF_TRACE_RAW_TP programs can't sleep. Make the check explicit for tracing program attach types, so that a sleepable BPF_TRACE_RAW_TP program fails to load. + + Update the verifier error message to mention iter programs as well. + +* [v1: libbpf: resolve kernel function name optimization for kprobe](http://lore.kernel.org/bpf/20230109094247.1464856-1-imagedong@tencent.com/) + + The function name in the kernel may be changed by the compiler. For example, + the function 'ip_rcv_core' can be compiled to 'ip_rcv_core.isra.0'. + + This kind of optimization can happen to any kernel function, so we should handle this case. + +* [Fwd: v1: bpf: scripts: Exclude Rust CUs with pahole](http://lore.kernel.org/bpf/0ca4ad02-af27-0d1f-8750-1ff6b34e8d2a@gmail.com/) + + I see, I was making a dependency on `auto.conf` in `pahole-flags.sh` but + the former gets generated after the latter is called, so that's the + reason behind the `grep` errors. Sent a new version of the patch. + +### Related Technology Updates + +#### Qemu + +* [v7: hw/riscv: clear kernel_entry high bits with 32bit CPUs](http://lore.kernel.org/qemu-devel/20230113171805.470252-1-dbarboza@ventanamicro.com/) + + In this version I followed Bin Meng's suggestion and reverted patch 1 + back to what it was in v5, acks included, and added a new patch + (3) to fix the problem detected with the Xvisor use case. I believe this + reflects that there is nothing particularly wrong with what we + did in the v5 patch and we're going an extra mile to fix what, at first + glance, is a bug somewhere else. + +* [v5: riscv: Allow user to set the satp mode](http://lore.kernel.org/qemu-devel/20230113103453.42776-1-alexghiti@rivosinc.com/) + + This introduces new properties to allow the user to set the satp mode; + see patch 1 for the full syntax.
+ +* [v6: hw/riscv: consolidate kernel init in riscv_load_kernel()](http://lore.kernel.org/qemu-devel/20230112223444.484879-1-dbarboza@ventanamicro.com/) + + The first 9 patches are already available in riscv-to-apply.next. + + The only change made was in patch 10, where we're now handling the case + where load_elf_ram_sym is padding the resulting kernel_entry with 1s for + 32 bits. Patch 11 is unchanged. + +* [v1: target/riscv: Use TARGET_FMT_lx for env->mhartid](http://lore.kernel.org/qemu-devel/20230109152655.340114-1-bmeng@tinylab.org/) + + env->mhartid is currently cast to long before being printed, which drops + the high 32 bits for rv64 on a 32-bit host. Use TARGET_FMT_lx instead. + +## 20230109: Issue 28 + +### Kernel Updates + +#### RISC-V Architecture Support + +* [v6: -next: riscv: Optimize function trace](http://lore.kernel.org/linux-riscv/20230107133549.4192639-1-guoren@kernel.org/) + + The previous ftrace detour implementation fc76b8b8011 ("riscv: Using + PATCHABLE_FUNCTION_ENTRY instead of MCOUNT") contains three problems. + + Patches 1, 2, 3 fix the above problems. Patches 4, 5, 6, 7 are features based on the reduced detour code + patch; we include them in the series for testing and maintenance. + +* [v13: -next: riscv: Add GENERIC_ENTRY support](http://lore.kernel.org/linux-riscv/20230107113838.3969149-1-guoren@kernel.org/) + + The patches convert riscv to use the generic entry infrastructure from + kernel/entry/*. They also optimize entry.S with a new .macro and merge + ret_from_kernel_thread into ret_from_fork. + +* [v6: RISC-V non-coherent function pointer based cache management operations + non-coherent DMA support for AX45MP](http://lore.kernel.org/linux-riscv/20230106185526.260163-1-prabhakar.mahadev-lad.rj@bp.renesas.com/) + + RISC-V non-coherent function pointer based cache management operations: + + This v6 version of the patch series adds support for using function pointers for CMO + and switches the current CMO implementations for zicbom and T-HEAD to use function + pointers.
+ + non-coherent DMA support for AX45MP: + + On the Andes AX45MP core, cache coherency is a specification option, so it + may not be supported. In this case DMA will fail. To work around this + issue, this patch series does the following: + +* [v1: MAINTAINERS: add an IRC entry for RISC-V](http://lore.kernel.org/linux-riscv/20230106125344.1685266-1-conor@kernel.org/) + + I remember being told "Just ping me on IRC" about patches, but googling + at the time was not helpful. #riscv on libera is not Linux specific, + but a bunch of contributors etc. do hang out there. + Add a link to the maintainers entry to help others find it in the future! + +* [v1: riscv: Introduce system suspend support](http://lore.kernel.org/linux-riscv/20230106113216.443057-1-ajones@ventanamicro.com/) + + Booting with an OpenSBI including the RFC series[1] implementing the + draft proposal for SBI system suspend[2], we can add system suspend support to + Linux. This support implements "suspend-to-RAM", which means when a + kernel is built with CONFIG_SUSPEND, 'echo mem > /sys/power/state' will + initiate a suspension. + + This has only been tested on QEMU using the OpenSBI system suspend + test. The test just waits 5 seconds and then resumes. To truly use + system suspend, a platform must have a low-level firmware implementation + and provide at least one wake-up event, such as from a wakeup-capable + RTC alarm, to resume. + +* [v1: RISC-V Hibernation Support](http://lore.kernel.org/linux-riscv/20230106060535.104321-1-jeeheng.sia@starfivetech.com/) + + This series adds RISC-V hibernation (suspend-to-disk) support. + Low-level arch functions were created to support hibernation. + swsusp_arch_suspend() relies on code from __cpu_suspend_enter() to write + the CPU state onto the stack, then calls swsusp_save() to save the memory + image.
+ +* [v3: Add Ethernet driver for StarFive JH7110 SoC](http://lore.kernel.org/linux-riscv/20230106030001.1952-1-yanhong.wang@starfivetech.com/) + + This series adds Ethernet support for the StarFive JH7110 RISC-V SoC. The series + includes the MAC driver. The MAC version is dwmac-5.20 (from Synopsys DesignWare). + For more information and support, you can visit the RVspace wiki[1]. + +* [v1: Support using physical addresses for RISC-V CMO](http://lore.kernel.org/linux-riscv/20230104074146.578485-1-uwu@icenowy.me/) + + Although the official Zicbom extension only supports virtual addresses, + some vendor-specific extensions, e.g. Xtheadcmo, support using the + physical address directly. + + This patchset tries to provide a CMO alternative macro variant that is + fed with both the VA and the PA (the one actually used can be picked at runtime), + implement it with the PA on T-Head cores, and utilize this variant in some + situations where the PA is easily accessible. + +* [v4: arch: rename all internal names __xchg to __arch_xchg](http://lore.kernel.org/linux-riscv/20230105095426.2163354-1-andrzej.hajda@intel.com/) + + __xchg will be used for a non-atomic xchg macro. + +* [v2: bpf-next: bpf, x86: Simplify the parsing logic of structure parameters](http://lore.kernel.org/linux-riscv/20230105035026.3091988-1-pulehui@huaweicloud.com/) + + The extra_nregs of structure parameters and nr_args can be + added directly at the beginning, and a flip flag is used + to identify structure parameters. Meanwhile, rename + some variables so that they make more sense. + +* [v2: riscv: Move call to init_cpu_topology() to later initialization stage](http://lore.kernel.org/linux-riscv/20230105033705.3946130-1-leyfoon.tan@starfivetech.com/) + + If "capacity-dmips-mhz" is present in a CPU DT node, + topology_parse_cpu_capacity() will fail to allocate memory. + ARM64, with which this code path is shared, does not call + topology_parse_cpu_capacity() until later in boot where memory allocation + is available.
+ + Move init_cpu_topology(), which calls topology_parse_cpu_capacity(), to a + later initialization stage, to match ARM64. + +* [v4: arch_topology: Build cacheinfo from primary CPU](http://lore.kernel.org/linux-riscv/20230104183033.755668-1-pierre.gondois@arm.com/) + +* [v1: dt-bindings: Add a cpu-capacity property for RISC-V](http://lore.kernel.org/linux-riscv/20230104180513.1379453-1-conor@kernel.org/) + + Ever since RISC-V started using the generic arch topology code, the code + paths for cpu-capacity have been there, but there's no binding defined to + actually convey the information. Defining the same property as used on + arm seems to be the only logical thing to do, so do it. + +* [v1: Upstream kvx Linux port](http://lore.kernel.org/linux-riscv/20230103164359.24347-1-ysionneau@kalray.eu/) + + This patch series adds support for the kv3-1 CPU architecture of the kvx family + found in the Coolidge (aka MPPA3-80) SoC of Kalray. + + This is an RFC, since kvx support is not yet upstreamed into gcc/binutils; + therefore this patch series cannot be merged into Linux for now. + +* [v2: Linux RISC-V AIA Support](http://lore.kernel.org/linux-riscv/20230103141409.772298-1-apatel@ventanamicro.com/) + + The RISC-V AIA specification is now frozen as per the RISC-V International + process. The latest frozen specification can be found at: + https://github.com/riscv/riscv-aia/releases/download/1.0-RC1/riscv-interrupts-1.0-RC1.pdf + + This series adds the required Linux irqchip drivers for AIA and depends on + the recent "RISC-V IPI Improvements". + (Refer to https://lore.kernel.org/lkml/20221101143400.690000-1-apatel@ventanamicro.com/t/) + + To test this series, use QEMU v7.2 (or higher) and OpenSBI v1.2 (or higher).
+ +* [v16: RISC-V IPI Improvements](http://lore.kernel.org/linux-riscv/20230103141221.772261-1-apatel@ventanamicro.com/) + + This series aims to improve IPI support in Linux RISC-V in the following ways: + 1) Treat IPIs as normal per-CPU interrupts instead of having custom RISC-V + specific hooks. This also makes Linux RISC-V IPI support aligned with + other architectures. + 2) Remote TLB flushes and icache flushes should prefer local IPIs instead + of SBI calls whenever we have specialized hardware (such as RISC-V AIA + IMSIC and RISC-V SWI) which allows S-mode software to directly inject + IPIs without any assistance from M-mode runtime firmware. + +* [v6: Improve CLOCK_EVT_FEAT_C3STOP feature setting](http://lore.kernel.org/linux-riscv/20230103141102.772228-1-apatel@ventanamicro.com/) + + This series improves the RISC-V timer driver to set the CLOCK_EVT_FEAT_C3STOP + feature based on RISC-V platform capabilities. + +* [v2: bpf-next: Support bpf trampoline for RV64](http://lore.kernel.org/linux-riscv/20230103090756.1993820-1-pulehui@huaweicloud.com/) + + The BPF trampoline is critical infrastructure of the bpf + subsystem, acting as a mediator between kernel functions + and BPF programs. Numerous important features, such as + using eBPF programs for zero-overhead kernel introspection, + rely on this key component. We can't wait to support the bpf + trampoline on RV64. The implementation of the bpf trampoline + closely follows x86 and arm64 to ease future development.
+ + +* [Patch "riscv: add support for TIF_NOTIFY_SIGNAL" has been added to the 5.10-stable tree](http://lore.kernel.org/linux-riscv/1672731390244212@kroah.com/) + + This is a note to let you know that I've just added the patch titled + + riscv: add support for TIF_NOTIFY_SIGNAL + + to the 5.10-stable tree which can be found at: + http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary + +* [v1: dt-bindings: Introduce dual-link panels & panel-vendors](http://lore.kernel.org/linux-riscv/20230103064615.5311-1-a-bhatia1@ti.com/) + + Microtips Technology Solutions USA, and Lincoln Technology Solutions are + 2 display panel vendors, and the first 2 patches add their vendor + prefixes. + +* [v12: -next: riscv: Add GENERIC_ENTRY support](http://lore.kernel.org/linux-riscv/20230103033531.2011112-1-guoren@kernel.org/) + + The patches convert riscv to use the generic entry infrastructure from + kernel/entry/*. Some optimization for entry.S with new .macro and merge + ret_from_kernel_thread into ret_from_fork. + +* [v1: Temperature sensor support for StarFive JH7110 RISC-V SoC](http://lore.kernel.org/linux-riscv/20230103013145.9570-1-hal.feng@starfivetech.com/) + + This patch series adds temperature sensor support for StarFive JH7110 SoC. + The last two patches depend on series + +* [v2: riscv: dts: renesas: rzfive-smarc-som: Enable OSTM nodes](http://lore.kernel.org/linux-riscv/20230102222233.274021-1-prabhakar.mahadev-lad.rj@bp.renesas.com/) + + Enable OSTM{1,2} nodes on RZ/Five SMARC SoM. + + Note, OSTM{1,2} nodes are enabled in the RZ/G2UL SMARC SoM DTSI [0] hence + deleting the disabled nodes from RZ/Five SMARC SoM DTSI enables it here + too as we include in RZ/Five SMARC SoM DTSI. + +* [v3: dt-bindings: riscv: add SBI PMU event mappings](http://lore.kernel.org/linux-riscv/20230102165551.1564960-1-conor@kernel.org/) + + The SBI PMU extension requires a firmware to be aware of the event to + counter/mhpmevent mappings supported by the hardware. 
OpenSBI may use + DeviceTree to describe the PMU mappings. This binding is currently + described in markdown in OpenSBI (since v1.0 in Dec 2021) & used by QEMU + since v7.2.0. + + Import the binding for use while validating dtb dumps from QEMU and + upcoming hardware (e.g. JH7110 SoC) that will make use of the event + mapping. + +* [v1: riscv, kprobes: Stricter c.jr/c.jalr decoding](http://lore.kernel.org/linux-riscv/20230102160748.1307289-1-bjorn@kernel.org/) + + In the compressed instruction extension, c.jr, c.jalr, c.mv, and c.add + are encoded the following way (each instruction is 16b): + + 100 0 rs1[4:0]!=0 00000 10: c.jr + 100 1 rs1[4:0]!=0 00000 10: c.jalr + 100 0 rd[4:0]!=0 rs2[4:0]!=0 10: c.mv + 100 1 rd[4:0]!=0 rs2[4:0]!=0 10: c.add + + The following logic is used to decode c.jr and c.jalr: + + insn & 0xf007 == 0x8002 => instruction is a c.jr + insn & 0xf007 == 0x9002 => instruction is a c.jalr + + When 0xf007 is used to mask the instruction, c.mv can be incorrectly + decoded as c.jr, and c.add as c.jalr. + + Correct the decoding by changing the mask from 0xf007 to 0xf07f. + +* [v1: RISC-V: define RUNTIME_DISCARD_EXIT](http://lore.kernel.org/linux-riscv/20230102124936.1363533-1-conor@kernel.org/) + + Masahiro noted: + > arch/riscv/kernel/vmlinux.lds.S clearly says: + > /* we have to discard exit text and such at runtime, not link time */ + > [...] + > so riscv should define RUNTIME_DISCARD_EXIT like x86, arm64. + + As things stand, no ill comes of this, but if "DISCARDS" were to be + re-ordered in the linker script, linking would fail. + Do as suggested by Masahiro and define RUNTIME_DISCARD_EXIT.
+ +#### Process Scheduling + +* [v1: sched/topology: Add __init for sched_init_domains](http://lore.kernel.org/lkml/20230105014943.9857-1-huangbing775@126.com/) + + sched_init_domains is only used during initialization. + +#### Memory Management + +* [v7: DEPT(Dependency Tracker)](http://lore.kernel.org/linux-mm/1673235231-30302-1-git-send-email-byungchul.park@lge.com/) + + Just for those who want to try the latest version of DEPT. + +* [v1: mm-unstable: selftests/mm: convert missing vm->mm changes](http://lore.kernel.org/linux-mm/20230107230643.252273-1-sj@kernel.org/) + + Commit 6b380799d251 ("selftests/vm: rename selftests/vm to + selftests/mm") in mm-unstable is missing some files that need to be + updated for the renaming. This commit adds the changes. + +* [v4: iov_iter: Add extraction helpers](http://lore.kernel.org/linux-mm/167305160937.1521586.133299343565358971.stgit@warthog.procyon.org.uk/) + + Here are patches to clean up some uses of READ/WRITE and ITER_SOURCE/DEST, + patches to provide support for extracting pages from an iov_iter, and a + patch to use the primary extraction function in the block layer bio code. Could + you take a look? + +* [v3: Pages not released from memblock to the buddy allocator](http://lore.kernel.org/linux-mm/01010185892dd125-7738e4af-55c6-43b6-9cd9-d52dfea959d9-000000@us-west-2.amazonses.com/) + +* [v1: mm-unstable: mm: introduce folio_is_pfmemalloc](http://lore.kernel.org/linux-mm/20230106215251.599222-1-sidhartha.kumar@oracle.com/) + + Add a folio equivalent for page_is_pfmemalloc. This removes two instances + of page_is_pfmemalloc(folio_page(folio, 0)) so the folio can be used + directly. + +* [v1: add folio_headpage() macro](http://lore.kernel.org/linux-mm/20230106174028.151384-1-sj@kernel.org/) + + The standard idiom for getting the head page of a given folio is + '&folio->page'. It is efficient and safe even if the folio is NULL, + because the offset of the page field in folio is zero.
However, it makes + the code not that easy to understand at the first glance, especially the + NULL safety. Also, sometimes people forget the idiom and use + 'folio_page(folio, 0)' instead. To make it easier to read and remember, + add a new macro function called 'folio_headpage()' with the NULL case + explanation. Then, replace the 'folio_page(folio, 0)' calls with + 'folio_headpage(folio)'. + +* [v1: shmem: optimize shmem_huge_enabled() and shmem_is_huge() when !CONFIG_TRANSPARENT_HUGEPAGE](http://lore.kernel.org/linux-mm/20230105230417.966438-1-ydroneaud@opteya.com/) + + When CONFIG_TRANSPARENT_HUGEPAGE is not set, shmem_is_huge() is not needed + outside of shmem.c. + +* [v1: memremap: Replace 0-length array with flexible array](http://lore.kernel.org/linux-mm/20230105220151.never.343-kees@kernel.org/) + + Zero-length arrays are deprecated[1]. Replace struct ethtool_rxnfc's + "rule_locs" 0-length array with a flexible array. Detected with GCC 13, + using -fstrict-flex-arrays=3 + +* [v2: Split netmem from struct page](http://lore.kernel.org/linux-mm/20230105214631.3939268-1-willy@infradead.org/) + + The MM subsystem is trying to reduce struct page to a single pointer. + The first step towards that is splitting struct page by its individual + users, as has already been done with folio and slab. This patchset does + that for netmem which is used for page pools. + + There are some relatively significant reductions in kernel text size + from these changes. They don't appear to affect performance at all, + but it's nice to save a bit of memory. + +* [v1: Based on latest mm-unstable (85b44c25cd1e).](http://lore.kernel.org/linux-mm/20230105101844.1893104-1-jthoughton@google.com/) + + This series introduces the concept of HugeTLB high-granularity mapping + (HGM). This series teaches HugeTLB how to map HugeTLB pages at + high-granularity, similar to how THPs can be PTE-mapped. 
+ +#### File Systems + +* [v6: Turn iomap_page_ops into iomap_folio_ops](http://lore.kernel.org/linux-fsdevel/20230108194034.1444764-1-agruenba@redhat.com/) + + Here's an updated version of this patch queue. Changes since v5 [*]: + + * A new iomap-internal __iomap_get_folio() helper was added. + + * The previous iomap-internal iomap_put_folio() helper was renamed to + __iomap_put_folio() to mirror __iomap_get_folio(). + + * The comment describing struct iomap_folio_ops was still referring to + pages instead of folios in two places. + + Is this good enough for iomap-for-next now, please? + +* [v3: pipe: use __pipe_{lock,unlock} instead of spinlock](http://lore.kernel.org/linux-fsdevel/20230107012324.30698-1-zhanghongchen@loongson.cn/) + + Using a spinlock in pipe_read/write costs too much time. IMO, + pipe->{head,tail} can be protected by __pipe_{lock,unlock}. + On the other hand, we can use __pipe_{lock,unlock} to protect + the pipe->{head,tail} in pipe_resize_ring and + post_one_notification. + +* [v1: erofs: support page cache sharing between EROFS images in fscache mode](http://lore.kernel.org/linux-fsdevel/20230106125330.55529-1-jefflexu@linux.alibaba.com/) + + Since v6.1, EROFS already supports chunk deduplication across different images to + minimize disk usage. + +* [v2: filelock: move file locking definitions to separate header file](http://lore.kernel.org/linux-fsdevel/20230105211937.1572384-1-jlayton@kernel.org/) + + The file locking definitions have lived in fs.h since the dawn of time, + but they are only used by a small subset of the source files that + include it. + + Move the file locking definitions to a new header file, and add the + appropriate #include directives to the source files that need them. By + doing this we trim down fs.h a bit and limit the amount of rebuilding + that has to be done when we make changes to the file locking APIs.
+ +* [v1: filesystems: Simplify if conditional statements](http://lore.kernel.org/linux-fsdevel/20230105061831.3516-1-zeming@nfschina.com/) + + When the *p pointer is NULL, assign a value to res; otherwise, do not + execute the content of the conditional statement block. + +* [v5: Convert to filemap_get_folios_tag()](http://lore.kernel.org/linux-fsdevel/20230104211448.4804-1-vishal.moola@gmail.com/) + + This patch series replaces find_get_pages_range_tag() with + filemap_get_folios_tag(). This also allows the removal of multiple + calls to compound_head() throughout. + It also makes a good chunk of the straightforward conversions to folios, + and takes the opportunity to introduce a function that grabs a folio + from the pagecache. + + I've run xfstests on xfs, btrfs, ext4, f2fs, and nilfs2, but more testing may + be beneficial. The page-writeback and filemap changes implicitly work. Still + looking for review of cifs, gfs2, and ext4. + +* [v1: xfstests: add fuse support](http://lore.kernel.org/linux-fsdevel/20230104193932.984531-1-jakobunt@gmail.com/) + + This allows using any fuse filesystem that can be mounted with + + mount -t fuse.$FUSE_SUBTYP ... + +* [v1: fs: don't allocate blocks beyond EOF from __mpage_writepage](http://lore.kernel.org/linux-fsdevel/20230103104430.27749-1-jack@suse.cz/) + + When __mpage_writepage() is called for a page beyond EOF, it will go and + allocate all blocks underlying the page. This is not only unnecessary, + but blocks can also get leaked this way (e.g. if a page beyond EOF is marked + dirty but in the end the write fails and i_size is not extended). + +* [v1: mm-unstable: mm/nommu: don't use VM_MAYSHARE for MAP_PRIVATE mappings](http://lore.kernel.org/linux-fsdevel/20230102160856.500584-1-david@redhat.com/) + + Trying to reduce the confusion around VM_SHARED and VM_MAYSHARE first + requires !CONFIG_MMU to stop using VM_MAYSHARE for MAP_PRIVATE mappings. + CONFIG_MMU only sets VM_MAYSHARE for MAP_SHARED mappings.
+ + This paves the way for further VM_MAYSHARE and VM_SHARED cleanups: for + example, renaming VM_MAYSHARE to VM_MAP_SHARED to make it clearer what + it actually means. + + Let's first get the weird case out of the way and not use VM_MAYSHARE in + MAP_PRIVATE mappings, using a new VM_MAYOVERLAY flag instead. + +* [v3: Add new open(2) flag - O_EMPTY_PATH](http://lore.kernel.org/linux-fsdevel/20230101153752.20165-1-ahamza@ixsystems.com/) + + This patch adds a new flag O_EMPTY_PATH that allows the openat and open + system calls to open a file referenced by fd if the path is empty; + it is very similar to the FreeBSD O_EMPTY_PATH flag. This can be + beneficial in some cases since it would avoid having to grant /proc + +* [v1: blk: optimization for classic polling](http://lore.kernel.org/linux-fsdevel/3578876466-3733-1-git-send-email-nj.shetty@samsung.com/) + + This removes the dependency on interrupts to wake up the task. Set the task + state to TASK_RUNNING if need_resched() returns true + while polling for IO completion. + Earlier, the polling task used to sleep, relying on an interrupt to wake it up. + This made some IO take very long when interrupt coalescing is enabled in + NVMe. + +#### Network Devices + +* [v3: net-next: add PLCA RS support and onsemi NCN26000](http://lore.kernel.org/netdev/cover.1673222807.git.piergiorgio.beruto@gmail.com/) + + This patchset adds support for getting/setting the Physical Layer + Collision Avoidance (PLCA) Reconciliation Sublayer (RS) configuration and + status on Ethernet PHYs that support it. + +* [v1: RFC: wifi: rtw88: Validate the eFuse structs](http://lore.kernel.org/netdev/20230108213114.547135-1-martin.blumenstingl@googlemail.com/) + + Add static assertions for the PCIe/USB offsets inside the eFuse structs + to ensure that the compiler doesn't add padding anywhere (relevant) + inside the structs.
+ +* [v1: net: Revert "r8169: disable detection of chip version 36"](http://lore.kernel.org/netdev/42e9674c-d5d0-a65a-f578-e5c74f244739@gmail.com/) + + This chip version seems to be very rare, but it exists in consumer + devices; see the linked report. + + https://stackoverflow.com/questions/75049473/cant-setup-a-wired-network-in-archlinux-fresh-install + +* [v11: net-next: vmxnet3: Add XDP support.](http://lore.kernel.org/netdev/20230108181826.88882-1-u9012063@gmail.com/) + + The patch adds native-mode XDP support: XDP DROP, PASS, TX, and REDIRECT. + +* [v1: net: selftests/net: Isolate l2_tos_ttl_inherit.sh in its own netns.](http://lore.kernel.org/netdev/cover.1673191942.git.gnault@redhat.com/) + + l2_tos_ttl_inherit.sh uses a veth pair to run its tests, but only one + of the veth interfaces runs in a dedicated netns. The other one remains + in the initial namespace, where the existing network configuration can + interfere with the setup used for the tests. + + Isolate both veth devices in their own netns and ensure everything gets + cleaned up when the script exits. + +* [v1: net: ena: initialize dim_sample](http://lore.kernel.org/netdev/20230108143843.2987732-1-trix@redhat.com/) + + clang static analysis reports this problem: + drivers/net/ethernet/amazon/ena/ena_netdev.c:1821:2: warning: Passed-by-value struct + argument contains uninitialized data (e.g., field: 'comp_ctr') [core.CallAndMessage] + net_dim(&ena_napi->dim, dim_sample); + ^ + + net_dim can call dim_calc_stats(), which uses the comp_ctr element, + so it must be initialized. + +* [v1: net-next: Add devlink support to ena](http://lore.kernel.org/netdev/20230108103533.10104-1-darinzon@amazon.com/) + + This patchset adds devlink support to the ena driver. + +* [v4: net-next: mv88e6xxx: Add MAB offload support](http://lore.kernel.org/netdev/20230108094849.1789162-1-netdev@kapio-technology.com/) + + This patch-set adds MAB [1] offload support in mv88e6xxx.
+ +* [v6: net-next: net: ngbe: Add ngbe mdio bus driver.](http://lore.kernel.org/netdev/20230108093903.27054-1-mengyuanlou@net-swift.com/) + + Register an mdio bus for ngbe. + The internal phy and external phy need to be handled separately. + Add phy change event detection. + +* [v1: Add Auxiliary driver support](http://lore.kernel.org/netdev/20230108030208.26390-1-ajit.khaparde@broadcom.com/) + + Add an auxiliary device driver for Broadcom devices. + The bnxt_en driver will register and initialize an aux device + if RDMA is enabled in the underlying device. + The bnxt_re driver will then probe and initialize the + RoCE interfaces with the infiniband stack. + + We got rid of the bnxt_en_ops which the bnxt_re driver used to + communicate with bnxt_en. + Similarly, we have tried to clean up most of the bnxt_ulp_ops. + In most cases we used the functions and entry points provided + by the auxiliary bus driver framework. + And now these are the minimal functions needed to support the functionality. + +* [v4: sock: add tracepoint for send recv length](http://lore.kernel.org/netdev/20230108025545.338-1-cuiyunhui@bytedance.com/) + + Add 2 tracepoints to monitor the tcp/udp traffic + per process and per cgroup. + + Regarding monitoring the tcp/udp traffic of each process, there are two + existing solutions: the first one is https://www.atoptool.nl/netatop.php. + The second is via kprobe/kretprobe. + + The netatop solution is implemented by registering the hook function at the + hook point provided by the netfilter framework. + + These hook functions may be in the soft interrupt context and cannot + directly obtain the pid. Some data structures are added to bind packets + and processes, for example, struct taskinfobucket, struct taskinfo ... + + Every time the process sends or receives packets, it needs multiple + hashmaps, resulting in low performance, and it has the problem of inaccurate + tcp/udp traffic statistics (for example, when multiple threads share sockets).
+ + We can obtain the information with kretprobe, but as we know, kprobe gets + the result by trapping into an exception, which costs performance compared + to a tracepoint. + +* [v1: mt76: add wed reset callbacks](http://lore.kernel.org/netdev/cover.1673103214.git.lorenzo@kernel.org/) + + Introduce Wireless Ethernet Dispatcher reset callbacks in order to complete + a reset requested by the ethernet NIC. + + This patch is based on the following mtk_eth_soc series: + https://lore.kernel.org/netdev/cover.1673102767.git.lorenzo@kernel.org/T/#m830c78ce34a4383ae1dedc5349bed19a74dbf4af + +* [v3: net-next: net: ethernet: mtk_wed: introduce reset support](http://lore.kernel.org/netdev/cover.1673102767.git.lorenzo@kernel.org/) + + Introduce proper reset integration between ethernet and wlan drivers in order + to schedule a wlan driver reset when the ethernet/wed driver is resetting. + Introduce mtk_hw_reset_monitor work in order to detect possible DMA hangs. + +* [v1: net-next: net: ethernet: mtk_wed: get rid of queue lock for rx queue](http://lore.kernel.org/netdev/bff65ff7f9a269b8a066cae0095b798ad5b37065.1673102426.git.lorenzo@kernel.org/) + + The mtk_wed_wo_queue_rx_clean and mtk_wed_wo_queue_refill routines can't run + concurrently, so get rid of the spinlock for rx queues. + +* [[net PATCH] octeontx2-pf: Use GFP_ATOMIC in atomic context](http://lore.kernel.org/netdev/20230107044139.25787-1-gakula@marvell.com/) + + Use the GFP_ATOMIC flag instead of GFP_KERNEL while allocating memory + in an atomic context. + +* [v9: net-next: virtio/vsock: replace virtio_vsock_pkt with sk_buff](http://lore.kernel.org/netdev/20230107002937.899605-1-bobby.eshleman@bytedance.com/) + + This commit changes virtio/vsock to use sk_buff instead of + virtio_vsock_pkt. Beyond better conforming to other net code, using + sk_buff allows vsock to use sk_buff-dependent features in the future + (such as sockmap) and improves throughput.
+ +* [v1: net: lan966x: Allow to add rules in TCAM even if not enabled](http://lore.kernel.org/netdev/20230106201507.2206113-1-horatiu.vultur@microchip.com/) + + The blamed commit implemented the vcap_operations to allow adding an + entry to the TCAM. One of the callbacks validates the supported + keysets. If the TCAM lookup was not enabled, this returns + failure, so no entries could be added. + This doesn't make much sense, as the TCAM can be enabled at a later + point. Therefore, change it to allow adding entries to the TCAM even if it is not + enabled. + +* [v1: net: ipv6: prevent only DAD and RS sending for IFF_NO_ADDRCONF](http://lore.kernel.org/netdev/ab8f8ce5b99b658483214f3a9887c0c32efcca80.1673023907.git.lucien.xin@gmail.com/) + + Currently IFF_NO_ADDRCONF is used to prevent all ipv6 addrconf for the + slave ports of team, bonding and failover devices, which means no ipv6 + packets can be sent out through these slave ports. However, for a team + device, the "nsna_ping" link_watch requires ipv6 addrconf; otherwise, the + link will be marked as failed. + +* [v1: Let iommufd charge IOPTE allocations to the memory cgroup](http://lore.kernel.org/netdev/0-v1-6e8b3997c46d+89e-iommu_map_gfp_jgg@nvidia.com/) + + iommufd follows the same design as KVM and uses memory cgroups to limit + the amount of kernel memory an iommufd file descriptor can pin down. The + various internal data structures already use GFP_KERNEL_ACCOUNT to charge + their own memory. + + However, one of the biggest consumers of kernel memory is the IOPTEs + stored under the iommu_domain, and these allocations are not tracked. + +* [v3: net-next: net: wwan: t7xx: fw flashing & coredump support](http://lore.kernel.org/netdev/cover.1673016069.git.m.chetan.kumar@linux.intel.com/) + + This patch series brings in support for FM350 wwan device firmware + flashing & coredump collection using the devlink interface.
+ +* [v1: r8152: allow firmwares with NCM support](http://lore.kernel.org/netdev/20230106160739.100708-1-bjorn@mork.no/) + + Some device and firmware combinations with NCM support will + end up using the cdc_ncm driver by default. This is suboptimal + for the same reasons we've previously accepted the + blacklist hack in cdc_ether. + + The recent support for subclassing the generic USB device + driver allows us to create a very slim driver with the same + functionality. This patch set uses that to implement a + device-specific configuration default which is independent + of any USB interface drivers. This means that it works + equally whether the device initially ends up in NCM or ECM + mode, without depending on any code in the respective class + drivers. + +* [v1: net: gro: take care of DODGY packets](http://lore.kernel.org/netdev/20230106142523.1234476-1-edumazet@google.com/) + + Jaroslav reported a recent throughput regression with virtio_net + caused by the blamed commit. + + It is unclear if DODGY GSO packets coming from user space + can be accepted by the GRO engine in the future with minimal + changes, and if there is any expected gain from it. + + In the meantime, make sure to detect and flush DODGY packets. + +* [v1: net: lan966x: check for ptp to be enabled in lan966x_ptp_deinit()](http://lore.kernel.org/netdev/20230106134830.333494-1-clement.leger@bootlin.com/) + + If PTP was not enabled (due to a missing IRQ, for instance), + lan966x_ptp_deinit() will dereference NULL pointers. + +* [v2: net: ipa: correct IPA v4.7 IMEM offset](http://lore.kernel.org/netdev/20230106132502.3307220-1-elder@linaro.org/) + + Commit b310de784bacd ("net: ipa: add IPA v4.7 support") was merged + despite an unresolved comment made by Konrad Dybcio. Konrad + observed that the IMEM region specified for IPA v4.7 did not match + that used downstream for the SM7225 SoC.
In the "lagoon.dtsi" file present + in a Sony Xperia source tree, an ipa_smmu_ap node was defined with a + "qcom,additional-mapping" property that defined the IPA IMEM area + starting at offset 0x146a8000 (not 0x146a9000, which was committed). + + The IPA v4.7 target system used for testing uses the SM7225 SoC, so + we'll adhere to what the downstream code specifies as the address of + the IMEM region used for IPA. + +* [v2: brcmfmac: Prefer DT board type over DMI board type](http://lore.kernel.org/netdev/20230106131905.81854-1-iivanov@suse.de/) + + The introduction of support for Apple board types inadvertently changed + the precedence order, causing hybrid SMBIOS+DT platforms to look up the + firmware using the DMI information instead of the device tree compatible + to generate the board type. Revert to the old behavior, + as affected platforms use firmwares named after the DT compatible. + +* [v1: wpan-next: ieee802154: Beaconing support](http://lore.kernel.org/netdev/20230106113129.694750-1-miquel.raynal@bootlin.com/) + + Now that scanning is supported, we can, e.g., play with hwsim to verify + everything works as soon as this series including beaconing support gets + merged. + +* [v1: ath10k USB support (QCA9377)](http://lore.kernel.org/netdev/20230106105853.3484381-1-alexander.stein@ew.tq-group.com/) + + Apparently there have been several attempts at adding ath10k USB support, see + [1] & [2]. There are probably even more. + This series is a first step towards supporting my actual device, + a Silex SX-USBAC. This is a Bluetooth & WiFi combo device. + + I picked commit 131da4f5a5b9 ("HACK: ath10k: add start_once support") from + [2] and extracted the ath10k_hw_params_list entry from [3]. + Since v5.9, the base of [3], other required changes have already been + integrated. + For now I tested a very simple STA mode usage profile, using + wpa_supplicant on a WPA interface.
AP is untested, and module unloading is not + supported, probably affected by the firmware start/stop workaround that patch 1 + adds. + + Reading the other, older series, apparently a lot has been merged already, + but I do not know what is still missing for proper USB support. + I would like to have a discussion about how to add support so the device is + at least probing and can be used rudimentarily. + +* [v4: net-next: usbnet: optimize usbnet_bh() to reduce CPU load](http://lore.kernel.org/netdev/20230106104950.22741-1-lsahn@ooseel.net/) + + The current source pushes an skb into the dev->done queue by calling + skb_dequeue_tail() and then pops it with skb_dequeue() to branch to the + rx_cleanup state for freeing the urb/skb in usbnet_bh(). + +* [v1: net-next: Add IP_LOCAL_PORT_RANGE socket option](http://lore.kernel.org/netdev/20221221-sockopt-port-range-v1-0-e2b094b60ffd@cloudflare.com/) + + This patch set is a follow-up to the "How to share IPv4 addresses by + partitioning the port space" talk given at LPC 2022 [1]. + +#### Security Hardening + +* [v6: arm64: dts: qcom: sm6125: UFS and xiaomi-laurel-sprout support](http://lore.kernel.org/linux-hardening/20230108195336.388349-1-they@mint.lgbt/) + + Introduce Universal Flash Storage support on SM6125 and add support for the Xiaomi Mi A3, which is based on that platform. The name xiaomi-laurel-sprout is used instead of the official codename (laurel_sprout) due to naming limitations in the kernel. + +* [v1: kunit: memcpy: Split slow memcpy tests into MEMCPY_SLOW_KUNIT_TEST](http://lore.kernel.org/linux-hardening/20230107040203.never.112-kees@kernel.org/) + + Since the long memcpy tests may stall a system for tens of seconds + in virtualized architecture environments, split those tests off under + CONFIG_MEMCPY_SLOW_KUNIT_TEST so they can be separately disabled.
+ +* [v2: firmware: coreboot: Check size of table entry and split memcpy](http://lore.kernel.org/linux-hardening/20230107031406.gonna.761-kees@kernel.org/) + + The memcpy() of the data following a coreboot_table_entry couldn't + be evaluated by the compiler under CONFIG_FORTIFY_SOURCE. To make it + easier to reason about, add an explicit flexible array member to struct + coreboot_device so the entire entry can be copied at once. Additionally, + validate the sizes before copying. This avoids the following run-time false positive + warning: + + memcpy: detected field-spanning write (size 168) of single field "&device->entry" at drivers/firmware/google/coreboot_table.c:103 (size 8) + +* [v1: scsi: megaraid_sas: Add flexible array member for SGLs](http://lore.kernel.org/linux-hardening/20230106053153.never.999-kees@kernel.org/) + + struct MPI2_RAID_SCSI_IO_REQUEST ends with a single SGL, but expects to + copy multiple. Add a flexible array member so the compiler can reason + about the size of the memcpy(). This will avoid the run-time false + positive warning: + + memcpy: detected field-spanning write (size 128) of single field "&r1_cmd->io_request->SGL" at drivers/scsi/megaraid/megaraid_sas_fusion.c:3326 (size 16) + + This change results in no binary output differences. + +#### Asynchronous I/O + +* [v1: liburing: Always enable CONFIG_NOLIBC if supported and deprecate --nolibc option](http://lore.kernel.org/io-uring/20230106155202.558533-1-ammar.faizi@intel.com/) + + This is an RFC patchset. It's already build-tested. + + Currently, the default liburing compilation uses libc as its dependency. + liburing doesn't depend on libc when it's compiled on x86-64, x86 + (32-bit), and aarch64. There is no benefit to having libc.so linked to + liburing.so on those architectures. + + Always enable CONFIG_NOLIBC if the arch is supported. If the + architecture is not supported, fall back to libc.
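Both the coreboot and megaraid_sas patches in the Security Hardening section above rely on the same idea: a flexible array member at the end of a struct tells the compiler that a memcpy() may intentionally cover the header plus its trailing data. The following is a minimal userspace sketch of that pattern, not the kernel code; `struct table_entry` and `copy_entry()` are hypothetical names, and `max_size` stands in for whatever size bound the real code validates against.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical table entry: a fixed header followed by a variable-size payload. */
struct table_entry {
    uint32_t tag;
    uint32_t size;   /* total entry size in bytes, header included */
    uint8_t data[];  /* flexible array member: the payload follows the header */
};

/*
 * Copy a whole entry (header + payload) into a newly allocated buffer.
 * The declared size is validated first; because data[] is a flexible array
 * member, a single memcpy() covering header and payload is well defined.
 * Returns NULL if the size is invalid or allocation fails.
 */
struct table_entry *copy_entry(const struct table_entry *src, size_t max_size)
{
    assert(src != NULL);
    if (src->size < sizeof(*src) || src->size > max_size)
        return NULL;

    struct table_entry *dst = malloc(src->size);
    if (dst == NULL)
        return NULL;
    memcpy(dst, src, src->size);  /* one copy for header and payload */
    return dst;
}
```

The size check before the single memcpy() mirrors the "validate the sizes before copying" step in the coreboot patch description.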
+ +* [v1: liburing: liburing micro-optimzation](http://lore.kernel.org/io-uring/20230106154259.556542-1-ammar.faizi@intel.com/) + + This series contains liburing micro-optimizations. There are two patches + in this series. + +* [v1: io_uring: move 'poll_multi_queue' bool in io_ring_ctx](http://lore.kernel.org/io-uring/5c3b0571-ee3b-5bf1-50ce-a2009ee219d5@kernel.dk/) + + The cacheline section holding this variable has two gaps, where one is + caused by this bool not packing well with structs. This causes it to + blow into the next cacheline. Move the variable, shrinking io_ring_ctx + by a full cacheline in size. + +* [v1: io_uring/io-wq: free worker if task_work creation is canceled](http://lore.kernel.org/io-uring/1d287a8e-3c3f-4a8d-f6cc-8199b53ae886@kernel.dk/) + + If we cancel the task_work, the worker will never come into existence. + As this is the last reference to it, ensure that we get it freed + appropriately. + +#### Rust For Linux + +* [Fwd: v1: bpf: scripts: Exclude Rust CUs with pahole](http://lore.kernel.org/rust-for-linux/0ca4ad02-af27-0d1f-8750-1ff6b34e8d2a@gmail.com/) + + I see, I was making a dependency on `auto.conf` in `pahole-flags.sh` but + the former gets generated after the latter is called, so that's the + reason behind the `grep` errors. Sent a new version of the patch. + +* [v2: scripts: Exclude Rust CUs with pahole](http://lore.kernel.org/rust-for-linux/20230108021450.120791-1-yakoyoku@gmail.com/) + + Version 1.24 of pahole has the capability to exclude compilation units + (CUs) of specific languages. As of this writing, Rust is not + supported by pahole, and if it's used with a build that has BTF debugging + enabled, it results in malformed kernel and module binaries (see + Rust-for-Linux/linux#735). So it's better for pahole to exclude Rust + CUs until support for it arrives.
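pahole, named in the items above, is the tool that reports holes in struct layouts, which is exactly what the io_uring `poll_multi_queue` patch in the Asynchronous I/O section addresses: a lone `bool` placed between pointer-aligned members drags padding along with it. Below is a small sketch with two hypothetical structs (not `io_ring_ctx`) showing how grouping the small members shrinks the layout; exact sizes depend on the ABI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layout: each lone bool is followed by padding up to the
 * alignment of the next pointer member. */
struct scattered {
    bool flag_a;
    void *p1;
    bool flag_b;
    void *p2;
};

/* Same members, reordered: the two bools share a single padded tail slot. */
struct grouped {
    void *p1;
    void *p2;
    bool flag_a;
    bool flag_b;
};

size_t scattered_size(void) { return sizeof(struct scattered); }
size_t grouped_size(void) { return sizeof(struct grouped); }
```

On a typical LP64 target such as x86-64 Linux, `scattered_size()` is 32 bytes and `grouped_size()` is 24, purely from reordering.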
+ +* [v2: kbuild: rust: move rust/target.json to scripts/](http://lore.kernel.org/rust-for-linux/20230107094545.3384745-1-masahiroy@kernel.org/) + + scripts/ is a better place to generate files used treewide. + + With target.json moved to scripts/, you do not need to add target.json + to no-clean-files or MRPROPER_FILES. + + 'make clean' does not visit scripts/, but 'make mrproper' does. + +* [v1: scripts: store Makefiles in dictionary](http://lore.kernel.org/rust-for-linux/20230103210219.5690-1-apantykhin@gmail.com/) + +#### BPF + +* [v3: bpf-next: bpf: btf: limit logging of ignored BTF mismatches](http://lore.kernel.org/bpf/20230107025331.3240536-1-connoro@google.com/) + + Enabling CONFIG_MODULE_ALLOW_BTF_MISMATCH is an indication that BTF + mismatches are expected and module loading should proceed + anyway. Logging with pr_warn() on every one of these "benign" + mismatches creates unnecessary noise when many such modules are + loaded. Instead, handle this case with a single log warning that BTF + info may be unavailable. + + Mismatches also result in calls to __btf_verifier_log() via + __btf_verifier_log_type() or btf_verifier_log_member(), adding several + additional lines of logging per mismatched module. Add checks to these + paths to skip logging for module BTF mismatches in the "allow + mismatch" case. + + All existing logging behavior is preserved in the default + CONFIG_MODULE_ALLOW_BTF_MISMATCH=n case. + +* [v1: bpf-next: Annotate kfuncs with new __bpf_kfunc macro](http://lore.kernel.org/bpf/20230106195130.1216841-1-void@manifault.com/) + + BPF kfuncs are kernel functions that can be invoked by BPF programs. + kfuncs can be kernel functions which are also called elsewhere in the + main kernel (such as crash_kexec()), or may be functions that are only + meant to be used by BPF programs, such as bpf_task_acquire(), and which + are not called from anywhere else in the kernel. 
+ + While thus far we haven't observed any issues such as kfuncs being + elided by the compiler, at some point we could easily run into problems + such as the following: + + - static kernel functions that are also used as kfuncs could be inlined + and/or elided by the compiler. + - BPF-specific kfuncs with external linkage may at some point be elided + by the compiler in LTO builds, when it's determined that they aren't + called anywhere. + + To address this, this patch set introduces a new __bpf_kfunc macro which + should be added to all kfuncs, and which will protect kfuncs from such + problems. Note that some kfuncs already try to do this by + specifying noinline or __used, but how this is applied is + inconsistent. __bpf_kfunc should provide a uniform and more future-proof way + to do this. + +* [v1: bpf: skip task with pid=1 in send_signal_common()](http://lore.kernel.org/bpf/20230106084838.12690-1-sunhao.th@gmail.com/) + + The following kernel panic can be triggered when a task with pid=1 + attaches a prog that attempts to send a killing signal to itself; also + see [1] for more details: + + Kernel panic - not syncing: Attempted to kill init!
exitcode=0x0000000b + CPU: 3 PID: 1 Comm: systemd Not tainted 6.1.0-09652-g59fe41b5255f #148 + Call Trace: + + __dump_stack lib/dump_stack.c:88 [inline] + dump_stack_lvl+0x100/0x178 lib/dump_stack.c:106 + panic+0x2c4/0x60f kernel/panic.c:275 + do_exit.cold+0x63/0xe4 kernel/exit.c:789 + do_group_exit+0xd4/0x2a0 kernel/exit.c:950 + get_signal+0x2460/0x2600 kernel/signal.c:2858 + arch_do_signal_or_restart+0x78/0x5d0 arch/x86/kernel/signal.c:306 + exit_to_user_mode_loop kernel/entry/common.c:168 [inline] + exit_to_user_mode_prepare+0x15f/0x250 kernel/entry/common.c:203 + __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline] + syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:296 + do_syscall_64+0x44/0xb0 arch/x86/entry/common.c:86 + entry_SYSCALL_64_after_hwframe+0x63/0xcd + + So skip the task with pid=1 in bpf_send_signal_common() to avoid the panic. + + [1] https://lore.kernel.org/bpf/20221222043507.33037-1-sunhao.th@gmail.com + +* [v1: bpf-next: bpf: Add ipip6 and ip6ip decap support for bpf_skb_adjust_room()](http://lore.kernel.org/bpf/cover.1672976410.git.william.xuanziyang@huawei.com/) + + Add ipip6 and ip6ip decap support for bpf_skb_adjust_room(). + The main use case is using cls_bpf on the ingress hook to decapsulate + IPv4 over IPv6 and IPv6 over IPv4 tunnel packets. + + Also add ipip6 and ip6ip decap test cases to verify that + bpf_skb_adjust_room() correctly decapsulates ipip6 and ip6ip + tunnel packets. + +* [[RFC/PATCH] perf lock contention: Add -o/--lock-owner option](http://lore.kernel.org/bpf/20230105203231.1598936-1-namhyung@kernel.org/) + + When there are many lock contentions in the system, people sometimes + want to know who caused the contention, i.e. who the owner of the + locks is. + + The -o/--lock-owner option tries to follow the lock owners for the + contended mutexes and rwsems from BPF, and then attributes the + contention time to the owner instead of the waiter.
It's a best + effort approach to get the owner info at the time of the contention + and doesn't guarantee precise tracking of owners if they + change over time. + + Currently it only handles mutexes and rwsems, which have an owner field in + their struct that basically points to the task_struct owning the + lock at the moment. + +* [v1: bpf-next: libbpf: poison strlcpy()](http://lore.kernel.org/bpf/tencent_5695A257C4D16B4413036BA1DAACDECB0B07@qq.com/) + + Since commit 9fc205b413b3 ("libbpf: Add sane strncpy alternative and use + it internally") introduced libbpf_strlcpy(), add strlcpy() to the poison + list to prevent accidental use of it. + +* [v6: bpf-next: xdp: hints via kfuncs](http://lore.kernel.org/bpf/20230104215949.529093-1-sdf@google.com/) + + Please see the first patch in the series for the overall + design and use cases. + +* [v1: bpf: skip invalid kfunc call in backtrack_insn](http://lore.kernel.org/bpf/20230104014709.9375-1-sunhao.th@gmail.com/) + + The verifier skips an invalid kfunc call in check_kfunc_call(), which + would be caught in fixup_kfunc_call() if such an insn is not + eliminated by dead code elimination. However, this can lead to a + warning in backtrack_insn(). + +* [v3: virtio-net: support multi buffer xdp](http://lore.kernel.org/bpf/20230103064012.108029-1-hengqi@linux.alibaba.com/) + + Currently, virtio net only supports XDP for single-buffer packets + or linearized multi-buffer packets. This patchset supports XDP for + multi-buffer packets, so a larger MTU can be used if XDP sets + xdp.frags. This does not affect single-buffer handling. + + In order to build multi-buffer XDP neatly, we integrated the code + into virtnet_build_xdp_buff_mrg() for XDP. The first buffer is used + for the prepared xdp buff, and the rest of the buffers are added to + its skb_shared_info structure. This structure can also be + conveniently converted during XDP_PASS to get the corresponding skb.
+ + Since virtio net uses comp pages, and bpf_xdp_frags_increase_tail() + assumes a page pool, + (rxq->frag_size - skb_frag_size(frag) - skb_frag_off(frag)) + is negative in most cases. So we didn't set xdp_rxq->frag_size in + virtnet_open(), to disable the tail increase. + +### Related Technology Updates + +#### Qemu + +* [v3: riscv-to-apply queue](http://lore.kernel.org/qemu-devel/20230106031357.777790-1-alistair.francis@opensource.wdc.com/) + + The following changes since commit d1852caab131ea898134fdcea8c14bc2ee75fbe9: + + Merge tag 'python-pull-request' of https://gitlab.com/jsnow/qemu into staging (2023-01-05 16:59:22 +0000) + + are available in the Git repository at: + + https://github.com/alistair23/qemu.git tags/pull-riscv-to-apply-20230106 + + for you to fetch changes up to bc92f261519d5c77c70cf2ebcf0a3b9a414d82d0: + + hw/intc: sifive_plic: Fix the pending register range check (2023-01-06 10:42:55 +1000) + + First RISC-V PR for QEMU 8.0 + + * Fix PMP propagation for tlb + * Collection of bug fixes + * Bump the OpenTitan supported version + * Add smstateen support + * Support native debug icount trigger + * Remove the redundant ipi-id property in the virt machine + * Support cache-related PMU events in virtual mode + * Add some missing PolarFire SoC io regions + * Fix mret exception cause when no pmp rule is configured + * Fix bug where disabling compressed instructions would crash QEMU + * Add Zawrs ISA extension support + * A range of code refactoring and cleanups + +#### U-Boot + +* [v4: riscv: ae350: support OpenSBI 1.0+ which enable FW_PIC](http://lore.kernel.org/u-boot/20230104023748.6109-1-rick@andestech.com/) + + The original OpenSBI (without FW_PIC) relocates itself + from 0x1000000 to 0x0. Since OpenSBI added the FW_PIC code, + it no longer relocates and always runs at 0x1000000. + Hence, it may overlap with the kernel memory region, so it is + necessary to change the OpenSBI address from 0x1000000 to 0x0.
+ + For more details, refer to commit cb052d771200 + ("riscv: qemu: spl: Fix booting Linux kernel with OpenSBI 1.0+") + +* [v3: riscv: ae350: support openSBI 1.0+ which enable FW_PIC](http://lore.kernel.org/u-boot/20230104020743.30046-1-rick@andestech.com/) + + The original OpenSBI (without FW_PIC) relocates itself + from 0x1000000 to 0x0. Since OpenSBI added the FW_PIC code, + it no longer relocates and always runs at 0x1000000. + Hence, it may overlap with the kernel memory region, so it is + necessary to change the OpenSBI address from 0x1000000 to 0x0. + + For more details, refer to commit cb052d771200 + ("riscv: qemu: spl: Fix booting Linux kernel with OpenSBI 1.0+") + +* [v2: riscv: ax25: bypass malloc when spl fit boots from ram](http://lore.kernel.org/u-boot/20230103082012.15379-1-rick@andestech.com/) + + When a FIT image boots from RAM, the payload is + already prepared at SPL_LOAD_FIT_ADDRESS. + In the generic SPL FIT flow, it mallocs another + buffer and copies the whole FIT image into this + malloc'ed buffer, which is unnecessary when booting + from RAM. + + This patch improves this flow by declaring + board_spl_fit_buffer_addr() to replace the original one. + The larger the image (e.g. Kernel Image 10 + 20MB), the + more boot time it saves. + + It also enhances the memcpy function by checking the source and + destination addresses. If they are the same address, + it just returns without copying any data. + +* [v2: riscv: ae350: Enable CCTL_SUEN](http://lore.kernel.org/u-boot/20230103081713.15220-1-rick@andestech.com/) + + CCTL operations are available to Supervisor/User-mode + software under the control of the mcache_ctl.CCTL_SUEN + control bit. Enable it to support Supervisor (and User) + CCTL operations.
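The ax25 patch above also makes memcpy() return early when the source and destination are the same address. A minimal C sketch of that check follows; `sketch_memcpy()` is a hypothetical byte-wise helper, not U-Boot's actual implementation.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Byte-wise copy that skips all work when dest and src are the same
 * address. As with memcpy(), distinct buffers must not overlap.
 */
void *sketch_memcpy(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    assert(n == 0 || (dest != NULL && src != NULL));
    if (d == s)          /* same address: the data is already in place */
        return dest;
    while (n--)
        *d++ = *s++;
    return dest;
}
```

For a payload already loaded at its final address (as with a FIT image booted from RAM), a call such as `sketch_memcpy(buf, buf, len)` then costs only a pointer comparison instead of a full copy.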
+ + +## 20230101: Issue 27 + +### Kernel Updates + +#### RISC-V Architecture Support + +* [v7: RESEND: leds: Allwinner A100 LED controller support](http://lore.kernel.org/linux-riscv/20221231235541.13568-1-samuel@sholland.org/) + + This series adds bindings and a driver for the RGB LED controller found + in some Allwinner SoCs, starting with A100. The hardware in the R329 and + D1 SoCs appears to be identical. + +* [v4: riscv: Allwinner D1/D1s platform support](http://lore.kernel.org/linux-riscv/20221231233851.24923-1-samuel@sholland.org/) + + This series adds the Kconfig/defconfig plumbing and devicetrees for a + range of Allwinner D1 and D1s-based boards. Many features are already enabled, including USB, Ethernet, and WiFi. + +* [v2: clk: sunxi-ng: Allwinner R528/T113 clock support](http://lore.kernel.org/linux-riscv/20221231231429.18357-1-samuel@sholland.org/) + + R528 and T113 are SoCs based on the same design as D1/D1s, but with ARM + CPUs instead of RISC-V. They use the same CCU implementation, meaning + the CCU has gates/resets for all peripherals present on any SoC in this + family. I verified the CAN bus bits are also present on D1/D1s. + +* [v1: Allwinner D1 video engine support](http://lore.kernel.org/linux-riscv/20221231164628.19688-1-samuel@sholland.org/) + + This series finishes adding Cedrus support for Allwinner D1. I had + tested the hardware and documented the compatible string a while back, + but at the time I had a dummy SRAM section in the devicetree. Further + testing shows that there is no switchable SRAM section: there is no + need for it, I was unable to guess the address, and the usual bits in + the SRAM controller register have no effect on the video engine. + +* [v3: arch: rename all internal names __xchg to __arch_xchg](http://lore.kernel.org/linux-riscv/20221230141552.128508-1-andrzej.hajda@intel.com/) + + The freed-up name __xchg will then be used for a non-atomic xchg macro.
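The __xchg rename frees that name for a non-atomic exchange helper: store a new value and get the old one back in a single expression, with no atomicity guarantee. A minimal sketch follows, using GNU C statement expressions and `__auto_type` (supported by GCC and Clang); the macro name `sketch_xchg` and the helper `take_ptr()` are hypothetical and not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Non-atomic exchange: stores val into *ptr and evaluates to the old value.
 * ptr is evaluated exactly once. Nothing here is atomic, so it is only safe
 * when no other thread can access *ptr at the same time.
 */
#define sketch_xchg(ptr, val) ({            \
        __auto_type xchg_p_ = (ptr);        \
        __auto_type xchg_old_ = *xchg_p_;   \
        *xchg_p_ = (val);                   \
        xchg_old_;                          \
})

/* Typical use: take ownership of a pointer and clear the slot in one step. */
int *take_ptr(int **slot)
{
    assert(slot != NULL);
    return sketch_xchg(slot, NULL);
}
```

Because the macro is an expression, `old = sketch_xchg(&x, 5);` reads naturally at call sites that would otherwise need a temporary variable.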
+ +* [v1: riscv: dts: renesas: rzfive-smarc-som: Enable OSTM nodes](http://lore.kernel.org/linux-riscv/20221229230300.104524-1-prabhakar.mahadev-lad.rj@bp.renesas.com/) + + Enable OSTM{1,2} nodes on the RZ/Five SMARC SoM. + + Note that OSTM{1,2} nodes are enabled in the RZ/G2UL SMARC SoM DTSI [0]; hence, + deleting the disabled nodes from the RZ/Five SMARC SoM DTSI enables them here + too, as we include [0] in the RZ/Five SMARC SoM DTSI. + +* [v2: clocksource/drivers/riscv: Get rid of clocksource_arch_init() callback](http://lore.kernel.org/linux-riscv/20221229224601.103851-1-prabhakar.mahadev-lad.rj@bp.renesas.com/) + + The clocksource_arch_init() callback always sets vdso_clock_mode to + VDSO_CLOCKMODE_ARCHTIMER if GENERIC_GETTIMEOFDAY is enabled, which is + required for the riscv-timer. + + This works for platforms where only the riscv-timer clocksource is present. + On platforms where other clock sources are available, we want them to + register with vdso_clock_mode set to VDSO_CLOCKMODE_NONE. + +* [v1: riscv: sbi: Switch to the sys-off handler API](http://lore.kernel.org/linux-riscv/20221228161915.13194-1-samuel@sholland.org/) + + I want to convert the axp20x PMIC poweroff handler to use the sys-off + API, so it can be used as a fallback if the SBI poweroff handler + is unavailable. But the SBI poweroff handler still uses pm_power_off, so + done alone, this would cause the axp20x callback to be called first, + before the SBI poweroff handler has a chance to run. + +* [v2: hwrng: starfive - Add driver for TRNG module](http://lore.kernel.org/linux-riscv/20221228071103.91797-1-jiajie.ho@starfivetech.com/) + + This patch series adds kernel support for the StarFive hardware random + number generator. The first 2 patches add the bindings documentation and driver + for this module. Patch 3 adds the devicetree entry for the VisionFive v2 SoC.
+ +* [v1: clocksource/drivers/riscv: Increase the clock source rating](http://lore.kernel.org/linux-riscv/20221228004444.61568-1-samuel@sholland.org/) + + RISC-V provides an architectural clock source via the time CSR. This + clock source exposes a 64-bit counter synchronized across all CPUs. + Because it is accessed using a CSR, it is much more efficient to read + than MMIO clock sources. For example, on the Allwinner D1, reading the + sun4i timer in a loop takes 131 cycles/iteration, while reading the RISC-V time CSR takes only 5 cycles/iteration. + +* [v2: dt-bindings: riscv: add SBI PMU event mappings](http://lore.kernel.org/linux-riscv/20221227194056.3891216-1-conor@kernel.org/) + + The SBI PMU extension requires the firmware to be aware of the event to + counter/mhpmevent mappings supported by the hardware. OpenSBI may use + DeviceTree to describe the PMU mappings. This binding is currently + described in markdown in OpenSBI (since v1.0 in Dec 2021) and used by QEMU since v7.2.0. + +* [v2: StarFive's SDIO/eMMC driver support](http://lore.kernel.org/linux-riscv/20221227122227.460921-1-william.qiu@starfivetech.com/) + + This patchset adds initial rudimentary support for the StarFive + DesignWare mobile storage host controller driver. This driver will + be used on StarFive's VisionFive 2 board. The main purpose of adding + this driver is to accommodate the ultra-high speed mode of eMMC. + +* [v5: Add OPTPROBES feature on RISCV](http://lore.kernel.org/linux-riscv/20221224114315.850130-1-chenguokai17@mails.ucas.ac.cn/) + + Add jump optimization support for RISC-V. + + Replace the ebreak instructions used by normal kprobes with an + auipc+jalr instruction pair, with the aim of suppressing the probe-hit overhead. + + All known optprobe-capable RISC architectures have been using a single + jump or branch instruction, while this patch does not. RISC-V has a
RISC-V has a + quite limited jump range (4KB or 2MB) for both its branch and jump instructions, which prevent optimizations from supporting probes that spread all over the kernel. + +#### 进程调度 + +* [v2: sched/fair: unlink misfit task from cpu overutilized](http://lore.kernel.org/lkml/20221228165415.3436-1-vincent.guittot@linaro.org/) + + By taking into account uclamp_min, the 1:1 relation between task misfit + and cpu overutilized is no more true as a task with a small util_avg of + may not may not fit a high capacity cpu because of uclamp_min constraint. + + Add a new state in util_fits_cpu() to reflect the case that task would fit + a CPU except for the uclamp_min hint which is a performance requirement. + +* [v1: sched: print parent comm in sched_show_task()](http://lore.kernel.org/lkml/20221227161400.GA7646@didi-ThinkCentre-M930t-N000/) + + Knowing who the parent is might be useful for debugging. + For example, we can sometimes resolve kernel hung tasks by stopping + the person who begins those hung tasks. + With the parent's name printed in sched_show_task(), it might be helpful to let people know which "service" should be operated. + +* [v1: sched/cputime: Make cputime_adjust() more accurate](http://lore.kernel.org/lkml/20221226031010.4079885-1-maxing.lan@bytedance.com/) + + In the current algorithm of cputime_adjust(), the accumulated stime and + utime are used to divide the accumulated rtime. When the value is very + large, it is easy for the stime or utime not to be updated. + +#### 内存管理 + +* [v1: Get rid of first tail page fields](http://lore.kernel.org/linux-mm/20221231214610.2800682-1-willy@infradead.org/) + + Continue the shrinkage of the struct page definition by getting rid of the + 'first tail page' fields. I originally did this patch set before Hugh's + rewrite of the subpages_mapcount, so it needed substantial updates; + hope I didn't miss anything. 
+ +* [v2: scripts/gdb: add mm introspection utils](http://lore.kernel.org/linux-mm/20221231171258.7907-1-dmitrii.bundin.a@gmail.com/) + + This command provides a way to traverse the entire page hierarchy for a + given virtual address on x86. In addition to QEMU's info tlb and + info mem commands, it provides complete information about the paging structures for an arbitrary virtual address. It supports 4KB/2MB/1GB pages and 5-level paging. + +* [v2: mm: huge_memory: convert split_huge_pages_all() to use a folio](http://lore.kernel.org/linux-mm/20221230093020.9664-1-wangkefeng.wang@huawei.com/) + + Straightforwardly convert split_huge_pages_all() to use a folio. + +* [v4: -next: mm: convert page_idle/damon to use folios](http://lore.kernel.org/linux-mm/20221230070849.63358-1-wangkefeng.wang@huawei.com/) + +* [v3: mm/page_reporting: replace rcu_access_pointer() with rcu_dereference_protected()](http://lore.kernel.org/linux-mm/20221228175942.149491-1-sj@kernel.org/) + + Page reporting fetches pr_dev_info using rcu_access_pointer(), which is + for safely fetching a pointer that will not be dereferenced but could + be concurrently updated. The code indeed does not dereference pr_dev_info + after fetching it using rcu_access_pointer(), but it fetches the pointer + while concurrent updates to the pointer are prevented by holding the update-side + lock, page_reporting_mutex. + +* [v1: mm, slab: periodically resched in drain_freelist()](http://lore.kernel.org/linux-mm/b1808b92-86df-9f53-bfb2-8862a9c554e9@google.com/) + + drain_freelist() can be called with a very large number of slabs to free, + such as for kmem_cache_shrink(), or depending on various settings of the + slab cache when doing periodic reaping. + + If there is a potentially long list of slabs to drain, periodically + schedule to ensure we aren't saturating the CPU for too long.
+ +* [v1: arm64/vmalloc: use module region only for module_alloc() if CONFIG_RANDOMIZE_BASE is set](http://lore.kernel.org/linux-mm/20221227092634.445212-1-liushixin2@huawei.com/) + + After adding a 10GB pmem device, I got the following error message when + inserting a module: + + insmod: vmalloc error: size 16384, vm_struct allocation failed, + mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0 + + Skip the module region if not called from module_alloc(). + +* [v1: migrate_pages(): batch TLB flushing](http://lore.kernel.org/linux-mm/20221227002859.27740-1-ying.huang@intel.com/) + + If multiple folios are passed to migrate_pages(), there are + opportunities to batch the TLB flushing and copying. That is, we can + change the code to something as follows, + + The total number of TLB-flushing IPIs can be reduced considerably. We may also use a hardware accelerator such as DSA to accelerate the folio copying. + +#### File Systems + +* [v1: fs/ext4: Replace kmap_atomic() with kmap_local_page()](http://lore.kernel.org/linux-fsdevel/20221231174439.8557-1-fmdefrancesco@gmail.com/) + + However, the code within the mappings and un-mappings in ext4/inline.c + does not depend on the above-mentioned side effects. + + Therefore, a mere replacement of the old API with the new one is all that + is required (i.e., there is no need to explicitly add any calls to pagefault_disable() and/or preempt_disable()). + +* [v1: fs/ext2: Replace kmap_atomic() with kmap_local_page()](http://lore.kernel.org/linux-fsdevel/20221231174205.8492-1-fmdefrancesco@gmail.com/) + + However, the code within the mapping and un-mapping in ext2_make_empty() + does not depend on the above-mentioned side effects. + + Therefore, a mere replacement of the old API with the new one is all that + is required (i.e., there is no need to explicitly add any calls to + pagefault_disable() and/or preempt_disable()).
+ +* [v5: Turn iomap_page_ops into iomap_folio_ops](http://lore.kernel.org/linux-fsdevel/20221231150919.659533-1-agruenba@redhat.com/) + + The patches are split up into relatively small pieces. That may seem + unnecessary, but at least it makes reviewing the patches easier. + +* [v1: fs/sysv: Replace kmap() with kmap_local_page()](http://lore.kernel.org/linux-fsdevel/20221231075717.10258-1-fmdefrancesco@gmail.com/) + + kmap() is deprecated in favor of kmap_local_page(). + + There are two main problems with kmap(): (1) it comes with an overhead, as + the mapping space is restricted and protected by a global lock for + synchronization, and (2) it also requires global TLB invalidation when the + kmap’s pool wraps, and it might block when the mapping space is fully + utilized until a slot becomes available. + +* [v4: -next: fs: coredump: using preprocessor directives for dump_emit_page](http://lore.kernel.org/linux-fsdevel/20221230022446.448179-1-xiehongyu1@kylinos.cn/) + + When CONFIG_COREDUMP is set and CONFIG_ELF_CORE is not, you'll get warnings + like: + fs/coredump.c:841:12: error: ‘dump_emit_page’ defined but not used + [-Werror=unused-function] + 841 | static int dump_emit_page(struct coredump_params *cprm, struct + page *page) + + dump_emit_page is only called in dump_user_range; since dump_user_range + uses #ifdef preprocessor directives, use #ifdef for dump_emit_page too. + +* [v5: fs/ufs: Replace kmap() with kmap_local_page](http://lore.kernel.org/linux-fsdevel/20221229225100.22141-1-fmdefrancesco@gmail.com/) + + With kmap_local_page(), the mappings are per thread and CPU local, can take + page faults, and can be called from any context (including interrupts). + It is faster than kmap() in kernels with HIGHMEM enabled. Furthermore, + the tasks can be preempted and, when they are scheduled to run again, the + kernel virtual addresses are restored and still valid.
+ +* [v2: Introduce provisioning primitives for thinly provisioned storage](http://lore.kernel.org/linux-fsdevel/20221229081252.452240-1-sarthakkukreti@chromium.org/) + + This patch series adds a mechanism to pass through provision requests on + stacked thinly provisioned storage devices/filesystems. + + The Linux kernel provides several mechanisms to set up thinly provisioned + block storage abstractions (e.g. dm-thin, loop devices over sparse files), + either directly as block devices or as backing storage for filesystems. + +* [v1: Add new open(2) flag - O_EMPTY_PATH](http://lore.kernel.org/linux-fsdevel/20221228160249.428399-1-ahamza@ixsystems.com/) + + This patch adds a new flag O_EMPTY_PATH that allows the openat and open + system calls to open the file referenced by fd if the path is empty; + it is very similar to the FreeBSD O_EMPTY_PATH flag. + +* [v1: fs: nls: Simplification of ASCII and ISO-8859-1](http://lore.kernel.org/linux-fsdevel/20221226144301.16382-1-pali@kernel.org/) + + This is an RFC patch series which simplifies the ASCII and ISO-8859-1 tables. + I'm not sure what the direction is for the nls code and the duplicated + default/iso88591 tables, so I'm sending this series as an RFC. + +* [v2: eventfd: use a generic helper instead of an open coded wait_event](http://lore.kernel.org/linux-fsdevel/tencent_1D2E4866B2223D9A19DF4FFB79AFAA955A05@qq.com/) + + Use wait_event_interruptible_locked_irq() in eventfd_{write,read}() to + avoid the longer, open-coded equivalent. + +* [v1: blk: optimization for classic polling](http://lore.kernel.org/linux-fsdevel/3578876466-3733-1-git-send-email-nj.shetty@samsung.com/) + + This removes the dependency on interrupts to wake up the task. Set the task + state to TASK_RUNNING if need_resched() returns true + while polling for IO completion. + Earlier, the polling task used to sleep, relying on an interrupt to wake it up. + This made some IO take very long when interrupt coalescing is enabled in NVMe.
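The nls entry does not say how the ASCII and ISO-8859-1 tables are simplified. One property that makes table-free code possible is that ISO-8859-1 byte values are identical to the Unicode code points U+0000 to U+00FF, and ASCII is the same mapping restricted to 0x00 to 0x7F. The userspace sketch below shows that idea. All function names and signatures are hypothetical and are deliberately simpler than the `char2uni`/`uni2char` callbacks of the kernel's `struct nls_table`. Returning `-EINVAL` for unmappable input is an assumption of this sketch.

```c
#include <errno.h>
#include <stdint.h>

/* ISO-8859-1 -> Unicode: every byte maps to the code point of equal value. */
static int latin1_to_uni(unsigned char c, uint16_t *uni)
{
	*uni = c; /* identity mapping, no lookup table needed */
	return 1; /* one byte consumed */
}

/* Unicode -> ISO-8859-1: only U+0000..U+00FF are representable. */
static int uni_to_latin1(uint16_t uni, unsigned char *out)
{
	if (uni > 0xFF)
		return -EINVAL;
	*out = (unsigned char)uni;
	return 1; /* one byte produced */
}

/* ASCII is the same identity mapping, restricted to 0x00..0x7F. */
static int ascii_to_uni(unsigned char c, uint16_t *uni)
{
	if (c > 0x7F)
		return -EINVAL;
	*uni = c;
	return 1;
}

static int uni_to_ascii(uint16_t uni, unsigned char *out)
{
	if (uni > 0x7F)
		return -EINVAL;
	*out = (unsigned char)uni;
	return 1;
}
```

With this approach, a range check replaces each 256-entry table. For example, `0xE9` converts to U+00E9 (é) in ISO-8859-1 but is rejected as ASCII.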
+ +#### 网络设备 + +* [v1: net-next: net: ipa: simplify IPA interrupt handling](http://lore.kernel.org/netdev/20221230232230.2348757-1-elder@linaro.org/) + + One of the IPA's two IRQs fires when data on a suspended channel is + available (to request that the channel--or system--be resumed to + receive the pending data). This interrupt also handles a few + conditions signaled by the embedded microcontroller. + + For this "IPA interrupt", the current code requires a handler to be + dynamically registered for each interrupt condition. Any condition + that has no registered handler is quietly ignored. This design is derived from the downstream IPA driver implementation. + +* [v1: net: ipa: use proper endpoint mask for suspend](http://lore.kernel.org/netdev/20221230223304.2137471-1-elder@linaro.org/) + + It is now possible for a system to have more than 32 endpoints. As + a result, registers related to endpoint suspend are parameterized, with 32 endpoints represented in each of one or more registers. + +* [v2: net-next: r8169: disable ASPM in case of tx timeout](http://lore.kernel.org/netdev/06bab827-be4a-606e-7a01-52379b1e1a91@gmail.com/) + + There are still occasional reports of systems where ASPM incompatibilities + cause tx timeouts. It's not clear whom to blame, so let's disable + ASPM in case of a tx timeout. + +* [v1: igc: Mask replay rollover/timeout errors in I225_LMVP](http://lore.kernel.org/netdev/20221229122640.239859-1-rajat.khandelwal@linux.intel.com/) + + The CPU logs get flooded with replay rollover/timeout AER errors in + systems with i225_lmvp connected, usually inside Thunderbolt devices. + + One of the prominent TBT4 docks we use is HP G4 Hook2, which incorporates + an Intel Foxville chipset, which uses the igc driver. + On connecting Ethernet, CPU logs get inundated with these errors.
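The IPA suspend entry says each suspend register covers 32 endpoints, so with more than 32 endpoints an ID must be split into a register index and a bit within that register. The small C sketch below does that arithmetic. It uses hypothetical helper names and a plain array that simulates the registers instead of real MMIO. A single 32-bit mask such as `1u << endpoint_id` would be undefined for IDs of 32 and above, which is presumably what a "proper endpoint mask" avoids.

```c
#include <stdint.h>

#define ENDPOINTS_PER_REG 32u /* one bit per endpoint in each 32-bit register */

/* Index of the register that holds this endpoint's suspend bit. */
static uint32_t suspend_reg_index(uint32_t endpoint_id)
{
	return endpoint_id / ENDPOINTS_PER_REG;
}

/* Bit mask for this endpoint within that register. */
static uint32_t suspend_reg_mask(uint32_t endpoint_id)
{
	return UINT32_C(1) << (endpoint_id % ENDPOINTS_PER_REG);
}

/* Set an endpoint's suspend bit in a simulated register array. */
static void suspend_enable(uint32_t *regs, uint32_t endpoint_id)
{
	regs[suspend_reg_index(endpoint_id)] |= suspend_reg_mask(endpoint_id);
}
```

For example, endpoint 37 lands in register 1 at bit 5. Endpoint 5 uses the same bit in register 0, so the two endpoints do not collide.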
+ +* [v1: tcp/udp: add tracepoint for send recv length](http://lore.kernel.org/netdev/20221229080207.1029-1-cuiyunhui@bytedance.com/) + + Add a tracepoint for capturing TCP segments with + a send or receive length. This makes it easy to obtain + the packet sending and receiving information of each process from user space, for example with the netatop tool. + +* [v1: wifi: ath9k: htc_hst: free skb in ath9k_htc_rx_msg() if there is no callback function](http://lore.kernel.org/netdev/20221228224047.146399-1-pchelkin@ispras.ru/) + + It is stated that ath9k_htc_rx_msg() either frees the provided skb or + passes its management to another callback function. However, the skb is + not freed when there is no other callback function, and Syzkaller was + able to cause a memory leak. Also fix a minor comment. + +* [v2: net: qed: allow sleep in qed_mcp_trace_dump()](http://lore.kernel.org/netdev/20221228220045.101647-1-csander@purestorage.com/) + + By default, qed_mcp_cmd_and_union() delays 10us at a time in a loop + that can run 500K times, so calls to qed_mcp_nvm_rd_cmd() + may block the current thread for over 5s. + We observed thread scheduling delays over 700ms in production, + with stacktraces pointing to this code as the culprit. + +* [v1: net: amd-xgbe: add missed tasklet_kill](http://lore.kernel.org/netdev/20221228081447.3400369-1-jiguang.xiao@windriver.com/) + + The driver does not call tasklet_kill in several places. + Add the calls to fix it. + +* [v2: net: hns3: refine the handling for VF heartbeat](http://lore.kernel.org/netdev/20221228062749.58809-1-lanhao@huawei.com/) + + Currently, the PF checks whether the VF is alive via the KEEP_ALIVE + mailbox from the VF. The VF keeps sending the mailbox every 2 + seconds. Once the PF has missed the mailbox for more than 8 + seconds, it regards the VF as abnormal and stops + notifying the VF of state changes, including link state, + VF MAC and reset, even though it receives the KEEP_ALIVE mailbox again.
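The hns3 entry describes a PF that stops notifying a VF once KEEP_ALIVE messages have been missing for more than 8 seconds, and does not recover when they resume. The sketch below models one plausible refinement, in which a late KEEP_ALIVE marks the VF alive again. This is an assumption for illustration only. The struct, the function names and the seconds-based clock are hypothetical, and the actual series may handle this differently.

```c
#include <stdbool.h>

#define KEEPALIVE_TIMEOUT 8UL /* seconds, per the description above */

/* Hypothetical per-VF heartbeat state kept by the PF. */
struct vf_heartbeat {
	unsigned long last_keepalive; /* time (s) of the last KEEP_ALIVE */
	bool alive;
};

/* Called when a KEEP_ALIVE mailbox message arrives from the VF. */
static void vf_on_keepalive(struct vf_heartbeat *hb, unsigned long now)
{
	hb->last_keepalive = now;
	hb->alive = true; /* recover even if previously marked abnormal */
}

/* Periodic check by the PF; assumes now >= last_keepalive. */
static void vf_check(struct vf_heartbeat *hb, unsigned long now)
{
	if (now - hb->last_keepalive > KEEPALIVE_TIMEOUT)
		hb->alive = false;
}

/* Link state, VF MAC and reset notifications go only to live VFs. */
static bool vf_should_notify(const struct vf_heartbeat *hb)
{
	return hb->alive;
}
```

In this model the only difference from the behaviour described in the entry is the `hb->alive = true` in `vf_on_keepalive()`, which lets notifications resume once the VF is heard from again.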
+ +* [v1: rtw88: Add SDIO support](http://lore.kernel.org/netdev/20221227233020.284266-1-martin.blumenstingl@googlemail.com/) + + Recently the rtw88 driver has gained locking support for the "slow" bus + types (USB, SDIO) as part of USB support. Thanks to everyone who helped + make this happen! + + Based on the USB work (especially the locking part and various + bugfixes) this series adds support for SDIO-based cards. It's the + result of a collaboration between Jernej and myself. Neither of us has + access to the rtw88 datasheets. All of our work is based on studying + the RTL8822BS and RTL8822CS vendor drivers and trial and error. + + Jernej and I have tested this with RTL8822BS and RTL8822CS cards. + Other users have confirmed that RTL8821CS support is working as well. + RTL8723DS may also work (we tried our best to handle rtw_chip_wcpu_11n + where needed) but has not been tested at this point. + + Jernej's results with an RTL8822BS: + - Main functionality works + - Had a case where no traffic got across the link until he issued a + scan + + My results with an RTL8822CS: + - 2.4GHz and 5GHz bands are both working + - TX throughput on a 5GHz network is between 50 Mbit/s and 90 Mbit/s + - RX throughput on a 5GHz network is at 19 Mbit/s + - Sometimes there are frequent reconnects (once every 1-5 minutes) + after the link has been up for a long time (multiple hours). Today + I was unable to reproduce this though (I only had one reconnect in 8 + hours). + + Why is this an RFC? + - It needs a thorough review, especially by the rtw88 maintainers + - It's not clear to me how the "mmc: sdio" patch will be merged (will + Ulf take this or can we merge + +* [v1: net: dpaa2-mac: Get serdes only for backplane links](http://lore.kernel.org/netdev/20221227230918.2440351-1-sean.anderson@seco.com/) + + This implies that Linux only manages the SerDes when the link type is + backplane.
From my testing, the link fails to come up when the link type is + phy, but does come up when it is backplane. Modify the condition in dpaa2_mac_connect to reflect this, moving the existing conditions to more appropriate places. + +* [v2: net-next: net: mdio: Start separating C22 and C45](http://lore.kernel.org/netdev/20221227-v6-2-rc1-c45-seperation-v2-0-ddb37710e5a7@walle.cc/) + + This patch set starts the separation of C22 and C45 MDIO bus + transactions at the API level to the MDIO bus drivers. C45 read and + write ops are added to the MDIO bus driver structure, and the MDIO + core will try to use these ops if requested to perform a C45 + transfer. + +* [v1: ethtool-next: JSON output support for Netlink implementation of --show-ring option](http://lore.kernel.org/netdev/20221227175221.7762-1-glipus@gmail.com/) + + Add --json support for the Netlink implementation of the --show-ring option. No changes to non-JSON output for this feature. + +* [v1: s390/qeth: convert sysfs snprintf to sysfs_emit](http://lore.kernel.org/netdev/20221227110352.1436120-1-zhangxuezhi3@gmail.com/) + + Follow the advice in Documentation/filesystems/sysfs.rst + that show() should only use sysfs_emit() or sysfs_emit_at() + when formatting the value to be returned to user space. + +* [v1: iproute2: dcb: Do not leave ACKs in socket receive buffer](http://lore.kernel.org/netdev/20221227110318.2899056-1-idosch@nvidia.com/) + + Originally, the dcb utility only stopped receiving messages from a + socket when it found the attribute it was looking for. The cited commit + changed that, so that the utility will also stop when seeing an ACK + (NLMSG_ERROR message), by setting the NLM_F_ACK flag on requests. + + This is problematic because it means a successful request will leave an ACK in the socket receive buffer, causing the next request to bail out before reading its response.
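The MDIO entry adds optional C45 read/write ops to the bus driver structure and has the core use them for C45 transfers. Below is a minimal C sketch of that dispatch under a simplified bus structure. `struct sim_mii_bus` and the `sim_`/`fake_` functions are hypothetical, only the read path is shown, and returning `-EOPNOTSUPP` when a driver provides no C45 op is an assumption of this sketch rather than a quote of the kernel's code.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified bus ops: the C22 op is always present, the C45 op is optional. */
struct sim_mii_bus {
	int (*read)(struct sim_mii_bus *bus, int addr, int regnum);
	int (*read_c45)(struct sim_mii_bus *bus, int addr, int devad, int regnum);
};

/* Core helper: use the driver's dedicated C45 op if present, else refuse. */
static int sim_mdiobus_c45_read(struct sim_mii_bus *bus, int addr,
				int devad, int regnum)
{
	if (bus->read_c45 == NULL)
		return -EOPNOTSUPP;
	return bus->read_c45(bus, addr, devad, regnum);
}

/* Fake driver ops that return recognizable values instead of touching HW. */
static int fake_c22_read(struct sim_mii_bus *bus, int addr, int regnum)
{
	(void)bus;
	(void)addr;
	return 0x1000 | regnum;
}

static int fake_c45_read(struct sim_mii_bus *bus, int addr, int devad, int regnum)
{
	(void)bus;
	(void)addr;
	return (devad << 8) | regnum;
}
```

Keeping C22 and C45 as separate callbacks lets the core see from the structure alone whether a bus driver supports C45. It no longer has to encode C45 addresses into the C22 `read` call.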
+ +* [v1: Introduce a vringh accessor for IO memory](http://lore.kernel.org/netdev/20221227022528.609839-1-mie@igel.co.jp/) + + Vringh is a host-side implementation of virtio rings, and supports + vrings located in three kinds of memory: userspace, kernel space, and + space translated by an IOTLB. + + The goal of this patchset is to refactor vringh and introduce a new vringh + accessor for vrings located in an I/O memory region. + +* [v2: Add some USB hotspot IDs](http://lore.kernel.org/netdev/20221226234751.444917-1-mjg59@srcf.ucam.org/) + + Add a few additional IDs to support a couple of hotspots I had lying + around. V2 avoids reserving the PPP modem endpoint for the MDM9207 + devices. + +* [v2: net: net/ethtool/ioctl: split ethtool_get_phy_stats into multiple helpers](http://lore.kernel.org/netdev/20221226114825.1937189-1-d-tatianin@yandex-team.ru/) + + This series fixes a potential NULL dereference in ethtool_get_phy_stats + while also attempting to refactor/split said function into multiple + helpers so that it's easier to reason about what's going on. + +* [v2: wireless-next: wl18xx: use strscpy() to instead of strncpy()](http://lore.kernel.org/netdev/202212261914060599112@zte.com.cn/) + + The implementation of strscpy() is more robust and safer. + That's now the recommended way to copy NUL-terminated strings. + +* [v1: virtio-net: don't busy poll for cvq command](http://lore.kernel.org/netdev/20221226074908.8154-1-jasowang@redhat.com/) + + The code used to busy poll for the cvq command, which turns out to have + several side effects: + + 1) infinite poll for buggy devices + 2) bad interaction with the scheduler + + So this series tries to use sleep + timeout instead of busy polling. + +* [v1: batman-adv: Check return value](http://lore.kernel.org/netdev/20221224233311.48678-1-artem.chernyshev@red-soft.ru/) + + Check whether the rtnl_link_register() call in batadv_init() was successful. + + Found by the Linux Verification Center (linuxtesting.org) with SVACE.
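The wl18xx entry prefers strscpy() because strncpy() writes no NUL terminator when the source fills the destination. The kernel's strscpy() always terminates the destination for a non-zero size. It returns the number of characters copied, or -E2BIG if the string was truncated. The userspace sketch below reproduces those semantics. `sketch_strscpy()` is a hypothetical name, and it returns `long` instead of the kernel's `ssize_t` to stay within standard C.

```c
#include <errno.h>
#include <string.h>

/*
 * strscpy()-style copy: never writes more than `size` bytes, always
 * NUL-terminates when size > 0, and reports truncation as -E2BIG.
 */
static long sketch_strscpy(char *dest, const char *src, size_t size)
{
	size_t len = 0;

	if (size == 0)
		return -E2BIG;
	/* Measure src, but never read more than size bytes of it. */
	while (len < size && src[len] != '\0')
		len++;
	if (len == size) {
		/* src plus its NUL does not fit: truncate but still terminate. */
		memcpy(dest, src, size - 1);
		dest[size - 1] = '\0';
		return -E2BIG;
	}
	memcpy(dest, src, len + 1); /* copy including the terminating NUL */
	return (long)len;
}
```

For example, with a 4-byte buffer, `strncpy(buf, "abcdef", 4)` fills all four bytes with `abcd` and writes no terminator. `sketch_strscpy(buf, "abcdef", 4)` instead leaves `"abc"` in `buf` and returns `-E2BIG`, so the caller both gets a valid string and learns that it was cut short.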
+ +#### BPF + +* [v1: bpf-next: Support for BPF_ST instruction in LLVM C compiler](http://lore.kernel.org/bpf/20221231163122.1360813-1-eddyz87@gmail.com/) + + Currently the LLVM BPF back-end does not emit the BPF_ST instruction and does not allow one to be specified as inline assembly. + + Recently I've been exploring ways to port some of the verifier test + cases from tools/testing/selftests/bpf/verifier/*.c to use inline assembly + and machinery provided in tools/testing/selftests/bpf/test_loader.c + (which should hopefully simplify test maintenance). + +* [v1: bpf-next: libbpf: Add LoongArch support to bpf_tracing.h](http://lore.kernel.org/bpf/20221231100757.3177034-1-hengqi.chen@gmail.com/) + + Add PT_REGS macros for LoongArch ([0]). + +* [v1: bpf-next: bpf: Handle reuse in bpf memory alloc](http://lore.kernel.org/bpf/20221230041151.1231169-1-houtao@huaweicloud.com/) + + This series handles element reuse in the bpf memory allocator. The immediate reuse of + freed elements may lead to two problems in htab map. The first is that + reuse will reinitialize special fields (e.g., bpf_spin_lock) in the htab map value, which may corrupt the lookup procedure with the BPF_F_LOCK flag, since that procedure acquires the bpf_spin_lock during value copying. The corruption of the bpf_spin_lock may result in a hard lock-up. + +* [bpf helpers freeze. Was: v2: bpf-next: Dynptr convenience helpers](http://lore.kernel.org/bpf/20221225215210.ekmfhyczgubx4rih@macbook-pro-6.dhcp.thefacebook.com/) + + The uapi helpers vs kfuncs argument is not a black and white comparison. + It's not just stable vs unstable. + uapi has strict rules and helpers in uapi/bpf.h have to follow those rules. + kfuncs, by contrast, are equivalent to EXPORT_SYMBOL_GPL in terms of stability.
+ +### 周边技术动态 + +#### Qemu + +* [v1: riscv: do not set the rounding mode via `gen_set_rm`](http://lore.kernel.org/qemu-devel/20221229172734.119600-1-abdulras@google.com/) + + Setting the rounding mode via the `gen_set_rm` call would alter the + state of the disassembler, resetting the `TransOp` in the assembler + context. When we subsequently set the rounding mode to the desired + value, we would trigger an assertion in `decode_save_opc`. + +* [v2: hw/riscv: Improve Spike HTIF emulation fidelity](http://lore.kernel.org/qemu-devel/20221229091828.1945072-1-bmeng@tinylab.org/) + + At present the 32-bit OpenSBI generic firmware image does not boot on + Spike; only the 64-bit image does. This is because the HTIF emulation does + not implement the proxy syscall interface, which is required for the + 32-bit HTIF console output. + + An OpenSBI bug fix [1] is also needed when booting the plain binary image. + +#### Buildroot + +* [v2: package/qemu: refactor target emulator selection](http://lore.kernel.org/buildroot/20221227114842.2620182-1-unixmania@gmail.com/) + + Since CUSTOM_TARGETS does not select FDT, we can get build errors like + this: + + ../meson.build:2778:2: ERROR: Problem encountered: fdt not available but required by targets x86_64-softmmu + + We could select FDT when CUSTOM_TARGETS is set, but this would force an + unnecessary dependency on dtc, as BR2_PACKAGE_QEMU_SYSTEM does.
+ +#### U-Boot + +* [v1: Pull request for efi-2023-01-rc5-2](http://lore.kernel.org/u-boot/80393d33-8a51-2840-b5c5-112298e4c5aa@gmx.de/) + + Pull request for efi-2023-01-rc5-2 + + Documentation: + + * Reorganize existing TI docs and add K3 generation page + * Add texinfodocs and infodocs targets + * Update qemu-ppce500 documentation + * Use "changesets" not "csets" in statistics pages + + UEFI + + * Fix merging of preseeded non-volatile variables + * Fix a return value in the EFI_HII_DATABASE_PROTOCOL + * Set UEFI specification version to 2.10 diff --git a/news/README.md b/news/README.md index 76454cddc68a1d7917ff62aec4aa936a647815ee..bda1ad9718019d762d23a100e460905974e39df5 100644 --- a/news/README.md +++ b/news/README.md @@ -3,6 +3,7 @@ ## 往年技术动态汇总 * [2022 年](2022.md) +* [2023 年 - 上半年](2023-1st-half.md) ## 20230604:第 48 期 @@ -1042,20413 +1043,3 @@ > early stage. It compiles, and boots Linux kernels, but there is no PLL > driver I can find currently. So clocks are still hanging in PROBE_DEFER. > - - -## 20230528:第 47 期 - -### 内核动态 - -#### RISC-V 架构支持 - -**[v1: riscv: Reduce ARCH_KMALLOC_MINALIGN to 8](http://lore.kernel.org/linux-riscv/20230526165958.908-1-jszhang@kernel.org/)** - -> Currently, riscv defines ARCH_DMA_MINALIGN as L1_CACHE_BYTES, I.E -> 64Bytes, if CONFIG_RISCV_DMA_NONCOHERENT=y. To support unified kernel -> Image, usually we have to enable CONFIG_RISCV_DMA_NONCOHERENT, thus -> it brings some bad effects to for coherent platforms: -> -> Firstly, it wastes memory, kmalloc-96, kmalloc-32, kmalloc-16 and -> kmalloc-8 slab caches don't exist any more, they are replaced with -> either kmalloc-128 or kmalloc-64. -> - -**[v1: RISC-V: mark hibernation as nonportable](http://lore.kernel.org/linux-riscv/20230526-astride-detonator-9ae120051159@wendy/)** - -> Hibernation support depends on firmware marking its reserved/PMP -> protected regions as not accessible from Linux. 
-> The latest versions of the de-facto SBI implementation (OpenSBI) do -> not do this, having dropped the no-map property to enable 1 GiB huge -> page mappings by the kernel. -> This was exposed by commit 3335068f8721 ("riscv: Use PUD/P4D/PGD pages -> for the linear mapping"), which made the first 2 MiB of DRAM (where SBI -> typically resides) accessible by the kernel. -> - -**[v2: RISC-V: KVM: Ensure SBI extension is enabled](http://lore.kernel.org/linux-riscv/20230526102540.105013-1-ajones@ventanamicro.com/)** - -> Ensure guests can't attempt to invoke SBI extension functions when the -> SBI extension's probe function has stated that the extension is not -> available. -> - -**[v1: Add initialization of clock for StarFive JH7110 SoC](http://lore.kernel.org/linux-riscv/20230526062529.46747-1-william.qiu@starfivetech.com/)** - -> This patchset adds initial rudimentary support for the StarFive -> Quad SPI controller driver. And this driver will be used in -> StarFive's VisionFive 2 board. In 6.4, the QSPI_AHB and QSPI_APB -> clocks changed from the default ON state to the default OFF state, -> so these clocks need to be enabled in the driver.At the same time, -> dts patch is added to this series. -> - -**[v2: RISCV: Add KVM_GET_REG_LIST API](http://lore.kernel.org/linux-riscv/cover.1684999824.git.haibo1.xu@intel.com/)** - -> KVM_GET_REG_LIST will dump all register IDs that are available to -> KVM_GET/SET_ONE_REG and It's very useful to identify some platform -> regression issue during VM migration. -> - -**[v1: riscv: Kconfig: Add select ARM_AMBA to SOC_STARFIVE](http://lore.kernel.org/linux-riscv/20230525061836.79223-1-jiajie.ho@starfivetech.com/)** - -> Selects ARM_AMBA platform support for StarFive SoCs required by spi and -> crypto dma engine. 
-> - -**[v1: tools/nolibc: riscv: Add full rv32 support](http://lore.kernel.org/linux-riscv/cover.1684949267.git.falcon@tinylab.org/)** - -> In the first series [1], we have fixed up the compile errors about -> _start and __NR_llseek for rv32, but left compile errors about tons of -> time32 syscalls (removed after kernel commit d4c08b9776b3 ("riscv: Use -> latest system call ABI")) and the missing fstat in nolibc-test.c [2], -> now we have fixed up all of them. -> - -**[v1: Add support for Allwinner GPADC on D1/T113s/R329 SoCs](http://lore.kernel.org/linux-riscv/20230524082744.3215427-1-bigunclemax@gmail.com/)** - -> This series adds support for general purpose ADC (GPADC) on new -> Allwinner's SoCs, such as D1, T113s and R329. The implemented driver -> provides basic functionality for getting ADC channels data. -> - -**[v2: dmaengine: pl330: rename _start to prevent build error](http://lore.kernel.org/linux-riscv/20230524045310.27923-1-rdunlap@infradead.org/)** - -> "_start" is used in several arches and proably should be reserved -> for ARCH usage. Using it in a driver for a private symbol can cause -> a build error when it conflicts with ARCH usage of the same symbol. -> - -**[v1: riscv: mm: try VMA lock-based page fault handling first](http://lore.kernel.org/linux-riscv/20230523165942.2630-1-jszhang@kernel.org/)** - -> Attempt VMA lock-based page fault handling first, and fall back to the -> existing mmap_lock-based handling if that fails. -> - -**[v2: riscv: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION](http://lore.kernel.org/linux-riscv/20230523165502.2592-1-jszhang@kernel.org/)** - -> When trying to run linux with various opensource riscv core on -> resource limited FPGA platforms, for example, those FPGAs with less -> than 16MB SDRAM, I want to save mem as much as possible. 
One of the -> major technologies is kernel size optimizations, I found that riscv -> does not currently support HAVE_LD_DEAD_CODE_DATA_ELIMINATION, which -> passes -fdata-sections, -ffunction-sections to CFLAGS and passes the -> --gc-sections flag to the linker. -> - -**[v3: Add Zawrs support and use it for spinlocks](http://lore.kernel.org/linux-riscv/20230521114715.955823-1-heiko.stuebner@vrull.eu/)** - -> Zawrs [0] was ratified in november 2022 [1], so I've resurrect the patch -> adding Zawrs support for spinlocks and adapted it to recent kernel -> changes. -> -> Also incorporated are the nice comments David Laight provided on v2. -> - -**[v1: tools/nolibc: autodetect stackprotector availability from compiler](http://lore.kernel.org/linux-riscv/20230521-nolibc-automatic-stack-protector-v1-0-dad6c80c51c1@weissschuh.net/)** - -> As suggested by Willy it is possible to detect the availability of -> stackprotector via preprocessor defines. -> Make use of that to simplify the code and interface of nolibc. -> - -**[v1: RISC-V: KVM: Redirect AMO load/store misaligned traps to guest](http://lore.kernel.org/linux-riscv/20230520150116.7451-1-waylingII@gmail.com/)** - -> The M-mode redirects an unhandled misaligned trap back -> to S-mode when not delegating it to VS-mode(hedeleg). -> However, KVM running in HS-mode terminates the VS-mode -> software when back from M-mode. -> The KVM should redirect the trap back to VS-mode, and -> let VS-mode trap handler decide the next step. -> - -#### 进程调度 - -**[v1: sched/psi: make psi_cgroups_enabled static](http://lore.kernel.org/lkml/20230525103428.49712-1-linmiaohe@huawei.com/)** - -> The static key psi_cgroups_enabled is only used inside file psi.c. -> Make it static. 
-> - -**[v1: sched/fair: Don't balance task to its current running CPU](http://lore.kernel.org/lkml/20230524072018.62204-1-yangyicong@huawei.com/)** - -> Further investigation shows that the warning is superfluous, the migration -> disabled task is just going to be migrated to its current running CPU. -> This is because that on load balance if the dst_cpu is not allowed by the -> task, we'll re-select a new_dst_cpu as a candidate. If no task can be -> balanced to dst_cpu we'll try to balance the task to the new_dst_cpu -> instead. In this case when the migration disabled task is not on CPU it -> only allows to run on its current CPU, load balance will select its -> current CPU as new_dst_cpu and later triggers the the warning above. -> - -**[v1: sched/deadline: simplify dl_bw_cpus() using cpumask_weight_and()](http://lore.kernel.org/lkml/20230522115605.1238227-1-linmiaohe@huawei.com/)** - -> cpumask_weight_and() can be used to count of bits both in rd->span and -> cpu_active_mask. No functional change intended. -> - -#### 内存管理 - -**[v1: Do not print page type when the page has no type](http://lore.kernel.org/linux-mm/ZHI0YKzZADjr1nyq@casper.infradead.org/)** - -> It is confusing and unnecessary to print the page type when the -> page has no type. -> - -**[v4: block: Make old dio use iov_iter_extract_pages() and page pinning](http://lore.kernel.org/linux-mm/20230526214142.958751-1-dhowells@redhat.com/)** - -> Here are three patches that go on top of the similar patches for bio -> structs now in the block tree that make the old block direct-IO code use -> iov_iter_extract_pages() and page pinning. -> - -**[v1: tmpfs.5: extend with new noswap documentation](http://lore.kernel.org/linux-mm/20230526210703.934922-1-mcgrof@kernel.org/)** - -> Linux commit 2c6efe9cf2d7 ("shmem: add support to ignore swap") -> merged as of v6.4 added support to disable swap for tmpfs mounts. -> -> This extends the man page to document that. 
-> - -**[v3: mm: zswap: shrink until can accept](http://lore.kernel.org/linux-mm/20230526183227.793977-1-cerasuolodomenico@gmail.com/)** - -> This update addresses an issue with the zswap reclaim mechanism, which -> hinders the efficient offloading of cold pages to disk, thereby -> compromising the preservation of the LRU order and consequently -> diminishing, if not inverting, its performance benefits. -> - -**[v1: net-next: crypto, splice, net: Make AF_ALG handle sendmsg(MSG_SPLICE_PAGES)](http://lore.kernel.org/linux-mm/20230526143104.882842-1-dhowells@redhat.com/)** - -> Here's the fourth tranche of patches towards providing a MSG_SPLICE_PAGES -> internal sendmsg flag that is intended to replace the ->sendpage() op with -> calls to sendmsg(). MSG_SPLICE_PAGES is a hint that tells the protocol -> that it should splice the pages supplied if it can. -> - -**[v2: -next: memblock: unify memblock dump and debugfs show](http://lore.kernel.org/linux-mm/20230526120505.123693-1-wangkefeng.wang@huawei.com/)** - -> There are two interfaces to show the memblock information, memblock_dump_all() -> and /sys/kernel/debug/memblock/, but the content is displayed separately, -> let's unify them in case of more different changes over time. -> - -**[v2: add support for blocksize > PAGE_SIZE](http://lore.kernel.org/linux-mm/20230526075552.363524-1-mcgrof@kernel.org/)** - -> This is an initial attempt to add support for block size > PAGE_SIZE for tmpfs. -> Why would you want this? It helps us experiment with higher order folio uses -> with fs APIS and helps us test out corner cases which would likely need -> to be accounted for sooner or later if and when filesystems enable support -> for this. Better review early and burn early than continue on in the wrong -> direction so looking for early feedback. 
-> - -**[v2: block: simplify with PAGE_SECTORS_SHIFT](http://lore.kernel.org/linux-mm/20230526073336.344543-1-mcgrof@kernel.org/)** - -> A bit of block drivers have their own incantations with -> PAGE_SHIFT - SECTOR_SHIFT. Just simplfy and use PAGE_SECTORS_SHIFT -> all over. -> -> Based on linux-next next-20230525. -> - -**[v2: x86/mce: set MCE_IN_KERNEL_COPYIN for all MC-Safe Copy](http://lore.kernel.org/linux-mm/20230526063242.133656-1-wangkefeng.wang@huawei.com/)** - -> Both EX_TYPE_FAULT_MCE_SAFE and EX_TYPE_DEFAULT_MCE_SAFE exception -> fixup types are used to identify fixups which allow in kernel #MC -> recovery, that is the Machine Check Safe Copy. -> - -**[v4: mm, compaction: Skip all non-migratable pages during scan](http://lore.kernel.org/linux-mm/20230525191507.160076-1-khalid.aziz@oracle.com/)** - -> Pages pinned in memory through extra refcounts can not be migrated. -> Currently as isolate_migratepages_block() scans pages for -> compaction, it skips any pinned anonymous pages. All non-migratable -> pages should be skipped and not just the anonymous pinned pages. -> - -**[v16: Implement IOCTL to get and optionally clear info about PTEs](http://lore.kernel.org/linux-mm/20230525085517.281529-1-usama.anjum@collabora.com/)** - -> This syscall is used in Windows applications and games etc. This syscall is -> being emulated in pretty slow manner in userspace. Our purpose is to -> enhance the kernel such that we translate it efficiently in a better way. -> Currently some out of tree hack patches are being used to efficiently -> emulate it in some kernels. We intend to replace those with these patches. -> So the whole gaming on Linux can effectively get benefit from this. It -> means there would be tons of users of this code. 
-> - -**[v1: zonefs: Call zonefs_io_error() on any error from filemap_splice_read()](http://lore.kernel.org/linux-mm/3788353.1685003937@warthog.procyon.org.uk/)** - -> Call zonefs_io_error() after getting any error from filemap_splice_read() -> in zonefs_file_splice_read(), including non-fatal errors such as ENOMEM, -> EINTR and EAGAIN. -> - -**[v1: mm/memcontrol: export memcg.swap watermark via sysfs for v2 memcg](http://lore.kernel.org/linux-mm/20230524181734.125696-1-lars@pixar.com/)** - -> This patch is similar to commit 8e20d4b33266 ("mm/memcontrol: export -> memcg->watermark via sysfs for v2 memcg"), but exports the swap counter's -> watermark. -> - -**[v5: mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8](http://lore.kernel.org/linux-mm/20230524171904.3967031-1-catalin.marinas@arm.com/)** - -> Another version of the series reducing the kmalloc() minimum alignment -> on arm64 to 8 (from 128). Other architectures can easily opt in by -> defining ARCH_KMALLOC_MINALIGN as 8 and selecting -> DMA_BOUNCE_UNALIGNED_KMALLOC. -> - -**[v1: net-next: splice, net: Replace sendpage with sendmsg(MSG_SPLICE_PAGES), part 3](http://lore.kernel.org/linux-mm/20230524153311.3625329-1-dhowells@redhat.com/)** - -> Here's the third tranche of patches towards providing a MSG_SPLICE_PAGES -> internal sendmsg flag that is intended to replace the ->sendpage() op with -> calls to sendmsg(). MSG_SPLICE_PAGES is a hint that tells the protocol -> that it should splice the pages supplied if it can and copy them if not. -> - -**[v3: Optimize mremap during mutual alignment within PMD](http://lore.kernel.org/linux-mm/20230524153239.3036507-1-joel@joelfernandes.org/)** - -> The main changes are: -> 1. 
Care to be taken to move purely within a VMA, in other words this check -> in call_align_down(): -> if (vma->vm_start <= addr_masked) -> return false; -> -> As an example of why this is needed: -> Consider the following range which is 2MB aligned and is -> a part of a larger 10MB range which is not shown. Each -> character is 256KB below making the source and destination -> 2MB each. The lower case letters are moved (s to d) and the -> upper case letters are not moved. -> - -**[v1: mm/slab: add new flag SLAB_NO_MERGE to avoid merging per slab](http://lore.kernel.org/linux-mm/20230524101748.30714-1-dsterba@suse.com/)** - -> Add a flag that allows to disable merging per slab. This can be used for -> more fine grained control over the caches or for debugging builds where -> separate slabs can verify that no objects leak. -> -> The slab_nomerge boot option is too coarse and would need to be enabled -> on all testing hosts. There are some other ways how to disable merging, -> e.g. a slab constructor but this disables poisoning besides that it adds -> additional overhead. Other flags are internal and may have other -> semantics. -> - -**[v1: mm: deduct the number of pages reclaimed by madvise from workingset](http://lore.kernel.org/linux-mm/1684919574-28368-1-git-send-email-zhaoyang.huang@unisoc.com/)** - -> The pages reclaimed by madvise_pageout are made of inactive and dropped from LRU -> forcefully, which lead to the coming up refault pages possess a large refault -> distance than it should be. These could affect the accuracy of thrashing when -> madvise_pageout is used as a common way of memory reclaiming as ANDROID does now. 
-> - -**[v4: net-next/mm: page_pool: new approach for leak detection and shutdown phase](http://lore.kernel.org/linux-mm/168485351546.2849279.13771638045665633339.stgit@firesoul/)** - -> Patchset change summary: -> - Remove PP workqueue and inflight warnings, instead rely on inflight -> pages to trigger cleanup -> - Moves leak detection to the MM-layer page allocator when combined -> with CONFIG_DEBUG_VM. -> - -**[v1: mm/slab: rename CONFIG_SLAB to CONFIG_SLAB_DEPRECATED](http://lore.kernel.org/linux-mm/20230523091139.21449-1-vbabka@suse.cz/)** - -> As discussed at LSF/MM [1] [2] and with no objections raised there, -> deprecate the SLAB allocator. Rename the user-visible option so that -> users with CONFIG_SLAB=y get a new prompt with explanation during make -> oldconfig, while make olddefconfig will just switch to SLUB. -> - -#### 文件系统 - -**[v1: NFSD: recall write delegation on GETATTR conflict](http://lore.kernel.org/linux-fsdevel/1685122722-18287-1-git-send-email-dai.ngo@oracle.com/)** - -> This patch series adds the recall of write delegation when there is -> conflict with a GETATTR and a counter in /proc/net/rpc/nfsd to keep -> count of this recall. -> - -**[v1: init: Add support for rootwait timeout parameter](http://lore.kernel.org/linux-fsdevel/20230526130716.2932507-1-loic.poulain@linaro.org/)** - -> Add an optional timeout arg to 'rootwait' as the maximum time in -> seconds to wait for the root device to show up before attempting -> forced mount of the root filesystem. -> -> This can be helpful to force boot failure and restart in case the -> root device does not show up in time, allowing the bootloader to -> take any appropriate measures (e.g. recovery, A/B switch, retry...). -> - -**[v1: block layer patches for bcachefs](http://lore.kernel.org/linux-fsdevel/20230525214822.2725616-1-kent.overstreet@linux.dev/)** - -> Jens, here's the full series of block layer patches needed for bcachefs: -> -> Some of these (added exports, zero_fill_bio_iter?) 
can probably go with -> the bcachefs pull and I'm just including here for completeness. The main -> ones are the bio_iter patches, and the __invalidate_super() patch. -> - -**[v2: Add support for Vendor Defined Error Types in Einj Module](http://lore.kernel.org/linux-fsdevel/20230525204422.4754-1-Avadhut.Naik@amd.com/)** - -> This patchset adds support for Vendor Defined Error types in the einj -> module by exporting a binary blob file in module's debugfs directory. -> Userspace tools can write OEM Defined Structures into the blob file as -> part of injecting Vendor defined errors. -> - -**[v1: multiblock allocator improvements](http://lore.kernel.org/linux-fsdevel/cover.1685009579.git.ojaswin@linux.ibm.com/)** - -> So this patch was intended to remove a dead if-condition but it was not -> actually dead code and removing it was causing a performance regression. -> Unfortunately I somehow missed that when I was reviewing his patchset -> and it already went in so I had to revert the commit. I've added details -> of the regression and root cause in the revert commit. Also attaching -> the performance numbers I observed: -> - -**[v4: bpf-next: Add O_PATH-based BPF_OBJ_PIN and BPF_OBJ_GET support](http://lore.kernel.org/linux-fsdevel/20230523170013.728457-1-andrii@kernel.org/)** - -> This feature is inspired as a result of recent conversations during -> LSF/MM/BPF 2023 conference about shortcomings of being able to perform BPF -> objects pinning only using lookup-based paths. -> - -**[v1: fs: use UB-safe check for signed addition overflow in remap_verify_area](http://lore.kernel.org/linux-fsdevel/20230523162628.17071-1-dsterba@suse.com/)** - -> As loff_t is a signed type, we should use the safe overflow checks -> instead of relying on compiler implementation. -> -> The bogus values are intentional and the test is supposed to verify the -> boundary conditions.
-> - -**[v3: arch: Make virt_to_pfn into a static inline](http://lore.kernel.org/linux-fsdevel/20230503-virt-to-pfn-v6-4-rc1-v3-0-a16c19c03583@linaro.org/)** - -> This is an attempt to harden the typing on virt_to_pfn() -> and pfn_to_virt(). -> -> Making virt_to_pfn() a static inline taking a strongly typed -> (const void *) makes the contract of passing a pointer of that -> type to the function explicit and exposes any misuse of the -> macro virt_to_pfn() acting polymorphic and accepting many types -> such as (void *), (uintptr_t) or (unsigned long) as arguments -> without warnings. -> - -**[v21: block: Use page pinning](http://lore.kernel.org/linux-fsdevel/20230522205744.2825689-1-dhowells@redhat.com/)** - -> This patchset rolls page-pinning out to the bio struct and the block layer, -> using iov_iter_extract_pages() to get pages and noting with BIO_PAGE_PINNED -> if the data pages attached to a bio are pinned. If the data pages come -> from a non-user-backed iterator, then the pages are left unpinned and -> unref'd, relying on whoever set up the I/O to do the retaining. -> - -#### Network Devices - -**[v1: wifi: rsi: Do not configure WoWlan in shutdown hook if not enabled](http://lore.kernel.org/netdev/20230527222833.273741-1-marex@denx.de/)** - -> In case WoWlan was never configured during the operation of the system, -> the hw->wiphy->wowlan_config will be NULL. rsi_config_wowlan() checks -> whether wowlan_config is non-NULL and if it is not, then WARNs about it. -> The warning is valid, as during normal operation the rsi_config_wowlan() -> should only ever be called with non-NULL wowlan_config. In shutdown this -> rsi_config_wowlan() should only ever be called if WoWlan was configured -> before by the user.
-> - -**[v1: net: ipa: Use the correct value for IPA_STATUS_SIZE](http://lore.kernel.org/netdev/7ae8af63b1254ab51d45c870e7942f0e3dc15b1e.camel@web.de/)** - -> commit b8dc7d0eea5a7709bb534f1b3ca70d2d7de0b42c introduced -> IPA_STATUS_SIZE as a replacement for the size of the removed struct -> ipa_status. sizeof(struct ipa_status) was sizeof(__le32[8]), use this -> as IPA_STATUS_SIZE. -> - -**[v1: net-next: liquidio: Use vzalloc()](http://lore.kernel.org/netdev/93b010824d9d92376e8d49b9eb396a0fa0c0ac80.1685216322.git.christophe.jaillet@wanadoo.fr/)** - -> Use vzalloc() instead of hand writing it with vmalloc()+memset(). -> This is less verbose. -> - -**[v2: net-next: net: dsa: mv88e6xxx: implement USXGMII mode for mv88e6393x](http://lore.kernel.org/netdev/20230527172024.9154-1-michal.smulski@ooma.com/)** - -> Enable USXGMII mode for mv88e6393x chips. Tested on Marvell 88E6191X. -> - -**[v2: net-next: netlink: specs: add ynl spec for ovs_flow](http://lore.kernel.org/netdev/20230527133107.68161-1-donald.hunter@gmail.com/)** - -> Add a ynl specification for ovs_flow. The spec is sufficient to dump ovs -> flows but some attrs have been left as binary blobs because ynl doesn't -> support C arrays in struct definitions yet. -> - -**[v1: net-next: net: phy: smsc: add WoL support to LAN8740/LAN8742 PHYs.](http://lore.kernel.org/netdev/1685151574-2752-1-git-send-email-Tristram.Ha@microchip.com/)** - -> Microchip LAN8740/LAN8742 PHYs support basic unicast, broadcast, and -> Magic Packet WoL. They have one pattern filter matching up to 128 bytes -> of frame data, which can be used to implement ARP or multicast WoL. -> - -**[v1: net: netlink: specs: correct types of legacy arrays](http://lore.kernel.org/netdev/20230526220653.65538-1-kuba@kernel.org/)** - -> ethtool has some attrs which dump multiple scalars into -> an attribute. The spec currently expects one attr per entry. 
-> - -**[v4: iproute2: vxlan: option printing](http://lore.kernel.org/netdev/20230526174141.5972-1-stephen@networkplumber.org/)** - -> This patchset makes printing of vxlan details more consistent. -> It also adds extra verbose output. The boolean options -> are now printed after all the non-boolean options. -> - -**[v1: net: tcp: deny tcp_disconnect() when threads are waiting](http://lore.kernel.org/netdev/20230526163458.2880232-1-edumazet@google.com/)** - -> Historically connect(AF_UNSPEC) has been abused by syzkaller -> and other fuzzers to trigger various bugs. -> -> A recent one triggers a divide-by-zero [1], and Paolo Abeni -> was able to diagnose the issue. -> - -**[v1: net: af_packet: do not use READ_ONCE() in packet_bind()](http://lore.kernel.org/netdev/20230526162320.5816-1-kuniyu@amazon.com/)** - -> Date: Fri, 26 May 2023 15:43:42 +0000 -> > A recent patch added READ_ONCE() in packet_bind() and packet_bind_spkt() -> > -> > This is better handled by reading pkt_sk(sk)->num later -> > in packet_do_bind() while appropriate lock is held. -> > -> > READ_ONCE() in writers are often an evidence of something being wrong. -> > -> > Fixes: 822b5a1c17df ("af_packet: Fix data-races of pkt_sk(sk)->num.") -> > -> - -**[v2: iproute2: Add ability to specify eBPF map pin path](http://lore.kernel.org/netdev/20230526150921.338906-1-mtottenh@akamai.com/)** - -> We have a use case where we have several different applications composed of -> sets of eBPF programs (programs that may be attached at the TC/XDP layers), -> that need to share maps and not conflict with each other. -> - -**[v1: net: usb: qmi_wwan: Set DTR quirk for BroadMobi BM818](http://lore.kernel.org/netdev/20230526-bm818-dtr-v1-1-64bbfa6ba8af@puri.sm/)** - -> BM818 is based on Qualcomm MDM9607 chipset. -> - -**[v1: net-next: devlink: Spelling corrections](http://lore.kernel.org/netdev/20230526-devlink-spelling-v1-1-9a3e36cdebc8@kernel.org/)** - -> Make some minor spelling corrections in comments.
-> -> Found by inspection. -> - -**[v1: bpf: netfilter: add BPF_NETFILTER bpf_attach_type](http://lore.kernel.org/netdev/20230526121124.3915-1-fw@strlen.de/)** - -> Andrii Nakryiko writes: -> -> And we currently don't have an attach type for NETLINK BPF link. -> Thankfully it's not too late to add it. I see that link_create() in -> kernel/bpf/syscall.c just bypasses attach_type check. We shouldn't -> have done that. Instead we need to add BPF_NETLINK attach type to enum -> bpf_attach_type. And wire all that properly throughout the kernel and -> libbpf itself. -> - -**[v1: net-next: net: dpaa2-mac: use correct interface to free mdiodev](http://lore.kernel.org/netdev/E1q2VsB-008QlZ-El@rmk-PC.armlinux.org.uk/)** - -> Rather than using put_device(&mdiodev->dev), use the proper interface -> provided to dispose of the mdiodev - that being mdio_device_free(). -> - -**[v1: net: rxrpc: Truncate UTS_RELEASE for rxrpc version](http://lore.kernel.org/netdev/654974.1685100894@warthog.procyon.org.uk/)** - -> UTS_RELEASE has a maximum length of 64 which can cause rxrpc_version to -> exceed the 65 byte message limit. -> -> Per the rx spec[1]: "If a server receives a packet with a type value of 13, -> and the client-initiated flag set, it should respond with a 65-byte payload -> containing a string that identifies the version of AFS software it is -> running." -> - -**[v1: net-next: net: pcs: add helpers to xpcs and lynx to manage mdiodev](http://lore.kernel.org/netdev/ZHCGZ8IgAAwr8bla@shell.armlinux.org.uk/)** - -> This morning, we have had two instances where the destruction of the -> MDIO device associated with XPCS and Lynx has been wrong. Rather than -> allowing this pattern of errors to continue, let's make it easier for -> driver authors to get this right by adding a helper. 
-> - -**[v2: net/sched: act_pedit: Parse L3 Header for L4 offset](http://lore.kernel.org/netdev/20230526095810.280474-1-mtottenh@akamai.com/)** - -> Instead of relying on skb->transport_header being set correctly, opt -> instead to parse the L3 header length out of the L3 headers for both -> IPv4/IPv6 when the Extended Layer Op for tcp/udp is used. This fixes a -> bug if GRO is disabled, when GRO is disabled skb->transport_header is -> set by __netif_receive_skb_core() to point to the L3 header, it's later -> fixed by the upper protocol layers, but act_pedit will receive the SKB -> before the fixups are completed. -> - -**[v1: net-next: support non-frag page for page_pool_alloc_frag()](http://lore.kernel.org/netdev/20230526092616.40355-1-linyunsheng@huawei.com/)** - -> In [1], there is a use case to use frag support in page -> pool to reduce memory usage, and it may request different -> frag size depending on the head/tail room space for -> xdp_frame/shinfo and mtu/packet size. When the requested -> frag size is large enough that a single page can not be -> split into more than one frag, using frag support only -> have performance penalty because of the extra frag count -> handling for frag support. -> - -**[v3: Add motorcomm phy pad-driver-strength-cfg support](http://lore.kernel.org/netdev/20230526090502.29835-1-samin.guo@starfivetech.com/)** - -> The motorcomm phy (YT8531) supports the ability to adjust the drive -> strength of the rx_clk/rx_data, and the default strength may not be -> suitable for all boards. So add configurable options to better match -> the boards (e.g. StarFive VisionFive 2). -> -> The first patch adds a description of dt-binding, and the second patch adds -> YT8531's parsing and settings for pad-driver-strength-cfg. -> - -**[v7: net-next: Wangxun netdev features support](http://lore.kernel.org/netdev/20230526090230.71487-1-mengyuanlou@net-swift.com/)** - -> Implement tx_csum and rx_csum to support hardware checksum offload.
-> Implement ndo_vlan_rx_add_vid and ndo_vlan_rx_kill_vid. -> Implement ndo_set_features. -> Enable macros in netdev features which wangxun can support. -> - -**[v3: hv_netvsc: Allocate rx indirection table size dynamically](http://lore.kernel.org/netdev/1685080949-18316-1-git-send-email-shradhagupta@linux.microsoft.com/)** - -> Allocate the size of rx indirection table dynamically in netvsc -> from the value of size provided by OID_GEN_RECEIVE_SCALE_CAPABILITIES -> query instead of using a constant value of ITAB_NUM. -> - -**[v1: Truncate UTS_RELEASE for rxrpc version](http://lore.kernel.org/netdev/20230525211346.718562-1-Kenny.Ho@amd.com/)** - -> UTS_RELEASE has maximum length of 64 which can cause rxrpc_version to -> exceed the 65 byte message limit. -> -> Per https://web.mit.edu/kolya/afs/rx/rx-spec -> "If a server receives a packet with a type value of 13, and the -> client-initiated flag set, it should respond with a 65-byte payload -> containing a string that identifies the version of AFS software it is -> running." -> - -#### Security Hardening - -**[v2: checkpatch: Check for strcpy and strncpy too](http://lore.kernel.org/linux-hardening/20230526172508.gonna.793-kees@kernel.org/)** - -> Warn about strcpy(), strncpy(), and strlcpy(). Suggest strscpy() and -> include pointers to the open KSPP issues for each, which has further -> details and replacement procedures. -> - -**[v2: leds: as3645a: Replace strlcpy with strscpy](http://lore.kernel.org/linux-hardening/20230524144824.2360607-1-azeemshaikh38@gmail.com/)** - -> Part of a tree-wide effort to remove deprecated strlcpy()[1] and replace -> it with strscpy()[2]. No return values were used, so direct replacement -> is safe. -> - -**[v1: next: nfsd: Replace one-element array with flexible-array member](http://lore.kernel.org/linux-hardening/ZG1d51tGG4c97qqb@work/)** - -> One-element arrays are deprecated, and we are replacing them with -> flexible array members instead. So, replace a one-element array
So, replace a one-element array -> with a flexible-arrayº member in struct vbi_anc_data and refactor -> the rest of the code, accordingly. -> - -**[v1: next: media: pci: cx18-av-vbi: Replace one-element array with flexible-array member](http://lore.kernel.org/linux-hardening/ZG1YVji9thTLWeRm@work/)** - -> One-element arrays are deprecated, and we are replacing them with flexible -> array members instead. So, replace one-element arrays with flexible-array -> members in struct vbi_anc_data. -> - -**[v2: next: scsi: lpfc: Use struct_size() helper](http://lore.kernel.org/linux-hardening/ZG0fDdY%2FPPQ%2Fijlt@work/)** - -> Prefer struct_size() over open-coded versions of idiom: -> -> sizeof(struct-with-flex-array) + sizeof(typeof-flex-array-elements) * count -> - -**[v2: fscrypt: Replace 1-element array with flexible array](http://lore.kernel.org/linux-hardening/20230523165458.gonna.580-kees@kernel.org/)** - -> 1-element arrays are deprecated and are being replaced with C99 -> flexible arrays[1]. -> -> As sizes were being calculated with the extra byte intentionally, -> propagate the difference so there is no change in binary output. -> -> [1] https://github.com/KSPP/linux/issues/79 -> - -**[v1: next: vfio/ccw: Use struct_size() helper](http://lore.kernel.org/linux-hardening/f657276073630e806e69726a40ad1cc85101448a.1684805398.git.gustavoars@kernel.org/)** - -> Prefer struct_size() over open-coded versions. -> - -**[v1: next: vfio/ccw: Replace one-element array with flexible-array member](http://lore.kernel.org/linux-hardening/3c10549ebe1564eade68a2515bde233527376971.1684805398.git.gustavoars@kernel.org/)** - -> One-element arrays are deprecated, and we are replacing them with flexible -> array members instead. So, replace one-element array with flexible-array -> member in struct vfio_ccw_parent and refactor the the rest of the code -> accordingly. 
-> - -**[v1: lkdtm/bugs: Switch from 1-element array to flexible array](http://lore.kernel.org/linux-hardening/20230522212949.never.283-kees@kernel.org/)** - -> The testing for ARRAY_BOUNDS just wants an uninstrumented array, -> and the proper flexible array definition is fine for that. -> - -**[v2: md/raid5: Convert stripe_head's "dev" to flexible array member](http://lore.kernel.org/linux-hardening/20230522212114.gonna.589-kees@kernel.org/)** - -> Replace old-style 1-element array of "dev" in struct stripe_head with -> modern C99 flexible array. In the future, we can additionally annotate -> it with the run-time size, found in the "disks" member. -> - -**[v1: overflow: Add struct_size_t() helper](http://lore.kernel.org/linux-hardening/20230522211810.never.421-kees@kernel.org/)** - -> While struct_size() is normally used in situations where the structure -> type already has a pointer instance, there are places where no variable -> is available. In the past, this has been worked around by using a typed -> NULL first argument, but this is a bit ugly. Add a helper to do this, -> and replace the handful of instances of the code pattern with it. -> - -#### Asynchronous IO - -**[v2: io_uring: unlock sqd->lock before sq thread release CPU](http://lore.kernel.org/io-uring/20230525082626.577862-1-wenwen.chen@samsung.com/)** - -> The sq thread actively releases CPU resources by calling the -> cond_resched() and schedule() interfaces when it is idle. Therefore, -> more resources are available for other threads to run. -> - -#### Rust For Linux - -**[v2: scripts: read cfgs from Makefile for rust-analyzer](http://lore.kernel.org/rust-for-linux/20230520231701.46008-1-yakoyoku@gmail.com/)** - -> Both `core` and `alloc` had their `cfgs` missing in `rust-project.json`, -> to remedy this `generate_rust_analyzer.py` scans the Makefile from -> inside the `rust` directory for them to be added to a dictionary that -> each key corresponds to a crate and each value, to an array of `cfgs`.
-> - -#### BPF - -**[v1: bpf-next: bpf: replace open code with for allocated object check](http://lore.kernel.org/bpf/20230527122706.59315-1-danieltimlee@gmail.com/)** - -> From commit 282de143ead9 ("bpf: Introduce allocated objects support"), -> With this allocated object with BPF program, (PTR_TO_BTF_ID | MEM_ALLOC) -> has been a way of indicating to check the type is the allocated object. -> - -**[v1: bpf-next: bpf, vmtest: Build test_progs and friends as statically linked](http://lore.kernel.org/bpf/05b5dd79465be41ff8cf8b56b694118a0aa7ae12.1685140942.git.daniel@iogearbox.net/)** - -> With the specified TRUNNER_LDFLAGS out of vmtest to force static linking -> runners like test_progs/test_maps/etc work just fine. -> - -**[v6: RESEND: libbpf: kprobe.multi: Filter with available_filter_functions](http://lore.kernel.org/bpf/20230526155026.1419390-1-liu.yun@linux.dev/)** - -> When using regular expression matching with "kprobe multi", it scans all -> the functions under "/proc/kallsyms" that can be matched. However, not all -> of them can be traced by kprobe.multi. If any one of the functions fails -> to be traced, it will result in the failure of all functions. The best -> approach is to filter out the functions that cannot be traced to ensure -> proper tracking of the functions. -> - -**[v5: libbpf: kprobe.multi: Filter with available_filter_functions](http://lore.kernel.org/bpf/20230526122053.1373871-1-liu.yun@linux.dev/)** - -> When using regular expression matching with "kprobe multi", it scans all -> the functions under "/proc/kallsyms" that can be matched. However, not all -> of them can be traced by kprobe.multi. If any one of the functions fails -> to be traced, it will result in the failure of all functions. The best -> approach is to filter out the functions that cannot be traced to ensure -> proper tracking of the functions. 
-> - -**[v1: Type aware module allocator](http://lore.kernel.org/bpf/20230526051529.3387103-1-song@kernel.org/)** - -> This set implements the second part of module type aware allocator -> (module_alloc_type), which was discussed in [1]. This part contains the -> interface of the new allocator, as well as changes in x86 code to use the -> new allocator (modules, BPF, ftrace, kprobe). -> - -**[v1: dwarves: pahole: avoid adding same struct structure to two rb trees](http://lore.kernel.org/bpf/20230525235949.2978377-1-eddyz87@gmail.com/)** - -> This commit modifies resort_classes() to re-use 'structures__tree' and -> to reset 'rb_node' fields before adding structure instances to the -> tree for a second time. -> - -**[v1: bpf-next: selftests/bpf: Check whether to run selftest](http://lore.kernel.org/bpf/20230525232248.640465-1-deso@posteo.net/)** - -> The sockopt test invokes test__start_subtest and then unconditionally -> asserts the success. That means that even if deny-listed, any test will -> still run and potentially fail. -> Evaluate the return value of test__start_subtest() to achieve the -> desired behavior, as other tests do. -> - -**[v1: bpf: utilize table ID in bpf_fib_lookup helper](http://lore.kernel.org/bpf/20230505-bpf-add-tbid-fib-lookup-v1-0-fd99f7162e76@gmail.com/)** - -> This patchset adds the ability to specify a table ID to the -> `bpf_fib_lookup` BPF helper. -> -> A new `tbid` field is added to `struct fib_bpf_lookup`. -> When the `fib_bpf_lookup` helper is called with the -> `BPF_FIB_LOOKUP_DIRECT` flag and the `tbid` is set to an integer greater -> then 0, the `tbid` field will be interpreted as the table ID to use for -> the fib lookup. -> - -**[v1: bpf-next: libbpf: add netfilter link attach helper](http://lore.kernel.org/bpf/20230525110100.8212-1-fw@strlen.de/)** - -> When initial netfilter bpf program type support got added one -> suggestion was to extend libbpf with a helper to ease attachment -> of nf programs to the hook locations. 
-> -> Add such a helper and a demo test case that attaches a dummy -> program to various combinations. -> - -**[v1: bpf-next: bpf: Export rx queue info for reuseport ebpf prog](http://lore.kernel.org/bpf/20230525033757.47483-1-jdamato@fastly.com/)** - -> BPF_PROG_TYPE_SK_REUSEPORT / sk_reuseport ebpf programs do not have -> access to the queue_mapping or napi_id of the incoming skb. Having -> this information can help ebpf progs determine which listen socket to -> select. -> - -**[v1: bpf-next: libbpf: change var type in datasec resize func](http://lore.kernel.org/bpf/20230525001323.8554-1-inwardvessel@gmail.com/)** - -> This changes a local variable type that stores a new array id to match -> the return type of btf__add_array(). -> - -**[v1: bpf-next: Relax checks for unprivileged bpf() commands](http://lore.kernel.org/bpf/20230524225421.1587859-1-andrii@kernel.org/)** - -> During last relaxation of bpf syscall's capabilities checks ([0]), the model -> of FD-based ownership was established: if process through whatever means got -> FD for some BPF object (map, prog, etc), it should be able to perform -> operations on this object without extra CAP_SYS_ADMIN or CAP_BPF capabilities. -> - -**[v1: bpf-next: Revamp bpf_attr and make it easier to evolve](http://lore.kernel.org/bpf/20230524210243.605832-1-andrii@kernel.org/)** - -> RFC patch set revamping anonymous substructs of union bpf_attr, which would -> allow nicer and more coherent evolution of bpf() syscall arguments, especially -> for commands like BPF_MAP_CREATE and BPF_PROG_LOAD. See patch #1 for -> justification and more details. Patch #2 demonstrates how straightforward it -> is to switch to new-style substructs in kernel code (and keep in mind that -> this is optional until we need some new field for a given command, so we can -> do it completely asynchronously from landing bpf_attr changes themselves).
-> Patch #3 shows also similar libbpf changes, except for libbpf single patches -> switches over entire libbpf code base to new-style substructs (except -> skel_internal.h, due to concerns that users might be reliant on outdated -> system-wide linux/bpf.h UAPI header). -> - -**[v3: bpf-next: libbpf: capability for resizing datasec maps](http://lore.kernel.org/bpf/20230524004537.18614-1-inwardvessel@gmail.com/)** - -> Due to the way the datasec maps like bss, data, rodata are memory -> mapped, they cannot be resized with bpf_map__set_value_size() like -> non-datasec maps can. This series offers a way to allow the resizing of -> datasec maps, by having the mapped regions resized as needed and also -> adjusting associated BTF info if possible. -> - -**[v3: dwarves: Support for new btf_type_tag encoding](http://lore.kernel.org/bpf/20230524001825.2688661-1-eddyz87@gmail.com/)** - -> In recent discussion in BPF mailing list ([1], look for Solution #2) -> participants agreed to add a new DWARF representation for -> "btf_type_tag" annotations. -> -> Existing representation is DW_TAG_LLVM_annotation object attached as a -> child to a DW_TAG_pointer_type. It means that "btf_type_tag" -> annotation is attached to a pointee type. -> - -**[v1: libbpf: kprobe.multi: Filter with blacklist and available_filter_functions](http://lore.kernel.org/bpf/20230523132547.94384-1-liu.yun@linux.dev/)** - -> When using regular expression matching with "kprobe multi", it scans all -> the functions under "/proc/kallsyms" that can be matched. However, not all -> of them can be traced by kprobe.multi. If any one of the functions fails -> to be traced, it will result in the failure of all functions. The best -> approach is to filter out the functions that cannot be traced to ensure -> proper tracking of the functions. 
-> - -**[v1: Bring back vmlinux.h generation](http://lore.kernel.org/bpf/20230522204047.800543-1-irogers@google.com/)** - -> Commit 760ebc45746b ("perf lock contention: Add empty 'struct rq' to -> satisfy libbpf 'runqueue' type verification") inadvertently created a -> declaration of 'struct rq' that conflicted with a generated -> vmlinux.h's: -> -> Fix the issue by moving the declaration to vmlinux.h. So this can't -> happen again, bring back build support for generating vmlinux.h then -> add build tests. -> - -### Related Technology News - -#### Qemu - -**[v2: target/riscv: Add RISC-V Virtual IRQs and IRQ filtering support](http://lore.kernel.org/qemu-devel/20230526162308.22892-1-rkanwal@rivosinc.com/)** - -> This series adds M and HS-mode virtual interrupt and IRQ filtering support. -> This allows inserting virtual interrupts from M/HS-mode into S/VS-mode -> using mvien/hvien and mvip/hvip csrs. IRQ filtering is a use case of -> this change, i.e. M-mode can stop delegating an interrupt to S-mode and -> instead enable it in MIE and receive those interrupts in M-mode and then -> selectively inject the interrupt using mvien and mvip. -> - -**[v5: hw/riscv/virt: pflash improvements](http://lore.kernel.org/qemu-devel/20230526121006.76388-1-sunilvl@ventanamicro.com/)** - -> This series improves the pflash usage in RISC-V virt machine with solutions to -> below issues. -> -> 1) Currently the first pflash is reserved for ROM/M-mode firmware code. But S-mode -> payload firmware like EDK2 need both pflash devices to have separate code and variable -> store so that OS distros can keep the FW code as read-only. -> - -**[v3: target/riscv: Add support for PC-relative translation](http://lore.kernel.org/qemu-devel/20230526072124.298466-1-liweiwei@iscas.ac.cn/)** - -> This patchset tries to add support for PC-relative translation. -> -> The existence of CF_PCREL can improve performance with the guest -> kernel's address space randomization. Each guest process maps libc.so
Each guest process maps libc.so -> (et al) at a different virtual address, and this allows those -> translations to be shared. -> - -**[v3: Add RISC-V KVM AIA Support](http://lore.kernel.org/qemu-devel/20230526062509.31682-1-yongxuan.wang@sifive.com/)** - -> This series adds support for KVM AIA in RISC-V architecture. -> -> In order to test these patches, we require Linux with KVM AIA support which can -> be found in the qemu_kvm_aia branch at https://github.com/yong-xuan/linux.git -> This kernel branch is based on the riscv_aia_v1 branch available at -> https://github.com/avpatel/linux.git, and it also includes two additional -> patches that fix a KVM AIA bug and reply to the query of KVM_CAP_IRQCHIP. -> - -**[v3: hw/riscv: virt: Assume M-mode FW in pflash0 only when "-bios none"](http://lore.kernel.org/qemu-devel/20230523102805.100160-1-sunilvl@ventanamicro.com/)** - -> Currently, virt machine supports two pflash instances each with -> 32MB size. However, the first pflash is always assumed to -> contain M-mode firmware and reset vector is set to this if -> enabled. Hence, for S-mode payloads like EDK2, only one pflash -> instance is available for use. This means both code and NV variables -> of EDK2 will need to use the same pflash. -> - -**[v3: target/riscv: Add Smrnmi support.](http://lore.kernel.org/qemu-devel/20230522131123.3498539-1-tommy.wu@sifive.com/)** - -> This patchset added support for Smrnmi Extension in RISC-V. -> -> RNMI also has higher priority than any other interrupts or exceptions -> and cannot be disabled by software. -> -> RNMI may be used to route to other devices such as Bus Error Unit or -> Watchdog Timer in the future. -> - -#### U-Boot - -**[v1: riscv: Initial support for Lichee PI 4A board](http://lore.kernel.org/u-boot/20230526124107.894-1-dlan@gentoo.org/)** - -> Sipeed's Lichee PI 4A board is based on T-HEAD's TH1520 SoC which consists of -> quad core XuanTie C910 CPU, plus one C906 CPU and one E902 CPU. 
-> -> In this series, the UART, basic device tree, CPU, PLIC are enabled, making it -> capable of running in serial console mode. -> - -**[v4: Add ethernet driver for StarFive JH7110 SoC](http://lore.kernel.org/u-boot/20230525093637.31364-1-yanhong.wang@starfivetech.com/)** - -> This series of patches is based on the latest branch/master, and -> adds ethernet support for the StarFive JH7110 RISC-V SoC. -> The series includes EEPROM, PHY and MAC drivers. The PHY model is -> YT8531 (from Motorcomm Inc), and the MAC version is dwmac-5.20 -> (from Synopsys DesignWare). -> - -**[v1: arch: riscv: jh7110: Correctly zero L2 LIM](http://lore.kernel.org/u-boot/1684668616-358043-1-git-send-email-ganboing@gmail.com/)** - -> Background information: -> JH7110 SPL runs in L2 LIM (2M in size mapped at 0x8000000). It -> consists of 16 0x20000 sized regions, each one can be used as -> either L2 cache way or SRAM (not both). From top to bottom, there're -> ways 0-15. The way 0 is always enabled, at most 0x1e0000 can be used. -> - -## 20230521: Issue 46 - -### Kernel News - -#### RISC-V Architecture Support - -**[v1: tools/nolibc: autodetect stackprotector availability from compiler](http://lore.kernel.org/linux-riscv/20230521-nolibc-automatic-stack-protector-v1-0-dad6c80c51c1@weissschuh.net/)** - -> As suggested by Willy it is possible to detect the availability of -> stackprotector via preprocessor defines. -> Make use of that to simplify the code and interface of nolibc. -> - -**[v1: RISC-V: KVM: Redirect AMO load/store misaligned traps to guest](http://lore.kernel.org/linux-riscv/20230520150116.7451-1-waylingII@gmail.com/)** - -> The M-mode redirects an unhandled misaligned trap back -> to S-mode when not delegating it to VS-mode(hedeleg). -> However, KVM running in HS-mode terminates the VS-mode -> software when back from M-mode. -> The KVM should redirect the trap back to VS-mode, and -> let VS-mode trap handler decide the next step.
-> Here is a way to handle misaligned traps in KVM, -> not only directing them to VS-mode or terminate it. -> - -**[v1: perf parse-regs: Refactor arch related functions](http://lore.kernel.org/linux-riscv/20230520025537.1811986-1-leo.yan@linaro.org/)** - -> The register parsing have two levels: one level is under 'arch' folder, -> another level is under 'util' folder. A good design is 'arch' folder -> handles architecture specific operations and provides APIs for upper -> layer, on the other hand, 'util' folder should be general and simply -> calls APIs to talk to arch layer. -> - -**[v1: riscv: hibernation: Replace jalr with jr before suspend_restore_regs](http://lore.kernel.org/linux-riscv/20230519060854.214138-1-suagrfillet@gmail.com/)** - -> No need to link the x1/ra reg via jalr before suspend_restore_regs -> So it's better to replace jalr with jr. -> - -**[v2: Add Sipeed Lichee Pi 4A RISC-V board support](http://lore.kernel.org/linux-riscv/20230518184541.2627-1-jszhang@kernel.org/)** - -> Sipeed's Lichee Pi 4A development board uses Lichee Module 4A core -> module which is powered by T-HEAD's TH1520 SoC. Add minimal device -> tree files for the core module and the development board. -> - -**[v1: riscv: Allow disable vdso support](http://lore.kernel.org/linux-riscv/cover.1684430522.git.falcon@tinylab.org/)** - -> This is part of my tinylinux work for RISC-V, see related patchsets: -> -> * RISC-V: Enable dead code elimination, v3 [1] -> * tools/nolibc: riscv: Fix up compile error for rv32, v1 [2] -> * Add dead syscalls elimination support, RFC [3] -> - -**[v20: -next: riscv: Add vector ISA support](http://lore.kernel.org/linux-riscv/20230518161949.11203-1-andy.chiu@sifive.com/)** - -> This patchset is implemented based on vector 1.0 spec to add vector support -> in riscv Linux kernel. There are some assumptions for this implementations. 
-> - -**[v4: riscv: add Bouffalolab bl808 support](http://lore.kernel.org/linux-riscv/20230518152244.2178-1-jszhang@kernel.org/)** - -> This series adds Bouffalolab uart driver and basic devicetrees for -> Bouffalolab bl808 SoC and Sipeed M1s dock board. -> - -**[v1: riscv: s64ilp32: Running 32-bit Linux kernel on 64-bit supervisor mode](http://lore.kernel.org/linux-riscv/20230518131013.3366406-1-guoren@kernel.org/)** - -> This patch series adds s64ilp32 support to riscv. The term s64ilp32 -> means smode-xlen=64 and -mabi=ilp32 (ints, longs, and pointers are all -> 32-bit), i.e., running 32-bit Linux kernel on pure 64-bit supervisor -> mode. There have been many 64ilp32 abis existing, such as mips-n32 [1], -> arm-aarch64ilp32 [2], and x86-x32 [3], but they are all about userspace. -> Thus, this should be the first time running a 32-bit Linux kernel with -> the 64ilp32 ABI at supervisor mode (If not, correct me). -> - -**[v18: Microchip Soft IP corePWM driver](http://lore.kernel.org/linux-riscv/20230518-reactive-nursing-23b7fe093048@wendy/)** - -> Another version, although a lot smaller of a range-diff than previously! -> All you get this time is the one change requested by Uwe on v17, along -> with a rebase on -rc1. -> - -**[v6: Add JH7110 USB and USB PHY driver support](http://lore.kernel.org/linux-riscv/20230518112750.57924-1-minda.chen@starfivetech.com/)** - -> This patchset adds USB driver and USB PHY for the StarFive JH7110 SoC. -> USB work mode is peripheral and using USB 2.0 PHY in VisionFive 2 board. -> The patch has been tested on the VisionFive 2 board. 
-> - -**[v6: Add STG/ISP/VOUT clock and reset drivers for StarFive JH7110](http://lore.kernel.org/linux-riscv/20230518101234.143748-1-xingyu.wu@starfivetech.com/)** - -> This patch serises are base on the basic JH7110 SYSCRG/AONCRG -> drivers and add new partial clock drivers and reset supports -> about System-Top-Group(STG), Image-Signal-Process(ISP) -> and Video-Output(VOUT) for the StarFive JH7110 RISC-V SoC. These -> clocks and resets could be used by DMA, VIN and Display modules. -> - -**[v1: dt-bindings: riscv: deprecate riscv,isa](http://lore.kernel.org/linux-riscv/20230518-thermos-sanitary-cf3fbc777ea1@wendy/)** - -> When the RISC-V dt-bindings were accepted upstream in Linux, the base -> ISA etc had yet to be ratified. By the ratification of the base ISA, -> incompatible changes had snuck into the specifications - for example the -> Zicsr and Zifencei extensions were spun out of the base ISA. -> - -**[v1: RISC-V KVM in-kernel AIA irqchip](http://lore.kernel.org/linux-riscv/20230517105135.1871868-1-apatel@ventanamicro.com/)** - -> This series adds in-kernel AIA irqchip which only trap-n-emulate IMSIC and -> APLIC MSI-mode for Guest. The APLIC MSI-mode trap-n-emulate is optional so -> KVM user space can emulate APLIC entirely in user space. -> - -**[v3: RISC-V: Enable dead code elimination](http://lore.kernel.org/linux-riscv/20230517082936.37563-1-falcon@tinylab.org/)** - -> Select CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION for RISC-V, allowing -> the user to enable dead code elimination. In order for this to work, -> ensure that we keep the alternative table by annotating them with KEEP. -> - -**[v3: perf vendor events riscv: add T-HEAD C9xx JSON file](http://lore.kernel.org/linux-riscv/IA1PR20MB4953B6C4CB711506CF542737BB7E9@IA1PR20MB4953.namprd20.prod.outlook.com/)** - -> These events are the max that c9xx series support. 
-> Since T-HEAD let manufacturers decide whether events are usable, -> the final support of the perf events is determined by the pmu node -> of the soc dtb. -> - -**[v1: irq_work: consolidate arch_irq_work_raise prototypes](http://lore.kernel.org/linux-riscv/20230516200341.553413-1-arnd@kernel.org/)** - -> The prototype was hidden on x86, which causes a warning: -> -> kernel/irq_work.c:72:13: error: no previous prototype for 'arch_irq_work_raise' [-Werror=missing-prototypes] -> -> Fix this by providing it in only one place that is always visible. -> - -**[v1: perf: add T-HEAD C9xx series cpu support](http://lore.kernel.org/linux-riscv/IA1PR20MB49539201E93DE46A9A2A8E74BB799@IA1PR20MB4953.namprd20.prod.outlook.com/)** - -> The T-HEAD C9xx series cpu is a series of riscv CPU IP. As this IP was -> proposed before the current riscv event standard. It has a non-standard -> events encoding for perf events and unimplemented MARCH and MIMP CSR. -> This patch add these events to support C9xx cpus. -> - -#### Process Scheduling - -**[v1: RESEND: sched/nohz: Add HRTICK_BW for using cfs bandwidth with nohz_full](http://lore.kernel.org/lkml/20230518132038.3534728-1-pauld@redhat.com/)** - -> CFS bandwidth limits and NOHZ full don't play well together. Tasks -> can easily run well past their quotas before a remote tick does -> accounting. This leads to long, multi-period stalls before such -> tasks can run again. Use the hrtick mechanism to set a sched -> tick to fire at remaining_runtime in the future if we are on -> a nohz full cpu, if the task has quota and if we are likely to -> disable the tick (nr_running == 1). This allows for bandwidth -> accounting before tasks go too far over quota. -> - -**[v1: sched: core: Simplify cpuset_cpumask_can_shrink()](http://lore.kernel.org/lkml/20230518203416.3323-1-zeming@nfschina.com/)** - -> Remove useless intermediate variable "ret" and its initialization. -> Directly return dl_cpuset_cpumask_can_shrink() result.
-> - -**[v1: sched/rt: Print curr when RT throttling activated](http://lore.kernel.org/lkml/20230516122202.954313-1-alex@shruggie.ro/)** - -> We may meet the issue, that one RT thread occupied the cpu by 950ms/1s, -> The RT thread maybe is a business thread or other unknown thread. -> -> Currently, it only outputs the print "sched: RT throttling activated" -> when RT throttling happen. It is hard to know what is the RT thread, -> For further analysis, we need add more prints. -> - -**[v1: sched/fair: Introduce SIS_PAIR to wakeup task on local idle core first](http://lore.kernel.org/lkml/20230516011159.4552-1-yu.c.chen@intel.com/)** - -> The will-it-scale context_switch1 test case exposes the issue. The -> test platform has 2 x 56C/112T and 224 CPUs in total. To evaluate the -> C2C overhead within 1 LLC, will-it-scale was tested with 1 socket/node -> online, so there are 56C/112T CPUs when running will-it-scale. -> - -**[v3: sched: Consider CPU contention in frequency, EAS max util & load-balance busiest CPU selection](http://lore.kernel.org/lkml/20230515115735.296329-1-dietmar.eggemann@arm.com/)** - -> This is the implementation of the idea to factor in CPU runnable_avg -> into the CPU utilization getter functions (so called 'runnable -> boosting') as a way to consider CPU contention for: -> -> (a) CPU frequency -> (b) EAS' max util and -> (c) 'migrate_util' type load-balance busiest CPU selection. -> - -**[v1: sched/fair: Consider asymmetric scheduler groups in load balancer](http://lore.kernel.org/lkml/20230515114601.12737-1-huschle@linux.ibm.com/)** - -> The current load balancer implementation implies that scheduler groups, -> within the same scheduler domain, all host the same number of CPUs. -> -> This appears to be valid for non-s390 architectures. Nevertheless, s390 -> can actually have scheduler groups of unequal size. 
-> The current scheduler behavior causes some s390 configs to use SMT -> while some cores are still idle, leading to a performance degredation -> under certain levels of workload. -> - -**[GIT PULL: sched/urgent for v6.4-rc2](http://lore.kernel.org/lkml/20230514115312.GDZGDLqDPvR+M8m+1M@fat_crate.local/)** - -> please pull an urgent (oh well :)) sched fix for 6.4. -> -> Thx. -> - -#### Memory Management - -**[v21: splice: Kill ITER_PIPE](http://lore.kernel.org/linux-mm/20230520000049.2226926-1-dhowells@redhat.com/)** - -> I've split off splice patchset and moved the block patches to a separate -> branch (though they are dependent on this one). -> -> This patchset kills off ITER_PIPE to avoid a race between truncate, -> iov_iter_revert() on the pipe and an as-yet incomplete DMA to a bio with -> unpinned/unref'ed pages from an O_DIRECT splice read. This causes memory -> corruption[2]. Instead, we use filemap_splice_read(), which invokes the -> buffered file reading code and splices from the pagecache into the pipe; -> copy_splice_read(), which bulk-allocates a buffer, reads into it and then -> pushes the filled pages into the pipe; or handle it in filesystem-specific -> code. -> - -**[v2: change ->index to PAGE_SIZE for hugetlb pages](http://lore.kernel.org/linux-mm/20230519220142.212051-1-sidhartha.kumar@oracle.com/)** - -> This patchset adds new wrappers for hugetlb code to to interact with the -> page cache. These wrappers calculate a linear page index as this is now -> what the page cache expects for hugetlb pages as well. -> - -**[v2: Optimize mremap during mutual alignment within PMD](http://lore.kernel.org/linux-mm/20230519190934.339332-1-joel@joelfernandes.org/)** - -> Here is v2 of the mremap start address optimization / fix for exec warning. -> -> 2. Fix issue with bogus return value found by Linus if we broke out of the -> above loop for the first PMD itself.
-> - -**[v1: mm: compaction: avoid GFP_NOFS ABBA deadlock](http://lore.kernel.org/linux-mm/20230519111359.40475-1-hannes@cmpxchg.org/)** - -> During stress testing with higher-order allocations, a deadlock -> scenario was observed in compaction: One GFP_NOFS allocation was -> sleeping on mm/compaction.c::too_many_isolated(), while all CPUs in -> the system were busy with compactors spinning on buffer locks held by -> the sleeping GFP_NOFS allocation. -> - -**[v4: memblock: Add flags and nid info in memblock debugfs](http://lore.kernel.org/linux-mm/20230519105321.333-1-ssawgyw@gmail.com/)** - -> Currently, the memblock debugfs can display the count of memblock_type and -> the base and end of the reg. However, when memblock_mark_*() or -> memblock_set_node() is executed on some range, the information in the -> existing debugfs cannot make it clear why the address is not consecutive. -> - -**[v1: mm,page_owner: mark page_owner_threshold helpers as static](http://lore.kernel.org/linux-mm/20230519092800.3772196-1-arnd@kernel.org/)** - -> The newly added functions have no prototype: -> -> mm/page_owner.c:748:5: error: no previous prototype for 'page_owner_threshold_get' [-Werror=missing-prototypes] -> mm/page_owner.c:754:5: error: no previous prototype for 'page_owner_threshold_set' [-Werror=missing-prototypes] -> - -**[v1: iov_iter: Add automatic-alloc for ITER_BVEC and use in direct_splice_read()](http://lore.kernel.org/linux-mm/1740264.1684482558@warthog.procyon.org.uk/)** - -> If it's a problem that direct_splice_read() always allocates as much memory as -> is asked for and that will fit into the pipe when less could be allocated in -> the case that, say, an O_DIRECT-read will hit a hole and do a short read or a -> socket will return less than was asked for, something like the attached -> modification to ITER_BVEC could be made. 
-> - -**[v4: mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8](http://lore.kernel.org/linux-mm/20230518173403.1150549-1-catalin.marinas@arm.com/)** - -> That's the fourth version of the series reducing the kmalloc() minimum -> alignment on arm64 to 8 (from 128). -> -> The first 10 patches decouple ARCH_KMALLOC_MINALIGN from -> ARCH_DMA_MINALIGN and, for arm64, it limits the kmalloc() caches to -> those aligned to the run-time probed cache_line_size(). The advantage on -> arm64 is that we gain the kmalloc-{64,192} caches. -> - -**[v1: mm: page_alloc: set sysctl_lowmem_reserve_ratio storage-class-specifier to static](http://lore.kernel.org/linux-mm/20230518141119.927074-1-trix@redhat.com/)** - -> smatch reports -> mm/page_alloc.c:247:5: warning: symbol -> 'sysctl_lowmem_reserve_ratio' was not declared. Should it be static? -> -> This variable is only used in its defining file, so it should be static -> - -**[v1: mm/page_owner: set page_owner_* storage-class-specifier to static](http://lore.kernel.org/linux-mm/20230518134718.926663-1-trix@redhat.com/)** - -> smatch reports -> mm/page_owner.c:739:30: warning: symbol -> 'page_owner_stack_operations' was not declared. Should it be static? -> mm/page_owner.c:748:5: warning: symbol -> 'page_owner_threshold_get' was not declared. Should it be static? -> mm/page_owner.c:754:5: warning: symbol -> 'page_owner_threshold_set' was not declared. Should it be static? -> - -**[v9: net-next: splice, net: Replace sendpage with sendmsg(MSG_SPLICE_PAGES), part 1](http://lore.kernel.org/linux-mm/20230518130713.1515729-1-dhowells@redhat.com/)** - -> Here's the first tranche of patches towards providing a MSG_SPLICE_PAGES -> internal sendmsg flag that is intended to replace the ->sendpage() op with -> calls to sendmsg(). MSG_SPLICE_PAGES is a hint that tells the protocol -> that it should splice the pages supplied if it can and copy them if not. 
-> - -#### File Systems - -**[v1: Create large folios in iomap buffered write path](http://lore.kernel.org/linux-fsdevel/20230520163603.1794256-1-willy@infradead.org/)** - -> Wang Yugui has a workload which would be improved by using large folios. -> Until now, we've only created large folios in the readahead path, -> but this workload writes without reading. The decision of what size -> folio to create is based purely on the size of the write() call (unlike -> readahead where we keep history and can choose to create larger folios -> based on that history even if individual reads are small). -> - -**[v1: cachefiles: Allow the cache to be non-root](http://lore.kernel.org/linux-fsdevel/1853230.1684516880@warthog.procyon.org.uk/)** - -> Set mode 0600 on files in the cache so that cachefilesd can run as an -> unprivileged user rather than leaving the files all with 0. Directories -> are already set to 0700. -> - -**[v2: bpf-next: Add O_PATH-based BPF_OBJ_PIN and BPF_OBJ_GET support](http://lore.kernel.org/linux-fsdevel/20230518215444.1418789-1-andrii@kernel.org/)** - -> Add ability to specify pinning location within BPF FS using O_PATH-based FDs, -> similar to openat() family of APIs. Patch #1 adds necessary kernel-side -> changes. Patch #2 exposes this through libbpf APIs. Patch #3 uses new mount -> APIs (fsopen, fsconfig, fsmount) to demonstrated how now it's possible to work -> with detach-mounted BPF FS using new BPF_OBJ_PIN and BPF_OBJ_GET -> functionality. -> - -**[v2: Documentation: add initial iomap kdoc](http://lore.kernel.org/linux-fsdevel/20230518150105.3160445-1-mcgrof@kernel.org/)** - -> To help with iomap adoption / porting I set out the goal to try to -> help improve the iomap documentation and get general guidance for -> filesystem conversions over from buffer-head in time for this year's -> LSFMM. The end results thanks to the review of Darrick, Christoph and -> others is on the kernelnewbies wiki [0].
-> - -**[v1: squashfs: don't include buffer_head.h](http://lore.kernel.org/linux-fsdevel/20230517071622.245151-1-hch@lst.de/)** - -> Squashfs has stopped using buffers heads in 93e72b3c612adcaca1 -> ("squashfs: migrate from ll_rw_block usage to BIO"). -> - -**[v1: gfs2/buffer folio changes](http://lore.kernel.org/linux-fsdevel/20230517032442.1135379-1-willy@infradead.org/)** - -> This kind of started off as a gfs2 patch series, then became entwined -> with buffer heads once I realised that gfs2 was the only remaining -> caller of __block_write_full_page(). For those not in the gfs2 world, -> the big point of this series is that block_write_full_page() should now -> handle large folios correctly. -> - -**[v4: memcontrol: support cgroup level OOM protection](http://lore.kernel.org/linux-fsdevel/20230517032032.76334-1-chengkaitao@didiglobal.com/)** - -> Establish a new OOM score algorithm, supports the cgroup level OOM -> protection mechanism. When an global/memcg oom event occurs, we treat -> all processes in the cgroup as a whole, and OOM killers need to select -> the process to kill based on the protection quota of the cgroup. -> - -**[v1: ACPI: APEI: EINJ: Add support for vendor defined error types](http://lore.kernel.org/linux-fsdevel/d10df9d4-8cc7-b6f0-4096-cd0805407744@amd.com/)** - -> Noted. The only checkpatch warning that was ignored was pertaining -> to the usage of S_IWUSR macro with debugfs_create_blob. Had noticed that a -> majority of einj module's debugfs files have been created with S_IRUSR and -> S_IWUSR macros. So used them to maintain uniformity. -> Will switch to octal permissions though. -> - -**[v1: procfs: consolidate arch_report_meminfo declaration](http://lore.kernel.org/linux-fsdevel/20230516195834.551901-1-arnd@kernel.org/)** - -> The arch_report_meminfo() function is provided by four architectures, -> with a __weak fallback in procfs itself. 
On architectures that don't -> have a custom version, the __weak version causes a warning because -> of the missing prototype. -> - -**[v1: radix-tree: move declarations to header](http://lore.kernel.org/linux-fsdevel/20230516194212.548910-1-arnd@kernel.org/)** - -> The xarray.c file contains the only call to radix_tree_node_rcu_free(), -> and it comes with its own extern declaration for it. This means the -> function definition causes a missing-prototype warning: -> -> lib/radix-tree.c:288:6: error: no previous prototype for 'radix_tree_node_rcu_free' [-Werror=missing-prototypes] -> - -#### Network Devices - -**[v5: iproute2-next: ip-link: add support for nolocalbypass in vxlan](http://lore.kernel.org/netdev/20230521054948.22753-1-vladimir@nikishkin.pw/)** - -> Add userspace support for the [no]localbypass vxlan netlink -> attribute. With localbypass on (default), the vxlan driver processes -> the packets destined to the local machine by itself, bypassing the -> userspace nework stack. With nolocalbypass the packets are always -> forwarded to the userspace network stack, so userspace programs, -> such as tcpdump have a chance to process them. -> - -**[v1: net-next: nfc: Switch i2c drivers back to use .probe()](http://lore.kernel.org/netdev/20230520172104.359597-1-u.kleine-koenig@pengutronix.de/)** - -> After commit b8a1a4cd5a98 ("i2c: Provide a temporary .probe_new() -> call-back type"), all drivers being converted to .probe_new() and then -> convert back to (the new) .probe() to be able to eventually drop -> .probe_new() from struct i2c_driver. -> - -**[v1: net-next: net: phylink: require supported_interfaces to be filled](http://lore.kernel.org/netdev/E1q0K1u-006EIP-ET@rmk-PC.armlinux.org.uk/)** - -> We have been requiring the supported_interfaces bitmap to be filled in -> by MAC drivers that have a mac_select_pcs() method. Now that all MAC -> drivers fill in the supported_interfaces bitmap, it is time to enforce -> this.
We have already required supported_interfaces to be set in order -> for optical SFPs to be configured in commit f81fa96d8a6c ("net: phylink: -> use phy_interface_t bitmaps for optical modules"). -> - -**[v1: net-next: net: sfp: add support for a couple of copper multi-rate modules](http://lore.kernel.org/netdev/E1q0JfS-006Dqc-8t@rmk-PC.armlinux.org.uk/)** - -> Add support for the Fiberstore SFP-10G-T and Walsun HXSX-ATRC-1 -> modules. Internally, the PCB silkscreen has what seems to be a part -> number of WT_502. Fiberstore use v2.2 whereas Walsun use v2.6. -> - -**[v1: net: macb: use correct __be32 and __be16 types](http://lore.kernel.org/netdev/20230519221942.53942-1-minhuadotchen@gmail.com/)** - -> This patch fixes the following sparse warnings. No functional changes. -> -> Use cpu_to_be16() and cpu_to_be32() to convert constants before comparing -> them with __be16 type of psrc/pdst and __be32 type of ip4src/ip4dst. -> Apply be16_to_cpu() in GEM_BFINS(). -> - -**[v7: virtio: pds_vdpa driver](http://lore.kernel.org/netdev/20230519215632.12343-1-shannon.nelson@amd.com/)** - -> This patchset implements a new module for the AMD/Pensando DSC that -> supports vDPA services on PDS Core VF devices. This code is based on -> and depends on include files from the pds_core driver described here[0]. -> The pds_core driver creates the auxiliary_bus devices that this module -> connects to, and this creates vdpa devices for use by the vdpa module. -> - -**[v2: can: esd_usb: More preparation before supporting esd CAN-USB/3](http://lore.kernel.org/netdev/20230519195600.420644-1-frank.jungclaus@esd.eu/)** - -> Apply another small batch of patches as preparation for adding support -> of the newly available esd CAN-USB/3 to esd_usb.c. 
-> - -**[v1: net-next: net/mlx5: Introduce SF direction](http://lore.kernel.org/netdev/20230519183044.19065-1-saeed@kernel.org/)** - -> Whenever multiple Virtual Network functions (VNFs) are used by Service -> Function Chaining (SFC), each packet is passing through all the VNFs, -> and each VNF is performing hairpin in order to pass the packet to the -> next function in the chain. -> - -**[v1: net: rtnetlink: not allow dev gro_max_size to exceed GRO_MAX_SIZE](http://lore.kernel.org/netdev/25a7b1b138e5ad3c926afce8cd4e08d8b7ef3af6.1684516568.git.lucien.xin@gmail.com/)** - -> In commit 0fe79f28bfaf ("net: allow gro_max_size to exceed 65536"), -> it limited GRO_MAX_SIZE to (8 * 65535) to avoid overflows, but also -> deleted the check of GRO_MAX_SIZE when setting the dev gro_max_size. -> - -**[v1: net-next: i40e: add PHY debug register dump](http://lore.kernel.org/netdev/20230519170208.2820484-1-anthony.l.nguyen@intel.com/)** - -> Implement ethtool register dump for some PHY registers in order to -> assist field debugging of link issues. -> - -**[v1: net-next:pull request: ice: allow matching on meta data](http://lore.kernel.org/netdev/20230519170018.2820322-1-anthony.l.nguyen@intel.com/)** - -> This patchset is intended to improve the usability of the switchdev -> slow path. Without matching on a meta data values slow path works -> based on VF's MAC addresses. It causes a problem when the VF wants -> to use more than one MAC address (e.g. when it is in trusted mode). -> - -**[v2: net-next: net: dsa: mv88e6xxx: add 88E6361 support](http://lore.kernel.org/netdev/20230519141303.245235-1-alexis.lothore@bootlin.com/)** - -> This series brings initial support for Marvell 88E6361 switch. -> -> MV88E6361 is a 8 ports switch with 5 integrated Gigabit PHYs and 3 -> 2.5Gigabit SerDes interfaces. 
It is in fact a new variant in the -> - port 0: MII, RMII, RGMII, 1000BaseX, 2500BaseX -> - port 3 to 7: triple speed internal phys -> - port 9 and 10: 1000BaseX, 25000BaseX -> - -**[v1: net-next: TCP splice improvements](http://lore.kernel.org/netdev/cover.1684501922.git.asml.silence@gmail.com/)** - -> The main part is in Patch 1, which optimises locking for successful -> blocking TCP splice read, following with a clean up in Patch 2. -> - -**[v1: net-next: net/tcp: refactor tcp_inet6_sk()](http://lore.kernel.org/netdev/16be6307909b25852744a67b2caf570efbb83c7f.1684502478.git.asml.silence@gmail.com/)** - -> Don't keep hand coded offset caluclations and replace it with -> container_of(). It should be type safer and a bit less confusing. -> -> It also makes it with a macro instead of inline function to preserve -> constness, which was previously casted out like in case of -> tcp_v6_send_synack(). -> - -**[v1: net-next: net: phy: add helpers for comparing phy IDs](http://lore.kernel.org/netdev/E1pzzm3-006BZJ-Bi@rmk-PC.armlinux.org.uk/)** - -> There are several places which open code comparing PHY IDs. Provide a -> couple of helpers to assist with this, using a slightly simpler test -> than the original: -> -> - phy_id_compare() compares two arbitary PHY IDs and a mask of the -> significant bits in the ID. -> - phydev_id_compare() compares the bound phydev with the specified -> PHY ID, using the bound driver's mask. 
-> - -**[v4: net-next: Fine-Tune Flow Control and Speed Configurations in Microchip KSZ8xxx DSA Driver](http://lore.kernel.org/netdev/20230519124700.635041-1-o.rempel@pengutronix.de/)** - -> change v4: -> - instead of downstream/upstream use CPU-port and PHY-port -> - adjust comments -> - minor fixes -> - -**[v3: net: stmmac: compare p->des0 and p->des1 with __le32 type values](http://lore.kernel.org/netdev/20230519115030.74493-1-minhuadotchen@gmail.com/)** - -> Use cpu_to_le32 to convert the constants to __le32 type -> before comparing them with p->des0 and p->des1 (they are __le32 type) -> and to fix following sparse warnings: -> -> drivers/net/ethernet/stmicro/stmmac/dwxgmac2_descs.c:110:23: sparse: warning: restricted __le32 degrades to integer -> drivers/net/ethernet/stmicro/stmmac/dwxgmac2_descs.c:110:50: sparse: warning: restricted __le32 degrades to integer -> - -**[v1: [net-next] net: ipconfig: move ic_nameservers_fallback into #ifdef block](http://lore.kernel.org/netdev/20230519093250.4011881-1-arnd@kernel.org/)** - -> The new variable is only used when IPCONFIG_BOOTP is defined and otherwise -> causes a warning: -> -> net/ipv4/ipconfig.c:177:12: error: 'ic_nameservers_fallback' defined but not used [-Werror=unused-variable] -> -> Move it next to the user. -> - -**[v2: net-next: net: fec: turn on XDP features](http://lore.kernel.org/netdev/20230519014825.1659331-1-wei.fang@nxp.com/)** - -> The XDP features are supported since the commit 66c0e13ad236 -> ("drivers: net: turn on XDP features"). Currently, the fec -> driver supports NETDEV_XDP_ACT_BASIC, NETDEV_XDP_ACT_REDIRECT -> and NETDEV_XDP_ACT_NDO_XMIT. So turn on these XDP features -> for fec driver. 
-> - -**[v1: net: stmmac: use le32_to_cpu for p->des0 and p->des1](http://lore.kernel.org/netdev/20230519002522.3648-1-minhuadotchen@gmail.com/)** - -> Use le32_to_cpu for p->des0 and p->des1 to fix the -> following sparse warnings: -> -> drivers/net/ethernet/stmicro/stmmac/dwxgmac2_descs.c:110:23: sparse: warning: restricted __le32 degrades to integer -> drivers/net/ethernet/stmicro/stmmac/dwxgmac2_descs.c:110:50: sparse: warning: restricted __le32 degrades to integer -> - -**[v13: io_uring: add napi busy polling support](http://lore.kernel.org/netdev/20230518211751.3492982-1-shr@devkernel.io/)** - -> This adds the napi busy polling support in io_uring.c. It adds a new -> napi_list to the io_ring_ctx structure. This list contains the list of -> napi_id's that are currently enabled for busy polling. This list is -> used to determine which napi id's enabled busy polling. For faster -> access it also adds a hash table. -> - -**[v6: Enable multiple MCAN on AM62x](http://lore.kernel.org/netdev/20230518193613.15185-1-jm@ti.com/)** - -> On AM62x there are two MCANs in MCU domain. The MCANs in MCU domain -> were not enabled since there is no hardware interrupt routed to A53 -> GIC interrupt controller. Therefore A53 Linux cannot be interrupted -> by MCU MCANs. -> - -**[v1: bpf-next: xsk: multi-buffer support](http://lore.kernel.org/netdev/20230518180545.159100-1-maciej.fijalkowski@intel.com/)** - -> This series of patches add multi-buffer support for AF_XDP. XDP and -> various NIC drivers already have support for multi-buffer packets. With -> this patch set, programs using AF_XDP sockets can now also receive and -> transmit multi-buffer packets both in copy as well as zero-copy mode. -> ZC multi-buffer implementation is based on ice driver. -> - -**[v1: nf: netfilter: ipset: Add schedule point in call_ad().](http://lore.kernel.org/netdev/20230518173300.34531-1-kuniyu@amazon.com/)** - -> syzkaller found a repro that causes Hung Task [0] with ipset. 
The repro -> first creates an ipset and then tries to delete a large number of IPs -> from the ipset concurrently: -> -> IPSET_ATTR_IPADDR_IPV4: 172.20.20.187 -> IPSET_ATTR_CIDR: 2 -> - -**[v3: net: fec: add dma_…](http://lore.kernel.org/netdev/20230518150202.1920375-1-shenwei.wang@nxp.com/)** - -> Two dma_wmb() are added in the XDP TX path to ensure proper ordering of -> descriptor and buffer updates: -> 1. A dma_wmb() is added after updating the last BD to make sure -> the updates to rest of the descriptor are visible before -> transferring ownership to FEC. -> 2. A dma_wmb() is also added after updating the bdp to ensure these -> updates are visible before updating txq->bd.cur. -> 3. Start the xmit of the frame immediately right after configuring the -> tx descriptor. -> - -**[v1: bpf: Use call_rcu_hurry() with synchronize_rcu_mult()](http://lore.kernel.org/netdev/358bde93-4933-4305-ac42-4d6f10c97c08@paulmck-laptop/)** - -> The bpf_struct_ops_map_free() function must wait for both an RCU grace -> period and an RCU Tasks grace period, and so it passes call_rcu() and -> call_rcu_tasks() to synchronize_rcu_mult(). This works, but on ChromeOS -> and Android platforms call_rcu() can have lazy semantics, resulting in -> multi-second delays between call_rcu() invocation and invocation of the -> corresponding callback. -> - -**[GIT PULL: Networking for 6.4-rc3](http://lore.kernel.org/netdev/20230518132554.41223-1-pabeni@redhat.com/)** - -> The following changes since commit 6e27831b91a0bc572902eb065b374991c1ef452a: -> -> Merge tag 'net-6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net (2023-05-11 08:42:47 -0500) -> - -#### Security Hardening - -**[v1: Memory Mapping (VMA) protection using PKU - set 1](http://lore.kernel.org/linux-hardening/20230519011915.846407-1-jeffxu@chromium.org/)** - -> We're using PKU for in-process isolation to enforce control-flow integrity -> for a JIT compiler.
In our threat model, an attacker exploits a -> vulnerability and has arbitrary read/write access to the whole process -> space concurrently to other threads being executed. This attacker can -> manipulate some arguments to syscalls from some threads. -> - -**[v1: next: ALSA: mixart: Replace one-element arrays with simple object declarations](http://lore.kernel.org/linux-hardening/ZGVlcpuvx1rSOMP8@work/)** - -> One-element arrays are deprecated, and we are replacing them with flexible -> array members, instead. However, in this case it seems those one-element -> arrays have never actually been used as fake flexible arrays. -> - -**[v1: md/raid5: Convert stripe_head's "dev" to flexible array member](http://lore.kernel.org/linux-hardening/20230517233313.never.130-kees@kernel.org/)** - -> Replace old-style 1-element array of "dev" in struct stripe_head with -> modern C99 flexible array. In the future, we can additionally annotate -> it with the run-time size, found in the "disks" member. -> - -**[v1: kbuild: Enable -fstrict-flex-arrays=3](http://lore.kernel.org/linux-hardening/20230517232801.never.262-kees@kernel.org/)** - -> The -fstrict-flex-arrays=3 option is now available with the release -> of GCC 13[1] and Clang 16[2]. This feature instructs the compiler to -> treat only C99 flexible arrays as dynamically sized for the purposes of -> object size calculations. In other words, the ancient practice of using -> 1-element arrays, or the GNU extension of using 0-sized arrays, as a -> dynamically sized array is disabled. This allows CONFIG_UBSAN_BOUNDS, -> CONFIG_FORTIFY_SOURCE, and other object-size aware features to behave -> unambiguously in the face of trailing arrays: only C99 flexible arrays -> are considered to be dynamically sized. 
-> - -**[v1: pid: Replace struct pid 1-element array with flex-array](http://lore.kernel.org/linux-hardening/20230517225838.never.965-kees@kernel.org/)** - -> For pid namespaces, struct pid uses a dynamically sized array member, -> "numbers". This was implemented using the ancient 1-element fake flexible -> array, which has been deprecated for decades. Replace it with a C99 -> flexible array, refactor the array size calculations to use struct_size(), -> and address elements via indexes. Note that the static initializer (which -> defines a single element) works as-is, and requires no special handling. -> - -**[v1: next: scsi: lpfc: Use struct_size() helper](http://lore.kernel.org/linux-hardening/99e06733f5f35c6cd62e05f530b93107bfd03362.1684358315.git.gustavoars@kernel.org/)** - -> Prefer struct_size() over open-coded versions of idiom: -> -> sizeof(struct-with-flex-array) + sizeof(typeof-flex-array-elements) * count -> -> where count is the max number of items the flexible array is supposed to -> contain. -> - -**[v1: next: scsi: lpfc: Replace one-element array with flexible-array member](http://lore.kernel.org/linux-hardening/6c6dcab88524c14c47fd06b9332bd96162656db5.1684358315.git.gustavoars@kernel.org/)** - -> One-element arrays are deprecated, and we are replacing them with flexible -> array members instead. So, replace one-element arrays with flexible-array -> members in a couple of structures, and refactor the rest of the code, -> accordingly. -> - -**[v1: checkpatch: Check for strcpy and strncpy too](http://lore.kernel.org/linux-hardening/20230517201349.never.582-kees@kernel.org/)** - -> Warn about strcpy(), strncpy(), and strlcpy(). Suggest strscpy() and -> include pointers to the open KSPP issues for each, which has further -> details and replacement procedures. 
-> - -**[v2: Compiler Attributes: Add __counted_by macro](http://lore.kernel.org/linux-hardening/20230517190841.gonna.796-kees@kernel.org/)** - -> In an effort to annotate all flexible array members with their run-time -> size information, the "element_count" attribute is being introduced by -> Clang[1] and GCC[2] in future releases. This annotation will provide -> the CONFIG_UBSAN_BOUNDS and CONFIG_FORTIFY_SOURCE features the ability -> to perform run-time bounds checking on otherwise unknown-size flexible -> arrays. -> - -**[v1: next: media: venus: hfi_cmds: Replace fake flex-arrays with flexible-array members](http://lore.kernel.org/linux-hardening/ZGQrSQ%2FzHu+pk7WU@work/)** - -> One-element arrays are deprecated, and we are replacing them with flexible -> array members instead. So, replace one-element arrays with flexible-array -> members in multiple structures. -> - -**[v1: next: media: venus: hfi_cmds: Replace fake flex-array with flexible-array member](http://lore.kernel.org/linux-hardening/ZGQn63U4IeRUiJWb@work/)** - -> One-element arrays are deprecated, and we are replacing them with flexible -> array members instead. So, replace one-element arrays with flexible-array -> members in struct hfi_sys_set_resource_pkt, and refactor the rest of -> the code, accordingly. -> - -**[v1: next: media: venus: hfi_cmds: Use struct_size() helper](http://lore.kernel.org/linux-hardening/fd52d6ddce285474615e4bd96931ab12a0da8199.1684278538.git.gustavoars@kernel.org/)** - -> Prefer struct_size() over open-coded versions of idiom: -> -> sizeof(struct-with-flex-array) + sizeof(typeof-flex-array-elements) * count -> -> where count is the max number of items the flexible array is supposed to -> contain. 
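The `__counted_by` and `struct_size()` entries above are designed to work together. `struct_size()` sizes the allocation for a flexible array, and `__counted_by` names the member that holds the array's run-time length. The sketch below is a minimal userspace version. It assumes the compiler may or may not support the `counted_by` attribute, so the hypothetical `COUNTED_BY()` macro falls back to nothing, similar in shape to the kernel's definition. `flex_bytes()` is a hypothetical stand-in for `struct_size()`. Like the kernel helper, it saturates to `SIZE_MAX` on overflow, so the allocation fails outright instead of coming up short.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Use the attribute when the compiler knows it, otherwise expand to nothing. */
#if defined(__has_attribute)
#if __has_attribute(__counted_by__)
#define COUNTED_BY(member) __attribute__((__counted_by__(member)))
#endif
#endif
#ifndef COUNTED_BY
#define COUNTED_BY(member)
#endif

struct item_list {
	size_t count;                     /* number of valid entries in items[] */
	int items[] COUNTED_BY(count);    /* bound visible to run-time checkers */
};

/* Hypothetical struct_size() stand-in: header + n * elem, saturating on overflow. */
static size_t flex_bytes(size_t header, size_t elem, size_t n)
{
	if (elem != 0 && n > (SIZE_MAX - header) / elem)
		return SIZE_MAX;
	return header + n * elem;
}

static struct item_list *item_list_alloc(size_t n)
{
	struct item_list *l = malloc(flex_bytes(sizeof(*l), sizeof(l->items[0]), n));

	if (!l)
		return NULL;
	l->count = n;                     /* set the counter before touching items[] */
	for (size_t i = 0; i < n; i++)
		l->items[i] = (int)i;
	return l;
}
```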
-> - -**[v1: next: media: venus: hfi_cmds: Replace one-element array with flexible-array member](http://lore.kernel.org/linux-hardening/e4b13d7b79d1477e775c6d4564f7b23c4cf967f2.1684278538.git.gustavoars@kernel.org/)** - -> One-element arrays are deprecated, and we are replacing them with flexible -> array members instead. So, replace one-element arrays with flexible-array -> members in struct hfi_session_set_buffers_pkt, and refactor the rest of -> the code, accordingly. -> - -**[v1: next: media: venus: Replace one-element arrays with flexible-array members](http://lore.kernel.org/linux-hardening/ZGPk3PpvYzjD1+0%2F@work/)** - -> One-element arrays are deprecated, and we are replacing them with flexible -> array members instead. So, replace one-element arrays with flexible-array -> members in multiple structures, and refactor the rest of the code, -> accordingly. -> - -**[v1: next: iavf: Replace one-element array with flexible-array member](http://lore.kernel.org/linux-hardening/ZGLR3H1OTgJfOdFP@work/)** - -> One-element arrays are deprecated, and we are replacing them with flexible -> array members instead. So, replace one-element array with flexible-array -> member in struct iavf_qvlist_info, and refactor the rest of the code, -> accordingly. -> - -**[v1: next: wifi: wil6210: fw: Replace zero-length arrays with DECLARE_FLEX_ARRAY() helper](http://lore.kernel.org/linux-hardening/ZGKHByxujJoygK+l@work/)** - -> Zero-length arrays are deprecated, and we are moving towards adopting -> C99 flexible-array members, instead. So, replace zero-length arrays -> declarations alone in structs with the new DECLARE_FLEX_ARRAY() -> helper macro. -> - -**[v1: next: wifi: wil6210: wmi: Replace zero-length array with DECLARE_FLEX_ARRAY() helper](http://lore.kernel.org/linux-hardening/ZGKHM+MWFsuqzTjm@work/)** - -> Zero-length arrays are deprecated, and we are moving towards adopting -> C99 flexible-array members, instead. 
So, replace zero-length arrays -> declarations alone in structs with the new DECLARE_FLEX_ARRAY() -> helper macro. -> - -**[v1: next: net: libwx: Replace zero-length array with flexible-array member](http://lore.kernel.org/linux-hardening/ZGKGwtsobVZecWa4@work/)** - -> Zero-length arrays as fake flexible arrays are deprecated, and we are -> moving towards adopting C99 flexible-array members instead. -> - -**[v1: next: mlxfw: Replace zero-length array with DECLARE_FLEX_ARRAY() helper](http://lore.kernel.org/linux-hardening/ZGKGiBxP0zHo6XSK@work/)** - -> Zero-length arrays are deprecated and we are moving towards adopting -> C99 flexible-array members, instead. So, replace zero-length arrays -> declarations alone in structs with the new DECLARE_FLEX_ARRAY() -> helper macro. -> - -#### Asynchronous I/O - -**[v1: net-next: minor tcp io_uring zc optimisations](http://lore.kernel.org/io-uring/cover.1684166247.git.asml.silence@gmail.com/)** - -> Patch 1 is a simple cleanup, patch 2 gives removes 2 atomics from the -> io_uring zc TCP submission path, which yielded extra 0.5% for my -> throughput CPU bound tests based on liburing/examples/send-zerocopy.c -> - -**[v1: for-next: Enable IOU_F_TWQ_LAZY_WAKE for passthrough](http://lore.kernel.org/io-uring/cover.1684154817.git.asml.silence@gmail.com/)** - -> Let cmds to use IOU_F_TWQ_LAZY_WAKE and enable it for nvme passthrough. -> -> The result should be same as in test to the original IOU_F_TWQ_LAZY_WAKE [1] -> patchset, but for a quick test I took fio/t/io_uring with 4 threads each -> reading their own drive and all pinned to the same CPU to make it CPU -> bound and got +10% throughput improvement. -> - -#### Rust For Linux - -**[v1: Bindings for the workqueue](http://lore.kernel.org/rust-for-linux/20230517203119.3160435-1-aliceryhl@google.com/)** - -> This patchset contains bindings for the kernel workqueue.
-> -> One of the primary goals behind the design used in this patch is that we -> must support embedding the `work_struct` as a field in user-provided -> types, because this allows you to submit things to the workqueue without -> having to allocate, making the submission infallible. If we didn't have -> to support this, then the patch would be much simpler. One of the main -> things that make it complicated is that we must ensure that the function -> pointer in the `work_struct` is compatible with the struct it is -> contained within. -> - -**[v1: rust: networking and crypto abstractions](http://lore.kernel.org/rust-for-linux/010101881db036fb-2fb6981d-e0ef-4ad1-83c3-54d64b6d93b3-000000@us-west-2.amazonses.com/)** - -> This includes initial rust abstractions for networking and crypto. -> -> I've been working on in-kernel TLS 1.3 handshake in Rust on the top of -> this. Currently you can run simple TLS server code, which does a -> handshake, sets up kTLS (Kernel TLS offload) to read and write some -> bytes. -> - -#### BPF - -**[v9: bpf-next: bpf: Add socket destroy capability](http://lore.kernel.org/bpf/20230519225157.760788-1-aditi.ghag@isovalent.com/)** - -> This patch set adds the capability to destroy sockets in BPF. We plan to -> use the capability in Cilium to force client sockets to reconnect when -> their remote load-balancing backends are deleted. The other use case is -> on-the-fly policy enforcement where existing socket connections -> prevented by policies need to be terminated. 
-> - -**[v1: dwarves: Encoding function addresses using DECL_TAGs](http://lore.kernel.org/bpf/20230517161648.17582-1-alan.maguire@oracle.com/)** - -> As a means to continue the discussion in [1], which is -> concerned with finding the best long-term solution to -> having a BPF Type Format (BTF) representation of -> functions that is usable for tracing of edge cases, this -> proof-of-concept series is intended to explore one approach -> to adding information to help make tracing more accurate. -> - -**[v2: bpf-next: bpftool: specify XDP Hints ifname when loading program](http://lore.kernel.org/bpf/20230517160103.1088185-1-larysa.zaremba@intel.com/)** - -> Add ability to specify a network interface used to resolve -> XDP Hints kfuncs when loading program through bpftool. -> - -**[v1: bpf-next: selftests/bpf: add xdp_feature selftest for bond device](http://lore.kernel.org/bpf/64cb8f20e6491f5b971f8d3129335093c359aad7.1684329998.git.lorenzo@kernel.org/)** - -> Introduce selftests to check xdp_feature support for bond driver. -> - -**[v2: bpf-next: bpf: Show target_{obj,btf}_id for tracing link](http://lore.kernel.org/bpf/20230517103126.68372-1-laoar.shao@gmail.com/)** - -> The target_btf_id can help us understand which kernel function is -> linked by a tracing prog. The target_btf_id and target_obj_id have -> already been exposed to userspace, so we just need to show them. -> - -**[v1: selftests/bpf: Do not use sign-file as testcase](http://lore.kernel.org/bpf/88e3ab23029d726a2703adcf6af8356f7a2d3483.1684316821.git.legion@kernel.org/)** - -> The sign-file utility (from scripts/) is used in prog_tests/verify_pkcs7_sig.c, -> but the utility should not be called as a test. 
Executing this utility -> produces the following error: -> - -**[v1: support non-frag page for page_pool_alloc_frag()](http://lore.kernel.org/bpf/20230516124801.2465-1-linyunsheng@huawei.com/)** - -> In [1], there is a use case to use frag support in page -> pool to reduce memory usage, and it may request different -> frag size depending on the head/tail room space for -> xdp_frame/shinfo and mtu/packet size. When the requested -> frag size is large enough that a single page can not be -> split into more than one frag, using frag support only -> have performance penalty because of the extra frag count -> handling for frag support. -> - -**[v2: bpf-next: seltests/xsk: prepare for AF_XDP multi-buffer testing](http://lore.kernel.org/bpf/20230516103109.3066-1-magnus.karlsson@gmail.com/)** - -> Prepare the AF_XDP selftests test framework code for the upcoming -> multi-buffer support in AF_XDP. This so that the multi-buffer patch -> set does not become way too large. In that upcoming patch set, we are -> only including the multi-buffer tests together with any framework -> code that depends on the new options bit introduced in the AF_XDP -> multi-buffer implementation itself. -> - -**[v1: bpf-next: selftests/bpf: improve netcnt test robustness](http://lore.kernel.org/bpf/20230515204833.2832000-1-andrii@kernel.org/)** - -> Change netcnt to demand at least 10K packets, as we frequently see some -> stray packet arriving during the test in BPF CI. It seems more important -> to make sure we haven't lost any packet than enforcing exact number of -> packets. -> - -**[v1: bpf: samples/bpf: use canonical fallthrough pseudo-keyword in hbm.c](http://lore.kernel.org/bpf/20230515200207.2541162-1-andrii@kernel.org/)** - -> Rename now unsupported __fallthrough into fallthrough ([0]) in -> samples/bpf/hbm.c to fix samples/bpf compilation. 
-> -> [0] https://www.kernel.org/doc/html/latest/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through -> - -**[v2: iwl-net: ice: recycle/free all of the fragments from multi-buffer frame](http://lore.kernel.org/bpf/20230515135247.142105-1-maciej.fijalkowski@intel.com/)** - -> The ice driver caches next_to_clean value at the beginning of -> ice_clean_rx_irq() in order to remember the first buffer that has to be -> freed/recycled after main Rx processing loop. The end boundary is -> indicated by first descriptor of frame that Rx processing loop has ended -> its duties. Note that if mentioned loop ended in the middle of gathering -> multi-buffer frame, next_to_clean would be pointing to the descriptor in -> the middle of the frame BUT freeing/recycling stage will stop at the -> first descriptor. This means that next iteration of ice_clean_rx_irq() -> will miss the (first_desc, next_to_clean - 1) entries. -> - -**[v2: bpf-next: bpf: bpf trampoline improvements](http://lore.kernel.org/bpf/20230515130849.57502-1-laoar.shao@gmail.com/)** - -> When we run fexit bpf programs (e.g. attaching tcp_recvmsg) on our servers -> which were running old kernels, some of these servers crashed. Finally we -> figured out that it was caused by the same issue resolved by -> commit e21aa341785c ("bpf: Fix fexit trampoline."). After we backported -> that commit, the crash disappears. However new issues are introduced by -> that commit. This patchset fixes them. -> - -**[v1: bpf-next: bpf: btf: restore resolve_mode when popping the resolve stack](http://lore.kernel.org/bpf/20230515121521.30569-1-lmb@isovalent.com/)** - -> In commit 9b459804ff99 ("btf: fix resolving BTF_KIND_VAR after ARRAY, STRUCT, UNION, PTR") -> I fixed a bug that occurred during resolving of a DATASEC by strategically resetting -> resolve_mode. This fixes the immediate bug but leaves us open to future bugs where -> nested types have to be resolved. 
-> - -**[v1: Make fpobe + rethook immune to recursion](http://lore.kernel.org/bpf/20230515035215.Hx3AI5Kb65x5TpmiBhIKrdGS6XpIW09Y4phhBWXCDMg@z/)** - -> Current fprobe and rethook has some pitfalls and may introduce kernel stack recusion, especially in -> massive tracing scenario. -> -> For example, if (DEBUG_PREEMPT | TRACE_PREEMPT_TOGGLE) , preempt_count_{add, sub} can be traced via -> ftrace, if we happens to use fprobe + rethook based on ftrace to hook on those functions, -> recursion is introduced in functions like rethook_trampoline_handler and leads to kernel crash -> because of stack overflow. -> - -### Related Technology News - -#### Qemu - -**[v1: hw/riscv/opentitan: Correct QOM type/size of OpenTitanState](http://lore.kernel.org/qemu-devel/20230520054510.68822-1-philmd@linaro.org/)** - -> This series fix a QOM issue with the OpenTitanState -> structure, noticed while auditing QOM relations globally. -> - -**[v5: hw/riscv: qemu crash when NUMA nodes exceed available CPUs](http://lore.kernel.org/qemu-devel/20230519023758.1759434-1-yin.wang@intel.com/)** - -> Command "qemu-system-riscv64 -machine virt -> -m 2G -smp 1 -numa node,mem=1G -numa node,mem=1G" -> would trigger this problem.Backtrace with: -> #0 0x0000555555b5b1a4 in riscv_numa_get_default_cpu_node_id at ../hw/riscv/numa.c:211 -> #1 0x00005555558ce510 in machine_numa_finish_cpu_init at ../hw/core/machine.c:1230 -> #2 0x00005555558ce9d3 in machine_run_board_init at ../hw/core/machine.c:1346 -> #3 0x0000555555aaedc3 in qemu_init_board at ../softmmu/vl.c:2513 -> #4 0x0000555555aaf064 in qmp_x_exit_preconfig at ../softmmu/vl.c:2609 -> #5 0x0000555555ab1916 in qemu_init at ../softmmu/vl.c:3617 -> #6 0x000055555585463b in main at ../softmmu/main.c:47 -> This commit fixes the issue by adding parameter checks.
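The QEMU backtrace above points at the default CPU-to-node mapping. The sketch below is minimal and rests on a stated assumption: that the default mapping divides the CPU count evenly among the nodes. Under that assumption, `-smp 1` with two nodes gives a per-node share of zero, so the index division faults unless the configuration is rejected first. This is neither QEMU's code nor the exact check the patch adds, and `default_cpu_node()` is a hypothetical name.

```c
#include <stdint.h>

/*
 * Hypothetical default mapping: CPU idx goes to node idx / (cpus / nodes),
 * clamped to the last node. Returns -1 instead of dividing by zero when
 * there are more nodes than CPUs.
 */
static int64_t default_cpu_node(int64_t idx, int64_t cpus, int64_t nodes)
{
	int64_t per_node, nidx;

	if (nodes <= 0)
		return 0;               /* no NUMA config: everything on node 0 */
	per_node = cpus / nodes;
	if (per_node == 0)
		return -1;              /* e.g. -smp 1 with two -numa nodes */
	nidx = idx / per_node;
	if (nidx >= nodes)
		nidx = nodes - 1;       /* leftover CPUs go to the last node */
	return nidx;
}
```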
-> - -**[v1: Add RISC-V Virtual IRQs and IRQ filtering support](http://lore.kernel.org/qemu-devel/20230518113838.130084-1-rkanwal@rivosinc.com/)** - -> This series adds M and HS-mode virtual interrupt and IRQ filtering support. -> This allows inserting virtual interrupts from M/HS-mode into S/VS-mode -> using mvien/hvien and mvip/hvip csrs. IRQ filtering is a use case of -> this change, i-e M-mode can stop delegating an interrupt to S-mode and -> instead enable it in MIE and receive those interrupts in M-mode and then -> selectively inject the interrupt using mvien and mvip. -> - -**[v9: target/riscv: rework CPU extension validation](http://lore.kernel.org/qemu-devel/20230517135714.211809-1-dbarboza@ventanamicro.com/)** - -> In this version we have a change in patch 11. We're now firing a -> GUEST_ERROR if write_misa() fails and we need to rollback (i.e. not -> change MISA ext). -> - -#### U-Boot - -**[v2: riscv: setup per-hart stack earlier](http://lore.kernel.org/u-boot/1684650044-313122-1-git-send-email-ganboing@gmail.com/)** - -> Harts need to use per-hart stack before any function call, even if that -> function is a simple one. When the callee uses stack for register save/ -> restore, especially RA, if nested call, concurrent access by multiple -> harts on the same stack will cause data-race. -> - -**[v1: riscv: add backtrace support](http://lore.kernel.org/u-boot/20230515130322.516871-1-ben.dooks@sifive.com/)** - -> When debugging, it is useful to have a backtrace to find -> out what is in the call stack as the previous function (RA) -> may not have been the culprit. 
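The U-Boot backtrace entry above is about showing the whole caller chain rather than only RA. One common way to walk that chain on RISC-V uses frame pointers. With a frame pointer kept in `s0`, the usual prologue saves the caller's frame pointer and RA in the two words just below the current frame pointer, which links the frames together. The sketch below is minimal and walks hand-built frame records. It does not read real registers, and it is not necessarily how the U-Boot patch works.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * The two words saved just below a RISC-V frame pointer when frame
 * pointers are enabled: the caller's fp, then the return address.
 */
struct frame_record {
	uintptr_t fp;   /* caller's frame pointer (0 ends the chain) */
	uintptr_t ra;   /* return address into the caller */
};

/* Walk the chain starting at fp, storing up to max return addresses. */
static size_t walk_frames(uintptr_t fp, uintptr_t *out, size_t max)
{
	size_t n = 0;

	while (fp != 0 && n < max) {
		const struct frame_record *rec = (const struct frame_record *)fp - 1;

		out[n++] = rec->ra;
		if (rec->fp <= fp)
			break;          /* stack grows down: a caller's fp must be higher */
		fp = rec->fp;
	}
	return n;
}
```

The `rec->fp <= fp` check stops the walk on the terminating zero. It also stops on a corrupted chain that points back down the stack, which matters most when the backtrace is printed after a crash.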
-> - -## 20230507: Issue 45 - -### Kernel News - -#### RISC-V Architecture Support - -**[v3: Allwinner R329/D1/R528/T113s SPI support](http://lore.kernel.org/linux-riscv/20230506232616.1792109-1-bigunclemax@gmail.com/)** - -> This series is attempt to revive previous work to add support for SPI -> controller which is used in newest Allwinner's SOCs R329/D1/R528/T113s -> https://lore.kernel.org/lkml/BYAPR20MB2472E8B10BFEF75E7950BBC0BCF79@BYAPR20MB2472.namprd20.prod.outlook.com/ -> - -**[v1: riscv: mm: use bitmap_zero() API](http://lore.kernel.org/linux-riscv/202305061711417142802@zte.com.cn/)** - -> bitmap_zero() is faster than bitmap_clear(), so use bitmap_zero() -> instead of bitmap_clear(). -> - -**[v1: RISC-V: KVM: use bitmap_zero() API](http://lore.kernel.org/linux-riscv/202305061710302032748@zte.com.cn/)** - -> bitmap_zero() is faster than bitmap_clear(), so use bitmap_zero() -> instead of bitmap_clear(). -> - -**[v3: Add TDM audio on StarFive JH7110](http://lore.kernel.org/linux-riscv/20230506090116.9206-1-walker.chen@starfivetech.com/)** - -> This patchset adds TDM audio driver for the StarFive JH7110 SoC. The -> first patch adds device tree binding for TDM module. The second patch -> adds tdm driver support for JH7110 SoC. The last patch adds device node -> of tdm and sound card to JH7110 dts. -> -> The series has been tested on the VisionFive 2 board by plugging an -> audio expansion board. -> -> For more information of audio expansion board, you can take a look -> at the following webpage: -> https://wiki.seeedstudio.com/ReSpeaker_2_Mics_Pi_HAT/ -> - -**[v1: perf build: Add system include paths to BPF builds](http://lore.kernel.org/linux-riscv/20230506021450.3499232-1-irogers@google.com/)** - -> There are insufficient headers in tools/include to satisfy building -> BPF programs and their header dependencies. Add the system include -> paths from the non-BPF clang compile so that these headers can be -> found.
-> -> This code was taken from: -> tools/testing/selftests/bpf/Makefile -> - -**[GIT PULL: RISC-V Patches for the 6.4 Merge Window, Part 2](http://lore.kernel.org/linux-riscv/mhng-b783c0bb-3d23-4767-9c69-a39f805a8544@palmer-ri-x1c9/)** - -> -> RISC-V Patches for the 6.4 Merge Window, Part 2 -> -> * Support for hibernation. -> * .rela.dyn has been moved to init. -> * A fix for the SBI probing to allow for implementation-defined -> behavior. -> * Various other fixes and cleanups throughout the tree. -> -> There are still a few minor build issues with drivers, but patches are on the -> lists. Aside from that things look good with a merge from Linus' master as of -> last night, I've got another test running now but I don't see anything scary. -> - -**[v1: riscv: Optimize memset](http://lore.kernel.org/linux-riscv/6d1cbe2e.3c31d.187eb14d990.Coremail.zhangfei@nj.iscas.ac.cn/)** - -> -> This patch has been optimized for memset data sizes less than 16 bytes. -> Compared to byte by byte storage, significant performance improvement has been achieved. -> - -**[v1: riscv: dts: allwinner: d1: Add SPI0 controller node](http://lore.kernel.org/linux-riscv/20230505074701.1030980-1-bigunclemax@gmail.com/)** - -> Some boards form the MangoPi family (MQ\MQ-Dual\MQ-R) may have -> an optional SPI flash that connects to the SPI0 controller. -> This controller is already supported by sun8i-h3-spi driver. -> So let's add its DT node. -> - -**[v2: RISC-V: Detect Ssqosid extension and handle sqoscfg CSR](http://lore.kernel.org/linux-riscv/20230430-riscv-cbqri-rfc-v2-v2-0-8e3725c4a473@baylibre.com/)** - -> This RFC series adds initial support for the Ssqosid extension and the -> sqoscfg CSR as specified in Chapter 2 of the RISC-V Capacity and -> Bandwidth Controller QoS Register Interface (CBQRI) specification [1]. -> -> QoS (Quality of Service) in this context is concerned with shared -> resources on an SoC such as cache capacity and memory bandwidth. 
Intel -> and AMD already have QoS features on x86, and there is an existing user -> interface in Linux: the resctrl virtual filesystem [2]. -> -> The sqoscfg CSR provides a mechanism by which a software workload (e.g. -> a process or a set of processes) can be associated with a resource -> control ID (RCID) and a monitoring counter ID (MCID) that accompanies -> each request made by the hart to shared resources like cache. CBQRI -> defines operations to configure resource usage limits, in the form of -> capacity or bandwidth, for an RCID. CBQRI also defines operations to -> configure counters to track the resource utilization of an MCID. -> -> The CBQRI spec is still in draft state and is undergoing review [3]. It -> is possible there will be changes to the Ssqosid extension and the CBQRI -> spec. For example, the CSR address for sqoscfg is not yet finalized. -> -> My goal for this RFC is to determine if the 2nd patch is an acceptable -> approach to handling sqoscfg when switching tasks. This RFC was tested -> against a QEMU branch that implements the Ssqosid extension [4]. A test -> driver [5] was used to set sqoscfg for the current process. This allows -> __switch_to_sqoscfg() to be tested without resctrl. -> -> This series is based on riscv/for-next at: -> -> b09313dd2e72 ("RISC-V: hwprobe: Explicity check for -1 in vdso init") -> - -**[v2: Split ptdesc from struct page](http://lore.kernel.org/linux-riscv/20230501192829.17086-1-vishal.moola@gmail.com/)** - -> The MM subsystem is trying to shrink struct page. This patchset -> introduces a memory descriptor for page table tracking - struct ptdesc. -> -> This patchset introduces ptdesc, splits ptdesc from struct page, and -> converts many callers of page table constructor/destructors to use ptdescs. -> -> Ptdesc is a foundation to further standardize page tables, and eventually -> allow for dynamic allocation of page tables independent of struct page. 
-> However, the use of pages for page table tracking is quite deeply -> ingrained and varied across archictectures, so there is still a lot of -> work to be done before that can happen. -> -> This is rebased on next-20230428. -> - -**[v3: riscv: allow case-insensitive ISA string parsing](http://lore.kernel.org/linux-riscv/tencent_E6911C8D71F5624E432A1AFDF86804C3B509@qq.com/)** - -> This patchset allows case-insensitive ISA string parsing, which is -> needed in the ACPI environment. As the RISC-V Hart Capabilities Table -> (RHCT) description in UEFI Forum ECR[1] shows the format of the ISA -> string is defined in the RISC-V unprivileged specification[2]. However, -> the RISC-V unprivileged specification defines the ISA naming strings are -> case-insensitive while the current ISA string parser in the kernel only -> accepts lowercase letters. In this case, the kernel should allow -> case-insensitive ISA string parsing. Moreover, this reason has been -> discussed in Conor's patch[3]. And I have also checked the current ISA -> string parsing in the recent ACPI support patch[4] will also call -> `riscv_fill_hwcap` function as DT we use now. -> -> The original motivation for my patch v1[5] is that some SoC generators -> will provide generated DT with illegal ISA string in dt-binding such as -> rocket-chip, which will even cause kernel panic in some cases as I -> mentioned in v1[5]. Now, the rocket-chip has been fixed in PR #3333[6]. -> However, when using some specific version of rocket-chip with -> illegal ISA string in DT, this patchset will also work for parsing -> uppercase letters correctly in DT, thus will have better compatibility. -> -> In summary, this patch not only works for case-insensitive ISA string -> parsing to meet the requirements in ECR[1] but also can be a workaround -> for some specific versions of rocket-chip. 
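The ISA-string entry above depends on matching extension letters regardless of case. The sketch below is minimal and deliberately simplified. The hypothetical `isa_has_ext()` is not the kernel's `riscv_fill_hwcap()`. It examines only single-letter extensions and stops at the first `_`, or at a `z`, `s` or `x` that begins a multi-letter extension.

```c
#include <ctype.h>
#include <stdbool.h>

/* Return true if single-letter extension ext appears in isa, ignoring case. */
static bool isa_has_ext(const char *isa, char ext)
{
	const unsigned char *p = (const unsigned char *)isa;
	int want = tolower((unsigned char)ext);

	/* Expect "rv" (any case) followed by the XLEN digits. */
	if (tolower(p[0]) != 'r' || tolower(p[1]) != 'v')
		return false;
	p += 2;
	while (isdigit(*p))
		p++;

	for (; *p != '\0' && *p != '_'; p++) {
		int c = tolower(*p);

		if (c == 'z' || c == 's' || c == 'x')
			break;          /* start of a multi-letter extension */
		if (c == want)
			return true;
	}
	return false;
}
```

Folding both sides with `tolower()` lets `"RV64IMAFDC"` and `"rv64imafdc"` parse the same way. Because the fold is applied at comparison time, the input string itself never has to be copied.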
-> - -#### Process Scheduling - -**[v2: sched/debug: correct printing for rq->nr_uninterruptible](http://lore.kernel.org/lkml/20230506074253.44526-1-yanyan.yan@antgroup.com/)** - -> Commit e6fe3f422be1 ("sched: Make multiple runqueue task counters -> 32-bit") changed the type for rq->nr_uninterruptible from "unsigned -> long" to "unsigned int", but left wrong cast print to -> /sys/kernel/debug/sched/debug and to the console. -> -> For example, nr_uninterruptible's value is fffffff7 with type -> "unsigned int", (long)nr_uninterruptible shows 4294967287 while -> (int)nr_uninterruptible prints -9. So using int cast fixes wrong -> printing. -> - -**[v1: sched: core: Simplify init_sched_mm_cid()](http://lore.kernel.org/lkml/20230507023352.2784-1-kunyu@nfschina.com/)** - -> int mm_users variable definition move to variable usage location. -> - -**[v2: sched/deadline: cpuset: Rework DEADLINE bandwidth restoration](http://lore.kernel.org/lkml/20230503072228.115707-1-juri.lelli@redhat.com/)** - -> Qais reported [1] that iterating over all tasks when rebuilding root -> domains for finding out which ones are DEADLINE and need their bandwidth -> correctly restored on such root domains can be a costly operation (10+ -> ms delays on suspend-resume). He proposed we skip rebuilding root -> domains for certain operations, but that approach seemed arch specific -> and possibly prone to errors, as paths that ultimately trigger a rebuild -> might be quite convoluted (thanks Qais for spending time on this!). -> -> This is v2 of an alternative approach (v1 at [3]) to fix the problem.
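The sched/debug entry above is a plain C conversion issue. The sketch below reproduces the two casts in userspace. The hypothetical `format_nr()` only mimics those casts and is not the scheduler's debug code. Widening an `unsigned int` that holds `0xfffffff7` to a 64-bit type keeps the bit pattern as a large positive value. Converting it to `int` instead yields -9, the intended negative count. That conversion is implementation-defined in C, and GCC and Clang define it as modulo 2^32.

```c
#include <stdio.h>

/* Format an unsigned int counter either widened to 64 bits or narrowed to int.
 * Returns the number of characters snprintf produced. */
static int format_nr(char *buf, size_t len, unsigned int nr, int as_int)
{
	if (as_int)
		return snprintf(buf, len, "%d", (int)nr);          /* 0xfffffff7 -> -9 */
	return snprintf(buf, len, "%lld", (long long)nr);      /* -> 4294967287 */
}
```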
-> - -**[v1: sched/numa: Disjoint set vma scan improvements](http://lore.kernel.org/lkml/cover.1683033105.git.raghavendra.kt@amd.com/)** - -> -> While this has improved significant system time overhead, there are corner -> cases, which genuinely needs some relaxation for e.g., concern raised by -> PeterZ where unfairness amongst the thread belonging to disjoint set of VMSs -> can potentially amplify the side effects of vma regions belonging to some of -> the tasks being left unscanned. -> -> With this patch I am seeing good improvement in numa01_THREAD_ALLOC case, -> but please note that with [1] there was a drastic decrease in system time when -> benchmarks run, this patch adds back some of the system time. -> - -**[v2: sched/topology: add for_each_numa_cpu() macro](http://lore.kernel.org/lkml/20230430171809.124686-1-yury.norov@gmail.com/)** - -> for_each_cpu() is widely used in kernel, and it's beneficial to create -> a NUMA-aware version of the macro. -> -> Recently added for_each_numa_hop_mask() works, but switching existing -> codebase to it is not an easy process. -> -> This series adds for_each_numa_cpu(), which is designed to be similar to -> the for_each_cpu(). It allows to convert existing code to NUMA-aware as -> simple as adding a hop iterator variable and passing it inside new macro. -> for_each_numa_cpu() takes care of the rest. -> -> At the moment, we have 2 users of NUMA-aware enumerators. 
One is -> Melanox's in-tree driver, and another is Intel's in-review driver: -> -> https://lore.kernel.org/lkml/20230216145455.661709-1-pawel.chmielewski@intel.com/ -> -> Both real-life examples follow the same pattern: -> -> for_each_numa_hop_mask(cpus, prev, node) { -> for_each_cpu_andnot(cpu, cpus, prev) { -> if (cnt++ == max_num) -> goto out; -> do_something(cpu); -> } -> prev = cpus; -> } -> -> With the new macro, it has a more standard look, like this: -> -> for_each_numa_cpu(cpu, hop, node, cpu_possible_mask) { -> if (cnt++ == max_num) -> break; -> do_something(cpu); -> } -> -> Straight conversion of existing for_each_cpu() codebase to NUMA-aware -> version with for_each_numa_hop_mask() is difficult because it doesn't -> take a user-provided cpu mask, and eventually ends up with open-coded -> double loop. With for_each_numa_cpu() it shouldn't be a brainteaser. -> Consider the NUMA-ignorant example: -> -> cpumask_t cpus = get_mask(); -> int cnt = 0, cpu; -> -> for_each_cpu(cpu, cpus) { -> if (cnt++ == max_num) -> break; -> do_something(cpu); -> } -> -> Converting it to NUMA-aware version would be as simple as: -> -> cpumask_t cpus = get_mask(); -> int node = get_node(); -> int cnt = 0, hop, cpu; -> -> for_each_numa_cpu(cpu, hop, node, cpus) { -> if (cnt++ == max_num) -> break; -> do_something(cpu); -> } -> -> The latter looks more verbose and avoids from open-coding that annoying -> double loop. Another advantage is that it works with a 'hop' parameter with -> the clear meaning of NUMA distance, and doesn't make people not familiar -> to enumerator internals bothering with current and previous masks machinery. -> - -#### Memory Management - -**[v1: filemap: Handle error return from __filemap_get_folio()](http://lore.kernel.org/linux-mm/20230506160415.2992089-1-willy@infradead.org/)** - -> Smatch reports that filemap_fault() was missed in the conversion of -> __filemap_get_folio() error returns from NULL to ERR_PTR.
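The filemap entry above concerns a NULL-to-`ERR_PTR` conversion that one caller missed. The sketch below shows the convention in userspace. It re-creates simplified versions of the kernel's `ERR_PTR()`, `IS_ERR()` and `PTR_ERR()` under hypothetical lowercase names, and `lookup_folio()` is also hypothetical. An error pointer is non-NULL, so an old `if (!folio)` check lets it through unnoticed.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_ERRNO 4095  /* errors are encoded in the top 4095 pointer values */

static inline void *err_ptr(long error) { return (void *)(intptr_t)error; }
static inline long ptr_err(const void *ptr) { return (long)(intptr_t)ptr; }
static inline bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static int the_folio;   /* stands in for a real cached object */

/* Hypothetical lookup that reports "not found" with an error pointer
 * (here -ENOENT) instead of NULL. */
static void *lookup_folio(bool present)
{
	return present ? (void *)&the_folio : err_ptr(-ENOENT);
}
```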
-> - -**[v1: mm/gup: add missing gup_must_unshare() check to gup_huge_pgd()](http://lore.kernel.org/linux-mm/cb971ac8dd315df97058ea69442ecc007b9a364a.1683381545.git.lstoakes@gmail.com/)** - -> All other instances of gup_huge_pXd() perform the unshare check, so update -> the PGD-specific function to do so as well. -> -> While checking pgd_write() might seem unusual, this function already -> performs such a check via pgd_access_permitted() so this is in line with -> the existing implementation. -> - -**[v3: memcontrol: support cgroup level OOM protection](http://lore.kernel.org/linux-mm/20230506114948.6862-1-chengkaitao@didiglobal.com/)** - -> Establish a new OOM score algorithm, supports the cgroup level OOM -> protection mechanism. When an global/memcg oom event occurs, we treat -> all processes in the cgroup as a whole, and OOM killers need to select -> the process to kill based on the protection quota of the cgroup -> - -**[v1: RESEND: Make PCMCIA and QCOM_HIDMA depend on HAS_IOMEM](http://lore.kernel.org/linux-mm/20230506111628.712316-1-bhe@redhat.com/)** - -> This is suggested by Niklas when he reviewed patches related to s390 -> part: -> https://lore.kernel.org/all/d78edb587ecda0aa09ba80446d0f1883e391996d.camel@linux.ibm.com/T/#u -> -> v1 link: -> https://lore.kernel.org/all/20230216073403.451455-1-bhe@redhat.com/T/#u -> -> This resend v1 with Niklas and Arnd's ack tags added. -> - -**[v1: mbind.2: Clarify MPOL_MF_MOVE with MPOL_INTERLEAVE policy](http://lore.kernel.org/linux-mm/20230505194858.23539-1-mike.kravetz@oracle.com/)** - -> There was user confusion about specifying MPOL_MF_MOVE* with -> MPOL_INTERLEAVE policy [1]. Add clarification. 
-> -> [1] https://lore.kernel.org/linux-mm/20230501185836.GA85110@monkey/ -> - -**[v1: mm/hugetlb: revert use of page_cache_next_miss()](http://lore.kernel.org/linux-mm/20230505185301.534259-1-sidhartha.kumar@oracle.com/)** - -> As reported by Ackerley[1], the use of page_cache_next_miss() in -> hugetlbfs_fallocate() introduces a bug where a second fallocate() call to -> same offset fails with -EEXIST. Revert this change and go back to the -> previous method of using get from the page cache and then dropping the -> reference on success. -> -> hugetlbfs_pagecache_present() was also refactored to use -> page_cache_next_miss(), revert the usage there as well. -> -> User visible impacts include hugetlb fallocate incorrectly returning -> EEXIST if pages are already present in the file. In addition, hugetlb -> pages will not be included in core dumps if they need to be brought in via -> GUP. userfaultfd UFFDIO_COPY also uses this code and will not notice pages -> already present in the cache. It may try to allocate a new page and -> potentially return ENOMEM as opposed to EEXIST. -> - -**[v2: shmemfs stable directory cookies](http://lore.kernel.org/linux-mm/168331111400.20728.2327812215536431362.stgit@oracle-102.nfsv4bat.org/)** - -> The following series is for continued discussion of the need for -> and implementation of stable directory cookies for shmemfs/tmpfs. -> -> Based on one of Andrew's review comments, I've split this one patch -> into a series to (hopefully) reduce its complexity and make it -> easier to analyze the changes. -> -> Although the patch(es) have been passing functional tests for -> several weeks, there have been some reports of performance -> regressions that we still need to get to the bottom of. -> -> We might consider a simpler lseek/readdir implementation, as using -> an xarray is effective but a bit of overkill. 
I'd like to avoid a -> linked list implementation as that is known to have significant -> performance impact past a dozen or so list entries. -> - -**[v2: maple_tree: Make maple state reusable after mas_empty_area()](http://lore.kernel.org/linux-mm/20230505145829.74574-1-zhangpeng.00@bytedance.com/)** - -> Make mas->min and mas->max point to a node range instead of a leaf entry -> range. This allows mas to still be usable after mas_empty_area() returns. -> Users would get unexpected results from other operations on the maple -> state after calling the affected function. -> - -**[v1: sysctl: add config to make randomize_va_space RO](http://lore.kernel.org/linux-mm/20230504213002.56803-1-michael.mccracken@gmail.com/)** - -> Add config RO_RANDMAP_SYSCTL to set the mode of the randomize_va_space -> sysctl to 0444 to disallow all runtime changes. This will prevent -> accidental changing of this value by a root service. -> -> The config is disabled by default to avoid surprises. -> - -**[v9: mm/gup: disallow GUP writing to file-backed mappings by default](http://lore.kernel.org/linux-mm/cover.1683235180.git.lstoakes@gmail.com/)** - -> Writing to file-backed mappings which require folio dirty tracking using -> GUP is a fundamentally broken operation, as kernel write access to GUP -> mappings do not adhere to the semantics expected by a file system. -> -> A GUP caller uses the direct mapping to access the folio, which does not -> cause write notify to trigger, nor does it enforce that the caller marks -> the folio dirty. -> -> The problem arises when, after an initial write to the folio, writeback -> results in the folio being cleaned and then the caller, via the GUP -> interface, writes to the folio again. -> -> As a result of the use of this secondary, direct, mapping to the folio no -> write notify will occur, and if the caller does mark the folio dirty, this -> will be done so unexpectedly. -> -> For example, consider the following scenario:- -> -> 1. 
A folio is written to via GUP which write-faults the memory, notifying -> the file system and dirtying the folio. -> 2. Later, writeback is triggered, resulting in the folio being cleaned and -> the PTE being marked read-only. -> 3. The GUP caller writes to the folio, as it is mapped read/write via the -> direct mapping. -> 4. The GUP caller, now done with the page, unpins it and sets it dirty -> (though it does not have to). -> -> This change updates both the PUP FOLL_LONGTERM slow and fast APIs. As -> pin_user_pages_fast_only() does not exist, we can rely on a slightly -> imperfect whitelisting in the PUP-fast case and fall back to the slow case -> should this fail. -> - -**[v1: MDWE without inheritance](http://lore.kernel.org/linux-mm/20230504170942.822147-1-revest@chromium.org/)** - -> Joey recently introduced a Memory-Deny-Write-Executable (MDWE) prctl which tags -> current with a flag that prevents pages that were previously not executable from -> becoming executable. -> -> This tag always gets inherited by children tasks. (it's in MMF_INIT_MASK) -> -> At Google, we've been using a somewhat similar downstream patch for a few years -> now. To make the adoption of this feature easier, we've had it support a mode in -> which the W^X flag does not propagate to children. For example, this is handy if -> a C process which wants W^X protection suspects it could start children -> processes that would use a JIT. -> -> I'd like to align our features with the upstream prctl. This series proposes a -> new NO_INHERIT flag to the MDWE prctl to make this kind of adoption easier. It -> sets a different flag in current that is not in MMF_INIT_MASK and which does not -> propagate. -> -> As part of looking into MDWE, I also fixed a couple of things in the MDWE test. 
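The MDWE entry's inheritance argument rests on a bitmask rule: at fork, a child keeps only the flags in MMF_INIT_MASK. The minimal C model below shows why a second flag kept outside that mask gives W^X to the current task without propagating it. It uses two hypothetical flag bits, not the kernel's real MMF_* values, and is a sketch rather than the series' implementation:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for illustration; the kernel's real MMF_* bit
 * numbers and MMF_INIT_MASK contents are different. */
#define SKETCH_HAS_MDWE            (1u << 0) /* W^X, copied to children */
#define SKETCH_HAS_MDWE_NO_INHERIT (1u << 1) /* W^X, this task only */

/* Only bits in this mask survive into a child's flags at fork time. */
#define SKETCH_INIT_MASK SKETCH_HAS_MDWE

static_assert((SKETCH_INIT_MASK & SKETCH_HAS_MDWE_NO_INHERIT) == 0,
              "the no-inherit flag must stay out of the inherited mask");

/* Enable W^X on a task, either inheritable or not. */
static inline unsigned sketch_set_mdwe(unsigned flags, bool no_inherit)
{
    return flags | (no_inherit ? SKETCH_HAS_MDWE_NO_INHERIT : SKETCH_HAS_MDWE);
}

/* W^X is enforced if either flag is present. */
static inline bool sketch_mdwe_enforced(unsigned flags)
{
    return (flags & (SKETCH_HAS_MDWE | SKETCH_HAS_MDWE_NO_INHERIT)) != 0;
}

/* Flags a newly forked child starts with. */
static inline unsigned sketch_fork_flags(unsigned parent_flags)
{
    return parent_flags & SKETCH_INIT_MASK;
}
```

With this model, a task that enables the no-inherit variant is protected itself. A child it forks starts with no W^X flag and can therefore run a JIT.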
-> - -**[v1: mm: always respect QUEUE_FLAG_STABLE_WRITES on the block device](http://lore.kernel.org/linux-mm/20230504105624.9789-1-idryomov@gmail.com/)** - -> Commit 1cb039f3dc16 ("bdi: replace BDI_CAP_STABLE_WRITES with a queue -> and a sb flag") introduced a regression for the raw block device use -> case. Capturing QUEUE_FLAG_STABLE_WRITES flag in set_bdev_super() has -> the effect of respecting it only when there is a filesystem mounted on -> top of the block device. If a filesystem is not mounted, block devices -> that do integrity checking return sporadic checksum errors. -> -> Additionally, this commit made the corresponding sysfs knob writeable -> for debugging purposes. However, because QUEUE_FLAG_STABLE_WRITES flag -> is captured when the filesystem is mounted and isn't consulted after -> that anywhere outside of swap code, changing it doesn't take immediate -> effect even though dumping the knob shows the new value. With no way -> to dump SB_I_STABLE_WRITES flag, this is needlessly confusing. -> -> Resurrect the original stable writes behavior by changing -> folio_wait_stable() to account for the case of a raw block device and -> also: -> -> - for the case of a filesystem, test QUEUE_FLAG_STABLE_WRITES flag -> each time instead of capturing it in the superblock so that changes -> are reflected immediately (thus aligning with the case of a raw block -> device) -> - retain SB_I_STABLE_WRITES flag for filesystems that need stable -> writes independent of the underlying block device (currently just -> NFS) -> - -**[v1: [For stable 5.4] mm: migrate: buffer_migrate_page_norefs() fallback migrate not uptodate pages](http://lore.kernel.org/linux-mm/20230503163426.5538-2-findns94@gmail.com/)** - -> Recently we notice that ext4 filesystem occasionally fail to read -> metadata from disk and report error message, but the disk and block -> layer looks fine. After analyse, we lockon commit 88dbcbb3a484 -> ("blkdev: avoid migration stalls for blkdev pages"). 
It provide a -> migration method for the bdev, we could move page that has buffers -> without extra users now, but it will lock the buffers on the page, which -> breaks a lot of current filesystem's fragile metadata read operations, -> like ll_rw_block() for common usage and ext4_read_bh_lock() for ext4, -> these helpers just trylock the buffer and skip submit IO if it lock -> failed, many callers just wait_on_buffer() and conclude IO error if the -> buffer is not uptodate after buffer unlocked. -> -> This issue could be easily reproduced by add some delay just after -> buffer_migrate_lock_buffers() in __buffer_migrate_page() and do -> fsstress on ext4 filesystem. -> -> EXT4-fs error (device pmem1): __ext4_find_entry:1658: inode #73193: -> comm fsstress: reading directory lblock 0 -> EXT4-fs error (device pmem1): __ext4_find_entry:1658: inode #75334: -> comm fsstress: reading directory lblock 0 -> -> Something like ll_rw_block() should be used carefully and seems could -> only be safely used for the readahead case. So the best way is to fix -> the read operations in filesystem in the long run, but now let us avoid -> this issue first. This patch avoid this issue by fallback to migrate -> pages that are not uptodate like fallback_migrate_page(), those pages -> that has buffers may probably do read operation soon. -> - -**[v3: fs: implement multigrain timestamps](http://lore.kernel.org/linux-mm/20230503142037.153531-1-jlayton@kernel.org/)** - -> -> This is a follow-up of the patches I posted last week [1]. The main -> change in this set is that it no longer uses the lowest-order bit in the -> tv_nsec field, and instead uses one of the higher-order bits (#31, -> specifically) since they are otherwise unused. This change makes things -> much simpler, and we no longer need to twiddle s_time_gran for it. 
-> - -**[v13: cachestat: a new syscall for page cache state of files](http://lore.kernel.org/linux-mm/20230503013608.2431726-1-nphamcs@gmail.com/)** - -> -> This series of patches introduces a new system call, cachestat, that -> summarizes the page cache statistics (number of cached pages, dirty -> pages, pages marked for writeback, evicted pages etc.) of a file, in a -> specified range of bytes. It also include a selftest suite that tests some -> typical usage. Currently, the syscall is only wired in for x86 -> architecture. -> - -**[v1: fs: hugetlbfs: Set vma policy only when needed for allocating folio](http://lore.kernel.org/linux-mm/20230502235622.3652586-1-ackerleytng@google.com/)** - -> Calling hugetlb_set_vma_policy() later avoids setting the vma policy -> and then dropping it on a page cache hit. -> - -**[v5: bio: check return values of bio_add_page](http://lore.kernel.org/linux-mm/20230502101934.24901-1-johannes.thumshirn@wdc.com/)** - -> -> This series converts the callers of bio_add_page() which can easily use -> __bio_add_page() to using it and checks the return of bio_add_page() for -> callers that don't work on a freshly created bio. -> - -#### File systems - -**[v1: bpf-next: Introduce bpf iterators for file-system](http://lore.kernel.org/linux-fsdevel/20230507040107.3755166-1-houtao@huaweicloud.com/)** - -> -> The patchset attempts to provide more observability for the file-system -> as proposed in [0]. Compared to drgn [1], the bpf iterator for file-system -> has fewer dependencies (e.g., no need for vmlinux) and more accurate -> results. -> - -**[GIT PULL: Pipe FMODE_NOWAIT support](http://lore.kernel.org/linux-fsdevel/26aba1b5-8393-a20a-3ce9-f82425673f4d@kernel.dk/)** - -> Here's the revised edition of the FMODE_NOWAIT support for pipes, in -> which we just flag it as such supporting FMODE_NOWAIT unconditionally, -> but clear it if we ever end up using splice/vmsplice on the pipe.
The -> pipe read/write side is perfectly fine for nonblocking IO, however -> splice and vmsplice can potentially wait for IO with the pipe lock held. -> - -**[v6: Introduce block provisioning primitives](http://lore.kernel.org/linux-fsdevel/20230506062909.74601-1-sarthakkukreti@chromium.org/)** - -> This patch series covers iteration 6 of adding support for block -> provisioning requests. -> - -**[v1: fuse: add a new flag to allow shared mmap in FOPEN_DIRECT_IO mode](http://lore.kernel.org/linux-fsdevel/20230505081652.43008-1-hao.xu@linux.dev/)** - -> FOPEN_DIRECT_IO is usually set by fuse daemon to indicate need of strong -> coherency, e.g. network filesystems. Thus shared mmap is disabled since -> it leverages page cache and may write to it, which may cause -> inconsistence. But FOPEN_DIRECT_IO can be used not for coherency but to -> reduce memory footprint as well, e.g. reduce guest memory usage with -> virtiofs. Therefore, add a new flag FOPEN_DIRECT_IO_SHARED_MMAP to allow -> shared mmap for these cases. -> - -**[v1: -next: lsm: Change inode_setattr() to take struct](http://lore.kernel.org/linux-fsdevel/20230505081200.254449-1-xiujianfeng@huawei.com/)** - -> I am working on adding xattr/attr support for landlock [1], so we can -> control fs accesses such as chmod, chown, uptimes, setxattr, etc.. inside -> landlock sandbox. -> - -**[v3: dax: enable dax fault handler to report VM_FAULT_HWPOISON](http://lore.kernel.org/linux-fsdevel/20230505011747.956945-1-jane.chu@oracle.com/)** - -> When multiple processes mmap() a dax file, then at some point, -> a process issues a 'load' and consumes a hwpoison, the process -> receives a SIGBUS with si_code = BUS_MCEERR_AR and with si_lsb -> set for the poison scope. Soon after, any other process issues -> a 'load' to the poisoned page (that is unmapped from the kernel -> side by memory_failure), it receives a SIGBUS with -> si_code = BUS_ADRERR and without valid si_lsb. 
-> -> This is confusing to user, and is different from page fault due -> to poison in RAM memory, also some helpful information is lost. -> -> Channel dax backend driver's poison detection to the filesystem -> such that instead of reporting VM_FAULT_SIGBUS, it could report -> VM_FAULT_HWPOISON. -> - -**[v1: Supporting same fsid filesystems mounting on btrfs](http://lore.kernel.org/linux-fsdevel/20230504170708.787361-1-gpiccoli@igalia.com/)** - -> Currently, we cannot reliably mount same fsid filesystems even one at -> a time in btrfs, but if users want to mount them at the same time, it's -> pretty much impossible. Other filesystems like ext4 are capable of that. -> -> The goal is to allow systems with A/B partitioning scheme (like the -> Steam Deck console or various mobile devices) to be able to hold -> the same filesystem image in both partitions; it also allows to have -> block device level check for filesystem integrity - this is used in the -> Steam Deck image installation, to check if the current read-only image -> is pristine. A bit more details are provided in the following ML thread: -> -> https://lore.kernel.org/linux-btrfs/c702fe27-8da9-505b-6e27-713edacf723a@igalia.com/ -> -> The mechanism used to achieve it is based in the metadata_uuid feature, -> leveraging such code infrastructure for that. The patches are based on -> kernel 6.3 and were tested both in a virtual machine as well as in the -> Steam Deck. Comments, suggestions and overall feedback is greatly -> appreciated - thanks in advance! -> - -**[GIT PULL: sysctl changes for v6.4-rc4 v2](http://lore.kernel.org/linux-fsdevel/ZFKzZeAs5Mdfv5ha@bombadil.infradead.org/)** - -> -> As mentioned on my first pull request for sysctl-next, for v6.4-rc1 -> we're very close to being able to deprecating register_sysctl_paths(). -> I was going to assess the situation after the first week of the merge -> window. -> -> That time is now and things are looking good. 
We only have one stragglers -> on the patch which had already an ACK for so I'm picking this up here now and -> the last patch is the one that uses an axe. Some careful eyeballing would -> be appreciated by others. If this doesn't get properly reviewed I can also -> just hold off on this in my tree for the next merge window. Either way is -> fine by me. -> -> I have boot tested the last patch and 0-day build completed successfully. -> - -**[v1: block atomic writes](http://lore.kernel.org/linux-fsdevel/20230503183821.1473305-1-john.g.garry@oracle.com/)** - -> This series introduces a new proposal to implementing atomic writes in the -> kernel. -> -> This series takes the approach of adding a new "atomic" flag to each of -> pwritev2() and iocb->ki_flags - RWF_ATOMIC and IOCB_ATOMIC, respectively. -> When set, these indicate that we want the write issued "atomically". I -> have seen a similar flag for pwritev2() touted on the lists previously. -> -> Only direct IO is supported and for block devices and xfs. -> -> The atomic writes feature requires dedicated HW support, like -> SCSI WRITE_ATOMIC_16 command. -> -> The goal here is to provide an interface that allow applications use -> application-specific block sizes larger than logical block size -> reported by the storage device or larger than filesystem block size as -> reported by stat(). -> -> With this new interface, application blocks will never be torn or -> fractured. For a power fail, for each individual application block, all or -> none of the data to be written. A racing atomic write and read will mean -> that the read sees all the old data or all the new data, but never a mix -> of old and new. 
-> - -**[v4: fs: allow to mount beneath top mount](http://lore.kernel.org/linux-fsdevel/20230202-fs-move-mount-replace-v4-0-98f3d80d7eaa@kernel.org/)** - -> More common use-cases will just be things like: -> -> mount -t btrfs /dev/sdA /mnt -> mount -t xfs /dev/sdB --beneath /mnt -> umount /mnt -> -> after which we'll have updated from a btrfs filesystem to a xfs -> filesystem without ever revealing the underlying mountpoint. - -**[v24: xfs: online repair for fs summary counters with exclusive fsfreeze](http://lore.kernel.org/linux-fsdevel/168308293319.734377.10454919162350827812.stgit@frogsfrogsfrogs/)** - -> A longstanding deficiency in the online fs summary counter scrubbing -> code is that it hasn't any means to quiesce the incore percpu counters -> while it's running. There is no way to coordinate with other threads -> are reserving or freeing free space simultaneously, which leads to false -> error reports. Right now, if the discrepancy is large, we just sort of -> shrug and bail out with an incomplete flag, but this is lame. -> -> For repair activity, we actually /do/ need to stabilize the counters to -> get an accurate reading and install it in the percpu counter. To -> improve the former and enable the latter, allow the fscounters online -> fsck code to perform an exclusive mini-freeze on the filesystem. The -> exclusivity prevents userspace from thawing while we're running, and the -> mini-freeze means that we don't wait for the log to quiesce, which will -> make both speedier. -> - -**[v1: sysctl: death to register_sysctl_paths()](http://lore.kernel.org/linux-fsdevel/20230503023329.752123-1-mcgrof@kernel.org/)** - -> -> As mentioned on my first pull request for sysctl-next, for v6.4-rc1 -> we're very close to being able to deprecating register_sysctl_paths(). -> I was going to assess the situation after the first week of the merge -> window. -> -> That time is now and things are looking good. 
We only have one stragglers -> on the patch which had already an ACK for so I'm picking this up here now and -> the last patch is the one that uses an axe. Some careful eyeballing would -> be appreciated by others. If this doesn't get properly reviewed I can also -> just hold off on this in my tree for the next merge window. Either way is -> fine by me. -> -> I have boot tested the last patch and 0-day build is ongoing. You can give -> it a day for a warm fuzzy build test result. -> - -**[v1: Rework locking when rendering mountinfo cgroup paths](http://lore.kernel.org/linux-fsdevel/20230502133847.14570-1-mkoutny@suse.com/)** - -> Idea for these modification came up when css_set_lock seemed unneeded in -> cgroup_show_path. -> -> It's a delicate change, so the deciding factor was when cgroup_show_path popped -> up also in some profiles of frequent mountinfo readers. -> -> The idea is to trade the exclusive css_set_lock for the shared -> namespace_sem when rendering cgroup paths. Details are described more in -> individual commits. -> - -**[v2: Prepare for supporting more filesystems with fanotify](http://lore.kernel.org/linux-fsdevel/20230502124817.3070545-1-amir73il@gmail.com/)** - -> -> Following v2 incorporates a few fixes and ACKs from review of v1 [1]. -> -> While fanotify relaxes the requirements for filesystems to support -> reporting fid to require only the ->encode_fh() operation, there are -> currently no new filesystems that meet the relaxed requirements. -> -> Patches to add ->encode_fh() to overlay with default configuration -> are available on my github branch [2]. I will re-post them after -> this patch set will be approved. -> -> Based on the discussion on the UAPI alternatives, I kept the -> AT_HANDLE_FID UAPI, which seems the simplest of them all.
-> -> There is an LTP test [3] that tests reporting fid from overlayfs, -> which also demonstrates the use of AT_HANDLE_FID for requesting a -> non-decodeable file handle by userspace and there is a man page -> draft [4] for the documentation of the AT_HANDLE_FID flags. -> - -**[v1: FUSE: add another flag to support shared mmap in FOPEN_DIRECT_IO mode](http://lore.kernel.org/linux-fsdevel/5683716d-9b1d-83d6-9dd1-a7ad3d05cbb1@linux.dev/)** - -> From discussion with Bernd, I get that FOPEN_DIRECT_IO is designed for -> those user cases where users want strong coherency like network -> filesystems, where one server serves multiple remote clients. And thus -> shared mmap is disabled since local page cache existence breaks this -> kind of coherency. -> -> But here our use case is one virtiofs daemon serve one guest vm, We use -> FOPEN_DIRECT_IO to reduce memory footprint not for coherency. So we -> expect shared mmap works in this case. Here I suggest/am implementing -> adding another flag to indicate this kind of cases----use -> FOPEN_DIRECT_IO not for coherency----so that shared mmap works. -> - -**[v1: Memory allocation profiling](http://lore.kernel.org/linux-fsdevel/20230501165450.15352-1-surenb@google.com/)** - -> Memory allocation profiling infrastructure provides a low overhead -> mechanism to make all kernel allocations in the system visible. It can be -> used to monitor memory usage, track memory hotspots, detect memory leaks, -> identify memory regressions. -> -> To keep the overhead to the minimum, we record only allocation sizes for -> every allocation in the codebase. With that information, if users are -> interested in more detailed context for a specific allocation, they can -> enable in-depth context tracking, which includes capturing the pid, tgid, -> task name, allocation size, timestamp and call stack for every allocation -> at the specified code location. 
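The allocation-profiling entry says that, by default, only the allocation size is recorded per call site. The simplified C sketch below shows per-call-site accounting along those lines. The tag struct, `DEFINE_ALLOC_TAG` and `tagged_malloc()` are hypothetical names. The proposed kernel infrastructure is understood to create its per-site records automatically, rather than requiring an explicitly declared tag as this sketch does:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One counter record per allocation site (hypothetical layout). */
struct alloc_tag {
    const char *file;
    int line;
    size_t calls; /* successful allocations from this site */
    size_t bytes; /* total bytes requested from this site */
};

/* Define a tag bound to the source location where the macro is used. */
#define DEFINE_ALLOC_TAG(name) \
    static struct alloc_tag name = { __FILE__, __LINE__, 0, 0 }

/* malloc() wrapper that records only the size, keeping overhead low. */
static inline void *tagged_malloc(struct alloc_tag *tag, size_t size)
{
    void *p = malloc(size);

    assert(tag != NULL);
    if (p != NULL) {
        tag->calls++;
        tag->bytes += size;
    }
    return p;
}
```

Richer context (pid, call stack, timestamps) would go into a separate, opt-in path, so the common case stays two counter updates per allocation.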
-> - -**[v2: permit write-sealed memfd read-only shared mappings](http://lore.kernel.org/linux-fsdevel/cover.1682890156.git.lstoakes@gmail.com/)** - -> The man page for fcntl() describing memfd file seals states the following -> about F_SEAL_WRITE:- -> -> Furthermore, trying to create new shared, writable memory-mappings via -> mmap(2) will also fail with EPERM. -> -> With emphasis on _writable_. It turns out in fact that currently the kernel -> simply disallows _all_ new shared memory mappings for a memfd with -> F_SEAL_WRITE applied, rendering this documentation inaccurate. -> - -#### Network devices - -**[v2: Make iscsid-kernel communications namespace-aware](http://lore.kernel.org/netdev/20230506232930.195451-1-cleech@redhat.com/)** - -> This set of patches modifies the kernel iSCSI initiator communications -> so that they are namespace-aware. The goal is to allow multiple iSCSI -> daemon (iscsid) to run at once as long as they are in separate -> namespaces, and so that iscsid can run in containers. -> -> Container runtime environments seem to want to containerize their own -> components, and there have been complaints about the need to run iscsid -> from the host network namespace. There are still privileged -> capabilities needed for iscsid, but these changes address the namespace -> issue. -> -> I've tested with iscsi_tcp and iser over rxe with an unmodified iscsid -> running in a podman container. -> -> Note that with iscsi_tcp, the connected socket will keep the network -> namespace alive after container exit. The namespace will exit once the -> connection terminates, and I'd recommend running with a iSCSI -> noop_out_timeout set to error out the connection after the routing has -> been removed. -> - -**[v1: net-next: net: openvswitch: Use struct_size()](http://lore.kernel.org/netdev/e7746fbbd62371d286081d5266e88bbe8d3fe9f0.1683388991.git.christophe.jaillet@wanadoo.fr/)** - -> Use struct_size() instead of hand writing it.
-> This is less verbose and more informative. -> - -**[v1: can: kvaser_usb_leaf: Implement CAN 2.0 raw DLC functionality.](http://lore.kernel.org/netdev/20230506105529.4023-1-carsten.schmidt-achim@t-online.de/)** - -**[v7: bpf-next: Introduce a new kfunc of bpf_task_under_cgroup](http://lore.kernel.org/netdev/20230506031545.35991-1-zhoufeng.zf@bytedance.com/)** - -> Trace sched related functions, such as enqueue_task_fair, it is necessary to -> specify a task instead of the current task which within a given cgroup. -> - -**[v1: virtio_net: set default mtu to 1500 when 'Device maximum MTU' bigger than 1500](http://lore.kernel.org/netdev/20230506021529.396812-1-chenh@yusur.tech/)** - -> When VIRTIO_NET_F_MTU(3) Device maximum MTU reporting is supported. -> If offered by the device, device advises driver about the value of its -> maximum MTU. If negotiated, the driver uses mtu as the maximum -> MTU value. But there the driver also uses it as default mtu, -> some devices may have a maximum MTU greater than 1500, this may -> cause some large packages to be discarded, so I changed the MTU to a more -> general 1500 when 'Device maximum MTU' bigger than 1500. -> - -**[v1: wifi: mwifiex: Use default @max_active for workqueues](http://lore.kernel.org/netdev/ZFWI3PpJXeXXnHzi@slm.duckdns.org/)** - -> These workqueues only host a single work item and thus doesn't need explicit -> concurrency limit. Let's use the default @max_active. This doesn't cost -> anything and clearly expresses that @max_active doesn't matter. -> - -**[v1: wifi: iwlwifi: Use default @max_active for trans_pcie->rba.alloc_wq](http://lore.kernel.org/netdev/ZFWIpN7HN431MVSI@slm.duckdns.org/)** - -> trans_pcie->rba.alloc_wq only hosts a single work item and thus doesn't need -> explicit concurrency limit. Let's use the default @max_active. This doesn't -> cost anything and clearly expresses that @max_active doesn't matter.
-> - -**[GIT PULL: Networking for v6.4-rc1](http://lore.kernel.org/netdev/20230505214917.1453870-1-kuba@kernel.org/)** - -> -> Current release - regressions: -> -> - sched: act_pedit: free pedit keys on bail from offset check -> -> Current release - new code bugs: -> -> - pds_core: -> - Kconfig fixes (DEBUGFS and AUXILIARY_BUS) -> - fix mutex double unlock in error path -> -> Previous releases - regressions: -> -> - sched: cls_api: remove block_cb from driver_list before freeing -> -> - nf_tables: fix ct untracked match breakage -> -> - eth: mtk_eth_soc: drop generic vlan rx offload -> -> - sched: flower: fix error handler on replace -> -> Previous releases - always broken: -> -> - tcp: fix skb_copy_ubufs() vs BIG TCP -> -> - ipv6: fix skb hash for some RST packets -> -> - af_packet: don't send zero-byte data in packet_sendmsg_spkt() -> -> - rxrpc: timeout handling fixes after moving client call connection -> to the I/O thread -> -> - ixgbe: fix panic during XDP_TX with > 64 CPUs -> -> - igc: RMW the SRRCTL register to prevent losing timestamp config -> -> - dsa: mt7530: fix corrupt frames using TRGMII on 40 MHz XTAL MT7621 -> -> - r8152: -> - fix flow control issue of RTL8156A -> - fix the poor throughput for 2.5G devices -> - move setting r8153b_rx_agg_chg_indicate() to fix coalescing -> - enable autosuspend -> -> - ncsi: clear Tx enable mode when handling a Config required AEN -> -> - octeontx2-pf: macsec: fixes for CN10KB ASIC rev -> -> Misc: -> -> - 9p: remove INET dependency -> - -**[v2: net-next: netfilter: nft_set_pipapo: Use struct_size()](http://lore.kernel.org/netdev/687973f7f0f77a456ee2ebabd75cec61cba2eb98.1683321933.git.christophe.jaillet@wanadoo.fr/)** - -> Use struct_size() instead of hand writing it. -> This is less verbose and more informative. 
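The openvswitch and nft_set_pipapo entries replace hand-written size arithmetic with struct_size(). That helper sizes a struct with a trailing flexible array in an overflow-safe way: the kernel's version returns SIZE_MAX on overflow, so the subsequent allocation fails instead of being undersized. The user-space C sketch below shows the same idea. Its names are hypothetical, and the GCC/Clang overflow builtins stand in for the kernel's internal helpers:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* header + count * elem, saturating to SIZE_MAX instead of wrapping. */
static inline size_t flex_struct_size(size_t header, size_t elem, size_t count)
{
    size_t array_bytes, total;

    if (__builtin_mul_overflow(elem, count, &array_bytes))
        return SIZE_MAX;
    if (__builtin_add_overflow(header, array_bytes, &total))
        return SIZE_MAX;
    return total;
}

/* Hypothetical helper shaped like struct_size(p, member, count); the
 * operands of sizeof are not evaluated, so p may be NULL here. */
#define STRUCT_SIZE_SKETCH(p, member, count) \
    flex_struct_size(sizeof(*(p)), sizeof(*(p)->member), (count))

struct example {
    size_t n;
    int items[]; /* flexible array member */
};

/* The array starts at or before the end of the header, so sizing from
 * sizeof(*p) never undercounts. */
static_assert(offsetof(struct example, items) <= sizeof(struct example),
              "flexible array begins within the header size");
```

Typical use would be `malloc(STRUCT_SIZE_SKETCH(p, items, n))`. A huge `n` yields SIZE_MAX, so the allocation fails cleanly rather than returning a too-small buffer.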
-> - -**[v1: RDMA/mana_ib: Use v2 version of cfg_rx_steer_req to enable RX coalescing](http://lore.kernel.org/netdev/1683312708-24872-1-git-send-email-longli@linuxonhyperv.com/)** - -> With RX coalescing, one CQE entry can be used to indicate multiple packets -> on the receive queue. This saves processing time and PCI bandwidth over -> the CQ. -> - -**[v1: siw on tunnel devices](http://lore.kernel.org/netdev/168330051600.5953.11366152375575299483.stgit@oracle-102.nfsv4bat.org/)** - -> Chalk this one up to yet another crazy idea. -> -> At NFS testing events, we'd like to test NFS/RDMA over the event's -> private network. We can do that with iWARP using siw from guests. -> -> If the guest itself is on the VPN, that means siw's slave device -> is a tun device. Such devices have no MAC address. That breaks the -> RDMA core's ability to find the correct egress device for siw when -> given a source IP address. -> -> We've worked around this in the past with various software hacks, -> but we'd rather see full support for this capability in stock -> kernels. -> -> A direct and perhaps naive way to do that is to give loopback and -> tun devices their own artificial MAC addresses for this purpose. -> - -**[v1: iproute2-next: mptcp: add support for implicit flag](http://lore.kernel.org/netdev/1eaea070b52f2db1f310506ac49f4b5d51b5704c.1683294873.git.aclaudi@redhat.com/)** - -> Kernel supports implicit flag since commit d045b9eb95a9 ("mptcp: -> introduce implicit endpoints"), included in v5.18. -> -> Let's add support for displaying it to iproute2. 
-> -> Before this change: -> $ ip mptcp endpoint show -> 10.0.2.2 id 1 rawflags 10 -> -> After this change: -> $ ip mptcp endpoint show -> 10.0.2.2 id 1 implicit -> - -**[v1: ipsec: af_key: Reject optional tunnel/BEET mode templates in outbound policies](http://lore.kernel.org/netdev/46fcb205-989e-4ea7-463d-e72b85db9e71@strongswan.org/)** - -> xfrm_state_find() uses `encap_family` of the current template with -> the passed local and remote addresses to find a matching state. -> If an optional tunnel or BEET mode template is skipped in a mixed-family -> scenario, there could be a mismatch causing an out-of-bounds read as -> the addresses were not replaced to match the family of the next template. -> -> While there are theoretical use cases for optional templates in outbound -> policies, the only practical one is to skip IPComp states in inbound -> policies if uncompressed packets are received that are handled by an -> implicitly created IPIP state instead. -> - -**[v1: ipsec: xfrm: Reject optional tunnel/BEET mode templates in outbound policies](http://lore.kernel.org/netdev/5d5bf4d9-5b63-ae0d-2f65-770e911ea7d6@strongswan.org/)** - -> xfrm_state_find() uses `encap_family` of the current template with -> the passed local and remote addresses to find a matching state. -> If an optional tunnel or BEET mode template is skipped in a mixed-family -> scenario, there could be a mismatch causing an out-of-bounds read as -> the addresses were not replaced to match the family of the next template. -> -> While there are theoretical use cases for optional templates in outbound -> policies, the only practical one is to skip IPComp states in inbound -> policies if uncompressed packets are received that are handled by an -> implicitly created IPIP state instead. 
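In the iproute2 mptcp entry, the old output `rawflags 10` is the hexadecimal value 0x10, which is the implicit-endpoint bit. The C sketch below decodes endpoint flags by name and falls back to `rawflags <hex>` for any bits it does not recognize. The bit values are assumed to mirror the kernel's MPTCP_PM_ADDR_FLAG_* definitions. This is not iproute2's actual code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Endpoint flag bits; values assumed to mirror MPTCP_PM_ADDR_FLAG_*
 * in include/uapi/linux/mptcp.h. */
#define EP_FLAG_SIGNAL   (1u << 0)
#define EP_FLAG_SUBFLOW  (1u << 1)
#define EP_FLAG_BACKUP   (1u << 2)
#define EP_FLAG_FULLMESH (1u << 3)
#define EP_FLAG_IMPLICIT (1u << 4)

/* "rawflags 10" in the old output is this bit printed in hex. */
static_assert(EP_FLAG_IMPLICIT == 0x10, "implicit is bit 4 (0x10)");

/* Write known flags by name into buf; leftover unknown bits are appended
 * as "rawflags <hex>", the fallback the old output used for this bit. */
static void format_endpoint_flags(unsigned flags, char *buf, size_t len)
{
    static const struct { unsigned bit; const char *name; } known[] = {
        { EP_FLAG_SIGNAL, "signal" },   { EP_FLAG_SUBFLOW, "subflow" },
        { EP_FLAG_BACKUP, "backup" },   { EP_FLAG_FULLMESH, "fullmesh" },
        { EP_FLAG_IMPLICIT, "implicit" },
    };
    size_t used = 0;
    int n;

    if (len == 0)
        return;
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof(known) / sizeof(known[0]); i++) {
        if (!(flags & known[i].bit))
            continue;
        n = snprintf(buf + used, len - used, "%s%s",
                     used ? " " : "", known[i].name);
        if (n < 0 || (size_t)n >= len - used)
            return; /* output truncated */
        used += (size_t)n;
        flags &= ~known[i].bit;
    }
    if (flags != 0)
        snprintf(buf + used, len - used, "%srawflags %x",
                 used ? " " : "", flags);
}
```

Before the implicit bit had a name, a decoder like this would leave 0x10 in the leftover bits and print `rawflags 10`. Once the bit is added to the table, the same value prints as `implicit`.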
-> - -**[v1: net: socket: Use fdget() and fdput()](http://lore.kernel.org/netdev/202305051706416319733@zte.com.cn/)** - -> By using the fdget function, the socket object, can be quickly obtained -> from the process's file descriptor table without the need to obtain the -> file descriptor first before passing it as a parameter to the fget -> function. -> - -**[v2: Add motorcomm phy pad-driver-strength-cfg support](http://lore.kernel.org/netdev/20230505090558.2355-1-samin.guo@starfivetech.com/)** - -> The motorcomm phy (YT8531) supports the ability to adjust the drive -> strength of the rx_clk/rx_data, and the default strength may not be -> suitable for all boards. So add configurable options to better match -> the boards.(e.g. StarFive VisionFive 2) -> -> The first patch adds a description of dt-binding, and the second patch adds -> YT8531's parsing and settings for pad-driver-strength-cfg. -> - -**[v6: net-next: TXGBE PHYLINK support](http://lore.kernel.org/netdev/20230505074228.84679-1-jiawenwu@trustnetic.com/)** - -> Implement I2C, SFP, GPIO and PHYLINK to setup TXGBE link. -> -> Because our I2C and PCS are based on Synopsys Designware IP-core, extend -> the i2c-designware and pcs-xpcs driver to realize our functions. -> - -**[v1: vhost_net: Use fdget() and fdput()](http://lore.kernel.org/netdev/202305051424047152799@zte.com.cn/)** - -> convert the fget()/fput() uses to fdget()/fdput(). -> - -**[v6: can: usb: f81604: add Fintek F81604 support](http://lore.kernel.org/netdev/20230505022317.22417-1-peter_hong@fintek.com.tw/)** - -> This patch adds support for Fintek USB to 2CAN controller. -> - -#### Security hardening - -**[v1: Hypervisor-Enforced Kernel Integrity](http://lore.kernel.org/linux-hardening/20230505152046.6575-1-mic@digikod.net/)** - -> This patch series is a proof-of-concept that implements new KVM features -> (extended page tracking, MBEC support, CR pinning) and defines a new API to -> protect guest VMs. No VMM (e.g., Qemu) modification is required.
-> -> The main idea being that kernel self-protection mechanisms should be delegated -> to a more privileged part of the system, hence the hypervisor. It is still the -> role of the guest kernel to request such restrictions according to its -> configuration. The high-level security guarantees provided by the hypervisor -> are semantically the same as a subset of those the kernel already enforces on -> itself (CR pinning hardening and memory page table protections), but with much -> higher guarantees. -> -> We'd like the mainline kernel to support such hardening features leveraging -> virtualization. We're looking for reviews and comments that can help mainline -> these two parts: the KVM implementation and the guest kernel API layer designed -> to support different hypervisors. The struct heki_hypervisor enables to plug in -> - -**[v1: Compiler Attributes: Add __counted_by macro](http://lore.kernel.org/linux-hardening/20230504181636.never.222-kees@kernel.org/)** - -> In an effort to annotate all flexible array members with their run-time -> size information, the "element_count" attribute is being introduced by -> Clang[1] and GCC[2] in future releases. This annotation will provide -> the CONFIG_UBSAN_BOUNDS and CONFIG_FORTIFY_SOURCE features the ability -> to perform run-time bounds checking on otherwise unknown-size flexible -> arrays. -> -> Even though the attribute is under development, we can start the -> annotation process in the kernel. This requires defining a macro for -> it, even if we have to change the name of the actual attribute later. -> Since it is likely that this attribute may change its name to "counted_by" -> in the future (to better align with a future total bytes "sized_by" -> attribute), name the wrapper macro "__counted_by", which also reads more -> clearly (and concisely) in structure definitions. 
-> -> [1] https://reviews.llvm.org/D148381 -> [2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108896 -> - -#### Asynchronous IO - -**[v1: io_uring: set plug tags for same file](http://lore.kernel.org/io-uring/20230504162427.1099469-1-kbusch@meta.com/)** - -> io_uring tries to optimize allocating tags by hinting to the plug how -> many it expects to need for a batch instead of allocating each tag -> individually. But io_uring submission queues may have a mix of many -> devices for io, so the number of io's counted may be overestimated. This -> can lead to allocating too many tags, which adds overhead to finding -> that many contiguous tags, freeing up the ones we didn't use, and may -> starve out other users that can actually use them. -> -> When starting a new batch of uring commands, count only commands that -> match the file descriptor of the first one seen for this optimization. -> - -**[v4: io_uring: Pass the whole sqe to commands](http://lore.kernel.org/io-uring/20230504121856.904491-1-leitao@debian.org/)** - -> These three patches prepare for the sock support in the io_uring cmd, as -> described in the following RFC: -> -> Since the support linked above depends on other refactors, such as the sock -> ioctl() sock refactor, I would like to start integrating patches that have -> consensus and can bring value right now. This will also reduce the -> patchset size later. -> -> Regarding these three patches, they are simple changes that make the -> io_uring cmd subsystem more flexible (by passing the whole SQE to the -> command), and clean up an unnecessary compile check. -> -> These patches were tested by creating a file system and mounting an NVME disk -> using ubdsrv/ublkb0. -> - -**[v12: io_uring: add napi busy polling support](http://lore.kernel.org/io-uring/20230502165332.2075091-1-shr@devkernel.io/)** - -> This adds the napi busy polling support in io_uring.c. It adds a new -> napi_list to the io_ring_ctx structure.
This list contains the list of -> napi_id's that are currently enabled for busy polling. This list is -> used to determine which napi ids have busy polling enabled. For faster -> access it also adds a hash table. -> -> When a new napi id is added, the hash table is used to locate if -> the napi id has already been added. When processing the busy poll -> loop the list is used to process the individual elements. -> -> io-uring allows specifying two parameters: -> - busy poll timeout and -> - prefer busy poll to call of io_napi_busy_loop() -> This sets the above parameters for the ring. The settings are passed -> with a new structure io_uring_napi. -> -> There is also a corresponding liburing patch series, which enables this -> feature. The name of the series is "liburing: add add api for napi busy -> poll timeout". It also contains two programs to test this. -> -> Testing has shown that the round-trip times are reduced to 38us from -> 55us by enabling napi busy polling with a busy poll timeout of 100us. -> More detailed results are part of the commit message of the first -> patch. -> - -**[v1: io_uring: undeprecate epoll_ctl support](http://lore.kernel.org/io-uring/20230501185240.352642-1-info@bnoordhuis.nl/)** - -> Libuv recently started using it so there is at least one consumer now. -> - -**[v1: Rethinking splice](http://lore.kernel.org/io-uring/cover.1682701588.git.asml.silence@gmail.com/)** - -> IORING_OP_SPLICE has problems, many of them are fundamental and rooted -> in the uapi design, see the patch 8 description. This patchset introduces -> a different approach, which came from discussions about splices -> and fused commands and absorbed ideas from both of them. We remove -> reliance on pipes and registering "spliced" buffers with data as an -> io_uring's registered buffer. Then the user can use it as a usual -> registered buffer, e.g. pass it to IORING_OP_WRITE_FIXED.
-> -> Once a buffer is released, it'll be returned back to the file it -> originated from via a callback. This is done at the level of the -> entire buffer rather than on a per-page basis as with splice, which, -> as noted by Ming, will allow more optimisations. -> -> The communication with the target file is done by a new fops callback, -> however the end means of getting a buffer might change. It also peels -> layers of code compared to splice requests, which helps it to be more -> flexible and support more cases. For instance, Ming has a case where -> it's beneficial for the target file to provide a buffer to be filled -> with read/recv/etc. requests and then returned back to the file. -> - -**[v1: io_uring attached nvme queue](http://lore.kernel.org/io-uring/20230429093925.133327-1-joshi.k@samsung.com/)** - -> This series shows one way to do what the title says. -> -> This puts up a more direct/lean path that enables -> - submission from io_uring SQE to NVMe SQE -> - completion from NVMe CQE to io_uring CQE -> Essentially cutting the hoops (involving request/bio) for nvme io path. -> -> Also, io_uring ring is not to be shared among application threads. -> Application is responsible for building the sharing (if it feels the -> need). This means ring-associated exclusive queue can do away with some -> synchronization costs that occur for shared queue. -> -> Primary objective is to amp up the efficiency of kernel io path further -> (towards PCIe gen N, N+1 hardware). -> And we are seeing some asks too [1]. -> - -#### Rust For Linux - -**[v2: rust: str: add conversion from `CStr` to `CString`](http://lore.kernel.org/rust-for-linux/20230503141016.683634-1-aliceryhl@google.com/)** - -> These methods can be used to copy the data in a temporary c string into -> a separate allocation, so that it can be accessed later even if the -> original is deallocated. -> -> The API in this change mirrors the standard library API for the `&str` -> and `String` types.
The `ToOwned` trait is not implemented because it -> assumes that allocations are infallible. -> - -**[v1: Rust null block driver](http://lore.kernel.org/rust-for-linux/20230503090708.2524310-1-nmi@metaspace.dk/)** - -> -> A null block driver is a good opportunity to evaluate Rust bindings for the -> block layer. It is a small and simple driver and thus should be simple to reason -> about. Further, the null block driver is not usually deployed in production -> environments. Thus, it should be fairly straightforward to review, and any -> potential issues are not going to bring down any production workloads. -> - -**[v1: rust: error: add ERESTARTSYS error code](http://lore.kernel.org/rust-for-linux/20230503083941.499090-1-aliceryhl@google.com/)** - -> This error code was probably excluded here originally because it never -> actually reaches user programs when a syscall returns it. However, from -> the perspective of a kernel driver, it is still a perfectly valid error -> type that the driver might need to return. E.g., this can be necessary -> when a signal occurs during sleep. -> - -**[v1: rust: error: allow specifying error type on `Result`](http://lore.kernel.org/rust-for-linux/20230502124015.356001-1-aliceryhl@google.com/)** - -> Currently, if the `kernel::error::Result` type is in scope (which it -> often is, since it's in the kernel's prelude), you cannot write -> `Result` when you want to use a different error -> type than `kernel::error::Error`. -> -> To solve this we change the error type from being hard-coded to just -> being a default generic parameter. This still lets you write `Result` -> when you just want to use the `Error` error type, but also lets you -> write `Result` when necessary.
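The `Result` change above relies on Rust's default type parameters on a type alias. The following is a minimal standalone sketch, not the kernel code. It assumes a simplified stand-in `Error` type (hypothetical; the real `kernel::error::Error` lives in the kernel crate) and a hypothetical `ParseError`. It shows how one alias can cover both the common case and a custom error type:

```rust
/// Hypothetical stand-in for `kernel::error::Error`; modelled here as a
/// plain negative errno value wrapped in a struct.
#[derive(Debug, PartialEq)]
pub struct Error(i32);

/// Both parameters have defaults: `Result` means `Result<(), Error>`,
/// `Result<T>` means `Result<T, Error>`, and `Result<T, E>` allows any `E`.
pub type Result<T = (), E = Error> = core::result::Result<T, E>;

/// Hypothetical driver-specific error type.
#[derive(Debug, PartialEq)]
pub struct ParseError;

/// Uses both defaults: the return type is `core::result::Result<(), Error>`.
fn unit_ok() -> Result {
    Ok(())
}

/// Uses the default error type: `core::result::Result<u32, Error>`.
fn uses_default() -> Result<u32> {
    Err(Error(-22)) // -EINVAL
}

/// Overrides the default error type without leaving the alias.
fn uses_custom(input: &str) -> Result<u32, ParseError> {
    input.parse::<u32>().map_err(|_| ParseError)
}

fn main() {
    assert_eq!(unit_ok(), Ok(()));
    assert_eq!(uses_default(), Err(Error(-22)));
    assert_eq!(uses_custom("42"), Ok(42));
    assert_eq!(uses_custom("x"), Err(ParseError));
}
```

Because `E` defaults to `Error`, existing code that writes `Result` or `Result<T>` keeps compiling unchanged. That is presumably why a default parameter was preferred over introducing a second alias.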
-> - -#### BPF - -**[v2: bpf-next: bpftool: Support bpffs mountpoint as pin path for prog loadall](http://lore.kernel.org/bpf/1683342439-3677-1-git-send-email-yangpc@wangsu.com/)** - -> Currently, when using prog loadall, if the pin path is a bpffs -> mountpoint, bpffs will be repeatedly mounted to the parent directory -> of the bpffs mountpoint path. -> -> For example, -> $ bpftool prog loadall test.o /sys/fs/bpf -> currently bpffs will be repeatedly mounted to /sys/fs. -> - -**[v3: bpf-next: Dynptr Verifier Adjustments](http://lore.kernel.org/bpf/20230506013134.2492210-1-drosen@google.com/)** - -> These patches relax a few verifier requirements around dynptrs. -> Patches 1-3 are unchanged from v2, apart from rebasing -> Patch 4 is the same as in v1, see -> https://lore.kernel.org/bpf/CA+PiJmST4WUH061KaxJ4kRL=fqy3X6+Wgb2E2rrLT5OYjUzxfQ@mail.gmail.com/ -> Patch 5 adds a test for the change in Patch 4 -> - -**[v1: bpf: netdev: init the offload table earlier](http://lore.kernel.org/bpf/20230505215836.491485-1-kuba@kernel.org/)** - -> Some netdevices may get unregistered before late_initcall(), -> we have to move the hashtable init earlier. -> - -**[v1: bpf-next: RFC: bpf: query effective progs without cgroup_mutex](http://lore.kernel.org/bpf/20230505184550.1386802-1-sdf@google.com/)** - -> We're observing some stalls on the heavily loaded machines -> in the cgroup_bpf_prog_query path. This is likely due to -> being blocked on cgroup_mutex. -> -> IIUC, the cgroup_mutex is there mostly to protect the non-effective -> fields (cgrp->bpf.progs) which might be changed by the update path. -> For the BPF_F_QUERY_EFFECTIVE case, all we need is to rcu_dereference -> a bunch of pointers (and keep them around for consistency), so -> let's do it. -> -> Sending out as an RFC because it looks a bit ugly. It would also -> be nice to handle non-effective case locklessly as well, but it -> might require a larger rework. 
-> - -**[v3: bpf-next: Add precision propagation for subprogs and callbacks](http://lore.kernel.org/bpf/20230505043317.3629845-1-andrii@kernel.org/)** - -> -> This patch set teaches BPF verifier to support SCALAR precision -> backpropagation across multiple frames (for subprogram calls and callback -> simulations) and addresses most practical situations (SCALAR stack -> loads/stores using registers other than r10 being the last remaining -> limitation, though thankfully rarely used in practice). -> - -**[v4: bpf-next: bpf: Don't EFAULT for {g,s}setsockopt with wrong optlen](http://lore.kernel.org/bpf/20230504184349.3632259-1-sdf@google.com/)** - -> optval larger than PAGE_SIZE leads to EFAULT if the BPF program -> isn't careful enough. This is often overlooked and might break -> completely unrelated socket options. Instead of EFAULT, -> let's ignore BPF program buffer changes. See the first patch for -> more info. -> -> In addition, clearly document this corner case and reset optlen -> in our selftests (in case somebody copy-pastes from them). -> - -**[v3: net: bonding: add xdp_features support](http://lore.kernel.org/bpf/5969591cfc2336e45de08e1d272bdcee30942fb7.1683191281.git.lorenzo@kernel.org/)** - -> Introduce xdp_features support for bonding driver according to the slave -> devices attached to the master one. xdp_features is required whenever we -> want to xdp_redirect traffic into a bond device and then into selected -> slaves attached to it. -> - -**[v1: bpf-next: bpf_refcount followups (part 1)](http://lore.kernel.org/bpf/20230504053338.1778690-1-davemarchevsky@fb.com/)** - -> This series is the first of two (or more) followups to address issues in the -> bpf_refcount shared ownership implementation discovered by Kumar. -> Specifically, this series addresses the "bpf_refcount_acquire on non-owning ref -> in another tree" scenario described in [0], and does _not_ address issues -> raised in [1]. Further followups will address the other issues. 
-> - -**[v7: bpf-next: bpf: Add socket destroy capability](http://lore.kernel.org/bpf/20230503225351.3700208-1-aditi.ghag@isovalent.com/)** - -> This patch adds the capability to destroy sockets in BPF. We plan to use -> the capability in Cilium to force client sockets to reconnect when their -> remote load-balancing backends are deleted. The other use case is -> on-the-fly policy enforcement where existing socket connections prevented -> by policies need to be terminated. -> - -**[[RFC/PATCH] libbpf: Store zero fd to fd_array for loader kfunc relocation](http://lore.kernel.org/bpf/20230503172441.2138444-1-jolsa@kernel.org/)** - -> When moving some of the test kfuncs to bpf_testmod I hit an issue -> when some of the object's kfuncs are in module and some in vmlinux. -> -> The problem is that both vmlinux and module kfuncs get btf_fd_idx -> index into fd_array, but we store to it the BTF fd value only for -> module's kfunc. -> -> Then after the program is loaded we check if fd_array[btf_fd_idx] != 0 -> and close the fd. -> -> When the object has kfuncs from both vmlinux and module, the fd from -> fd_array[btf_fd_idx] from previous load will be there for vmlinux kfunc -> and we close unrelated fd (of the program we just loaded in my case). -> -> Not sure if there's easier way to clear the fd_array between the -> loads, but the change below seems to fix the issue for me. -> - -**[v1: bpf-next: Centralize BPF permission checks](http://lore.kernel.org/bpf/20230502230619.2592406-1-andrii@kernel.org/)** - -> This patch set refactors BPF subsystem permission checks for BPF maps and -> programs, localizes them in one place, and ensures all parts of BPF ecosystem -> (BPF verifier and JITs, and their supporting infra) use recorded effective -> capabilities, stored in respective bpf_map or bpf_prog structs, for further -> decision making. 
-> -> This allows for more explicit and centralized handling of BPF-related -> capabilities and makes for simpler further BPF permission model evolution, to -> be proposed and discussed in follow up patch sets. -> - -**[v1: bpf-next: bpf: Emit struct bpf_tcp_sock type in vmlinux BTF](http://lore.kernel.org/bpf/20230502180543.1832140-1-yhs@fb.com/)** - -> In one of our internal testing, we found a case where -> - uapi struct bpf_tcp_sock is in vmlinux.h where vmlinux.h is not -> generated from the testing kernel -> - struct bpf_tcp_sock is not in vmlinux BTF -> -> The above combination caused bpf load failure as the following -> memory access -> struct bpf_tcp_sock *tcp_sock = ...; -> ... tcp_sock->snd_cwnd ... -> needs CORE relocation but the relocation cannot be resolved since -> the kernel BTF does not have corresponding type. -> -> Similar to other previous cases (nf_conn___init, tcp6_sock, mctcp_sock, etc.), -> add the type to vmlinux BTF with BTF_EMIT_TYPE macro. -> - -**[v9: tracing: Add fprobe/tracepoint events](http://lore.kernel.org/bpf/168299383880.3242086.7182498102007986127.stgit@mhiramat.roam.corp.google.com/)** - -> With this fprobe events, we can continue to trace function entry/exit -> even if the CONFIG_KPROBES_ON_FTRACE is not available. Since -> CONFIG_KPROBES_ON_FTRACE requires the CONFIG_DYNAMIC_FTRACE_WITH_REGS, -> it is not available if the architecture only supports -> CONFIG_DYNAMIC_FTRACE_WITH_ARGS (e.g. arm64). And that means kprobe -> events can not probe function entry/exit effectively on such architecture. -> But this problem can be solved if the dynamic events supports fprobe events -> because fprobe events doesn't use kprobe but ftrace via fprobe. 
-> - -**[v3: bpf-next: Handle immediate reuse in bpf memory allocator](http://lore.kernel.org/bpf/20230429101215.111262-1-houtao@huaweicloud.com/)** - -> As discussed in v1, currently the freed objects in bpf memory allocator -> may be reused immediately by the new allocation, which introduces a -> use-after-bpf-ma-free problem for non-preallocated hash maps and makes -> the lookup procedure return incorrect results. The immediate reuse also makes -> introducing new use cases more difficult (e.g. qp-trie). -> -> The patch series tries to solve these problems by introducing -> BPF_MA_{REUSE|FREE}_AFTER_RCU_GP in bpf memory allocator. For -> REUSE_AFTER_GP, the freed objects are reused only after one RCU grace -> period and may be freed by bpf memory allocator after another -> RCU-tasks-trace grace period. So bpf programs which care about the reuse -> problem can use bpf_rcu_read_{lock,unlock}() to access -> these objects safely, and for those which don't care, any -> use-after-bpf-ma-free is still safe because these objects have not been freed -> by bpf memory allocator. FREE_AFTER_GP behaves differently. Instead of -> making the freed elements reusable after one RCU GP, it directly -> frees these elements back to slab after one RCU GP, so sleepable bpf -> programs must use bpf_rcu_read_{lock,unlock}() to access elements -> allocated from FREE_AFTER_GP bpf memory allocator. -> -> Personally I prefer FREE_AFTER_RCU_GP because its implementation is much -> simpler compared with REUSE_AFTER_RCU and its memory usage is also better -> than REUSE_AFTER_GP. But its shortcoming is also obvious, so I want to get -> some feedback before putting in more effort. As usual, comments and -> suggestions are always welcome. - -### Related Technology News - -#### Qemu - -**[[PATCH v2 0/6] Add RISC-V KVM AIA Support](http://lore.kernel.org/qemu-devel/20230505113946.23433-1-yongxuan.wang@sifive.com/)** - -> This series adds support for KVM AIA in RISC-V architecture.
-> -> In order to test these patches, we require Linux with KVM AIA support which can -> be found in the qemu_kvm_aia branch at https://github.com/yong-xuan/linux.git -> This kernel branch is based on the riscv_aia_v1 branch available at -> https://github.com/avpatel/linux.git, and it also includes two additional -> patches that fix a KVM AIA bug and reply to the query of KVM_CAP_IRQCHIP. -> - -**[v1: riscv-to-apply queue](http://lore.kernel.org/qemu-devel/20230505010241.21812-1-alistair.francis@wdc.com/)** - -> -> First RISC-V PR for 8.1 -> -> * CPURISCVState related cleanup and simplification -> * Refactor Zicond and reuse in XVentanaCondOps -> * Fix invalid riscv,event-to-mhpmcounters entry -> * Support subsets of code size reduction extension -> * Fix itrigger when icount is used -> * Simplification for RVH related check and code style fix -> * Add signature dump function for spike to run ACT tests -> * Rework MISA writing -> * Fix mstatus.MPP related support -> * Use check for relationship between Zdinx/Zhinx{min} and Zfinx -> * Fix the H extension TVM trap -> * A large collection of mstatus sum changes and cleanups -> * Zero init APLIC internal state -> * Implement query-cpu-definitions -> * Restore the predicate() NULL check behavior -> * Fix Guest Physical Address Translation -> * Make sure an exception is raised if a pte is malformed -> * Add Ventana's Veyron V1 CPU -> - -**[v3: linux-user: Add /proc/cpuinfo handler for RISC-V](http://lore.kernel.org/qemu-devel/mvmednx301n.fsf@suse.de/)** - -**[v1: tcg/riscv: Support for Zba, Zbb, Zicond extensions](http://lore.kernel.org/qemu-devel/20230503085657.1814850-1-richard.henderson@linaro.org/)** - -> Based-on: 20230503070656.1746170-1-richard.henderson@linaro.org -> ("v4: tcg: Improve atomicity support") -> -> I've been vaguely following the __hw_probe syscall progress -> in the upstream kernel. 
The initial version only handled -> bog-standard F+D and C extensions, which everything expects -> to be present anyway, which was disappointing. But at least -> the basis is there for proper extensions. -> -> In the meantime, probe via sigill. Tested with qemu-on-qemu. -> I understand the Ventana core has all of these, if you'd be -> so kind as to test. -> - -#### U-Boot - -**[v3: SPL NVMe support](http://lore.kernel.org/u-boot/20230504095327.2791676-1-mchitale@ventanamicro.com/)** - -> This patchset adds support to load images of the SPL's next booting stage from an NVMe device. -> - -**[v2: SPL NVMe support](http://lore.kernel.org/u-boot/20230502161902.1339861-1-mchitale@ventanamicro.com/)** - -> This patchset adds support to load images of the SPL's next booting stage from an NVMe device. -> - -## 20230501: Issue 44 - -### Kernel News - -#### RISC-V Architecture Support - -**[v1: RISC-V: Export Zba, Zbb to usermode via hwprobe](http://lore.kernel.org/linux-riscv/20230428190609.3239486-1-evan@rivosinc.com/)** - -> This change detects the presence of Zba and Zbb extensions and exports -> them per-hart to userspace via the hwprobe mechanism. Glibc can then use -> these in setting up hwcaps-based library search paths. -> - -**[GIT PULL: RISC-V Patches for the 6.4 Merge Window, Part 1](http://lore.kernel.org/linux-riscv/mhng-57198db1-de34-4dca-be9f-989b1137503e@palmer-ri-x1c9/)** - -> RISC-V Patches for the 6.4 Merge Window, Part 1 -> -> * Support for runtime detection of the Svnapot extension. -> * Support for Zicboz when clearing pages. -> * We've moved to GENERIC_ENTRY. -> * Support for !MMU on rv32 systems. -> * The linear region is now mapped via huge pages. -> * Support for building relocatable kernels. -> * Support for the hwprobe interface. -> * Various fixes and cleanups throughout the tree.
-> - -**[v2: riscv: allow case-insensitive ISA string parsing](http://lore.kernel.org/linux-riscv/tencent_8492B68063042E768C758871A3171FBD2006@qq.com/)** - -> The original motivation for my patch v1[5] is that some SoC generators, -> such as rocket-chip, provide generated DTs with illegal ISA strings in the -> dt-binding, which can even cause a kernel panic in some cases, as I -> mentioned in v1[5]. Now, the rocket-chip has been fixed in PR #3333[6]. -> However, when using some specific versions of rocket-chip with -> illegal ISA strings in DT, this patchset will also parse -> uppercase letters in DT correctly, giving better compatibility. -> - -**[v1: Limit the number of counter returned from SBI.](http://lore.kernel.org/linux-riscv/20230428110256.711352-1-v.v.mitrofanov@yadro.com/)** - -> Perf relies on the reliability of SBI: if something goes wrong, the code still trusts it. -> During some debugging I happened to pass more than -> RISCV_MAX_COUNTERS to perf from SBI. At first glance there was -> bloating of the kalloc'ed variable pmu_ctr_list and a counter mask recycle write. -> There may have been other effects as well, but in any case it is better to add -> an extra check. -> - -**[v1: -next: clk: sifive: Use devm_platform_ioremap_resource()](http://lore.kernel.org/linux-riscv/20230428070005.41192-1-yang.lee@linux.alibaba.com/)** - -> Convert platform_get_resource() and devm_ioremap_resource() to a single -> call to devm_platform_ioremap_resource(), as this is exactly what this -> function does. -> - -**[v2: RISC-V: Align SBI probe implementation with spec](http://lore.kernel.org/linux-riscv/20230427163626.101042-1-ajones@ventanamicro.com/)** - -> sbi_probe_extension() is specified with "Returns 0 if the given SBI -> extension ID (EID) is not available, or 1 if it is available unless -> defined as any other non-zero value by the implementation." -> Additionally, sbiret.value is a long.
Fix the implementation to -> ensure any nonzero long value is considered a success, rather -> than only positive int values. -> - -**[v1: dt-bindings: riscv: explicitly mention assumption of Zicsr & Zifencei support](http://lore.kernel.org/linux-riscv/20230427-fence-blurred-c92fb69d4137@wendy/)** - -> The dt-binding was defined before the extraction of csr access and -> fence.i into their own extensions, and thus the presence of the I -> base extension implies Zicsr and Zifencei. -> There's no harm in adding them obviously, but for backwards -> compatibility with DTs that existed prior to that extraction, software -> is unable to differentiate between "i" and "i_zicsr_zifencei" without -> any further information. -> - -**[v1: RISC-V: KVM: Ensure SBI extension is enabled](http://lore.kernel.org/linux-riscv/20230426171328.69663-1-ajones@ventanamicro.com/)** - -> Ensure guests can't attempt to invoke SBI extension functions when the -> SBI extension's probe function has stated that the extension is not -> available. -> - -**[v1: Handle multi-letter extensions starting with caps in riscv,isa](http://lore.kernel.org/linux-riscv/20230426-satin-avenging-086d4e79a8dd@wendy/)** - -> Following on from [1] in which Yangyu reported kernel panics for a -> riscv,isa string containing "rv64ima_Zifencei", as the parser got -> confused by the capital letter, here's a small change to the parser to -> handle invalid extensions starting with capital & the removal of some -> inaccurate wording from the dt-binding. -> - -**[v1: dmaengine: xilinx: enable on RISC-V platform](http://lore.kernel.org/linux-riscv/20230426074248.19336-1-zong.li@sifive.com/)** - -> Enable the xilinx dmaengine driver on RISC-V platform. We have verified -> the CDMA on RISC-V platform, enable this configuration to allow build on -> RISC-V. 
-> - -**[v1: Allow case-insensitive RISC-V ISA string](http://lore.kernel.org/linux-riscv/tencent_1647475C9618C390BEC601BE2CC1206D0C07@qq.com/)** - -> According to the RISC-V ISA specification, the ISA naming strings are -> case-insensitive. The kernel docs currently require the riscv,isa string to -> be all lowercase to simplify parsing. However, this -> limitation is not consistent with the RISC-V ISA spec. -> - -**[v2: riscv: mm: Ensure prot of VM_WRITE and VM_EXEC must be readable](http://lore.kernel.org/linux-riscv/20230425102828.1616812-1-woodrow.shen@sifive.com/)** - -> Commit 8aeb7b17f04e ("RISC-V: Make mmap() with PROT_WRITE imply PROT_READ") -> allows riscv to use mmap with PROT_WRITE only, and meanwhile mmap with w+x -> is also permitted. However, when userspace tries to access such a page mapped with -> PROT_WRITE|PROT_EXEC, it causes an infinite loop on the load page fault and -> triggers a soft lockup. According to the riscv privileged spec, -> "Writable pages must also be marked readable". The fix is to drop -> `PAGE_COPY_READ_EXEC` and use `PAGE_COPY_EXEC` instead. -> This aligns protection_map with other arches (e.g. arm64). -> - -**[v1: Expose the isa-string via the AT_BASE_PLATFORM aux vector](http://lore.kernel.org/linux-riscv/20230424194911.264850-1-heiko.stuebner@vrull.eu/)** - -> The hwprobing infrastructure was merged recently [0] and contains a -> mechanism to probe not only extensions but also microarchitectural features -> on a per-core level of detail. -> - -**[v1: RESEND: dt-bindings: riscv: add sv57 mmu-type](http://lore.kernel.org/linux-riscv/20230424-rival-habitual-478567c516f0@spud/)** - -> Dumping the dtb from new versions of QEMU warns that sv57 is an -> undocumented mmu-type. The kernel has supported sv57 for about a year, -> so bring it into the fold.
-> - -**[v5: Add STG/ISP/VOUT clock and reset drivers for StarFive JH7110](http://lore.kernel.org/linux-riscv/20230424135409.6648-1-xingyu.wu@starfivetech.com/)** - -> This patch series is based on the basic JH7110 SYSCRG/AONCRG -> drivers and adds new partial clock drivers and reset support -> for System-Top-Group (STG), Image-Signal-Process (ISP) -> and Video-Output (VOUT) for the StarFive JH7110 RISC-V SoC. These -> clocks and resets could be used by DMA, VIN and Display modules. -> - -**[v10: riscv: Allow to downgrade paging mode from the command line](http://lore.kernel.org/linux-riscv/20230424092313.178699-1-alexghiti@rivosinc.com/)** - -> This new version gets rid of the limitation that prevented KASAN kernels -> from using the newly introduced parameters. -> -> While looking into KASLR, I came across commit aacd149b6238 ("arm64: head: -> avoid relocating the kernel twice for KASLR"): it allows using the fdt -> functions very early in the boot process with KASAN enabled by simply -> compiling a new version of those functions without instrumentation. -> - -**[v1: riscv: replace deprecated scall with ecall](http://lore.kernel.org/linux-riscv/20230423223210.126948-1-maskray@google.com/)** - -> scall is a deprecated alias for ecall. ecall is used in several places, -> so there is no assembler compatibility concern. -> - -#### Process Scheduling - -**[v2: sched/topology: add for_each_numa_cpu() macro](http://lore.kernel.org/lkml/20230430171809.124686-1-yury.norov@gmail.com/)** - -> for_each_cpu() is widely used in the kernel, and it's beneficial to create -> a NUMA-aware version of the macro. -> -> Recently added for_each_numa_hop_mask() works, but switching the existing -> codebase to it is not an easy process. -> - -**[v1: sched: core: Simplify sched_can_stop_tick()](http://lore.kernel.org/lkml/20230429002831.2875-1-zeming@nfschina.com/)** - -> Remove the useless intermediate variable "fifo_nr_running".
-> - -**[v1: sched: add ttwu_migration counter](http://lore.kernel.org/lkml/20230425012234.15388-1-shijie@os.amperecomputing.com/)** - -> This patch adds the ttwu_migration counter to record the migrations. -> It is put at the end so as not to break existing tools. -> - -#### Memory Management - -**[v2: permit write-sealed memfd read-only shared mappings](http://lore.kernel.org/linux-mm/cover.1682890156.git.lstoakes@gmail.com/)** - -> The man page for fcntl() describing memfd file seals states the following -> about F_SEAL_WRITE:- -> -> Furthermore, trying to create new shared, writable memory-mappings via -> mmap(2) will also fail with EPERM. -> - -**[v1: mm/mmap/vma_merge: always check invariants](http://lore.kernel.org/linux-mm/df548a6ae3fa135eec3b446eb3dae8eb4227da97.1682885809.git.lstoakes@gmail.com/)** - -> We may still have inconsistent input parameters even if we choose not to -> merge and the vma_merge() invariant checks are useful for checking this -> with no production runtime cost (these are only relevant when -> CONFIG_DEBUG_VM is specified). -> - -**[v1: debugobjects,locking: Annotate __debug_object_init() wait type violation](http://lore.kernel.org/linux-mm/20230429100614.GA1489784@hirez.programming.kicks-ass.net/)** - -> On Tue, Apr 25, 2023 at 11:51:05PM +0800, Qi Zheng wrote: -> > I just tested the following code and -> > it can resolve the warning I encountered. :) -> - -**[v3: Reduce lock contention related with large folio](http://lore.kernel.org/linux-mm/20230429082759.1600796-1-fengwei.yin@intel.com/)** - -> yan tried to enable the large folio for anonymous mapping [1]. -> -> Unlike large folio for page cache which doesn't trigger frequent page -> allocation/free, large folio for anonymous mapping is allocated/freed -> more frequently. So large folio for anonymous mapping exposes some lock -> contention.
-> - -**[v3: migrate_pages: Avoid blocking for IO in MIGRATE_SYNC_LIGHT](http://lore.kernel.org/linux-mm/20230428135414.v3.1.Ia86ccac02a303154a0b8bc60567e7a95d34c96d3@changeid/)** - -> The MIGRATE_SYNC_LIGHT mode is intended to block for things that will -> finish quickly but not for things that will take a long time. Exactly -> how long is too long is not well defined, but waits of tens of -> milliseconds are likely non-ideal. -> - -**[v3: net-next/mm: page_pool: new approach for leak detection and shutdown phase](http://lore.kernel.org/linux-mm/168269854650.2191653.8465259808498269815.stgit@firesoul/)** - -> The page_pool (PP) workqueue calling page_pool_release_retry generates -> too many false-positive reports. Furthermore, these reports of -> page_pool shutdown still having inflight packets are not very helpful -> to track down the root-cause. -> - -**[v8: mm: shmem: support POSIX_FADV_[WILL|DONT]NEED for shmem files](http://lore.kernel.org/linux-mm/cover.1682598808.git.quic_charante@quicinc.com/)** - -> This patch aims to implement POSIX_FADV_WILLNEED and POSIX_FADV_DONTNEED -> advice for shmem files, which can be helpful for drivers that may want -> to manage the pages of shmem files on their own, such as those created -> through shmem_file_setup[_with_mnt](). -> - -**[v2: memcg: OOM log improvements](http://lore.kernel.org/linux-mm/20230428132406.2540811-1-yosryahmed@google.com/)** - -> This short patch series brings back some cgroup v1 stats in OOM logs -> that were unnecessarily changed before. It also makes memcg OOM logs -> less reliant on printk() internals. -> - -**[v1: mm: Do not reclaim private data from pinned page](http://lore.kernel.org/linux-mm/20230428124140.30166-1-jack@suse.cz/)** - -> If the page is pinned, there's no point in trying to reclaim it.
-> Furthermore, if the page is from the page cache we don't want to reclaim -> fs-private data from the page because the pinning process may be writing -> to the page at any time and reclaiming fs private info on a dirty page -> can upset the filesystem (see link below). -> - -**[v1: mm: optimization on page allocation when CMA enabled](http://lore.kernel.org/linux-mm/1682679641-13652-1-git-send-email-zhaoyang.huang@unisoc.com/)** - -> Please note the following typical scenario introduced by commit 168676649: -> 12MB of free cma pages 'help' GFP_MOVABLE allocations keep draining/fragmenting -> U&R page blocks until they shrink to 12MB without entering the slowpath, which goes against the -> current reclaim policy. This commit changes the criterion from a hard-coded '1/2' -> to a watermark check, which leaves U&R free pages around WMARK_LOW when falling -> back. -> - -**[v5: mm/gup: disallow GUP writing to file-backed mappings by default](http://lore.kernel.org/linux-mm/6b73e692c2929dc4613af711bdf92e2ec1956a66.1682638385.git.lstoakes@gmail.com/)** - -> Writing to file-backed mappings which require folio dirty tracking using -> GUP is a fundamentally broken operation, as kernel write access to GUP -> mappings does not adhere to the semantics expected by a file system. -> - -**[v3: Preserved-over-Kexec RAM](http://lore.kernel.org/linux-mm/1682554137-13938-1-git-send-email-anthony.yznaga@oracle.com/)** - -> Sending out this RFC in part to gauge community interest. -> This patchset implements preserved-over-kexec memory storage or PKRAM as a -> method for saving memory pages of the currently executing kernel so that -> they may be restored after kexec into a new kernel. The patches are adapted -> from an RFC patchset sent out in 2013 by Vladimir Davydov [1]. They -> introduce the PKRAM kernel API.
-> - -**[v2: Add support for sharing page tables across processes (Previously mshare)](http://lore.kernel.org/linux-mm/cover.1682453344.git.khalid.aziz@oracle.com/)** - -> This patch series adds a new flag to mmap() call - MAP_SHARED_PT. -> This flag can be specified along with MAP_SHARED by a process to -> hint to kernel that it wishes to share page table entries for this -> file mapping mmap region with other processes. Any other process -> that mmaps the same file with MAP_SHARED_PT flag can then share the -> same page table entries. Besides specifying MAP_SHARED_PT flag, the -> processes must map the files at a PMD aligned address with a size -> that is a multiple of PMD size and at the same virtual addresses. -> This last requirement of same virtual addresses can possibly be -> relaxed if that is the consensus. -> - -**[v4: shmem: Add user and group quota support for tmpfs](http://lore.kernel.org/linux-mm/20230426102008.2930932-1-cem@kernel.org/)** - -> Hello folks. -> -> This is the final version of the quota support from tmpfs, with all the issues -> addressed, and now including RwB tags on all patches, and should be ready for -> merge. Details are within each patch, and the original cover-letter below. -> - -**[v1: mm/oom_kill: system enters a state something like hang when running stress-ng](http://lore.kernel.org/linux-mm/20230426051030.112007-1-hui.wang@canonical.com/)** - -> When we run stress-ng on the UC (Ubuntu Core), the system will be in a -> state similar to hang. And we found if a testcase could introduce the -> oom (like stress-ng-bigheap, stress-ng-brk, ...) under the UC, it is -> highly possible that this testcase will make the system be in a state -> like hang. 
We had a discussion for this issue here: -> https://github.com/ColinIanKing/stress-ng/pull/270 -> - -**[v2: mm: compaction: optimize compact_memory to comply with the admin-guide](http://lore.kernel.org/linux-mm/tencent_DFF54DB2A60F3333F97D3F6B5441519B050A@qq.com/)** - -> For the /proc/sys/vm/compact_memory file, the admin-guide states: -> When 1 is written to the file, all zones are compacted such that free -> memory is available in contiguous blocks where possible. This can be -> important for example in the allocation of huge pages although processes -> will also directly compact memory as required -> - -**[v4: mm/page_alloc: add some comments to explain the possible hole in __pageblock_pfn_to_page()](http://lore.kernel.org/linux-mm/5c26368865e79c743a453dea48d30670b19d2e4f.1682425534.git.baolin.wang@linux.alibaba.com/)** - -> Now the __pageblock_pfn_to_page() is used by set_zone_contiguous(), which -> checks whether the given zone contains holes, and uses pfn_to_online_page() -> to validate if the start pfn is online and valid, as well as using pfn_valid() -> to validate the end pfn. -> - -**[GIT PULL: ext4 changes for the 6.4 merge window](http://lore.kernel.org/linux-mm/20230425041838.GA150312@mit.edu/)** - -> The following changes since commit e8d018dd0257f744ca50a729e3d042cf2ec9da65: -> -> Linux 6.3-rc3 (2023-03-19 13:27:55 -0700) -> -> are available in the Git repository at: -> -> https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git tags/ext4_for_linus -> - -**[v2: fs: multigrain timestamps](http://lore.kernel.org/linux-mm/20230424151104.175456-1-jlayton@kernel.org/)** - -> While I don't think we can practically optimize away ctime updates -> like we do with i_version, I do like the idea of using this scheme to -> indicate when we need to use a high-res timestamp. 
-> - -**[v4: of: fdt: Scan /memreserve/ last](http://lore.kernel.org/linux-mm/20230424113846.46382-1-tanure@linux.com/)** - -> Change the scanning /memreserve/ and /reserved-memory node order to fix -> Kernel panic on Khadas Vim3 Board. -> -> If /memreserve/ goes first, the memory is reserved, but nomap can't be -> applied to the region. So the memory won't be used by Linux, but it is -> still present in the linear map as normal memory, which allows -> speculation. Legitimate access to adjacent pages will cause the CPU -> to end up prefetching into them leading to Kernel panic. -> - -**[v1: string: use __builtin_memcpy() in strlcpy/strlcat](http://lore.kernel.org/linux-mm/20230424112313.3408363-1-glider@google.com/)** - -> lib/string.c is built with -ffreestanding, which prevents the compiler -> from replacing certain functions with calls to their library versions. -> - -**[v2: mm,unmap: avoid flushing TLB in batch if PTE is inaccessible](http://lore.kernel.org/linux-mm/20230424065408.188498-1-ying.huang@intel.com/)** - -> The version 1 of this patch was merged in mm-unstable branch. If you -> want to move that patch into mm-stable recently, it may be better to -> update that patch with this new version firstly. If you want to do -> that after v6.4-rc1, I will rebase this patch and resend it after -> v6.4-rc1 is released. -> - -**[RFC: allow building a kernel without buffer_heads](http://lore.kernel.org/linux-mm/20230424054926.26927-1-hch@lst.de/)** - -> after all the talk about removing buffer_heads, here is a series that -> shows how to build a kernel without buffer_heads. And how unrealistic -> it is to remove the entirely.
-> - -**[v1: mmzone: Introduce for_each_populated_zone_pgdat()](http://lore.kernel.org/linux-mm/20230424030756.1795926-1-yajun.deng@linux.dev/)** - -> Instead of define an index and determining if the zone has memory, -> introduce for_each_populated_zone_pgdat() helper that can be used -> to iterate over each populated zone in pgdat, and convert the most -> obvious users to it. -> - -#### File Systems - -**[GIT PULL: iomap: new code for 6.4](http://lore.kernel.org/linux-fsdevel/20230427175543.GA59213@frogsfrogsfrogs/)** - -> Please pull this branch with changes for iomap for 6.4-rc1. The only -> changes for this cycle are the addition of tracepoints to the iomap -> directio code so that Ritesh (who is working on porting ext2 to iomap) -> can observe the io flows more easily. Dave will be sending you a pull -> request for xfs code for this cycle. -> - -**[v1: Prepare for supporting more filesystems with fanotify](http://lore.kernel.org/linux-fsdevel/20230425132223.2608226-1-amir73il@gmail.com/)** - -> This is the second part of the proposal to support fanotify reporing -> file ids on overlayfs. -> -> The first part [1] relaxes the requirements for filesystems to support -> reporting events with fid to require only the ->encode_fh() operation. -> - -**[GIT PULL: sysctl changes for v6.4-rc1](http://lore.kernel.org/linux-fsdevel/ZEcE6Ex20CwMfMKj@bombadil.infradead.org/)** - -> Note: given we *save* memory per each change move away from each -> deprecated call, I don't see a need to immediately *pause* all -> kernel/sysctl.c moves. Each replacement of a deprecated call saves -> us memory and likely more than a the simple empty entry when we move -> a kernel/syctl.c entry to its own file.
-> - -**[v1: inotify: Avoid reporting event with invalid wd](http://lore.kernel.org/linux-fsdevel/20230424163219.9250-1-jack@suse.cz/)** - -> When inotify_freeing_mark() races with inotify_handle_inode_event() it -> can happen that inotify_handle_inode_event() sees that i_mark->wd got -> already reset to -1 and reports this value to userspace which can -> confuse the inotify listener. Avoid the problem by validating that wd is -> sensible (and pretend the mark got removed before the event got -> generated otherwise). -> - -**[RFC: allow building a kernel without buffer_heads](http://lore.kernel.org/linux-fsdevel/20230424054926.26927-1-hch@lst.de/)** - -> after all the talk about removing buffer_heads, here is a series that -> shows how to build a kernel without buffer_heads. And how unrealistic -> it is to remove the entirely. -> -> Most of the series refactors some common code to make implementing direct -> I/O easier without use of the ->direct_IO method and the helpers based -> around it. It then switches buffered writes (but not writeback) for -> block devices to use iomap unconditionally, but still using buffer_heads. 
-> - -**[git pull: vfs.git misc pile](http://lore.kernel.org/linux-fsdevel/20230424042949.GM3390869@ZenIV/)** - -> The following changes since commit eeac8ede17557680855031c6f305ece2378af326: -> -> Linux 6.3-rc2 (2023-03-12 16:36:44 -0700) -> -> are available in the Git repository at: -> -> git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git tags/pull-misc -> -> for you to fetch changes up to 73bb5a9017b93093854c18eb7ca99c7061b16367: -> -> fs: Fix description of vfs_tmpfile() (2023-03-12 20:03:48 -0400) -> - -**[git pull: fget() whack-a-mole](http://lore.kernel.org/linux-fsdevel/20230424042529.GI3390869@ZenIV/)** - -> The following changes since commit fe15c26ee26efa11741a7b632e9f23b01aca4cc6: -> -> Linux 6.3-rc1 (2023-03-05 14:52:03 -0800) -> -> are available in the Git repository at: -> -> git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git tags/pull-fd -> -> for you to fetch changes up to 4a892c0fe4bb0546d68a89fa595bd22cb4be2576: -> -> fuse_dev_ioctl(): switch to fdget() (2023-04-20 22:55:35 -0400) -> -> fget() to fdget() conversions -> - -**[v1: blk: optimization for classic polling](http://lore.kernel.org/linux-fsdevel/3578876466-3733-1-git-send-email-nj.shetty@samsung.com/)** - -> This removes the dependency on interrupts to wake up task. Set task -> state as TASK_RUNNING, if need_resched() returns true, -> while polling for IO completion. -> Earlier, polling task used to sleep, relying on interrupt to wake it up. -> This made some IO take very long when interrupt-coalescing is enabled in -> NVMe. -> - -#### Network Devices - -**[v4: net: mvpp2: tai: add extts support](http://lore.kernel.org/netdev/20230430170656.137549-1-shmuel.h@siklu.com/)** - -> This patch series adds support for PTP event capture on the Aramda -> 80x0/70x0. This feature is mainly used by tools linux ts2phc(3) in order -> to synchronize a timestamping unit (like the mvpp2's TAI) and a system -> DPLL on the same PCB.
-> - -**[v1: net: virtio-net: allow usage of small vrings](http://lore.kernel.org/netdev/20230430131518.2708471-1-alvaro.karsz@solid-run.com/)** - -> At the moment, if a virtio network device uses vrings with less than -> MAX_SKB_FRAGS + 2 entries, the device won't be functional. -> -> The following condition vq->num_free >= 2 + MAX_SKB_FRAGS will always -> evaluate to false, leading to TX timeouts. -> - -**[v2: net: bonding: add xdp_features support](http://lore.kernel.org/netdev/e82117190648e1cbb2740be44de71a21351c5107.1682848658.git.lorenzo@kernel.org/)** - -> Introduce xdp_features support for bonding driver according to the slave -> devices attached to the master one. xdp_features is required whenever we -> want to xdp_redirect traffic into a bond device and then into selected -> slaves attached to it. -> - -**[v3: virtio_net: suppress cpu stall when free_unused_bufs](http://lore.kernel.org/netdev/1682783278-12819-1-git-send-email-wangwenliang.1995@bytedance.com/)** - -> For multi-queue and large ring-size use case, the following error -> occurred when free_unused_bufs: -> rcu: INFO: rcu_sched self-detected stall on CPU. -> - -**[v1: net: atlantic: Define aq_pm_ops conditionally on CONFIG_PM](http://lore.kernel.org/netdev/20230428214321.2678571-1-trix@redhat.com/)** - -> The only use of aq_pm_ops is conditional on CONFIG_PM. -> The definition of aq_pm_ops and its functions should also -> be conditional on CONFIG_PM. -> - -**[v1: igb: Define igb_pm_ops conditionally on CONFIG_PM](http://lore.kernel.org/netdev/20230428200009.2224348-1-trix@redhat.com/)** - -> The only use of igb_pm_ops is conditional on CONFIG_PM. -> The definition of igb_pm_ops should also be conditional on CONFIG_PM -> - -**[v4: bpf: Socket lookup BPF API from tc/xdp ingress does not respect VRF bindings.](http://lore.kernel.org/netdev/20230428083007.148364-1-gilad9366@gmail.com/)** - -> When calling socket lookup from L2 (tc, xdp), VRF boundaries aren't -> respected. 
This patchset fixes this by regarding the incoming device's -> VRF attachment when performing the socket lookups from tc/xdp. -> -> The first two patches are coding changes which factor out the tc helper's -> logic which was shared with cg/sk_skb (which operate correctly). -> - -**[v4: bpf-next: Introduce a new kfunc of bpf_task_under_cgroup](http://lore.kernel.org/netdev/20230428071737.43849-1-zhoufeng.zf@bytedance.com/)** - -> Trace sched related functions, such as enqueue_task_fair, it is necessary to -> specify a task instead of the current task which within a given cgroup. -> - -**[v4: net-next: Wangxun netdev features support](http://lore.kernel.org/netdev/20230428055709.66071-1-mengyuanlou@net-swift.com/)** - -> Implement tx_csum and rx_csum to support hardware checksum offload. -> Implement ndo_vlan_rx_add_vid and ndo_vlan_rx_kill_vid. -> Enable macros in netdev features which wangxun can support. -> - -**[v7: Create common DPLL configuration API](http://lore.kernel.org/netdev/20230428002009.2948020-1-vadfed@meta.com/)** - -> Implement common API for clock/DPLL configuration and status reporting. -> The API utilises netlink interface as transport for commands and event -> notifications. This API aim to extend current pin configuration and -> make it flexible and easy to cover special configurations. -> - -**[v2: can: bxcan: add support for single peripheral configuration](http://lore.kernel.org/netdev/20230427204540.3126234-1-dario.binacchi@amarulasolutions.com/)** - -> The series adds support for managing bxCAN controllers in single peripheral -> configuration. 
-> Unlike stm32f4 SOCs, where bxCAN controllers are only in dual peripheral -> configuration, stm32f7 SOCs contain three CAN peripherals, CAN1 and CAN2 -> in dual peripheral configuration and CAN3 in single peripheral -> - -**[v1: net-next: pds_core: add switchdev and tc for vlan offload](http://lore.kernel.org/netdev/20230427164546.31296-1-shannon.nelson@amd.com/)** - -> This is an RFC for adding to the pds_core driver some very simple support -> for VF representors and a tc command for offloading VF port vlans. -> - -**[v1: net: add xdp_features support for bonding driver](http://lore.kernel.org/netdev/cover.1682603719.git.lorenzo@kernel.org/)** - -> Introduce missing xdp_features support for bonding driver. xdp_features -> is required whenever we want to xdp_redirect traffic into a bond device -> and then into selected slaves attached to it. -> - -**[v1: net-next: net: tcp: make txhash use consistent for IPv4](http://lore.kernel.org/netdev/20230427134527.18127-1-atenart@kernel.org/)** - -> Series is divided in two parts. First two commits make the txhash (used -> for the skb hash in TCP) to be consistent for all IPv4/TCP packets (IPv6 -> doesn't have the same issue). Last two commits improve doc/comment -> hash-related parts. -> - -**[v1: mISDN: Use list_count_nodes()](http://lore.kernel.org/netdev/886a6fe86cfc3d787a2e3a5062ce8bd92323ed66.1682602766.git.christophe.jaillet@wanadoo.fr/)** - -> count_list_member() really looks the same as list_count_nodes(), so use the -> latter instead of hand writing it. -> -> The first one return an int and the other a size_t, but that should be -> fine. It is really unlikely that we get so many parties in a conference. -> - -**[v1: net: ice: block LAN in case of VF to VF offload](http://lore.kernel.org/netdev/20230427045711.1625449-1-michal.swiatkowski@linux.intel.com/)** - -> VF to VF traffic shouldn't go outside. To enforce it, set only the loopback -> enable bit in case of all ingress type rules added via the tc tool. 
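The virtio-net small-vring entry above rests on a simple bound. `num_free` can never exceed the ring size, so the quoted check `vq->num_free >= 2 + MAX_SKB_FRAGS` cannot succeed on a ring with fewer than `MAX_SKB_FRAGS + 2` entries. The following quick check assumes MAX_SKB_FRAGS = 17, the common value on 4 KiB-page systems. The function names are illustrative only:

```python
# Assumption: MAX_SKB_FRAGS = 17 (typical with 4 KiB pages).
MAX_SKB_FRAGS = 17


def tx_has_room(num_free: int) -> bool:
    # The condition quoted in the virtio-net entry.
    return num_free >= 2 + MAX_SKB_FRAGS


def can_ever_transmit(ring_size: int) -> bool:
    # num_free ranges over 0..ring_size; if no value satisfies the check,
    # the TX queue stays stopped forever and the device times out.
    return any(tx_has_room(n) for n in range(ring_size + 1))
```

With these values a 16- or 18-entry ring never passes the check, while a 19-entry ring does.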
-> - -**[v4: net-next: virtio_net: refactor xdp codes](http://lore.kernel.org/netdev/20230427030534.115066-1-xuanzhuo@linux.alibaba.com/)** - -> Due to historical reasons, the implementation of XDP in virtio-net is relatively -> chaotic. For example, the processing of XDP actions has two copies of similar -> code. Such as page, xdp_page processing, etc. -> - -**[v1: leds: introduce new LED hw control APIs](http://lore.kernel.org/netdev/20230427001541.18704-1-ansuelsmth@gmail.com/)** - -> This is a continue of [1]. It was decided to take a more gradual -> approach to implement LEDs support for switch and phy starting with -> basic support and then implementing the hw control part when we have all -> the prereq done. -> - -**[v1: net-next: wifi: ath10k: Use list_count_nodes()](http://lore.kernel.org/netdev/e6ec525c0c5057e97e33a63f8a4aa482e5c2da7f.1682541872.git.christophe.jaillet@wanadoo.fr/)** - -> ath10k_wmi_fw_stats_num_peers() and ath10k_wmi_fw_stats_num_vdevs() really -> look the same as list_count_nodes(), so use the latter instead of hand -> writing it. -> - -**[v1: net-next: wifi: ath11k: Use list_count_nodes()](http://lore.kernel.org/netdev/941484caae24b89d20524b1a5661dd1fd7025492.1682542084.git.christophe.jaillet@wanadoo.fr/)** - -> ath11k_wmi_fw_stats_num_vdevs() and ath11k_wmi_fw_stats_num_bcn() really -> look the same as list_count_nodes(), so use the latter instead of hand -> writing it. -> -> The first ones use list_for_each_entry() and the other list_for_each(), but -> they both count the number of nodes in the list. -> - -**[v2: net: dsa: mv88e6xxx: add mv88e6321 rsvd2cpu](http://lore.kernel.org/netdev/20230426202815.2991822-1-angelo@kernel-space.org/)** - -> Add rsvd2cpu capability for mv88e6321 model, to allow proper bpdu -> processing. 
-> - -**[v1: net-next: wifi: mwifiex: Use list_count_nodes()](http://lore.kernel.org/netdev/e77ed7f719787cb8836a93b6a6972f4147e40bc6.1682537509.git.christophe.jaillet@wanadoo.fr/)** - -> mwifiex_wmm_list_len() is the same as list_count_nodes(), so use the latter -> instead of hand writing it. -> -> Turn 'ba_stream_num' and 'ba_stream_max' in size_t to keep the same type -> as what is returned by list_count_nodes(). -> - -**[v4: New NDO methods ndo_hwtstamp_get/set](http://lore.kernel.org/netdev/20230426165835.443259-1-kory.maincent@bootlin.com/)** - -> You patch series work on my side with the macb MAC controller and this -> patch. -> I don't know if you are waiting for more reviews but it seems good enough -> to drop the RFC tag. -> - -**[v3: net: net/sched: act_mirred: Add carrier check](http://lore.kernel.org/netdev/20230426151940.639711-1-victor@mojatatu.com/)** - -> As you can see, it's administratively UP but operationally down. -> In this case, sending a packet to this port caused a nasty kernel hang (so -> nasty that we were unable to capture it). Aborting a transmit based on -> operational status (in addition to administrative status) fixes the issue. -> - -**[GIT PULL: Networking for 6.4](http://lore.kernel.org/netdev/20230426143118.53556-1-pabeni@redhat.com/)** - -> We have a few conflicts with your current tree, specifically: -> -> - between commits: -> -> dbb0ea153401 ("thermal: Use thermal_zone_device_type() accessor") -> -> the latter removed the code updated by the former, the resolution -> is deleting mlxsw_thermal_module_trips_reset() and -> mlxsw_thermal_module_trips_update(). -> - -**[v1: net-next: add driver support for Microchip LAN865X Rev.B0 Internal PHYs](http://lore.kernel.org/netdev/20230426114655.93672-1-Parthiban.Veerasooran@microchip.com/)** - -> The first patch updates the LAN867x PHY supported revision number to -> Rev.B1 and the second patch adds the support for Microchip LAN865X Rev.B0 -> 10BASE-T1S Internal PHYs. 
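The mISDN, ath10k, ath11k and mwifiex entries above all replace a hand-written counting loop with `list_count_nodes()`. As the mISDN and mwifiex notes say, that helper returns a `size_t`. The following Python model of a circular, doubly linked list with a sentinel head (in the style of the kernel's `list_head`) sketches what such a counter does: walk from the first real node until it wraps back to the head. The class and function names mirror the kernel ones for readability, but this is a simplified sketch, not kernel code:

```python
class ListHead:
    """Minimal model of a circular, doubly linked list node/sentinel."""

    def __init__(self) -> None:
        # An empty list points to itself in both directions.
        self.next: "ListHead" = self
        self.prev: "ListHead" = self


def list_add_tail(new: ListHead, head: ListHead) -> None:
    # Link `new` just before the sentinel, i.e. at the tail of the list.
    prev = head.prev
    new.prev, new.next = prev, head
    prev.next = new
    head.prev = new


def list_count_nodes(head: ListHead) -> int:
    # Count entries by walking until we are back at the sentinel.
    count = 0
    pos = head.next
    while pos is not head:
        count += 1
        pos = pos.next
    return count
```

Whether a driver iterates with `list_for_each()` or `list_for_each_entry()`, as the ath11k note points out, the count is the same. That is why a single shared helper can replace both styles of loop.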
-> - -**[v2: net-next: Add support for VSC8531_02 PHY and DT RGMII tuning](http://lore.kernel.org/netdev/20230426104313.28950-1-harini.katakam@amd.com/)** - -> Add support for VSC8531_02 PHY ID. -> Also provide an option to tune RGMII delay value via devicetree. -> The default delays are retained in the driver. -> - -**[v1: bpf-next: net/smc: Introduce BPF injection capability](http://lore.kernel.org/netdev/1682501055-4736-1-git-send-email-alibuda@linux.alibaba.com/)** - -> This patches attempt to introduce BPF injection capability for SMC, -> and add selftest to ensure code stability. -> -> As we all know that the SMC protocol is not suitable for all scenarios, -> especially for short-lived. However, for most applications, they cannot -> guarantee that there are no such scenarios at all. Therefore, apps -> may need some specific strategies to decide shall we need to use SMC -> or not, for example, apps can limit the scope of the SMC to a specific -> IP address or port. -> - -**[v2: net/ncsi: clear Tx enable mode when handling a Config required AEN](http://lore.kernel.org/netdev/20230426081350.1214512-1-chou.cosmo@gmail.com/)** - -> ncsi_channel_is_tx() determines whether a given channel should be -> used for Tx or not. However, when reconfiguring the channel by -> handling a Configuration Required AEN, there is a misjudgment that -> the channel Tx has already been enabled, which results in the Enable -> Channel Network Tx command not being sent. -> - -**[v1: net: phy: aquantia: Add 10mbps support](http://lore.kernel.org/netdev/20230426081612.4123059-1-devangnayanbhai.vyas@amd.com/)** - -> This adds support for 10mbps speed in PHY device's -> "supported" field which helps in autonegotiating -> 10mbps link from PHY side where PHY supports the speed -> but not updated in PHY kernel framework. -> -> One such example is AQR113C PHY. 
-> - -**[v2: net-next: net: phy: hide the PHYLIB_LEDS knob](http://lore.kernel.org/netdev/d82489be8ed911c383c3447e9abf469995ccf39a.1682496488.git.pabeni@redhat.com/)** - -> commit 4bb7aac70b5d ("net: phy: fix circular LEDS_CLASS dependencies") -> solved a build failure, but introduces a new config knob with a default -> 'y' value: PHYLIB_LEDS. -> - -#### Security Hardening - -**[GIT PULL: flexible-array transformations for 6.4-rc1](http://lore.kernel.org/linux-hardening/ZEaNFzLag13mLxOL@work/)** - -> The following changes since commit fe15c26ee26efa11741a7b632e9f23b01aca4cc6: -> -> Linux 6.3-rc1 (2023-03-05 14:52:03 -0800) -> -> are available in the Git repository at: -> -> git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux.git tags/flex-array-transformations-6.4-rc1 -> -> for you to fetch changes up to 00168b415a60cec7558608efb4fc50f2a73daae2: -> - -#### Asynchronous IO - -**[v3: io_uring: Pass the whole sqe to commands](http://lore.kernel.org/io-uring/20230430143532.605367-1-leitao@debian.org/)** - -> These three patches prepare for the sock support in the io_uring cmd, as -> described in the following RFC: -> -> https://lore.kernel.org/lkml/20230406144330.1932798-1-leitao@debian.org/ -> -> Since the support linked above depends on other refactors, such as the sock -> ioctl() sock refactor[1], I would like to start integrating patches that have -> consensus and can bring value right now. This will also reduce the patchset -> size later. -> - -**[v1: Rethinking splice](http://lore.kernel.org/io-uring/cover.1682701588.git.asml.silence@gmail.com/)** - -> IORING_OP_SPLICE has problems, many of them are fundamental and rooted -> in the uapi design, see the patch 8 description. This patchset introduces -> a different approach, which came from discussions about splices -> and fused commands and absorbed ideas from both of them. We remove -> reliance onto pipes and registering "spliced" buffers with data as an -> io_uring's registered buffer.
Then the user can use it as a usual -> registered buffer, e.g. pass it to IORING_OP_WRITE_FIXED. -> - -**[v1: io_uring attached nvme queue](http://lore.kernel.org/io-uring/20230429093925.133327-1-joshi.k@samsung.com/)** - -> Also, io_uring ring is not to be shared among application threads. -> Application is responsible for building the sharing (if it feels the -> need). This means ring-associated exclusive queue can do away with some -> synchronization costs that occur for shared queue. -> - -**[v11: io_uring: add napi busy polling support](http://lore.kernel.org/io-uring/20230428181248.610605-1-shr@devkernel.io/)** - -> This adds the napi busy polling support in io_uring.c. It adds a new -> napi_list to the io_ring_ctx structure. This list contains the list of -> napi_id's that are currently enabled for busy polling. This list is -> used to determine which napi id's enabled busy polling. For faster -> access it also adds a hash table. -> - -**[v1: io_uring: Add io_uring_setup flag to pre-register ring fd and never install it](http://lore.kernel.org/io-uring/bc8f431bada371c183b95a83399628b605e978a3.1682699803.git.josh@joshtriplett.org/)** - -> With IORING_REGISTER_USE_REGISTERED_RING, an application can register -> the ring fd and use it via registered index rather than installed fd. -> This allows using a registered ring for everything *except* the initial -> mmap. -> - -**[v10: io_uring: add napi busy polling support](http://lore.kernel.org/io-uring/20230425181845.2813854-1-shr@devkernel.io/)** - -> This adds the napi busy polling support in io_uring.c. It adds a new -> napi_list to the io_ring_ctx structure. This list contains the list of -> napi_id's that are currently enabled for busy polling. This list is -> used to determine which napi id's enabled busy polling. For faster -> access it also adds a hash table. 
-> - -**[v9: liburing: add api for napi busy poll](http://lore.kernel.org/io-uring/20230425182054.2826621-1-shr@devkernel.io/)** - -> This adds two new api's to set/clear the napi busy poll settings. The two -> new functions are called: -> - io_uring_register_napi -> - io_uring_unregister_napi -> -> The patch series also contains the documentation for the two new functions -> and two example programs. The client program is called napi-busy-poll-client -> and the server program napi-busy-poll-server. The client measures the -> roundtrip times of requests. -> - -#### Rust For Linux - -**[v3: rust: helpers: sort includes alphabetically in rust/helpers.c](http://lore.kernel.org/rust-for-linux/20230426204923.16195-1-amiculas@cisco.com/)** - -> Sort the #include directives of rust/helpers.c alphabetically and add a -> comment specifying this. The reason for this is to improve readability -> and to be consistent with the other files with a similar approach within -> 'rust/'. -> - -**[v1: rust: Sort rust/helpers.c's #include directives](http://lore.kernel.org/rust-for-linux/20230426081715.40834-1-amiculas@cisco.com/)** - -> Sort the #include directives of rust/helpers.c alphabetically and add a -> comment specifying this. -> - -#### BPF - -**[v3: bpf-next: Handle immediate reuse in bpf memory allocator](http://lore.kernel.org/bpf/20230429101215.111262-1-houtao@huaweicloud.com/)** - -> As discussed in v1, currently the freed objects in bpf memory allocator -> may be reused immediately by the new allocation, it introduces -> use-after-bpf-ma-free problem for non-preallocated hash map and makes -> lookup procedure return incorrect result. The immediate reuse also makes -> introducing new use case more difficult (e.g. qp-trie). 
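The bpf memory allocator entry above describes a hazard: a freed object handed straight back to a new allocation lets a concurrent lookup see the wrong element. The usual remedy is to delay reuse until no reader can still hold the old object. The toy allocator below sketches that general idea; it is not the bpf_mem_alloc design from the series. `DelayedReuseAllocator` and `grace_period_elapsed()` are hypothetical names, and the "grace period" stands in for whatever mechanism (for example RCU) guarantees that readers are done:

```python
from collections import deque


class DelayedReuseAllocator:
    """Toy free-list allocator that defers reuse of freed objects."""

    def __init__(self) -> None:
        self._free: deque = deque()  # objects safe to hand out again
        self._pending: list = []     # freed, but readers may still see them
        self._next_id = 0

    def alloc(self) -> dict:
        if self._free:
            return self._free.popleft()
        self._next_id += 1
        return {"id": self._next_id}

    def free(self, obj: dict) -> None:
        # Immediate reuse would append to _free here; parking the object
        # avoids handing it to a new owner while old readers still use it.
        self._pending.append(obj)

    def grace_period_elapsed(self) -> None:
        # After the grace period no reader can reference pending objects.
        self._free.extend(self._pending)
        self._pending.clear()
```

The trade-off is memory: freed objects sit idle until the grace period ends instead of being recycled at once.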
-> - -**[v1: bpf-next: libbpf: capability for resizing datasec maps](http://lore.kernel.org/bpf/20230428222754.183432-1-inwardvessel@gmail.com/)** - -> The thought behind this is to allow for use cases where a given datasec -> needs to scale to for example the number of CPU's present. A bpf program -> can have a global array in a custom data section with an initial length -> and before loading the bpf program, the array length could be extended to -> match the CPU count. The selftests included in this series perform this -> scaling to an arbitrary value to demonstrate how it can work. -> - -**[v1: x86/pie: Make kernel image's virtual address flexible](http://lore.kernel.org/bpf/cover.1682673542.git.houwenlong.hwl@antgroup.com/)** - -> These patches make the changes necessary to build the kernel as Position -> Independent Executable (PIE) on x86_64. A PIE kernel can be relocated -> below the top 2G of the virtual address space. And this patchset -> provides an example to allow kernel image to be relocated in top 512G of -> the address space. -> - -**[v1: bpf-next: selftests/bpf: Add fexit_sleep to DENYLIST.aarch64](http://lore.kernel.org/bpf/20230428034726.2593484-1-martin.lau@linux.dev/)** - -> It is reported that the fexit_sleep never returns in aarch64. -> The remaining tests cannot start. Put this test into DENYLIST.aarch64 -> for now so that other tests can continue to run in the CI. -> - -**[v2: bpf-next: libbpf: btf_dump_type_data_check_overflow needs to consider BTF_MEMBER_BITFIELD_SIZE](http://lore.kernel.org/bpf/20230428013638.1581263-1-martin.lau@linux.dev/)** - -> The reason is in btf_dump_type_data_check_overflow(). It does not use -> BTF_MEMBER_BITFIELD_SIZE from the struct's member (btf_member). Instead, -> it is using the enum size which is 4. It had been working till the recent -> commit 4e04143c869c ("fs_context: drop the unused lsm_flags member") -> removed an integer member which also removed the 4 bytes padding at the end -> of the fs_context. 
Missing this 4 bytes padding exposed this bug. -> In particular, when btf_dump_type_data_check_overflow() reaches -> the member 'phase', -E2BIG is returned. -> - -**[v2: bpf-next: selftests/bpf: test_progs can read test lists from file](http://lore.kernel.org/bpf/20230427225333.3506052-1-sveiss@meta.com/)** - -> BPF selftests have ALLOWLIST and DENYLIST files, used to control which -> tests are run in CI. These files are currently parsed by a shell -> script. [1] -> -> This patchset allows those files to be specified directly on the -> test_progs command line (eg, as -a @ALLOWLIST). -> - -**[v2: bpf-next: bpf: Don't EFAULT for {g,s}setsockopt with wrong optlen](http://lore.kernel.org/bpf/20230427200409.1785263-1-sdf@google.com/)** - -> optval larger than PAGE_SIZE leads to EFAULT if the BPF program -> isn't careful enough. This is often overlooked and might break -> completely unrelated socket options. Instead of EFAULT, -> let's ignore BPF program buffer changes. See the first patch for -> more info. -> - -**[v1: selftests/bpf: Do not use sign-file as testcase](http://lore.kernel.org/bpf/88e3ab23029d726a2703adcf6af8356f7a2d3483.1682607419.git.legion@kernel.org/)** - -> The sign-file utility (from scripts/) is used in prog_tests/verify_pkcs7_sig.c, -> but the utility should not be called as a test. Executing this utility -> produces the following error: -> -> selftests: /linux/tools/testing/selftests/bpf: urandom_read -> ok 16 selftests: /linux/tools/testing/selftests/bpf: urandom_read -> -> selftests: /linux/tools/testing/selftests/bpf: sign-file -> not ok 17 selftests: /linux/tools/testing/selftests/bpf: sign-file # exit=2 -> - -**[v4: bpf-next: bpftool: Dump map id instead of value for map_of_maps types](http://lore.kernel.org/bpf/20230427120313.43574-1-kuro@kuroa.me/)** - -> When using `bpftool map dump` with map_of_maps, it is usually -> more convenient to show the inner map id instead of raw value. 
-> -> We are changing the plain print behavior to show inner_map_id -> instead of hex value, this would help with quick look up of -> inner map with `bpftool map dump id `. -> To avoid disrupting scripted behavior, we will add a new -> `inner_map_id` field to json output instead of replacing value. -> - -**[v2: bpf-next: bpf: Make bpf_helper_defs.h c++ friendly](http://lore.kernel.org/bpf/20230426155357.4158846-1-sdf@google.com/)** - -> Compiling C++ BPF programs with existing bpf_helper_defs.h is not -> possible due to stricter C++ type conversions. C++ complains -> about (void *) type conversions: -> -> $ clang++ --include linux/types.h ./tools/lib/bpf/bpf_helper_defs.h -> - -**[v1: bpf-next: Add precision propagation for subprogs and callbacks](http://lore.kernel.org/bpf/20230425234911.2113352-1-andrii@kernel.org/)** - -> As more and more real-world BPF programs become more complex -> and increasingly use subprograms (both static and global), scalar precision -> tracking and its (previously weak) support for BPF subprograms (and callbacks -> as a special case of that) is becoming more and more of an issue and -> limitation. Couple that with increasing reliance on state equivalence (BPF -> open-coded iterators have a hard requirement for state equivalence to converge -> and successfully validate loops), and it becomes pretty critical to address -> this limitation and make precision tracking universally supported for BPF -> programs of any complexity and composition. -> - -**[v1: KEYS: Introduce user mode key and signature parsers](http://lore.kernel.org/bpf/20230425173557.724688-1-roberto.sassu@huaweicloud.com/)** - -> Support new key and signature formats with the same kernel component. -> -> Verify the authenticity of system data with newly supported data formats. -> -> Mitigate the risk of parsing arbitrary data in the kernel. 
-> - **[v7: vhost: virtio core prepares for AF_XDP](http://lore.kernel.org/bpf/20230425073613.8839-1-xuanzhuo@linux.alibaba.com/)** - -> Now, virtio may not work with DMA APIs when virtio features do not have -> VIRTIO_F_ACCESS_PLATFORM. -> -> 1. I tried to let DMA APIs return phy address by virtio-device. But DMA APIs just -> work with the "real" devices. -> 2. I tried to let xsk support callbacks to get phy address from virtio-net -> driver as the dma address. But the maintainers of xsk may want to use dma-buf -> to replace the DMA APIs. I think that may be a larger effort. We will wait -> too long. -> - -**[v2: powerpc/bpf: populate extable entries only during the last pass](http://lore.kernel.org/bpf/20230425065829.18189-1-hbathini@linux.ibm.com/)** - -> Since commit 85e031154c7c ("powerpc/bpf: Perform complete extra passes -> to update addresses"), two additional passes are performed to avoid -> space and CPU time wastage on powerpc. But these extra passes led to -> WARN_ON_ONCE() hits in bpf_add_extable_entry() as extable entries are -> populated again, during the extra pass, without resetting the index. -> Fix it by resetting entry index before repopulating extable entries, -> if and when there is an additional pass. -> - -**[v1: bpf-next: selftests/bpf: avoid mark_all_scalars_precise() trigger in one of iter tests](http://lore.kernel.org/bpf/20230424235128.1941726-1-andrii@kernel.org/)** - -> For now, change the test to assume fixed size of passed in array. Once -> BPF verifier supports precision tracking across subprogram calls, these -> changes will be reverted as unnecessary. -> - -**[[RFC/PATCH bpf-next 00/20] bpf: Add multi uprobe link](http://lore.kernel.org/bpf/20230424160447.2005755-1-jolsa@kernel.org/)** - -> this patchset is adding support to attach multiple uprobes and usdt probes -> through new uprobe_multi link. -> -> The current uprobe is attached through the perf event and attaching many -> uprobes takes a lot of time because of that.
-> - **[v6: tracing: Add fprobe events](http://lore.kernel.org/bpf/168234755610.2210510.12133559313738141202.stgit@mhiramat.roam.corp.google.com/)** - -> Here is the 6th version of improve fprobe and add a basic fprobe event -> support for ftrace (tracefs) and perf. Here is the previous version. -> -> https://lore.kernel.org/all/168198993129.1795549.8306571027057356176.stgit@mhiramat.roam.corp.google.com/ -> - -**[v2: libbpf: Improve version handling when attaching uprobe](http://lore.kernel.org/bpf/ZEV%2FEzOM+TJomP66@eg/)** - -> This change fixes the handling of versions in elf_find_func_offset. -> In the previous implementation, we incorrectly assumed that the -> version information would be present in the string found in the -> string table. -> - -**[v1: bpf-next: xsk: Use pool->dma_pages to check for DMA](http://lore.kernel.org/bpf/20230423180157.93559-1-kal.conley@dectris.com/)** - -> Compare pool->dma_pages instead of pool->dma_pages_cnt to check for an -> active DMA mapping. pool->dma_pages needs to be read anyway to access -> the map so this compiles to more efficient code. -> - -### Related Technology Updates - -#### Qemu - -**[v1: target/riscv: RVV 1-fill tail element changes](http://lore.kernel.org/qemu-devel/20230427205708.246679-1-dbarboza@ventanamicro.com/)** - -> This series makes changes in vext_set_tail_elements_1s() to be a little -> nicer to the emulation. -> -> First patch makes the function a no-op when vta == 0. Aside from the -> logic simplification we also have a little performance boost. -> - -**[v2: hw/riscv: virt: Assume M-mode FW in pflash0 only when "-bios none"](http://lore.kernel.org/qemu-devel/20230425102545.162888-1-sunilvl@ventanamicro.com/)** - -> Currently, virt machine supports two pflash instances each with -> 32MB size. However, the first pflash is always assumed to -> contain M-mode firmware and reset vector is set to this if -> enabled. Hence, for S-mode payloads like EDK2, only one pflash -> instance is available for use.
This means both code and NV variables -> of EDK2 will need to use the same pflash. -> - -**[v3: hw/riscv/virt: Add a second UART for secure world](http://lore.kernel.org/qemu-devel/20230425073509.3618388-1-yong.li@intel.com/)** - -> The virt machine can have two UARTs and the second UART -> can be used by the secure payload, firmware or OS residing -> in secure world. Will include the UART device to FDT in a -> separate patch. -> - -**[v1: Add RISC-V KVM AIA Support](http://lore.kernel.org/qemu-devel/20230424090716.15674-1-yongxuan.wang@sifive.com/)** - -> This series introduces support for KVM AIA in the RISC-V architecture. The -> implementation follows Anup's KVM AIA implementation in kvmtool -> (https://github.com/avpatel/kvmtool.git). To test these patches, a Linux kernel -> with KVM AIA support is required, which can be found in the qemu_kvm_aia branch -> at https://github.com/yong-xuan/linux.git. This kernel branch is based on the -> riscv_aia_v1 branch from https://github.com/avpatel/linux.git and includes two -> additional patches. -> - -#### U-Boot - -**[v3: Add ethernet driver for StarFive JH7110 SoC](http://lore.kernel.org/u-boot/20230428022515.29393-1-yanhong.wang@starfivetech.com/)** - -> This series of patches is based on the latest branch/master, and -> adds ethernet support for the StarFive JH7110 RISC-V SoC. -> The series includes EEPROM, PHY and MAC drivers. The PHY model is -> YT8531 (from Motorcomm Inc), and the MAC version is dwmac-5.20 -> (from Synopsys DesignWare). -> - -**[v5: Add StarFive JH7110 PCIe drvier support](http://lore.kernel.org/u-boot/20230423105859.125764-1-minda.chen@starfivetech.com/)** - -> The PCIe driver depends on gpio, pinctrl, clk and reset driver to do init. -> The PCIe dts configuration includes all these settings.
-> - -## 20230423:第 43 期 - -### 内核动态 - -#### RISC-V 架构支持 - -**[v1: riscv: uprobes: Restore thread.bad_cause](http://lore.kernel.org/linux-riscv/1682214146-3756-1-git-send-email-yangtiezhu@loongson.cn/)** - -> thread.bad_cause is saved in arch_uprobe_pre_xol(), it should be restored -> in arch_uprobe_{post,abort}_xol() accordingly, otherwise the save operation -> is meaningless, this change is similar with x86 and powerpc. -> - -**[v1: dt-bindings: riscv: add sv57 mmu-type](http://lore.kernel.org/linux-riscv/20230421-voucher-ecology-7ddfdf801a71@spud/)** - -> Dumping the dtb from new versions of QEMU warns that sv57 is an -> undocumented mmu-type. The kernel has supported sv57 for about a year, -> so bring it into the fold. -> - -**[GIT PULL: KVM/riscv changes for 6.4](http://lore.kernel.org/linux-riscv/CAAhSdy2RLinG5Gx-sfOqrYDAT=xDa3WAk8r1jTu8ReO5Jo0LVA@mail.gmail.com/)** - -> We have the following KVM RISC-V changes for 6.4: -> 1) ONE_REG interface to enable/disable SBI extensions -> 2) Zbb extension for Guest/VM -> 3) AIA CSR virtualization -> 4) Few minor cleanups and fixes -> - -**[v17: Microchip Soft IP corePWM driver](http://lore.kernel.org/linux-riscv/20230421-neurology-trapezoid-b4fa29923a23@wendy/)** - -> Yet another version of this driver :) -> -> This time around I've implemented Uwe's simplified method for -> calculating the prescale & period_steps. For low values of prescale it -> makes for much worse approximations of the period, but as the period -> increases with respect to the that of the pwm's underlying clock there -> is mostly no different in the approximations. -> - -**[v1: riscv: mm: Ensure prot of VM_WRITE and VM_EXEC must be readable](http://lore.kernel.org/linux-riscv/20230421075111.1391952-1-woodrow.shen@sifive.com/)** - -> The commit 8aeb7b17f04e ("RISC-V: Make mmap() with PROT_WRITE imply PROT_READ") -> allows riscv to use mmap with PROT_WRITE only, and meanwhile mmap with w+x is -> also permitted. 
However, when userspace tries to access this page with -> PROT_WRITE|PROT_EXEC, which causes infinite loop at load page fault as well as -> it triggers soft lockup. According to riscv privileged spec, -> "Writable pages must also be marked readable". The fix to drop the -> `PAGE_COPY_EXEC` and then `PAGE_COPY_READ_EXEC` should be just used instead. -> This aligns the other arches (i.e arm64) for protection_map. -> - -**[v3: Add JH7110 cpufreq support](http://lore.kernel.org/linux-riscv/20230421031431.23010-1-mason.huo@starfivetech.com/)** - -> The StarFive JH7110 SoC has four RISC-V cores, -> and it supports up to 4 cpu frequency loads. -> -> This patchset adds the compatible strings into the allowlist -> for supporting the generic cpufreq driver on JH7110 SoC. -> Also, it enables the axp15060 pmic for the cpu power source. -> - -**[v1: RISC-V: include cpufeature.h in cpufeature.c](http://lore.kernel.org/linux-riscv/20230420-wound-gizzard-2b2b589d9bea@spud/)** - -> Automation complains: -> warning: symbol '__pcpu_scope_misaligned_access_speed' was not declared. Should it be static? -> -> cpufeature.c doesn't actually include the header of the same name, as it -> had not previously used anything from it. -> The per-cpu variable is declared there, so include it to silence the -> complaints. -> - -**[v5: Add JH7110 USB and USB PHY driver support](http://lore.kernel.org/linux-riscv/20230420110052.3182-1-minda.chen@starfivetech.com/)** - -> This patchset adds USB driver and USB PHY for the StarFive JH7110 SoC. -> USB work mode is peripheral and using USB 2.0 PHY in VisionFive 2 board. -> The patch has been tested on the VisionFive 2 board. -> - -**[v3: Change PWM-controlled LED pin active mode and algorithm](http://lore.kernel.org/linux-riscv/20230420093457.18936-1-nylon.chen@sifive.com/)** - -> According to the circuit diagram of User LEDs - RGB described in the manual hifive-unleashed-a00.pdf[0] and hifive-unmatched-schematics-v3.pdf[1]. 
-> - -**[v2: Add TDM audio on StarFive JH7110](http://lore.kernel.org/linux-riscv/20230420024118.22677-1-walker.chen@starfivetech.com/)** - -> This patchset adds TDM audio driver for the StarFive JH7110 SoC. The -> first patch adds device tree binding for TDM module. The second patch -> adds the item for JH7110 audio board to the dt-binding of StarFive -> SoC-based boards. The third patch adds tdm driver support for JH7110 -> SoC. The last patch adds device node of tdm and sound card to JH7110 dts. -> - -**[v1: kvmtool: RISC-V CoVE support](http://lore.kernel.org/linux-riscv/20230419222350.3604274-1-atishp@rivosinc.com/)** - -> This series is an initial version of the support for running confidential VMs on -> riscv architecture. This is to get feedback on the proposed COVH, COVI and COVG -> extensions for running Confidential VMs on riscv. The specification is available -> here [0]. Make sure to build it to get the latest changes as it gets updated -> from time to time. -> - -**[v2: Add JH7110 AON PMU support](http://lore.kernel.org/linux-riscv/20230419034833.43243-1-changhuang.liang@starfivetech.com/)** - -> This patchset adds aon power domain driver for the StarFive JH7110 SoC. -> It is used to turn on/off dphy rx/tx power switch. The series has been -> tested on the VisionFive 2 board. -> - -**[v1: pwm: sifive: Simplify using devm_clk_get_prepared()](http://lore.kernel.org/linux-riscv/20230418202102.117658-1-u.kleine-koenig@pengutronix.de/)** - -> Instead of preparing the clk after it was requested and unpreparing in -> .probe()'s error path and .remove(), use devm_clk_get_prepared() which -> copes for unpreparing automatically. -> - -**[v1: Split ptdesc from struct page](http://lore.kernel.org/linux-riscv/20230417205048.15870-1-vishal.moola@gmail.com/)** - -> The MM subsystem is trying to shrink struct page. This patchset -> introduces a memory descriptor for page table tracking - struct ptdesc. 
-> -> This patchset introduces ptdesc, splits ptdesc from struct page, and -> converts many callers of page table constructor/destructors to use ptdescs. -> - -**[v1: tools/nolibc: add stackprotector support for more architectures](http://lore.kernel.org/linux-riscv/20230408-nolibc-stackprotector-archs-v1-0-271f5c859c71@weissschuh.net/)** - -> Add stackprotector support for all remaining architectures, except s390. -> -> On s390 the stackprotectors are not supported in "global" mode; only -> "sysreg" mode which is not supported in nolibc. -> - -**[v1: RISC-V: Add steal-time support](http://lore.kernel.org/linux-riscv/20230417103402.798596-1-ajones@ventanamicro.com/)** - -> One frequently touted benefit of virtualization is the ability to -> consolidate machines, increasing resource utilization. It may even be -> desirable to overcommit, at the risk of one or more VCPUs having to wait. -> Hypervisors which have interfaces for guests to retrieve the amount of -> time each VCPU had to wait give observers within the guests ways to -> account for less progress than would otherwise be expected. The SBI STA -> extension proposal[1] provides a standard interface for guest VCPUs to -> retrieve the amount of time "stolen". -> - -**[v3: riscv: mm: execute local TLB flush after populating vmemmap](http://lore.kernel.org/linux-riscv/20230417060618.639395-1-vincent.chen@sifive.com/)** - -> The spare_init() calls memmap_populate() many times to create VA to PA -> mapping for the VMEMMAP area, where all "struct page" are located once -> CONFIG_SPARSEMEM_VMEMMAP is defined. These "struct page" are later -> initialized in the zone_sizes_init() function. However, during this -> process, no sfence.vma instruction is executed for this VMEMMAP area. -> This omission may cause the hart to fail to perform page table walk -> because some data related to the address translation is invisible to the -> hart.
To solve this issue, the local_flush_tlb_kernel_range() is called -> right after the spare_init() to execute a sfence.vma instruction for the -> VMEMMAP area, ensuring that all data related to the address translation -> is visible to the hart. -> - -**[v1: riscv: dts: starfive: Add PMU controller node](http://lore.kernel.org/linux-riscv/20230417034728.2670-1-walker.chen@starfivetech.com/)** - -> Add the pmu controller node for the StarFive JH7110 SoC. The PMU needs -> to be used by other modules, e.g. VPU, ISP, etc. -> - -#### Process Scheduling - -**[v2: net: net/sched: cls_api: Initialize miss_cookie_node when action miss is not used](http://lore.kernel.org/lkml/20230420183634.1139391-1-ivecera@redhat.com/)** - -> Function tcf_exts_init_ex() sets exts->miss_cookie_node ptr only -> when use_action_miss is true so it assumes in other case that -> the field is set to NULL by the caller. If not then the field -> contains garbage and subsequent tcf_exts_destroy() call results -> in a crash. -> Ensure that the field .miss_cookie_node pointer is NULL when -> use_action_miss parameter is false to avoid this potential scenario. -> - -**[v2: sched/topology: add for_each_numa_cpu() macro](http://lore.kernel.org/lkml/20230420051946.7463-1-yury.norov@gmail.com/)** - -> for_each_cpu() is widely used in kernel, and it's beneficial to create -> a NUMA-aware version of the macro. -> - -**[v1: net: sched: print jiffies when transmit queue time out](http://lore.kernel.org/lkml/20230419115632.738730-1-yajun.deng@linux.dev/)** - -> Although there is watchdog_timeo to let users know when the transmit queue -> begins to stall, dev_watchdog() is called with an interval. The jiffies -> will always be greater than watchdog_timeo. -> - -**[v1: drm/msm: Move cmdstream dumping out of sched kthread](http://lore.kernel.org/lkml/20230417225510.494951-1-robdclark@gmail.com/)** - -> This is something that can block for arbitrary amounts of time as -> userspace consumes from the FIFO.
So we don't really want this to -> be in the fence signaling path. -> - -**[v1: sched/uclamp: Introduce SCHED_FLAG_RESET_UCLAMP_ON_FORK flag](http://lore.kernel.org/lkml/20230416213406.2966521-1-davidai@google.com/)** - -> A userspace service may manage uclamp dynamically for individual tasks and -> a child task will unintentionally inherit a pseudo-random uclamp setting. -> This could result in the child task being stuck with a static uclamp value -> that results in poor performance or poor power. -> - -**[GIT PULL: sched/urgent for v6.3-rc7](http://lore.kernel.org/lkml/20230416123412.GDZDvrRCv9VvvmXuPz@fat_crate.local/)** - -> pls pull an urgent scheduler fix for 6.3. -> -> Thx. -> - -#### Memory Management - -**[v1: mm/gup: disallow GUP writing to file-backed mappings by default](http://lore.kernel.org/linux-mm/f86dc089b460c80805e321747b0898fd1efe93d7.1682168199.git.lstoakes@gmail.com/)** - -> It isn't safe to write to file-backed mappings as GUP does not ensure that -> the semantics associated with such a write are performed correctly, for -> instance filesystems which rely upon write-notify will not be correctly -> notified. -> - -**[v12: cachestat: a new syscall for page cache state of files](http://lore.kernel.org/linux-mm/20230421231421.2401346-1-nphamcs@gmail.com/)** - -> There is currently no good way to query the page cache statistics of large -> files and directory trees. There is mincore(), but it scales poorly: the -> kernel writes out a lot of bitmap data that userspace has to aggregate, -> when the user really does not care about per-page information in that -> case. The user also needs to mmap and unmap each file as it goes along, -> which can be quite slow as well.
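The cachestat cover letter contrasts the new syscall with mincore(). A minimal sketch of that mincore() approach for a single file, assuming Linux and glibc, shows where the cost comes from: the file must be mapped first, the kernel fills one status byte per page, and userspace sums the vector itself. `count_resident_pages()` is a hypothetical helper, not code from the cachestat series; only `open()`, `fstat()`, `mmap()`, `mincore()` and `munmap()` are real APIs.

```c
#define _DEFAULT_SOURCE /* exposes mincore() in glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: count how many pages of the file at `path` are
 * resident in the page cache using mmap() + mincore(). Stores the file
 * size in pages in *total_pages and returns the resident count, or -1
 * on error.
 */
long count_resident_pages(const char *path, size_t *total_pages)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;
    size_t npages = (len + page - 1) / page;
    *total_pages = npages;
    if (npages == 0) { /* mmap() rejects a zero-length mapping */
        close(fd);
        return 0;
    }

    /* Step 1: the file must be mapped before mincore() can inspect it. */
    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); /* the mapping keeps its own reference to the file */
    if (map == MAP_FAILED)
        return -1;

    /* Step 2: the kernel writes one status byte per page... */
    unsigned char *vec = malloc(npages);
    long resident = -1;
    if (vec != NULL && mincore(map, len, vec) == 0) {
        /* Step 3: ...and userspace aggregates them (bit 0 = resident). */
        resident = 0;
        for (size_t i = 0; i < npages; i++)
            resident += vec[i] & 1;
    }
    free(vec);
    munmap(map, len);
    return resident;
}
```

Doing this across a directory tree repeats all three steps per file, which is the per-file mmap/munmap overhead and per-page bitmap traffic the cover letter describes.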
-> - **[v2: migrate: Avoid unbounded blocks in MIGRATE_SYNC_LIGHT](http://lore.kernel.org/linux-mm/20230421221249.1616168-1-dianders@chromium.org/)** - -> This series is the result of discussion around my RFC patch [1] where -> I talked about completely removing the waits for the folio_lock in -> migrate_folio_unmap(). -> - -**[v1: shmem: add support for blocksize > PAGE_SIZE](http://lore.kernel.org/linux-mm/20230421214400.2836131-1-mcgrof@kernel.org/)** - -> This is an initial attempt to add support for block size > PAGE_SIZE for tmpfs. -> Why would you want this? It helps us experiment with higher order folio uses -> with fs APIs and helps us test out corner cases which would likely need -> to be accounted for sooner or later if and when filesystems enable support -> for this. Better review early and burn early than continue on in the wrong -> direction so looking for early feedback. -> - -**[v2: kasan: use internal prototypes matching gcc-13 builtins](http://lore.kernel.org/linux-mm/20230421205754.106794-1-arnd@kernel.org/)** - -> This now passes all randconfig builds on arm, arm64 and x86, but I have -> not tested it on the other architectures that support kasan, since they -> tend to fail randconfig builds in other ways. This might fail if any -> of the 32-bit architectures expect a 'long' instead of 'int' for the -> size argument. -> - -**[v1: block: simplify with PAGE_SECTORS_SHIFT](http://lore.kernel.org/linux-mm/20230421195807.2804512-1-mcgrof@kernel.org/)** - -> A number of block drivers have their own incantations with -> PAGE_SHIFT - SECTOR_SHIFT. Just simplify and use PAGE_SECTORS_SHIFT -> all over. -> - -**[v5: cgroup: eliminate atomic rstat flushing](http://lore.kernel.org/linux-mm/20230421174020.2994750-1-yosryahmed@google.com/)** - -> A previous patch series ([1] currently in mm-stable) changed most -> atomic rstat flushing contexts to become non-atomic.
This was done to -> avoid an expensive operation that scales with # cgroups and # cpus to -> happen with irqs disabled and scheduling not permitted. There were two -> remaining atomic flushing contexts after that series. This series tries -> to eliminate them as well, eliminating atomic rstat flushing completely. -> - -**[v1: arm64: Also reset KASAN tag if page is not PG_mte_tagged](http://lore.kernel.org/linux-mm/20230420210945.2313627-1-pcc@google.com/)** - -> Consider the following sequence of events: -> -> 1) A page in a PROT_READ|PROT_WRITE VMA is faulted. -> 2) Page migration allocates a page with the KASAN allocator, -> causing it to receive a non-match-all tag, and uses it -> to replace the page faulted in 1. -> 3) The program uses mprotect() to enable PROT_MTE on the page faulted in 1. -> - -**[v4: bio: check return values of bio_add_page](http://lore.kernel.org/linux-mm/20230420100501.32981-1-jth@kernel.org/)** - -> We have two functions for adding a page to a bio, __bio_add_page() which is -> used to add a single page to a freshly created bio and bio_add_page() which is -> used to add a page to an existing bio. -> - -**[v1: shmem: restrict noswap option to initial user namespace](http://lore.kernel.org/linux-mm/20230420-faxen-advokat-40abb4c1a152@brauner/)** - -> Prevent tmpfs instances mounted in an unprivileged namespaces from -> evading accounting of locked memory by using the "noswap" mount option. -> - -**[v15: RESEND: Implement IOCTL to get and optionally clear info about PTEs](http://lore.kernel.org/linux-mm/20230420060156.895881-1-usama.anjum@collabora.com/)** - -> This syscall is used in Windows applications and games etc. This syscall is -> being emulated in pretty slow manner in userspace. Our purpose is to -> enhance the kernel such that we translate it efficiently in a better way. -> Currently some out of tree hack patches are being used to efficiently -> emulate it in some kernels. We intend to replace those with these patches. 
-> So the whole gaming on Linux can effectively get benefit from this. It -> means there would be tons of users of this code. -> - -**[v2: module: add debugging auto-load duplicate module support](http://lore.kernel.org/linux-mm/20230420003046.1604251-1-mcgrof@kernel.org/)** - -> The finit_module() system call can in the worst case use up to more than -> twice of a module's size in virtual memory. Duplicate finit_module() -> system calls are non fatal, however they unnecessarily strain virtual -> memory during bootup and in the worst case can cause a system to fail -> to boot. This is only known to currently be an issue on systems with -> larger number of CPUs. -> - -**[v15: Implement IOCTL to get and optionally clear info about PTEs](http://lore.kernel.org/linux-mm/20230419110716.4113627-1-usama.anjum@collabora.com/)** - -> This syscall is used in Windows applications and games etc. This syscall is -> being emulated in pretty slow manner in userspace. Our purpose is to -> enhance the kernel such that we translate it efficiently in a better way. -> Currently some out of tree hack patches are being used to efficiently -> emulate it in some kernels. We intend to replace those with these patches. -> So the whole gaming on Linux can effectively get benefit from this. It -> means there would be tons of users of this code. -> - -**[v1: mm/cma: mm/cma: retry allocation of dedicated area on EBUSY](http://lore.kernel.org/linux-mm/20230419083851.2555096-1-sergii.piatakov@globallogic.com/)** - -> Sometimes continuous page range can't be successfully allocated, because -> some pages in the range may not pass the isolation test. In this case, -> the CMA allocator gets an EBUSY error and retries allocation again (in -> the slightly shifted range). 
-> - **[v1: printk: Enough to disable preemption in printk deferred context](http://lore.kernel.org/linux-mm/20230419074210.17646-1-pmladek@suse.com/)** - -> The comment above printk_deferred_enter()/exit() definition claims -> that it can be used only when interrupts are disabled. -> - -**[v1: mm: skip CMA pages when they are not available](http://lore.kernel.org/linux-mm/1681882824-17532-1-git-send-email-zhaoyang.huang@unisoc.com/)** - -> It is a waste of effort to reclaim CMA pages if they are not available -> for the current context during direct reclaim. Skip them when under corresponding -> circumstance. -> - -**[v1: mm/mmap: Map MAP_STACK to VM_STACK](http://lore.kernel.org/linux-mm/20230418210230.3495922-1-longman@redhat.com/)** - -> One of the flags of mmap(2) is MAP_STACK to request a memory segment -> suitable for a process or thread stack. The kernel currently ignores -> this flag. Glibc uses MAP_STACK when mmapping a thread stack. However, -> selinux has an execstack check in selinux_file_mprotect() which disallows -> a stack VMA to be made executable. -> - -**[v1: mm: reliable huge page allocator](http://lore.kernel.org/linux-mm/20230418191313.268131-1-hannes@cmpxchg.org/)** - -> As memory capacity continues to grow, 4k TLB coverage has not been -> able to keep up. On Meta's 64G webservers, close to 20% of execution -> cycles are observed to be handling TLB misses when using 4k pages -> only. Huge pages are shifting from being a nice-to-have optimization -> for HPC workloads to becoming a necessity for common applications.
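To make the TLB-coverage argument in the huge page allocator entry concrete: TLB reach is the number of TLB entries times the page size. Assuming, purely for illustration, a TLB with 1536 entries (an assumed figure, not one from the patch) and taking the 64G webservers above as having 64 GiB of memory:

$$
\begin{aligned}
\text{reach} &= N_{\text{TLB}} \times S_{\text{page}} \\
1536 \times 4\,\text{KiB} &= 6\,\text{MiB} \approx 0.009\% \text{ of } 64\,\text{GiB} \\
1536 \times 2\,\text{MiB} &= 3\,\text{GiB} \approx 4.7\% \text{ of } 64\,\text{GiB}
\end{aligned}
$$

Under these assumptions, switching the same entries from 4 KiB to 2 MiB pages multiplies the reach by 512, which is why huge pages matter for large working sets.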
-> - #### File Systems - **[v1: io_uring: add getdents support, take 2](http://lore.kernel.org/linux-fsdevel/20230422-uring-getdents-v1-0-14c1db36e98c@codewreck.org/)** - -> The new API does nothing that cannot be achieved with plain syscalls so -> it shouldn't be introducing any new problem, the only downside is that -> having the state in the file struct isn't very uring-ish and if a -> better solution is found later that will probably require duplicating -> some logic in a new flag... But that seems like it would likely be a -> distant future, and this version should be usable right away. -> - -**[v2: Support negative dentries on case-insensitive ext4 and f2fs](http://lore.kernel.org/linux-fsdevel/20230422000310.1802-1-krisman@suse.de/)** - -> This is the v2 of the negative dentry support on case-insensitive directories. -> It doesn't have any functional changes from v1, but it adds more context and a -> comment to the dentry->d_name access I'm doing in d_revalidate, documenting -> why (I understand) it is safe to do it without protecting from the parallel -> directory changes. -> - -**[GIT PULL: Turn single vector imports into ITER_UBUF](http://lore.kernel.org/linux-fsdevel/f16053ea-d3b8-a8a2-0178-3981fea5a656@kernel.dk/)** - -> This series turns single vector imports into ITER_UBUF, rather than -> ITER_IOVEC. The former is more trivial to iterate and advance, and hence -> a bit more efficient. From some very unscientific testing, -> 60% of all -> iovec imports are single vector. -> - -**[GIT PULL: pipe: nonblocking rw for io_uring](http://lore.kernel.org/linux-fsdevel/20230421-seilbahn-vorpreschen-bd73ac3c88d7@brauner/)** - -> /* Summary */ -> This contains Jens' work to support FMODE_NOWAIT and thus IOCB_NOWAIT -> for pipes ensuring that all places can deal with non-blocking requests. -> -> To this end, pass down the information that this is a nonblocking -> request so that pipe locking, allocation, and buffer checking correctly -> deal with those.
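The pipe pull request is about kernel-internal plumbing (FMODE_NOWAIT and IOCB_NOWAIT), but the behaviour it lets io_uring's nonblocking attempt rely on is the one userspace already gets from O_NONBLOCK: a read on an empty pipe fails immediately with EAGAIN instead of sleeping. This minimal sketch, assuming Linux and glibc's `pipe2()`, demonstrates only that userspace-visible behaviour; it is not code from the series, and `empty_pipe_read_errno()` is a hypothetical name.

```c
#define _GNU_SOURCE /* exposes pipe2() in glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: read one byte from an empty pipe whose ends were
 * created with O_NONBLOCK. Returns the errno of the failed read (EAGAIN
 * is expected), 0 if the read unexpectedly succeeded, or -1 if the pipe
 * could not be created.
 */
int empty_pipe_read_errno(void)
{
    int fds[2];
    char c;

    if (pipe2(fds, O_NONBLOCK) < 0)
        return -1;

    /* Nothing has been written and the write end is still open, so a
     * blocking read would sleep here; the nonblocking one fails at once. */
    ssize_t n = read(fds[0], &c, 1);
    int err = (n < 0) ? errno : 0;

    close(fds[0]);
    close(fds[1]);
    return err;
}
```

A caller that gets EAGAIN can retry later or wait for readiness (for example with poll()), rather than tying up a thread inside the read.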
-> - -**[v1: fs/coredump: open coredump file in O_WRONLY instead of O_RDWR](http://lore.kernel.org/linux-fsdevel/20230420120409.602576-1-vsementsov@yandex-team.ru/)** - -> This makes it possible to make stricter apparmor profile and don't -> allow the program to read any coredump in the system. -> - -**[v2: shmem: Add user and group quota support for tmpfs](http://lore.kernel.org/linux-fsdevel/20230420080359.2551150-1-cem@kernel.org/)** - -> This is the version 2 of the quota support from tmpfs addressing some issues -> discussed on V1 and a few extra things, details are within each patch. Original -> cover-letter below. -> - -**[v5: Introduce block provisioning primitives](http://lore.kernel.org/linux-fsdevel/20230420004850.297045-1-sarthakkukreti@chromium.org/)** - -> Next revision of adding support for block provisioning requests. -> - -**[v2: ext4: Handle error pointers being returned from __filemap_get_folio](http://lore.kernel.org/linux-fsdevel/20230419120923.3152939-1-willy@infradead.org/)** - -> Commit "mm: return an ERR_PTR from __filemap_get_folio" changed from -> returning NULL to returning an ERR_PTR(). This cannot be fixed in either -> the ext4 tree or the mm tree, so this patch should be applied as part -> of merging the two trees. -> - -**[v10: Implement copy offload support](http://lore.kernel.org/linux-fsdevel/20230419114320.13674-1-nj.shetty@samsung.com/)** - -> The patch series covers the points discussed in November 2021 virtual -> call [LSF/MM/BFP TOPIC] Storage: Copy Offload [0]. -> We have covered the initial agreed requirements in this patchset and -> further additional features suggested by community. -> Patchset borrows Mikulas's token based approach for 2 bdev -> implementation. -> - -**[v1: Backport several fuse patches for 6.1.y](http://lore.kernel.org/linux-fsdevel/20230419095518.51373-1-yb203166@antfin.com/)** - -> Antgroup is using 5.10.y in product environment, we found several patches are -> missing in 5.10.y tree. 
These patches are needed for us. So we backported them -> to 5.10.y. Also backport to 5.15.y and 6.1.y to prevent regression. -> - -**[v1: Backport several fuse patches for 5.15.y](http://lore.kernel.org/linux-fsdevel/20230419095424.51328-1-yb203166@antfin.com/)** - -> Antgroup is using 5.10.y in product environment, we found several patches are -> missing in 5.10.y tree. These patches are needed for us. So we backported them -> to 5.10.y. Also backport to 5.15.y and 6.1.y to prevent regression. -> - -**[v1: Backport several fuse patches to 5.10.y](http://lore.kernel.org/linux-fsdevel/20230419094844.51110-1-yb203166@antfin.com/)** - -> Antgroup is using 5.10.y in product environment, we found several patches are -> missing in 5.10.y tree. These patches are needed for us. So we backported them -> to 5.10.y. Also backport to 5.15.y and 6.1.y to prevent regression. -> - -**[v4: Introduce provisioning primitives for thinly provisioned storage](http://lore.kernel.org/linux-fsdevel/20230418221207.244685-1-sarthakkukreti@chromium.org/)** - -> This patch series is revision 4 of introducing a new mechanism to pass through provision requests on stacked thinly provisioned storage devices. See [1] for original cover letter. -> -> [1] https://lore.kernel.org/lkml/ZDnMl8A1B1+Tfn5S@redhat.com/T/#md4f20113c2242755747ae069f84be720a6751012 -> - -**[v3: bpf-next: FUSE BPF: A Stacked Filesystem Extension for FUSE](http://lore.kernel.org/linux-fsdevel/20230418014037.2412394-1-drosen@google.com/)** - -> These patches extend FUSE to be able to act as a stacked filesystem. This -> allows pure passthrough, where the fuse file system simply reflects the lower -> filesystem, and also allows optional pre and post filtering in BPF and/or the -> userspace daemon as needed. This can dramatically reduce or even eliminate -> transitions to and from userspace. 
-> - **[v1: shmem: stable directory cookies](http://lore.kernel.org/linux-fsdevel/168175931561.2843.16288612382874559384.stgit@manet.1015granger.net/)** - -> The current cursor-based directory cookie mechanism doesn't work -> when a tmpfs filesystem is exported via NFS. This is because NFS -> clients do not open directories: each READDIR operation has to open -> the directory on the server, read it, then close it. The cursor -> state for that directory, being associated strictly with the opened -> struct file, is then discarded. -> - -**[v1: vfs: allow using kernel buffer during fiemap operation](http://lore.kernel.org/linux-fsdevel/bc30483b-7f9b-df4e-7143-8646aeb4b5a2@I-love.SAKURA.ne.jp/)** - -> syzbot is reporting circular locking dependency between ntfs_file_mmap() -> (which has mm->mmap_lock => ni->ni_lock => ni->file.run_lock dependency) -> and ntfs_fiemap() (which has ni->ni_lock => ni->file.run_lock => -> mm->mmap_lock dependency), for commit c4b929b85bdb ("vfs: vfs-level fiemap -> interface") implemented fiemap_fill_next_extent() using copy_to_user() -> where direct mm->mmap_lock dependency is inevitable. -> - -#### Network Devices - -**[v5: net-next: net/smc: Introduce SMC-D-based OS internal communication acceleration](http://lore.kernel.org/netdev/1682252271-2544-1-git-send-email-guwen@linux.alibaba.com/)** - -> We found SMC-D can be used to accelerate OS internal communication, such as -> loopback or between two containers within the same OS instance. So this patch -> set provides a kind of SMC-D dummy device (we call it the SMC-D loopback device) -> to emulate an ISM device, so that SMC-D can also be used on architectures -> other than s390. The SMC-D loopback device is designed as a system global -> device, visible to all containers. -> - -**[v4: net-next: tsnep: XDP socket zero-copy support](http://lore.kernel.org/netdev/20230421194656.48063-1-gerhard@engleder-embedded.com/)** - -> Implement XDP socket zero-copy support for tsnep driver. I tried to
I tried to -> follow existing drivers like igc as far as possible. But one main -> - -**[v3: net: netlink: Use copy_to_user() for optval in netlink_getsockopt().](http://lore.kernel.org/netdev/20230421185255.94606-1-kuniyu@amazon.com/)** - -> Brad Spencer provided a detailed report [0] that when calling getsockopt() -> for AF_NETLINK, some SOL_NETLINK options set only 1 byte even though such -> options require at least sizeof(int) as length. -> - -**[v5: bpf-next: bpf: add netfilter program type](http://lore.kernel.org/netdev/20230421170300.24115-1-fw@strlen.de/)** - -> Changes since last version: -> - rework test case in last patch wrt. ctx->skb dereference etc (Alexei) -> - pacify bpf ci tests, netfilter program type missed string translation -> in libbpf helper. -> - -**[v5: drivers/net/phy: add driver for Microchip LAN867x 10BASE-T1S PHY](http://lore.kernel.org/netdev/ZEK8Hvl0Zl%2F0NntI@debian/)** - -> This patch adds support for the Microchip LAN867x 10BASE-T1S family -> (LAN8670/1/2). The driver supports P2MP with PLCA. -> - -**[v2: can: virtio: Initial virtio CAN driver.](http://lore.kernel.org/netdev/20230421145653.12811-1-Mikhail.Golubev-Ciuchea@opensynergy.com/)** - -> This is version 3 of the driver after having gotten review comments. -> - -**[v1: net-next: net: dsa: MT7530, MT7531, and MT7988 improvements](http://lore.kernel.org/netdev/20230421143648.87889-1-arinc.unal@arinc9.com/)** - -> This patch series is focused on simplifying the code, and improving the -> logic of the support for MT7530, MT7531, and MT7988 SoC switches. -> -> There's also a fix for the switch on the MT7988 SoC. -> - -#### Asynchronous I/O - -**[v1: io_uring: honor I/O nowait flag for read/write](http://lore.kernel.org/io-uring/20230421172822.8053-1-kch@nvidia.com/)** - -> When IO_URING_F_NONBLOCK is set on io_kiocb req->flag in io_write() or -> io_read() IOCB_NOWAIT is set for kiocb when passed it to the respective -> rw_iter callback. This sets REQ_NOWAIT for underlaying I/O.
The result -> is low level driver always sees block layer request as REQ_NOWAIT even -> if user has submitted request with nowait = 0 e.g. fio nowait=0. -> - -**[v1: tools/io_uring: Add .gitignore](http://lore.kernel.org/io-uring/tencent_C8F457D8D10F44760333A1E1AC9B4B0C1507@qq.com/)** - -> Ignore {io_uring-bench,io_uring-cp}. -> - -**[v2: io_uring: Pass the whole sqe to commands](http://lore.kernel.org/io-uring/20230421114440.3343473-1-leitao@debian.org/)** - -> These three patches prepare for the sock support in the io_uring cmd, as -> described in the following RFC: -> -> https://lore.kernel.org/lkml/20230406144330.1932798-1-leitao@debian.org/ -> - -**[v1: test/file-verify.t: Don't run over mlock limit when run as non-root](http://lore.kernel.org/io-uring/20230420185728.4104-1-krisman@suse.de/)** - -> test/file-verify tries to get 2MB of pinned memory at once, which is -> higher than the default allowed for non-root users in older -> kernels (64kb before v5.16, nowadays 8mb). Skip the test for non-root -> users if the registration fails instead of failing the test. -> - -**[v1: Support for mapping SQ/CQ rings into huge page](http://lore.kernel.org/io-uring/20230419224805.693734-1-axboe@kernel.dk/)** - -> io_uring SQ/CQ rings are allocated by the kernel from contigious, normal -> pages, and then the application mmap()'s the rings into userspace. This -> works fine, but does require contigious pages to be available for the -> given SQ and CQ ring sizes. As uptime increases on a given system, so -> does memory fragmentation. Entropy is invevitable. 
-> - -**[v1: io_uring: Pass whole sqe to commands](http://lore.kernel.org/io-uring/20230419102930.2979231-1-leitao@debian.org/)** - -> These two patches prepares for the sock support in the io_uring cmd, as -> described in the following RFC: -> -> https://lore.kernel.org/lkml/20230406144330.1932798-1-leitao@debian.org/ -> - -**[v1: io_uring: Optimization of buffered random write](http://lore.kernel.org/io-uring/20230419092233.56338-1-luhongfei@vivo.com/)** - -> The buffered random write performance of io_uring is poor -> due to the following reason: -> By default, when performing buffered random writes, io_sq_thread -> will call io_issue_sqe writes req, but due to the setting of -> IO_URING_F_NONBLOCK, req is executed asynchronously in iou-wrk, -> where io_wq_submit_work calls io_issue_sqe completes the write req, -> with issue_flag as IO_URING_F_UNLOCKED | IO_URING_F_IOWQ, -> which will reduce performance. -> This patch will determine whether this req is a buffered random write, -> and if so, io_sq_thread directly calls io_issue_sqe(req, 0) -> completes req instead of completing it asynchronously in iou wrk. -> - -**[v4: io_uring: add support for multishot timeouts](http://lore.kernel.org/io-uring/20230418225817.1905027-1-davidhwei@meta.com/)** - -> A multishot timeout submission will repeatedly generate completions with -> the IORING_CQE_F_MORE cflag set. -> - -**[v1: for-next: another round of rsrc refactoring](http://lore.kernel.org/io-uring/cover.1681822823.git.asml.silence@gmail.com/)** - -> The main part is Patch 3, which establishes 1:1 relation between -> struct io_rsrc_put and nodes, which removes io_rsrc_node_switch() / -> io_rsrc_node_switch_start() and all the additional complexity with -> pre allocations. Note, it doesn't change any guarantees as -> io_queue_rsrc_removal() was doing allocations anyway and could -> always fail. 
-> - -**[v1: liburing: io_uring sendto](http://lore.kernel.org/io-uring/20230415165821.791763-1-ammarfaizi2@gnuweeb.org/)** - -> There are two patches in this series. The first patch adds -> io_uring_prep_sendto() function. The second patch addd the -> manpage and CHANGELOG. -> - -#### Rust For Linux - -**[v1: v4.1: rust: lock: introduce `SpinLock`](http://lore.kernel.org/rust-for-linux/20230419174426.132207-1-wedsonaf@gmail.com/)** - -> This is the `spinlock_t` lock backend and allows Rust code to use the -> kernel spinlock idiomatically. -> - -**[v1: .gitattributes: set diff driver for Rust source code files](http://lore.kernel.org/rust-for-linux/20230418233048.335281-1-ojeda@kernel.org/)** - -> Git supports a builtin Rust diff driver [1] since v2.23.0 (2019). -> -> It improves the choice of hunk headers in some cases, such as -> - -**[v1: Rust 1.68.2 upgrade](http://lore.kernel.org/rust-for-linux/20230418214347.324156-1-ojeda@kernel.org/)** - -> This is the first upgrade to the Rust toolchain since the initial Rust -> merge, from 1.62.0 to 1.68.2 (i.e. the latest). -> - -#### BPF - -**[v4: bpf-next: bpftool: Show map IDs along with struct_ops links.](http://lore.kernel.org/bpf/20230421214131.352662-1-kuifeng@meta.com/)** - -> A new link type, BPF_LINK_TYPE_STRUCT_OPS, was added to attach -> struct_ops to links. (226bc6ae6405) It would be helpful for users to -> know which map is associated with the link. -> - -**[v1: bpf-next: selftests/bpf: verifier/prevent_map_lookup converted to inline assembly](http://lore.kernel.org/bpf/20230421204514.2450907-1-eddyz87@gmail.com/)** - -> Test verifier/prevent_map_lookup automatically converted to use inline assembly. -> -> This was a part of a series [1] but could not be applied becuase -> another patch from a series had to be witheld. -> - -**[v1: bpf-next: Second set of verifier/*.c migrated to inline assembly](http://lore.kernel.org/bpf/20230421174234.2391278-1-eddyz87@gmail.com/)** - -> This is a follow up for RFC [1]. 
It migrates a second batch of 23 -> verifier/*.c tests to inline assembly and use of ./test_progs for -> actual execution. Link to the first batch is [2]. -> - -**[v1: Dump map id instead of value for map_of_maps types](http://lore.kernel.org/bpf/20230421101154.23690-1-kuro@kuroa.me/)** - -> When using `bpftool map dump` in plain format, it is usually -> more convenient to show the inner map id instead of raw value. -> Changing this behavior would help with quick debugging with -> `bpftool`, without disruption scripted behavior. Since user -> could dump the inner map with id, but need to convert value. -> - -**[v2: bpf-next: Introduce a new kfunc of bpf_task_under_cgroup](http://lore.kernel.org/bpf/20230421090403.15515-1-zhoufeng.zf@bytedance.com/)** - -> Trace sched related functions, such as enqueue_task_fair, it is necessary to -> specify a task instead of the current task which within a given cgroup. -> - -**[v1: bpf-next: selftests/xsk: put MAP_HUGE_2MB in correct argument](http://lore.kernel.org/bpf/20230421062208.3772-1-magnus.karlsson@gmail.com/)** - -> Put the flag MAP_HUGE_2MB in the correct flags argument instead of the -> wrong offset argument. -> - -**[v3: bpf-next: net/smc: Introduce BPF injection capability](http://lore.kernel.org/bpf/1682051033-66125-1-git-send-email-alibuda@linux.alibaba.com/)** - -> This patches attempt to introduce BPF injection capability for SMC, -> and add selftest to ensure code stability. -> -> As we all know that the SMC protocol is not suitable for all scenarios, -> especially for short-lived. However, for most applications, they cannot -> guarantee that there are no such scenarios at all. Therefore, apps -> may need some specific strategies to decide shall we need to use SMC -> or not, for example, apps can limit the scope of the SMC to a specific -> IP address or port. 
-> - -**[v2: bpf: Socket lookup BPF API from tc/xdp ingress does not respect VRF bindings.](http://lore.kernel.org/bpf/20230420145041.508434-1-gilad9366@gmail.com/)** - -> When calling socket lookup from L2 (tc, xdp), VRF boundaries aren't -> respected. This patchset fixes this by regarding the incoming device's -> VRF attachment when performing the socket lookups from tc/xdp. -> - -**[v1: net-next: net: lan966x: Don't use xdp_frame when action is XDP_TX](http://lore.kernel.org/bpf/20230420121152.2737625-1-horatiu.vultur@microchip.com/)** - -> When the action of an xdp program was XDP_TX, lan966x was creating -> a xdp_frame and use this one to send the frame back. But it is also -> possible to send back the frame without needing a xdp_frame, because -> it possible to send it back using the page. -> And then once the frame is transmitted is possible to use directly -> page_pool_recycle_direct as lan966x is using page pools. -> This would save some CPU usage on this path. -> - -**[v5: tracing: Add fprobe events](http://lore.kernel.org/bpf/168198993129.1795549.8306571027057356176.stgit@mhiramat.roam.corp.google.com/)** - -> Here is the 5th version of improve fprobe and add a basic fprobe event -> support for ftrace (tracefs) and perf. Here is the previous version. -> - -**[v1: bpf-next: Introduce a new bpf helper of bpf_task_under_cgroup](http://lore.kernel.org/bpf/20230420072657.80324-1-zhoufeng.zf@bytedance.com/)** - -> Trace sched related functions, such as enqueue_task_fair, it is necessary to -> specify a task instead of the current task which within a given cgroup to a map. -> - -**[v2: bpf-next: Dynptr helpers](http://lore.kernel.org/bpf/20230420071414.570108-1-joannelkoong@gmail.com/)** - -> This patchset is the 3rd in the dynptr series. The 1st (dynptr -> fundamentals) can be found here [0] and the second (skb + xdp dynptrs) -> can be found here [1]. 
-> - -**[v2: bpf-next: Access variable length array relaxed for integer type](http://lore.kernel.org/bpf/20230420032735.27760-1-zhoufeng.zf@bytedance.com/)** - -> Add support for integer type of accessing variable length array. -> Add a selftest to check it. -> - -**[v1: bpf-next: bpftool: Replace "__fallthrough" by a comment to address merge conflict](http://lore.kernel.org/bpf/20230420003333.90901-1-quentin@isovalent.com/)** - -> The recent support for inline annotations in control flow graphs -> generated by bpftool introduced the usage of the "__fallthrough" macro -> in a switch/case block in btf_dumper.c. This change went through the -> bpf-next tree, but resulted in a merge conflict in linux-next, because -> this macro has been renamed "fallthrough" (no underscores) in the -> meantime. -> - -**[v1: bpf-next: bpf: handle another corner case in getsockopt](http://lore.kernel.org/bpf/20230418225343.553806-1-sdf@google.com/)** - -> Martin reports another case where getsockopt EFAULTs perfectly -> valid callers. Let's fix it and also replace EFAULT with -> pr_info_ratelimited. That should hopefully make this place -> less error prone. -> - -**[v2: vmlinux.lds.h: Discard .note.gnu.property section](http://lore.kernel.org/bpf/20230418214925.ay3jpf2zhw75kgmd@treble/)** - -> When tooling reads ELF notes, it assumes each note entry is aligned to -> the value listed in the .note section header's sh_addralign field. -> -> The kernel-created ELF notes in the .note.Linux and .note.Xen sections -> are aligned to 4 bytes. This causes the toolchain to set those -> sections' sh_addralign values to 4. -> - -**[v1: bpf-next: bpftool: Register struct_ops with a link.](http://lore.kernel.org/bpf/20230418200058.603169-1-kuifeng@meta.com/)** - -> You can include an optional path after specifying the object name for the -> 'struct_ops register' subcommand. 
-> -> Since the commit 226bc6ae6405 ("Merge branch 'Transit between BPF TCP -> congestion controls.'") has been accepted, it is now possible to create a -> link for a struct_ops. This can be done by defining a struct_ops in -> SEC(".struct_ops.link") to make libbpf returns a real link. If we don't pin -> the links before leaving bpftool, they will disappear. To instruct bpftool -> to pin the links in a directory with the names of the maps, we need to -> provide the path of that directory. -> - -**[v6: bpf-next: bpf: Add socket destroy capability](http://lore.kernel.org/bpf/20230418153148.2231644-1-aditi.ghag@isovalent.com/)** - -> This patch adds the capability to destroy sockets in BPF. We plan to use -> the capability in Cilium to force client sockets to reconnect when their -> remote load-balancing backends are deleted. The other use case is -> on-the-fly policy enforcement where existing socket connections prevented -> by policies need to be terminated. -> - -**[v2: bpf-next: XDP-hints: XDP kfunc metadata for driver igc](http://lore.kernel.org/bpf/168182460362.616355.14591423386485175723.stgit@firesoul/)** - -> Implement both RX hash and RX timestamp XDP hints kfunc metadata -> for driver igc. -> - -### Related Technology Updates - -#### Qemu - -**[v8: target/riscv: rework CPU extension validation](http://lore.kernel.org/qemu-devel/20230421132727.121462-1-dbarboza@ventanamicro.com/)** - -> This version dropped patch 12 from v7. Alistair mentioned that it would -> limiti static CPUs needlesly, since there's nothing preventing a static -> CPU to allow for extension changes during runtime, and that misa-w is -> enough to prevent write_misa() during runtime. I agree. -> - -**[v1: hw/riscv: virt: Enable booting M-mode or S-mode FW from pflash0](http://lore.kernel.org/qemu-devel/20230421043353.125701-1-sunilvl@ventanamicro.com/)** - -> Currently, virt machine supports two pflash instances each with -> 32MB size.
However, the first pflash is always assumed to -> contain M-mode firmware and reset vector is set to this if -> enabled. Hence, for S-mode payloads like EDK2, only one pflash -> instance is available for use. This means both code and NV variables -> of EDK2 will need to use the same pflash. -> - -**[v3: riscv: Make sure an exception is raised if a pte is malformed](http://lore.kernel.org/qemu-devel/20230420150220.60919-1-alexghiti@rivosinc.com/)** - -> As per the specification, in 64-bit, if any of the pte reserved bits -> Memory Protection"). In addition, we must check the napot/pbmt bits are -> not set if those extensions are not active. -> - -**[v1: target/riscv: add Ventana's Veyron V1 CPU](http://lore.kernel.org/qemu-devel/20230418123624.16414-1-dbarboza@ventanamicro.com/)** - -> Add a virtual CPU for Ventana's first CPU named veyron-v1. It runs -> exclusively for the rv64 target. It's tested with the 'virt' board. -> - -**[v7: target/riscv: rework CPU extensions validation](http://lore.kernel.org/qemu-devel/20230417140013.58893-1-dbarboza@ventanamicro.com/)** - -> In this v7 we have three extra patches: -> -> - patch 4 [1] and 5 [2], both from Weiwei Li, addresses an issue that -> we're going to have with Zca and RVC if we push the priv spec -> disabling code to the end of validation. More details can be seen on -> [3]. Patch 5 commit message also has some context on it; -> - -**[v2: Add RISC-V vector cryptographic instruction set support](http://lore.kernel.org/qemu-devel/20230417135821.609964-1-lawrence.hunter@codethink.co.uk/)** - -> This patchset provides an implementation for Zvbb, Zvbc, Zvkned, Zvknh, Zvksh, -> Zvkg, and Zvksed of the draft RISC-V vector cryptography extensions as per the -> v20230407 version of the specification(1) (3206f07). This is an update to the -> patchset submitted to qemu-devel on Friday, 10 Mar 2023 16:03:01 +0000. 
-> - -**[v2: target/riscv: Restore the predicate() NULL check behavior](http://lore.kernel.org/qemu-devel/20230417043054.3125614-1-bmeng@tinylab.org/)** - -> When reading a non-existent CSR QEMU should raise illegal instruction -> exception, but currently it just exits due to the g_assert() check. -> -> This actually reverts commit 0ee342256af9205e7388efdf193a6d8f1ba1a617. -> Some comments are also added to indicate that predicate() must be -> provided for an implemented CSR. -> - -**[v1: riscv: implement Ssqosid extension and CBQRI controllers](http://lore.kernel.org/qemu-devel/20230416232050.4094820-1-dfustini@baylibre.com/)** - -> This RFC series implements the Ssqosid extension and the sqoscfg CSR as -> defined in the RISC-V Capacity and Bandwidth Controller QoS Register -> Interface (CBQRI) specification [1]. Quality of Service (QoS) in this -> context is concerned with shared resources on an SoC such as cache -> capacity and memory bandwidth. -> - -#### U-Boot - -**[v5: Add StarFive JH7110 PCIe drvier support](http://lore.kernel.org/u-boot/20230423105859.125764-1-minda.chen@starfivetech.com/)** - -> This patchset needs to apply after patchset in [1]. These PCIe series patches -> are based on the JH7110 RISC-V SoC and VisionFive V2 board. -> -> [1] https://patchwork.ozlabs.org/project/uboot/cover/20230329034224.26545-1-yanhong.wang@starfivetech.com -> - -**[v1: u-boot-riscv/master](http://lore.kernel.org/u-boot/ZEHbqoEXAB+BAtmo@ubuntu01/)** - -> The following changes since commit 5db4972a5bbdbf9e3af48ffc9bc4fec73b7b6a79: -> -> Merge tag 'u-boot-nand-20230417' of https://source.denx.de/u-boot/custodians/u-boot-nand-flash (2023-04-17 10:47:33 -0400) -> - -**[v1: riscv: visionfive2: use OF_BOARD_SETUP](http://lore.kernel.org/u-boot/20230419112801.GA1907@lst.de/)** - -> U-Boot already has a mechanism to fix up the DT before OS boot. -> This avoids the excessive duplication of data and work proposed -> by the explicit separation of 1.2a and 1.3b board revisions. 
It -> will also, to a good degree, improve the user experience, as -> pointed out by Matthias. -> - -## 20230416: Issue 42 - -### Kernel Updates - -#### RISC-V Architecture Support - -**[v18: -next: riscv: Add vector ISA support](http://lore.kernel.org/linux-riscv/20230414155843.12963-1-andy.chiu@sifive.com/)** - -> This patchset is implemented based on vector 1.0 spec to add vector support -> in riscv Linux kernel. There are some assumptions for this implementations. -> - -**[v1: riscv: mm: execute local TLB flush after populating vmemmap](http://lore.kernel.org/linux-riscv/20230414081605.471375-1-vincent.chen@sifive.com/)** - -> The spare_init() calls memmap_populate() many times to create VA to PA -> mapping for the VMEMMAP area, where all "strcut page" are located once -> CONFIG_SPARSEMEM_VMEMMAP is defined. These "struct page" are later -> initialized in the zone_sizes_init() function. However, during this -> process, no sfence.vma instruction is executed for this VMEMMAP area. -> This omission may cause the hart to fail to perform page table work -> because some data related to the address translation is invisible to the -> hart. To solve this issue, the local_flush_tlb_kernel_range() is called -> right after the spare_init() to execute a sfence.vma instruction for the -> VMEMMAP area, ensuring that all data related to the address translation -> is visible to the hart. -> - -**[v3: Add PLL clocks driver for StarFive JH7110 SoC](http://lore.kernel.org/linux-riscv/20230414024157.53203-1-xingyu.wu@starfivetech.com/)** - -> This patch serises are to add PLL clocks driver and providers by writing -> and reading syscon registers for the StarFive JH7110 RISC-V SoC. And add -> documentation to describe StarFive System Controller(syscon) Registers.
-> - -**[v1: riscv: Allow userspace to directly access perf counters](http://lore.kernel.org/linux-riscv/20230413161725.195417-1-alexghiti@rivosinc.com/)** - -> riscv used to allow direct access to cycle/time/instret counters, -> bypassing the perf framework, this patchset intends to allow the user to -> mmap any counter when accessed through perf. But we can't break the -> existing behaviour so we introduce a sysctl perf_user_access like arm64 -> does, which defaults to the legacy mode described above. -> - -**[v8: Add non-coherent DMA support for AX45MP](http://lore.kernel.org/linux-riscv/20230412110900.69738-1-prabhakar.mahadev-lad.rj@bp.renesas.com/)** - -> On the Andes AX45MP core, cache coherency is a specification option so it -> may not be supported. In this case DMA will fail. To get around with this -> issue this patch series does the below: -> -> 1] Andes alternative ports is implemented as errata which checks if the IOCP -> is missing and only then applies to CMO errata. One vendor specific SBI EXT -> (ANDES_SBI_EXT_IOCP_SW_WORKAROUND) is implemented as part of errata. -> - -**[v4: Add JH7110 MIPI DPHY RX support](http://lore.kernel.org/linux-riscv/20230412084540.295411-1-changhuang.liang@starfivetech.com/)** - -> This patchset adds mipi dphy rx driver for the StarFive JH7110 SoC. -> It is used to transfer CSI camera data. The series has been tested on -> the VisionFive 2 board. -> - -**[v4: Add new partial clock and reset drivers for StarFive JH7110](http://lore.kernel.org/linux-riscv/20230411135558.44282-1-xingyu.wu@starfivetech.com/)** - -> This patch serises are base on the basic JH7110 SYSCRG/AONCRG -> drivers and add new partial clock drivers and reset supports -> about System-Top-Group(STG), Image-Signal-Process(ISP) -> and Video-Output(VOUT) for the StarFive JH7110 RISC-V SoC. These -> clocks and resets could be used by DMA, VIN and Display modules. 
-> - -**[v16: Microchip Soft IP corePWM driver](http://lore.kernel.org/linux-riscv/20230411-wizard-cautious-3c048db6b4d2@wendy/)** - -> Uwe & I had a long back and forth about period calculations on v13, -> my ultimate conclusion being that, after some testing of the "corrected" -> calculation in hardware, the original calculation was correct. -> I think we had gotten sucked into discussion the calculation of the -> period itself, when we were in fact trying to calculate a bound on the -> period instead. That discussion is here: -> https://lore.kernel.org/linux-pwm/Y+ow8tfAHo1yv1XL@wendy/ -> - -**[v1: Add JH7110 cpufreq support](http://lore.kernel.org/linux-riscv/20230411083257.16155-1-mason.huo@starfivetech.com/)** - -> The StarFive JH7110 SoC has four RISC-V cores, -> and it supports up to 4 cpu frequency loads. -> -> This patchset adds the compatible strings into the allowlist -> for supporting the generic cpufreq driver on JH7110 SoC. -> Also, it enables the axp15060 pmic for the cpu power source. -> - -**[v1: Add JH7110 DPHY PMU support](http://lore.kernel.org/linux-riscv/20230411064743.273388-1-changhuang.liang@starfivetech.com/)** - -> This patchset adds mipi dphy power domain driver for the StarFive JH7110 -> SoC. It is used to turn on dphy power switch. The series has been tested -> on the VisionFive 2 board. -> - -**[v4: -next: support allocating crashkernel above 4G explicitly on riscv](http://lore.kernel.org/linux-riscv/20230410130553.3226347-1-chenjiahao16@huawei.com/)** - -> On riscv, the current crash kernel allocation logic is trying to -> allocate within 32bit addressible memory region by default, if -> failed, try to allocate without 4G restriction. 
-> - -**[v1: RISC-V: Detect Ssqosid extension and handle sqoscfg CSR](http://lore.kernel.org/linux-riscv/20230410043646.3138446-1-dfustini@baylibre.com/)** - -> This RFC series adds initial support for the Ssqosid extension and the -> sqoscfg CSR as specified in Chapter 2 of the RISC-V Capacity and -> Bandwidth Controller QoS Register Interface (CBQRI) specification [1]. -> - -**[v1: ata: Change email addresses in MAINTAINERS](http://lore.kernel.org/linux-riscv/20230410042646.124962-1-dlemoal@kernel.org/)** - -> Change my email addresses referenced in the MAINTAINERS file for the ata -> subsystem to dlemoal@kernel.org. While at it, also change other -> references (zonefs and k210 drivers) to the same address. -> - -**[v1: riscv: enable BUILDTIME_TABLE_SORT for !MMU](http://lore.kernel.org/linux-riscv/20230409164306.3801-1-jszhang@kernel.org/)** - -> BUILDTIME_TABLE_SORT works for !MMU as well, so enable it. -> - -#### Process Scheduling - -**[v2: sched/topology: add for_each_numa_cpu() macro](http://lore.kernel.org/lkml/20230415050617.324288-1-yury.norov@gmail.com/)** - -> for_each_cpu() is widely used in kernel, and it's beneficial to create -> a NUMA-aware version of the macro. -> -> Recently added for_each_numa_hop_mask() works, but switching existing -> codebase to it is not an easy process. -> - -**[v6: sched/numa: add per-process numa_balancing](http://lore.kernel.org/lkml/20230412140701.58337-1-ligang.bdlg@bytedance.com/)** - -> # Introduce -> Add PR_NUMA_BALANCING in prctl. -> -> A large number of page faults will cause performance loss when numa -> balancing is performing. Thus those processes which care about worst-case -> performance need numa balancing disabled. Others, on the contrary, allow a -> temporary performance loss in exchange for higher average performance, so -> enable numa balancing is better for them.
-> - -**[v1: sched/core: Make sched_dynamic_mutex static](http://lore.kernel.org/lkml/016987c1ec4649b74973a000e81c35e48ba6072e.1681277194.git.jpoimboe@kernel.org/)** - -> The sched_dynamic_mutex is only used within the file. Make it static. -> - -**[v1: sched: Rate limit migrations](http://lore.kernel.org/lkml/20230411214116.361016-1-mathieu.desnoyers@efficios.com/)** - -> This WIP patch rate-limits migrations to 32 migrations per 10ms window -> for each task. -> - -#### Memory Management - -**[v8: mm: process/cgroup ksm support](http://lore.kernel.org/linux-mm/20230415225913.3206647-1-shr@devkernel.io/)** - -> So far KSM can only be enabled by calling madvise for memory regions. To -> be able to use KSM for more workloads, KSM needs to have the ability to be -> enabled / disabled at the process / cgroup level. -> - -**[v5: Replace invocations of prandom_u32() with get_random_u32()](http://lore.kernel.org/linux-mm/20230415173549.5345-1-david.keisarschm@mail.huji.ac.il/)** - -> The security improvements for prandom_u32 done in commits c51f8f88d705 -> from October 2020 and d4150779e60f from May 2022 didn't handle the cases -> when prandom_bytes_state() and prandom_u32_state() are used. -> - -**[v1: mm: rename reclaim_pages() to reclaim_folios()](http://lore.kernel.org/linux-mm/20230415092716.61970-1-wangkefeng.wang@huawei.com/)** - -> As commit a83f0551f496 ("mm/vmscan: convert reclaim_pages() to use -> a folio") changes the arg from page_list to folio_list, but not -> the defination, let's correct it and rename it to reclaim_folios too. -> - -**[v2: mm: make arch_has_descending_max_zone_pfns() static](http://lore.kernel.org/linux-mm/20230415081904.969049-1-arnd@kernel.org/)** - -> clang produces a build failure on x86 for some randconfig builds -> after a change that moves around code to mm/mm_init.c: -> -> Cannot find symbol for section 2: .text.
-> mm/mm_init.o: failed -> - -**[v1: NFSD memory allocation optimizations](http://lore.kernel.org/linux-mm/168151777579.1588.7882383278745556830.stgit@klimt.1015granger.net/)** - -> I've found a few ways to optimize the release of pages in NFSD. -> Please let me know if I'm abusing the release_pages() and pagevec -> APIs. -> - -**[v1: mm/folio: Avoid special handling for order value 0 in folio_set_order](http://lore.kernel.org/linux-mm/20230414194832.973194-1-tsahu@linux.ibm.com/)** - -> folio_set_order(folio, 0); which is an abuse of folio_set_order as 0-order -> folio does not have any tail page to set order. folio->_folio_nr_pages is -> set to 0 for order 0 in folio_set_order. It is required because -> _folio_nr_pages overlapped with page->mapping and leaving it non zero -> caused "bad page" error while freeing gigantic hugepages. This was fixed in -> Commit ba9c1201beaa ("mm/hugetlb: clear compound_nr before freeing gigantic -> pages"). Also commit a01f43901cfb ("hugetlb: be sure to free demoted CMA -> pages to CMA") now explicitly clear page->mapping and hence we won't see -> the bad page error even if _folio_nr_pages remains unset. Also the order 0 -> folios are not supposed to call folio_set_order, So now we can get rid of -> folio_set_order(folio, 0) from hugetlb code path to clear the confusion. -> - -**[v4: modules/kmod: replace implementation with a sempahore](http://lore.kernel.org/linux-mm/20230414171644.2434448-1-mcgrof@kernel.org/)** - -> Changes on this v4: -> -> o Really add Matthew Wilcox' preferred tribal knowledge docs -> o Add all the pending tags -> - -**[v1: lib/percpu_counter, cpu/hotplug: Cure the cpu_dying_mask woes](http://lore.kernel.org/linux-mm/20230414162755.281993820@linutronix.de/)** - -> The cpu_dying_mask is not only undocumented but also to some extent a -> misnomer. It's purpose is to capture the last direction of a cpu_up() or -> cpu_down() operation taking eventual rollback operations into account. 
-> - -**[v5: Introduce Copy-On-Write to Page Table](http://lore.kernel.org/linux-mm/20230414142341.354556-1-shiyn.lin@gmail.com/)** - -> This patch is primarily aimed at optimizing the memory usage of page -> table in processes with large address space, which can potentailly lead -> to improved the fork system calll latency under certain conditions. -> However, we're planning to improve the fork latency in the future but -> not in this patch. -> - -**[v1: mm: page_alloc: Skip regions with hugetlbfs pages when allocating 1G pages](http://lore.kernel.org/linux-mm/20230414141429.pwgieuwluxwez3rj@techsingularity.net/)** - -> A bug was reported by Yuanxi Liu where allocating 1G pages at runtime is -> taking an excessive amount of time for large amounts of memory. Further -> testing allocating huge pages that the cost is linear i.e. if allocating -> 1G pages in batches of 10 then the time to allocate nr_hugepages from -> 10->20->30->etc increases linearly even though 10 pages are allocated at -> each step. Profiles indicated that much of the time is spent checking the -> validity within already existing huge pages and then attempting a migration -> that fails after isolating the range, draining pages and a whole lot of -> other useless work. -> - -**[v1: mm: page_alloc: Assume huge tail pages are valid when allocating contiguous pages](http://lore.kernel.org/linux-mm/20230414082222.idgw745cgcduzy37@techsingularity.net/)** - -> A bug was reported by Yuanxi Liu where allocating 1G pages at runtime is -> taking an excessive amount of time for large amounts of memory. Further -> testing allocating huge pages that the cost is linear i.e. if allocating -> 1G pages in batches of 10 then the time to allocate nr_hugepages from -> 10->20->30->etc increases linearly even though 10 pages are allocated at -> each step. 
-> - -**[v3: module: avoid userspace pressure on unwanted allocations](http://lore.kernel.org/linux-mm/20230414050836.1984746-1-mcgrof@kernel.org/)** - -> This v3 series follows up on the second iteration of these patches [0]. This -> and other pending changes are avaiable on 20230413-module-alloc-opts -> branch [1] which is based on modules-next. -> - -**[v2: mm: ksm: support hwpoison for ksm page](http://lore.kernel.org/linux-mm/20230414021741.2597273-1-xialonglong1@huawei.com/)** - -> Currently, ksm does not support hwpoison. As ksm is being used more widely -> for deduplication at the system level, container level, and process level, -> supporting hwpoison for ksm has become increasingly important. However, ksm -> pages were not processed by hwpoison in 2009 [1]. -> - -**[v1: migrate_pages: Never block waiting for the page lock](http://lore.kernel.org/linux-mm/20230413182313.RFC.1.Ia86ccac02a303154a0b8bc60567e7a95d34c96d3@changeid/)** - -> Currently when we try to do page migration and we're in "synchronous" -> mode (and not doing direct compaction) then we'll wait an infinite -> amount of time for a page lock. This does not appear to be a great -> idea. -> - -**[v1: Setting memory policy for restrictedmem file](http://lore.kernel.org/linux-mm/cover.1681430907.git.ackerleytng@google.com/)** - -> This patchset builds upon the memfd_restricted() system call that was -> discussed in the 'KVM: mm: fd-based approach for supporting KVM' patch -> series [1]. -> - -**[v1: change ->index to PAGE_SIZE for hugetlb pages](http://lore.kernel.org/linux-mm/20230413231452.84529-1-sidhartha.kumar@oracle.com/)** - -> This RFC patch series attempts to simplify the page cache code by removing -> special casing code for hugetlb pages. Normal pages in the page cache are -> indexed by PAGE_SIZE while hugetlb pages are indexed by their huge page -> size. This was previously tried but the xarray was not performant enough -> for the changes. 
-> - -**[v2: -next: mm: hwpoison: support recovery from HugePage copy-on-write faults](http://lore.kernel.org/linux-mm/20230413131349.2524210-1-liushixin2@huawei.com/)** - -> copy-on-write of hugetlb user pages with uncorrectable errors will result -> in a kernel crash. This is because the copy is performed in kernel mode -> and in general we can not handle accessing memory with such errors while -> in kernel mode. Commit a873dfe1032a ("mm, hwpoison: try to recover from -> copy-on write faults") introduced the routine copy_user_highpage_mc() to -> gracefully handle copying of user pages with uncorrectable errors. However, -> the separate hugetlb copy-on-write code paths were not modified as part -> of commit a873dfe1032a. -> - -**[v6: Ignore non-LRU-based reclaim in memcg reclaim](http://lore.kernel.org/linux-mm/20230413104034.1086717-1-yosryahmed@google.com/)** - -> Upon running some proactive reclaim tests using memory.reclaim, we -> noticed some tests flaking where writing to memory.reclaim would be -> successful even though we did not reclaim the requested amount fully. -> Looking further into it, I discovered that *sometimes* we overestimate -> the number of reclaimed pages in memcg reclaim. -> - -**[v1: printk: Export console trace point for kcsan/kasan/kfence/kmsan](http://lore.kernel.org/linux-mm/20230413100859.1492323-1-quic_pkondeti@quicinc.com/)** - -> The console tracepoint is used by kcsan/kasan/kfence/kmsan test -> modules. Since this tracepoint is not exported, these modules iterate -> over all available tracepoints to find the console trace point. -> Export the trace point so that it can be directly used.
-> - -**[v7: ksm: support tracking KSM-placed zero-pages](http://lore.kernel.org/linux-mm/202304131346489021903@zte.com.cn/)** - -> The core idea of this patch set is to enable users to perceive the number -> of any pages merged by KSM, regardless of whether use_zero_page switch has -> been turned on, so that users can know how much free memory increase is -> really due to their madvise(MERGEABLE) actions. But the problem is, when -> enabling use_zero_pages, all empty pages will be merged with kernel zero -> pages instead of with each other as use_zero_pages is disabled, and then -> these zero-pages are no longer monitored by KSM. -> - -**[v1: mm: hwpoison: coredump: support recovery from dump_user_range()](http://lore.kernel.org/linux-mm/20230413041336.26874-1-wangkefeng.wang@huawei.com/)** - -> The dump_user_range() is used to copy the user page to a coredump -> file, but if a hardware memory error occurred during copy, which -> called from __kernel_write_iter() in dump_user_range(), it crashes, -> - -**[v1: selftests/mm: Replace obsolete memalign() with posix_memalign()](http://lore.kernel.org/linux-mm/20230413012751.4445-1-wangdeming@inspur.com/)** - -> memalign() is obsolete according to its manpage. -> -> Replace memalign() with posix_memalign(). -> - -**[v1: mm: huge_memory: Replace obsolete memalign() with posix_memalign()](http://lore.kernel.org/linux-mm/20230413011719.4355-1-wangdeming@inspur.com/)** - -> memalign() is obsolete according to its manpage. -> -> Replace memalign() with posix_memalign(). -> - -**[v2: mm: hugetlb_vmemmap: provide stronger vmemmap allocation guarantees](http://lore.kernel.org/linux-mm/20230412195939.1242462-1-pasha.tatashin@soleen.com/)** - -> HugeTLB pages have a struct page optimizations where struct pages for tail -> pages are freed. However, when HugeTLB pages are destroyed, the memory for -> struct pages (vmemmap) need to be allocated again.
-> - -**[v1: mm: hugetlb_vmemmap: provide stronger vmemmap allocaction gurantees](http://lore.kernel.org/linux-mm/20230412152337.1203254-1-pasha.tatashin@soleen.com/)** - -> HugeTLB pages have a struct page optimizations where struct pages for tail -> pages are freed. However, when HugeTLB pages are destroyed, the memory for -> struct pages (vmemmap) need to be allocated again. -> - -#### Filesystems - -**[v1: fanotify: support watching filesystems and mounts inside userns](http://lore.kernel.org/linux-fsdevel/20230416060722.1912831-1-amir73il@gmail.com/)** - -> An unprivileged user is allowed to create an fanotify group and add -> inode marks, but not filesystem and mount marks. -> - -**[v2: fs/proc: add Kthread flag to /proc/$pid/status](http://lore.kernel.org/linux-fsdevel/20230416052404.2920-1-fullspring2018@gmail.com/)** - -> The command `ps -ef` and `top -c` mark kernel thread by '[' -> and ']', but sometimes the result is not correct. -> The task->flags in /proc/$pid/stat is good, but we need remember -> the value of PF_KTHREAD is 0x00200000 and convert dec to hex. -> If we have no binary program and shell script which read -> /proc/$pid/stat, we can know it directly by -> `cat /proc/$pid/status`. -> - -**[v1: Monitoring unmounted fs with fanotify](http://lore.kernel.org/linux-fsdevel/20230414182903.1852019-1-amir73il@gmail.com/)** - -> Followup on my quest to close the gap with inotify functionality, -> here is a proposal for FAN_UNMOUNT event. -> - -**[v2: Alter fcntl to handle int arguments correctly](http://lore.kernel.org/linux-fsdevel/20230414152459.816046-1-Luca.Vizzarro@arm.com/)** - -> According to the documentation of fcntl, some commands take an int as -> argument. In practice not all of them enforce this behaviour, as they -> instead accept a more permissive long and in most cases not even a -> range check is performed.
-> -> An issue could possibly arise from a combination of the handling of the -> varargs in user space and the ABI rules of the target, which may result -> in the top bits of an int argument being non-zero. -> - -**[v1: mm/filemap: allocate folios according to the blocksize](http://lore.kernel.org/linux-fsdevel/20230414134908.103932-1-hare@suse.de/)** - -> If the blocksize is larger than the pagesize allocate folios -> with the correct order. -> - -**[v1: convert create_page_buffers to create_folio_buffers](http://lore.kernel.org/linux-fsdevel/20230414110821.21548-1-p.raghav@samsung.com/)** - -> One of the first kernel panic we hit when we try to increase the -> block size > 4k is inside create_page_buffers()[1]. Even though buffer.c -> function do not support large folios (folios > PAGE_SIZE) at the moment, -> these changes are required when we want to remove that constraint. -> - -**[v3: Introduce provisioning primitives for thinly provisioned storage](http://lore.kernel.org/linux-fsdevel/20230414000219.92640-1-sarthakkukreti@chromium.org/)** - -> This patch series adds a mechanism to pass through provision requests on -> stacked thinly provisioned block devices. -> - -**[v1: fs/ntfs3: disable page fault during ntfs_fiemap()](http://lore.kernel.org/linux-fsdevel/f649c9c0-6c0c-dd0d-e3c9-f0c580a11cd9@I-love.SAKURA.ne.jp/)** - -> syzbot is reporting circular locking dependency between ntfs_file_mmap() -> (which has mm->mmap_lock => ni->ni_lock dependency) and ntfs_fiemap() -> (which has ni->ni_lock => mm->mmap_lock dependency). -> - -**[v1: Backport several patches to 5.10.y](http://lore.kernel.org/linux-fsdevel/20230412041935.1556-1-yb203166@antfin.com/)** - -> Antgroup is using 5.10.y in product environment, we found several patches are -> missing in 5.10.y tree. These patches are needed for us. 
So we backported them -> to 5.10.y -> - -**[v6: net-next: splice, net: Replace sendpage with sendmsg(MSG_SPLICE_PAGES), part 1](http://lore.kernel.org/linux-fsdevel/20230411160902.4134381-1-dhowells@redhat.com/)** - -> Here's the first tranche of patches towards providing a MSG_SPLICE_PAGES -> internal sendmsg flag that is intended to replace the ->sendpage() op with -> calls to sendmsg(). MSG_SPLICE_PAGES is a hint that tells the protocol -> that it should splice the pages supplied if it can and copy them if not. -> - -**[v1: [RESEND] fs: opportunistic high-res file timestamps](http://lore.kernel.org/linux-fsdevel/20230411143702.64495-1-jlayton@kernel.org/)** - -> (Apologies for the resend, but I didn't send this with a wide enough -> distribution list originally). -> - -**[v1: fs: opportunistic high-res file timestamps](http://lore.kernel.org/linux-fsdevel/20230411142708.62475-1-jlayton@kernel.org/)** - -> While I don't think we can practically optimize away ctime updates -> like we do with i_version, I do like the idea of using this scheme to -> indicate when we need to use a high-res timestamp. -> - -**[v1: fanotify: Enable FAN_REPORT_FID on more filesystem types](http://lore.kernel.org/linux-fsdevel/20230411124037.1629654-1-amir73il@gmail.com/)** - -> If kernel supports FAN_REPORT_ANY_FID, use this flag to allow testing -> also filesystems that do not support fsid or NFS file handles (e.g. fuse). -> - -**[v9: Implement copy offload support](http://lore.kernel.org/linux-fsdevel/20230411081041.5328-1-anuj20.g@samsung.com/)** - -> The patch series covers the points discussed in November 2021 virtual -> call [LSF/MM/BFP TOPIC] Storage: Copy Offload [0]. -> We have covered the initial agreed requirements in this patchset and -> further additional features suggested by community. -> Patchset borrows Mikulas's token based approach for 2 bdev -> implementation. 
-> - -**[v4: Providing mount in memfd_restricted() syscall](http://lore.kernel.org/linux-fsdevel/cover.1681176340.git.ackerleytng@google.com/)** - -> This patchset builds upon the memfd_restricted() system call that was -> discussed in the 'KVM: mm: fd-based approach for supporting KVM' patch -> series, at -> https://lore.kernel.org/lkml/20221202061347.1070246-1-chao.p.peng@linux.intel.com/T/ -> - -**[v2: sysv: don't call sb_bread() with pointers_lock held](http://lore.kernel.org/linux-fsdevel/38509ddd-51a2-70da-6564-9ded34b2f363@I-love.SAKURA.ne.jp/)** - -> syzbot is reporting sleep in atomic context in SysV filesystem [1], for -> sb_bread() is called with rw_spinlock held. -> -> A "write_lock(&pointers_lock) => read_lock(&pointers_lock) deadlock" bug -> and a "sb_bread() with write_lock(&pointers_lock)" bug were introduced by -> "Replace BKL for chain locking with sysvfs-private rwlock" in Linux 2.5.12. -> - -**[v1: blk: optimization for classic polling](http://lore.kernel.org/linux-fsdevel/3578876466-3733-1-git-send-email-nj.shetty@samsung.com/)** - -> This removes the dependency on interrupts to wake up task. Set task -> state as TASK_RUNNING, if need_resched() returns true, -> while polling for IO completion. -> Earlier, polling task used to sleep, relying on interrupt to wake it up. -> This made some IO take very long when interrupt-coalescing is enabled in -> NVMe. -> - -#### Network Devices - -**[v1: brcmfmac: Demote some kernel errors to info](http://lore.kernel.org/netdev/20230416-brcmfmac-noise-v1-0-f0624e408761@marcan.st/)** - -> brcmfmac has some messages that are KERN_ERR even though they are -> harmless. This is spooking and confusing people, because they end up -> being the *only* kernel messages on their boot console with common -> error-only printk levels (at least on Apple Macs).
-> - -**[v1: net: virtio-net: reject small vring sizes](http://lore.kernel.org/netdev/20230416074607.292616-1-alvaro.karsz@solid-run.com/)** - -> Check vring size and fail probe if a transmit/receive vring size is -> smaller than MAX_SKB_FRAGS + 2. -> -> At the moment, any vring size is accepted. This is problematic because -> it may result in attempting to transmit a packet with more fragments -> than there are descriptors in the ring. -> - -**[v1: net-next: ethtool mm API improvements](http://lore.kernel.org/netdev/20230415173454.3970647-1-vladimir.oltean@nxp.com/)** - -> Currently the ethtool --set-mm API permits the existence of 2 -> configurations which don't make sense: -> -> - pmac-enabled false tx-enabled true -> - tx-enabled false verify-enabled true -> - -**[v1: net-next: Ocelot/Felix driver support for preemptible traffic classes](http://lore.kernel.org/netdev/20230415170551.3939607-1-vladimir.oltean@nxp.com/)** - -> The series "Add tc-mqprio and tc-taprio support for preemptible traffic -> classes" from: -> https://lore.kernel.org/netdev/20230220122343.1156614-1-vladimir.oltean@nxp.com/ -> -> was eventually submitted in a form without the support for the -> Ocelot/Felix switch driver. This patch set picks up that work again, -> and presents a fairly modified form compared to the original. -> - -**[v2: net: net/sched: clear actions pointer in miss cookie init fail](http://lore.kernel.org/netdev/20230415153309.241940-1-pctammela@mojatatu.com/)** - -> Palash reports a UAF when using a modified version of syzkaller[1]. -> -> When 'tcf_exts_miss_cookie_base_alloc()' fails in 'tcf_exts_init_ex()' -> a call to 'tcf_exts_destroy()' is made to free up the tcf_exts -> resources. -> In flower, a call to '__fl_put()' when 'tcf_exts_init_ex()' fails is made; -> Then calling 'tcf_exts_destroy()', which triggers an UAF since the -> already freed tcf_exts action pointer is lingering in the struct. 
-> - -**[v2: net-next: tsnep: XDP socket zero-copy support](http://lore.kernel.org/netdev/20230415144256.27884-1-gerhard@engleder-embedded.com/)** - -> Implement XDP socket zero-copy support for tsnep driver. I tried to -> follow existing drivers like igc as far as possible. But one main -> - -**[v2: net-next: r8169: use new macros from netdev_queues.h](http://lore.kernel.org/netdev/f07fd01b-b431-6d8d-bd14-d447dffd8e64@gmail.com/)** - -> Add one missing subqueue version of the macros, and use the new macros -> in r8169 to simplify the code. -> - -**[v6: net-next: XDP Rx HWTS metadata for stmmac driver](http://lore.kernel.org/netdev/20230415064503.3225835-1-yoong.siang.song@intel.com/)** - -> Implemented XDP receive hardware timestamp metadata for stmmac driver. -> -> This patchset is tested with tools/testing/selftests/bpf/xdp_hw_metadata. -> Below are the test steps and results. -> - -**[v1: net-next: sctp: add some missing peer_capables in sctp info dump](http://lore.kernel.org/netdev/cover.1681507192.git.lucien.xin@gmail.com/)** - -> The 1st patch removes the unused and obsolete hostname_address from -> sctp_association peer and also the bit from sctp_info peer_capables, -> and then reuses its bit for reconf_capable and use the higher -> available bit for intl_capable in the 2nd patch. -> - -**[v6: ip.7: Add "special and reserved addresses" section](http://lore.kernel.org/netdev/20230414184558.GB2557040@demorgan/)** - -> Break out the discussion of special and reserved IPv4 addresses into -> a subsection, formatted as a pair of definition lists, and briefly -> describing three cases in which Linux no longer treats addresses -> specially, where other systems do or did. 
-> - -**[v1: net-next: eth: mlx5: avoid iterator use outside of a loop](http://lore.kernel.org/netdev/20230414180729.198284-1-kuba@kernel.org/)** - -> Fix the following warning about risky iterator use: -> -> drivers/net/ethernet/mellanox/mlx5/core/eq.c:1010 mlx5_comp_irq_get_affinity_mask() warn: iterator used outside loop: 'eq' -> - -**[v1: ice: document RDMA devlink parameters](http://lore.kernel.org/netdev/20230414162614.571861-1-jacob.e.keller@intel.com/)** - -> Commit e523af4ee560 ("net/ice: Add support for enable_iwarp and enable_roce -> devlink param") added support for the enable_roce and enable_iwarp -> parameters in the ice driver. It didn't document these parameters in the -> ice devlink documentation file. Add this documentation, including a note -> about the mutual exclusion between the two modes. -> - -**[v1: net-next: net: skbuff: hide some bitfield members](http://lore.kernel.org/netdev/20230414160105.172125-1-kuba@kernel.org/)** - -> There is a number of protocol or subsystem specific fields -> in struct sk_buff which are only accessed by one subsystem. -> We can wrap them in ifdefs with minimal code impact. -> -> This gives us a better chance to save a 2B and a 4B holes -> resulting with the following savings (assuming a lucky -> kernel config): -> - -**[v2: net-next: ax25: exit linked-list searches earlier](http://lore.kernel.org/netdev/20230414143357.5523-1-peter@n8pjl.ca/)** - -> There's no need to loop until the end of the list if we have a result. -> -> Device callsigns are unique, so there can only be one dev returned from -> ax25_addr_ax25dev(). If not, there would be inconsistencies based on -> order of insertion, and refcount leaks. 
-> - -**[v1: net-next: selftests: openvswitch: add support for testing upcall interface](http://lore.kernel.org/netdev/20230414131750.4185160-1-aconole@redhat.com/)** - -> The existing selftest suite for openvswitch will work for regression -> testing the datapath feature bits, but won't test things like adding -> interfaces, or the upcall interface. Here, we add some additional -> test facilities. -> - -**[v1: net: wwan: Expose secondary AT port on DATA1](http://lore.kernel.org/netdev/20230414-rpmsg-wwan-secondary-at-port-v1-1-6d7307527911@nayarsystems.com/)** - -> Our use-case needs two AT ports available: -> One for running a ppp daemon, and another one for management. -> -> This patch enables a second AT port on DATA1. -> - -**[Re: v1: net: Add check for csum_start in skb_partial_csum_set()](http://lore.kernel.org/netdev/a30a8ffaa8dd4cb6a84103eecf0c3338@huawei.com/)** - -> Conceivably this can be added, though it is a bit complex for devices with variable length link layer headers. And it would have to happen not only for packet sockets, but all users of virtio_net_hdr. -> - -**[v1: net-next: net: phy: add driver for MediaTek SoC built-in GE PHYs](http://lore.kernel.org/netdev/ZDihjfnzaZ1yh9cT@makrotopia.org/)** - -> Some of MediaTek's Filogic SoCs come with built-in Gigabit Ethernet -> PHYs which require calibration data from the SoC's efuse. -> Add support for these PHYs to the mediatek-ge driver if built for -> MediaTek's ARM64 SoCs. -> - -**[v2: net-next: virtio/vsock: support datagrams](http://lore.kernel.org/netdev/20230413-b4-vsock-dgram-v2-0-079cc7cee62e@bytedance.com/)** - -> This series introduces support for datagrams to virtio/vsock.
-> -> It is a spin-off (and smaller version) of this series from the summer: -> https://lore.kernel.org/all/cover.1660362668.git.bobby.eshleman@bytedance.com/ -> - -**[v1: Enable multiple MCAN on AM62x](http://lore.kernel.org/netdev/20230413223051.24455-1-jm@ti.com/)** - -> On AM62x there is one MCAN in MAIN domain and two in MCU domain. -> The MCANs in MCU domain were not enabled since there is no -> hardware interrupt routed to A53 GIC interrupt controller. -> Therefore A53 Linux cannot be interrupted by MCU MCANs. -> - -**[v1: net: Revert "net/mlx5: Enable management PF initialization"](http://lore.kernel.org/netdev/20230413222547.56901-1-kuba@kernel.org/)** - -> Paul reports that it causes a regression with IB on CX4 -> and FW 12.18.1000. In addition I think that the concept -> of "management PF" is not fully accepted and requires -> a discussion. -> - -**[v1: net-next: net: page_pool: add pages and released_pages counters](http://lore.kernel.org/netdev/a20f97acccce65d174f704eadbf685d0ce1201af.1681422222.git.lorenzo@kernel.org/)** - -> Introduce pages and released_pages counters to page_pool ethtool stats -> in order to track the number of allocated and released pages from the -> pool. -> - -**[GIT PULL: Networking for v6.3-rc7](http://lore.kernel.org/netdev/20230413213217.822550-1-kuba@kernel.org/)** - -> Including fixes from bpf, and bluetooth. -> -> Not all that quiet given spring celebrations, but "current" fixes -> are thinning out, which is encouraging. One outstanding regression -> in the mlx5 driver when using old FW, not blocking but we're pushing -> for a fix. -> - -**[v5: Add EMAC3 support for sa8540p-ride (devicetree/clk bits)](http://lore.kernel.org/netdev/20230413191541.1073027-1-ahalaney@redhat.com/)** - -> This is a forward port / upstream refactor of code delivered -> downstream by Qualcomm over at [0] to enable the DWMAC5 based -> implementation called EMAC3 on the sa8540p-ride dev board. 
-> - -**[v9: Another crack at a handshake upcall mechanism](http://lore.kernel.org/netdev/168141287044.157208.15120359741792569671.stgit@manet.1015granger.net/)** - -> Here is v9 of a series to add generic support for transport layer -> security handshake on behalf of kernel socket consumers (user space -> consumers use a security library directly, of course). -> - -**[v1: net-next: lib/win_minmax: export symbol of minmax_running_min](http://lore.kernel.org/netdev/20230413164726.59019-1-bobankhshen@gmail.com/)** - -> This commit export the symbol of the function minmax_running_min -> to make it accessible to dynamically loaded modules. It can make -> this library more general, especially for those congestion -> control algorithm modules who wants to implement a windowed min -> filter. -> - -**[v1: staging: octeon: Convert to use phylink](http://lore.kernel.org/netdev/ZDgNexVTEfyGo77d@lenoch/)** - -> The purpose of this patches is to provide support for SFP cage to -> Octeon ethernet driver. -> - -**[v4: net-next: Add SCM_PIDFD and SO_PEERPIDFD](http://lore.kernel.org/netdev/20230413133355.350571-1-aleksandr.mikhalitsyn@canonical.com/)** - -> 1. Implement SCM_PIDFD, a new type of CMSG type analogical to SCM_CREDENTIALS, -> but it contains pidfd instead of plain pid, which allows programmers not -> to care about PID reuse problem. -> -> 2. Add SO_PEERPIDFD which allows to get pidfd of peer socket holder pidfd. -> This thing is direct analog of SO_PEERCRED which allows to get plain PID. -> -> 3. Add SCM_PIDFD / SO_PEERPIDFD kselftest -> - -**[v2: bpf-next: bpf: add netfilter program type](http://lore.kernel.org/netdev/20230413133228.20790-1-fw@strlen.de/)** - -> The new program type is 'tracing style', i.e. there is no context -> access rewrite done by verifier, the function argument (struct bpf_nf_ctx) -> isn't stable. -> There is no support for direct packet access, dynptr api should be used -> instead. 
-> - -**[v1: net-next: Support tunnel mode in mlx5 IPsec packet offload](http://lore.kernel.org/netdev/cover.1681388425.git.leonro@nvidia.com/)** - -> This series extends mlx5 to support tunnel mode in its IPsec packet -> offload implementation. -> - -**[v2: net: Finish up ->msg_control{,_user} split](http://lore.kernel.org/netdev/20230413114705.157046-1-kevin.brodsky@arm.com/)** - -> Commit 1f466e1f15cf ("net: cleanly handle kernel vs user buffers for -> ->msg_control") introduced the msg_control_user and -> msg_control_is_user fields in struct msghdr, to ensure that user -> pointers are represented as such. It also took care of converting most -> users of struct msghdr::msg_control where user pointers are involved. It -> did however miss a number of cases, and some code using msg_control -> inappropriately has also appeared in the meantime. -> - -**[v8: net/packet: support mergeable feature of virtio](http://lore.kernel.org/netdev/20230413114402.50225-1-amy.saq@antgroup.com/)** - -> Packet sockets, like tap, can be used as the backend for kernel vhost. -> In packet sockets, virtio net header size is currently hardcoded to be -> the size of struct virtio_net_hdr, which is 10 bytes; however, it is not -> always the case: some virtio features, such as mrg_rxbuf, need virtio -> net header to be 12-byte long. -> - -**[v5: net-next: Support MACsec VLAN](http://lore.kernel.org/netdev/20230413105622.32697-1-ehakim@nvidia.com/)** - -> This patch series introduces support for hardware (HW) offload MACsec -> devices with VLAN configuration. The patches address both scenarios -> where the VLAN header is both the inner and outer header for MACsec. -> - -**[v3: net: sched: sch_qfq: prevent slab-out-of-bounds in qfq_activate_agg](http://lore.kernel.org/netdev/ZDfbCsDa6oLKzsed@pr0lnx/)** - -> If the TCA_QFQ_LMAX value is not offered through nlattr, lmax is determined by the MTU value of the network device. -> The MTU of the loopback device can be set up to 2^31-1. 
-> As a result, it is possible to have an lmax value that exceeds QFQ_MIN_LMAX. -> - -**[v1: net-next: bridge: Add per-{Port, VLAN} neighbor suppression](http://lore.kernel.org/netdev/20230413095830.2182382-1-idosch@nvidia.com/)** - -> In order to minimize the flooding of ARP and ND messages in the VXLAN -> network, EVPN includes provisions [1] that allow participating VTEPs to -> suppress such messages in case they know the MAC-IP binding and can -> reply on behalf of the remote host. In Linux, the above is implemented -> in the bridge driver using a per-port option called "neigh_suppress" -> that was added in kernel version 4.15 [2]. -> - -#### Asynchronous IO - -**[v1: liburing: io_uring sendto](http://lore.kernel.org/io-uring/20230415165821.791763-1-ammarfaizi2@gnuweeb.org/)** - -> There are two patches in this series. The first patch adds -> io_uring_prep_sendto() function. The second patch adds the -> manpage and CHANGELOG. -> - -**[v3: liburing: multishot timeout support](http://lore.kernel.org/io-uring/20230414225506.4108955-1-davidhwei@meta.com/)** - -> Changes on the liburing side to support multishot timeouts. -> - -**[v1: io_uring: complete request via task work in case of DEFER_TASKRUN](http://lore.kernel.org/io-uring/20230414075313.373263-1-ming.lei@redhat.com/)** - -> So far io_req_complete_post() only covers DEFER_TASKRUN by completing -> request via task work when the request is completed from IOWQ. -> -> However, uring command could be completed from any context, and if io -> uring is setup with DEFER_TASKRUN, the command is required to be -> completed from current context, otherwise wait on IORING_ENTER_GETEVENTS -> can't be wakeup, and may hang forever. -> - -**[v2: liburing: add multishot timeout support](http://lore.kernel.org/io-uring/20230412222931.1635706-1-davidhwei@meta.com/)** - -> Single change to sync the new IORING_TIMEOUT_MULTISHOT flag with kernel. -> -> Mostly unit tests for multishot timeouts.
-> - -**[v1: io_uring/uring_cmd: take advantage of completion batching](http://lore.kernel.org/io-uring/bbcdf761-e6f2-c2c5-dfb7-4579124a8fd5@kernel.dk/)** - -> We know now what the completion context is for the uring_cmd completion -> handling, so use that to have io_req_task_complete() decide what the -> best way to complete the request is. This allows batching of the posted -> completions if we have multiple pending, rather than always doing them -> one-by-one. -> - -#### Rust For Linux - -**[v1: rust: init: broaden the blanket impl of `Init`](http://lore.kernel.org/rust-for-linux/20230413100157.740697-1-benno.lossin@proton.me/)** - -> This makes it possible to use `T` as a `impl Init` for every error -> type `E` instead of just `Infallible`. -> - -**[v1: MAINTAINERS: add Benno Lossin as Rust reviewer](http://lore.kernel.org/rust-for-linux/20230412221823.830135-1-ojeda@kernel.org/)** - -> Benno has been involved with the Rust for Linux project for -> the better part of a year now. He has been working on solving -> the safe pinned initialization problem [1], which resulted in -> the pin-init API patch series [2] that allows to reduce the -> need for `unsafe` code in the kernel. He is also working on -> the field projection RFC for Rust [3] to bring pin-init as -> a language feature. -> - -**[v1: v4.1: rust: lock: add `Guard::do_unlocked`](http://lore.kernel.org/rust-for-linux/20230412121431.41627-1-wedsonaf@gmail.com/)** - -> It releases the lock, executes some function provided by the caller, -> then reacquires the lock. This is preparation for the implementation of -> condvars, which will sleep after between unlocking and relocking. -> - -**[v5: scripts: `make rust-analyzer` for out-of-tree modules](http://lore.kernel.org/rust-for-linux/20230411091714.130525-1-varmavinaym@gmail.com/)** - -> Adds support for out-of-tree rust modules to use the `rust-analyzer` -> make target to generate the rust-project.json file. 
-> -> The change involves adding an optional parameter `external_src` to the -> `generate_rust_analyzer.py` which expects the path to the out-of-tree -> module's source directory. When this parameter is passed, I have chosen -> not to add the non-core modules (samples and drivers) into the result -> since these are not expected to be used in third party modules. Related -> changes are also made to the Makefile and rust/Makefile allowing the -> `rust-analyzer` target to be used for out-of-tree modules as well. -> - -#### BPF - -**[v1: A new bpf map type for fuzzy matching key](http://lore.kernel.org/bpf/303b5895-319d-2bb7-9909-10fec3323df2@antgroup.com/)** - -> For supporting fuzzy matching in bpf map as described in the original -> question [0], we come up with a proposal that would like to have some -> advice or comments from bpf thread. Thanks a lot for all the feedback :) -> -> We plan to implement a new bpf map type, naming BPF_FM_MAP, standing for -> fuzzy matching map. -> The basic idea is implementing a trie-tree using map of map runtime -> structure. -> - -**[v2: bpf-next: Shared ownership for local kptrs](http://lore.kernel.org/bpf/20230415201811.343116-1-davemarchevsky@fb.com/)** - -> The above program will fail verification due to current owning / non-owning ref -> logic: after bpf_list_push_back, n is a non-owning reference and thus cannot be -> passed to bpf_rbtree_add. The only way to get an owning reference for the node -> that was added is to bpf_list_pop_{front,back} it. -> - -**[v2: libbpf: correct the macro KERNEL_VERSION for old kernel](http://lore.kernel.org/bpf/20230414084353.36545-1-songrui.771@bytedance.com/)** - -> The introduced header file linux/version.h in libbpf_probes.c may have a wrong macro KERNEL_VERSION for calculating LINUX_VERSION_CODE in some old kernel (Debian9,10). Below is a version info example from Debian 10. 
-> - -**[v1: vmlinux.lds.h: Discard .note.gnu.property section](http://lore.kernel.org/bpf/20230413185922.ufmollqlnlghwyvy@treble/)** - -> It looks like CONFIG_DEBUG_INFO_BTF is already (inadvertently) stripping -> it from vmlinux due to how GNU properties are merged by the linker (see -> "How GNU properties are merged" in the ld man page). -> - -**[v1: MAINTAINERS: make me a reviewer of VIRTIO CORE AND NET DRIVERS](http://lore.kernel.org/bpf/20230413071610.43659-1-xuanzhuo@linux.alibaba.com/)** - -> First of all, I personally love open source, linux and virtio. I have -> also participated in community work such as virtio for a long time. -> - -**[v1: net-next: bpf, net: Support redirecting to ifb with bpf](http://lore.kernel.org/bpf/20230413025350.79809-1-laoar.shao@gmail.com/)** - -> In our container environment, we are using EDT-bpf to limit the egress -> bandwidth. EDT-bpf can be used to limit egress only, but can't be used -> to limit ingress. Some of our users also want to limit the ingress -> bandwidth. -> - -**[v3: net: mana: Add support for jumbo frame](http://lore.kernel.org/bpf/1681334163-31084-1-git-send-email-haiyangz@microsoft.com/)** - -> The set adds support for jumbo frame, -> with some optimization for the RX path. -> - -**[v10: bpf: XDP-hints: API change for RX-hash kfunc bpf_xdp_metadata_rx_hash](http://lore.kernel.org/bpf/168132888942.340624.2449617439220153267.stgit@firesoul/)** - -> Current API for bpf_xdp_metadata_rx_hash() returns the raw RSS hash value, -> but doesn't provide information on the RSS hash type (part of 6.3-rc). -> -> This patchset proposal is to change the function call signature via adding -> a pointer value argument for providing the RSS hash type. 
-> - -**[v1: bpf-next: bpf: Handle NULL in bpf_local_storage_free.](http://lore.kernel.org/bpf/20230412171252.15635-1-alexei.starovoitov@gmail.com/)** - -> During OOM bpf_local_storage_alloc() may fail to allocate 'storage' and -> call to bpf_local_storage_free() with NULL pointer will cause a crash like: -> - -**[v6: bpf-next: xsk: Support UMEM chunk_size > PAGE_SIZE](http://lore.kernel.org/bpf/20230412162114.19389-1-kal.conley@dectris.com/)** - -> The main purpose of this patchset is to add AF_XDP support for UMEM -> chunk sizes > PAGE_SIZE. This is enabled for UMEMs backed by HugeTLB -> pages. -> - -**[v1: selftests/bpf: ignore pointer types check with clang](http://lore.kernel.org/bpf/20230412095912.188453-1-andrea.righi@canonical.com/)** - -> This is due to the fact that bpftool emits duplicate data types with -> - -**[v1: bpf-next: samples/bpf: sampleip: Replace PAGE_OFFSET with _text address](http://lore.kernel.org/bpf/tencent_A0E82E0BEE925285F8156D540731DF805F05@qq.com/)** - -> Macro PAGE_OFFSET(0xffff880000000000) in sampleip_user.c is inaccurate, -> for example, in aarch64 architecture, this value depends on the -> CONFIG_ARM64_VA_BITS compilation configuration, this value defaults to 48, -> the corresponding PAGE_OFFSET is 0xffff800000000000, if we use the value -> defined in sampleip_user.c, then all KSYMs obtained by sampleip are (user) -> - -**[v1: bpf-next: New BPF map and BTF security LSM hooks](http://lore.kernel.org/bpf/20230412043300.360803-1-andrii@kernel.org/)** - -> Add new LSM hooks, bpf_map_create_security and bpf_btf_load_security, which -> are meant to allow highly-granular LSM-based control over the usage of BPF -> subsystem. Specifically, to control the creation of BPF maps and BTF data -> objects, which are fundamental building blocks of any modern BPF application.
-> - -**[v1: Smack modifications for: security: Allow all LSMs to provide xattrs for inode_init_security hook](http://lore.kernel.org/bpf/20230411172337.340518-1-roberto.sassu@huaweicloud.com/)** - -> Very very quick modification. Not tested. -> - -**[v1: bpf: lirc program type should not require SYS_CAP_ADMIN](http://lore.kernel.org/bpf/ZDWAcN6wfeXzipHz@gofer.mess.org/)** - -> Make it possible to load lirc program type with just CAP_BPF. -> - -**[v2: bpf-next: xsk: Elide base_addr comparison in xp_unaligned_validate_desc](http://lore.kernel.org/bpf/20230411130025.19704-1-kal.conley@dectris.com/)** - -> Remove redundant (base_addr >= pool->addrs_cnt) comparison from the -> conditional. -> - -**[v1: bpf-next: tools/resolve_btfids: Ignore libsubcmd](http://lore.kernel.org/bpf/tencent_D5422A55AFF3A307880D06AD42D559739708@qq.com/)** - -> Since commit af03299d8536("tools/resolve_btfids: Install subcmd headers") -> introduce subcmd headers directory, we should ignore it. -> - -**[v1: perf bperf: Avoid use after free via union](http://lore.kernel.org/bpf/20230411051718.267228-1-irogers@google.com/)** - -> If bperf sets leader_skel or follower_skel then it appears bpf_skel is -> set and can trigger the following use-after-free -> - -**[v1: bpf-next: xsk: Simplify xp_aligned_validate_desc implementation](http://lore.kernel.org/bpf/20230410121841.643254-1-kal.conley@dectris.com/)** - -> Perform the chunk boundary check like the page boundary check in -> xp_desc_crosses_non_contig_pg(). This simplifies the implementation and -> reduces the number of branches. -> - -**[v1: bpf-next: Dynptr convenience helpers](http://lore.kernel.org/bpf/20230409033431.3992432-1-joannelkoong@gmail.com/)** - -> This patchset is the 3rd in the dynptr series. The 1st (dynptr -> fundamentals) can be found here [0] and the second (skb + xdp dynptrs) -> can be found here [1]. 
-> - -**[v2: bpf-next: Introduce BPF_MA_REUSE_AFTER_RCU_GP](http://lore.kernel.org/bpf/20230408141846.1878768-1-houtao@huaweicloud.com/)** - -> As discussed in v1, currently the freed objects in bpf memory allocator -> may be reused immediately by the new allocation, it introduces -> use-after-bpf-ma-free problem for non-preallocated hash map and makes -> lookup procedure return incorrect result. The immediate reuse also makes -> introducing new use case more difficult (e.g. qp-trie). -> - -### 周边技术动态 - -#### Qemu - -**[v3: riscv: Add support for the Zfa extension](http://lore.kernel.org/qemu-devel/20230413155010.191051-1-christoph.muellner@vrull.eu/)** - -> This patch introduces the RISC-V Zfa extension, which introduces -> additional floating-point extensions: -> * fli (load-immediate) with pre-defined immediates -> * fminm/fmaxm (like fmin/fmax but with different NaN behaviour) -> * fround/froundmx (round to integer) -> * fcvtmod.w.d (Modular Convert-to-Integer) -> * fmv* to access high bits of float register bigger than XLEN -> * Quiet comparison instructions (fleq/fltq) -> - -**[v1: riscv: Raise an exception if pte reserved bits are not cleared](http://lore.kernel.org/qemu-devel/20230412091716.126601-1-alexghiti@rivosinc.com/)** - -> As per the specification, in 64-bit, if any of the pte reserved bits 60-54 -> is set, an exception should be triggered (see 4.4.1, "Addressing and Memory -> Protection"), so implement this behaviour in the address translation process. 
-> - -**[v1: target/riscv: Add support for BF16 extensions](http://lore.kernel.org/qemu-devel/20230412023320.50706-1-liweiwei@iscas.ac.cn/)** - -> Specification for BF16 extensions can be found in: -> https://github.com/riscv/riscv-bfloat16 -> -> The port is available here: -> https://github.com/plctlab/plct-qemu/tree/plct-bf16-upstream -> - -**[v3: target/riscv: implement query-cpu-definitions](http://lore.kernel.org/qemu-devel/20230411183511.189632-1-dbarboza@ventanamicro.com/)** - -> In this v3 I removed patches 3 and 4 of v2. -> -> Patch 3 now implements a new type that the generic CPUs (any, rv32, -> rv64, x-rv128) were converted to. This type will be used by -> query-cpu-definitions to determine if a given cpu is static or not based -> on its type. This approach was suggested by Richard Henderson in the v2 -> review. -> - -**[v1: target/riscv: Restore the predicate() NULL check behavior](http://lore.kernel.org/qemu-devel/20230411090211.3039186-1-bmeng@tinylab.org/)** - -> When reading a non-existent CSR QEMU should raise illegal instruction -> exception, but currently it just exits due to the g_assert() check. -> -> This actually reverts commit 0ee342256af9205e7388efdf193a6d8f1ba1a617, -> Some comments are also added to indicate that predicate() must be -> provided for an implemented CSR. -> - -**[v1: target/riscv: Separate implicitly-enabled and explicitly-enabled extensions](http://lore.kernel.org/qemu-devel/20230410033526.31708-1-liweiwei@iscas.ac.cn/)** - -> The patch tries to separate the multi-letter extensions that may implicitly-enabled by misa.EXT from the explicitly-enabled cases, so that the misa.EXT can truely disabled by write_misa(). -> With this separation, the implicitly-enabled zve64d/f and zve32f extensions will no work if we clear misa.V. And clear misa.V will have no effect on the explicitly-enalbed zve64d/f and zve32f extensions. 
-> - -**[v1: target/riscv: Add support for PC-relative translation](http://lore.kernel.org/qemu-devel/20230409105306.28575-1-liweiwei@iscas.ac.cn/)** - -> This patchset tries to add support for PC-relative translation. -> -> The existence of CF_PCREL can improve performance with the guest -> kernel's address space randomization. Each guest process maps libc.so -> (et al) at a different virtual address, and this allows those -> translations to be shared. -> - -**[v1: target/riscv: Use check for relationship between Zdinx/Zhinx{min} and Zfinx](http://lore.kernel.org/qemu-devel/20230408135908.25269-1-liweiwei@iscas.ac.cn/)** - -> Zdinx/Zhinx{min} require Zfinx. And require relationship is usually done -> by check currently. -> - -#### U-Boot - -**[v4: Add StarFive JH7110 PCIe drvier support](http://lore.kernel.org/u-boot/20230411010209.76561-1-minda.chen@starfivetech.com/)** - -> This patchset needs to apply after patchset in [1]. These PCIe series patches -> are based on the JH7110 RISC-V SoC and VisionFive V2 board. -> -> [1] https://patchwork.ozlabs.org/project/uboot/cover/20230329034224.26545-1-yanhong.wang@starfivetech.com -> - -**[v1: riscv: Support riscv64 image type](http://lore.kernel.org/u-boot/20230410072718.3484-1-rick@andestech.com/)** - -> Allow U-Boot to load 32 or 64 bits RISC-V Kernel Image -> distinguishly. It helps to avoid someone maybe make a mistake -> to run 32-bit U-Boot to load 64-bit kernel. -> - -## 20230409:第 41 期 - -### 内核动态 - -#### RISC-V 架构支持 - -**[v11: function_graph: Support recording and printing the return value of function](http://lore.kernel.org/linux-riscv/cover.1680954589.git.pengdonglin@sangfor.com.cn/)** - -> When using the function_graph tracer to analyze system call failures, -> it can be time-consuming to analyze the trace logs and locate the kernel -> function that first returns an error. 
This change aims to simplify the -> process by recording the function return value to the 'retval' member of -> 'ftrace_graph_ent' and printing it when outputing the trace log. -> - -**[v1: Convert SiFive drivers from SOC_FOO dependencies to ARCH_FOO](http://lore.kernel.org/linux-riscv/20230406-undertake-stowing-50f45b90413a@spud/)** - -> RISC-V's SOC_FOO symbols for micro-archs are going away, and being -> replaced with the more common ARCH_FOO pattern that is used by other -> archs (and by vendors with a history outside of RISC-V). -> I kicked the conversion off by converting the Microchip RISC-V bits to -> use their replacement symbol, so here's round two: the various SiFive -> drivers. -> - -**[GIT PULL: RISC-V Devicetrees for v6.4](http://lore.kernel.org/linux-riscv/20230406-shank-impromptu-3d483bbc249f@spud/)** - -> Please pull some Devicetree updates for v6.4, mainly adding the base -> level of support for the StarFive VisionFive v2. -> I wanted to get an initial PR out before -rc6, but I may have another -> PR adding some of the peripherals (pmu, mmc) for the StarFive stuff -> that are already reviewed etc, but need a rebase on top of what -> actually got applied. Is that okay, or will the end of next week be -> too late for you? -> - -**[GIT PULL: RISC-V SoC drivers for v6.4](http://lore.kernel.org/linux-riscv/20230406-islamist-mop-81d651b8830d@spud/)** - -> Please pull some updates for the "otherwise unloved" RISC-V SoC drivers -> for v6.4! The bulk of this is my fixing my own driver, and there's a fix -> in here to make sure that we don't hit randconfig build issues once !MMU -> is enabled for 32-bit kernels. 
-> - -**[v3: -next: support allocating crashkernel above 4G explicitly on riscv](http://lore.kernel.org/linux-riscv/20230406220206.3067006-1-chenjiahao16@huawei.com/)** - -> On riscv, the current crash kernel allocation logic is trying to -> allocate within 32bit addressible memory region by default, if -> failed, try to allocate without 4G restriction. -> -> In need of saving DMA zone memory while allocating a relatively large -> crash kernel region, allocating the reserved memory top down in -> high memory, without overlapping the DMA zone, is a mature solution. -> Hence this patchset introduces the parameter option crashkernel=X,[high,low]. -> - -**[v1: Add JH7110 PCIe driver support](http://lore.kernel.org/linux-riscv/20230406111142.74410-1-minda.chen@starfivetech.com/)** - -> This patchset adds PCIe driver for the StarFive JH7110 SoC. -> The patch has been tested on the VisionFive 2 board. The test -> devices include M.2 NVMe SSD and Realtek 8169 Ethernet adapter. -> - -**[v7: StarFive's SYSCON support](http://lore.kernel.org/linux-riscv/20230406103308.1280860-1-william.qiu@starfivetech.com/)** - -> This patchset adds initial rudimentary support for the StarFive -> designware mobile storage host controller driver. And this driver will -> be used in StarFive's VisionFive 2 board. The main purpose of adding -> this driver is to accommodate the ultra-high speed mode of eMMC. -> - -**[v4: Add JH7110 USB and USB PHY driver support](http://lore.kernel.org/linux-riscv/20230406015216.27034-1-minda.chen@starfivetech.com/)** - -> This patchset adds USB driver and USB PHY for the StarFive JH7110 SoC. -> USB work mode is peripheral and using USB 2.0 PHY in VisionFive 2 board. -> The patch has been tested on the VisionFive 2 board. 
-> - -**[GIT PULL: Initial clk/reset support for JH7110 for v6.4](http://lore.kernel.org/linux-riscv/20230405-constant-dreamily-0128e071c665@spud/)** - -> Here's a PR for the StarFive JH7110 clk/reset bits since I'd like to -> take the DT this cycle & depend on the binding headers. -> -> I've picked up R-B tags from Emil on all that patches, despite him being -> listed as an author, as things have changed quite a lot since he was -> involved in writing things many months ago. -> - -**[v2: RISC-V: align ISA extension Kconfig help text with each other](http://lore.kernel.org/linux-riscv/20230405-pucker-cogwheel-3a999a94a2f2@wendy/)** - -> Other extensions only capitalise the first letter in the text visible -> in Kconfig menus, and provide a short comment about the extension's -> meaning. Do the same for Svnapot & Svpbmt. -> -> The precedent for capitalisation in the Kconfig text was set by Zicbom -> & sorta followed for Zicboz. The RVI styling used for multi-letter -> extensions only capitalises the first letter, so do the same here. -> If nothing else, my OCD likes it when the extensions follow a consistent -> pattern. -> - -**[v1: riscv: Adjust dependencies of HAVE_DYNAMIC_FTRACE selection](http://lore.kernel.org/linux-riscv/20230404-riscv-dynamic-ftrace-checks-clang-v1-1-0ce296b7d423@kernel.org/)** - -> When building allmodconfig with clang and its integrated assembler and -> linking with a version of GNU ld prior to 2.36, the following link error -> occurs: -> -> riscv64-linux-gnu-ld: .init.data has both ordered [`__patchable_function_entries' in init/main.o] and unordered [`.init_array.0' in kernel/trace/trace_benchmark.o] sections -> riscv64-linux-gnu-ld: final link failed: bad value -> - -**[v4: Add basic ACPI support for RISC-V](http://lore.kernel.org/linux-riscv/20230404182037.863533-1-sunilvl@ventanamicro.com/)** - -> This patch series enables the basic ACPI infrastructure for RISC-V. 
-> Supporting external interrupt controllers is in progress and hence it is -> tested using poll based HVC SBI console and RAM disk. -> -> The first patch in this series is one of the patch from Jisheng's -> series [1] which is not merged yet. This patch is required to support -> ACPI since efi_init() which gets called before sbi_init() can enable -> static branches and hits a panic. -> - -**[v4: RISC-V KVM virtualize AIA CSRs](http://lore.kernel.org/linux-riscv/20230404153452.2405681-1-apatel@ventanamicro.com/)** - -> The RISC-V AIA specification is now frozen as-per the RISC-V international -> process. The latest frozen specifcation can be found at: -> https://github.com/riscv/riscv-aia/releases/download/1.0-RC3/riscv-interrupts-1.0-RC3.pdf -> - -**[v5: irqchip/irq-sifive-plic: Add syscore callbacks for hibernation](http://lore.kernel.org/linux-riscv/20230404032908.89638-1-mason.huo@starfivetech.com/)** - -> The priority and enable registers of plic will be reset -> during hibernation power cycle in poweroff mode, -> add the syscore callbacks to save/restore those registers. -> - -**[v5: RISC-V KVM ONE_REG interface for SBI](http://lore.kernel.org/linux-riscv/20230403121527.2286489-1-apatel@ventanamicro.com/)** - -> This series first does few cleanups/fixes (PATCH1 to PATCH5) and adds -> ONE-REG interface for customizing the SBI interface visible to the -> Guest/VM. -> -> The testing of this series has been done with KVMTOOL changes in -> riscv_sbi_imp_v1 branch at: -> https://github.com/avpatel/kvmtool.git -> - -**[v1: riscv: entry: Save a0 prior syscall_enter_from_user_mode()](http://lore.kernel.org/linux-riscv/20230403065207.1070974-1-bjorn@kernel.org/)** - -> The RISC-V calling convention passes the first argument, and the -> return value in the a0 register. For this reason, the a0 register -> needs some extra care; When handling syscalls, the a0 register is -> saved into regs->orig_a0, so a0 can be properly restored for, -> e.g. interrupted syscalls. 
-> - -**[v1: riscv: Add static call implementation](http://lore.kernel.org/linux-riscv/tencent_A8A256967B654625AEE1DB222514B0613B07@qq.com/)** - -> Add the riscv static call implementation. For each key, a permanent -> trampoline is created which is the destination for all static calls -> for the given key. -> -> The trampoline has a direct jump which gets patched by static_call_update() -> when the destination function changes. -> - -**[v1: RISC-V: KVM: Allow Zbb extension for Guest/VM](http://lore.kernel.org/linux-riscv/20230401112730.2105240-1-apatel@ventanamicro.com/)** - -> We extend the KVM ISA extension ONE_REG interface to allow KVM -> user space to detect and enable Zbb extension for Guest/VM. -> - -**[v7: Basic clock, reset & device tree support for StarFive JH7110 RISC-V SoC](http://lore.kernel.org/linux-riscv/20230401111934.130844-1-hal.feng@starfivetech.com/)** - -> This patch series adds basic clock, reset & DT support for StarFive -> JH7110 SoC. -> -> @Stephen and @Conor, I have made this series start with the shared -> dt-bindings, so it will be easier to merge. -> - -**[v4: Use dma_default_coherent for devicetree default coherency](http://lore.kernel.org/linux-riscv/20230401091531.47412-1-jiaxun.yang@flygoat.com/)** - -> This series split out second half of my previous series -> "v1: MIPS DMA coherence fixes". -> -> It intends to use dma_default_coherent to determine the default coherency of -> devicetree probed devices instead of hardcoding it with Kconfig options. -> - -#### 进程调度 - -**[v4: sched: Avoid unnecessary migrations within SMT domains](http://lore.kernel.org/lkml/20230406203148.19182-1-ricardo.neri-calderon@linux.intel.com/)** - -> This is v4 of this series. Previous versions can be found here [1], [2], -> and here [3]. To avoid duplication, I do not include the cover letter of -> the original submission. You can read it in [1]. 
-> - -**[v1: sched: Consider CPU contention in frequency & load-balance busiest CPU selection](http://lore.kernel.org/lkml/20230406155030.1989554-1-dietmar.eggemann@arm.com/)** - -> This is the implementation of the idea to factor in root cfs_rq -> runnable_avg as a way to consider CPU contention for CPU frequency and -> `migrate_util` type load-balance busiest CPU selection. -> - -**[v1: sched: rt: Simplify pick_task_rt()](http://lore.kernel.org/lkml/20230407192435.3390-1-kunyu@nfschina.com/)** - -> Remove useless intermediate variable "p" and its initialization. -> Directly return the next RT scheduling task obtained from -> _pick_next_task_rt(). -> - -**[v2: sched: rt: Simplify pick_next_rt_entity()](http://lore.kernel.org/lkml/20230407180952.2757-1-zeming@nfschina.com/)** - -> Remove useless intermediate variable "next" and its initialization. -> Directly return the next RT scheduling entity obtained from -> list_entry(). -> - -**[v1: sched/psi: set varaiable psi_cgroups_enabled storage-class-specifier to static](http://lore.kernel.org/lkml/20230405163602.1939400-1-trix@redhat.com/)** - -> smatch reports -> kernel/sched/psi.c:143:1: warning: symbol -> 'psi_cgroups_enabled' was not declared. Should it be static? -> -> This variable is only used in one file so should be static. -> - -**[v1: sched: rt: Optimization function 'pick_next_rt_entity'](http://lore.kernel.org/lkml/20230405232900.4019-1-zeming@nfschina.com/)** - -> The moral of this function is to obtain the next RT scheduling entity -> object,while 'list_entry' Implementation function of 'container_of' -> returns the next RT scheduling entity object (no new code should be -> added afterwards), directly returning 'list_entry' The execution result -> is sufficient. -> - -#### 内存管理 - -**[v1: linux-next: delayacct: track delays from IRQ/SOFTIRQ](http://lore.kernel.org/linux-mm/202304081728353557233@zte.com.cn/)** - -> Delay accounting does not track the delay of IRQ/SOFTIRQ. 
While -> IRQ/SOFTIRQ could have obvious impact on some workloads productivity, -> such as when workloads are running on system which is busy handling -> network IRQ/SOFTIRQ. -> - -**[v4: ACPI: APEI: handle synchronous exceptions with proper si_code](http://lore.kernel.org/linux-mm/20230408091359.31554-1-xueshuai@linux.alibaba.com/)** - -> changes since v3 by addressing comments from Xiaofei: -> - do a force kill for abnormal memofy failure error such as invalid PA, -> unexpected severity, OOM, etc -> - pcik up tested-by tag from Ma Wupeng -> - -**[v1: mm: introduce defer free for cma](http://lore.kernel.org/linux-mm/1680864131-4675-1-git-send-email-zhaoyang.huang@unisoc.com/)** - -> Continues page blocks are expensive for the system. Introducing defer free -> mechanism to buffer some which make the allocation easier. The shrinker will -> ensure the page block can be reclaimed when there is memory pressure. -> - -**[v5: net-next: splice, net: Replace sendpage with sendmsg(MSG_SPLICE_PAGES), part 1](http://lore.kernel.org/linux-mm/20230406094245.3633290-1-dhowells@redhat.com/)** - -> Here's the first tranche of patches towards providing a MSG_SPLICE_PAGES -> internal sendmsg flag that is intended to replace the ->sendpage() op with -> calls to sendmsg(). MSG_SPLICE is a hint that tells the protocol that it -> should splice the pages supplied if it can and copy them if not. -> - -**[v1: memcg: Default value setting in memcg-v1](http://lore.kernel.org/linux-mm/20230406091450.167779-1-shaun.tancheff@gmail.com/)** - -> Setting min, low and high values with memcg-v1 -> provides bennefits for users that are unable to update -> to memcg-v2. -> -> Setting min, low and high can be set in memcg-v1 -> to apply enough memory pressure to effective throttle -> filesystem I/O without hitting memcg oom. 
-> - -**[v12: Implement IOCTL to get and optionally clear info about PTEs](http://lore.kernel.org/linux-mm/20230406074005.1784728-1-usama.anjum@collabora.com/)** - -> *Changes in v12* -> - Update and other memory types to UFFD_FEATURE_WP_ASYNC -> - Rebaase on top of next-20230406 -> - Review updates -> - -**[v2: dma-buf/heaps: system_heap: Avoid DoS by limiting single allocations to half of all memory](http://lore.kernel.org/linux-mm/20230406000854.25764-1-jaewon31.kim@samsung.com/)** - -> Normal free:212600kB min:7664kB low:57100kB high:106536kB -> reserved_highatomic:4096KB active_anon:276kB inactive_anon:180kB -> active_file:1200kB inactive_file:0kB unevictable:2932kB -> writepending:0kB present:4109312kB managed:3689488kB mlocked:2932kB -> pagetables:13600kB bounce:0kB free_pcp:0kB local_pcp:0kB -> free_cma:200844kB -> Out of memory and no killable processes... -> Kernel panic - not syncing: System is deadlocked on memory -> - -**[v2: kmod: simplify with a semaphore](http://lore.kernel.org/linux-mm/20230405203505.1343562-1-mcgrof@kernel.org/)** - -> I split the semaphore simplification work out from my first patch series [0] -> because as although the changes came out of that effort, in the end this set -> of patches are slightly orthogonal to the goal behind that series and this -> ended up being mostly a cleanup with mild bike shedding exercise. -> - -**[v5: Ignore non-LRU-based reclaim in memcg reclaim](http://lore.kernel.org/linux-mm/20230405185427.1246289-1-yosryahmed@google.com/)** - -> Upon running some proactive reclaim tests using memory.reclaim, we -> noticed some tests flaking where writing to memory.reclaim would be -> successful even though we did not reclaim the requested amount fully. -> Looking further into it, I discovered that *sometimes* we over-report -> the number of reclaimed pages in memcg reclaim. 
-> - -**[v3: Expose GPU memory as coherently CPU accessible](http://lore.kernel.org/linux-mm/20230405180134.16932-1-ankita@nvidia.com/)** - -> NVIDIA's upcoming Grace Hopper Superchip provides a PCI-like device -> for the on-chip GPU that is the logical OS representation of the -> internal propritary cache coherent interconnect. -> - -**[v1: net-next: net: sunhme: move asm includes to below linux includes](http://lore.kernel.org/linux-mm/20230405-sunhme-includes-fix-v1-1-bf17cc5de20d@kernel.org/)** - -> A recent rearrangement of includes has lead to a problem on m68k -> as flagged by the kernel test robot. -> -> Resolve this by moving the block asm includes to below linux includes. -> A side effect i that non-Sparc asm includes are now immediately -> before Sparc asm includes, which seems nice. -> - -**[v1: mm, page_alloc: use check_pages_enabled static key to check tail pages](http://lore.kernel.org/linux-mm/20230405142840.11068-1-vbabka@suse.cz/)** - -> Commit 700d2e9a36b9 ("mm, page_alloc: reduce page alloc/free sanity -> checks") has introduced a new static key check_pages_enabled to control -> when struct pages are sanity checked during allocation and freeing. Mel -> Gorman suggested that free_tail_pages_check() could use this static key -> as well, instead of relying on CONFIG_DEBUG_VM. That makes sense, so do -> that. Also rename the function to free_tail_page_prepare() because it -> works on a single tail page and has a struct page preparation component -> as well as the optional checking component. -> Also remove some unnecessary unlikely() within static_branch_unlikely() -> statements that Mel pointed out for commit 700d2e9a36b9. 
-> - -**[v1: memcg-v1: Enable setting memory min, low, high](http://lore.kernel.org/linux-mm/20230405110107.127156-1-shaun.tancheff@gmail.com/)** - -> For users that are unable to update to memcg-v2 this -> provides a method where memcg-v1 can more effectively -> apply enough memory pressure to effectively throttle -> filesystem I/O or otherwise minimize being memcg oom -> killed at the expense of reduced performance. -> - -**[v2: module: avoid userspace pressure on unwanted allocations](http://lore.kernel.org/linux-mm/20230405022702.753323-1-mcgrof@kernel.org/)** - -> This v2 series follows up on the first iteration of these patches [0]. -> They have the following changes made: -> -> o Rolled in fix for an kmemleak issue reported by Jim Cromie -> o Dropped from this series all the semaphore & and simplifications -> on kmod.c as that should just be sent as a separate bike-shedding -> opporunity patch series and it does not in any way address the -> the unwanted allocations. -> o The rest of the feedback was just from Greg KH and I've addressed -> all his feedback. I decided to do away with the debug.c as a -> separate file and leave the #ifdef CONFIG_MODULE_DEBUG eyesore -> at the end of main.c. I guess it's not so bad there. -> o *Tons* of fixes and enhancements to my counters, including tons -> of documentation to help ensure we don't loose track of some of -> the tribal knowledge and so to help ensure we have references to -> what our accounting looks like. Those large wasted virtual memory -> allocations on a simple qemu idle boring boot are simply rediculous, I -> am quite baffled we had not spotted this before, and so it all reveals -> we have quite a bit of optimizations left to do to make loading modules -> an even more smoother experience at bootup. 
-> - -**[v2: regmap: Use mas_walk() instead of mas_find()](http://lore.kernel.org/linux-mm/20230403-regmap-maple-walk-fine-v2-1-c07371c8a867@kernel.org/)** - -> Liam recommends using mas_walk() instead of mas_find() for our use case so -> let's do that, it avoids some minor overhead associated with being able to -> restart the operation which we don't need since we do a simple search. -> - -**[v1: memcg v1: provide read access to memory.pressure_level](http://lore.kernel.org/linux-mm/20230404105900.2005-1-flosch@nutanix.com/)** - -> This is all fine as long as the subscribing process runs as root and is -> otherwise unconfined by further restrictions. However, if you add strict -> access controls such as selinux, the permission bits will be enforced, -> and opening memory.pressure_level for reading will fail, preventing the -> process from subscribing, even as root. -> - -**[v1: mm/madvise: Use vma_lookup() instead of find_vma()](http://lore.kernel.org/linux-mm/20230404094515.1883552-1-zhangpeng362@huawei.com/)** - -> Using vma_lookup() verifies the address is contained in the found vma. -> This results in easier to read the code. -> - -**[v1: m68k/mm: Use correct bit number in _PAGE_SWP_EXCLUSIVE comment](http://lore.kernel.org/linux-mm/20230404085636.121409-1-david@redhat.com/)** - -> As noticed by Geert, commit b5c88f21531c ("microblaze/mm: support -> __HAVE_ARCH_PTE_SWP_EXCLUSIVE") modified m68k code by accident. While -> replacing 0x080 by CF_PAGE_NOCACHE is correct, although it should have -> been part of commit ed4154067a08 ("m68k/mm: support -> __HAVE_ARCH_PTE_SWP_EXCLUSIVE"), replacing "bit 7" by "bit 24" in the -> comment was wrong. -> - -**[v2: LoongArch: Add kernel address sanitizer support](http://lore.kernel.org/linux-mm/20230404084148.744-1-zhangqing@loongson.cn/)** - -> Kernel Address Sanitizer (KASAN) is a dynamic memory safety error detector -> designed to find out-of-bounds and use-after-free bugs, Generic KASAN is -> supported on LoongArch now. 
-> -> 1/8 of kernel addresses reserved for shadow memory. But for LoongArch, -> There are a lot of holes between different segments and valid address -> space(256T available) is insufficient to map all these segments to kasan -> shadow memory with the common formula provided by kasan core, saying -> addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET -> - -**[v1: mm: check mapping addr is correct when dump page](http://lore.kernel.org/linux-mm/1680587425-4683-1-git-send-email-Xiaosong.Ma@unisoc.com/)** - -> when we debug with slub_debug_on, the following backtraces show dump_page -> will show wrong info when the bad page is non-NULL mapping and page->mapping -> is 0x80000000000 so do virt_addr valid check is needed when dump mapping page. -> - -**[v1: permit write-sealed memfd read-only shared mappings](http://lore.kernel.org/linux-mm/cover.1680560277.git.lstoakes@gmail.com/)** - -> This patch series is in two parts:- -> -> 1. Currently there are a number of places in the kernel where we assume -> VM_SHARED implies that a mapping is writable. Let's be slightly less -> strict and relax this restriction in the case that VM_MAYWRITE is not -> set. -> - -**[v1: mm-unstable: cgroup: eliminate atomic rstat](http://lore.kernel.org/linux-mm/20230403220337.443510-1-yosryahmed@google.com/)** - -> A previous patch series ([1] currently in mm-unstable) changed most -> atomic rstat flushing contexts to become non-atomic. This was done to -> avoid an expensive operation that scales with # cgroups and # cpus to -> happen with irqs disabled and scheduling not permitted. There were two -> remaining atomic flushing contexts after that series. This series tries -> to eliminate them as well, eliminating atomic rstat flushing completely. -> - -**[v3: Split a folio to any lower order folios](http://lore.kernel.org/linux-mm/20230403201839.4097845-1-zi.yan@sent.com/)** - -> File folio supports any order and people would like to support flexible orders -> for anonymous folio[1] too. 
Currently, split_huge_page() only splits a huge -> page to order-0 pages, but splitting to orders higher than 0 is also useful. -> This patchset adds support for splitting a huge page to any lower order pages -> and uses it during file folio truncate operations. -> - -**[v8: -next: Delay the initialization of zswap](http://lore.kernel.org/linux-mm/20230403121318.1876082-1-liushixin2@huawei.com/)** - -> In the initialization of zswap, about 18MB memory will be allocated for -> zswap_pool. Since some users may not use zswap, the zswap_pool is wasted. -> Save memory by delaying the initialization of zswap until enabled. -> - -#### 文件系统 - -**[v2: dax: enable dax fault handler to report VM_FAULT_HWPOISON](http://lore.kernel.org/linux-fsdevel/20230406230127.716716-1-jane.chu@oracle.com/)** - -> When dax fault handler fails to provision the fault page due to -> hwpoison, it returns VM_FAULT_SIGBUS which lead to a sigbus delivered -> to userspace with .si_code BUS_ADRERR. Channel dax backend driver's -> detection on hwpoison to the filesystem to provide the precise reason -> for the fault. -> - -**[v1: fsverity: reject FS_IOC_ENABLE_VERITY on mode 3 fds](http://lore.kernel.org/linux-fsdevel/20230406215106.235829-1-ebiggers@kernel.org/)** - -> Commit 56124d6c87fd ("fsverity: support enabling with tree block size < -> PAGE_SIZE") changed FS_IOC_ENABLE_VERITY to use __kernel_read() to read -> the file's data, instead of direct pagecache accesses. -> - -**[v1: shmem: stable directory cookies](http://lore.kernel.org/linux-fsdevel/168080987776.946167.3501480439542616457.stgit@manet.1015granger.net/)** - -> The current cursor-based directory cookie mechanism doesn't work -> when a tmpfs filesystem is exported via NFS. This is because NFS -> clients do not open directories: each READDIR operation has to open -> the directory on the server, read it, then close it. The cursor -> state for that directory, being associated strictly with the opened -> struct file, is then discarded. 
-> - -**[v2: eventfd: use wait_event_interruptible_locked_irq() helper](http://lore.kernel.org/linux-fsdevel/tencent_F38839D00FE579A60A97BA24E86AF223DD05@qq.com/)** - -> wait_event_interruptible_locked_irq was introduced by commit 22c43c81a51e -> ("wait_event_interruptible_locked() interface"), but older code such as -> eventfd_{write,read} still uses the open code implementation. -> Inspired by commit 8120a8aadb20 -> ("fs/timerfd.c: make use of wait_event_interruptible_locked_irq()"), this -> patch replaces the open code implementation with a single macro call. -> - -**[v1: fsverity: use shash API instead of ahash API](http://lore.kernel.org/linux-fsdevel/20230406003714.94580-1-ebiggers@kernel.org/)** - -> The "ahash" API, like the other scatterlist-based crypto APIs such as -> "skcipher", comes with some well-known limitations. First, it can't -> easily be used with vmalloc addresses. Second, the request struct can't -> be allocated on the stack. This adds complexity and a possible failure -> point that needs to be worked around, e.g. using a mempool. -> - -**[v3: blksnap - block devices snapshots module](http://lore.kernel.org/linux-fsdevel/20230404140835.25166-1-sergei.shtepa@veeam.com/)** - -> I am happy to offer a modified version of the Block Devices Snapshots -> Module. It allows to create non-persistent snapshots of any block devices. -> The main purpose of such snapshots is to provide backups of block devices. -> See more in Documentation/block/blksnap.rst. -> - -**[v1: exfat: add sysfs interface](http://lore.kernel.org/linux-fsdevel/20230405084635.74680-1-frank.li@vivo.com/)** - -> Add sysfs interface to configure exfat related parameters. -> - -**[v1: fstests specific MAINTAINERS file](http://lore.kernel.org/linux-fsdevel/20230404171411.699655-1-zlang@kernel.org/)** - -> I think I might be mad to include that many mailing lists in this patchset... 
->
-> As I explained in v1: , fstests covers more and more fs testing
-> thing, so we always get help from fs specific mailing list, due to they
-> learn about their features and bugs more. Besides that, some folks help
-> to review patches (relevant with them) more often. So I'd like to bring
-> in the similar way of linux/MAINTAINERS, records fs relevant mailing lists,
-> reviewers or supporters (or call co-maintainers). To recognize the
-> can be added in CC list of a patch.
->
-
-**[v1: Avoid the mmap lock for fault-around](http://lore.kernel.org/linux-fsdevel/20230404135850.3673404-1-willy@infradead.org/)**
-
-> The linux-next tree currently contains patches (mostly from Suren)
-> which handle some page faults without the protection of the mmap lock.
-> This patchset adds the ability to handle page faults on parts of files
-> which are already in the page cache without taking the mmap lock.
->
-
-**[v2: fuse: API for Checkpoint/Restore](http://lore.kernel.org/linux-fsdevel/20230403144517.347517-1-aleksandr.mikhalitsyn@canonical.com/)**
-
-> The main problem for CRIU is that we have to restore mount namespaces and memory mappings before the process tree.
-> It means that when CRIU is performing mount of fuse filesystem it can't use the original FUSE daemon from the
-> restorable process tree, but instead use a "fake daemon".
->
-
-**[v1: shmem: Add user and group quota support for tmpfs](http://lore.kernel.org/linux-fsdevel/20230403084759.884681-1-cem@kernel.org/)**
-
-> so I'm taking over his work from where he left it off. This series is virtually
-> done, and he had updated it with comments from the last version, but, I'm
-> initially posting it as an RFC because it's been a while since he posted the
-> last version.
-> Most of what I did here was rebase his last work on top of current Linus's tree.
->
-
-**[v1: blk: optimization for classic polling](http://lore.kernel.org/linux-fsdevel/3578876466-3733-1-git-send-email-nj.shetty@samsung.com/)**
-
-> This removes the dependency on interrupts to wake up task. Set task
-> state as TASK_RUNNING, if need_resched() returns true,
-> while polling for IO completion.
-> Earlier, polling task used to sleep, relying on interrupt to wake it up.
-> This made some IO take very long when interrupt-coalescing is enabled in
-> NVMe.
->
-
-#### Network Devices
-
-**[v7: bpf: XDP-hints: API change for RX-hash kfunc bpf_xdp_metadata_rx_hash](http://lore.kernel.org/netdev/168098183268.96582.7852359418481981062.stgit@firesoul/)**
-
-> Current API for bpf_xdp_metadata_rx_hash() returns the raw RSS hash value,
-> but doesn't provide information on the RSS hash type (part of 6.3-rc).
->
-> This patchset proposal is to change the function call signature via adding
-> a pointer value argument for providing the RSS hash type.
->
-> Patchset also disables all bpf_printk's from xdp_hw_metadata program
-> that we expect driver developers to use.
->
-
-**[v1: nft: main: Error out when combining -i/--interactive and -f/--file](http://lore.kernel.org/netdev/20230408181818.72264-1-pablo@netfilter.org/)**
-
-> These two options are mutually exclusive, display error in that case:
->
-> # nft -i -f test.nft
-> Error: -i/--interactive and -f/--file options cannot be combined
->
-
-**[v2: Add missing DSA properties for marvell switches](http://lore.kernel.org/netdev/20230408152801.2336041-1-andrew@lunn.ch/)**
-
-> The DSA core has become more picky about DT properties. This patchset
-> adds missing properties and removes some unused ones, for iMX boards.
->
-> Once all the missing properties are added, it should be possible to
-> simplify phylink and the mv88e6xxx driver.
->
-
-**[v4: net-next: Support MACsec VLAN](http://lore.kernel.org/netdev/20230408105735.22935-1-ehakim@nvidia.com/)**
-
-> This patch series introduces support for hardware (HW) offload MACsec
-> devices with VLAN configuration. The patches address both scenarios
-> where the VLAN header is both the inner and outer header for MACsec.
->
-
-**[v1: net: ipv6: Add Kconfig option to set default value of accept_dad](http://lore.kernel.org/netdev/3072adab06f9c5f45cc72d2068d1aed0100436ff.1680941918.git.josh@joshtriplett.org/)**
-
-> The kernel already supports disabling Duplicate Address Detection (DAD)
-> by setting net.ipv6.conf.$interface.accept_dad to 0. However, for
-> interfaces available at boot time, the kernel brings up the interface
-> and sets up the link-local address before processing sysctls set on the
-> kernel command line; thus, setting
-> sysctl.net.ipv6.conf.default.accept_dad=0 on the kernel command line
-> does not suffice to affect such interfaces.
->
-
-**[v1: Alternative, restart tx after tx used bit read](http://lore.kernel.org/netdev/20230407213349.8013-1-ingo.rohloff@lauterbach.com/)**
-
-> I am developing on a ZynqMP (Ultrascale+) SoC from AMD/Xilinx.
-> I have seen the same issue before commit 4298388574dae6168 ("net: macb:
-> restart tx after tx used bit read")
->
-
-**[v2: net: mana: Add support for jumbo frame](http://lore.kernel.org/netdev/1680901196-20643-1-git-send-email-haiyangz@microsoft.com/)**
-
-> The set adds support for jumbo frame,
-> with some optimization for the RX path.
->
-
-**[v2: wifi: brcmfmac: add Cypress 43439 SDIO ids](http://lore.kernel.org/netdev/20230407203752.128539-1-marex@denx.de/)**
-
-> Add SDIO ids for use with the muRata 1YN (Cypress CYW43439).
-> The odd thing about this is that the previous 1YN populated
-> on M.2 card for evaluation purposes had BRCM SDIO vendor ID,
-> while the chip populated on real hardware has a Cypress one.
-> The device ID also differs between the two devices. But they
-> are both 43439 otherwise, so add the IDs for both.
->
-
-**[v1: net-next: gve: Unify duplicate GQ min pkt desc size constants](http://lore.kernel.org/netdev/20230407184830.309398-1-shailend@google.com/)**
-
-> The two constants accomplish the same thing.
->
-
-**[v4: net-next: ice: allow matching on meta data](http://lore.kernel.org/netdev/20230407165219.2737504-1-michal.swiatkowski@linux.intel.com/)**
-
-> This patchset is intended to improve the usability of the switchdev
-> slow path. Without matching on meta data values, the slow path works
-> based on VF's MAC addresses. It causes a problem when the VF wants
-> to use more than one MAC address (e.g. when it is in trusted mode).
->
-
-**[v1: regmap: allow upshifting register addresses before performing operations](http://lore.kernel.org/netdev/20230407152604.105467-1-maxime.chevallier@bootlin.com/)**
-
-> Similar to the existing reg_downshift mechanism, that is used to
-> translate register addresses on busses that have a smaller address
-> stride, it's also possible to want to upshift register addresses.
->
-
-**[v1: ARM64: dts: marvell: cn9310: Add missing phy-mode](http://lore.kernel.org/netdev/20230407151839.2320596-1-andrew@lunn.ch/)**
-
-> The DSA framework has got more picky about always having a phy-mode
-> for the CPU port. The SoC Ethernet is being configured to
-> 10gbase-r. Set the switch phy-mode based on this. Additionally, the
-> SoC Ethernet is using in-band signalling to determine the link speed,
-> so add the same parameter to the switch.
->
-
-**[v1: net-next: tools: ynl: throw a more meaningful exception if family not supported](http://lore.kernel.org/netdev/20230407145609.297525-1-kuba@kernel.org/)**
-
-> cli.py currently throws a pure KeyError if kernel doesn't support
-> a netlink family. Users who did not write ynl (hah) may waste
-> their time investigating what's wrong with the Python code.
->
-
-**[v1: net-next: ax25: exit linked-list searches earlier](http://lore.kernel.org/netdev/20230407142042.11901-1-peter@n8pjl.ca/)**
-
-> There's no need to loop until the end of the list if we have a result.
->
-> Device callsigns are unique, so there can only be one dev returned from
-> ax25_addr_ax25dev(). If not, there would be inconsistencies based on
-> order of insertion, and refcount leaks.
->
-> Same reasoning for ax25_get_route() as above.
->
-
-**[v1: net-next: DSA trace events](http://lore.kernel.org/netdev/20230407141451.133048-1-vladimir.oltean@nxp.com/)**
-
-> These are useful to debug refcounting issues on CPU and DSA ports, where
-> entries may remain lingering, or may be removed too soon, depending on
-> bugs in higher layers of the network stack.
->
-
-**[v3: bpf-next: Add FOU support for externally controlled ipip devices](http://lore.kernel.org/netdev/cover.1680874078.git.cehrig@cloudflare.com/)**
-
-> This patch set adds support for using FOU or GUE encapsulation with
-> an ipip device operating in collect-metadata mode and a set of kfuncs
-> for controlling encap parameters exposed to a BPF tc-hook.
->
-
-**[v2: net-next: net: ethernet: mtk_eth_soc: use be32 type to store be32 values](http://lore.kernel.org/netdev/20230401-mtk_eth_soc-sparse-v2-1-963becba3cb7@kernel.org/)**
-
-> n_addr is used to store be32 values,
-> so use a sparse-friendly array of be32 to store these values.
->
-
-**[v1: net-next: net: davicom: Make davicom drivers not depends on DM9000](http://lore.kernel.org/netdev/20230407094930.2633137-1-weiyongjun@huaweicloud.com/)**
-
-> Building any davicom driver requires CONFIG_DM9000 to be set, but this dependency
-> is not correct, since dm9051 can be built as a module without dm9000; switch
-> to using CONFIG_NET_VENDOR_DAVICOM instead.
->
-
-**[v4: net-next: sfc: add vDPA support for EF100 devices](http://lore.kernel.org/netdev/20230407081021.30952-1-gautam.dawar@amd.com/)**
-
-> This series adds the vdpa support for EF100 devices.
-> For now, only a network class of vdpa device is supported and
-> they can be created only on a VF. Each EF100 VF can have one
-> of the three function personalities (EF100, vDPA & None) at
-> any time with EF100 being the default. A VF's function personality
-> is changed to vDPA while creating the vdpa device using vdpa tool.
->
-
-**[v2: net-next: qlcnic: check pci_reset_function result](http://lore.kernel.org/netdev/20230407071849.309516-1-den-plotnikov@yandex-team.ru/)**
-
-> A static code analyzer complains about an unchecked return value:
-> the result of pci_reset_function() is not checked.
-> Although the issue is on the FLR-supported code path, where the
-> reset could be done with pcie_flr(), the patch takes a less invasive
-> approach by adding a check of the pci_reset_function() result.
->
-
-**[v1: net/sched: sch_qfq: prevent slab-out-of-bounds in qfq_activate_agg](http://lore.kernel.org/netdev/ZC+Kgc7feqYy%2FGdw@pr0lnx/)**
-
-> If the TCA_QFQ_LMAX value is not offered through nlattr, lmax is determined by the MTU value of the network device.
-> The MTU of the loopback device can be set up to 2^31-1.
-> As a result, it is possible to have an lmax value that exceeds QFQ_MIN_LMAX.
->
-
-**[v4: net-next: net: lockless stop/wake combo macros](http://lore.kernel.org/netdev/20230407012536.273382-1-kuba@kernel.org/)**
-
-> A lot of drivers follow the same scheme to stop / start queues
-> without introducing locks between xmit and NAPI tx completions.
-> I'm guessing they all copy'n'paste each other's code.
-> The original code dates back all the way to e1000 and Linux 2.6.19.
->
-
-**[v1: bpf-next: bpf: ensure all memory is initialized in bpf_get_current_comm](http://lore.kernel.org/netdev/20230407001808.1622968-1-brho@google.com/)**
-
-> BPF helpers that take an ARG_PTR_TO_UNINIT_MEM must ensure that all of
-> the memory is set, including beyond the end of the string.
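`ARG_PTR_TO_UNINIT_MEM` lets a BPF program pass a buffer it never initialized. The helper therefore has to write every byte of that buffer, not only the string and its terminator. The sketch below shows this "copy, then zero the tail" rule in userspace C. The helper name is hypothetical. The kernel provides `strscpy_pad()` for padded copies, but this sketch does not claim that the patch uses it:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy a NUL-terminated name into a fixed-size
 * buffer, truncating if necessary, and zero every byte after the copied
 * characters so that no stale or uninitialized bytes remain in `dst`.
 * Returns the number of characters copied (excluding the NUL). */
static size_t copy_name_padded(char *dst, size_t size, const char *src)
{
	size_t len = 0;

	if (size == 0)
		return 0;
	/* Stop early enough to leave room for the terminator. */
	while (len + 1 < size && src[len] != '\0')
		len++;
	memcpy(dst, src, len);
	memset(dst + len, 0, size - len); /* terminator plus padding */
	return len;
}
```

Truncation always leaves room for the terminator, so a 4-byte buffer receives at most 3 characters followed by a zero byte.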
->
-
-**[v9: net-next: pds_core driver](http://lore.kernel.org/netdev/20230406234143.11318-1-shannon.nelson@amd.com/)**
-
-> This patchset implements a new driver for use with the AMD/Pensando
-> Distributed Services Card (DSC), intended to provide core configuration
-> services through the auxiliary_bus and through a couple of EXPORTed
-> functions for use initially in VFIO and vDPA feature specific drivers.
->
-
-**[v1: bpf-next: xsk: Elide base_addr comparison in xp_unaligned_validate_desc](http://lore.kernel.org/netdev/20230406212136.19716-1-kal.conley@dectris.com/)**
-
-> Remove redundant (base_addr >= pool->addrs_cnt) comparison from the
-> conditional.
->
-> In particular, addr is computed as:
->
-> addr = base_addr + offset
->
-> where base_addr and offset are stored as 48-bit and 16-bit unsigned
-> integers, respectively. The above sum cannot overflow u64 since
-> base_addr has a maximum value of 0x0000ffffffffffff and offset has a
-> maximum value of 0xffff (implying a maximum sum of 0x000100000000fffe).
-> Since overflow is impossible, it follows that addr >= base_addr.
->
-
-**[v1: net-next: net: make SO_BUSY_POLL available to all users](http://lore.kernel.org/netdev/20230406194634.1804691-1-edumazet@google.com/)**
-
-> After commit 217f69743681 ("net: busy-poll: allow preemption
-> in sk_busy_loop()"), a thread willing to use busy polling
-> is not hurting other threads anymore in a non preempt kernel.
->
-> I think it is safe to remove CAP_NET_ADMIN check.
->
-
-**[RFC v4: net-next: net: Make MAC/PHY time stamping selectable](http://lore.kernel.org/netdev/20230406173308.401924-1-kory.maincent@bootlin.com/)**
-
-> Up until now, there was no way to let the user select the layer at
-> which time stamping occurs. The stack assumed that PHY time stamping
-> is always preferred, but some MAC/PHY combinations were buggy.
->
-> This series aims to allow the user to select the desired layer
-> administratively.
->
-
-**[v1: net-next: net: stmmac: dwmac-anarion: address issues flagged by sparse](http://lore.kernel.org/netdev/20230406-dwmac-anarion-sparse-v1-0-b0c866c8be9d@kernel.org/)**
-
-> Two minor enhancements to dwmac-anarion to address issues flagged by
-> sparse.
->
-> 1. Always return struct anarion_gmac * from anarion_config_dt()
-> 2. Add __iomem annotation to register base
->
-> No functional change intended.
-> Compile tested only.
->
-
-**[v1: io_uring: Pass whole sqe to commands](http://lore.kernel.org/netdev/20230406165705.3161734-1-leitao@debian.org/)**
-
-> Currently uring CMD operation relies on having large SQEs, but future
-> operations might want to use normal SQE.
->
-> The io_uring_cmd currently only saves the payload (cmd) part of the SQE,
-> but, for commands that use normal SQE size, it might be necessary to
-> access the initial SQE fields outside of the payload/cmd block. So,
-> save the whole SQE rather than just the pdu.
->
-
-**[v1: bpf-next: net/smc: Introduce BPF injection capability](http://lore.kernel.org/netdev/1680795034-86384-1-git-send-email-alibuda@linux.alibaba.com/)**
-
-> These patches attempt to introduce BPF injection capability for SMC,
-> and add a selftest to ensure code stability.
->
-> As we all know, the SMC protocol is not suitable for all scenarios,
-> especially for short-lived connections. However, for most applications, they cannot
-> guarantee that there are no such scenarios at all. Therefore, apps
-> may need some specific strategies to decide whether to use SMC
-> or not, for example, apps can limit the scope of the SMC to a specific
-> IP address or port.
->
-
-**[v1: add initial io_uring_cmd support for sockets](http://lore.kernel.org/netdev/20230406144330.1932798-1-leitao@debian.org/)**
-
-> This patchset creates the initial plumbing for an io_uring command for
-> sockets.
->
-> For now, create two uring commands for sockets, SOCKET_URING_OP_SIOCOUTQ
-> and SOCKET_URING_OP_SIOCINQ. They are similar to ioctl operations
-> SIOCOUTQ and SIOCINQ. In fact, the code on the protocol side itself is
-> heavily based on the ioctl operations.
->
-
-**[v1: next: wifi: mt76: Replace zero-length array with flexible-array member](http://lore.kernel.org/netdev/ZC7X7KCb+JEkPe5D@work/)**
-
-> Zero-length arrays are deprecated [1] and have to be replaced by C99
-> flexible-array members.
->
-> This helps with the ongoing efforts to tighten the FORTIFY_SOURCE routines
-> on memcpy() and help to make progress towards globally enabling
-> -fstrict-flex-arrays=3 [2]
->
-
-#### Security Hardening
-
-**[v2: Tab P11 features](http://lore.kernel.org/linux-hardening/20230406-topic-lenovo_features-v2-0-625d7cb4a944@linaro.org/)**
-
-**[v2: fortify: Add KUnit tests for runtime overflows](http://lore.kernel.org/linux-hardening/20230407191904.gonna.522-kees@kernel.org/)**
-
-> This series adds KUnit tests for the CONFIG_FORTIFY_SOURCE behavior of the
-> standard C string functions, and for the strcat() family of functions,
-> as those were updated during refactoring. Finally, fortification error
-> messages are improved to give more context for the failure condition.
->
-
-**[v1: next: s390/fcx: Replace zero-length array with flexible-array member](http://lore.kernel.org/linux-hardening/ZC7XT5prvoE4Yunm@work/)**
-
-> Zero-length arrays are deprecated [1] and have to be replaced by C99
-> flexible-array members.
->
-> This helps with the ongoing efforts to tighten the FORTIFY_SOURCE routines
-> on memcpy() and help to make progress towards globally enabling
-> -fstrict-flex-arrays=3 [2]
->
-
-**[v1: next: s390/diag: Replace zero-length array with flexible-array member](http://lore.kernel.org/linux-hardening/ZC7XGpUtVhqlRLhH@work/)**
-
-> Zero-length arrays are deprecated [1] and have to be replaced by C99
-> flexible-array members.
->
-> This helps with the ongoing efforts to tighten the FORTIFY_SOURCE routines
-> on memcpy() and help to make progress towards globally enabling
-> -fstrict-flex-arrays=3 [2]
->
-
-**[v2: ubsan: Tighten UBSAN_BOUNDS on GCC](http://lore.kernel.org/linux-hardening/20230405022356.gonna.338-kees@kernel.org/)**
-
-> The use of -fsanitize=bounds on GCC will ignore some trailing arrays,
-> leaving a gap in coverage. Switch to using -fsanitize=bounds-strict to
-> match Clang's stricter behavior.
->
-
-#### Async IO
-
-**[v2: optimise resheduling due to deferred tw](http://lore.kernel.org/io-uring/cover.1680782016.git.asml.silence@gmail.com/)**
-
-> io_uring extensively uses task_work, but when a task is waiting
-> every new queued task_work batch will try to wake it up and so
-> cause lots of scheduling activity. This series optimises it,
-> specifically applied for rw completions and send-zc notifications
-> for now, and will be helpful for further optimisations.
->
-
-**[v1: ublk: read any SQE values upfront](http://lore.kernel.org/io-uring/4ea9c4da-5eb8-c9b1-46de-93697291baa5@kernel.dk/)**
-
-> Since SQE memory is shared with userspace, we should only be reading it
-> once. We cannot read it multiple times, particularly when it's read once
-> for validation and then read again for the actual use.
->
-
-#### Rust For Linux
-
-**[v7: Rust pin-init API for pinned initialization of structs](http://lore.kernel.org/rust-for-linux/20230408122429.1103522-1-y86-dev@protonmail.com/)**
-
-> This is the seventh version of the pin-init API. See [1] for v6.
->
-> The tree at [2] contains these patches applied on top of 6.3-rc1.
-> The Rust-doc documentation of the pin-init API can be found at [3].
->
-> These patches are a long way coming, since I held a presentation on
-> safe pinned initialization at Kangrejos [4]. And my discovery of this
-> problem was almost a year ago [5].
->
-
-**[v1: Initial Rust V4L2 support](http://lore.kernel.org/rust-for-linux/20230406215615.122099-1-daniel.almeida@collabora.com/)**
-
-> media subsystem.
->
-> It adds just enough support to write a clone of the virtio-camera
-> prototype written by my colleague, Dmitry Osipenko, available at [0].
->
-> Basically, there's support for video_device_register,
-> v4l2_device_register and for some ioctls in v4l2_ioctl_ops. There is
-> also some initial vb2 support, alongside some wrappers for some types
-> found in videodev2.h.
->
-
-**[v1: v6.1: rust: types: add `Opaque::pin_init`](http://lore.kernel.org/rust-for-linux/20230406065546.787669-1-y86-dev@protonmail.com/)**
-
-> Add support for pin-init in combination with `Opaque`, the `pin_init`
-> function initializes the contents via a user-supplied initializer for
-> `T`.
->
-
-**[v2: rust: virtio: add virtio support](http://lore.kernel.org/rust-for-linux/20230405201416.395840-1-daniel.almeida@collabora.com/)**
-
-> This used to be a single patch, but I split it into two with the
-> addition of struct Scatterlist.
->
-> Again a bit new with Rust submissions. I was told by Gary Guo to
-> rebase on top of rust-next, but it seems *very* behind?
->
-
-#### BPF
-
-**[v2: bpf-next: Introduce BPF_MA_REUSE_AFTER_RCU_GP](http://lore.kernel.org/bpf/20230408141846.1878768-1-houtao@huaweicloud.com/)**
-
-> As discussed in v1, currently the freed objects in bpf memory allocator
-> may be reused immediately by the new allocation, it introduces
-> use-after-bpf-ma-free problem for non-preallocated hash map and makes
-> lookup procedure return incorrect result. The immediate reuse also makes
-> introducing new use case more difficult (e.g. qp-trie).
->
-
-**[v1: bpf-next: selftests/bpf: Use PERF_COUNT_HW_CPU_CYCLES event for get_branch_snapshot](http://lore.kernel.org/bpf/20230407190130.2093736-1-song@kernel.org/)**
-
-> perf_event with type=PERF_TYPE_RAW and config=0x1b00 turned out to be not
-> reliable in ensuring LBR is active. Thus, test_progs:get_branch_snapshot is
-> not reliable on some systems. Replace it with PERF_COUNT_HW_CPU_CYCLES
-> event, which gives more consistent results.
->
-
-**[v1: bpf-next: selftests/bpf: Prevent infinite loop in veristat when base file is too short](http://lore.kernel.org/bpf/20230407154125.896927-1-eddyz87@gmail.com/)**
-
-> The loop is caused by handle_comparison_mode() not checking if `base`
-> variable points to `fallback_stats` prior to advancing joined results
-> using `base`.
->
-
-**[v1: bpf-next: bpftool: set program type only if it differs from the desired one](http://lore.kernel.org/bpf/20230407081427.2621590-1-weiyongjun@huaweicloud.com/)**
-
-> After commit d6e6286a12e7 ("libbpf: disassociate section handler on explicit
-> bpf_program__set_type() call"), bpf_program__set_type() will force cleanup
-> the program's SEC() definition, this commit fixed the test helper but missed
-> the bpftool, which leads to bpftool prog autoattach being broken as follows:
->
-> $ bpftool prog load spi-xfer-r1v1.o /sys/fs/bpf/test autoattach
-> Program spi_xfer_r1v1 does not support autoattach, falling back to pinning
->
-> This patch fixes bpftool to set program type only if it differs.
->
-
-**[v1: BPF: replace no-need function call with saved value](http://lore.kernel.org/bpf/20230407064837.32015-1-zhongjun@uniontech.com/)**
-
-> The variable 'is_priv' is already available, so there is no need to call
-> bpf_capable() again.
-> This patch refines the code to make it more robust and efficient.
->
-
-**[v1: BPF: properly precedence of exclusive attr flags](http://lore.kernel.org/bpf/20230407054235.31726-1-zhongjun@uniontech.com/)**
-
-> BPF_F_STRICT_ALIGNMENT and BPF_F_ANY_ALIGNMENT are exclusive
-> flags. Intuitively the strict one should take higher precedence.
-> This patch makes the semantics of the flags more consistent.
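The precedence question is easiest to see as a small decision function. The sketch below encodes the rule that the flags patch argues for: strict alignment wins when both flags are set. The flag values match `include/uapi/linux/bpf.h`. The function and its `arch_needs_strict` parameter are hypothetical stand-ins for the verifier's logic, not its actual code:

```c
#include <stdbool.h>

/* Values as defined in include/uapi/linux/bpf.h. */
#define BPF_F_STRICT_ALIGNMENT (1U << 0)
#define BPF_F_ANY_ALIGNMENT    (1U << 1)

/* Hypothetical decision function. `arch_needs_strict` models an
 * architecture without efficient unaligned access. */
static bool want_strict_alignment(unsigned int prog_flags, bool arch_needs_strict)
{
	if (prog_flags & BPF_F_STRICT_ALIGNMENT)
		return true;  /* an explicit strict request always wins */
	if (prog_flags & BPF_F_ANY_ALIGNMENT)
		return false; /* relax only when strict was not requested */
	return arch_needs_strict;
}
```

With this ordering, passing both flags behaves exactly like passing `BPF_F_STRICT_ALIGNMENT` alone. Rejecting the combination outright would be another reasonable design; the entry only argues about precedence.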
->
-
-**[v1: BPF: replace low-entropy member with macro](http://lore.kernel.org/bpf/20230407033418.2295-1-zhongjun@uniontech.com/)**
-
-> The member orig_idx is a low-entropy once-init invariable data
-> member. It can be replaced by a series of macros.
-> Replacing this member with macros can save memory and CPU time.
->
-
-**[v4: bpf-next: BPF verifier rotating log](http://lore.kernel.org/bpf/20230406234205.323208-1-andrii@kernel.org/)**
-
-> This patch set changes BPF verifier log behavior to behave as a rotating log,
-> by default. If user-supplied log buffer is big enough to contain entire
-> verifier log output, there is no effective difference. But where previously
-> user supplied too small log buffer and would get -ENOSPC error result and the
-> beginning part of the verifier log, now there will be no error and user will
-> get ending part of verifier log filling up user-supplied log buffer. Which
-> is, in the absolute majority of cases, exactly what's useful, relevant, and
-> what users want and need, as the ending of the verifier log is containing
-> details of verifier failure and relevant state that got us to that failure. So
-> this rotating mode is made default, but for some niche advanced debugging
-> scenarios it's possible to request old behavior by specifying additional
-> BPF_LOG_FIXED (8) flag.
->
-
-**[v2: bpf-next: bpf: Improve verifier for cond_op and spilled loop index variables](http://lore.kernel.org/bpf/20230406164450.1044952-1-yhs@fb.com/)**
-
-> LLVM commit [1] introduced hoistMinMax optimization like
-> (i < VIRTIO_MAX_SGS) && (i < out_sgs)
-> to
-> upper = MIN(VIRTIO_MAX_SGS, out_sgs)
-> ... i < upper ...
-> and caused the verification failure. Commit [2] worked around the issue by
-> adding some bpf assembly code to prohibit the above optimization.
-> This patch improved verifier such that verification can succeed without
-> the above workaround.
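The hoistMinMax rewrite quoted above does not change program behavior. For any integers, `(i < a) && (i < b)` holds exactly when `i < min(a, b)`. This equivalence is why it makes sense to teach the verifier to follow the hoisted form instead of blocking the optimization in the program. The sketch below only checks the equivalence, by brute force over a small range. It says nothing about how the verifier tracks register bounds, and the function names are hypothetical:

```c
#include <stdbool.h>

/* The original guard, as written in the loop condition. */
static bool guard_original(int i, int max_sgs, int out_sgs)
{
	return (i < max_sgs) && (i < out_sgs);
}

/* The form produced by the hoisted min/max optimization. */
static bool guard_hoisted(int i, int max_sgs, int out_sgs)
{
	int upper = max_sgs < out_sgs ? max_sgs : out_sgs; /* MIN() */
	return i < upper;
}

/* Exhaustively compare both guards for all i, a, b in [lo, hi].
 * Returns the number of mismatches (expected: 0). */
static int count_mismatches(int lo, int hi)
{
	int bad = 0;

	for (int i = lo; i <= hi; i++)
		for (int a = lo; a <= hi; a++)
			for (int b = lo; b <= hi; b++)
				if (guard_original(i, a, b) != guard_hoisted(i, a, b))
					bad++;
	return bad;
}
```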
->
-
-**[v4: bpf-next: xsk: Support UMEM chunk_size > PAGE_SIZE](http://lore.kernel.org/bpf/20230406131806.51332-1-kal.conley@dectris.com/)**
-
-> The main purpose of this patchset is to add AF_XDP support for UMEM
-> chunk sizes > PAGE_SIZE. This is enabled for UMEMs backed by HugeTLB
-> pages.
->
-
-**[v1: powerpc/bpf: populate extable entries only during the last pass](http://lore.kernel.org/bpf/20230406073519.75059-1-hbathini@linux.ibm.com/)**
-
-> Since commit 85e031154c7c ("powerpc/bpf: Perform complete extra passes
-> to update addresses"), two additional passes are performed to avoid
-> space and CPU time wastage on powerpc. But these extra passes led to
-> WARN_ON_ONCE() hits in bpf_add_extable_entry(). Fix it by not adding
-> extable entries during the extra pass.
->
-
-**[v1: BPF: make verifier 'misconfigured' errors more meaningful](http://lore.kernel.org/bpf/20230406014351.8984-1-zhongjun@uniontech.com/)**
-
-> There are too many so-called 'misconfigured' errors potentially
-> fed back to user space, which makes it very hard to judge at
-> a glance the reason a verification failure occurred.
-> This patch makes those similar error messages more distinguishable and readable.
->
-
-**[v1: Dynptr Verifier Adjustments](http://lore.kernel.org/bpf/20230406004018.1439952-1-drosen@google.com/)**
-
-> These patches relax a few verifier requirements around dynptrs.
->
-> I was unable to test the patch in 0003 due to unrelated issues compiling the
-> bpf selftests, but did run an equivalent local test program.
->
-
-**[v6: bpf-next: bpf: Support 64-bit pointers to kfuncs](http://lore.kernel.org/bpf/20230405213453.49756-1-iii@linux.ibm.com/)**
-
-> test_ksyms_module fails to emit a kfunc call targeting a module on
-> s390x, because the verifier stores the difference between kfunc
-> address and __bpf_call_base in bpf_insn.imm, which is s32, and modules
-> are roughly (1 << 42) bytes away from the kernel on s390x.
->
-> Fix by keeping BTF id in bpf_insn.imm for BPF_PSEUDO_KFUNC_CALLs,
-> and storing the absolute address in bpf_kfunc_desc.
->
-
-**[v2: bpf: selftests/bpf: Wait for receive in cg_storage_multi test](http://lore.kernel.org/bpf/20230405193354.1956209-1-zhuyifei@google.com/)**
-
-> In some cases the loopback latency might be large enough, causing
-> the assertion on invocations to be run before the ingress prog gets
-> executed. The assertion would fail and the test would flake.
->
-
-**[v6: Add ftrace direct call for arm64](http://lore.kernel.org/bpf/20230405180250.2046566-1-revest@chromium.org/)**
-
-> This series adds ftrace direct call support to arm64.
-> This makes BPF tracing programs (fentry/fexit/fmod_ret/lsm) work on arm64.
->
-> It is meant to be taken by the arm64 tree but it depends on the
-> trace-direct-v6.3-rc3 tag of the linux-trace tree:
-> git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
-> That tag was created by Steven Rostedt so the arm64 tree can pull the prior work
-> this depends on. [1]
->
-
-**[v1: bpf-next: bpf: add netfilter program type](http://lore.kernel.org/bpf/20230405161116.13565-1-fw@strlen.de/)**
-
-> Add minimal support to hook bpf programs to netfilter hooks, e.g.
-> PREROUTING or FORWARD.
->
-> For this the most relevant parts for registering a netfilter
-> hook via the in-kernel api are exposed to userspace via bpf_link.
->
-
-**[v3: bpf-next: bpftool: Add inline annotations when dumping program CFGs](http://lore.kernel.org/bpf/20230405132120.59886-1-quentin@isovalent.com/)**
-
-> This set contains some improvements for bpftool's "visual" program dump
-> option, which produces the control flow graph in a DOT format. The main
-> objective is to add support for inline annotations on such graphs, so that
-> we can have the C source code for the program showing up alongside the
-> instructions, when available. The last commits also make it possible to
-> display the line numbers or the bare opcodes in the graph, as supported by
-> regular program dumps.
->
-
-**[v1: bpf-next: selftests: xsk: Disable IPv6 on VETH1](http://lore.kernel.org/bpf/20230405082905.6303-1-kal.conley@dectris.com/)**
-
-> This change fixes flakiness in the BIDIRECTIONAL test:
->
-> # [is_pkt_valid] expected length [60], got length [90]
-> not ok 1 FAIL: SKB BUSY-POLL BIDIRECTIONAL
->
-> When IPv6 is enabled, the interface will periodically send MLDv1 and
-> MLDv2 packets. These packets can cause the BIDIRECTIONAL test to fail
-> since it uses VETH0 for RX.
->
-
-**[v1: bpf-next: Exceptions - 1/2](http://lore.kernel.org/bpf/20230405004239.1375399-1-memxor@gmail.com/)**
-
-> This series implements the bare minimum support for basic BPF
-> exceptions. This is a feature to allow programs to simply throw a
-> valueless exception within a BPF program to abort its execution.
-> Automatic cleanup of held resources and generation of landing pads to
-> unwind program state will be done in the part 2 set.
->
-
-**[v1: bpf-next: bpf: Add a kfunc filter function to 'struct btf_kfunc_id_set'.](http://lore.kernel.org/bpf/20230404060959.2259448-1-martin.lau@linux.dev/)**
-
-> This set (https://lore.kernel.org/bpf/https://lore.kernel.org/bpf/500d452b-f9d5-d01f-d365-2949c4fd37ab@linux.dev/)
-> needs to limit bpf_sock_destroy kfunc to BPF_TRACE_ITER.
-> In the earlier reply, I thought of adding a BTF_KFUNC_HOOK_TRACING_ITER.
->
-
-**[v1: bpf-next: bpf: Follow up to RCU enforcement in the verifier.](http://lore.kernel.org/bpf/20230404045029.82870-1-alexei.starovoitov@gmail.com/)**
-
-> The patch set is addressing a fallout from
-> commit 6fcd486b3a0a ("bpf: Refactor RCU enforcement in the verifier.")
-> It was too aggressive with PTR_UNTRUSTED marks.
-> Patches 1-6 are cleanup and adding verifier smartness to address real
-> use cases in bpf programs that broke with too aggressive PTR_UNTRUSTED.
-> The partial revert is done in patch 7 anyway.
->
-
-### Related Technology Updates
-
-#### Qemu
-
-**[v1: target/riscv: Mask the implicitly enabled extensions in isa_string based on priv version](http://lore.kernel.org/qemu-devel/20230407033014.40901-1-liweiwei@iscas.ac.cn/)**
-
-> Using implicitly enabled extensions such as Zca/Zcf/Zcd instead of their
-> super extensions can simplify the extension related check. However, they
-> may have higher priv version than their super extensions. So we should mask
-> them in the isa_string based on priv version to make them invisible to user
-> if the specified priv version is lower than their minimal priv version.
->
-
-**[v4: hw/riscv: Add ACT related support](http://lore.kernel.org/qemu-devel/20230405095720.75848-1-liweiwei@iscas.ac.cn/)**
-
-> ACT tests play an important role in riscv tests. This patch tries to
-> add related support to run ACT tests.
->
-> The port is available here:
-> https://github.com/plctlab/plct-qemu/tree/plct-act-upstream-v2
->
-
-**[riscv: g_assert for NULL predicate?](http://lore.kernel.org/qemu-devel/e9de7676-b669-4f4e-e3e0-e57fb58b7bd7@intel.com/)**
-
-> Recent commit 0ee342256af92 switches to g_assert() for the predicate()
-> NULL check from returning RISCV_EXCP_ILLEGAL_INST. Qemu doesn't have
-> predicate() for un-allocated CSRs, then a buggy userspace application
-> reads CSR such as 0x4 causes qemu to exit, I don't think it's expected.
->
-> .global _start
->
-> .text
-> _start:
-> csrr t3, 0x4
->
-
-#### U-Boot
-
-**[v1: riscv: Correct a comment in io.h](http://lore.kernel.org/u-boot/20230403033732.2812219-1-bmeng@tinylab.org/)**
-
-> Replace NDS32 with RISC-V in the comments.
->
-
-**[v1: riscv: Add a 64-bit image type](http://lore.kernel.org/u-boot/20230402202813.2341959-1-sjg@chromium.org/)**
-
-> At present it is not possible to know whether an image can be booted by
-> a 32- or 64-bit bootloader. This means that U-Boot may attempt to boot
-> the wrong image. This may cause a crash which might be hard to debug.
->
-
-## 20230402: Issue 40
-
-### Kernel Updates
-
-#### RISC-V Architecture Support
-
-**[v1: RISC-V: KVM: Allow Zbb extension for Guest/VM](http://lore.kernel.org/linux-riscv/20230401112730.2105240-1-apatel@ventanamicro.com/)**
-
-> We extend the KVM ISA extension ONE_REG interface to allow KVM
-> user space to detect and enable Zbb extension for Guest/VM.
->
-
-**[v7: Basic clock, reset & device tree support for StarFive JH7110 RISC-V SoC](http://lore.kernel.org/linux-riscv/20230401111934.130844-1-hal.feng@starfivetech.com/)**
-
-> This patch series adds basic clock, reset & DT support for StarFive
-> JH7110 SoC.
->
-> @Stephen and @Conor, I have made this series start with the shared
-> dt-bindings, so it will be easier to merge.
->
-
-**[v4: Use dma_default_coherent for devicetree default coherency](http://lore.kernel.org/linux-riscv/20230401091531.47412-1-jiaxun.yang@flygoat.com/)**
-
-> This series splits out the second half of my previous series
-> "v1: MIPS DMA coherence fixes".
->
-> It intends to use dma_default_coherent to determine the default coherency of
-> devicetree probed devices instead of hardcoding it with Kconfig options.
->
-
-**[v1: riscv: dts: nezha-d1: Add memory](http://lore.kernel.org/linux-riscv/20230331182727.4062790-1-evan@rivosinc.com/)**
-
-> Add memory info for the D1 Nezha, which seems to be required for it to
-> boot with the stock firmware. Note that this hardcodes 1GB, which is
-> not technically correct as they also make models with different amounts
-> of RAM. Is the firmware supposed to populate this?
->
-
-**[v4: RISC-V KVM ONE_REG interface for SBI](http://lore.kernel.org/linux-riscv/20230331174542.2067560-1-apatel@ventanamicro.com/)**
-
-> This series first does a few cleanups/fixes (PATCH1 to PATCH5) and adds
-> ONE-REG interface for customizing the SBI interface visible to the
-> Guest/VM.
-> -> The testing of this series has been done with KVMTOOL changes in -> riscv_sbi_imp_v1 branch at: -> https://github.com/avpatel/kvmtool.git -> - -**[v10: function_graph: Support recording and printing the return value of function](http://lore.kernel.org/linux-riscv/cover.1680265828.git.pengdonglin@sangfor.com.cn/)** - -> When using the function_graph tracer to analyze system call failures, -> it can be time-consuming to analyze the trace logs and locate the kernel -> function that first returns an error. This change aims to simplify the -> process by recording the function return value to the 'retval' member of -> 'ftrace_graph_ent' and printing it when outputting the trace log. -> - -**[v7: RISC-V non-coherent function pointer based CMO + non-coherent DMA support for AX45MP](http://lore.kernel.org/linux-riscv/20230330204217.47666-1-prabhakar.mahadev-lad.rj@bp.renesas.com/)** - -> On the Andes AX45MP core, cache coherency is a specification option so it -> may not be supported. In this case DMA will fail. To get around this -> issue, this patch series does the below: -> -> 1] Andes alternative ports is implemented as errata which checks if the IOCP -> is missing and only then applies to CMO errata. One vendor specific SBI EXT -> (ANDES_SBI_EXT_IOCP_SW_WORKAROUND) is implemented as part of errata. -> - -**[v1: dt-bindings: move cache controller bindings to a cache directory](http://lore.kernel.org/linux-riscv/20230330173255.109731-1-conor@kernel.org/)** - -> There's a bunch of bindings for (mostly l2) cache controllers -> scattered to the four winds, move them to a common directory. -> I renamed the freescale l2cache.txt file, as while that might make sense -> when the parent dir is fsl, it's confusing after the move. -> The two Marvell bindings have had a "marvell," prefix added to match -> their compatibles.
-> - -**[v15: Microchip Soft IP corePWM driver](http://lore.kernel.org/linux-riscv/20230330071203.286972-1-conor.dooley@microchip.com/)** - -> Uwe & I had a long back and forth about period calculations on v13, -> my ultimate conclusion being that, after some testing of the "corrected" -> calculation in hardware, the original calculation was correct. -> I think we had gotten sucked into discussion the calculation of the -> period itself, when we were in fact trying to calculate a bound on the -> period instead. That discussion is here: -> https://lore.kernel.org/linux-pwm/Y+ow8tfAHo1yv1XL@wendy/ -> - -**[v8: RISC-V Hibernation Support](http://lore.kernel.org/linux-riscv/20230330064321.1008373-1-jeeheng.sia@starfivetech.com/)** - -> This series adds RISC-V Hibernation/suspend to disk support. -> Low level Arch functions were created to support hibernation. -> swsusp_arch_suspend() relies code from __cpu_suspend_enter() to write -> cpu state onto the stack, then calling swsusp_save() to save the memory -> image. -> - -**[v1: iommu: PGTABLE_LPAE is also for RISCV](http://lore.kernel.org/linux-riscv/20230330060105.29460-1-rdunlap@infradead.org/)** - -> On riscv64, linux-next-20233030 (and for several days earlier), -> there is a kconfig warning: -> -> WARNING: unmet direct dependencies detected for IOMMU_IO_PGTABLE_LPAE -> Depends on [n]: IOMMU_SUPPORT [=y] && (ARM || ARM64 || COMPILE_TEST [=n]) && !GENERIC_ATOMIC64 [=n] -> Selected by [y]: -> - IPMMU_VMSA [=y] && IOMMU_SUPPORT [=y] && (ARCH_RENESAS [=y] || COMPILE_TEST [=n]) && !GENERIC_ATOMIC64 [=n] -> - -**[v1: DT header disentangling, part 1](http://lore.kernel.org/linux-riscv/20230329-dt-cpu-header-cleanups-v1-0-581e2605fe47@kernel.org/)** - -> This is the first of a series of clean-ups to disentangle the DT -> includes. 
There's a decade plus old comment in of_device.h: -> -> #include /* temporary until merge */ -> - -**[v1: Add TDM audio on StarFive JH7110](http://lore.kernel.org/linux-riscv/20230329153320.31390-1-walker.chen@starfivetech.com/)** - -> This patchset adds TDM audio driver for the StarFive JH7110 SoC. The -> first patch adds device tree binding for TDM module. The second patch -> adds tdm driver support for JH7110 SoC. The last patch adds device node -> of tdm and sound card to JH7110 dts. -> - -**[v4: Implement GCM ghash using Zbc and Zbkb extensions](http://lore.kernel.org/linux-riscv/20230329140642.2186644-1-heiko.stuebner@vrull.eu/)** - -> This was originally part of my vector crypto series, but was part -> of a separate openssl merge request implementing GCM ghash as using -> non-vector extensions. -> - -**[v2: riscv: Dump user opcode bytes on fatal faults](http://lore.kernel.org/linux-riscv/20230329082950.726-1-cuiyunhui@bytedance.com/)** - -> We encountered such a problem that when the system starts to execute -> init, init exits unexpectedly with error message: "unhandled signal 4 -> code 0x1 ...". -> -> We are more curious about which instruction execution caused the -> exception. After dumping it through show_opcodes(), we found that it -> was caused by a floating-point instruction. -> - -**[v2: riscv: Introduce KASLR](http://lore.kernel.org/linux-riscv/20230329052926.69632-1-alexghiti@rivosinc.com/)** - -> The following KASLR implementation allows to randomize the kernel mapping: -> -> - virtually: we expect the bootloader to provide a seed in the device-tree -> - physically: only implemented in the EFI stub, it relies on the firmware to -> provide a seed using EFI_RNG_PROTOCOL. arm64 has a similar implementation -> hence the patch 3 factorizes KASLR related functions for riscv to take -> advantage. 
-> - -**[v9: riscv: Allow to downgrade paging mode from the command line](http://lore.kernel.org/linux-riscv/20230329050951.66085-1-alexghiti@rivosinc.com/)** - -> This new version gets rid of the limitation that prevented KASAN kernels -> to use the newly introduced parameters. -> -> While looking into KASLR, I fell onto commit aacd149b6238 ("arm64: head: -> avoid relocating the kernel twice for KASLR"): it allows to use the fdt -> functions very early in the boot process with KASAN enabled by simply -> compiling a new version of those functions without instrumentation. -> - -**[v9: Introduce 64b relocatable kernel](http://lore.kernel.org/linux-riscv/20230329045329.64565-1-alexghiti@rivosinc.com/)** - -> After multiple attempts, this patchset is now based on the fact that the -> 64b kernel mapping was moved outside the linear mapping. -> -> The first patch allows to build relocatable kernels but is not selected -> by default. That patch is a requirement for KASLR. -> The second and third patches take advantage of an already existing powerpc -> script that checks relocations at compile-time, and uses it for riscv. -> - -**[v2: -next: support allocating crashkernel above 4G explicitly on riscv](http://lore.kernel.org/linux-riscv/20230328115150.2700016-1-chenjiahao16@huawei.com/)** - -> On riscv, the current crash kernel allocation logic is trying to -> allocate within 32bit addressable memory region by default, if -> failed, try to allocate without 4G restriction. -> -> In need of saving DMA zone memory while allocating a relatively large -> crash kernel region, allocating the reserved memory top down in -> high memory, without overlapping the DMA zone, is a mature solution. -> Hence this patchset introduces the parameter option crashkernel=X,[high,low].
-> - -**[v18: RISC-V IPI Improvements](http://lore.kernel.org/linux-riscv/20230328035223.1480939-1-apatel@ventanamicro.com/)** - -> This series aims to improve IPI support in Linux RISC-V in the following ways: -> 1) Treat IPIs as normal per-CPU interrupts instead of having custom RISC-V -> specific hooks. This also makes Linux RISC-V IPI support aligned with -> other architectures. -> 2) Remote TLB flushes and icache flushes should prefer local IPIs instead -> of SBI calls whenever we have specialized hardware (such as RISC-V AIA -> IMSIC and RISC-V SWI) which allows S-mode software to directly inject -> IPIs without any assistance from M-mode runtime firmware. -> - -**[v17: -next: riscv: Add vector ISA support](http://lore.kernel.org/linux-riscv/20230327164941.20491-1-andy.chiu@sifive.com/)** - -> This patchset is implemented based on vector 1.0 spec to add vector support -> in riscv Linux kernel. There are some assumptions for this implementation. -> -> 1. We assume all harts have the same ISA in the system. -> 2. We disable vector in both kernel and user space [1] by default. Only -> enable a user's vector after an illegal instruction trap where it -> actually starts executing vector (the first-use trap [2]). -> 3. We detect "riscv,isa" to determine whether vector is supported or not. -> - -**[v1: dma-mapping: unify support for cache flushes](http://lore.kernel.org/linux-riscv/20230327121317.4081816-1-arnd@kernel.org/)** - -> After a long discussion about adding SoC specific semantics for when -> to flush caches in drivers/soc/ drivers that we determined to be -> fundamentally flawed[1], I volunteered to try to move that logic into -> architecture-independent code and make all existing architectures do -> the same thing. -> - -**[v1: riscv/fault: Dump user opcode bytes on fatal faults](http://lore.kernel.org/linux-riscv/20230327115642.1610-1-cuiyunhui@bytedance.com/)** - -> We encountered such a problem (logs are below).
We are more curious about -> which instruction execution caused the exception. After dumping it -> through show_opcodes(), we found that it was caused by a floating-point -> instruction. -> -> In this way, we found the problem: in the system bringup , it is -> precisely that we have not enabled the floating point function. -> - -#### 进程调度 - -**[v1: sched: Introduce per-mm/cpu concurrency id state](http://lore.kernel.org/lkml/20230330230911.228720-1-mathieu.desnoyers@efficios.com/)** - -> Keep track of the currently allocated mm_cid for each mm/cpu rather than -> freeing them immediately. This eliminates most atomic ops when context -> switching back and forth between threads belonging to different memory -> spaces in multi-threaded scenarios (many processes, each with many -> threads). -> - -**[v1: sched/deadline: cpuset: Rework DEADLINE bandwidth restoration](http://lore.kernel.org/lkml/20230329125558.255239-1-juri.lelli@redhat.com/)** - -> Qais reported [1] that iterating over all tasks when rebuilding root -> domains for finding out which ones are DEADLINE and need their bandwidth -> correctly restored on such root domains can be a costly operation (10+ -> ms delays on suspend-resume). He proposed we skip rebuilding root -> domains for certain operations, but that approach seemed arch specific -> and possibly prone to errors, as paths that ultimately trigger a rebuild -> might be quite convoluted (thanks Qais for spending time on this!). -> - -**[v1: perf sched: sync task state macros with kernel](http://lore.kernel.org/lkml/20230329035203.6194-1-zegao2021@gmail.com/)** - -> commit 8ef9925b02c2 ("sched/debug: Add explicit TASK_PARKED printing") -> changes some task state macros, this patch makes perf-sched in sync -> - -**[v1: sched: EEVDF using latency-nice](http://lore.kernel.org/lkml/20230328092622.062917921@infradead.org/)** - -> Many changes since last time; most notably it now fully replaces CFS and uses -> lag based placement for migrations. 
Smaller changes include: -> -> - uses scale_load_down() for avg_vruntime; I measured the max delta to be -> 44 -> bits on a system/cgroup based kernel build. -> - fixed a bunch of reweight / cgroup placement issues -> - adaptive placement strategy for smaller slices -> - rename se->lag to se->vlag -> - -**[v3: sched: print parent comm in sched_show_task()](http://lore.kernel.org/lkml/20230328034438.GA8421@didi-ThinkCentre-M930t-N000/)** - -> Knowing who the parent is might be useful for debugging. -> For example, we can sometimes resolve kernel hung tasks by stopping -> the person who begins those hung tasks. -> With the parent's name printed in sched_show_task(), -> it might be helpful to let people know which "service" should be operated. -> Also, we move the parent info to a following new line. -> It would be better to solve the situation when the task -> is not alive and we could not get information about the parent. -> - -**[v1: sched/cputime: Make cputime_adjust() more accurate](http://lore.kernel.org/lkml/20230328024827.12187-1-maxing.lan@bytedance.com/)** - -> In the current algorithm of cputime_adjust(), the accumulated stime and -> utime are used to divide the accumulated rtime. When the value is very -> large, it is easy for the stime or utime not to be updated. It can cause -> sys or user utilization to be zero for long time. -> - -**[v2: selftests: sched: Add more core schedule prctl calls](http://lore.kernel.org/lkml/20230327201855.121821-1-ivan.orlov0322@gmail.com/)** - -> The core sched kselftest makes prctl calls only with correct -> parameters. This patch will extend this test with more core -> schedule prctl calls with wrong parameters to increase code -> coverage. -> - -**[v1: sched: Introduce mm_cid runqueue cache](http://lore.kernel.org/lkml/20230327195318.137094-1-mathieu.desnoyers@efficios.com/)** - -> Introduce a per-runqueue cache containing { mm, mm_cid } entries. 
-> Keep track of the recently allocated mm_cid for each mm rather than -> freeing them immediately. This eliminates most atomic ops when -> context switching back and forth between threads belonging to -> - -**[v1: sched/fair: Make tg->load_avg per node](http://lore.kernel.org/lkml/20230327053955.GA570404@ziqianlu-desk2/)** - -> When using sysbench to benchmark Postgres in a single docker instance -> with sysbench's nr_threads set to nr_cpu, it is observed there are times -> update_cfs_group() and update_load_avg() shows noticeable overhead on -> cpus of one node of a 2sockets/112core/224cpu Intel Sapphire Rapids: -> - -**[GIT PULL: sched/urgent for v6.3-rc4](http://lore.kernel.org/lkml/20230326130354.GDZCBCum4r9MJ8thhi@fat_crate.local/)** - -> please pull an urgent sched fix for 6.3. -> -> Thx. -> - -**[v1: sched/topology: add for_each_numa_cpu() macro](http://lore.kernel.org/lkml/20230325185514.425745-1-yury.norov@gmail.com/)** - -> for_each_cpu() is widely used in kernel, and it's beneficial to create -> a NUMA-aware version of the macro. -> -> Recently added for_each_numa_hop_mask() works, but switching existing -> codebase to it is not an easy process. 
-> - -#### 内存管理 - -**[v3: Providing mount in memfd_restricted() syscall](http://lore.kernel.org/linux-mm/cover.1680306489.git.ackerleytng@google.com/)** - -> This patchset builds upon the memfd_restricted() system call that was -> discussed in the ‘KVM: mm: fd-based approach for supporting KVM’ patch -> series, at -> https://lore.kernel.org/lkml/20221202061347.1070246-1-chao.p.peng@linux.intel.com/T/ -> - -**[v3: splice, net: Replace sendpage with sendmsg(MSG_SPLICE_PAGES)](http://lore.kernel.org/linux-mm/20230331160914.1608208-1-dhowells@redhat.com/)** - -> I've been looking at how to make pipes handle the splicing in of multipage -> folios and also looking to see if I could implement a suggestion from Willy -> that pipe_buffers could perhaps hold a list of pages (which could make -> splicing simpler - an entire splice segment would go in a single -> pipe_buffer). -> - -**[v5: userfaultfd: convert userfaultfd functions to use folios](http://lore.kernel.org/linux-mm/20230331093937.945725-1-zhangpeng362@huawei.com/)** - -> This patch series converts several userfaultfd functions to use folios. -> -> Change log: -> - -**[v3: Ignore non-LRU-based reclaim in memcg reclaim](http://lore.kernel.org/linux-mm/20230331070818.2792558-1-yosryahmed@google.com/)** - -> Upon running some proactive reclaim tests using memory.reclaim, we -> noticed some tests flaking where writing to memory.reclaim would be -> successful even though we did not reclaim the requested amount fully. -> Looking further into it, I discovered that *sometimes* we over-report -> the number of reclaimed pages in memcg reclaim. -> - -**[v1: memcg: Set memory min, low, high values along with max](http://lore.kernel.org/linux-mm/20230330202232.355471-1-shaun.tancheff@gmail.com/)** - -> memcg-v1 does not expose memory min, low, and high. -> -> These values should be set to reasonable non-zero values -> when max is set. -> -> This patch sets them to 10%, 20% and 80% of max, respectively.
-> - -**[v3: memcg: avoid flushing stats atomically where possible](http://lore.kernel.org/linux-mm/20230330191801.1967435-1-yosryahmed@google.com/)** - -> rstat flushing is an expensive operation that scales with the number of -> cpus and the number of cgroups in the system. The purpose of this series -> is to minimize the contexts where we flush stats atomically. -> - -**[v1: selftests/mm: Split / Refactor userfault test](http://lore.kernel.org/linux-mm/20230330155707.3106228-1-peterx@redhat.com/)** - -> [Sorry for the test case bomb] -> -> This patchset splits userfaultfd.c into two tests: -> -> - uffd-stress: the "vanilla", old and powerful stress test -> - uffd-unit-tests: all the unit tests will be moved here -> -> This is on my todo list for a long time but I never did it for real. The -> uffd test is growing into a small and cute monster. I start to notice it's -> going harder to maintain such a test and make it useful. -> - -**[v2: bio: check return values of bio_add_page](http://lore.kernel.org/linux-mm/cover.1680172791.git.johannes.thumshirn@wdc.com/)** - -> We have two functions for adding a page to a bio, __bio_add_page() which is -> used to add a single page to a freshly created bio and bio_add_page() which is -> used to add a page to an existing bio. -> -> While __bio_add_page() is expected to succeed, bio_add_page() can fail. -> -> This series converts the callers of bio_add_page() which can easily use -> __bio_add_page() to using it and checks the return of bio_add_page() for -> callers that don't work on a freshly created bio. -> -> Lastly it marks bio_add_page() as __must_check so we don't have to go again -> and audit all callers. -> - -**[v1: mm: ksm: support hwpoison for ksm page](http://lore.kernel.org/linux-mm/20230330074501.205092-1-xialonglong1@huawei.com/)** - -> Currently, ksm does not support hwpoison. 
As ksm is being used more widely -> for deduplication at the system level, container level, and process level, -> supporting hwpoison for ksm has become increasingly important. However, ksm -> pages were not processed by hwpoison in 2009 [1]. -> - -**[v1: kmemleak-test: Optimize kmemleak_test.c build flow](http://lore.kernel.org/linux-mm/20230330060904.292975-1-gehao@kylinos.cn/)** - -> Now kmemleak-test.c is moved to samples directory, -> if CONFIG_DEBUG_KMEMLEAK_TEST=m,but CONFIG_SAMPLES -> is not set,it will be meaningless. -> -> So we will remove CONFIG_DEBUG_KMEMLEAK_TEST and -> add CONFIG_SAMPLE_KMEMLEAK which in samples directory -> to control kmemleak-test.c build or not -> - -**[v3: regmap: Add basic maple tree register cache](http://lore.kernel.org/linux-mm/20230325-regcache-maple-v3-0-23e271f93dc7@kernel.org/)** - -> The current state of the art for sparse register maps is the -> rbtree cache. This works well for most applications but isn't -> always ideal for sparser register maps since the rbtree can get -> deep, requiring a lot of walking. Fortunately the kernel has a -> data structure intended to address this very problem, the maple -> tree. Provide an initial implementation of a register cache -> based on the maple tree to start taking advantage of it. -> - -**[v12: Memory poison recovery in khugepaged collapsing](http://lore.kernel.org/linux-mm/20230329151121.949896-1-jiaqiyan@google.com/)** - -> Memory DIMMs are subject to multi-bit flips, i.e. memory errors. -> As memory size and density increase, the chances of and number of -> memory errors increase. The increasing size and density of server -> RAM in the data center and cloud have shown increased uncorrectable -> memory errors. There are already mechanisms in the kernel to recover -> from uncorrectable memory errors. This series of patches provides -> the recovery mechanism for the particular kernel agent khugepaged -> when it collapses memory pages. 
-> - -#### 文件系统 - -**[v1: fs: consolidate duplicate dt_type helpers](http://lore.kernel.org/linux-fsdevel/20230330104144.75547-1-jlayton@kernel.org/)** - -> There are three copies of the same dt_type helper sprinkled around the -> tree. Convert them to use the common fs_umode_to_dtype function instead, -> which has the added advantage of properly returning DT_UNKNOWN when -> given a mode that contains an unrecognized type. -> - -**[v2: fs: consolidate dt_type() helper definitions](http://lore.kernel.org/linux-fsdevel/20230330000157.297698-1-jlayton@kernel.org/)** - -> There are 4 functions named dt_type() in the kernel. There is also the -> S_DT macro in fs_types.h. -> -> Replace the S_DT macro with a static inline named dt_type, and have all -> of the existing copies call that instead. The v9fs helper is renamed to -> distinguish it from the others. -> - -**[v8: Implement copy offload support](http://lore.kernel.org/linux-fsdevel/20230327084103.21601-1-anuj20.g@samsung.com/)** - -> The patch series covers the points discussed in November 2021 virtual -> call [LSF/MM/BFP TOPIC] Storage: Copy Offload [0]. -> We have covered the initial agreed requirements in this patchset and -> further additional features suggested by community. -> Patchset borrows Mikulas's token based approach for 2 bdev -> implementation. -> - -**[v1: zonefs: Always invalidate last cache page on append write](http://lore.kernel.org/linux-fsdevel/20230329055823.1677193-1-damien.lemoal@opensource.wdc.com/)** - -> When a direct append write is executed, the append offset may correspond -> to the last page of an inode which might have been cached already by -> buffered reads, page faults with mmap-read or non-direct readahead. -> To ensure that the on-disk and cached data is consistent for such last -> cached page, make sure to always invalidate it in -> zonefs_file_dio_append(). This invalidation will always be a no-op when -> the device block size is equal to the page size (e.g. 4K).
-> - -#### 网络设备 - -**[v1: bpf-next: Add FOU support for externally controlled ipip devices](http://lore.kernel.org/netdev/cover.1680379518.git.cehrig@cloudflare.com/)** - -> This patch set adds support for using FOU or GUE encapsulation with -> an ipip device operating in collect-metadata mode and a set of kfuncs -> for controlling encap parameters exposed to a BPF tc-hook. -> -> BPF tc-hooks allow us to read tunnel metadata (like remote IP addresses) -> in the ingress path of an externally controlled tunnel interface via -> the bpf_skb_get_tunnel_{key,opt} bpf-helpers. Packets can then be -> redirected to the same or a different externally controlled tunnel -> interface by overwriting metadata via the bpf_skb_set_tunnel_{key,opt} -> helpers and a call to bpf_redirect. This enables us to redirect packets -> between tunnel interfaces - and potentially change the encapsulation -> type - using only a single BPF program. -> - -**[v1: net-next: ice: lower CPU usage with GNSS](http://lore.kernel.org/netdev/20230401172659.38508-1-mschmidt@redhat.com/)** - -> This series lowers the CPU usage of the ice driver when using its -> provided /dev/gnss*. -> -> Intel engineers, in addition to reviewing the patches for correctness, -> please also consider my doubts expressed in the descriptions of patches -> 1 and 2. There may be better solutions possible. -> - -**[v1: net: ethernet: mtk_eth_soc: use be32 type to store be32 values](http://lore.kernel.org/netdev/20230401-mtk_eth_soc-sparse-v1-1-84e9fc7b8eab@kernel.org/)** - -> Perhaps there is a nicer way to handle this but the code -> calls for converting an array of host byte order 32bit values -> to big endian 32bit values: an ipv6 address to be pretty printed. -> -> Use a sparse-friendly array of be32 to store these values. -> -> Also make use of the cpu_to_be32_array helper rather -> than open coding the conversion. 
-> - -**[v3: Add EMAC3 support for sa8540p-ride (devicetree/clk bits)](http://lore.kernel.org/netdev/20230331215804.783439-1-ahalaney@redhat.com/)** - -> This is a forward port / upstream refactor of code delivered -> downstream by Qualcomm over at [0] to enable the DWMAC5 based -> implementation called EMAC3 on the sa8540p-ride dev board. -> - -**[v3: net-next: Add EMAC3 support for sa8540p-ride](http://lore.kernel.org/netdev/20230331214549.756660-1-ahalaney@redhat.com/)** - -> This is a forward port / upstream refactor of code delivered -> downstream by Qualcomm over at [0] to enable the DWMAC5 based -> implementation called EMAC3 on the sa8540p-ride dev board. -> - -**[v5: bpf: XDP-hints: API change for RX-hash kfunc bpf_xdp_metadata_rx_hash](http://lore.kernel.org/netdev/168028882260.4030852.1100965689789226162.stgit@firesoul/)** - -> Current API for bpf_xdp_metadata_rx_hash() returns the raw RSS hash value, -> but doesn't provide information on the RSS hash type (part of 6.3-rc). -> -> This patchset proposal is to change the function call signature via adding -> a pointer value argument for providing the RSS hash type. -> - -**[v1: iproute2-next: tc: m_tunnel_key: support code for "nofrag" tunnels](http://lore.kernel.org/netdev/c43213bed30edfa0d6fa1b084e4d48c26417edc9.1680281221.git.dcaratti@redhat.com/)** - -> add control plane for setting TCA_TUNNEL_KEY_NO_FRAG flag on -> act_tunnel_key actions. -> - -**[v1: net-next: mlxsw: Use static trip points for transceiver modules](http://lore.kernel.org/netdev/cover.1680272119.git.petrm@nvidia.com/)** - -> Ido Schimmel writes: -> -> See patch #1 for motivation and implementation details. -> -> Patches #2-#3 are simple cleanups as a result of the changes in the -> first patch. 
-> - -**[v1: iproute2: ip-xfrm: accept "allow" as action in ip xfrm policy setdefault](http://lore.kernel.org/netdev/dc8c3fcd81a212e47547ae59ee6857ce25048ddd.1680268153.git.sd@queasysnail.net/)** - -> The help text claims that setdefault takes ACTION values, ie block | -> allow. In reality, xfrm_str_to_policy takes block | accept. -> -> We could also fix that by changing the help text/manpage, but then -> it'd be frustrating to have multiple ACTION with similar values used -> in different subcommands. -> - -**[v1: net-next: net: phy: introduce phy_reg_field interface](http://lore.kernel.org/netdev/20230331123259.567627-1-radu-nicolae.pirea@oss.nxp.com/)** - -> Some PHYs can be heavily modified between revisions, and the addresses of -> the registers are changed and the register fields are moved from one -> register to another. -> -> To integrate more PHYs in the same driver with the same register fields, -> but these register fields were located in different registers at -> - -**[v1: net-next: ice: allow matching on metadata](http://lore.kernel.org/netdev/20230331105747.89612-1-michal.swiatkowski@linux.intel.com/)** - -> This patchset is intended to improve the usability of the switchdev -> slow path. Without matching on a metadata values slow path works -> based on VF's MAC addresses. It causes a problem when the VF wants -> to use more than one MAC address (e.g. when it is in trusted mode). -> - -**[v6: net-next: sfc: support unicast PTP](http://lore.kernel.org/netdev/20230331111404.17256-1-ihuguet@redhat.com/)** - -> Unicast PTP was not working with sfc NICs. -> -> The reason was that these NICs don't timestamp all incoming packets, -> but instead they only timestamp packets of the queues that are selected -> for that. Currently, only one RX queue is configured for timestamp: the -> RX queue of the PTP channel. The packets that are put in the PTP RX -> queue are selected according to firmware filters configured from the -> driver. 
-> - -**[v1: net-next: net: stmmac: publish actual MTU restriction](http://lore.kernel.org/netdev/20230331092344.268981-1-vinschen@redhat.com/)** - -> Apart from devices setting the max MTU value from device tree, -> the initialization functions in many drivers use a default value -> of JUMBO_LEN. -> -> However, that doesn't reflect reality. The stmmac_change_mtu -> function restricts the MTU to the size of a single queue in the TX -> FIFO. -> - -**[v1: net-next: net: stmmac: allow ethtool action on PCI devices if device is down](http://lore.kernel.org/netdev/20230331092341.268964-1-vinschen@redhat.com/)** - -> So far stmmac is only able to handle ethtool commands if the device -> is UP. However, PCI devices usually just have to be in the active -> state for ethtool commands. -> - -**[v2: net: dsa: mv88e6xxx: Reset mv88e6393x force WD event bit](http://lore.kernel.org/netdev/20230331084014.1144597-1-gustav.ekelund@axis.com/)** - -> The force watchdog event bit is not cleared during SW reset in the -> mv88e6393x switch. This is a different behavior compared to mv886390 which -> clears the force WD event bit as advertised. This causes a force WD event -> to be handled over and over again as the SW reset following the event never -> clears the force WD event bit. -> - -**[v1: qlcnic: check pci_reset_function result](http://lore.kernel.org/netdev/20230331080605.42961-1-den-plotnikov@yandex-team.ru/)** - -> Static code analyzer complains to unchecked return value. -> It seems that pci_reset_function return something meaningful -> only if "reset_methods" is set. -> Even if reset_methods isn't used check the return value to avoid -> possible bugs leading to undefined behavior in the future. -> - -**[v1: net: vsock/vmci: convert VMCI error code to -ENOMEM on send](http://lore.kernel.org/netdev/2c3aeeac-2fcb-16f6-41cd-c0ca4e6a6d3e@sberdevices.ru/)** - -> This adds conversion of VMCI specific error code to general -ENOMEM. 
It -> is needed, because af_vsock.c passes error value returned from transport -> to the user, which does not expect to get VMCI_ERROR_* values. -> - -**[v2: net: qrtr: Do not do DEL_SERVER broadcast after DEL_CLIENT](http://lore.kernel.org/netdev/1680248937-16617-1-git-send-email-quic_srichara@quicinc.com/)** - -> On the remote side, when QRTR socket is removed, af_qrtr will call -> qrtr_port_remove() which broadcasts the DEL_CLIENT packet to all neighbours -> including local NS. NS upon receiving the DEL_CLIENT packet, will remove -> the lookups associated with the node:port and broadcasts the DEL_SERVER -> packet. -> - -#### 安全增强 - -**[v1: LoongArch: Add kernel address sanitizer support](http://lore.kernel.org/linux-hardening/20230328111714.2056-1-zhangqing@loongson.cn/)** - -> 1/8 of kernel addresses reserved for shadow memory. But for LoongArch, -> there are a lot of holes between different segments and valid address -> space (256T available) is insufficient to map all these segments to kasan -> shadow memory with the common formula provided by kasan core, saying -> (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET -> - -**[[RFC/RFT,V2] CFI: Add support for gcc CFI in aarch64](http://lore.kernel.org/linux-hardening/20230325085416.95191-1-ashimida.1990@gmail.com/)** - -> Based on Sami's patch[1], this patch makes the corresponding kernel -> configuration of CFI available when compiling the kernel with gcc[2]. -> - -#### 异步 IO - -**[v6: io_uring/ublk: add generic IORING_OP_FUSED_CMD](http://lore.kernel.org/io-uring/20230330113630.1388860-1-ming.lei@redhat.com/)** - -> Hello Jens and Guys, -> -> Add generic fused command, which can include one primary command and multiple -> secondary requests. This command provides one safe way to share resource between -> primary command and secondary requests, and primary command is always -> completed after all secondary requests are done, and resource lifetime -> is bound with primary command.
-> - -**[v5: io_uring/ublk: add IORING_OP_FUSED_CMD](http://lore.kernel.org/io-uring/20230328150958.1253547-1-ming.lei@redhat.com/)** - -> Hello Jens, -> -> Add IORING_OP_FUSED_CMD, it is one special URING_CMD, the 1st SQE(primary) is -> one 64byte URING_CMD, and the 2nd 64byte SQE(secondary) is another normal -> 64byte OP. The primary command provides device/file io buffer and -> submits OP represented by the secondary SQE using the provided buffer. This way -> solves ublk zero copy problem easily, since io buffer shares same lifetime with -> the primary command. -> - -**[v1: io_uring/poll: clear single/double poll flags on poll arming](http://lore.kernel.org/io-uring/61e3fefd-0a99-5916-c049-9143d3342379@kernel.dk/)** - -> Unless we have at least one entry queued, then don't call into -> io_poll_remove_entries(). Normally this isn't possible, but if we -> retry poll then we can have ->nr_entries cleared again as we're -> setting it up. If this happens for a poll retry, then we'll still have -> at least REQ_F_SINGLE_POLL set. io_poll_remove_entries() then thinks -> it has entries to remove. -> - -#### Rust For Linux - -**[v4: Rust pin-init API for pinned initialization of structs](http://lore.kernel.org/rust-for-linux/20230331215053.585759-1-y86-dev@protonmail.com/)** - -> This is the fourth version of the pin-init API. See [1] for v3. -> -> The tree at [2] contains these patches applied on top of 6.3-rc1. -> The Rust-doc documentation of the pin-init API can be found at [3]. -> -> These patches are a long way coming, since I held a presentation on -> safe pinned initialization at Kangrejos [4]. And my discovery of this -> problem was almost a year ago [5]. -> - -**[v2: rust: error: Add missing wrappers to convert to/from kernel error codes](http://lore.kernel.org/rust-for-linux/20230224-rust-error-v2-0-3900319812da@asahilina.net/)** - -> This series is part of the set of dependencies for the drm/asahi -> Apple M1/M2 GPU driver. 
-> -> It adds a bunch of missing wrappers in kernel::error, which are useful -> to convert to/from kernel error codes. Since these will be used by many -> abstractions coming up soon, I think it makes sense to merge them as -> soon as possible instead of bundling them with the first user. Hence, -> they have allow() tags to silence dead code warnings. These can be -> removed as soon as the first user is in the kernel crate. -> - -**[v1: rust: Add uapi crate](http://lore.kernel.org/rust-for-linux/20230329-rust-uapi-v1-0-ee78f2933726@asahilina.net/)** - -> In general, direct bindgen bindings for C kernel APIs are not intended -> to be used by drivers outside of the `kernel` crate. However, some -> drivers do need to interact directly with UAPI definitions to implement -> userspace APIs. -> - -#### BPF - -**[v2: bpf-next: bpf: optimize hashmap lookups when key_size is divisible by 4](http://lore.kernel.org/bpf/20230401200602.3275-1-aspsk@isovalent.com/)** - -> The BPF hashmap uses the jhash() hash function. There is an optimized version -> of this hash function which may be used if hash size is a multiple of 4. Apply -> this optimization to the hashmap in a similar way as it is done in the bloom -> filter map. -> - -**[v3: bpf-next: Prepare veristat for packaging](http://lore.kernel.org/bpf/20230331222405.3468634-1-andrii@kernel.org/)** - -> This patch set relicenses veristat.c to dual GPL-2.0/BSD-2 license and -> prepares it to be mirrored to Github at libbpf/veristat repo. -> -> Few small issues in the source code are fixed, found during Github sync -> preparetion. -> - -**[v2: bpf-next: Enable RCU semantics for task kptrs](http://lore.kernel.org/bpf/20230331195733.699708-1-void@manifault.com/)** - -> In commit 22df776a9a86 ("tasks: Extract rcu_users out of union"), the -> 'refcount_t rcu_users' field was extracted out of a union with the -> 'struct rcu_head rcu' field. 
This allows us to use the field for -> refcounting struct task_struct with RCU protection, as the RCU callback -> no longer flips rcu_users to be nonzero after the callback is scheduled. -> - -**[v10: evm: Do HMAC of multiple per LSM xattrs for new inodes](http://lore.kernel.org/bpf/20230331123221.3273328-1-roberto.sassu@huaweicloud.com/)** - -> One of the major goals of LSM stacking is to run multiple LSMs side by side -> without interfering with each other. The ultimate decision will depend on -> individual LSM decision. -> - -**[v1: bpf-next: veristat: change guess for __sk_buff from CGROUP_SKB to SCHED_CLS](http://lore.kernel.org/bpf/20230330190115.3942962-1-andrii@kernel.org/)** - -> SCHED_CLS seems to be a better option as a default guess for freplace -> programs that have __sk_buff as a context type. -> - -**[[PATCH bpf RFC-V3 0/5] XDP-hints: API change for RX-hash kfunc bpf_xdp_metadata_rx_hash](http://lore.kernel.org/bpf/168019602958.3557870.9960387532660882277.stgit@firesoul/)** - -> Notice targeted 6.3-rc kernel via bpf git tree. -> -> Current API for bpf_xdp_metadata_rx_hash() returns the raw RSS hash value, -> but doesn't provide information on the RSS hash type (part of 6.3-rc). -> - -**[v5: bpf-next: bpf: Add socket destroy capability](http://lore.kernel.org/bpf/20230330151758.531170-1-aditi.ghag@isovalent.com/)** - -> This patch adds the capability to destroy sockets in BPF. We plan to use -> the capability in Cilium to force client sockets to reconnect when their -> remote load-balancing backends are deleted. The other use case is -> on-the-fly policy enforcement where existing socket connections prevented -> by policies need to be terminated. -> - -**[v2: bpf-next: kallsyms: move module-related functions under correct configs](http://lore.kernel.org/bpf/20230330102001.2183693-1-vmalik@redhat.com/)** - -> Functions for searching module kallsyms should have non-empty -> definitions only if CONFIG_MODULES=y and CONFIG_KALLSYMS=y. 
Until now, -> only CONFIG_MODULES check was used for many of these, which may have -> caused complilation errors on some configs. -> -> This patch moves all relevant functions under the correct configs. -> - -**[v1: bpf-next: bpf: Improve verifier for cond_op and spilled loop index variables](http://lore.kernel.org/bpf/20230330055600.86870-1-yhs@fb.com/)** - -> LLVM commit [1] introduced hoistMinMax optimization like -> (i < VIRTIO_MAX_SGS) && (i < out_sgs) -> to -> upper = MIN(VIRTIO_MAX_SGS, out_sgs) -> ... i < upper ... -> and caused the verification failure. Commit [2] workarounded the issue by -> adding some bpf assembly code to prohibit the above optimization. -> This patch improved verifier such that verification can succeed without -> the above workaround. -> - -**[v1: bpf-next: Teach verifier to determine necessary log buffer size](http://lore.kernel.org/bpf/20230330041642.1118787-1-andrii@kernel.org/)** - -> My imagination is failing me on how to succinctly name this feature and patch -> set, but the point here is to perform internal accounting of what should be -> the necessary size of user-supplied log buffer such as to fit entire log -> contents without truncation, thus avoiding -ENOSPC. -> - -**[v2: bpf-next: xsk: Support UMEM chunk_size > PAGE_SIZE](http://lore.kernel.org/bpf/20230329180502.1884307-1-kal.conley@dectris.com/)** - -> The main purpose of this patchset is to add AF_XDP support for UMEM -> chunk sizes > PAGE_SIZE. This is enabled for UMEMs backed by HugeTLB -> pages. -> - -**[[PATCH bpf RFC-V2 0/5] XDP-hints: API change for RX-hash kfunc bpf_xdp_metadata_rx_hash](http://lore.kernel.org/bpf/168010726310.3039990.2753040700813178259.stgit@firesoul/)** - -> Notice targeted 6.3-rc kernel via bpf git tree. -> -> Current API for bpf_xdp_metadata_rx_hash() returns the raw RSS hash value, -> but doesn't provide information on the RSS hash type (part of 6.3-rc). 
-> -> This patchset proposal is to use the return value from -> bpf_xdp_metadata_rx_hash() to provide the RSS hash type. -> - -**[v2: bpf-next: Allow BPF TCP CCs to write app_limited](http://lore.kernel.org/bpf/20230329073558.8136-1-bobankhshen@gmail.com/)** - -> This series allow BPF TCP CCs to write app_limited of struct -> tcp_sock. A built-in CC or one from a kernel module is already -> able to write to app_limited of struct tcp_sock. Until now, -> a BPF CC doesn't have write access to this member of struct -> tcp_sock. -> - -**[v2: bpf-next: BPF verifier rotating log](http://lore.kernel.org/bpf/20230328235610.3159943-1-andrii@kernel.org/)** - -> This patch set changes BPF verifier log behavior to behave as a rotating log, -> by default. If user-supplied log buffer is big enough to contain entire -> verifier log output, there is no effective difference. But where previously -> user supplied too small log buffer and would get -ENOSPC error result and the -> beginning part of the verifier log, now there will be no error and user will -> get ending part of verifier log filling up user-supplied log buffer. Which -> is, in absolute majority of cases, is exactly what's useful, relevant, and -> what users want and need, as the ending of the verifier log is containing -> details of verifier failure and relevant state that got us to that failure. So -> this rotating mode is made default, but for some niche advanced debugging -> scenarios it's possible to request old behavior by specifying additional -> BPF_LOG_FIXED (8) flag. -> - -**[v2: memcg: make rstat flushing irq and sleep](http://lore.kernel.org/bpf/20230328221644.803272-1-yosryahmed@google.com/)** - -> Patches 1 and 2 are cleanups requested during reviews of prior versions -> of this series. -> -> Patch 3 makes sure we never try to flush from within an irq context, and -> patch 4 adds a WARN_ON_ONCE() to make sure we catch any violations. 
-> - -**[v1: Allow BPF TCP CCs to write app_limited](http://lore.kernel.org/bpf/20230328132035.50839-1-bobankhshen@gmail.com/)** - -> This series allow BPF TCP CCs to write app_limited of struct -> tcp_sock. A built-in CC or one from a kernel module is already -> able to write to app_limited of struct tcp_sock. Until now, -> a BPF CC doesn't have write access to this member of struct -> tcp_sock. -> - -**[v2: bpf-next: selftests/bpf: Rewrite two infinite loops in bound check cases](http://lore.kernel.org/bpf/20230329011048.1721937-1-xukuohai@huaweicloud.com/)** - -> The two infinite loops in bound check cases added by commit -> increased the execution time of test_verifier from about 6 seconds to -> about 9 seconds. Rewrite these two infinite loops to finite loops to get -> rid of this extra time cost. -> - -**[v1: net-next: virtio_net: refactor xdp codes](http://lore.kernel.org/bpf/20230328120412.110114-1-xuanzhuo@linux.alibaba.com/)** - -> Due to historical reasons, the implementation of XDP in virtio-net is relatively -> chaotic. For example, the processing of XDP actions has two copies of similar -> code. Such as page, xdp_page processing, etc. -> -> The purpose of this patch set is to refactor these code. Reduce the difficulty -> of subsequent maintenance. Subsequent developers will not introduce new bugs -> because of some complex logical relationships. -> - -**[v1: net-next: bpf, net: support redirecting to ifb with bpf](http://lore.kernel.org/bpf/20230328115105.13553-1-laoar.shao@gmail.com/)** - -> In our container environment, we are using EDT-bpf to limit the egress -> bandwidth. EDT-bpf can be used to limit egress only, but can't be used -> to limit ingress. Some of our users also want to limit the ingress -> bandwidth. But after applying EDT-bpf, which is based on clsact qdisc, -> it is impossible to limit the ingress bandwidth currently, due to some -> reasons, -> 1). 
We can't add ingress qdisc -> The ingress qdisc can't coexist with clsact qdisc as clsact has both -> ingress and egress handler. So our traditional method to limit ingress -> bandwidth can't work any more. -> 2). We can't redirect ingress packet to ifb with bpf -> By trying to analyze if it is possible to redirect the ingress packet to -> ifb with a bpf program, we find that the ifb device is not supported by -> bpf redirect yet. -> - -**[v2: loongarch/bpf: Skip speculation barrier opcode, which caused ltp testcase bpf_prog02 to fail](http://lore.kernel.org/bpf/20230328071335.2664966-1-guodongtai@kylinos.cn/)** - -> Here just skip the opcode(BPF_ST | BPF_NOSPEC) that has no couterpart to the loongarch. -> -> To verify, use ltp testcase: -> -> Without this patch: -> $ ./bpf_prog02 -> ... ... -> bpf_common.c:123: TBROK: Failed verification: ??? (524) -> - -**[v1: bpf-next: verifier/xdp_direct_packet_access.c converted to inline assembly](http://lore.kernel.org/bpf/20230328020813.392560-1-eddyz87@gmail.com/)** - -> verifier/xdp_direct_packet_access.c automatically converted to inline -> assembly using [1]. -> -> This is a leftover from [2], the last patch in a batch was blocked by -> mail server for being too long. This patch-set splits it in two: -> - one to add migrated test to progs/ -> - one to remove old test from verifier/ -> - -**[v1: bpf: tcp: Use sock_gen_put instead of sock_put in bpf_iter_tcp](http://lore.kernel.org/bpf/20230328004232.2134233-1-martin.lau@linux.dev/)** - -> While reviewing the udp-iter batching patches, notice the bpf_iter_tcp -> calling sock_put() is incorrect. It should call sock_gen_put instead -> because bpf_iter_tcp is iterating the ehash table which has the -> req sk and tw sk. This patch replaces all sock_put with sock_gen_put -> in the bpf_iter_tcp codepath. 
-> - ### Ecosystem Updates - -#### Qemu - -**[v2: riscv: Add support for the Zfa extension](http://lore.kernel.org/qemu-devel/20230331182824.4104580-1-christoph.muellner@vrull.eu/)** - -> This patch introduces the RISC-V Zfa extension, which introduces -> additional floating-point extensions: -> * fli (load-immediate) with pre-defined immediates -> * fminm/fmaxm (like fmin/fmax but with different NaN behaviour) -> * fround/froundmx (round to integer) -> * fcvtmod.w.d (Modular Convert-to-Integer) -> * fmv* to access high bits of float register bigger than XLEN -> * Quiet comparison instructions (fleq/fltq) -> - -**[v1: target/riscv: Set opcode to env->bins for illegal/virtual instruction fault](http://lore.kernel.org/qemu-devel/20230330034636.44585-1-liweiwei@iscas.ac.cn/)** - -> decode_save_opc() will not work for generate_exception(), since 0 is passed -> to riscv_raise_exception() as pc in helper_raise_exception(), and bins will -> not be restored in this case. -> - -**[v6: target/riscv: rework CPU extensions validation](http://lore.kernel.org/qemu-devel/20230329200856.658733-1-dbarboza@ventanamicro.com/)** - -> This series contains changes proposed by Weiwei Li in v5. -> -> All patches are acked. -> - -#### U-Boot - -**[v2: Add ethernet driver for StarFive JH7110 SoC](http://lore.kernel.org/u-boot/20230329102720.25439-1-yanhong.wang@starfivetech.com/)** - -> This series adds ethernet support for the StarFive JH7110 RISC-V SoC. -> The series includes PHY and MAC drivers. The PHY model is -> YT8531 (from Motorcomm Inc), and the MAC version is dwmac-5.20 -> (from Synopsys DesignWare). -> -> The implementation of the phy driver is ported from linux, but it -> has been adjusted for the u-boot framework. -> - -**[v3: Add StarFive JH7110 PCIe drvier support](http://lore.kernel.org/u-boot/20230329100143.10724-1-minda.chen@starfivetech.com/)** - -> The PCIe driver depends on gpio, pinctrl, clk and reset driver to do init. -> The PCIe dts configuation includes all these setting.
-> -> The PCIe drivers codes has been tested on the VisionFive V2 boards. -> The test devices includes M.2 NVMe SSD and Realtek 8169 Ethernet adapter. -> - -**[v5: Basic StarFive JH7110 RISC-V SoC support](http://lore.kernel.org/u-boot/20230329034224.26545-1-yanhong.wang@starfivetech.com/)** - -> This series of patches base on the latest branch/master, and add support -> for the StarFive JH7110 RISC-V SoC and VisionFive V2 board. In order for -> this to be achieved, the respective DT nodes have been added, and the -> required defconfigs have been added to the boards' defconfig. What is more, -> the basic required DM drivers have been added, such as reset, clock, pinctrl, -> uart, ram etc. -> - -## 20230326: Issue 39 - -### Kernel Updates - -#### RISC-V Architecture Support - -**[v9: riscv: Use PUD/P4D/PGD pages for the linear mapping](http://lore.kernel.org/linux-riscv/20230324155421.271544-1-alexghiti@rivosinc.com/)** - -> This patchset intends to improve tlb utilization by using hugepages for -> the linear mapping. -> - -**[v7: function_graph: Support recording and printing the return value of function](http://lore.kernel.org/linux-riscv/20230324123731.3801920-1-pengdonglin@sangfor.com.cn/)** - -> When using the function_graph tracer to analyze system call failures, -> it can be time-consuming to analyze the trace logs and locate the kernel -> function that first returns an error. This change aims to simplify the -> process by recording the function return value to the 'retval' member of -> 'ftrace_graph_ent' and printing it when outputing the trace log. -> - -**[v1: RISC-V: convert new selectors of RISCV_ALTERNATIVE to dependencies](http://lore.kernel.org/linux-riscv/20230324121240.3594777-1-conor.dooley@microchip.com/)** - -> for-next contains two additional extensions that select -> RISCV_ALTERNATIVE. RISCV_ALTERNATIVE no longer needs to be selected by -> individual config options as it is now selected for !XIP_KERNEL builds -> by the top level RISCV option.
-> These extensions rely on the alternative framework, so convert the -> "select"s to "depends on"s instead. -> - -**[v1: RISC-V: align Svpbmt Kconfig help text with other extensions](http://lore.kernel.org/linux-riscv/20230324092840.3504267-1-conor.dooley@microchip.com/)** - -> Other extensions only capitalise the first letter in Kconfig text -> menus, and provide a short comment about the extension's meaning. -> Do the same for Svpbmt. -> While editing one of the lines, reformat the "spelling" of 64-bit. -> - -**[v4: -next: riscv: jump_label: Optimize the code size with compressed instruction](http://lore.kernel.org/linux-riscv/20230324082320.290410-1-guoren@kernel.org/)** - -> Reduce the size of the static branch instruction and prevent atomic -> update problems when CONFIG_RISCV_ISA_C=y. It also reduces the jump -> range from 1MB to 4KB, but 4KB is enough for the current riscv -> requirement. -> - -**[v11: -next: riscv: Add independent irq/softirq stacks](http://lore.kernel.org/linux-riscv/20230324071239.151677-1-guoren@kernel.org/)** - -> This patch series adds independent irq/softirq stacks to decrease the -> press of the thread stack. Also, add a thread STACK_SIZE config for -> users to adjust the proper size during compile time. -> - -**[v1: riscv: dts: starfive: jh7110: Correct the properties of S7 core](http://lore.kernel.org/linux-riscv/20230324064651.84670-1-hal.feng@starfivetech.com/)** - -> The S7 core has no L1 data cache and MMU, so delete some -> related properties. -> - -**[v8: riscv: Optimize function trace](http://lore.kernel.org/linux-riscv/20230324033342.3177979-1-suagrfillet@gmail.com/)** - -> The first 3 independent patches has been picked in the V7 version of -> this series, this version continues the following 4 patches. 
-> - -**[v8: Add Ethernet driver for StarFive JH7110 SoC](http://lore.kernel.org/linux-riscv/20230324022819.2324-1-samin.guo@starfivetech.com/)** - -> This series adds ethernet support for the StarFive JH7110 RISC-V SoC, -> which includes a dwmac-5.20 MAC driver (from Synopsys DesignWare). -> This series has been tested and works fine on VisionFive-2 v1.2A and -> v1.3B SBC boards. -> - -**[v4: Kconfig: introduce HAS_IOPORT option and select it as necessary](http://lore.kernel.org/linux-riscv/20230323163354.1454196-1-schnelle@linux.ibm.com/)** - -> We introduce a new HAS_IOPORT Kconfig option to indicate support for I/O -> Port access. In a future patch HAS_IOPORT=n will disable compilation of -> the I/O accessor functions inb()/outb() and friends on architectures -> which can not meaningfully support legacy I/O spaces such as s390. -> - -**[v16: -next: riscv: Add vector ISA support](http://lore.kernel.org/linux-riscv/20230323145924.4194-1-andy.chiu@sifive.com/)** - -> This patchset is implemented based on vector 1.0 spec to add vector support -> in riscv Linux kernel. There are some assumptions for this implementations. -> - -**[v2: riscv: export cpu/freq invariant to scheduler](http://lore.kernel.org/linux-riscv/20230323123924.3032174-1-suagrfillet@gmail.com/)** - -> RISC-V now manages CPU topology using arch_topology which provides -> CPU capacity and frequency related interfaces to access the cpu/freq -> invariant in possible heterogeneous or DVFS-enabled platforms. -> - -**[v7: RISC-V Hibernation Support](http://lore.kernel.org/linux-riscv/20230323045604.536099-1-jeeheng.sia@starfivetech.com/)** - -> This series adds RISC-V Hibernation/suspend to disk support. -> Low level Arch functions were created to support hibernation. -> swsusp_arch_suspend() relies code from __cpu_suspend_enter() to write -> cpu state onto the stack, then calling swsusp_save() to save the memory -> image. 
-> - -**[v1: RISC-V: KVM: Require alternatives](http://lore.kernel.org/linux-riscv/20230322192858.1189272-1-ajones@ventanamicro.com/)** - -> KVM makes use of riscv_has_extension_unlikely() to check for the -> svinval extension. riscv_has_extension_unlikely() is built on -> alternatives, which means KVM should ensure alternatives support -> is available. -> - -**[v1: riscv: require alternatives framework when selecting FPU support](http://lore.kernel.org/linux-riscv/20230322120907.2968494-1-Jason@zx2c4.com/)** - -> When moving switch_to's has_fpu() over to using riscv_has_extension_ -> likely() rather than static branchs, the FPU code gained a dependency on -> the alternatives framework. If CONFIG_RISCV_ALTERNATIVE isn't selected -> when CONFIG_FPU is, then has_fpu() returns false, and switch_to does not -> work as intended. So select CONFIG_RISCV_ALTERNATIVE when CONFIG_FPU is -> selected. -> - -**[v6: Add DMA driver for StarFive JH7110 SoC](http://lore.kernel.org/linux-riscv/20230322094820.24738-1-walker.chen@starfivetech.com/)** - -> This patch series adds dma support for the StarFive JH7110 RISC-V -> SoC. The first patch adds device tree binding. The second patch includes -> dma driver. The last patch adds device node of dma to JH7110 dts. -> - -**[v2: Enable I2S support for RK3588/RK3588S SoCs](http://lore.kernel.org/linux-riscv/20230321215624.78383-1-cristian.ciocaltea@collabora.com/)** - -> There are five I2S/PCM/TDM controllers and two I2S/PCM controllers embedded in -> the RK3588 and RK3588S SoCs. Furthermore, RK3588 provides four additional -> I2S/PCM/TDM controllers. -> - -**[v3: Use dma_default_coherent for devicetree default coherency](http://lore.kernel.org/linux-riscv/20230321110813.26808-1-jiaxun.yang@flygoat.com/)** - -> This series split out second half of my previous series -> "v1: MIPS DMA coherence fixes". 
-> -> It intends to use dma_default_coherent to determine the default coherency of -> devicetree probed devices instead of hardcoding it with Kconfig options. -> - -**[v6: hwmon: Add StarFive JH71X0 temperature sensor](http://lore.kernel.org/linux-riscv/20230321022644.107027-1-hal.feng@starfivetech.com/)** - -> This adds a driver for the temperature sensor on the JH7100 and JH7110, -> RISC-V SoCs by StarFive Technology Co. Ltd.. The JH7100 is used on the -> BeagleV Starlight board and StarFive VisionFive board. The JH7110 is -> used on the StarFive VisionFive 2 board. -> - -**[v2: Add timer driver for StarFive JH7110 RISC-V SoC](http://lore.kernel.org/linux-riscv/20230320135433.144832-1-xingyu.wu@starfivetech.com/)** - -> This patch serises are to add timer driver for the StarFive JH7110 -> RISC-V SoC. The first patch adds documentation to describe device -> tree bindings. The subsequent patch adds timer driver and support -> JH7110 SoC. The last patch adds device node about timer to JH7110 -> dts. -> - -**[v1: -next: support allocating crashkernel above 4G explicitly on riscv](http://lore.kernel.org/linux-riscv/20230320204244.1637821-1-chenjiahao16@huawei.com/)** - -> On riscv, the current crash kernel allocation logic is trying to -> allocate within 32bit addressible memory region by default, if -> failed, try to allocate without 4G restriction. -> - -**[v6: Basic clock, reset & device tree support for StarFive JH7110 RISC-V SoC](http://lore.kernel.org/linux-riscv/20230320103750.60295-1-hal.feng@starfivetech.com/)** - -> This patch series adds basic clock, reset & DT support for StarFive -> JH7110 SoC. -> -> You can simply review or test the patches at the link [1]. 
-> -> [1]: https://github.com/hal-feng/linux/commits/visionfive2-minimal -> - -**[v1: riscv: mm: execute local TLB flush after populating vmemmap](http://lore.kernel.org/linux-riscv/20230320065324.1045276-1-vincent.chen@sifive.com/)** - -> The vmemmap_populate() creates VA to PA mapping for the VMEMMAP area, where -> all "strcut page" are located once CONFIG_SPARSEMEM_VMEMMAP is defined. -> These "struct page" are later initialized in the zone_sizes_init() -> function. However, during this process, no sfence.vma instruction is -> executed for this VMEMMAP area. -> - -**[v1: Deduplicating RISCV cmpxchg.h macros](http://lore.kernel.org/linux-riscv/20230318080059.1109286-1-leobras@redhat.com/)** - -> While studying riscv's cmpxchg.h file, I got really interested in -> understanding how RISCV asm implemented the different versions of -> {cmp,}xchg. -> -> When I understood the pattern, it made sense for me to remove the -> duplications and create macros to make it easier to understand what exactly -> changes between the versions: Instruction sufixes & barriers. -> - -#### Process Scheduling - -**[v1: sched/topology: add for_each_numa_cpu() macro](http://lore.kernel.org/lkml/20230325185514.425745-1-yury.norov@gmail.com/)** - -> for_each_cpu() is widely used in kernel, and it's beneficial to create -> a NUMA-aware version of the macro. -> -> Recently added for_each_numa_hop_mask() works, but switching existing -> codebase to it is not an easy process. -> - -**[v2: sched/core: Reduce cost of sched_move_task when config autogroup](http://lore.kernel.org/lkml/20230321064459.39421-1-wuchi.zero@gmail.com/)** - -> Some sched_move_task calls are useless because that -> task_struct->sched_task_group maybe not changed (equals task_group -> of cpu_cgroup) when system enable autogroup. So do some checks in -> sched_move_task.
-> - -**[v1: sched: core: Optimize the structure of 'tg_cfs_schedulable_down' function](http://lore.kernel.org/lkml/20230319200255.3640-1-kunyu@nfschina.com/)** - -> Optimize if branches and define in the branch statement -> block parent_quota variable. -> - -#### Memory Management - -**[v1: regmap: Add basic maple tree register cache](http://lore.kernel.org/linux-mm/20230325-regcache-maple-v1-0-1c76916359fb@kernel.org/)** - -> The current state of the art for sparse register maps is the rbtree cache. -> This works well for most applications but isn't always ideal for sparser -> register maps since the rbtree can get deep, requiring a lot of walking. -> Fortunately the kernel has a data structure intended to address this very -> problem, the maple tree. Provide an initial implementation of a register -> cache based on the maple tree to start taking advantage of it. -> - -**[v3: userfaultfd: convert userfaultfd functions to use folios](http://lore.kernel.org/linux-mm/20230325065608.601391-1-zhangpeng362@huawei.com/)** - -> This patch series converts several userfaultfd functions to use folios. -> -> Change log: -> - -**[v7: -next: Delay the initialization of zswap](http://lore.kernel.org/linux-mm/20230325071420.2246461-1-liushixin2@huawei.com/)** - -> In the initialization of zswap, about 18MB memory will be allocated for -> zswap_pool. Since some users may not use zswap, the zswap_pool is wasted. -> Save memory by delaying the initialization of zswap until enabled. -> - -**[v9: tracing/user_events: Remote write ABI](http://lore.kernel.org/linux-mm/20230324223028.172-1-beaub@linux.microsoft.com/)** - -> As part of the discussions for user_events aligned with user space -> tracers, it was determined that user programs should register a aligned -> value to set or clear a bit when an event becomes enabled. Currently a -> shared page is being used that requires mmap(). Remove the shared page -> implementation and move to a user registered address implementation.
-> - -**[v1: mm/damon/sysfs: make more kobj_type structures constant](http://lore.kernel.org/linux-mm/20230324-b4-kobj_type-damon2-v1-1-48ddbf1c8fcf@weissschuh.net/)** - -> Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.") -> the driver core allows the usage of const struct kobj_type. -> -> Take advantage of this to constify the structure definition to prevent -> modification at runtime. -> - -**[v2: mm: Be less noisy during memory hotplug](http://lore.kernel.org/linux-mm/20230323174349.35990-1-krckatom@amazon.de/)** - -> Turn a pr_info() into a pr_debug() to prevent dmesg spamming on systems -> where memory hotplug is a frequent operation. -> - -**[v1: selftests/mm: Implement support for arm64 on va](http://lore.kernel.org/linux-mm/20230323105243.2807166-1-chaitanyas.prakash@arm.com/)** - -> The va_128TBswitch selftest is designed and implemented for PowerPC and -> x86 architectures which support a 128TB switch, up to 256TB of virtual -> address space and hugepage sizes of 16MB and 2MB respectively. Arm64 -> platforms on the other hand support a 256Tb switch, up to 4PB of virtual -> address space and a default hugepage size of 512MB when 64k pagesize is -> enabled. -> - -**[v8: convert read_kcore(), vread() to use iterators](http://lore.kernel.org/linux-mm/cover.1679566220.git.lstoakes@gmail.com/)** - -> While reviewing Baoquan's recent changes to permit vread() access to -> vm_map_ram regions of vmalloc allocations, Willy pointed out [1] that it -> would be nice to refactor vread() as a whole, since its only user is -> read_kcore() and the existing form of vread() necessitates the use of a -> bounce buffer. 
-> - -**[v1: Make rstat flushing IRQ and sleep friendly](http://lore.kernel.org/linux-mm/20230323040037.2389095-1-yosryahmed@google.com/)** - -> Currently, if rstat flushing is invoked using the irqsafe variant -> cgroup_rstat_flush_irqsafe(), we keep interrupts disabled and do not -> sleep for the entire flush operation, which is O(# cpus * # cgroups). -> This can be rather dangerous. -> - -**[v1: iov_iter: Add an iterator-of-iterators](http://lore.kernel.org/linux-mm/3416400.1679508945@warthog.procyon.org.uk/)** - -> Trond Myklebust wrote: -> -> > Add an enum iter_type for ITER_ITER ? :-) -> -> Well, you asked for it... It's actually fairly straightforward once -> ITER_PIPE is removed. -> - -**[v1: memcg v1: provide read access to memory.pressure_level](http://lore.kernel.org/linux-mm/20230322142525.162469-1-flosch@nutanix.com/)** - -> cgroups v1 has a unique way of setting up memory pressure notifications: -> the user opens "memory.pressure_level" of the cgroup they want to -> monitor for pressure, then open "cgroup.event_control" and write the fd -> (among other things) to that file. memory.pressure_level has no other -> use, specifically it does not support any read or write operations. -> Consequently, no handlers are provided, and the file ends up with -> permissions 000. However, to actually use the mechanism, the subscribing -> user must have read access to the file and open the fd for reading, see -> memcg_write_event_control(). 
-> - -**[v2: Providing mount in memfd_restricted() syscall](http://lore.kernel.org/linux-mm/cover.1679428901.git.ackerleytng@google.com/)** - -> This patchset builds upon the memfd_restricted() system call that was -> discussed in the 'KVM: mm: fd-based approach for supporting KVM' patch -> series, at -> https://lore.kernel.org/lkml/20221202061347.1070246-1-chao.p.peng@linux.intel.com/T/#m7e944d7892afdd1d62a03a287bd488c56e377b0c -> - -**[v1: MAINTAINERS: add myself as vmalloc reviewer](http://lore.kernel.org/linux-mm/55f663af6100c84a71a0065ac0ed22463aa340de.1679421959.git.lstoakes@gmail.com/)** - -> I have recently been involved in both reviewing and submitting patches to -> the vmalloc code in mm and would be willing and happy to help out with -> review going forward if it would be helpful! -> - -#### File Systems - -**[v6: ext4: Convert inode preallocation list to an rbtree](http://lore.kernel.org/linux-fsdevel/cover.1679731817.git.ojaswin@linux.ibm.com/)** - -> This patch series aim to improve the performance and scalability of -> inode preallocation by changing inode preallocation linked list to an -> rbtree. I've ran xfstests quick on this series and plan to run auto group -> as well to confirm we have no regressions. -> - -**[v2: Convert most of ext4 to folios](http://lore.kernel.org/linux-fsdevel/20230324180129.1220691-1-willy@infradead.org/)** - -> On top of next-20230321, this converts most of ext4 to use folios instead -> of pages. It does not enable large folios although it fixes some places -> that will need to be fixed before they can be enabled for ext4. It does -> not convert mballoc to use folios. write_begin() and write_end() still -> take a page parameter instead of a folio. -> - -**[v1: fsdax: force clear dirty mark if CoW](http://lore.kernel.org/linux-fsdevel/1679653680-2-1-git-send-email-ruansy.fnst@fujitsu.com/)** - -> XFS allows CoW on non-shared extents to combat fragmentation[1].
The -> old non-shared extent could be mwrited before, its dax entry is marked -> dirty. To be able to delete this entry, clear its dirty mark before -> invalidate_inode_pages2_range(). -> -> [1] https://lore.kernel.org/linux-xfs/20230321151339.GA11376@frogsfrogsfrogs/ -> - -**[v1: netfs: Pass a pointer to virt_to_page()](http://lore.kernel.org/linux-fsdevel/20230324102728.712018-1-linus.walleij@linaro.org/)** - -> Like the other calls in this function virt_to_page() expects -> a pointer, not an integer. -> -> However since many architectures implement virt_to_pfn() as -> a macro, this function becomes polymorphic and accepts both a -> (unsigned long) and a (void *). -> -> Fix this up with an explicit cast. -> - -**[v1: Legacy mount option "sloppy" support](http://lore.kernel.org/linux-fsdevel/167963629788.253682.5439077048343743982.stgit@donald.themaw.net/)** - -> There's been some recent discussion about support of the "sloppy" -> mount option. -> -> It's an option that people want to get rid of from time to time and -> when we do we get complaints and end up having to re-instate it. -> -> I think the (fairly) recent mount API changes are the best way to -> eliminate the need for this option over time. -> - -**[v1: vfs: handle sloppy option in fs context monolithic parser](http://lore.kernel.org/linux-fsdevel/167963635629.253682.12145104262169969353.stgit@donald.themaw.net/)** - -> The sloppy option doesn't make sense for fsconfig() and knowedge of how -> to handle this case needs to be present in the caller. It does make -> sense in the legacy options parser, generic_parse_monolithic(), so it -> should allow for it. -> - -**[v1: fs/buffer: adjust the order of might_sleep() in __getblk_gfp()](http://lore.kernel.org/linux-fsdevel/20230323093752.17461-1-gouhao@uniontech.com/)** - -> If 'bh' is found in cache, just return directly. -> might_sleep() is only required on slow paths. 
-> - -**[v1: fsdax: unshare: zero destination if srcmap is HOLE or UNWRITTEN](http://lore.kernel.org/linux-fsdevel/1679483469-2-1-git-send-email-ruansy.fnst@fujitsu.com/)** - -> unshare copies data from source to destination. But if the source is -> HOLE or UNWRITTEN extents, we should zero the destination, otherwise the -> result will be unexpectable. -> - -**[v1: fsdax: dedupe should compare the min of two iters' length](http://lore.kernel.org/linux-fsdevel/1679469958-2-1-git-send-email-ruansy.fnst@fujitsu.com/)** - -> In an dedupe corporation iter loop, the length of iomap_iter decreases -> because it implies the remaining length after each iteration. The -> compare function should use the min length of the current iters, not the -> total length. -> - -**[v1: splice: report related fsnotify events](http://lore.kernel.org/linux-fsdevel/20230322062519.409752-1-cccheng@synology.com/)** - -> The fsnotify ACCESS and MODIFY event are missing when manipulating a file -> with splice(2). -> - -**[v2: Add results of early memtest to /proc/meminfo](http://lore.kernel.org/linux-fsdevel/20230321103430.7130-1-tomas.mudrunka@gmail.com/)** - -> Currently the memtest results were only presented in dmesg. -> This adds /proc/meminfo entry which can be easily used by scripts. -> - -**[v1: fuse uring communication](http://lore.kernel.org/linux-fsdevel/20230321011047.3425786-1-bschubert@ddn.com/)** - -> This adds support for uring communication between kernel and -> userspace daemon using opcode the IORING_OP_URING_CMD. The basic -> appraoch was taken from ublk. The patches are in RFC state - -> I'm not sure about all decisions and some questions are marked -> with XXX. -> - -**[v1: Split a folio to any lower order folios](http://lore.kernel.org/linux-fsdevel/20230321004829.2012847-1-zi.yan@sent.com/)** - -> File folio supports any order and people would like to support flexible orders -> for anonymous folio[1] too. 
Currently, split_huge_page() only splits a huge -> page to order-0 pages, but splitting to orders higher than 0 is also useful. -> This patchset adds support for splitting a huge page to any lower order pages -> and uses it during folio truncate operations. -> - -**[v3: mm: memory-failure: Move memory failure sysctls to its own file](http://lore.kernel.org/linux-fsdevel/20230320074010.50875-1-wangkefeng.wang@huawei.com/)** - -> The sysctl_memory_failure_early_kill and memory_failure_recovery -> are only used in memory-failure.c, move them to its own file. -> - -**[v1: fs: allow to tuck mounts explicitly](http://lore.kernel.org/linux-fsdevel/20230202-fs-move-mount-replace-v1-0-9b73026d5f10@kernel.org/)** - -> Various distributions are adding or are in the process of adding support -> for system extensions and in the future configuration extensions through -> various tools. A more detailed explanation on system and configuration -> extensions can be found on the manpage which is listed below at [1]. -> - -**[v1: 5.10: xfs backports for 5.10.y (from v5.15.103)](http://lore.kernel.org/linux-fsdevel/20230318101529.1361673-1-amir73il@gmail.com/)** - -> Following backports catch up with recent 5.15.y xfs backports. -> -> Patches 1-3 are the backports from the previous 5.15 xfs backports -> round that Chandan requested for 5.4 [1]. -> -> Patches 4-14 are the SGID fixes that I collaborated with Leah [2]. -> Christian has reviewed the backports of his vfs patches to 5.10. -> - -**[v1: blk: optimization for classic polling](http://lore.kernel.org/linux-fsdevel/3578876466-3733-1-git-send-email-nj.shetty@samsung.com/)** - -> This removes the dependency on interrupts to wake up task. Set task -> state as TASK_RUNNING, if need_resched() returns true, -> while polling for IO completion. -> Earlier, polling task used to sleep, relying on interrupt to wake it up. -> This made some IO take very long when interrupt-coalescing is enabled in -> NVMe. 
-> - -#### 网络设备 - -**[v1: net-next: Support MACsec VLAN](http://lore.kernel.org/netdev/20230326072636.3507-1-ehakim@nvidia.com/)** - -> Dear maintainers, -> -> This patch series introduces support for hardware (HW) offload MACsec -> devices with VLAN configuration. The patches address both scenarios -> where the VLAN header is both the inner and outer header for MACsec. -> - -**[v1: return errors other than -ENOMEM to socket](http://lore.kernel.org/netdev/97f19214-ba04-c47e-7486-72e8aa16c690@sberdevices.ru/)** - -> this patchset removes behaviour, where error code returned from any -> transport was always switched to ENOMEM. This works in the same way as -> patch from Bobby Eshleman: -> commit c43170b7e157 ("vsock: return errors other than -ENOMEM to socket"), -> but for receive calls. -> -> vsock_test suite is also updated. -> - -**[v5: net-next: allocate multiple skbuffs on tx](http://lore.kernel.org/netdev/b0d15942-65ba-3a32-ba8d-fed64332d8f6@sberdevices.ru/)** - -> This adds small optimization for tx path: instead of allocating single -> skbuff on every call to transport, allocate multiple skbuff's until -> credit space allows, thus trying to send as much as possible data without -> return to af_vsock.c. -> - -**[v19: net-next: vmxnet3: Add XDP support.](http://lore.kernel.org/netdev/20230325172828.24923-1-witu@nvidia.com/)** - -> The patch adds native-mode XDP support: XDP DROP, PASS, TX, and REDIRECT. -> -> Background: -> The vmxnet3 rx consists of three rings: ring0, ring1, and dataring. -> For r0 and r1, buffers at r0 are allocated using alloc_skb APIs and dma -> mapped to the ring's descriptor. If LRO is enabled and packet size larger -> than 3K, VMXNET3_MAX_SKB_BUF_SIZE, then r1 is used to mapped the rest of -> the buffer larger than VMXNET3_MAX_SKB_BUF_SIZE. Each buffer in r1 is -> allocated using alloc_page. 
So for LRO packets, the payload will be in one -> buffer from r0 and multiple from r1, for non-LRO packets, only one -> descriptor in r0 is used for packet size less than 3k. -> - -**[v1: net: stmmac: don't reject VLANs when IFF_PROMISC is set](http://lore.kernel.org/netdev/20230325112815.3053288-1-vladimir.oltean@nxp.com/)** - -> First, take the case of a Linux bridge. If the kernel is compiled with -> CONFIG_BRIDGE_VLAN_FILTERING=y, then this bridge shall have a VLAN -> database. The bridge shall try to call vlan_add_vid() on its bridge -> ports for each VLAN in the VLAN table. It will do this irrespectively of -> whether that port is *currently* VLAN-aware or not. So it will do this -> even when the bridge was created with vlan_filtering 0. -> But the Linux bridge, in VLAN-unaware mode, configures its ports in -> promiscuous (IFF_PROMISC) mode, so that they accept packets with any -> MAC DA (a switch must do this in order to forward those packets which -> are not directly targeted to its MAC address). -> - -**[v1: driver core: class: mark the struct class for sysfs callbacks as constant](http://lore.kernel.org/netdev/20230325084537.3622280-1-gregkh@linuxfoundation.org/)** - -> struct class should never be modified in a sysfs callback as there is -> nothing in the structure to modify, and frankly, the structure is almost -> never used in a sysfs callback, so mark it as constant to allow struct -> class to be moved to read-only memory. -> -> While we are touching all class sysfs callbacks also mark the attribute -> as constant as it can not be modified. The bonding code still uses this -> structure so it can not be removed from the function callbacks. -> - -**[v2: net-next: tools: ynl: fill in some gaps of ethtool spec](http://lore.kernel.org/netdev/20230324225656.3999785-1-sdf@google.com/)** - -> I was trying to fill in the spec while exploring ethtool API for some -> related work. 
I don't think I'll have the patience to fill in the rest, -> so decided to share whatever I currently have. -> - -**[v1: net-next: net: phy: bcm7xxx: use devm_clk_get_optional_enabled to simplify the code](http://lore.kernel.org/netdev/5603487f-3b80-b7ec-dbd2-609fa8020e58@gmail.com/)** - -> Use devm_clk_get_optional_enabled to simplify the code. -> - -**[v4: net-next: ynl: add support for user headers and struct attrs](http://lore.kernel.org/netdev/20230324191900.21828-1-donald.hunter@gmail.com/)** - -> Add support for user headers and struct attrs to YNL. This patchset adds -> features to ynl and add a partial spec for openvswitch that demonstrates -> use of the features. -> - -**[v1: net-next: tools: ynl: default to treating enums as flags for mask generation](http://lore.kernel.org/netdev/20230324190356.2418748-1-kuba@kernel.org/)** - -> I was a bit too optimistic in commit bf51d27704c9 ("tools: ynl: fix -> get_mask utility routine"), not every mask we use is necessarily -> coming from an enum of type "flags". We also allow flipping an -> enum into flags on per-attribute basis. That's done by -> the 'enum-as-flags' property of an attribute. -> - -**[v6: net-next: pds_core driver](http://lore.kernel.org/netdev/20230324190243.27722-1-shannon.nelson@amd.com/)** - -> This patchset implements a new driver for use with the AMD/Pensando -> Distributed Services Card (DSC), intended to provide core configuration -> services through the auxiliary_bus and through a couple of EXPORTed -> functions for use initially in VFio and vDPA feature specific drivers. -> - -**[v1: net-next: selftests: tls: add a test for queuing data before setting the ULP](http://lore.kernel.org/netdev/20230324181757.2407412-1-kuba@kernel.org/)** - -> Other tests set up the connection fully on both ends before -> communicating any data. Add a test which will queue up TLS -> records to TCP before the TLS ULP is installed. 
-> - -**[v1: net-next: net: phy: move getting (R)MII refclock to phylib](http://lore.kernel.org/netdev/0c529488-0fd8-19e1-c5a9-9cf1fab78ed3@gmail.com/)** - -> >From c578be6534254bfc3fd627d9d7be07b1bb46f92c Mon Sep 17 00:00:00 2001 -> Few PHY drivers (smsc, bcm7xxx, micrel) get and enable the (R)MII -> reference clock in their probe() callback. Move this common -> functionality to phylib, this allows to remove it from drivers. -> - -**[v1: net-next: net/core: add optional threading for backlog processing](http://lore.kernel.org/netdev/20230324171314.73537-1-nbd@nbd.name/)** - -> When dealing with few flows or an imbalance on CPU utilization, static RPS -> CPU assignment can be too inflexible. Add support for enabling threaded NAPI -> for backlog processing in order to allow the scheduler to better balance -> processing. This helps better spread the load across idle CPUs. -> - -**[v1: net: ice: make writes to /dev/gnssX synchronous](http://lore.kernel.org/netdev/20230324162056.200752-1-mschmidt@redhat.com/)** - -> The current ice driver's GNSS write implementation buffers writes and -> works through them asynchronously in a kthread. That's bad because: -> - The GNSS write_raw operation is supposed to be synchronous[1][2]. -> - There is no upper bound on the number of pending writes. -> Userspace can submit writes much faster than the driver can process, -> consuming unlimited amounts of kernel memory. -> - -**[v4: vdpa_sim: add support for user VA](http://lore.kernel.org/netdev/20230324153607.46836-1-sgarzare@redhat.com/)** - -> This series adds support for the use of user virtual addresses in the -> vDPA simulator devices. -> -> The main reason for this change is to lift the pinning of all guest memory. -> Especially with virtio devices implemented in software. 
-> - -**[v1: net: vsock/loopback: use only sk_buff_head.lock to protect the packet queue](http://lore.kernel.org/netdev/20230324115450.11268-1-sgarzare@redhat.com/)** - -> pkt_list_lock was used before commit 71dc9ec9ac7d ("virtio/vsock: -> replace virtio_vsock_pkt with sk_buff") to protect the packet queue. -> After that commit we switched to sk_buff and we are using -> sk_buff_head.lock in almost every place to protect the packet queue -> except in vsock_loopback_work() when we call skb_queue_splice_init(). -> - -**[v1: wpan-next: ieee802154: Handle imited devices](http://lore.kernel.org/netdev/20230324110558.90707-1-miquel.raynal@bootlin.com/)** - -> As rightly pointed out by Alexander a few months ago, ca8210 devices -> will not support sending frames which are not pure datagrams (hardMAC -> wired to the softMAC layer). In order to not confuse users and clarify -> that scanning and beaconing is not supported on these devices, let's add -> a flag to prevent them to be used with the new APIs. -> - -**[v4: bpf-next: xsk: allow remap of fill and/or completion rings](http://lore.kernel.org/netdev/20230324100222.13434-1-nunog@fr24.com/)** - -> The remap of fill and completion rings was frowned upon as they -> control the usage of UMEM which does not support concurrent use. -> At the same time this would disallow the remap of these rings -> into another process. -> -> A possible use case is that the user wants to transfer the socket/ -> UMEM ownership to another process (via SYS_pidfd_getfd) and so -> would need to also remap these rings. 
-> - -**[v1: Introduce a generic regmap-based MDIO driver](http://lore.kernel.org/netdev/20230324093644.464704-1-maxime.chevallier@bootlin.com/)** - -> When the Altera TSE PCS driver was initially introduced, there were -> comments by Russell that the register layout looked very familiar to the -> existing Lynx PCS driver, the only difference being that the TSE PCS -> driver is memory-mapped whereas the Lynx PCS driver sits on an MDIO bus. -> - -**[[PATCH RESEND net-next 0/3] Constify a few sfp/phy fwnodes](http://lore.kernel.org/netdev/ZB1sBYQnqWbGoasq@shell.armlinux.org.uk/)** - -> This series constifies a bunch of fwnode_handle pointers that are only -> used to refer to but not modify the contents of the fwnode structures. -> -> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ -> FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last! -> - -**[v1: net-next: Constify a few sfp/phy fwnodes](http://lore.kernel.org/netdev/ZB1rNMAJ9oLr8myx@shell.armlinux.org.uk/)** - -> This series constifies a bunch of fwnode_handle pointers that are only -> used to refer to but not modify the contents of the fwnode structures. -> -> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ -> FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last! -> - -**[v2: net: dsa: b53: mdio: add support for BCM53134](http://lore.kernel.org/netdev/20230324084138.664285-1-noltari@gmail.com/)** - -> This is based on the initial work from Paul Geurts that was sent to the -> incorrect linux development lists and recipients. -> I've simplified his patches by adding BCM53134 to the is531x5() block since it -> seems that the switch doesn't need a special RGMII config. -> - -**[v1: next: rtlwifi: Replace fake flex-array with flex-array member](http://lore.kernel.org/netdev/ZBz4x+MWoI%2Ff65o1@work/)** - -> Zero-length arrays as fake flexible arrays are deprecated and we are -> moving towards adopting C99 flexible-array members instead. 
-> - -**[v2: net-next: net: phy: Improved PHY error reporting in state machine](http://lore.kernel.org/netdev/20230323214559.3249977-1-f.fainelli@gmail.com/)** - -> When the PHY library calls phy_error() something bad has happened, and -> we halt the PHY state machine. Calling phy_error() from the main state -> machine however is not precise enough to know whether the issue is -> reading the link status or starting auto-negotiation. -> - -**[v3: net, refcount: Address dst_entry reference count scalability issues](http://lore.kernel.org/netdev/20230323102649.764958589@linutronix.de/)** - -> This is version 3 of this series. Version 2 can be found here: -> -> https://lore.kernel.org/lkml/20230307125358.772287565@linutronix.de -> -> Wangyang and Arjan reported a bottleneck in the networking code related to -> struct dst_entry::__refcnt. Performance tanks massively when concurrency on -> a dst_entry increases. -> - -**[v2: net-next: sfc: support TC decap rules](http://lore.kernel.org/netdev/cover.1679603051.git.ecree.xilinx@gmail.com/)** - -> This series adds support for offloading tunnel decapsulation TC rules to -> ef100 NICs, allowing matching encapsulated packets to be decapsulated in -> hardware and redirected to VFs. -> For now an encap match must be on precisely the following fields: -> ethertype (IPv4 or IPv6), source IP, destination IP, ipproto UDP, -> UDP destination port. This simplifies checking for overlaps in the -> driver; the hardware supports a wider range of match fields which -> future driver work may expose. -> - -**[v2: net: vmxnet3: use gro callback when UPT is enabled](http://lore.kernel.org/netdev/20230323200721.27622-1-doshir@vmware.com/)** - -> Currently, vmxnet3 uses GRO callback only if LRO is disabled. However, -> on smartNic based setups where UPT is supported, LRO can be enabled -> from guest VM but UPT devicve does not support LRO as of now. In such -> cases, there can be performance degradation as GRO is not being done. 
-> - -#### Rust For Linux - -**[v2: rust: macros: Allow specifying multiple module aliases](http://lore.kernel.org/rust-for-linux/20230224-rust-macros-v2-1-7396e8b7018d@asahilina.net/)** - -> Modules can (and usually do) have multiple alias tags, in order to -> specify multiple possible device matches for autoloading. Allow this by -> changing the alias ModuleInfo field to an Option>. -> - -#### 安全增强 - -**[[RFC/RFT,V2 0/3] Add compiler support for Kernel Control Flow Integrity](http://lore.kernel.org/linux-hardening/20230325081117.93245-1-ashimida.1990@gmail.com/)** - -> This series of patches is mainly used to support the control flow -> integrity protection of the linux kernel [1], which is similar to -> -fsanitize=kcfi in clang 16.0 [2,3]. -> - -**[v1: next: uapi: net: ipv6: Replace fake flex-array with flex-array member](http://lore.kernel.org/linux-hardening/ZBy5bNygP5yxnE9k@work/)** - -> Zero-length arrays as fake flexible arrays are deprecated and we are -> moving towards adopting C99 flexible-array members instead. -> - -**[v1: next: wifi: rndis_wlan: Replace fake flex-array with flexible-array member](http://lore.kernel.org/linux-hardening/ZBtIbU77L9eXqa4j@work/)** - -> Zero-length arrays as fake flexible arrays are deprecated and we are -> moving towards adopting C99 flexible-array members instead. -> -> Address the following warning found with GCC-13 and -> -fstrict-flex-array=3 enabled: -> drivers/net/wireless/rndis_wlan.c:2902:23: warning: array subscript 0 is outside array bounds of ‘struct ndis_80211_auth_request[0]’ [-Warray-bounds=] -> - -#### 异步 IO - -**[v1: io_uring/rw: transform single vector readv/writev into ubuf](http://lore.kernel.org/io-uring/43cb1fb7-b30b-8df1-bba6-e50797d680c6@kernel.dk/)** - -> It's very common to have applications that use vectored reads or writes, -> even if they only pass in a single segment. Obviously they should be -> using read/write at that point, but... 
-> -> Vectored IO comes with the downside of needing to retain iovec state, -> and hence they require and allocation and state copy if they end up -> getting deferred. Additionally, they also require extra cleanup when -> completed as the memory as the allocated state memory has to be freed. -> - -**[v1: liburing: add multishot timeout support](http://lore.kernel.org/io-uring/20230323233632.2376374-1-davidhwei@meta.com/)** - -> Single change to sync the new IORING_TIMEOUT_MULTISHOT flag with kernel. -> -> Mostly unit tests for multishot timeouts. -> - -**[v1: block/io_uring: pass in issue_flags for uring_cmd task_work handling](http://lore.kernel.org/io-uring/c56fc63e-7e6b-480e-dfdc-417b00802f11@kernel.dk/)** - -> io_uring_cmd_done() currently assumes that the uring_lock is held -> when invoked, and while it generally is, this is not guaranteed. -> Pass in the issue_flags associated with it, so that we have -> IO_URING_F_UNLOCKED available to be able to lock the CQ ring -> appropriately when completing events. -> - - -#### BPF - -**[v1: bpf-next: Don't invoke KPTR_REF destructor on NULL xchg](http://lore.kernel.org/bpf/20230325213144.486885-1-void@manifault.com/)** - -> When a map value is being freed, we loop over all of the fields of the -> corresponding BPF object and issue the appropriate cleanup calls -> corresponding to the field's type. If the field is a referenced kptr, we -> atomically xchg the value out of the map, and invoke the kptr's -> destructor on whatever was there before. -> - -**[v1: bpf-next: First set of verifier/*.c migrated to inline assembly](http://lore.kernel.org/bpf/20230325025524.144043-1-eddyz87@gmail.com/)** - -> This is a follow up for RFC [1]. It migrates a first batch of 38 -> verifier/*.c tests to inline assembly and use of ./test_progs for -> actual execution. The migration is done by a python script (see [2]). -> -> Each migrated verifier/xxx.c file is mapped to progs/verifier_xxx.c -> plus an entry in the prog_tests/verifier.c. 
One patch per each file. -> - -**[v1: bpf-next: libbpf: synchronize access to print function pointer](http://lore.kernel.org/bpf/20230325010845.46000-1-inwardvessel@gmail.com/)** - -> This patch prevents races on the print function pointer, allowing the -> libbpf_set_print() function to become thread safe. -> - -**[v2: bpf-next: veristat: add better support of freplace programs](http://lore.kernel.org/bpf/20230324232745.3959567-1-andrii@kernel.org/)** - -> Teach veristat how to deal with freplace BPF programs. As they can't be -> directly loaded by veristat without custom user-space part that sets correct -> target program FD, veristat always fails freplace programs. This patch set -> teaches veristat to guess target program type that will be inherited by -> freplace program itself, and subtitute it for BPF_PROG_TYPE_EXT (freplace) one -> for the purposes of BPF verification. -> - -**[v1: bpf-next: bpftool: Add inline annotations when dumping program CFGs](http://lore.kernel.org/bpf/20230324230209.161008-1-quentin@isovalent.com/)** - -> This set contains some improvements for bpftool's "visual" program dump -> option, which produces the control flow graph in a DOT format. The main -> objective is to add support for inline annotations on such graphs, so that -> we can have the C source code for the program showing up alongside the -> instructions, when available. The last commits also make it possible to -> display the line numbers or the bare opcodes in the graph, as supported by -> regular program dumps. -> - -**[v3: Add ftrace direct call for arm64](http://lore.kernel.org/bpf/20230324171451.2752302-1-revest@chromium.org/)** - -> This series adds ftrace direct call support to arm64. -> This makes BPF tracing programs (fentry/fexit/fmod_ret/lsm) work on arm64. 
-> - -**[v1: capability: test_deny_namespace breakage due to capability conversion to u64](http://lore.kernel.org/bpf/20230324123626.2177476-1-sashal@kernel.org/)** - -> Commit f122a08b197d ("capability: just use a 'u64' instead of a 'u32[2]' -> array") attempts to use BIT_LL() but actually wanted to use BIT_ULL(), -> fix it up to make the test compile and run again. -> - -**[v4: bpf-next: bpf-nex: Add socket destroy capability](http://lore.kernel.org/bpf/20230323200633.3175753-1-aditi.ghag@isovalent.com/)** - -> This patch adds the capability to destroy sockets in BPF. We plan to use -> the capability in Cilium to force client sockets to reconnect when their -> remote load-balancing backends are deleted. The other use case is -> on-the-fly policy enforcement where existing socket connections prevented -> by policies need to be terminated. -> - -**[v2: bpf-next: bpf: add bound tracking for BPF_MOD](http://lore.kernel.org/bpf/20230324045842.729719-1-xukuohai@huaweicloud.com/)** - -> dst_reg is marked as unknown when BPF_MOD instruction is verified, causing -> the following bpf prog to be incorrectly rejected. -> - -**[v12: bpf-next: Transit between BPF TCP congestion controls.](http://lore.kernel.org/bpf/20230323032405.3735486-1-kuifeng@meta.com/)** - -> Previously, BPF struct_ops didn't go off, as even when the user -> program creating it was terminated, none of these ever were pinned. -> For instance, the TCP congestion control subsystem indirectly -> maintains a reference count on the struct_ops of any registered BPF -> implemented algorithm. Thus, the algorithm won't be deactivated until -> someone deliberately unregisters it. For compatibility with other BPF -> programs, bpf_links have been created to work in coordination with -> struct_ops maps. This ensures that the registration and unregistration -> of these respective maps is carried out at the start and end of the -> bpf_link. 
-> - -**[v1: bpf-next: bpf: remember meta->iter info only for initialized iters](http://lore.kernel.org/bpf/20230322232502.836171-1-andrii@kernel.org/)** - -> For iter_new() functions iterator state's slot might not be yet -> initialized, in which case iter_get_spi() will return -ERANGE. This is -> expected and is handled properly. But for iter_next() and iter_destroy() -> cases iter slot is supposed to be initialized and correct, so -ERANGE is -> not possible. -> - -**[v3: bpf-next: bpf: Use bpf_mem_cache_alloc/free in bpf_local_storage](http://lore.kernel.org/bpf/20230322215246.1675516-1-martin.lau@linux.dev/)** - -> This set is a continuation of the effort in using -> bpf_mem_cache_alloc/free in bpf_local_storage [1] -> -> Major change is only using bpf_mem_alloc for task and cgrp storage -> while sk and inode stay with kzalloc/kfree. The details is -> in patch 2. -> -> [1]: https://lore.kernel.org/bpf/20230308065936.1550103-1-martin.lau@linux.dev/ -> - -**[v2: bpf-next: error checking where helpers call bpf_map_ops](http://lore.kernel.org/bpf/20230322194754.185781-1-inwardvessel@gmail.com/)** - -> Within bpf programs, the bpf helper functions can make inline calls to -> kernel functions. In this scenario there can be a disconnect between the -> register the kernel function writes a return value to and the register the -> bpf program uses to evaluate that return value. -> - -**[v3: bpf-next: XDP-hints kfuncs for Intel driver igc](http://lore.kernel.org/bpf/167950085059.2796265.16405349421776056766.stgit@firesoul/)** - -> Implemented XDP-hints metadata kfuncs for Intel driver igc. -> -> Primarily used the tool in tools/testing/selftests/bpf/ xdp_hw_metadata, -> when doing driver development of these features. Recommend other driver -> developers to do the same. In the process xdp_hw_metadata was updated to -> help assist development. I've documented my practical experience with igc -> and tool here[1]. 
-> - -**[v1: net-next: virtio_net: refactor xdp codes](http://lore.kernel.org/bpf/20230322030308.16046-1-xuanzhuo@linux.alibaba.com/)** - -> Due to historical reasons, the implementation of XDP in virtio-net is relatively -> chaotic. For example, the processing of XDP actions has two copies of similar -> code. Such as page, xdp_page processing, etc. -> - -**[v10: bpf-next: Transit between BPF TCP congestion controls.](http://lore.kernel.org/bpf/20230321232813.3376064-1-kuifeng@meta.com/)** - -> Major changes: -> -> - Create bpf_links in the kernel for BPF struct_ops to register and -> unregister it. -> -> - Enables switching between implementations of bpf-tcp-cc under a -> name instantly by replacing the backing struct_ops map of a -> bpf_link. -> - -**[v2: bpf-next: bpf: Support ksym detection in light skeleton.](http://lore.kernel.org/bpf/20230321203854.3035-1-alexei.starovoitov@gmail.com/)** - -> v1->v2: update denylist on s390 -> -> Patch 1: Cleanup internal libbpf names. -> Patch 2: Teach the verifier that rdonly_mem != NULL. -> Patch 3: Fix gen_loader to support ksym detection. -> Patch 4: Selftest and update denylist. -> - -**[v3: bpf-next: bpf-next: Add socket destroy capability](http://lore.kernel.org/bpf/20230321184541.1857363-1-aditi.ghag@isovalent.com/)** - -> This patch adds the capability to destroy sockets in BPF. We plan to use -> the capability in Cilium to force client sockets to reconnect when their -> remote load-balancing backends are deleted. The other use case is -> on-the-fly policy enforcement where existing socket connections prevented -> by policies need to be terminated. -> - -**[v2: bpf: xdp: bpf_xdp_metadata use EOPNOTSUPP for no driver support](http://lore.kernel.org/bpf/167940675120.2718408.8176058626864184420.stgit@firesoul/)** - -> When driver doesn't implement a bpf_xdp_metadata kfunc the fallback -> implementation returns EOPNOTSUPP, which indicate device driver doesn't -> implement this kfunc. 
-> - -**[v1: tracing: Refuse fprobe if RCU is not watching](http://lore.kernel.org/bpf/20230321020103.13494-1-laoar.shao@gmail.com/)** - -> It hits below warning on my test machine when running -> selftests/bpf/test_progs, -> - -**[v2: bpf-next: net: skbuff: skb bitfield compaction - bpf](http://lore.kernel.org/bpf/20230321014115.997841-1-kuba@kernel.org/)** - -> I'm trying to make more of the sk_buff bits optional. -> Move the BPF-accessed bits a little - because they must -> be at coding-time-constant offsets they must precede any -> optional bit. While at it clean up the naming a bit. -> - -### 周边技术动态 - -#### Qemu - -**[v4: for-8.1: target/riscv: rework CPU extensions validation](http://lore.kernel.org/qemu-devel/20230322222004.357013-1-dbarboza@ventanamicro.com/)** - -> In this version I simplified the logic used in write_misa() after -> reviews from Weiwei Li. The patch that handled RVV activation was -> removed, making RVV a regular MISA bit to activate/deactivate. -> - -**[v3: target/riscv: reduce overhead of MSTATUS_SUM change](http://lore.kernel.org/qemu-devel/20230322121240.232303-1-fei2.wu@intel.com/)** - -> Kernel needs to access user mode memory e.g. during syscalls, the window -> is usually opened up for a very limited time through MSTATUS.SUM, the -> overhead is too much if tlb_flush() gets called for every SUM change. -> - -**[v3: qemu: linux-user: Emulate /proc/cpuinfo output for riscv](http://lore.kernel.org/qemu-devel/324c2fd4-7044-0dd9-7ad9-b716fbefa5d9@gmail.com/)** - -> RISC-V does not expose all extensions via hwcaps, thus some userspace -> applications may want to query these via /proc/cpuinfo. -> - -## 20230319:第 38 期 - -### 内核动态 - -#### RISC-V 架构支持 - -**[v1: Deduplicating RISCV cmpxchg.h macros](http://lore.kernel.org/linux-riscv/20230318080059.1109286-1-leobras@redhat.com/)** - -> While studying riscv's cmpxchg.h file, I got really interested in -> understanding how RISCV asm implemented the different versions of -> {cmp,}xchg. 
-> - -**[v1: KVM: RISC-V: Retry fault if vma_lookup() results become invalid](http://lore.kernel.org/linux-riscv/20230317211106.1234484-1-dmatlack@google.com/)** - -> Read mmu_invalidate_seq before dropping the mmap_lock so that KVM can -> detect if the results of vma_lookup() (e.g. vma_shift) become stale -> before it acquires kvm->mmu_lock. This fixes a theoretical bug where a -> VMA could be changed by userspace after vma_lookup() and before KVM -> reads the mmu_invalidate_seq, causing KVM to install page table entries -> based on a (possibly) no-longer-valid vma_shift. -> - -**[v1: riscv: say disabling zicbom if no or bad riscv,cbom-block-size found](http://lore.kernel.org/linux-riscv/20230317134512.254627-1-ben.dooks@codethink.co.uk/)** - -> If Zicbom is present but there was no riscv,cbom-blocks-size property found -> during the cpu feeatures probe, or the cbom-block-size is not valid, then -> the extension will be disabled. Make the print explicitly say this is -> disabled to ensure that there is no confusion about what is being done. -> - -**[v15: -next: riscv: Add vector ISA support](http://lore.kernel.org/linux-riscv/20230317113538.10878-1-andy.chiu@sifive.com/)** - -> This patchset is implemented based on vector 1.0 spec to add vector support -> in riscv Linux kernel. There are some assumptions for this implementations. -> -> 1. We assume all harts has the same ISA in the system. -> 2. We disable vector in both kernel andy user space [1] by default. Only -> enable an user's vector after an illegal instruction trap where it -> actually starts executing vector (the first-use trap [2]). -> 3. We detect "riscv,isa" to determine whether vector is support or not. 
-> -> - [1] https://lore.kernel.org/all/20220921214439.1491510-17-stillson@rivosinc.com/ -> - [2] https://lore.kernel.org/all/73c0124c-4794-6e40-460c-b26df407f322@rivosinc.com/T/#u -> - -**[[PATCH AUTOSEL 4.14] riscv: Bump COMMAND_LINE_SIZE value to 1024](http://lore.kernel.org/linux-riscv/20230316163422.709087-1-sashal@kernel.org/)** - -> [ Upstream commit 61fc1ee8be26bc192d691932b0a67eabee45d12f ] -> -> Increase COMMAND_LINE_SIZE as the current default value is too low -> for syzbot kernel command line. -> -> There has been considerable discussion on this patch that has led to a -> larger patch set removing COMMAND_LINE_SIZE from the uapi headers on all -> ports. That's not quite done yet, but it's gotten far enough we're -> confident this is not a uABI change so this is safe. -> - -**[[PATCH AUTOSEL 5.4] riscv: Bump COMMAND_LINE_SIZE value to 1024](http://lore.kernel.org/linux-riscv/20230316163408.709028-1-sashal@kernel.org/)** - -> [ Upstream commit 61fc1ee8be26bc192d691932b0a67eabee45d12f ] -> -> Increase COMMAND_LINE_SIZE as the current default value is too low -> for syzbot kernel command line. -> -> There has been considerable discussion on this patch that has led to a -> larger patch set removing COMMAND_LINE_SIZE from the uapi headers on all -> ports. That's not quite done yet, but it's gotten far enough we're -> confident this is not a uABI change so this is safe. -> -> [Palmer: it's not uabi] -> - -**[[PATCH AUTOSEL 5.10] riscv: Bump COMMAND_LINE_SIZE value to 1024](http://lore.kernel.org/linux-riscv/20230316163401.708994-1-sashal@kernel.org/)** - -> [ Upstream commit 61fc1ee8be26bc192d691932b0a67eabee45d12f ] -> -> Increase COMMAND_LINE_SIZE as the current default value is too low -> for syzbot kernel command line. -> -> There has been considerable discussion on this patch that has led to a -> larger patch set removing COMMAND_LINE_SIZE from the uapi headers on all -> ports. 
That's not quite done yet, but it's gotten far enough we're -> confident this is not a uABI change so this is safe. -> -> [Palmer: it's not uabi] -> - -**[v8: riscv: Use PUD/P4D/PGD pages for the linear mapping](http://lore.kernel.org/linux-riscv/20230316131711.1284451-1-alexghiti@rivosinc.com/)** - -> This patchset intends to improve tlb utilization by using hugepages for -> the linear mapping. -> -> As reported by Anup in v6, when STRICT_KERNEL_RWX is enabled, we must -> take care of isolating the kernel text and rodata so that they are not -> mapped with a PUD mapping which would then assign wrong permissions to -> the whole region: it is achieved by introducing a new memblock API. -> - -**[v7: Add Ethernet driver for StarFive JH7110 SoC](http://lore.kernel.org/linux-riscv/20230316043714.24279-1-samin.guo@starfivetech.com/)** - -> This series adds ethernet support for the StarFive JH7110 RISC-V SoC. -> The series includes MAC driver. The MAC version is dwmac-5.20 (from -> Synopsys DesignWare). -> The series has been tested on the VisionFive-2-v1.2A and -> VisionFive-2-v1.3B board which equip with JH7110 SoC and works normally. -> -> For more information and support, you can visit RVspace wiki[1]. -> You can simply review or test the patches at the link [2]. -> - -**[v2: Add PLL clocks driver for StarFive JH7110 SoC](http://lore.kernel.org/linux-riscv/20230316030514.137427-1-xingyu.wu@starfivetech.com/)** - -> This patch serises are to add PLL clocks driver and providers by writing -> and reading syscon registers for the StarFive JH7110 RISC-V SoC. -> -> PLL are high speed, low jitter frequency synthesizers in JH7110. -> Each PLL clocks work in integer mode or fraction mode by some dividers, -> and the dividers are set in several syscon registers. 
-> The formula for calculating frequency is: -> Fvco = Fref * (NI + NF) / M / Q1 -> - -**[v4: function_graph: Support recording and printing the return value of function](http://lore.kernel.org/linux-riscv/20230315133911.958741-1-pengdonglin@sangfor.com.cn/)** - -> When using the function_graph tracer to analyze system call failures, -> it can be time-consuming to analyze the trace logs and locate the kernel -> function that first returns an error. This change aims to simplify the -> process by recording the function return value to the 'retval' member of -> 'ftrace_graph_ent' and printing it when outputing the trace log. -> - -**[v1: Enable I2S support for RK3588/RK3588S SoCs](http://lore.kernel.org/linux-riscv/20230315114806.3819515-1-cristian.ciocaltea@collabora.com/)** - -> There are five I2S/PCM/TDM controllers and two I2S/PCM controllers embedded -> in the RK3588 and RK3588S SoCs. Furthermore, RK3588 provides four additional -> I2S/PCM/TDM controllers. -> -> This patch series adds the required device tree nodes to support all the above. -> - -**[v3: Add JH7110 USB and USB PHY driver support](http://lore.kernel.org/linux-riscv/20230315104411.73614-1-minda.chen@starfivetech.com/)** - -> This patchset adds USB driver and USB PHY for the StarFive JH7110 SoC. -> USB work mode is peripheral and using USB 2.0 PHY in VisionFive 2 board. -> The patch has been tested on the VisionFive 2 board. -> - -**[v3: Add JH7110 MIPI DPHY RX support](http://lore.kernel.org/linux-riscv/20230315100421.133428-1-changhuang.liang@starfivetech.com/)** - -> This patchset adds mipi dphy rx driver for the StarFive JH7110 SoC. -> It is used to transfer CSI camera data. The series has been tested on -> the VisionFive 2 board. -> - -**[v1: Add PTP support for sama7g5](http://lore.kernel.org/linux-riscv/20230315095053.53969-1-durai.manickamkr@microchip.com/)** - -> This patch series is intended to add PTP capability to the GEM and -> EMAC for sama7g5. 
-> - -**[v2: perf tools riscv: Add support for riscv lookup_binutils_path](http://lore.kernel.org/linux-riscv/20230315051500.13064-1-p4ranlee@gmail.com/)** - -> Add RISC-V binutils path on lookup triplets. -> - -**[v2: mm: Stop alaising VM_FAULT_HINDEX_MASK in arch code](http://lore.kernel.org/linux-riscv/20230315030359.14162-1-palmer@rivosinc.com/)** - -> When reviewing -> -> I noticed that the arch-specific VM_FAULT flags used by arm and s390 -> alias with VM_FAULT_HINDEX_MASK. I'm not sure if it's possible to -> manifest this as a bug, but it certainly seems fragile. -> -> I'm including that original patch this time in the hope that makes it -> easier for folks to review. There were some boring conflicts so I -> figured I'd rebase rather than pinging again. -> - -**[v6: StarFive's SDIO/eMMC driver support](http://lore.kernel.org/linux-riscv/20230315034853.93677-1-william.qiu@starfivetech.com/)** - -> This patchset adds initial rudimentary support for the StarFive -> designware mobile storage host controller driver. And this driver will -> be used in StarFive's VisionFive 2 board. The main purpose of adding -> this driver is to accommodate the ultra-high speed mode of eMMC. -> - -**[v1: RISCV: CANAAN: Make K210_SYSCTL depend on CLK_K210](http://lore.kernel.org/linux-riscv/20230314211030.3953195-1-Mr.Bossman075@gmail.com/)** - -> CLK_K210 is no longer a dependency of SOC_CANAAN, -> but K210_SYSCTL depends on CLK_K210. This patch makes K210_SYSCTL -> depend on CLK_K210. Also fix whitespace errors. -> - -**[v5: Add watchdog driver for StarFive JH7100/JH7110 RISC-V SoCs](http://lore.kernel.org/linux-riscv/20230314132437.121534-1-xingyu.wu@starfivetech.com/)** - -> This patch serises are to add watchdog driver for the StarFive -> JH7100 and JH7110 RISC-V SoCs. The first patch adds docunmentation to -> describe device tree bindings. The subsequent patch adds watchdog driver -> and support JH7100/JH7110 SoCs. And the last patch adds watchdog node in -> the JH7100 dts. 
And the addition of JH7110 device tree node will be -> submitted after the JH7110 dts merge. This patchset is based on 6.3-rc1. -> -> The watchdog driver has been tested on the VisionFive 1 and VisionFive 2 -> boards which equip with JH7100 and JH7110 SoCs respectively and both -> works normally. -> - -**[v3: Add new partial clock and reset drivers for StarFive JH7110](http://lore.kernel.org/linux-riscv/20230314124404.117592-1-xingyu.wu@starfivetech.com/)** - -> This patch serises are to add new partial clock drivers and reset -> supports about System-Top-Group(STG), Image-Signal-Process(ISP) -> and Video-Output(VOUT) for the StarFive JH7110 RISC-V SoC. -> -> Patches 1 to 3 are about the System-Top-Group clock and reset -> generator(STGCRG) part. -> The first patch adds docunmentation to describe STG bindings, and -> the second patch adds support about STG resets. The last patch adds -> clock driver to support STG clocks for JH7110. -> - -**[v3: Kconfig: Introduce HAS_IOPORT config option](http://lore.kernel.org/linux-riscv/20230314121216.413434-1-schnelle@linux.ibm.com/)** - -> Hello Kernel Hackers, -> -> Some platforms such as s390 do not support PCI I/O spaces. On such platforms -> I/O space accessors like inb()/outb() are stubs that can never actually work. -> The way these stubs are implemented in asm-generic/io.h leads to compiler -> warnings because any use will be a NULL pointer access on these platforms. In -> a previous patch we tried handling this with a run-time warning on access. This -> approach however was rejected by Linus[0] with the argument that this really -> should be a compile-time check and, though a much more invasive change, we -> believe that is indeed the right approach. -> - -**[v6: RISC-V Hibernation Support](http://lore.kernel.org/linux-riscv/20230314050316.31701-1-jeeheng.sia@starfivetech.com/)** - -> This series adds RISC-V Hibernation/suspend to disk support. -> Low level Arch functions were created to support hibernation. 
-> swsusp_arch_suspend() relies code from __cpu_suspend_enter() to write -> cpu state onto the stack, then calling swsusp_save() to save the memory -> image. -> - -**[v1: riscv: Handle zicsr/zifencei issues between clang and binutils](http://lore.kernel.org/linux-riscv/20230313-riscv-zicsr-zifencei-fiasco-v1-1-dd1b7840a551@kernel.org/)** - -> There are two related issues that appear in certain combinations with -> clang and GNU binutils. -> -> The first occurs when a version of clang that supports zicsr or zifencei -> via '-march=' [1] (i.e, >= 17.x) is used in combination with a version -> of GNU binutils that do not recognize zicsr and zifencei in the -> '-march=' value (i.e., < 2.36): -> - -**[v3: RISC-V: support some cryptography accelerations](http://lore.kernel.org/linux-riscv/20230313191302.580787-1-heiko.stuebner@vrull.eu/)** - -> The base is v14 of the vector patchset but the first patches up to doing -> the Zbc-based GCM GHash can also run without those. Of course the vector- -> crypto extensions are also not ratified yet, hence the marking as RFC. -> - -**[v5: Basic clock, reset & device tree support for StarFive JH7110 RISC-V SoC](http://lore.kernel.org/linux-riscv/20230311090733.56918-1-hal.feng@starfivetech.com/)** - -> This patch series adds basic clock, reset & DT support for StarFive -> JH7110 SoC. -> -> You can simply review or test the patches at the link [1]. -> -> [1]: https://github.com/hal-feng/linux/commits/visionfive2-minimal -> - -#### Process Scheduling - -**[v2: sched/fair: sanitize vruntime of entity being migrated](http://lore.kernel.org/lkml/20230317160810.107988-1-vincent.guittot@linaro.org/)** - -> Commit 829c1651e9c4 ("sched/fair: sanitize vruntime of entity being placed") -> fixes an overflowing bug, but ignore a case that se->exec_start is reset -> after a migration.
-> - -**[v1: sched/core: Avoid selecting the task that is throttled to run when core-sched enable](http://lore.kernel.org/lkml/20230316081806.69544-1-jiahao.os@bytedance.com/)** - -> When {rt, cfs}_rq or dl task is throttled, since cookied tasks -> are not dequeued from the core tree, So sched_core_find() and -> sched_core_next() may return throttled task, which may -> cause throttled task to run on the CPU. -> - -**[v1: net/sched: use real_num_tx_queues in dev_watchdog()](http://lore.kernel.org/lkml/20230315183408.2723-1-praveen.kannoju@oracle.com/)** - -> Currently dev_watchdog() loops through num_tx_queues[Number of TX queues -> allocated at alloc_netdev_mq() time] instead of real_num_tx_queues -> [Number of TX queues currently active in device] to detect transmit -> queue time out. Make this efficient by using real_num_tx_queues. -> - -**[v1: sched/deadline: cpuset: Rework DEADLINE bandwidth restoration](http://lore.kernel.org/lkml/20230315121812.206079-1-juri.lelli@redhat.com/)** - -> Qais reported [1] that iterating over all tasks when rebuilding root -> domains for finding out which ones are DEADLINE and need their bandwidth -> correctly restored on such root domains can be a costly operation (10+ -> ms delays on suspend-resume). He proposed we skip rebuilding root -> domains for certain operations, but that approach seemed arch specific -> and possibly prone to errors, as paths that ultimately trigger a rebuild -> might be quite convoluted (thanks Qais for spending time on this!). -> - -**[v1: sched/rt: Reset sysctl_sched_rr_timeslice when it non-positive](http://lore.kernel.org/lkml/20230314031323.3638994-1-yajun.deng@linux.dev/)** - -> When sysctl_sched_rr_timeslice was set a non-positive number, only -> sched_rr_timeslice was reset to default, This behavior should let -> users know. -> -> So reset sysctl_sched_rr_timeslice at the same time when it -> non-positive. 
-> - -**[v1: sched/fair: Don't balance migration disabled tasks](http://lore.kernel.org/lkml/20230313065759.39698-1-yangyicong@huawei.com/)** - -> On load balance we didn't check whether the candidate task is migration -> disabled or not, this may hit the WARN_ON in set_task_cpu() since the -> migration disabled tasks are expected to run on their current CPU. -> - -**[v1: sched/fair: scale vruntime delta on migration](http://lore.kernel.org/lkml/20230313021442.115425-1-mathieu.desnoyers@efficios.com/)** - -> On migration, use the respective runqueue spread of the source and -> destination runqueues to scale the vruntime delta of the scheduling -> entity. -> -> The intent of this change is to prevent a task migrated from a very busy -> runqueue (with vruntime going fast) to a less busy runqueue (with a -> vruntime with a slower pace) to enqueue the migrated task far away at -> the very end of the runqueue, thus increasing the destination runqueue -> spread and preventing the enqueued task from being scheduled for a while -> until the vruntime reaches it. -> - -#### Memory Management - -**[v1: convert read_kcore(), vread() to use iterators](http://lore.kernel.org/linux-mm/cover.1679183626.git.lstoakes@gmail.com/)** - -> While reviewing Baoquan's recent changes to permit vread() access to -> vm_map_ram regions of vmalloc allocations, Willy pointed out [1] that it -> would be nice to refactor vread() as a whole, since its only user is -> read_kcore() and the existing form of vread() necessitates the use of a -> bounce buffer. -> - -**[v8: Shadow stacks for userspace](http://lore.kernel.org/linux-mm/20230319001535.23210-1-rick.p.edgecombe@intel.com/)** - -> This series implements Shadow Stacks for userspace using x86's Control-flow -> Enforcement Technology (CET). CET consists of two related security features: -> shadow stacks and indirect branch tracking. This series implements just the -> shadow stack part of this feature, and just for userspace.
-> - -**[v1: Refactor do_fault_around()](http://lore.kernel.org/linux-mm/cover.1679089214.git.lstoakes@gmail.com/)** - -> Refactor do_fault_around() to avoid bitwise tricks and arather difficult to -> follow logic. Additionally, prefer fault_around_pages to -> fault_around_bytes as the operations are performed at a base page -> granularity. -> - -**[v1: Add results of early memtest to /proc/meminfo](http://lore.kernel.org/linux-mm/CAH2-hcJicFJ0h76JzY2DoLNF+4Nk7vGtk8gQv8JWFikt6X-wfA@mail.gmail.com/)** - -> Currently the memtest results were only presented in dmesg. -> This adds /proc/meminfo entry which can be easily used by scripts. -> - -**[v1: mm/page_alloc: Make deferred page init free pages in MAX_ORDER blocks](http://lore.kernel.org/linux-mm/20230317153501.19807-1-kirill.shutemov@linux.intel.com/)** - -> Normal page init path frees pages during the boot in MAX_ORDER chunks, -> but deferred page init path does it in pageblock blocks. -> -> Change deferred page init path to work in MAX_ORDER blocks. -> - -**[v12: mm,kfence: decouple kfence from page granularity mapping judgement](http://lore.kernel.org/linux-mm/1679066974-690-1-git-send-email-quic_zhenhuah@quicinc.com/)** - -> Kfence only needs its pool to be mapped as page granularity, if it is -> inited early. Previous judgement was a bit over protected. From [1], Mark -> suggested to "just map the KFENCE region a page granularity". So I -> decouple it from judgement and do page granularity mapping for kfence -> pool only. Need to be noticed that late init of kfence pool still requires -> page granularity mapping. 
-> - -**[v3: ACPI: APEI: handle synchronous exceptions with proper si_code](http://lore.kernel.org/linux-mm/20230317072443.3189-1-xueshuai@linux.alibaba.com/)** - -> changes since v2 by addressing comments from Naoya: -> - rename mce_task_work to sync_task_work -> - drop ACPI_HEST_NOTIFY_MCE case in is_hest_sync_notify() -> - add steps to reproduce this problem in cover letter -> - Link: https://lore.kernel.org/lkml/1aa0ca90-d44c-aa99-1e2d-bd2ae610b088@linux.alibaba.com/T/#mb3dede6b7a6d189dc8de3cf9310071e38a192f8e -> - -**[v1: kvm: mmu: move the added page that exists in current lru list to its tail](http://lore.kernel.org/linux-mm/20230317064920.12700-1-jiangjianwen@uniontech.com/)** - -> If the added page existing in current lru list, it's better to move that -> page to the end of that list. This modification can prolong the lifecycle -> of activated page and decrease I/O requirements while memory is limited. -> - -**[v1: kfence, kcsan: avoid passing -g for tests](http://lore.kernel.org/linux-mm/20230316155104.594662-1-elver@google.com/)** - -> This is because `-g` defaults to the compiler debug info default. If the -> assembler does not support some of the directives used, the above errors -> occur. To fix, remove the explicit passing of `-g`. -> -> All these tests want is that stack traces print valid function names, -> and debug info is not required for that. I currently cannot recall why I -> added the explicit `-g`. -> - -**[v1: splice, net: Replace sendpage with sendmsg(MSG_SPLICE_PAGES)](http://lore.kernel.org/linux-mm/20230316152618.711970-1-dhowells@redhat.com/)** - -> [NOTE! This patchset is a work in progress and some modules will not -> compile with it.] 
-> -> I've been looking at how to make pipes handle the splicing in of multipage -> folios and also looking to see if I could implement a suggestion from Willy -> that pipe_buffers could perhaps hold a list of pages (which could make -> splicing simpler - an entire splice segment would go in a single -> pipe_buffer). -> - -**[v11: mm,kfence: decouple kfence from page granularity mapping judgement](http://lore.kernel.org/linux-mm/1678979429-25815-1-git-send-email-quic_zhenhuah@quicinc.com/)** - -> Kfence only needs its pool to be mapped as page granularity, if it is -> inited early. Previous judgement was a bit over protected. From [1], Mark -> suggested to "just map the KFENCE region a page granularity". So I -> decouple it from judgement and do page granularity mapping for kfence -> pool only. Need to be noticed that late init of kfence pool still requires -> page granularity mapping. -> - -**[v10: mm,kfence: decouple kfence from page granularity mapping judgement](http://lore.kernel.org/linux-mm/1678969110-11941-1-git-send-email-quic_zhenhuah@quicinc.com/)** - -> Kfence only needs its pool to be mapped as page granularity, if it is -> inited early. Previous judgement was a bit over protected. From [1], Mark -> suggested to "just map the KFENCE region a page granularity". So I -> decouple it from judgement and do page granularity mapping for kfence -> pool only. Need to be noticed that late init of kfence pool still requires -> page granularity mapping. -> - -**[v1: Additional selftests for restrictedmem](http://lore.kernel.org/linux-mm/cover.1678926164.git.ackerleytng@google.com/)** - -> This is a series containing additional selftests for restrictedmem, -> prepared to be used with the next iteration of the restrictedmem -> series after v10. -> -> restrictedmem v10 is available at -> https://lore.kernel.org/lkml/20221202061347.1070246-1-chao.p.peng@linux.intel.com/T/. 
-> - -**[v1: mm/thp: Rename TRANSPARENT_HUGEPAGE_NEVER_DAX to _UNSUPPORTED](http://lore.kernel.org/linux-mm/20230315171642.1244625-1-peterx@redhat.com/)** - -> TRANSPARENT_HUGEPAGE_NEVER_DAX has nothing to do with DAX. It's set when -> has_transparent_hugepage() returns false, checked in hugepage_vma_check() -> and will disable THP completely if false. Rename it to reflect its real -> purpose. -> - -**[v19: splice, block: Use page pinning and kill ITER_PIPE](http://lore.kernel.org/linux-mm/20230315163549.295454-1-dhowells@redhat.com/)** - -> The first half of this patchset kills off ITER_PIPE to avoid a race between -> truncate, iov_iter_revert() on the pipe and an as-yet incomplete DMA to a -> bio with unpinned/unref'ed pages from an O_DIRECT splice read. This causes -> memory corruption[2]. Instead, we use filemap_splice_read(), which invokes -> the buffered file reading code and splices from the pagecache into the -> pipe; direct_splice_read(), which bulk-allocates a buffer, reads into it -> and then pushes the filled pages into the pipe; or handle it in -> filesystem-specific code. -> - -**[v1: splice: Convert longs and some ints into ssize_t](http://lore.kernel.org/linux-mm/295324.1678898094@warthog.procyon.org.uk/)** - -> Christoph Hellwig wrote: -> -> > The (pre-existing) long here is odd given that ->splice_read -> > returns a ssize_t. This might be a good time to fix that up. -> -> Here's a patch to do that. I'm not sure yet that I've got all the places that -> need changing as there are a couple of function pointer-taking functions where -> the pointed-to function return value should be changed. -> - -**[v1: Randomized slab caches for kmalloc()](http://lore.kernel.org/linux-mm/20230315095459.186113-1-gongruiqi1@huawei.com/)** - -> When exploiting memory vulnerabilities, "heap spraying" is a common -> technique targeting those related to dynamic memory allocation (i.e. the -> "heap"), and it plays an important role in a successful exploitation. 
-> Basically, it is to overwrite the memory area of vulnerable object by -> triggering allocation in other subsystems or modules and therefore -> getting a reference to the targeted memory location. It's usable on -> various types of vulnerablity including use after free (UAF), heap out- -> of-bound write and etc. -> - -**[v4: New page table range API](http://lore.kernel.org/linux-mm/20230315051444.3229621-1-willy@infradead.org/)** - -> This patchset changes the API used by the MM to set up page table entries. -> The four APIs are: -> set_ptes(mm, addr, ptep, pte, nr) -> update_mmu_cache_range(vma, addr, ptep, nr) -> flush_dcache_folio(folio) -> flush_icache_pages(vma, page, nr) -> -> flush_dcache_folio() isn't technically new, but no architecture -> implemented it, so I've done that for you. The old APIs remain around -> but are mostly implemented by calling the new interfaces. -> - -#### Security Hardening - -**[v1: next: drm/i915/uapi: Replace fake flex-array with flexible-array member](http://lore.kernel.org/linux-hardening/ZBSu2QsUJy31kjSE@work/)** - -> Zero-length arrays as fake flexible arrays are deprecated and we are -> moving towards adopting C99 flexible-array members instead. -> - -**[v1: next: wifi: carl9170: Replace fake flex-array with flexible-array member](http://lore.kernel.org/linux-hardening/ZBSl2M+aGIO1fnuG@work/)** - -> Zero-length arrays as fake flexible arrays are deprecated and we are -> moving towards adopting C99 flexible-array members instead. -> - -**[v1: next: uapi: target: Replace fake flex-array with flexible-array member](http://lore.kernel.org/linux-hardening/ZBSchMvTdl7VObKI@work/)** - -> Zero-length arrays as fake flexible arrays are deprecated and we are -> moving towards adopting C99 flexible-array members instead. -> -> This helps with the ongoing efforts to tighten the FORTIFY_SOURCE -> routines on memcpy() and help us make progress towards globally -> enabling -fstrict-flex-arrays=3 [1].
-> - -**[v1: mm/slub: reduce the calculation times of 'MAX_OBJS_PER_PAGE'](http://lore.kernel.org/linux-hardening/20230316012517.10479-1-gouhao@uniontech.com/)** - -> when calling calc_slab_order(), 'slub_min_order' -> and 'size' are fixed values, if the condition of -> 'MAX_OBJS_PER_PAGE' is true, it will be returned from -> here every time. -> -> So we can calculate the condition of 'MAX_OBJS_PER_PAGE' -> before calling calculate_order(). -> - -**[v5: x86_64: Improvements at compressed kernel stage](http://lore.kernel.org/linux-hardening/cover.1678785672.git.baskov@ispras.ru/)** - -> This patchset is aimed -> * to improve UEFI compatibility of compressed kernel code for x86_64 -> * to setup proper memory access attributes for code and rodata sections -> * to implement W^X protection policy throughout the whole execution -> of compressed kernel for EFISTUB code path. -> - -**[R: v1: Introduce per-interrupt kernel-stack randomization](http://lore.kernel.org/linux-hardening/414aee3992a54b6c933597bdbf9e0f71@intre.it/)** - -> > -----Original Message----- -> > From: Jere Viikari -> > To: Ornaghi Davide -> > ; keescook@chromium.org; -> > paulmck@kernel.org; nsaenzju@redhat.com; peterz@infradead.org; -> > bigeasy@linutronix.de; frederic@kernel.org; linux-hardening@vger.kernel.org; -> > linux-kernel@vger.kernel.org -> > -> > I am concerned about the disclaimer. When I replied, I had also to remove all -> > other information to ensure that I did not violate the terms. -> > -> - -**[R: v1: Introduce per-interrupt kernel-stack randomization](http://lore.kernel.org/linux-hardening/c2d598d5a11d4a29815a4eca63606159@intre.it/)** - -> Davide Ornaghi -> Offensive Security Specialist & Intrusion Analyst -> -> T. +39 039 28.45.774 +39 039 96.34.717 -> Intré Security - a venture of Intré S.r.l.
-> www.intre.it -> - -#### Asynchronous I/O - -**[v1: for-next: io_uring/kbuf: disallow mapping a badly aligned provided ring buffer](http://lore.kernel.org/io-uring/a0c3e328-badc-3f54-f7ff-b468a316a9d3@kernel.dk/)** - -> On at least parisc, we have strict requirements on how we virtually map -> an address that is shared between the application and the kernel. On -> these platforms, IOU_PBUF_RING_MMAP should be used when setting up a -> shared ring buffer for provided buffers. If the application is mapping -> these pages and asking the kernel to pin+map them as well, then we have -> no control over what virtual address we get in the kernel. -> - -**[[PATCH liburing for-next 0/2] fd msg-ring slot allocation tests](http://lore.kernel.org/io-uring/cover.1678968783.git.asml.silence@gmail.com/)** - -> Add a helper for fd msg-ring passing a file and auto allocating the -> target index, and test it. -> - -**[[v2 PATCH] io_uring: rsrc: Optimize return value variable 'ret'](http://lore.kernel.org/io-uring/20230317182538.3027-1-zeming@nfschina.com/)** - -> The initialization assignment of the variable ret is changed to 0, only -> in 'goto fail;' Use the ret variable as the function return value. -> - -**[v1: io_uring: rsrc: Optimize return value variable 'ret'](http://lore.kernel.org/io-uring/20230316181303.6583-1-zeming@nfschina.com/)** - -> The function returns here and returns ret directly. It may look better. -> - -**[v1: io_uring/sqpoll: Do not set PF_NO_SETAFFINITY on sqpoll threads](http://lore.kernel.org/io-uring/20230314183332.25834-1-mkoutny@suse.com/)** - -> Users may specify a CPU where the sqpoll thread would run. This may -> conflict with cpuset operations because of strict PF_NO_SETAFFINITY -> requirement. That flag is unnecessary for polling "kernel" threads, see -> the reasoning in commit 01e68ce08a30 ("io_uring/io-wq: stop setting -> PF_NO_SETAFFINITY on io-wq workers"). Drop the flag on poll threads too.
-> - -**[v3: io_uring/ublk: add IORING_OP_FUSED_CMD](http://lore.kernel.org/io-uring/20230314125727.1731233-1-ming.lei@redhat.com/)** - -> Add IORING_OP_FUSED_CMD, it is one special URING_CMD, which has to -> be SQE128. The 1st SQE(master) is one 64byte URING_CMD, and the 2nd -> 64byte SQE(slave) is another normal 64byte OP. For any OP which needs -> to support slave OP, io_issue_defs[op].fused_slave needs to be set as 1, -> and its ->issue() can retrieve/import buffer from master request's -> fused_cmd_kbuf. The slave OP is actually submitted from kernel, part of -> this idea is from Xiaoguang's ublk ebpf patchset, but this patchset -> submits slave OP just like normal OP issued from userspace, that said, -> SQE order is kept, and batching handling is done too. -> - -#### Rust For Linux - -**[v1: Rust version of the VGEM driver](http://lore.kernel.org/rust-for-linux/20230317121213.93991-1-mcanal@igalia.com/)** - -> This is my first take on using the DRM Rust abstractions [1] to convert a DRM -> driver, written originally in C, to Rust. This patchset consists of a conversion -> of the vgem driver to a DRM Rust driver. This new driver has the exactly same -> functionalities of the original C driver, but takes advantages of all the Rust -> features. -> - -**[v1: Rust pin-init API for pinned initialization of structs](http://lore.kernel.org/rust-for-linux/Bk4Yd1TBtgoLg2g_c37V3c_Wt30FMS89z7LrjnfadhDquwG_0dUGz1c_9BlMDmymg0tCACBpmCw-wZxlg4Jl4W2gkorh5P78ePgSnJVR5cU=@protonmail.com/)** - -> This series adds the pin-init API for initializing pinned structs in-place. -> It reduces the need for `unsafe` and streamlines initialization of structs. -> -> The first patch adds a utility macro `quote!` for proc-macros. This macro -> converts the typed characters directly into Rust tokens that are the output -> of proc-macros. It is used by the pin-init API. 
-> - -#### BPF - -**[v8: bpf-next: Transit between BPF TCP congestion controls.](http://lore.kernel.org/bpf/20230318053144.1180301-1-kuifeng@meta.com/)** - -> Major changes: -> -> - Create bpf_links in the kernel for BPF struct_ops to register and -> unregister it. -> -> - Enables switching between implementations of bpf-tcp-cc under a -> name instantly by replacing the backing struct_ops map of a -> bpf_link. -> - -**[v1: bpf-next: error checking where helpers call bpf_map_ops](http://lore.kernel.org/bpf/20230318011324.203830-1-inwardvessel@gmail.com/)** - -> Within bpf programs, the bpf helper functions can make inline calls to -> kernel functions. In this scenario there can be a disconnect between the -> register the kernel function writes a return value to and the register the -> bpf program uses to evaluate that return value. -> - -**[v1: bpf-next: BPF verifier rotating log](http://lore.kernel.org/bpf/20230317220351.2970665-1-andrii@kernel.org/)** - -> This patch set changes BPF verifier log behavior to behave as a rotating log, -> by default. If user-supplied log buffer is big enough to contain entire -> verifier log output, there is no effective difference. But where previously -> user supplied too small log buffer and would get -ENOSPC error result and the -> beginning part of the verifier log, now there will be no error and user will -> get ending part of verifier log filling up user-supplied log buffer. -> - -**[v2: bpf-next: bpf: Add detection of kfuncs.](http://lore.kernel.org/bpf/20230317201920.62030-1-alexei.starovoitov@gmail.com/)** - -> Allow BPF programs detect at load time whether particular kfunc exists. -> -> Patch 1: Allow ld_imm64 to point to kfunc in the kernel. -> Patch 2: Fix relocation of kfunc in ld_imm64 insn when kfunc is in kernel module. -> Patch 3: Introduce bpf_ksym_exists() macro. -> Patch 4: selftest. -> -> NOTE: detection of kfuncs from light skeleton is not supported yet. 
-> - -**[v2: bpf-next: selftests/bpf: add --json-summary option to test_progs](http://lore.kernel.org/bpf/20230317163256.3809328-1-chantr4@gmail.com/)** - -> Currently, test_progs outputs all stdout/stderr as it runs, and when it -> is done, prints a summary. -> -> It is non-trivial for tooling to parse that output and extract meaningful -> information from it. -> -> This change adds a new option, `--json-summary`/`-J` that let the caller -> specify a file where `test_progs{,-no_alu32}` can write a summary of the -> run in a json format that can later be parsed by tooling. -> - -**[v1: usermode_driver: Add management library and API](http://lore.kernel.org/bpf/20230317145240.363908-1-roberto.sassu@huaweicloud.com/)** - -> A User Mode Driver (UMD) is a specialization of a User Mode Helper (UMH), -> which runs a user space process from a binary blob, and creates a -> bidirectional pipe, so that the kernel can make a request to that process, -> and the latter provides its response. It is currently used by bpfilter, -> although it does not seem to do any useful work. -> - -**[v1: bpf-next: XDP-hints kfuncs for Intel driver igc](http://lore.kernel.org/bpf/167906343576.2706833.17489167761084071890.stgit@firesoul/)** - -> Implemented XDP-hints metadata kfuncs for Intel driver igc. -> -> Primarily used the tool in tools/testing/selftests/bpf/ xdp_hw_metadata, -> when doing driver development of these features. Recommend other driver -> developers to do the same. In the process xdp_hw_metadata was updated to -> help assist development. I've documented my practical experience with igc -> and tool here[1]. 
-> -> [1] https://github.com/xdp-project/xdp-project/blob/master/areas/hints/xdp_hints_kfuncs02_driver_igc.org -> - -**[v1: bpf-next: selftests/bpf: Filter out preempt_count_ functions from kprobe_multi bench](http://lore.kernel.org/bpf/20230317114832.13622-1-laoar.shao@gmail.com/)** - -> It's caused by bench test attaching kprobe_multi link to preempt_count_sub -> function, which is not executed in rcu safe context so the kprobe handler -> on top of it will trigger the rcu warning. -> -> Filtering out preempt_count_ functions from the bench test. -> - -**[v2: net: xdp: don't call notifiers during driver init](http://lore.kernel.org/bpf/20230316220234.598091-1-kuba@kernel.org/)** - -> Drivers will commonly perform feature setting during init, if they use -> the xdp_set_features_flag() helper they'll likely run into an ASSERT_RTNL() -> inside call_netdevice_notifiers_info(). -> - -**[v3: bpf-next: xdp: recycle Page Pool backed skbs built from XDP frames](http://lore.kernel.org/bpf/20230313214300.1043280-1-aleksander.lobakin@intel.com/)** - -> Yeah, I still remember that "Who needs cpumap nowadays" (c), but anyway. -> -> __xdp_build_skb_from_frame() missed the moment when the networking stack -> became able to recycle skb pages backed by a page_pool. This was making -> e.g. cpumap redirect even less effective than simple %XDP_PASS. veth was -> also affected in some scenarios. -> A lot of drivers use skb_mark_for_recycle() already, it's been almost -> two years and seems like there are no issues in using it in the generic -> code too. {__,}xdp_release_frame() can be then removed as it losts its -> last user. -> Page Pool becomes then zero-alloc (or almost) in the abovementioned -> cases, too. Other memory type models (who needs them at this point) -> have no changes. -> - -**[v2: bpf-next: Make struct bpf_cpumask RCU safe](http://lore.kernel.org/bpf/20230316054028.88924-1-void@manifault.com/)** - -> The struct bpf_cpumask type is currently not RCU safe. 
It uses the -> bpf_mem_cache_{alloc,free}() APIs to allocate and release cpumasks, and -> those allocations may be reused before an RCU grace period has elapsed. -> - -**[v1: module/decompress: Never use kunmap() for local un-mappings](http://lore.kernel.org/bpf/20230315125256.22772-1-fmdefrancesco@gmail.com/)** - -> Use kunmap_local() to unmap pages locally mapped with kmap_local_page(). -> -> kunmap_local() must be called on the kernel virtual address returned by -> kmap_local_page(), differently from how we use kunmap() which instead -> expects the mapped page as its argument. -> -> In module_zstd_decompress() we currently map with kmap_local_page() and -> unmap with kunmap(). This breaks the code and so it should be fixed. -> - -**[v4: net-next: add some detailed data when reading softnet_stat](http://lore.kernel.org/bpf/20230315092041.35482-1-kerneljasonxing@gmail.com/)** - -> Adding more detailed display of softnet_data when cating -> /proc/net/softnet_stat, which could help users understand more about -> which can be the bottlneck and then tune. -> -> Based on what we've dicussed in the previous mails, we could implement it -> in different ways, like put those display into separate sysfs file or add -> some tracepoints. Still I chose to touch the legacy file to print more -> useful data without changing some old data, say, length of backlog queues -> and time_squeeze. -> - -**[v1: tools/resolve_btfids: Add libsubcmd to .gitignore](http://lore.kernel.org/bpf/20230315054932.1639169-1-gthelen@google.com/)** - -> After building the kernel I see: -> $ git status -s -> ?? tools/bpf/resolve_btfids/libbpf/ -> -> Commit af03299d8536 ("tools/resolve_btfids: Install subcmd headers") -> started copying header files into -> tools/bpf/resolve_btfids/libsubcmd/include/subcmd. These *.h files are -> not covered by higher level wildcard gitignores. 
-> - -**[v1: net-next: virtio_net: refactor xdp codes](http://lore.kernel.org/bpf/20230315041042.88138-1-xuanzhuo@linux.alibaba.com/)** - -> Due to historical reasons, the implementation of XDP in virtio-net is relatively -> chaotic. For example, the processing of XDP actions has two copies of similar -> code. Such as page, xdp_page processing, etc. -> -> The purpose of this patch set is to refactor these code. Reduce the difficulty -> of subsequent maintenance. Subsequent developers will not introduce new bugs -> because of some complex logical relationships. -> - -**[v2: dwarves: Support for new btf_type_tag encoding](http://lore.kernel.org/bpf/20230314230417.1507266-1-eddyz87@gmail.com/)** - -> In recent discussion in BPF mailing list ([1], look for Solution #2) -> participants agreed to add a new DWARF representation for -> "btf_type_tag" annotations. -> -> Existing representation is DW_TAG_LLVM_annotation object attached as a -> child to a DW_TAG_pointer_type. It means that "btf_type_tag" -> annotation is attached to a pointee type. -> - -**[v1: bpf/for-next: cgroup: Make current_cgns_cgroup_dfl() safe to call after exit_task_namespace()](http://lore.kernel.org/bpf/ZBDuVWiFj2jiz3i8@slm.duckdns.org/)** - -> 332ea1f697be ("bpf: Add bpf_cgroup_from_id() kfunc") added -> bpf_cgroup_from_id() which calls current_cgns_cgroup_dfl() through -> cgroup_get_from_id(). However, BPF programs may be attached to a point where -> current->nsproxy has already been cleared to NULL by exit_task_namespace() -> and calling bpf_cgroup_from_id() would cause an oops. 
-> - -**[v1: bpf-next: bpf: Allow helpers access ptr_to_btf_id.](http://lore.kernel.org/bpf/20230313235845.61029-1-alexei.starovoitov@gmail.com/)** - -> Allow code like: -> bpf_strncmp(task->comm, 16, "foo"); -> - -### Related Technology News - -#### Qemu - -**[v3: for-8.1: target/riscv: rework CPU extensions validation](http://lore.kernel.org/qemu-devel/20230318200436.299464-1-dbarboza@ventanamicro.com/)** - -> This new version contains changes suggested by Weiwei Li. I've also -> reworked write_misa() to cover more cases. write_misa() is now able to -> properly enable RVG, RVV and RVE. -> -> A more in-depth description of what was attempted here can be found in -> [1]. Note that the current validation flow already prevents certain misa -> bits from being disabled (e.g. RVF) due to the presence of Z extensions -> that are already enabled in the hart, so I decided not to add extra -> logic to handle these cases. -> - -**[v1: disas/riscv: Add support for XThead* instructions](http://lore.kernel.org/qemu-devel/20230315133510.3511784-1-christoph.muellner@vrull.eu/)** - -> Support for emulating XThead* instruction has been added recently. -> This patch adds support for these instructions to the RISC-V disassembler.
-> - -**[v1: riscv-to-apply queue](http://lore.kernel.org/qemu-devel/20230314063812.30450-1-alistair.francis@opensource.wdc.com/)** - -> The following changes since commit 284c52eec2d0a1b9c47f06c3eee46762c5fc0915: -> -> Merge tag 'win-socket-pull-request' of https://gitlab.com/marcandre.lureau/qemu into staging (2023-03-13 13:44:17 +0000) -> -> are available in the Git repository at: -> -> https://github.com/alistair23/qemu.git tags/pull-riscv-to-apply-20230314 -> -> for you to fetch changes up to 0d581506de803204c5a321100afa270573382932: -> - -#### Buildroot - -**[package/stress-ng: bump to version V0.15.04](http://lore.kernel.org/buildroot/20230312214625.C78EA87007@busybox.osuosl.org/)** - -> commit: https://git.buildroot.net/buildroot/commit/?id=00553ea186357fd3e2b3c89fa560e9711cc67472 -> branch: https://git.buildroot.net/buildroot/commit/?id=refs/heads/master -> -> This commit dropped the patch, included upstream in: -> https://github.com/ColinIanKing/stress-ng/commit/5d419c790e648c7a2f96f34ed1b93b326f725545 -> which was included in V0.14.04. -> -> Three patches are also introduced to fix build issues (all -> upstream not but not yet in version). -> -> Also, this new version now depends on BR2_TOOLCHAIN_HAS_SYNC_4. -> - -**[[branch/next] package/stress-ng: bump to version V0.15.04](http://lore.kernel.org/buildroot/20230312213225.AD3AF86FEC@busybox.osuosl.org/)** - -> commit: https://git.buildroot.net/buildroot/commit/?id=00553ea186357fd3e2b3c89fa560e9711cc67472 -> branch: https://git.buildroot.net/buildroot/commit/?id=refs/heads/next -> -> This commit dropped the patch, included upstream in: -> https://github.com/ColinIanKing/stress-ng/commit/5d419c790e648c7a2f96f34ed1b93b326f725545 -> which was included in V0.14.04. -> -> Three patches are also introduced to fix build issues (all -> upstream not but not yet in version). 
-> - -#### U-Boot - -**[v4: Basic StarFive JH7110 RISC-V SoC support](http://lore.kernel.org/u-boot/20230316025332.3297-1-yanhong.wang@starfivetech.com/)** - -> This series of patches base on the latest branch/master, and add support -> for the StarFive JH7110 RISC-V SoC and VisionFive V2 board. In order for -> this to be achieved, the respective DT nodes have been added, and the -> required defconfigs have been added to the boards' defconfig. What is more, -> the basic required DM drivers have been added, such as reset, clock, pinctrl, -> uart, ram etc. -> - -**["bootelf -p" loads every segement without checking its type](http://lore.kernel.org/u-boot/MA0P287MB0617180B862434E07B83A326B2BE9@MA0P287MB0617.INDP287.PROD.OUTLOOK.COM/)** - -> I am making a toy OS kernel on RISC-V platform. The kernel image I built is a ELF file that should be booted using U-Boot's bootelf command. However, I was getting such error when doing bootelf -p : -> -> Unhandled exception: Store/AMO access fault -> -> My kernel file is a 64-bit ELF file with the structure of: (the full readelf report is attached in the email) -> - -## 20230312: Issue 37 - -### Kernel News - -#### RISC-V Architecture Support - -**[v1: perf tools riscv: Add support for riscv lookup_binutils_path](http://lore.kernel.org/linux-riscv/20230311112122.28894-1-p4ranlee@gmail.com/)** - -> Add to know RISC-V binutils path. -> Secondarily, edit the code block with alphabetical order. -> - -**[v5: Basic clock, reset & device tree support for StarFive JH7110 RISC-V SoC](http://lore.kernel.org/linux-riscv/20230311090733.56918-1-hal.feng@starfivetech.com/)** - -> This patch series adds basic clock, reset & DT support for StarFive -> JH7110 SoC. -> -> You can simply review or test the patches at the link [1].
-> -> [1]: https://github.com/hal-feng/linux/commits/visionfive2-minimal -> - -**[v6: Add support for stacked/parallel memories](http://lore.kernel.org/linux-riscv/20230310173217.3429788-1-amit.kumar-mahapatra@amd.com/)** - -> This patch is in the continuation to the discussions which happened on -> 'commit f89504300e94 ("spi: Stacked/parallel memories bindings")' for -> adding dt-binding support for stacked/parallel memories. -> -> This patch series updated the spi-nor, spi core and the spi drivers -> to add stacked and parallel memories support. -> - -**[v3: vdso: Improve cmd_vdso_check to check all dynamic relocations](http://lore.kernel.org/linux-riscv/20230310190750.3323802-1-maskray@google.com/)** - -> The actual intention is that no dynamic relocation exists. However, some -> GNU ld ports produce unneeded R_*_NONE. (If a port fails to determine -> the exact .rel[a].dyn size, the trailing zeros become R_*_NONE -> relocations. E.g. ld's powerpc port recently fixed -> https://sourceware.org/bugzilla/show_bug.cgi?id=29540) R_*_NONE are -> generally no-op in the dynamic loaders. So just ignore them. -> - -**[v1: riscv: relocate R_RISCV_CALL_PLT in kexec_file](http://lore.kernel.org/linux-riscv/20230310182726.GA25154@lst.de/)** - -> Depending on the toolchain (here: gcc-12, binutils-2.40) the -> relocation entries for function calls are no longer R_RISCV_CALL, but -> R_RISCV_CALL_PLT. When trying kexec_load_file on such kernels, it will -> fail with -> - -**[v1: riscv: Kconfig: enable SCHED_MC kconfig](http://lore.kernel.org/linux-riscv/20230310110336.970985-1-suagrfillet@gmail.com/)** - -> RISC-V now builds the sched domain based on the simple possible map. -> -> Enable SCHED_MC to make the building based on cpu_coregroup_mask() -> which also takes care of the NUMA and cores with LLC. 
-> - -**[v7: riscv: Use PUD/P4D/PGD pages for the linear mapping](http://lore.kernel.org/linux-riscv/20230310094539.764357-1-alexghiti@rivosinc.com/)** - -> This patchset intends to improve tlb utilization by using hugepages for -> the linear mapping. -> -> As reported by Anup in v6, when STRICT_KERNEL_RWX is enabled, we must -> take care of isolating the kernel text and rodata so that they are not -> mapped with a PUD mapping which would then assign wrong permissions to -> the whole region: it is achieved by introducing a new memblock API. -> - -**[v2: RISC-V: mm: Support huge page in vmalloc_fault()](http://lore.kernel.org/linux-riscv/20230310075021.3919290-1-dylan@andestech.com/)** - -> Since RISC-V supports ioremap() with huge page (pud/pmd) mapping, -> However, vmalloc_fault() assumes that the vmalloc range is limited -> to pte mappings. To complete the vmalloc_fault() function by adding -> huge page support. -> - -**[v1: Convert users of SOC_MICROCHIP_POLARFIRE to ARCH_MICROCHIP_POLARFIRE](http://lore.kernel.org/linux-riscv/20230309204452.969574-1-conor@kernel.org/)** - -> RISC-V's SOC_FOO symbols for micro-archs are going away, and being -> replaced with the more common ARCH_FOO pattern that is used by other -> archs (and by vendors with a history outside of RISC-V). -> Kick the conversion off by converting the Microchip RISC-V bits to use -> their replacement symbol -> There are no dependencies here, everything can go via subsystem trees. -> We've already added the replacement symbols to RISC-V's Kconfig bits. 
-> - -**[v1: riscv: Use READ_ONCE_NOCHECK in imprecise unwinding stack mode](http://lore.kernel.org/linux-riscv/20230308091639.602024-1-alexghiti@rivosinc.com/)** - -> When CONFIG_FRAME_POINTER is unset, the stack unwinding function -> walk_stackframe randomly reads the stack and then, when KASAN is enabled, -> - -**[v2: Add JH7110 USB driver support](http://lore.kernel.org/linux-riscv/20230308082800.3008-1-minda.chen@starfivetech.com/)** - -> This patchset adds USB driver for the StarFive JH7110 SoC. -> USB work mode is peripheral and using USB 2.0 PHY in VisionFive 2 board. -> The patch has been tested on the VisionFive 2 board. -> - -**[v5: RISC-V Hibernation Support](http://lore.kernel.org/linux-riscv/20230308080612.122398-1-jeeheng.sia@starfivetech.com/)** - -> This series adds RISC-V Hibernation/suspend to disk support. -> Low level Arch functions were created to support hibernation. -> swsusp_arch_suspend() relies code from __cpu_suspend_enter() to write -> cpu state onto the stack, then calling swsusp_save() to save the memory -> image. -> - -**[v14: riscv, mm: detect svnapot cpu support at runtime](http://lore.kernel.org/linux-riscv/20230308074853.4393-1-panqinglin00@gmail.com/)** - -> Svnapot is a RISC-V extension for marking contiguous 4K pages as a non-4K -> page. This patch set is for using Svnapot in hugetlb fs and huge vmap. -> -> This patchset adds a Kconfig item for using Svnapot in -> "Platform type"->"SVNAPOT extension support". Its default value is on, -> and people can set it off if they don't allow kernel to detect Svnapot -> hardware support and leverage it. -> - -**[v1: Revert "riscv: Set more data to cacheinfo"](http://lore.kernel.org/linux-riscv/20230308064734.512457-1-suagrfillet@gmail.com/)** - -> There are some duplicate cache attributes populations executed -> in both ci_leaf_init() and later cache_setup_properties(). 
-> -> Revert the commit baf7cbd94b56 ("riscv: Set more data to cacheinfo") -> to setup only the level and type attributes at this early place. -> - -**[v4: irqchip/irq-sifive-plic: Add syscore callbacks for hibernation](http://lore.kernel.org/linux-riscv/20230308064643.24805-1-mason.huo@starfivetech.com/)** - -> The priority and enable registers of plic will be reset -> during hibernation power cycle in poweroff mode, -> add the syscore callbacks to save/restore those registers. -> - -**[v4: Add watchdog driver for StarFive JH7100/JH7110 RISC-V SoCs](http://lore.kernel.org/linux-riscv/20230308034036.99213-1-xingyu.wu@starfivetech.com/)** - -> This patch serises are to add watchdog driver for the StarFive -> JH7100 and JH7110 RISC-V SoCs. The first patch adds docunmentation to -> describe device tree bindings. The subsequent patch adds watchdog driver -> and support JH7100/JH7110 SoCs. And the last patch adds watchdog node in -> the JH7100 dts. And the addition of JH7110 device tree node will be -> submitted after the JH7110 dts merge. This patchset is based on 6.3-rc1. -> - -**[mailbox,soc: mpfs: add support for fallible services (was v3: Hey Jassi, all,)](http://lore.kernel.org/linux-riscv/d7c3ec51-8493-444a-bdec-2a30b0a15bdc@spud/)** - -> On Tue, Mar 07, 2023 at 08:22:50PM +0000, Conor Dooley wrote: -> > -> -> I botched $subject, I blame copy pasting the branch-description from -> lore and not double checking the output of --cover-from-description=auto -> - -**[v17: RISC-V IPI Improvements](http://lore.kernel.org/linux-riscv/20230307173231.2189275-1-apatel@ventanamicro.com/)** - -> This series aims to improve IPI support in Linux RISC-V in following ways: -> 1) Treat IPIs as normal per-CPU interrupts instead of having custom RISC-V -> specific hooks. This also makes Linux RISC-V IPI support aligned with -> other architectures. 
-> 2) Remote TLB flushes and icache flushes should prefer local IPIs instead -> of SBI calls whenever we have specialized hardware (such as RISC-V AIA -> IMSIC and RISC-V SWI) which allows S-mode software to directly inject -> IPIs without any assistance from M-mode runtime firmware. -> - -**[v5: Generic IPI sending tracepoint](http://lore.kernel.org/linux-riscv/20230307143558.294354-1-vschneid@redhat.com/)** - -> Detecting IPI *reception* is relatively easy, e.g. using -> trace_irq_handler_{entry,exit} or even just function-trace -> flush_smp_call_function_queue() for SMP calls. -> - -**[v1: RISC-V: enable rust](http://lore.kernel.org/linux-riscv/20230307102441.94417-1-conor.dooley@microchip.com/)** - -> After the authorship debacle on the RFC, I've tried to be even more -> careful this time around. Gary opted for a Co-developed-by in the replies -> of the RFC stuff, so I have given them one. -> I have added SoB's too, but if that is not okay Gary, then please scream -> loudly. -> -> As this is lifted from the state of the Rust-for-Linux tree, the commit -> messages from there cannot be preserved, so these patches have commit -> messages that I wrote. -> - -**[v5: StarFive's SDIO/eMMC driver support](http://lore.kernel.org/linux-riscv/20230307024646.10216-1-william.qiu@starfivetech.com/)** - -> This patchset adds initial rudimentary support for the StarFive -> designware mobile storage host controller driver. And this driver will -> be used in StarFive's VisionFive 2 board. The main purpose of adding -> this driver is to accommodate the ultra-high speed mode of eMMC. -> - -**[v1: RISC-V: Add basic support for the vector extension](http://lore.kernel.org/linux-riscv/20230306222321.1992900-1-conor@kernel.org/)** - -> I've started hitting this in CI while testing Andy's vector enablement -> series. I'm not entirely sure if there is more to do here, other than -> squeezing in the duplicate of what has been done for other extensions. 
-> - -**[v2: KVM: Refactor KVM stats macros and enable custom stat names](http://lore.kernel.org/linux-riscv/20230306190156.434452-1-dmatlack@google.com/)** - -> This series refactors the KVM stats macros to reduce duplication and -> adds the support for choosing custom names for stats. -> -> Custom name makes it possible to decouple the userspace-visible stat -> names from their internal representation in C. This can allow future -> commits to refactor the various stats structs without impacting -> userspace tools that read KVM stats. -> - -**[v5: spi: Add support for stacked/parallel memories](http://lore.kernel.org/linux-riscv/20230306172109.595464-1-amit.kumar-mahapatra@amd.com/)** - -> This patch is in the continuation to the discussions which happened on -> 'commit f89504300e94 ("spi: Stacked/parallel memories bindings")' for -> adding dt-binding support for stacked/parallel memories. -> -> This patch series updated the spi-nor, spi core and the spi drivers -> to add stacked and parallel memories support. -> - -**[v4: Add DMA driver for StarFive JH7110 SoC](http://lore.kernel.org/linux-riscv/20230306140430.28951-1-walker.chen@starfivetech.com/)** - -> This patch series adds dma support for the StarFive JH7110 RISC-V -> SoC. The first patch adds device tree binding. The second patch includes -> dma driver. The last patch adds device node of dma to JH7110 dts. -> -> The series has been tested on the VisionFive 2 board which equip with -> JH7110 SoC and works normally. -> - -**[v14: Microchip Soft IP corePWM driver](http://lore.kernel.org/linux-riscv/20230306094858.1614819-1-conor.dooley@microchip.com/)** - -> v14 is rebased on top of v6.3-rc1. -> -> Uwe & I had a long back and forth about period calculations on v13, -> my ultimate conclusion being that, after some testing of the "corrected" -> calculation in hardware, the original calculation was correct. 
-> I think we had gotten sucked into discussion the calculation of the -> period itself, when we were in fact trying to calculate a bound on the -> period instead. That discussion is here: -> https://lore.kernel.org/linux-pwm/Y+ow8tfAHo1yv1XL@wendy/ -> - -#### Process Scheduling - -**[v1: sched: EEVDF using latency-nice](http://lore.kernel.org/lkml/20230306132521.968182689@infradead.org/)** - -> Ever since looking at the latency-nice patches, I've wondered if EEVDF would -> not make more sense, and I did point Vincent at some older patches I had for -> that (which is here his augmented rbtree thing comes from). -> -> Also, since I really dislike the dual tree, I also figured we could dynamically -> switch between an augmented tree and not (and while I have code for that, -> that's not included in this posting because with the current results I don't -> think we actually need this). -> - -**[v2: sched/fair: sanitize vruntime of entity being migrated](http://lore.kernel.org/lkml/20230306132418.50389-1-zhangqiao22@huawei.com/)** - -> Commit 829c1651e9c4 ("sched/fair: sanitize vruntime of -> entity being placed") fix an overflowing bug, but ignore -> a case that se->exec_start is reset after a migration. -> - -**[v1: sched: push force idled core_pick task to another cpu](http://lore.kernel.org/lkml/1678106502-58189-1-git-send-email-CruzZhao@linux.alibaba.com/)** - -> When a task with the max priority of its rq is force -> idled because of unmatched cookie, we'd better to find -> a suitable cpu for it to run as soon as possible, which -> is idle and cookie matched. In order to achieve this -> goal, we push the task in sched_core_balance(), after -> steal_cookie_task(). -> - -#### Memory Management - -**[v4: mm: introduce Designated Movable Blocks](http://lore.kernel.org/linux-mm/20230311003855.645684-1-opendmb@gmail.com/)** - -> This is essentially a resubmission of v3 rebased with a -> rewritten cover letter to hopefully clarify the submission based -> on feedback and follow-on discussion.
The individual patches -> have not materially changed. -> - -**[v3: use canonical ftrace path whenever possible](http://lore.kernel.org/linux-mm/20230310192050.4096886-1-zwisler@kernel.org/)** - -> v2 here: -> https://lore.kernel.org/linux-trace-kernel/20230215223350.2658616-1-zwisler@google.com/ -> - -**[v4: mm: process/cgroup ksm support](http://lore.kernel.org/linux-mm/20230310182851.2579138-1-shr@devkernel.io/)** - -> So far KSM can only be enabled by calling madvise for memory regions. To -> be able to use KSM for more workloads, KSM needs to have the ability to be -> enabled / disabled at the process / cgroup level. -> - -**[v1: Using MAP_SHARE_VALIDATE in mmap without fd](http://lore.kernel.org/linux-mm/20230310171617.wqnqs42l2viwjsz5@archlinux/)** - -> I have a rather simple question about the MAP_SHARED_VALIDATE flag in mmap. -> When used without a file pointer, EINVAL is returned. Is there a reason for this? -> I researched a bit but could not find anything. I attached a simple patch that adds MAP_SHARE_VALIDATE to the flags switch and checks for invalid flags. -> - -**[v1: io-mapping: Don't disable preempt on RT in io_mapping_map_atomic_wc().](http://lore.kernel.org/linux-mm/20230310162905.O57Pj7hh@linutronix.de/)** - -> io_mapping_map_atomic_wc() disables preemption and pagefaults for historical -> reasons. The conversion to io_mapping_map_local_wc(), which only disables -> migration, cannot be done wholesale because quite some call sites need to be -> updated to accommodate with the changed semantics. -> - -**[v1: mm: memory-failure: correct HWPOISON_INJECT config](http://lore.kernel.org/linux-mm/20230310133843.76883-1-wangkefeng.wang@huawei.com/)** - -> Use IS_ENABLED(CONFIG_HWPOISON_INJECT) to check whether or not to -> enable HWPoison injector module. 
-> - -**[v4: mm,kfence: decouple kfence from page granularity mapping judgement](http://lore.kernel.org/linux-mm/1678440604-796-1-git-send-email-quic_zhenhuah@quicinc.com/)** - -> Kfence only needs its pool to be mapped as page granularity, previous -> judgement was a bit over protected. Decouple it from judgement and do -> page granularity mapping for kfence pool only [1]. -> -> To implement this, also relocate the kfence pool allocation before the -> linear mapping setting up, arm64_kfence_alloc_pool is to allocate phys -> addr, __kfence_pool is to be set after linear mapping set up. -> -> LINK: [1] https://lore.kernel.org/linux-arm-kernel/1675750519-1064-1-git-send-email-quic_zhenhuah@quicinc.com/T/ -> - -**[v2: Ignore non-LRU-based reclaim in memcg reclaim](http://lore.kernel.org/linux-mm/20230309093109.3039327-1-yosryahmed@google.com/)** - -> Upon running some proactive reclaim tests using memory.reclaim, we -> noticed some tests flaking where writing to memory.reclaim would be -> successful even though we did not reclaim the requested amount fully. -> Looking further into it, I discovered that *sometimes* we over-report -> the number of reclaimed pages in memcg reclaim. -> - -**[v17: splice, block: Use page pinning and kill ITER_PIPE](http://lore.kernel.org/linux-mm/20230308165251.2078898-1-dhowells@redhat.com/)** - -> The first half of this patchset kills off ITER_PIPE to avoid a race between -> truncate, iov_iter_revert() on the pipe and an as-yet incomplete DMA to a -> bio with unpinned/unref'ed pages from an O_DIRECT splice read. -> - -**[v16: splice, block: Use page pinning and kill ITER_PIPE](http://lore.kernel.org/linux-mm/20230308143754.1976726-1-dhowells@redhat.com/)** - -> The first half of this patchset kills off ITER_PIPE to avoid a race between -> truncate, iov_iter_revert() on the pipe and an as-yet incomplete DMA to a -> bio with unpinned/unref'ed pages from an O_DIRECT splice read. This causes -> memory corruption[2]. 
Instead, we use filemap_splice_read(), which invokes -> the buffered file reading code and splices from the pagecache into the -> pipe; direct_splice_read(), which bulk-allocates a buffer, reads into it -> and then pushes the filled pages into the pipe; or handle it in -> filesystem-specific code. -> - -**[v1: Prototype for direct map awareness in page allocator](http://lore.kernel.org/linux-mm/20230308094106.227365-1-rppt@kernel.org/)** - -> This is a third attempt to make page allocator aware of the direct map -> layout and allow grouping of the pages that must be unmapped from -> the direct map. -> - -**[v3: mm/damon/paddr: minor code improvement](http://lore.kernel.org/linux-mm/20230308083311.120951-1-wangkefeng.wang@huawei.com/)** - -> Unify folio_put() to make code more clear, and also fix minor issue in -> damon_pa_young(). -> - -**[v11: cachestat: a new syscall for page cache state of files](http://lore.kernel.org/linux-mm/20230308032748.609510-1-nphamcs@gmail.com/)** - -> There is currently no good way to query the page cache state of large -> file sets and directory trees. There is mincore(), but it scales poorly: -> the kernel writes out a lot of bitmap data that userspace has to -> aggregate, when the user really doesn not care about per-page information -> in that case. The user also needs to mmap and unmap each file as it goes -> along, which can be quite slow as well. -> - -**[v1: mm/slub: Reduce memory consumption in extreme scenarios](http://lore.kernel.org/linux-mm/20230307082811.120774-1-chenjun102@huawei.com/)** - -> If call kmalloc_node with NO __GFP_THISNODE and node[A] with no memory. -> Slub will alloc a slub page which is not belong to A, and put the page -> to kmem_cache_node[page_to_nid(page)]. The page can not be reused -> at next calling, because NULL will be get from get_partical(). -> That make kmalloc_node consume more memory. 
-> - -**[v1: mm/oom_kill: don't kill exiting tasks in oom_kill_memcg_member](http://lore.kernel.org/linux-mm/20230307074808.235649-1-haifeng.xu@shopee.com/)** - -> If oom_group is set, oom_kill_process() invokes oom_kill_memcg_member() -> to kill all processes in the memcg. When scanning tasks in memcg, maybe -> the provided task is marked as oom victim. Also, some tasks are likely -> to release their address space. There is no need to kill the exiting tasks. -> -> In order to handle these tasks which may free memory in the future, add -> a function helper reap_task_will_free_mem() to mark it oom victim and -> queue it in oom reaper. -> - -**[v1: mm: rmap: merge HugeTLB mapcount logic with THPs](http://lore.kernel.org/linux-mm/20230306230004.1387007-1-jthoughton@google.com/)** - -> HugeTLB pages may soon support being mapped with PTEs. To allow for this -> case, merge HugeTLB's mapcount scheme with THP's. -> -> The first patch of this series comes from the HugeTLB high-granularity -> mapping series[1], though with some updates, as the original version -> was buggy[2] and incomplete. -> - -#### File Systems - -**[v3: sunrpc: simplfy sysctl registrations](http://lore.kernel.org/linux-fsdevel/20230311233944.354858-1-mcgrof@kernel.org/)** - -> This is my v3 series to simplify sysctl registration for sunrpc. The -> first series was posted just yesterday [0] but 0-day found an issue with -> CONFIG_SUNRPC_DEBUG. After this fix I poasted a fix for v2 [1] but alas -> 0-day then found an issue when CONFIG_SUNRPC_DEBUG is disabled. This -> fixes both cases... hopefully that's it. -> - -**[v2: mm: hugetlb: move hugeltb sysctls to its own file](http://lore.kernel.org/linux-fsdevel/20230311074734.123269-1-wangkefeng.wang@huawei.com/)** - -> This moves all hugetlb sysctls to its own file, also kill an -> useless hugetlb_treat_movable_handler() since commit d6cb41cc44c6 -> ("mm, hugetlb: remove hugepages_treat_as_movable sysctl").
->
-
-**[v1: s390: simplify sysctl registration](http://lore.kernel.org/linux-fsdevel/20230310234525.3986352-1-mcgrof@kernel.org/)**
-
-> s390 is the last architecture and one of the last users of
-> register_sysctl_table(). It was last because it had one use case
-> with dynamic memory allocation and it just required a bit more
-> thought.
->
-
-**[v1: arm: simplify two-level sysctl registration for ctl_isa_vars](http://lore.kernel.org/linux-fsdevel/20230310233521.3971907-1-mcgrof@kernel.org/)**
-
-> There is no need to declare two tables just to create directories;
-> this can easily be done with a prefix path with register_sysctl().
->
-> Simplify this registration.
->
-
-**[v1: x86: simplify sysctl registrations](http://lore.kernel.org/linux-fsdevel/20230310233248.3965389-1-mcgrof@kernel.org/)**
-
-> These are trivial conversions to reduce more code and avoid API calls
-> that we are deprecating [0].
->
-> [0] https://lore.kernel.org/all/20230310223947.3917711-1-mcgrof@kernel.org/T/#u
->
-
-**[v1: ppc: simplify sysctl registration](http://lore.kernel.org/linux-fsdevel/20230310232850.3960676-1-mcgrof@kernel.org/)**
-
-> We can simplify the way we do sysctl registration both by
-> reducing the number of lines and also avoiding callers which
-> could do recursion. The docs are being updated to help reflect
-> this better [0].
->
-> [0] https://lore.kernel.org/all/20230310223947.3917711-1-mcgrof@kernel.org/T/#u
->
-
-**[v1: ia64: simplify one-level sysctl registration for kdump_ctl_table](http://lore.kernel.org/linux-fsdevel/20230310232416.3958751-1-mcgrof@kernel.org/)**
-
-> There is no need to declare an extra table just to create a directory;
-> this can easily be done with a prefix path with register_sysctl().
->
-> Simplify this registration.
->
-
-**[v1: misc filesystems: simplify sysctl registration](http://lore.kernel.org/linux-fsdevel/20230310231206.3952808-1-mcgrof@kernel.org/)**
-
-> This simplifies sysctl registration for a few misc filesystems according
-> to our latest preference / guidance [0]. register_sysctl_table() incurs
-> possible recursion and we can avoid that by dealing with flat
-> directories with files in them, and having the subdirectories explicitly
-> named with register_sysctl().
->
-
-**[v1: xfs: simplify two-level sysctl registration for xfs_table](http://lore.kernel.org/linux-fsdevel/20230310230219.3948819-1-mcgrof@kernel.org/)**
-
-> There is no need to declare two tables just to create directories;
-> this can easily be done with a prefix path with register_sysctl().
->
-> Simplify this registration.
->
-
-**[v1: proc_sysctl: enhance documentation](http://lore.kernel.org/linux-fsdevel/20230310223947.3917711-1-mcgrof@kernel.org/)**
-
-> Expand documentation to clarify:
->
-> o that paths don't need to exist for the new API callers
-> o that we *require* callers to keep the memory of
-> the table around during the lifetime of the sysctls
-> o annotate routines we are trying to deprecate and later remove
->
-
-**[git pull: common helper for kmap_local_page() users in local filesystems](http://lore.kernel.org/linux-fsdevel/20230310204431.GW3390869@ZenIV/)**
-
-> kmap_local_page() conversions in local filesystems keep running into
-> kunmap_local_page()+put_page() combinations; we can keep inventing names
-> for identical inline helpers, but it's getting rather inconvenient. I've added
-> a trivial helper to linux/highmem.h instead.
->
-
-**[v2: mm: memory-failure: Move memory failure sysctls to its own file](http://lore.kernel.org/linux-fsdevel/20230310035709.16281-1-wangkefeng.wang@huawei.com/)**
-
-> The sysctl_memory_failure_early_kill and memory_failure_recovery
-> are only used in memory-failure.c, so move them to their own file.
->
-
-**[v1: filelocks: use mount idmapping for setlease permission check](http://lore.kernel.org/linux-fsdevel/20230309-generic_setlease-use-idmapping-v1-1-6c970395ac4d@kernel.org/)**
-
-> A user should be allowed to take out a lease via an idmapped mount if
-> the fsuid matches the mapped uid of the inode. generic_setlease() is
-> checking the unmapped inode uid, causing these operations to be denied.
->
-
-**[v11: Implement IOCTL to get and optionally clear info about PTEs](http://lore.kernel.org/linux-fsdevel/20230309135718.1490461-1-usama.anjum@collabora.com/)**
-
-> These patches are based on next-20230307 and UFFD_FEATURE_WP_UNPOPULATED
-> patches from Peter.
->
-> *Changes in v11*
-> - Rebase on top of next-20230307
-> - Base patches on UFFD_FEATURE_WP_UNPOPULATED (https://lore.kernel.org/all/20230306213925.617814-1-peterx@redhat.com)
-> - Do a lot of cosmetic changes and review updates
-> - Remove ENGAGE_WP + ! GET operation as it can be performed with UFFDIO_WRITEPROTECT
->
-
-**[v5: epoll: use refcount to reduce ep_mutex contention](http://lore.kernel.org/linux-fsdevel/323de732635cc3513c1837c6cbb98f012174f994.1678312201.git.pabeni@redhat.com/)**
-
-> The application is multi-threaded, creates a new epoll entry for
-> each incoming connection, and does not delete it before the
-> connection shutdown - that is, before the connection's fd close().
->
-> Many different threads compete frequently for the epmutex lock,
-> affecting the overall performance.
->
-
-**[v1: MAINTAINERS: repair a malformed T: entry in IDMAPPED MOUNTS](http://lore.kernel.org/linux-fsdevel/20230308143640.9811-1-lukas.bulwahn@gmail.com/)**
-
-> The T: entries shall be composed of a SCM tree type (git, hg, quilt, stgit
-> or topgit) and location.
->
-> Add the SCM tree type to the T: entry and reorder the file entries in
-> alphabetical order.
->
-
-#### Network Devices
-
-**[v6: Create common DPLL/clock configuration API](http://lore.kernel.org/netdev/20230312022807.278528-1-vadfed@meta.com/)**
-
-> Implement common API for clock/DPLL configuration and status reporting.
-> The API utilises netlink interface as transport for commands and event
-> notifications. This API aims to extend current pin configuration and
-> make it flexible and easy to cover special configurations.
->
-
-**[v2: net-next: net: dsa: mv88e6xxx: accelerate C45 scan](http://lore.kernel.org/netdev/20230311203132.156467-1-klaus.kudielka@gmail.com/)**
-
-> Starting with commit 1a136ca2e089 ("net: mdio: scan bus based on bus
-> capabilities for C22 and C45"), mdiobus_scan_bus_c45() is being called on
-> buses with MDIOBUS_NO_CAP. On a Turris Omnia (Armada 385, 88E6176 switch),
-> this causes a significant increase of boot time, from 1.6 seconds, to 6.3
-> seconds. The boot time stated here is until start of /init.
->
-
-**[v1: net: phy: smsc: bail out in lan87xx_read_status if genphy_read_status fails](http://lore.kernel.org/netdev/026aa4f2-36f5-1c10-ab9f-cdb17dda6ac4@gmail.com/)**
-
-> If genphy_read_status fails then further access to the PHY may result
-> in unpredictable behavior. To prevent this, bail out immediately if
-> genphy_read_status fails.
->
-
-**[v1: net-next: net: introduce budget_squeeze to help us tune rx behavior](http://lore.kernel.org/netdev/20230311163614.92296-1-kerneljasonxing@gmail.com/)**
-
-> When we encounter some performance issue and then get lost on how
-> to tune the budget limit and time limit in net_rx_action() function,
-> we can count both of them separately to avoid the confusion.
->
-
-**[v1: net-next: net-sysfs: display two backlog queue len separately](http://lore.kernel.org/netdev/20230311151756.83302-1-kerneljasonxing@gmail.com/)**
-
-> Sometimes we need to know which of the backlog queues is
-> long enough to cause some latency when debugging this part is needed.
-> Thus, we can then separate the display of both.
->
-
-**[v1: net: wireless: wcn36xx: Add support for pronto-v3](http://lore.kernel.org/netdev/20230311150647.22935-1-sireeshkodali1@gmail.com/)**
-
-> Pronto-v3 is a WiFi remoteproc found on MSM8953 and other Qualcomm
-> platforms. Support for booting the remoteproc has already been merged;
-> however, due to a slight change in the register map between v2 and v3,
-> the wcn36xx driver does not work on pronto-v3. This patch updates the
-> register definitions to make wcn36xx work on pronto-v3 as well.
->
-
-**[v1: dt-bindings: pinctrl: ti-k3: Move k3.h to arch specific](http://lore.kernel.org/netdev/20230311131325.9750-1-nm@ti.com/)**
-
-> As discussed in [1], let's do some basic cleanups and move pin ctrl
-> definitions to arch folder.
->
-> Base: next-20230310
->
-> [1] https://lore.kernel.org/all/c4d53e9c-dac0-8ccc-dc86-faada324beba@linaro.org/
->
-
-**[v1: nfc: trf7970a: mark OF related data as maybe unused](http://lore.kernel.org/netdev/20230311111328.251219-1-krzysztof.kozlowski@linaro.org/)**
-
-> The driver can be compile tested with !CONFIG_OF making certain data
-> unused:
->
-> drivers/nfc/trf7970a.c:2232:34: error: ‘trf7970a_of_match’ defined but not used [-Werror=unused-const-variable=]
->
-
-**[v2: RFC: rtw88: Add SDIO support](http://lore.kernel.org/netdev/20230310202922.2459680-1-martin.blumenstingl@googlemail.com/)**
-
-> Recently the rtw88 driver has gained locking support for the "slow" bus
-> types (USB, SDIO) as part of USB support. Thanks to everyone who helped
-> make this happen!
->
-> Based on the USB work (especially the locking part and various
-> bugfixes) this series adds support for SDIO based cards. It's the
-> result of a collaboration between Jernej and myself. Neither of us has
-> access to the rtw88 datasheets. All of our work is based on studying
-> the RTL8822BS and RTL8822CS vendor drivers and trial and error.
->
-
-**[v1: net: tunnels: annotate lockless accesses to dev->needed_headroom](http://lore.kernel.org/netdev/20230310191109.2384387-1-edumazet@google.com/)**
-
-> IP tunnels can apparently update dev->needed_headroom
-> in their xmit path.
->
-> This patch takes care of three tunnels xmit, and also the
-> core LL_RESERVED_SPACE() and LL_RESERVED_SPACE_EXTRA()
-> helpers.
->
-
-**[v8: Add support for NXP bluetooth chipsets](http://lore.kernel.org/netdev/20230310181921.1437890-1-neeraj.sanjaykale@nxp.com/)**
-
-> This patch adds a driver for NXP bluetooth chipsets.
->
-> The driver is based on H4 protocol, and uses serdev APIs. It supports host
-> to chip power save feature, which is signalled by the host by asserting
-> break over UART TX lines, to put the chip into sleep state.
->
-
-**[v1: wifi: mt76: mt7921e: Set memory space enable in PCI_COMMAND if unset](http://lore.kernel.org/netdev/20230310170002.200-1-mario.limonciello@amd.com/)**
-
-> When the BIOS has been configured for Fast Boot, systems with mt7921e
-> have non-functional wifi. Turning on Fast boot caused both bus master
-> enable and memory space enable bits in PCI_COMMAND not to get configured.
->
-
-**[v8: iommu/dma: s390 DMA API conversion and optimized IOTLB flushing](http://lore.kernel.org/netdev/20230310-dma_iommu-v8-0-2347dfbed7af@linux.ibm.com/)**
-
-> This patch series converts s390's PCI support from its platform specific DMA
-> API implementation in arch/s390/pci/pci_dma.c to the common DMA IOMMU layer.
-> The conversion itself is done in patches 3-4 with patch 2 providing the final
-> necessary IOMMU driver improvement to handle s390's special IOTLB flush
-> out-of-resource indication in virtualized environments. Patches 1-2 may be
-> applied independently. The conversion itself only touches the s390 IOMMU driver
-> and s390 arch code moving over remaining functions from the s390 DMA API
-> implementation. No changes to common code are necessary.
->
-
-**[v4: net: bnxt_en: reset PHC frequency in free-running mode](http://lore.kernel.org/netdev/20230310151356.678059-1-vadfed@meta.com/)**
-
-> When using a PHC shared between multiple hosts, the previous
-> frequency value may not be reset and could lead to host being unable to
-> compensate the offset with timecounter adjustments. To avoid such a state,
-> reset the hardware frequency of PHC to zero on init. Some refactoring is
-> needed to make code readable.
->
-
-**[v1: net-next: net: Extend address label support](http://lore.kernel.org/netdev/cover.1678448186.git.petrm@nvidia.com/)**
-
-> IPv4 addresses can be tagged with label strings. Unlike IPv6 addrlabels,
-> which are used for prioritization of IPv6 addresses, these "ip address
-> labels" are simply tags that the userspace can assign to IP addresses
-> arbitrarily.
->
-> IPv4 has had support for these tags since before Linux was tracked in GIT.
-> However it has never been possible to change the label once it is
-> defined. This limits usefulness of this feature. A userspace that wants to
-> change a label might drop and recreate the address, but that disrupts
-> routing and is just impractical.
->
-
-**[RFC: Adding Microchip's LAN865x 10BASE-T1S MAC-PHY driver support to Linux](http://lore.kernel.org/netdev/076fbcec-27e9-7dc2-14cb-4b0a9331b889@microchip.com/)**
-
-> I would like to add Microchip's LAN865x 10BASE-T1S MAC-PHY driver
-> support to Linux kernel.
-> (Product link: https://www.microchip.com/en-us/product/LAN8650)
->
-> The LAN8650 combines a Media Access Controller (MAC) and an Ethernet PHY
-> to access 10BASE-T1S networks. The common standard Serial Peripheral
-> Interface (SPI) is used so that the transfer of Ethernet packets and
-> LAN8650 control/status commands are performed over a single, serial
-> interface.
->
-
-**[v3: net-next: net: hns3: support wake on lan configuration and query](http://lore.kernel.org/netdev/20230310081404.947-1-lanhao@huawei.com/)**
-
-> The HNS3 driver supports Wake-on-LAN, which can wake up
-> the server from power off state to power on state by magic
-> packet or magic security packet.
->
-> ChangeLog:
->
-
-**[v1: mac802154: Rename kfree_rcu() to kvfree_rcu_mightsleep()](http://lore.kernel.org/netdev/20230310013144.970964-1-joel@joelfernandes.org/)**
-
-> The k[v]free_rcu() macro's single-argument form is deprecated.
-> Therefore switch to the new k[v]free_rcu_mightsleep() variant. The goal
-> is to avoid accidental use of the single-argument forms, which can
-> introduce functionality bugs in atomic contexts and latency bugs in
-> non-atomic contexts.
->
-
-#### Security Hardening
-
-**[v1: next: wifi: ath11k: Replace fake flex-array with flexible-array member](http://lore.kernel.org/linux-hardening/ZAe5L5DtmsQxzqRH@work/)**
-
-> Zero-length arrays as fake flexible arrays are deprecated and we are
-> moving towards adopting C99 flexible-array members instead.
->
-
-**[v1: next: net/mlx4_en: Replace fake flex-array with flexible-array member](http://lore.kernel.org/linux-hardening/ZAZ8mNbphtPyZWM6@work/)**
-
-> Zero-length arrays as fake flexible arrays are deprecated and we are
-> moving towards adopting C99 flexible-array members instead.
->
-> Transform zero-length array into flexible-array member in struct
-> mlx4_en_rx_desc.
->
-
-**[v1: next: netxen_nic: Replace fake flex-array with flexible-array member](http://lore.kernel.org/linux-hardening/ZAZ57I6WdQEwWh7v@work/)**
-
-> Zero-length arrays as fake flexible arrays are deprecated and we are
-> moving towards adopting C99 flexible-array members instead.
->
-> Transform zero-length array into flexible-array member in struct
-> nx_cardrsp_rx_ctx_t.
->
-
-**[v1: next: platform/chrome: Replace fake flexible arrays with flexible-array member](http://lore.kernel.org/linux-hardening/ZAZUGBmSLc5wg7AK@work/)**
-
-> Zero-length arrays as fake flexible arrays are deprecated and we are
-> moving towards adopting C99 flexible-array members instead.
->
-> Use the DECLARE_FLEX_ARRAY() helper macro to transform zero-length
-> arrays in unions with flexible-array members.
->
-
-**[v1: next: rxrpc: Replace fake flex-array with flexible-array member](http://lore.kernel.org/linux-hardening/ZAZT11n4q5bBttW0@work/)**
-
-> Zero-length arrays as fake flexible arrays are deprecated and we are
-> moving towards adopting C99 flexible-array members instead.
->
-> Transform zero-length array into flexible-array member in struct
-> rxrpc_ackpacket.
->
-
-**[v7: arm64: dts: qcom: sm6125: UFS and xiaomi-laurel-sprout support](http://lore.kernel.org/linux-hardening/20230306170817.3806-1-they@mint.lgbt/)**
-
-> Introduce Universal Flash Storage support on SM6125 and add support for the Xiaomi Mi A3 based on the former platform.
->
-
-**[v1: VT: Protect KD_FONT_OP_GET_TALL from unbound access](http://lore.kernel.org/linux-hardening/20230306094921.tik5ewne4ft6mfpo@begin/)**
-
-> In ioctl(KD_FONT_OP_GET_TALL), userland tells through op->height which
-> vpitch should be used to copy over the font. In con_font_get, we were
-> not checking that it is within the maximum height value, and thus
-> userland could make the vc->vc_sw->con_font_get(vc, &font, vpitch);
-> call possibly overflow the allocated max_font_size bytes, and the
-> copy_to_user(op->data, font.data, c) call possibly read out of that
-> allocated buffer.
->
-
-#### Asynchronous I/O
-
-**[v1: optimise local-tw task resheduling](http://lore.kernel.org/io-uring/cover.1678474375.git.asml.silence@gmail.com/)**
-
-> io_uring extensively uses task_work, but when a task is waiting
-> for multiple CQEs it causes lots of rescheduling. This series
-> is an attempt to optimise it and be a base for future improvements.
->
-
-**[v1: io_uring/uring_cmd: ensure that device supports IOPOLL](http://lore.kernel.org/io-uring/2349df76-0acb-0a56-bda1-2cb05aa55151@kernel.dk/)**
-
-> It's possible for a file type to support uring commands, but not
-> pollable ones. Hence before issuing one of those, we should check
-> that it is supported and error out upfront if it isn't.
->
-
-**[v2: liburing: sendzc test improvements](http://lore.kernel.org/io-uring/cover.1677993039.git.asml.silence@gmail.com/)**
-
-> Add affinity, multithreading and the server, and also fix TCP
-> performance issues.
->
-
-#### Rust For Linux
-
-**[v3: scripts: `make rust-analyzer` for out-of-tree modules](http://lore.kernel.org/rust-for-linux/20230307144233.205819-1-varmavinaym@gmail.com/)**
-
-> Adds support for out-of-tree rust modules to use the `rust-analyzer`
-> make target to generate the rust-project.json file.
->
-
-**[v1: Rust DRM subsystem abstractions (& preview AGX driver)](http://lore.kernel.org/rust-for-linux/20230307-rust-drm-v1-0-917ff5bc80a8@asahilina.net/)**
-
-> This is my first take on the Rust abstractions for the DRM
-> subsystem. It includes the abstractions themselves, some minor
-> prerequisite changes to the C side, as well as the drm-asahi GPU driver
-> (for reference on how the abstractions are used, but not necessarily
-> intended to land together).
->
-
-**[v1: rust: virtio: add virtio support](http://lore.kernel.org/rust-for-linux/20230307130332.53029-1-daniel.almeida@collabora.com/)**
-
-> This patch adds virtIO support to the rust crate. This includes the
-> capability to create a virtIO driver (through the module_virtio_driver
-> macro and the respective Driver trait) as well as initial virtqueue
-> support.
->
-
-**[v1: scripts: rust-analyzer: Skip crate module directories](http://lore.kernel.org/rust-for-linux/20230307120736.75492-1-nmi@metaspace.dk/)**
-
-> When generating rust-analyzer configuration, skip module directories. This fixes
-> an issue that occurs if we have
->
-> - drivers/block/driver.rs
-> - drivers/block/driver_mod/mod.rs
->
-> If `driver_mod` is a module of the crate `driver`, the directory `driver_mod`
-> may not contain `Makefile`, and `generate_rust_analyzer.py` will fail.
->
-
-#### BPF
-
-**[v2: bpf-next: Support stashing local kptrs with bpf_kptr_xchg](http://lore.kernel.org/bpf/20230310230743.2320707-1-davemarchevsky@fb.com/)**
-
-> Local kptrs are kptrs allocated via bpf_obj_new with a type specified in program
-> BTF. A BPF program which creates a local kptr has exclusive control of the
-> lifetime of the kptr, and, prior to terminating, must:
->
-> * free the kptr via bpf_obj_drop
-> * If the kptr is a {list,rbtree} node, add the node to a {list, rbtree},
-> thereby passing control of the lifetime to the collection
->
-> This series adds a third option:
->
-> * stash the kptr in a map value using bpf_kptr_xchg
->
-
-**[v1: kernel/module: add documentation for try_module_get()](http://lore.kernel.org/bpf/20230310190457.3779415-1-mcgrof@kernel.org/)**
-
-> There is quite a bit of tribal knowledge around proper use of try_module_get()
-> and requiring *somehow* the module to still exist to use this call in a way
-> that is safe. Document this bit of tribal knowledge. To be clear, you should
-> only use try_module_get() *iff* you are 100% sure the module already does
-> exist and is not on its way out.
->
-
-**[v1: libbpf: Explicitly call write to append content to file](http://lore.kernel.org/bpf/20230310150216.922-1-patteliu@gmail.com/)**
-
-> Data is written to the fd by calling "vdprintf"; in most implementations
-> of the standard library, the data is finally written by the writev syscall.
-> But "uprobe_events/kprobe_events" does not allow segmented writes,
-> so switch the "append_to_file" function to an explicit write() call.
->
-
-**[v1: dwarves: dwarves: improve BTF encoder comparison method](http://lore.kernel.org/bpf/1678459850-16140-1-git-send-email-alan.maguire@oracle.com/)**
-
-> Currently when looking for function prototype mismatches with a view
-> to excluding inconsistent functions, we fall back to a comparison
-> between parameter names when the name and number of parameters match.
-> This is brittle, as it is sometimes the case that a function has
-> multiple type-identical definitions which use different parameters.
->
-
-**[v1: dwarves: syscall functions in BTF](http://lore.kernel.org/bpf/ZAsBYpsBV0wvkhh0@krava/)**
-
-> hi,
-> with latest pahole fixes we get rid of some syscall functions (with
-> __x64_sys_ prefix) and it seems to fall down to 2 cases:
->
-> - weak syscall functions generated in kernel/sys_ni.c prevent these syscalls
-> from being generated in BTF.
->
-
-**[v1: bpf-next: bpf: ensure state checkpointing at iter_next() call sites](http://lore.kernel.org/bpf/20230310060149.625887-1-andrii@kernel.org/)**
-
-> State equivalence check and checkpointing performed in is_state_visited()
-> employs certain heuristics to try to save memory by avoiding state checkpoints
-> if not enough jumps and instructions happened since last checkpoint. This leads
-> to unpredictability of whether a particular instruction will be checkpointed
-> and how regularly. While normally this is not causing many problems (except
-> inconveniences for predictable verifier tests, which we overcome with
-> BPF_F_TEST_STATE_FREQ flag), it turns out that's not the case for open-coded
-> iterators.
->
-
-**[v6: bpf-next: Transit between BPF TCP congestion controls.](http://lore.kernel.org/bpf/20230310043812.3087672-1-kuifeng@meta.com/)**
-
-> Major changes:
->
-> - Create bpf_links in the kernel for BPF struct_ops to register and
-> unregister it.
->
-> - Enables switching between implementations of bpf-tcp-cc under a
-> name instantly by replacing the backing struct_ops map of a
-> bpf_link.
->
-
-**[v1: bpf: take into account liveness when propagating precision](http://lore.kernel.org/bpf/20230309224131.57449-1-andrii@kernel.org/)**
-
-> When doing state comparison, if the old state has a register that is not
-> marked as REG_LIVE_READ, then we just skip comparison, regardless of
-> the state of the corresponding register in current state. This is because a register
-> that is not REG_LIVE_READ is irrelevant for further program execution and
-> correctness. All good here.
->
-
-**[v2: net-next:pull request: i40e: support XDP multi-buffer](http://lore.kernel.org/bpf/20230309212819.1198218-1-anthony.l.nguyen@intel.com/)**
-
-> Tirthendu Sarkar says:
->
-> This patchset adds multi-buffer support for XDP. Tx side already has
-> support for multi-buffer. This patchset focuses on Rx side. The last
-> patch contains actual multi-buffer changes while the previous ones are
-> preparatory patches.
->
-
-**[v2: enable bpf_prog_pack allocator for powerpc](http://lore.kernel.org/bpf/20230309180213.180263-1-hbathini@linux.ibm.com/)**
-
-> Most BPF programs are small, but they consume a page each. For systems
-> with busy traffic and many BPF programs, this may also add significant
-> pressure on instruction TLB. High iTLB pressure usually slows down the
-> whole system causing visible performance degradation for production
-> workloads.
->
-
-**[v2: bpf-next: selftests/bpf: use ifname instead of ifindex in XDP](http://lore.kernel.org/bpf/cover.1678382940.git.lorenzo@kernel.org/)**
-
-> Use interface name instead of interface index in XDP compliance test tool logs.
-> Improve XDP compliance test tool error messages.
->
-
-**[v3: security: Always enable integrity LSM](http://lore.kernel.org/bpf/20230309085433.1810314-1-roberto.sassu@huaweicloud.com/)**
-
-> Since the integrity (including IMA and EVM) functions are currently always
-> called by the LSM infrastructure, and always after all LSMs, formalize
-> these requirements by introducing a new LSM ordering called LSM_ORDER_LAST,
-> and set it for the 'integrity' LSM (patch 1).
->
-
-**[v1: bpf-next: selftests/bpf: make BPF_CFLAGS stricter with -Wall](http://lore.kernel.org/bpf/20230309054015.4068562-1-andrii@kernel.org/)**
-
-> Make BPF-side compiler flags stricter by adding -Wall. Fix tons of small
-> issues pointed out by compiler immediately after that. That includes newly
-> added bpf_for(), bpf_for_each(), and bpf_repeat() macros.
->
-
-**[v1: Revert "libbpf: Poison strlcpy()"](http://lore.kernel.org/bpf/20230309004836.2808610-1-jesussanp@google.com/)**
-
-> It added the pragma poison directive to libbpf_internal.h to protect
-> against accidental usage of strlcpy but ended up breaking the build for
-> toolchains based on libcs which provide the strlcpy() declaration from
-> string.h (e.g. uClibc-ng). The include order which causes the issue is:
->
-
-**[v4: bpf-next: bpf: Refactor release_regno searching logic](http://lore.kernel.org/bpf/20230309004504.1153898-1-davemarchevsky@fb.com/)**
-
-> Kfuncs marked KF_RELEASE indicate that they release some
-> previously-acquired arg. The verifier assumes that such a function will
-> only have one arg reg w/ ref_obj_id set, and that that arg is the one to
-> be released. Having multiple kfunc arg regs with ref_obj_id set is considered
-> an invalid state.
->
-
-**[v3: net: ixgbe: Panic during XDP_TX with > 64 CPUs](http://lore.kernel.org/bpf/20230308220756.587317-1-jjh@daedalian.us/)**
-
-> In commit 'ixgbe: let the xdpdrv work with more than 64 cpus'
-> (4fe815850bdc), support was added to allow XDP programs to run on systems
-> with more than 64 CPUs by locking the XDP TX rings and indexing them
-> using cpu % 64 (IXGBE_MAX_XDP_QS).
->
-> Upon trying out this patch via the Intel 5.18.6 out of tree driver
-> on a system with more than 64 cores, the kernel panicked with an
-> array-index-out-of-bounds at the return in ixgbe_determine_xdp_ring in
-> ixgbe.h, which means ixgbe_determine_xdp_q_idx was just returning the
-> cpu instead of cpu % IXGBE_MAX_XDP_QS.
->
-
-**[v5: bpf-next: BPF open-coded iterators](http://lore.kernel.org/bpf/20230308184121.1165081-1-andrii@kernel.org/)**
-
-> Add support for open-coded (aka inline) iterators in BPF world. This is a next
-> evolution of gradually allowing more powerful and less restrictive looping and
-> iteration capabilities to BPF programs.
->
-
-**[v3: bpf: xsk: Add missing overflow check in xdp_umem_reg](http://lore.kernel.org/bpf/20230308174013.1114745-1-kal.conley@dectris.com/)**
-
-> The number of chunks can overflow u32. Make sure to return -EINVAL on
-> overflow.
->
-> Also remove a redundant u32 cast assigning umem->npgs.
->
-
-**[v1: net-next: net: stmmac: call stmmac_finalize_xdp_rx() on a condition](http://lore.kernel.org/bpf/20230308162619.329372-1-lsahn@ooseel.net/)**
-
-> The current codebase calls the function whether or not the net device has XDP
-> programs. So the finalize function is called every time the RX
-> bottom-half is in progress. It needs a few machine instructions for nothing
-> in the case that XDP programs are not attached at all.
->
-
-**[v1: xsk: Add missing overflow check in xdp_umem_reg](http://lore.kernel.org/bpf/20230308105130.1113833-1-kal.conley@dectris.com/)**
-
-> The number of chunks can overflow u32. Make sure to return -EINVAL on
-> overflow.
->
-
-**[v2: bpf-next: bpf: Use bpf_mem_cache_alloc/free in bpf_local_storage](http://lore.kernel.org/bpf/20230308065936.1550103-1-martin.lau@linux.dev/)**
-
-> This set is to use bpf_mem_cache_alloc/free in bpf_local_storage.
-> The primary motivation is to solve the deadlock/recursion issue
-> when bpf_task_storage is used in a bpf tracing prog [1]. This set
-> also comes with a micro-benchmark to test the storage creation.
->
-
-**[[PATCH net, stable v1 0/3] add checking sq is full inside xdp xmit](http://lore.kernel.org/bpf/20230308024935.91686-1-xuanzhuo@linux.alibaba.com/)**
-
-> If the queue of xdp xmit is not an independent queue, then when the xdp
-> xmit used all the desc, the xmit from the __dev_queue_xmit() may encounter
-> the following error.
->
-
-**[v4: net-next: udp: introduce __sk_mem_schedule() usage](http://lore.kernel.org/bpf/20230308021153.99777-1-kerneljasonxing@gmail.com/)**
-
-> Keep the accounting schema consistent across different protocols
-> with __sk_mem_schedule(). Besides, it adjusts a little bit on how
-> to calculate forward allocated memory compared to before. After
-> applying this patch, we could avoid receive path scheduling extra
-> amount of memory.
->
-
-**[v1: bpf-next: net: skbuff: skb bitfield compaction - bpf](http://lore.kernel.org/bpf/20230308003159.441580-1-kuba@kernel.org/)**
-
-> I'm trying to make more of the sk_buff bits optional.
-> Move the BPF-accessed bits a little - because they must
-> be at coding-time-constant offsets they must precede any
-> optional bit. While at it clean up the naming a bit.
->
-
-**[[RFC/PATCHSET 0/9] perf record: Implement BPF sample filter (v4)](http://lore.kernel.org/bpf/20230307233309.3546160-1-namhyung@kernel.org/)**
-
-> There have been requests for more sophisticated perf event sample
-> filtering based on the sample data. Recently the kernel allowed BPF
-> programs to access perf sample data, and this is the userspace part
-> to enable such filtering.
->
-> This still has some rough edges and needs more improvements. But
-> I'd like to share the current work and get some feedback for the
-> directions and ideas for further improvements.
->
-
-### Related Technology News
-
-#### Qemu
-
-**[v1: Add RISC-V vector cryptographic instruction set support](http://lore.kernel.org/qemu-devel/20230310160346.1193597-1-lawrence.hunter@codethink.co.uk/)**
-
-> NB: this is an update over the patch series submitted today (2023/03/10) at 09:11. It fixes some accidental mangling of commits 02, 04 and 08/45.
->
-> This patchset provides an implementation for Zvkb, Zvkned, Zvknh, Zvksh, Zvkg, and Zvksed of the draft RISC-V vector cryptography extensions as per the 20230303 version of the specification(1) (1fcbb30). Please note that the Zvkt data-independent execution latency extension has not been implemented, and we would recommend not using these patches in an environment where timing attacks are an issue.
->
-
-**[v2: target/riscv: Add RVV registers to log](http://lore.kernel.org/qemu-devel/20230309135403.102703-1-ivan.klokov@syntacore.com/)**
-
-> Added QEMU option 'rvv' to add RISC-V RVV registers to log like regular regs.
->
-
-**[v5: target/riscv: refactor Zicond and reuse in XVentanaCondOps](http://lore.kernel.org/qemu-devel/20230307180708.302867-1-philipp.tomsich@vrull.eu/)**
-
-> After the original Zicond support was stuck/fell through the cracks on
-> the mailing list at v3 (and a different implementation was merged in
-> the meanwhile), we now refactor Zicond and then reuse it in
-> XVentanaCondOps.
->
-
-**[v3: hw/riscv: Add ACT related support](http://lore.kernel.org/qemu-devel/20230307032915.10059-1-liweiwei@iscas.ac.cn/)**
-
-> ACT tests play an important role in riscv tests. This patch tries to
-> add related support to run ACT tests.
-> -> The port is available here: -> https://github.com/plctlab/plct-qemu/tree/plct-act-upstream-v3 -> - -**[v1: Sixth RISC-V PR for 8.0](http://lore.kernel.org/qemu-devel/20230306220259.7748-1-palmer@rivosinc.com/)** - -> The following changes since commit 2946e1af2704bf6584f57d4e3aec49d1d5f3ecc0: -> -> configure: Disable thread-safety warnings on macOS (2023-03-04 14:03:46 +0000) -> -> are available in the Git repository at: -> -> https://gitlab.com/palmer-dabbelt/qemu.git tags/pull-riscv-to-apply-20230306 -> - -**[v1: qemu: linux-user: Emulate /proc/cpuinfo output for riscv](http://lore.kernel.org/qemu-devel/167811752616.21558.7117682501860352029-0@git.sr.ht/)** - -> RISC-V does not expose all extensions via hwcaps, thus some userspace -> applications may want to query these via /proc/cpuinfo. -> -> Currently when querying this file the host's file is shown instead -> which is slightly confusing. Emulate a basic /proc/cpuinfo file -> with mmu info and an ISA sting. -> - -## 20230305: Issue 36 - -### Kernel News - -#### RISC-V Architecture Support - -**[v1: dt-bindings: yamllint: Require a space after a comment '#'](http://lore.kernel.org/linux-riscv/20230303214223.49451-1-robh@kernel.org/)** - -> Enable yamllint to check the prefered commenting style of requiring a -> space after a comment character '#'. Fix the cases in the tree which -> have a warning with this enabled. -> - -**[v5: RISC-V: Don't check text_mutex during stop_machine](http://lore.kernel.org/linux-riscv/20230303143754.4005217-1-conor.dooley@microchip.com/)** - -> We're currently using stop_machine() to update ftrace & kprobes, which -> means that the thread that takes text_mutex during may not be the same -> as the thread that eventually patches the code. -> - -**[v3: Add basic ACPI support for RISC-V](http://lore.kernel.org/linux-riscv/20230303133647.845095-1-sunilvl@ventanamicro.com/)** - -> This patch series enables the basic ACPI infrastructure for RISC-V.
-> Supporting external interrupt controllers is in progress and hence it is -> tested using poll based HVC SBI console and RAM disk. -> - -**[GIT PULL: RISC-V Patches for the 6.3 Merge Window, Part 2](http://lore.kernel.org/linux-riscv/mhng-030bb7f3-2a9b-4061-8f41-e3e20c9b1671@palmer-ri-x1c9/)** - -> merged tag 'riscv-for-linus-6.3-mw1' -> The following changes since commit 01687e7c935ef70eca69ea2d468020bc93e898dc: -> -> Merge tag 'riscv-for-linus-6.3-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux (2023-02-25 11:14:08 -0800) -> - -**[v5: Add Ethernet driver for StarFive JH7110 SoC](http://lore.kernel.org/linux-riscv/20230303085928.4535-1-samin.guo@starfivetech.com/)** - -> This series adds ethernet support for the StarFive JH7110 RISC-V SoC. -> The series includes MAC driver. The MAC version is dwmac-5.20 (from -> Synopsys DesignWare). For more information and support, you can visit -> RVspace wiki[1]. -> -> You can simply review or test the patches at the link [2]. -> - -**[v3: lib/test_string.c: Add strncmp() tests](http://lore.kernel.org/linux-riscv/20230302071934.254111-1-bjorn@kernel.org/)** - -> The RISC-V strncmp() fails on some inputs, see the linked thread for -> more details. It turns out there were no strncmp() calls in the self -> tests, this adds one. -> -> Reported-by: Heiko Stübner -> - -**[v6: riscv: Use PUD/P4D/PGD pages for the linear mapping](http://lore.kernel.org/linux-riscv/20230301082552.274331-1-alexghiti@rivosinc.com/)** - -> This patchset intends to improve tlb utilization by using hugepages for -> the linear mapping. -> -> base-commit-tag: v6.2-rc7 -> - -**[v3: Add RISC-V 32 NOMMU support](http://lore.kernel.org/linux-riscv/20230301002657.352637-1-Mr.Bossman075@gmail.com/)** - -> This patch-set aims to add NOMMU support to RV32. -> Many people want to build simple emulators or HDL -> models of RISC-V this patch makes it possible to -> run linux on them. -> -> Yimin Gu is the original author of this set. 
-> - -**[v1: RISC-V: T-Head vector handling](http://lore.kernel.org/linux-riscv/20230228215435.3366914-1-heiko@sntech.de/)** - -> As is widely known the T-Head C9xx cores used for example in the -> Allwinner D1 implement an older non-ratified variant of the vector spec. -> -> While userspace will probably have a lot more problems implementing -> support for both, on the kernel side the needed changes are actually -> somewhat small'ish and can be handled via alternatives somewhat nicely. -> - -**[v8: riscv: Allow to downgrade paging mode from the command line](http://lore.kernel.org/linux-riscv/20230228154629.240541-1-alexghiti@rivosinc.com/)** - -> This new version gets rid of the limitation that prevented KASAN kernels -> to use the newly introduced parameters. -> -> While looking into KASLR, I fell onto commit aacd149b6238 ("arm64: head: -> avoid relocating the kernel twice for KASLR"): it allows to use the fdt -> functions very early in the boot process with KASAN enabled by simply -> compiling a new version of those functions without instrumentation. -> - -**[v1: riscv: support ELF format binaries in nommu mode](http://lore.kernel.org/linux-riscv/20230228135126.1686427-1-gerg@kernel.org/)** - -> The following changes add the ability to run ELF format binaries when -> running RISC-V in nommu mode. That support is actually part of the -> ELF-FDPIC loader, so these changes are all about making that work on -> RISC-V. -> - -**[v2: RISC-V: support some cryptography accelerations](http://lore.kernel.org/linux-riscv/20230228000544.2234136-1-heiko@sntech.de/)** - -> So this was my playground the last days. -> -> The base is v13 of the vector patchset but the first patches up to doing -> the Zbc-based GCM GHash can also run without those. Of course the vector- -> crypto extensions are also not ratified yet, hence the marking as RFC. 
-> -> As v13 of the vector patchset dropped the patches for in-kernel usage of -> vector instructions, I picked the ones from v12 over into this series -> for now. -> - -**[v5: hwmon: Add StarFive JH71X0 temperature sensor](http://lore.kernel.org/linux-riscv/20230227134125.120638-1-hal.feng@starfivetech.com/)** - -> This adds a driver for the temperature sensor on the JH7100 and JH7110, -> RISC-V SoCs by StarFive Technology Co. Ltd.. The JH7100 is used on the -> BeagleV Starlight board and StarFive VisionFive board. The JH7110 is -> used on the StarFive VisionFive 2 board. -> - -**[v3: Add DMA driver for StarFive JH7110 SoC](http://lore.kernel.org/linux-riscv/20230227131042.16125-1-walker.chen@starfivetech.com/)** - -> This patch series adds dma support for the StarFive JH7110 RISC-V -> SoC. The first patch adds device tree binding. The second patch includes -> dma driver. The last patch adds device node of dma to JH7110 dts. -> -> The series has been tested on the VisionFive 2 board which equip with -> JH7110 SoC and works normally. -> - -**[v1: sched/doc: supplement CPU capacity with RISC-V](http://lore.kernel.org/linux-riscv/20230227105941.2749193-1-suagrfillet@gmail.com/)** - -> This commit 7d2078310cbf ("dt-bindings: arm: move cpu-capacity to a -> shared loation") updates some references about capacity-dmips-mhz -> property in this document. -> - -#### Process Scheduling - -**[v2: sched/debug: Put sched/domains files under the verbose flag](http://lore.kernel.org/lkml/20230303183754.3076321-1-pauld@redhat.com/)** - -> The debug files under sched/domains can take a long time to regenerate, -> especially when updates are done one at a time. Move these files under -> the sched verbose debug flag. Allow changes to verbose to trigger -> generation of the files. This lets a user batch the updates but still -> have the information available. The detailed topology printk messages -> are also under verbose.
- -**[v3: REBASE: sched/numa: Enhance vma scanning](http://lore.kernel.org/lkml/cover.1677672277.git.raghavendra.kt@amd.com/)** - -> The patchset proposes one of the enhancements to numa vma scanning -> suggested by Mel. This is continuation of [3]. -> -> Reposting the rebased patchset to akpm mm-unstable tree (March 1) -> -> Existing mechanism of scan period involves, scan period derived from -> per-thread stats. Process Adaptive autoNUMA [1] proposed to gather NUMA -> fault stats at per-process level to capture aplication behaviour better. -> - -**[v1: sched/core: Use do-while instead of for loop in set_nr_if_polling](http://lore.kernel.org/lkml/20230228161426.4508-1-ubizjak@gmail.com/)** - -> Use equivalent do-while loop instead of infinite for loop. -> -> There are no asm code changes. -> - -**[v3: sched/numa: Enhance vma scanning](http://lore.kernel.org/lkml/cover.1677557481.git.raghavendra.kt@amd.com/)** - -> The patchset proposes one of the enhancements to numa vma scanning -> suggested by Mel. This is continuation of [3]. -> -> Existing mechanism of scan period involves, scan period derived from -> per-thread stats. Process Adaptive autoNUMA [1] proposed to gather NUMA -> fault stats at per-process level to capture aplication behaviour better. -> - -#### Memory Management - -**[v3: fold per-CPU vmstats remotely](http://lore.kernel.org/linux-mm/20230303195841.310844446@redhat.com/)** - -> By having vmstat_shepherd flush the per-CPU counters to the -> global counters from remote CPUs. -> -> This is done using cmpxchg to manipulate the counters, -> both CPU locally (via the account functions), -> and remotely (via cpu_vm_stats_fold). -> -> Thanks to Aaron Tomlin for diagnosing issue 1 and writing -> the initial patch series. -> - -**[v2: mm/damon/paddr: minor code improvement](http://lore.kernel.org/linux-mm/20230303084343.171958-1-wangkefeng.wang@huawei.com/)** - -> Unify folio_put() to make code more clear.
-> - -**[v2: dma-buf: system_heap: avoid reclaim for order 4](http://lore.kernel.org/linux-mm/20230303050332.10138-1-jaewon31.kim@samsung.com/)** - -> Using order 4 pages would be helpful for IOMMUs mapping, but trying to -> get order 4 pages could spend quite much time in the page allocation. -> From the perspective of responsiveness, the deterministic memory -> allocation speed, I think, is quite important. -> - -**[v1: mm: compaction: limit illegal input parameters of compact_memory interface](http://lore.kernel.org/linux-mm/202303030844412743985@zte.com.cn/)** - -> Available only when CONFIG_COMPACTION is set. When 1 is written to -> the file, all zones are compacted such that free memory is available -> in contiguous blocks where possible. -> But echo others-parameter > compact_memory, this function will be -> triggered by writing parameters to the interface. -> - -**[v1: tmpfs: add the option to disable swap](http://lore.kernel.org/linux-mm/20230302232758.888157-1-mcgrof@kernel.org/)** - -> After a couple of RFCs I think this is ready for PATCH form. Review -> is appreciated. Below the changes I also list the series of tests -> I performed to verify correctness. In short you either create a fs -> with swap or without, but if you can't change that option later. -> If we really wanted to, we could work on accepting this change on -> reconfigure (remount) but its not clear yet that is desirable so -> for now keep things simple. -> - -**[v1: mm: teach mincore_hugetlb about pte markers](http://lore.kernel.org/linux-mm/20230302222404.175303-1-jthoughton@google.com/)** - -> By checking huge_pte_none(), we incorrectly classify PTE markers as -> "present". Instead, check huge_pte_none_mostly(), classifying PTE -> markers the same as if the PTE were completely blank. -> -> PTE markers, unlike other kinds of swap entries, don't reference any -> physical page and don't indicate that a physical page was mapped -> previously. 
As such, treat them as non-present for the sake of -> mincore(). -> - -**[v1: mm/userfaultfd: propagate uffd-wp bit when PTE-mapping the huge zeropage](http://lore.kernel.org/linux-mm/20230302175423.589164-1-david@redhat.com/)** - -> Currently, we'd lose the userfaultfd-wp marker when PTE-mapping a huge -> zeropage, resulting in the next write faults in the PMD range -> not triggering uffd-wp events. -> -> Various actions (partial MADV_DONTNEED, partial mremap, partial munmap, -> partial mprotect) could trigger this. However, most importantly, -> un-protecting a single sub-page from the userfaultfd-wp handler when -> processing a uffd-wp event will PTE-map the shared huge zeropage and -> lose the uffd-wp bit for the remainder of the PMD. -> - -**[v1: -next: mm/damon/paddr: minor refactor of damon_pa_pageout()](http://lore.kernel.org/linux-mm/20230302144926.40012-1-wangkefeng.wang@huawei.com/)** - -> Omit two lines by converting if(!folio_isolate_lru()) to -> if(folio_isolate_lru()). -> - -**[v2: mm/debug_vm_pgtable: Replace pte_mkhuge() with arch_make_huge_pte()](http://lore.kernel.org/linux-mm/20230302114845.421674-1-anshuman.khandual@arm.com/)** - -> Since the following commit arch_make_huge_pte() should be used directly in -> generic memory subsystem as a platform provided page table helper, instead -> of pte_mkhuge(). Change hugetlb_basic_tests() to call arch_make_huge_pte() -> directly, and update its relevant documentation entry as required. -> - -**[v1: migrate_pages: silence gcc notes for mis-casting](http://lore.kernel.org/linux-mm/20230302012610.17055-1-ying.huang@intel.com/)** - -> The following GCC notes was reported for commit 64c8902ed441 -> ("migrate_pages: split unmap_and_move() to _unmap() and _move()"). -> - -**[v1: maple_tree: export symbol mas_preallocate()](http://lore.kernel.org/linux-mm/20230302011035.4928-1-dakr@redhat.com/)** - -> Fix missing EXPORT_SYMBOL_GPL() statement for mas_preallocate(). 
-> - -**[v3: kcov: improve documentation](http://lore.kernel.org/linux-mm/72be5c215c275f35891229b90622ed859f196a46.1677684837.git.andreyknvl@google.com/)** - -> Improve KCOV documentation: -> -> - Use KCOV instead of kcov, as the former is more widely-used. -> -> - Mention Clang in compiler requirements. -> -> - Use ``annotations`` for inline code. -> -> - Rework remote coverage collection documentation for better clarity. -> -> - Various smaller changes. -> - -**[v5: mm: ioremap: Convert architectures to take GENERIC_IOREMAP way](http://lore.kernel.org/linux-mm/20230301034247.136007-1-bhe@redhat.com/)** - -> Motivation and implementation: -> In this patchset, firstly introduce generic_ioremap_prot() and -> generic_iounmap() to extract the generic codes for GENERIC_IOREMAP. -> By taking GENERIC_IOREMAP method, the generic generic_ioremap_prot(), -> generic_iounmap(), and their generic wrapper ioremap_prot(), ioremap() -> and iounmap() are all visible and available to arch. Arch needs to -> provide wrapper functions to override the generic version if there's -> arch specific handling in its corresponding ioremap_prot(), ioremap() -> or iounmap(). With these changes, duplicated ioremap/iounmap() code uder -> ARCH-es are removed, and the equivalent functioality is kept as before. -> - -**[v3: New page table range API](http://lore.kernel.org/linux-mm/20230228213738.272178-1-willy@infradead.org/)** - -> This patchset changes the API used by the MM to set up page table entries. -> The four APIs are: -> set_ptes(mm, addr, ptep, pte, nr) -> update_mmu_cache_range(vma, addr, ptep, nr) -> flush_dcache_folio(folio) -> flush_icache_pages(vma, page, nr) -> -> flush_dcache_folio() isn't technically new, but no architecture -> implemented it, so I've done that for you. The old APIs remain around -> but are mostly implemented by calling the new interfaces. 
-> - -**[v1: tomoyo: replace tomoyo_round2() with kmalloc_size_roundup()](http://lore.kernel.org/linux-mm/20230228093556.19027-1-vbabka@suse.cz/)** - -> It seems tomoyo has had its own implementation of what -> kmalloc_size_roundup() does today. Remove the function tomoyo_round2() -> and replace it with kmalloc_size_roundup(). It provides more accurate -> results and doesn't contain a while loop. -> - -**[v2: bpf-next: mm/bpf/perf: Store build id in inode object](http://lore.kernel.org/linux-mm/20230228093206.821563-1-jolsa@kernel.org/)** - -> hi, -> this is RFC patchset for adding build id under inode's object. -> -> The main change to previous post [1] is to use inode object instead of file -> object for build id data. -> -> However.. ;-) while using inode as build id storage place saves some memory -> by keeping just one copy of the build id for all file instances, there seems -> to be another problem. -> - -**[v1: Ignore non-LRU-based reclaim in memcg reclaim](http://lore.kernel.org/linux-mm/20230228085002.2592473-1-yosryahmed@google.com/)** - -> Reclaimed pages through other means than LRU-based reclaim are tracked -> through reclaim_state in struct scan_control, which is stashed in -> current task_struct. These pages are added to the number of reclaimed -> pages through LRUs. For memcg reclaim, these pages generally cannot be -> linked to the memcg under reclaim and can cause an overestimated count -> of reclaimed pages. This short series tries to address that. -> - -**[v2: mm/uffd: UFFD_FEATURE_WP_UNPOPULATED](http://lore.kernel.org/linux-mm/20230227230044.1596744-1-peterx@redhat.com/)** - -> This is a new feature that controls how uffd-wp handles none ptes. When -> it's set, the kernel will handle anonymous memory the same way as file -> memory, by allowing the user to wr-protect unpopulated ptes. 
-> - -#### File Systems - -**[v1: security: Move IMA and EVM to the LSM infrastructure](http://lore.kernel.org/linux-fsdevel/20230303181842.1087717-1-roberto.sassu@huaweicloud.com/)** - -> This patch set depends on: -> - https://lore.kernel.org/linux-integrity/20221201104125.919483-1-roberto.sassu@huaweicloud.com/ (there will be a v8 shortly) -> - https://lore.kernel.org/linux-security-module/20230217032625.678457-1-paul@paul-moore.com/ -> -> IMA and EVM are not effectively LSMs, especially due the fact that in the -> past they could not provide a security blob while there is another LSM -> active. -> - -**[v1: folio_copy_tail](http://lore.kernel.org/linux-fsdevel/20230303064315.701090-1-willy@infradead.org/)** - -> I'm trying to make it easy & efficient for a filesystem to read its file -> tails into a folio. iomap's implementation was pretty good, but had -> some limitations (eg tails couldn't cross a page boundary). -> -> This should be an all-singing, all-dancing implementation which copies -> the correct part of the buffer into the correct part of the folio and -> zeroes the remainder of the folio. It should work with highmem, but -> the calculations are a bit tricky and I may have got something wrong. -> - -**[cifs test patch to make cifs use its own version of write_cache_pages()](http://lore.kernel.org/linux-fsdevel/522532.1677800499@warthog.procyon.org.uk/)** - -> Here's my patch to give cifs its own copy of write_cache_pages() so that the -> function pointer can be eliminated in case some sort of spectre thing is -> causing a slowdown. -> -> This goes on top of "cifs test patch to convert to using write_cache_pages()".
-> - -**[v1: sysctl: deprecate register_sysctl_paths()](http://lore.kernel.org/linux-fsdevel/20230302202826.776286-1-mcgrof@kernel.org/)** - -> As we trim down the insane kernel/sysctl.c large array and move -> sysctls out we're looking to optimize the way we do syctl registrations -> so we deal with just flat entries so to make the registration code -> much easier to maintain and so it does not recurse. In dealing with -> some of these things it reminded us that we will eventually get to the -> point of just passing in the ARRAY_SIZE() we want, to get there we -> should strive to move away from the older callers that do need the -> recursion. -> - -**[v1: printk: serial: 8250: implement non-BKL console](http://lore.kernel.org/linux-fsdevel/87wn3zsz5x.fsf@jogness.linutronix.de/)** - -> Implement the necessary callbacks to allow the 8250 console driver -> to perform as a non-BKL console. Remove the implementation for the -> legacy console callback (write) and add implementations for the -> non-BKL consoles (write_atomic, write_thread, port_lock) and add -> CON_NO_BKL to the initial flags. -> - -**[v1: fs/ceph/mds_client: ignore responses for waiting requests](http://lore.kernel.org/linux-fsdevel/20230302130650.2209938-1-max.kellermann@ionos.com/)** - -> If a request is put on the waiting list, its submission is postponed -> until the session becomes ready (e.g. via `mdsc->waiting_for_map` or -> `session->s_waiting`). If a `CEPH_MSG_CLIENT_REPLY` happens to be -> received before `CEPH_MSG_MDS_MAP`, the request gets freed, and then -> this assertion fails: -> -> WARN_ON_ONCE(!list_empty(&req->r_wait)); -> - -**[v1: kernfs: Introduce separate rwsem to protect inode](http://lore.kernel.org/linux-fsdevel/20230302043203.1695051-1-imran.f.khan@oracle.com/)** - -> This change set is consolidating the changes discussed and/or mentioned -> in [1] and [2]. 
I have not received any feedback about any of the -> patches included in this change set, so I am rebasing them on current -> linux-next tip and bringing them all in one place. -> - -**[v1: userfaultfd: move unprivileged_userfaultfd sysctl to its own file](http://lore.kernel.org/linux-fsdevel/20230301100627.3505739-1-zhangpeng362@huawei.com/)** - -> The sysctl_unprivileged_userfaultfd is part of userfaultfd, move it to -> its own file. -> - -**[v1: erofs: support for mounting a single block device with multiple devices](http://lore.kernel.org/linux-fsdevel/20230301070417.13084-1-zhujia.zj@bytedance.com/)** - -> In order to support mounting multi-layer container image as a block -> device, add single block device with multiple devices feature for EROFS. -> -> In this mode, all meta/data contents will be mapped into one block address. -> User could directly mount the block device by EROFS. -> - -**[v2: hostfs: handle idmapped mounts](http://lore.kernel.org/linux-fsdevel/20230301015002.2402544-1-development@efficientek.com/)** - -> Let hostfs handle idmapped mounts. This allows to have the same hostfs -> mount appear in multiple locations with different id mappings. -> - -**[v1: GDB VFS utils](http://lore.kernel.org/linux-fsdevel/cover.1677631565.git.development@efficientek.com/)** - -> I've created a couple GDB convenience functions that I found useful when -> debugging some VFS issues and figure others might find them useful. For -> instance, they are useful in setting conditional breakpoints on VFS -> functions where you only care if the dentry path is a certain value. I -> took the opportunity to create a new "vfs" python module to give VFS -> related utilities a home. -> - -**[GIT PULL: xfs: moar new code for 6.3](http://lore.kernel.org/linux-fsdevel/167762780388.3622158.16184008545274432486.stg-ugh@magnolia/)** - -> Please pull this branch with changes for xfs for 6.3-rc1. This second -> pull request contains a fix for a deadlock in the allocator. 
It -> continues the slow march towards being able to offline AGs, and it -> refactors the interface to the xfs allocator to be less indirection -> happy. -> - -**[v2: splice: Prevent gifting of multipage folios](http://lore.kernel.org/linux-fsdevel/2740801.1677513063@warthog.procyon.org.uk/)** - -> Don't let parts of compound pages/multipage folios be gifted by (vm)splice -> into a pipe as the other end may only be expecting single-page gifts (fuse -> and virtio console for example). -> -> replace_page_cache_folio(), for example, will do the wrong thing if it -> tries to replace a single paged folio with a multipage folio. -> - -#### Network Devices - -**[v1: nfc: change order inside nfc_se_io error path](http://lore.kernel.org/netdev/20230304164844.133931-1-pchelkin@ispras.ru/)** - -> cb_context should be freed on error paths in nfc_se_io as stated by commit -> 25ff6f8a5a3b ("nfc: fix memory leak of se_io context in nfc_genl_se_io"). -> -> Make the error path in nfc_se_io unwind everything in reverse order, i.e. -> free the cb_context after unlocking the device. -> - -**[v1: net: dsa: mt7530: move PLL setup out of port 6 pad configuration](http://lore.kernel.org/netdev/20230304125453.53476-1-arinc.unal@arinc9.com/)** - -> Move the PLL setup of the MT7530 switch out of the pad configuration of -> port 6 to mt7530_setup, after reset. -> -> This fixes the improper initialisation of the switch when only port 5 is -> used as a CPU port. -> - -**[v4: netdevice: use ifmap instead of plain fields](http://lore.kernel.org/netdev/20230304122432.265902-1-vincenzopalazzodev@gmail.com/)** - -> clean the code by using the ifmap instead of plain fields, -> and avoid code duplication. -> -> v4 with some build error that the 0 day bot found while -> compiling some drivers that I was not able to build on -> my machine.
-> - -**[v2: net: dpaa2-mac: Get serdes only for backplane links](http://lore.kernel.org/netdev/20230304003159.1389573-1-sean.anderson@seco.com/)** - -> When commenting on what would become commit 085f1776fa03 ("net: dpaa2-mac: -> add backplane link mode support"), Ioana Ciornei said [1]: -> -> > ...DPMACs in TYPE_BACKPLANE can have both their PCS and SerDes managed -> > by Linux (since the firmware is not touching these). That being said, -> > DPMACs in TYPE_PHY (the type that is already supported in dpaa2-mac) can -> > also have their PCS managed by Linux (no interraction from the -> > firmware's part with the PCS, just the SerDes). -> - -**[v1: nf-next: netfilter: handle ipv6 jumbo packets properly for bridge ovs and tc](http://lore.kernel.org/netdev/cover.1677888566.git.lucien.xin@gmail.com/)** - -> Currently pskb_trim_rcsum() is always done on the RX path. However, IPv6 -> jumbo packets hide the real packet len in the Hop-by-hop option header, -> which should be parsed before doing the trim. -> - -**[v6: Another crack at a handshake upcall mechanism](http://lore.kernel.org/netdev/167786872946.7199.12490725847535629441.stgit@91.116.238.104.host.secureserver.net/)** - -> Here is v6 of a series to add generic support for transport layer -> security handshake on behalf of kernel socket consumers (user space -> consumers use a security library directly, of course). A summary of -> the purpose of these patches is archived here: -> -> https://lore.kernel.org/netdev/1DE06BB1-6BA9-4DB4-B2AA-07DE532963D6@oracle.com/ -> - -**[v1: bpf-next: selftests/bpf: use ifname instead of ifindex in XDP compliance test tool](http://lore.kernel.org/netdev/5d11c9163490126fdc391dacb122480e4c059e62.1677863821.git.lorenzo@kernel.org/)** - -> Rely on interface name instead of interface index in error messages or logs -> from XDP compliance test tool. -> Improve XDP compliance test tool error messages. 
-> - -**[v2: Up until now, there was no way to let the user select the layer at which time stamping occurs. The stack assumed that PHY time stamping is always preferred, but some MAC/PHY combinations were buggy.](http://lore.kernel.org/netdev/20230303164248.499286-1-kory.maincent@bootlin.com/)** - -> This series aims to allow the user to select the desired layer -> administratively. -> -> This patch is broken out for review, but it will eventually be -> squashed into Patch 3 after comments come in. -> - -**[v1: net: phylib: get rid of unnecessary locking](http://lore.kernel.org/netdev/E1pY8Pq-00D0sw-NY@rmk-PC.armlinux.org.uk/)** - -> The locking in phy_probe() and phy_remove() does very little to prevent -> any races with e.g. phy_attach_direct(), but instead causes lockdep ABBA -> warnings. Remove it. -> - -**[v1: netdevice: use ifmap isteand of plain fields](http://lore.kernel.org/netdev/20230303150818.132386-1-vincenzopalazzodev@gmail.com/)** - -> clean the code by using the ifmap instead of plain fields, -> and avoid code duplication. -> -> P.S: I'm giving credit to the author of the FIXME commit. -> - -**[v2: bpf-next: xdp: recycle Page Pool backed skbs built from XDP frames](http://lore.kernel.org/netdev/20230303133232.2546004-1-aleksander.lobakin@intel.com/)** - -> Yeah, I still remember that "Who needs cpumap nowadays" (c), but anyway. -> -> __xdp_build_skb_from_frame() missed the moment when the networking stack -> became able to recycle skb pages backed by a page_pool. This was making -> e.g. cpumap redirect even less effective than simple %XDP_PASS. veth was -> also affected in some scenarios. -> A lot of drivers use skb_mark_for_recycle() already, it's been almost -> two years and seems like there are no issues in using it in the generic -> code too. {__,}xdp_release_frame() can be then removed as it losts its -> last user. 
-> - -**[v1: net-next: net: netfilter: Keep conntrack reference until IPsecv6 policy checks are done](http://lore.kernel.org/netdev/20230303094221.1501961-1-madhu.koriginja@nxp.com/)** - -> Keep the conntrack reference until policy checks have been performed for -> IPsec V6 NAT support. The reference needs to be dropped before a packet is -> queued to avoid having the conntrack module unloadable. -> -> V1-v2: added missing () in ip6_input.c in below condition -> if (!(ipprot->flags & INET6_PROTO_NOPOLICY)) -> V2-v3: replaced nf_reset with nf_reset_ct -> - -**[Re: v5: wwan: core: Support slicing in port TX flow of WWAN subsystem](http://lore.kernel.org/netdev/PSAPR03MB5653D7BAA0E5DDB2D03B341BF7B39@PSAPR03MB5653.apcprd03.prod.outlook.com/)** - -> I'm sorry to bother you, but I want to know whether my patch is accepted by the community. -> Because it seems to be a merge window, but the patch state still is "Not Applicable". Could you -> give me some suggestions about this patch state? -> - -**[v3: net-next: net/smc: Use percpu ref for wr tx reference](http://lore.kernel.org/netdev/20230303082115.449-1-KaiShen@linux.alibaba.com/)** - -> The refcount wr_tx_refcnt may cause cache thrashing problems among -> cores and we can use percpu ref to mitigate this issue here. We -> gain some performance improvement with percpu ref here on our -> customized smc-r verion. Applying cache alignment may also mitigate -> this problem but it seem more reasonable to use percpu ref here. -> We can also replace wr_reg_refcnt with one percpu reference like -> wr_tx_refcnt. -> - -**[v5: bpf-next: bpf: Introduce kptr RCU.](http://lore.kernel.org/netdev/20230303041446.3630-1-alexei.starovoitov@gmail.com/)** - -> - make KF_RCU stronger and require that bpf program checks for NULL -> before passing such pointers into kfunc. The prog has to do that anyway -> to access fields and it aligns with BTF_TYPE_SAFE_RCU allowlist.
-> - -**[v2: net-next: Add tx push buf len param to ethtool](http://lore.kernel.org/netdev/20230302203045.4101652-1-shayagr@amazon.com/)** - -> Changed since v1: -> - Added the new ethtool param to generic netlink specs -> - Dropped dynamic advertisement of tx push buff support in ENA. -> The driver will advertise it for all platforms -> -> This patchset adds a new sub-configuration to ethtool get/set queue -> params (ethtool -g) called 'tx-push-buf-len'. -> - -**[v8: mac80211_hwsim: Add PMSR support](http://lore.kernel.org/netdev/20230302160310.923349-1-jaewan@google.com/)** - -> Dear Kernel maintainers, -> -> First of all, thank you for spending your precious time for reviewing -> my changes, and also sorry for my mistakes in previous patchsets. -> -> Let me propose series of CLs for adding PMSR support in the mac80211_hwsim. -> -> PMSR (peer measurement) is generalized measurement between STAs, -> and currently FTM (fine time measurement or flight time measurement) -> is the one and only measurement. -> - -**[v1: [net:netfilter]: Keep conntrack reference until IPsecv6 policy checks are done](http://lore.kernel.org/netdev/20230302112324.906365-1-madhu.koriginja@nxp.com/)** - -> Keep the conntrack reference until policy checks have been performed for -> IPsec V6 NAT support. The reference needs to be dropped before a packet is -> queued to avoid having the conntrack module unloadable. -> - -**[v2: linux-next: selftests: net: udpgso_bench_tx: Add test for IP fragmentation of UDP packets](http://lore.kernel.org/netdev/202303021838359696196@zte.com.cn/)** - -> The UDP GSO bench only tests the performance of userspace payload splitting -> and UDP GSO. But we are also concerned about the performance comparing with -> IP fragmentation and UDP GSO. In other words comparing IP fragmentation and -> segmentation. 
-> - -**[v2: net: stmmac: add to set device wake up flag when stmmac init phy](http://lore.kernel.org/netdev/20230302062143.181285-1-clementwei90@163.com/)** - -> When the MAC does not support PMT, the driver will check the PHY's WoL capability -> and set the device wakeup capability in stmmac_init_phy(). We can enable -> WoL through ethtool, and the driver will then enable the device wakeup -> flag, so device_may_wakeup() returns true. -> - -**[v3: net: ice: copy last block omitted in ice_get_module_eeprom()](http://lore.kernel.org/netdev/20230301204707.2592337-1-poros@redhat.com/)** - -> ice_get_module_eeprom() has been broken since commit e9c9692c8a81 ("ice: -> Reimplement module reads used by ethtool"). In this refactor, -> ice_get_module_eeprom() reads the eeprom in blocks of size 8. -> But the condition that should protect against buffer overflow -> ignores the last block. The last block always contains zeros. -> - -**[v11: net-next: net: ethernet: mtk_eth_soc: various enhancements](http://lore.kernel.org/netdev/cover.1677699407.git.daniel@makrotopia.org/)** - -> This series brings a variety of fixes and enhancements for mtk_eth_soc, -> adds support for the MT7981 SoC and facilitates sharing the SGMII PCS -> code between mtk_eth_soc and mt7530. -> -> Note that this series depends on commit 697c3892d825 -> ("regmap: apply reg_base and reg_downshift for single register ops") to -> not break mt7530 pcs register access. -> - -**[v13: bpf-next: Add skb + xdp dynptrs](http://lore.kernel.org/netdev/20230301154953.641654-1-joannelkoong@gmail.com/)** - -> This patchset is the 2nd in the dynptr series. The 1st can be found here [0]. -> -> This patchset adds skb and xdp type dynptrs, which have two main benefits for -> packet parsing: -> * allowing operations on sizes that are not statically known at -> compile-time (eg variable-sized accesses).
-> * more ergonomic and less brittle iteration through data (eg does not need -> manual if checking for being within bounds of data_end) -> - -**[v6: Add support for NXP bluetooth chipsets](http://lore.kernel.org/netdev/20230301154514.3292154-1-neeraj.sanjaykale@nxp.com/)** - -> This patch adds a driver for NXP bluetooth chipsets. -> -> The driver is based on the H4 protocol, and uses serdev APIs. It supports the host -> to chip power save feature, which is signalled by the host by asserting -> break over UART TX lines, to put the chip into a sleep state. -> -> To support this feature, break_ctl has also been added to serdev-tty along -> with a new serdev API serdev_device_break_ctl(). -> - -**[v1: net: ieee802154: Prevent user from crashing the host](http://lore.kernel.org/netdev/20230301154450.547716-1-miquel.raynal@bootlin.com/)** - -> Avoid crashing the machine by checking -> info->attrs[NL802154_ATTR_SCAN_TYPE] presence before de-referencing it, -> which was the primary intent of the blamed patch. -> - -**[v1: [NETFILTER]: Keep conntrack reference until IPsecv6 policy checks are done](http://lore.kernel.org/netdev/20230301145534.421569-1-madhu.koriginja@nxp.com/)** - -> Keep the conntrack reference until policy checks have been performed for -> IPsec V6 NAT support. The reference needs to be dropped before a packet is -> queued to avoid having the conntrack module unloadable. -> - -**[v1: vsock: check error queue to set EPOLLERR](http://lore.kernel.org/netdev/76e7698d-890b-d14d-fa34-da5dd7dd13d8@sberdevices.ru/)** - -> EPOLLERR must be set not only when there is an error on the socket, but also -> when its error queue is not empty (maybe it contains some control -> messages). Without this patch 'poll()' won't detect data in the error queue. -> This patch is based on 'tcp_poll()'. -> - -**[v1: net: ionic: catch failure from devlink_alloc](http://lore.kernel.org/netdev/20230301013623.32226-1-shannon.nelson@amd.com/)** - -> Add a check for NULL on the alloc return.
If devlink_alloc() fails and -> we try to use devlink_priv() on the NULL return, the kernel gets very -> unhappy and panics. With this fix, the driver load will still fail, -> but at least it won't panic the kernel. -> - -**[v1: net: tls: avoid hanging tasks on the tx_lock](http://lore.kernel.org/netdev/20230301002857.2101894-1-kuba@kernel.org/)** - -> syzbot sent a hung task report and Eric explains that an adversarial -> receiver may keep RWIN at 0 for a long time, so we are not guaranteed -> to make forward progress. A thread which took tx_lock and went to sleep -> may not release tx_lock for hours. Use interruptible sleep where -> possible and reschedule the work if it can't take the lock. -> - -**[v3: net-next: vsock: add support for sockmap](http://lore.kernel.org/netdev/20230227-vsock-sockmap-upstream-v3-0-7e7f4ce623ee@bytedance.com/)** - -> Add support for sockmap to vsock. -> -> We're testing usage of vsock as a way to redirect guest-local UDS -> requests to the host and this patch series greatly improves the -> performance of such a setup. -> -> Compared to copying packets via userspace, this improves throughput by -> 121% in basic testing. -> - -**[v4: bpf-next: net/smc: Introduce BPF injection capability](http://lore.kernel.org/netdev/1677602291-1666-1-git-send-email-alibuda@linux.alibaba.com/)** - -> These patches attempt to introduce BPF injection capability for SMC, -> and add a selftest to ensure code stability. -> -> As we all know, the SMC protocol is not suitable for all scenarios, -> especially short-lived connections. However, most applications cannot -> guarantee that there are no such scenarios at all. Therefore, apps -> may need some specific strategies to decide whether to use SMC -> or not, for example, apps can limit the scope of the SMC to a specific -> IP address or port.
-> - -#### Security Enhancements - -**[v1: ubsan: Tighten UBSAN_BOUNDS on GCC](http://lore.kernel.org/linux-hardening/20230302225444.never.053-kees@kernel.org/)** - -> The use of -fsanitize=bounds on GCC will ignore some trailing arrays, -> leaving a gap in coverage. Switch to using -fsanitize=bounds-strict to -> match Clang's stricter behavior. -> - -**[v1: kheaders: Use array declaration instead of char](http://lore.kernel.org/linux-hardening/20230302224946.never.243-kees@kernel.org/)** - -> Under CONFIG_FORTIFY_SOURCE, memcpy() will check the size of destination -> and source buffers. Defining kernel_headers_data as "char" would trip -> this check. Since these addresses are treated as byte arrays, define -> them as arrays (as done everywhere else). -> - -#### Asynchronous IO - -**[v1: io_uring/poll: don't pass in wake func to io_init_poll_iocb()](http://lore.kernel.org/io-uring/f7f8fd3e-a810-d9d7-5433-32957e880652@kernel.dk/)** - -> We only use one, and it's io_poll_wake(). Hardwire that in the initial -> init, as well as in __io_queue_proc() if we're setting up for double -> poll. -> - -**[v1: io_uring: add IORING_OP_FUSED_CMD](http://lore.kernel.org/io-uring/20230301140611.163055-1-ming.lei@redhat.com/)** - -> Add IORING_OP_FUSED_CMD, a special URING_CMD which has to -> be SQE128. The 1st SQE(master) is one 64byte URING_CMD, and the 2nd -> 64byte SQE(slave) is another normal 64byte OP. For any OP which needs -> to support slave OP, io_issue_defs[op].fused_slave needs to be set as 1, -> and its ->issue() can retrieve/import a buffer from the master request's -> fused_cmd_kbuf. -> - -**[v1: io_uring/poll: allow some retries for poll triggering spuriously](http://lore.kernel.org/io-uring/8a746fe0-dd72-568c-e601-19c9192c38fb@kernel.dk/)** - -> If we get woken spuriously when polling and fail the operation with -> -EAGAIN again, then we generally only allow polling again if data -> had been transferred at some point. This is indicated with -> REQ_F_PARTIAL_IO.
However, if the spurious poll triggers when the socket -> was originally empty, then we haven't transferred data yet and we will -> fail the poll re-arm. This either punts the socket to io-wq if it's -> blocking, or it fails the request with -EAGAIN if not. Neither condition -> is desirable, as the former will slow things down, while the latter -> will make the application confused. -> - -#### Rust For Linux - -**[v1: rust: sort uml documentation arch support table](http://lore.kernel.org/rust-for-linux/I0YeaNjTtc4Nh47ZLJfAs6rgfAc_QZxhynNfz-GQKssVZ1S2UI_cTScCkp9-oX-hPYVcP3EfF7N0HMB9iAlm1FcvOJagnQoLeHtiW3bGCgM=@bamelis.dev/)** - -> The arch_support table was not sorted alphabetically. -> Sorts the table properly. -> - -#### BPF - -**[v1: bpf-next: bpf: Use separate RCU callbacks for freeing selem](http://lore.kernel.org/bpf/20230303141542.300068-1-memxor@gmail.com/)** - -> Martin suggested that instead of using a byte in the hole (which he has -> a use for in his future patch) in bpf_local_storage_elem, we can -> dispatch a different call_rcu callback based on whether we need to free -> special fields in bpf_local_storage_elem data. The free path, described -> in commit 9db44fdd8105 ("bpf: Support kptrs in local storage maps"), -> only waits for call_rcu callbacks when there are special (kptrs, etc.) -> fields in the map value, hence it is necessary that we only access -> smap in this case. -> - -**[v1: cgroup: bpf: use cgroup_lock()/cgroup_unlock() wrappers](http://lore.kernel.org/bpf/20230303095310.238553-1-kamalesh.babulal@oracle.com/)** - -> Replace mutex_[un]lock() with cgroup_[un]lock() wrappers to stay -> consistent across cgroup core and other subsystem code, while -> operating on the cgroup_mutex. 
-> - -**[v2: bpf-next: libbpf: usdt arm arg parsing support](http://lore.kernel.org/bpf/20230303083706.3597-1-puranjay12@gmail.com/)** - -> Parsing of USDT arguments is architecture-specific; on arm it is -> relatively easy since registers used are r[0-10], fp, ip, sp, lr, -> pc. Format is slightly different compared to aarch64; forms are -> -> - "size @ [ reg, #offset ]" for dereferences, for example -> "-8 @ [ sp, #76 ]" ; " -4 @ [ sp ]" -> - "size @ reg" for register values; for example -> "-4@r0" -> - "size @ #value" for raw values; for example -> "-8@#1" -> -> Add support for parsing USDT arguments for ARM architecture. -> - -**[v3: bpf-next: Transit between BPF TCP congestion controls.](http://lore.kernel.org/bpf/20230303012122.852654-1-kuifeng@meta.com/)** - -> Previously, BPF struct_ops didn't go off, as even when the user -> program creating it was terminated, none of these ever were pinned. -> For instance, the TCP congestion control subsystem indirectly -> maintains a reference count on the struct_ops of any registered BPF -> implemented algorithm. -> - -**[v3: bpf-next: selftests/bpf: Add -Wuninitialized flag to bpf prog flags](http://lore.kernel.org/bpf/20230303005500.1614874-1-davemarchevsky@fb.com/)** - -> Per C99 standard [0], Section 6.7.8, Paragraph 10: -> -> If an object that has automatic storage duration is not initialized -> explicitly, its value is indeterminate. -> -> And in the same document, in appendix "J.2 Undefined behavior": -> -> The behavior is undefined in the following circumstances: -> [...] -> The value of an object with automatic storage duration is used while -> it is indeterminate (6.2.4, 6.7.8, 6.8). -> - -**[v2: bpf-next: bpf: Make bpf_get_current_[ancestor_]cgroup_id() available for all program types](http://lore.kernel.org/bpf/ZAD8QyoszMZiTzBY@slm.duckdns.org/)** - -> These helpers are safe to call from any context and there's no reason to -> restrict access to them. 
Remove them from bpf_trace and filter lists and add -> to bpf_base_func_proto() under perfmon_capable(). -> - -**[v2: bpf-next: bpf: add netfilter program type](http://lore.kernel.org/bpf/20230302172757.9548-1-fw@strlen.de/)** - -> Add minimal support to hook bpf programs to netfilter hooks, -> e.g. PREROUTING or FORWARD. -> -> For this, the most relevant parts for registering a netfilter -> hook via the in-kernel api are exposed to userspace via bpf_link. -> -> The new program type is 'tracing style' and assumes skb dynptrs are used -> rather than 'direct packet access'. -> - -**[v4: bpf-next: Make uprobe attachment APK aware](http://lore.kernel.org/bpf/20230301212308.1839139-1-deso@posteo.net/)** - -> On Android, APKs (android packages; zip packages with somewhat -> prescriptive contents) are first class citizens in the system: the -> shared objects contained in them don't exist in unpacked form on the -> file system. Rather, they are mmaped directly from within the archive -> and the archive is also what the kernel is aware of. -> - -**[v1: bpf-next: selftests/bpf: support custom per-test flags and multiple expected messages](http://lore.kernel.org/bpf/20230301175417.3146070-1-eddyz87@gmail.com/)** - -> This patch allows specifying program flags and multiple verifier log -> messages for the test_loader kind of tests. For example: -> -> tools/testing/selftests/bpf/progs/foobar.c: -> -> SEC("tc") -> __success __log_level(7) -> __msg("first message") -> __msg("next message") -> __flag(BPF_F_ANY_ALIGNMENT) -> int buz(struct __sk_buff *skb) -> { ... } -> - -**[v1: Discard .note.gnu.property in vmlinux](http://lore.kernel.org/bpf/SY4P282MB108446E9ED9FB180AE717D5F9DAD9@SY4P282MB1084.AUSP282.PROD.OUTLOOK.COM/)** - -> When the kernel image is finally linked, all the notes are packed into a -> single .notes section, but these notes may have different alignments.
-> -> binutils above 2.32 adds a ".note.gnu.property" section to the compiled -> output, which is 4-byte aligned on 32-bit, but 8-byte aligned on 64-bit. -> At present, the notes generated by both the ELFNOTE macro and the VDSO -> linker script are 4-byte aligned. -> - -**[v1: bpf-next: libbpf: Use text error for btf_custom_path failures](http://lore.kernel.org/bpf/20230228142531.439324-1-9erthalion6@gmail.com/)** - -> Use libbpf_strerror_r to expand the error when it fails to parse the btf -> file at btf_custom_path. It does not change a lot locally, but since the -> error will bubble up through a few layers, it may become quite -> confusing otherwise. -> - -**[v3: bpf-next: selftests/bpf: Set __BITS_PER_LONG if target is bpf for LoongArch](http://lore.kernel.org/bpf/1677585781-21628-1-git-send-email-yangtiezhu@loongson.cn/)** - -> If the target is bpf, there is no __loongarch__ definition, so __BITS_PER_LONG -> defaults to 32 and __NR_nanosleep is not defined: -> -> #if defined(__ARCH_WANT_TIME32_SYSCALLS) || __BITS_PER_LONG != 32 -> #define __NR_nanosleep 101 -> __SC_3264(__NR_nanosleep, sys_nanosleep_time32, sys_nanosleep) -> #endif -> - -**[v2: bpf-next: Support defragmenting IPv(4|6) packets in BPF](http://lore.kernel.org/bpf/cover.1677526810.git.dxu@dxuuu.xyz/)** - -> In the context of a middlebox, fragmented packets are tricky to handle. -> The full 5-tuple of a packet is often only available in the first -> fragment, which makes enforcing consistent policy difficult. There are -> really only two stateless options, neither of which is very nice: -> -> Enforce policy on first fragment and accept all subsequent fragments. -> - -**[v3: bpf-next: bpf: bpf memory usage](http://lore.kernel.org/bpf/20230227152032.12359-1-laoar.shao@gmail.com/)** - -> Currently we can't get bpf memory usage reliably. bpftool now shows the -> bpf memory footprint, which differs from bpf memory usage. The
The -> - -### 周边技术动态 - -#### Qemu - -**[v11: riscv: Allow user to set the satp mode](http://lore.kernel.org/qemu-devel/20230303131252.892893-1-alexghiti@rivosinc.com/)** - -> This introduces new properties to allow the user to set the satp mode, -> see patch 3 for full syntax. In addition, it prevents cpus to boot in a -> satp mode they do not support (see patch 4). -> -> https://gitlab.com/bonzini/qemu into staging") -> - -**[v1: Risc-V CPU state by hart ID](http://lore.kernel.org/qemu-devel/20230303065055.915652-1-mchitale@ventanamicro.com/)** - -> Currently a Risc-V platform cannot realizes multiple CPUs with non contiguous -> hart IDs because the APLIC, IMSIC and ACLINT emulation code uses the -> contiguous logical CPU ID to fetch per CPU state. -> -> This patchset implements cpu_by_arch_id for Risc-V to get the CPU state -> by hart ID which may be sparse instead of the contigous logical CPU id. -> - -**[v2: hw/riscv/virt.c: add cbo[mz]-block-size fdt properties](http://lore.kernel.org/qemu-devel/20230302091406.407824-1-dbarboza@ventanamicro.com/)** - -> Based-on: 20230224132536.552293-1-dbarboza@ventanamicro.com -> ("v8: riscv: Add support for Zicbo[m,z,p] instructions") -> -> This second version, which is still dependent on: -> - -**[v1: hw/riscv/virt.c: add cbom-block-size fdt property](http://lore.kernel.org/qemu-devel/20230301215902.375217-1-dbarboza@ventanamicro.com/)** - -> I'm sending this almost last minute patch as part of the work done in: -> - -#### Buildroot - -**[[branch/2022.11.x] package/wolfssl: disable assembly when not supported](http://lore.kernel.org/buildroot/20230228153524.51C5486CA7@busybox.osuosl.org/)** - -> commit: https://git.buildroot.net/buildroot/commit/?id=348b2e25df76b9a603639dc9a99c03442ab673e1 -> branch: https://git.buildroot.net/buildroot/commit/?id=refs/heads/2022.11.x -> -> wolfssl contains some assembly code and its configure.ac script -> enables the assembly code depending on the CPU architecture. 
However, -> the detection logic is not sufficient and leads to using the assembly -> code in situation where it should not. -> -> Here are two examples: -> -> - As soon as the architecture is mips64/mips64el, it uses assembly -> code, but that assembly code is not mips64r6 compatible. -> -> - As soon as the architecture is RISC-V, it uses assembly code, but -> that assembly code uses multiplication instructions, without paying -> attention that the "M" extension may not be available in the RISC-V -> CPU instruction set. -> - -#### U-Boot - -**[v3: Basic StarFive JH7110 RISC-V SoC support](http://lore.kernel.org/u-boot/20230303032432.7837-1-yanhong.wang@starfivetech.com/)** - -> This series of patches base on the latest branch/master, and add support -> for the StarFive JH7110 RISC-V SoC and VisionFive V2 board. In order for -> this to be achieved, the respective DT nodes have been added, and the -> required defconfigs have been added to the boards' defconfig. What is more, -> the basic required DM drivers have been added, such as reset, clock, pinctrl, -> uart, ram etc. -> - -**[Question regarding U-boot MultiCore SMP](http://lore.kernel.org/u-boot/44c8dbba-961b-3fb1-1c3e-f196b7a95e20@sysgo.com/)** - -> I am working on the PolarFire RISC-V icicle kit and use u-boot to start -> my application. -> I configured the firmware to start u-boot on all harts (cores) and found -> out that u-boot uses a "HART lottery system" to decide which core/hart -> it runs on. -> In my special case I want u-boot to start on the first hart and the -> other harts shall wait for the interrupt. -> - -**[v1: Kconfig: Sort the BUILD_TARGET list](http://lore.kernel.org/u-boot/20230228062221.489088-1-marek.vasut+renesas@mailbox.org/)** - -> Sort the defaults list in BUILD_TARGET Kconfig option. No functional change. 
-> - -## 20230226: Issue 35 - -### Kernel News - -#### RISC-V Architecture Support - -* [v14: -next: riscv: Add vector ISA support](http://lore.kernel.org/linux-riscv/20230224170118.16766-1-andy.chiu@sifive.com/) - - This patchset is based on the vector 1.0 spec and adds vector support - to the riscv Linux kernel. There are some assumptions for this implementation. - -* [v6: RISC-V: Apply Zicboz to clear_page](http://lore.kernel.org/linux-riscv/20230224162631.405473-1-ajones@ventanamicro.com/) - - When the Zicboz extension is available, we can more rapidly zero naturally - aligned Zicboz block sized chunks of memory. As pages are always page - aligned and are larger than any Zicboz block size will be, - clear_page() appears to be a good candidate for the extension. - -* [v1: RESEND: RISC-V: enable rust](http://lore.kernel.org/linux-riscv/20230224135044.2882109-1-conor.dooley@microchip.com/) - - This is a somewhat blind (and maybe foolish) attempt at enabling Rust - for RISC-V. I've tested this on Icicle, and the modules seem to work. - -* [v1: RISC-V: mm: Support huge page in vmalloc_fault()](http://lore.kernel.org/linux-riscv/20230224104001.2743135-1-dylan@andestech.com/) - - RISC-V supports ioremap() with huge page (pud/pmd) mapping, but - vmalloc_fault() assumes that the vmalloc range is limited to pte - mappings. Add huge page support to complete the vmalloc_fault() function. - -* [v7: riscv: Allow to downgrade paging mode from the command line](http://lore.kernel.org/linux-riscv/20230224100218.1824569-1-alexghiti@rivosinc.com/) - - This new version gets rid of the limitation that prevented KASAN kernels - from using the newly introduced parameters. - -* [v2: RISC-V: Stop emitting attributes](http://lore.kernel.org/linux-riscv/20230223224605.6995-1-palmer@rivosinc.com/) - - The RISC-V ELF attributes don't contain any useful information. New - toolchains ignore them, but they frequently trip up various older/mixed - toolchains. So just turn them off.
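The Zicboz clear_page patch rests on simple arithmetic: a page-aligned page splits into whole, naturally aligned Zicboz blocks. Below is a minimal C sketch of that idea, assuming a 4096-byte page and a 64-byte block (real block sizes are discovered from the platform), with `memset` standing in for a single `cbo.zero` instruction; `zero_block` and `clear_page_by_blocks` are hypothetical names, not the kernel's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EXAMPLE_PAGE_SIZE 4096u /* assumed page size */
#define EXAMPLE_CBOZ_SIZE 64u   /* assumed Zicboz block size; must divide the page size */

/* Stand-in for a single cbo.zero: zero one naturally aligned block. */
static void zero_block(unsigned char *block)
{
    assert((uintptr_t)block % EXAMPLE_CBOZ_SIZE == 0);
    memset(block, 0, EXAMPLE_CBOZ_SIZE);
}

/* Clear a page-aligned page block by block; returns how many blocks were zeroed. */
static size_t clear_page_by_blocks(unsigned char *page)
{
    size_t blocks = 0;

    assert(EXAMPLE_PAGE_SIZE % EXAMPLE_CBOZ_SIZE == 0);
    for (size_t off = 0; off < EXAMPLE_PAGE_SIZE; off += EXAMPLE_CBOZ_SIZE) {
        zero_block(page + off);
        blocks++;
    }
    return blocks;
}
```

Because the page base is page-aligned and the block size divides the page size, every `page + off` is itself block-aligned, which is the property the patch description relies on.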
- -* [v1: RISC-V: avoid build issues for clang/llvm-17 with binutils 2.35](http://lore.kernel.org/linux-riscv/20230223220546.52879-1-conor@kernel.org/) - - Here's an attempted (interim?) fix for issues on v5.10 due to the - presence of zifencei & zicsr in object files. - -* [v2: Use dma_default_coherent for devicetree default coherency](http://lore.kernel.org/linux-riscv/20230223113644.23356-1-jiaxun.yang@flygoat.com/) - - This series splits out the second half of my previous series - "v1: MIPS DMA coherence fixes". - - It intends to use dma_default_coherent to determine the default coherency of - devicetree probed devices instead of hardcoding it with Kconfig options. - -* [v2: Add JH7110 MIPI DPHY RX support](http://lore.kernel.org/linux-riscv/20230223015952.201841-1-changhuang.liang@starfivetech.com/) - - This patchset adds a MIPI DPHY RX driver for the StarFive JH7110 SoC. - It is used to transfer CSI camera data. The series has been tested on - the VisionFive 2 board. - -* [v1: MAINTAINERS: add missing clock driver coverage for Microchip FPGAs](http://lore.kernel.org/linux-riscv/20230222124610.257101-1-conor.dooley@microchip.com/) - - When the CCC support was added, the clock binding coverage was - converted to a regex in commit 71c8517e004b ("MAINTAINERS: update - polarfire soc clock binding"), but the coverage for the clock drivers - themselves was not updated. Rectify that now. - -* [v1: RESEND: scripts/gdb: add lx_current support for riscv](http://lore.kernel.org/linux-riscv/20230222093730.1826523-1-suagrfillet@gmail.com/) - - RISC-V uses the tp register to save the current task_struct address, - as defined by its current(). So lx_current() of riscv just returns the - dereference of the address cast via task_ptr_type. - -* [v17: -next: riscv: Add GENERIC_ENTRY support](http://lore.kernel.org/linux-riscv/20230222033021.983168-1-guoren@kernel.org/) - - The patches convert riscv to use the generic entry infrastructure from - kernel/entry/*.
Some optimizations for entry.S with a new .macro, and merge - ret_from_kernel_thread into ret_from_fork. - -* [v3: RISC-V Hardware Probing User Interface](http://lore.kernel.org/linux-riscv/20230221190858.3159617-1-evan@rivosinc.com/) - - There have been a bunch of off-list discussions about this, including at Plumbers. - - Instead, this patch set takes a very different approach and provides a set - of key/value pairs that encode various bits about the system. - -* [v1: Add PLL clocks driver for StarFive JH7110](http://lore.kernel.org/linux-riscv/20230221141147.303642-1-xingyu.wu@starfivetech.com/) - - This patch series adds a PLL clock driver and modifies - the system clock driver to depend on the PLL clock driver for the - StarFive JH7110 RISC-V SoC. - -* [v2: Add DMA driver for StarFive JH7110 SoC](http://lore.kernel.org/linux-riscv/20230221140424.719-1-walker.chen@starfivetech.com/) - - This patch series adds dma support for the StarFive JH7110 RISC-V SoC. - The first patch adds the device tree binding. The second patch adds the dma - driver. The last patch adds the dma device node to the JH7110 dts. - -* [v3: bpf-next: riscv, bpf: Add kfunc support for RV64](http://lore.kernel.org/linux-riscv/20230221140656.3480496-1-pulehui@huaweicloud.com/) - - This patch adds kernel function call support for RV64. Since the offset - from RV64 kernel and module functions to bpf programs is almost always within - the range of s32, the current infrastructure of RV64 is already - sufficient for kfunc, so let's turn it on. - -* [v1: Add PTP support for sama7g5](http://lore.kernel.org/linux-riscv/20230221092104.730504-1-durai.manickamkr@microchip.com/) - - This patch series is intended to add PTP capability to the GEM and - EMAC for sama7g5.
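The RV64 kfunc patch depends on the distance between a BPF program and the called kernel or module function fitting in a signed 32-bit offset. Below is a minimal sketch of that range check on plain 64-bit addresses. `fits_in_s32` is a hypothetical helper, not the JIT's actual code. A real auipc+jalr pair reaches a range shifted down by about 2 KiB from the exact s32 range, because the low 12-bit immediate is sign-extended.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: does the signed distance from `from` to `to`
 * fit in a signed 32-bit offset? */
static bool fits_in_s32(uint64_t from, uint64_t to)
{
    /* Unsigned subtraction wraps modulo 2^64; reinterpreting the result
     * as int64_t gives the signed distance on two's-complement targets. */
    int64_t off = (int64_t)(to - from);

    return off >= INT32_MIN && off <= INT32_MAX;
}
```

Working on the wrapped unsigned difference keeps the check correct for both forward and backward calls, for example from module space up into the kernel image or the other way round.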
- -* [v2: Add new partial clock and reset drivers for StarFive JH7110](http://lore.kernel.org/linux-riscv/20230221083323.302471-1-xingyu.wu@starfivetech.com/) - - This patch series adds new partial clock drivers and reset - support for System-Top-Group(STG), Image-Signal-Process(ISP) - and Video-Output(VOUT) for the StarFive JH7110 RISC-V SoC. - -* [v4: Basic clock, reset & device tree support for StarFive JH7110 RISC-V SoC](http://lore.kernel.org/linux-riscv/20230221024645.127922-1-hal.feng@starfivetech.com/) - - This patch series adds basic clock, reset & DT support for the StarFive - JH7110 SoC. Patch 17 depends on series [1], which provides pinctrl - dt-bindings. Patch 19 depends on series [2], which provides dt-bindings - of the VisionFive 2 board and JH7110 SoC. - -* [v4: RISC-V Hibernation Support](http://lore.kernel.org/linux-riscv/20230221023523.1498500-1-jeeheng.sia@starfivetech.com/) - - This series adds RISC-V Hibernation/suspend to disk support. - Low-level arch functions were created to support hibernation. - -* [v3: Add watchdog driver for StarFive JH7110 RISC-V SoC](http://lore.kernel.org/linux-riscv/20230220081926.267695-1-xingyu.wu@starfivetech.com/) - - This patch series adds a watchdog driver for the StarFive JH7110 - RISC-V SoC. The first patch adds documentation describing the device - tree bindings. The subsequent patch adds the watchdog driver and supports the - JH7110 SoC. The device tree node will be submitted - after the JH7110 dts merge. This patchset is based on 6.2. - -#### Process Scheduling - -* [v1: net: net/sched: cls_api: Move call to tcf_exts_miss_cookie_base_destroy()](http://lore.kernel.org/lkml/20230224-cls_api-wunused-function-v1-1-12c77986dc2d@kernel.org/) - - Move the call to tcf_exts_miss_cookie_base_destroy() in - tcf_exts_destroy() out of the '#ifdef CONFIG_NET_CLS_ACT', so that it - always appears used to the compiler, while not changing any behavior - with any of the various configuration combinations.
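The cls_api change follows a common C pattern. A `static` function whose only call site sits inside `#ifdef` triggers `-Wunused-function` when the option is off. Below is a minimal sketch of the fix, using hypothetical names and a hypothetical `EXAMPLE_CONFIG_NET_CLS_ACT` macro. It assumes the helper is harmless to call in every configuration, which is what keeps the move behavior-neutral, as the patch description states.

```c
#include <assert.h>

static int cookie_destroy_calls; /* counts helper calls, for the example only */

/* Hypothetical stand-in for tcf_exts_miss_cookie_base_destroy(). */
static void miss_cookie_base_destroy(void)
{
    cookie_destroy_calls++;
}

/* Hypothetical stand-in for tcf_exts_destroy(). */
static void exts_destroy(void)
{
#ifdef EXAMPLE_CONFIG_NET_CLS_ACT
    /* Action-specific teardown would live here. If the helper call were
     * also placed here, builds without the option would see no caller and
     * warn that miss_cookie_base_destroy() is unused. */
#endif
    /* Called outside the #ifdef, so the helper is referenced in every
     * configuration. */
    miss_cookie_base_destroy();
}
```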
- -* [v3: sched/fair: Interleave cfs bandwidth timers for improved single thread performance at low utilization](http://lore.kernel.org/lkml/20230223185918.1500132-1-sshegde@linux.vnet.ibm.com/) - - The CPU cfs bandwidth controller uses an hrtimer. Currently there is no initial - value set. Hence all period timers would align at expiry. - This happens when there are multiple CPU cgroups. - -* [v1: kernel/sched/core.c: Modified prio_less().](http://lore.kernel.org/lkml/CAHOvCC7yjceArav9Ps0v1EP4CjfkrxbfXFgABK54cdFKNoE8iw@mail.gmail.com/) - - The sched_class structures are laid out so that they are sorted by pointer address. - - This matches the sched class priority. - In the prio_less() function in kernel/sched/core.c, - the less value can be determined by pointer operation as follows. - -#### Memory Management - -* [v1: cifs: Improve use of filemap_get_folios_tag()](http://lore.kernel.org/linux-mm/2244151.1677251586@warthog.procyon.org.uk/) - - The inefficiency derived from filemap_get_folios_tag() getting a batch of - contiguous folios in Vishal's change to afs that got copied into cifs can - be reduced by skipping over those folios that have been passed by the start - position rather than going through the process of locking, checking and - trying to write them. - -* [v1: mm: hugetlb_vmemmap: simplify hugetlb_vmemmap_init() a bit](http://lore.kernel.org/linux-mm/20230223065947.64134-1-songmuchun@bytedance.com/) - - The check of IS_ENABLED(CONFIG_PROC_SYSCTL) is unnecessary since - register_sysctl_init() will be empty in this case. So, there are - no warnings after removing the check. - -* [v1: [nvdimm][crash] pmem memmap dump support](http://lore.kernel.org/linux-mm/3c752fc2-b6a0-2975-ffec-dba3edcf4155@fujitsu.com/) - - This mail raises a pmem memmap dump requirement and possible solutions, but they are all still premature. - I really hope you can provide some feedback. - - pmem memmap can also be called pmem metadata here.
- -* [v2: tmpfs: add the option to disable swap](http://lore.kernel.org/linux-mm/20230223024412.3522465-1-mcgrof@kernel.org/) - - This adds noswap support to tmpfs. This follows up on the first RFC [0]; - see that link for details of the testing done. In this - v2 I've addressed the feedback provided by Matthew Wilcox and Yosry Ahmed. - -* [v1: RFC: mm: pagemap: add vma(VM_PFNMAP) support in pagemap_pte_hole()](http://lore.kernel.org/linux-mm/20230223024332.1337578-1-sunke@kylinos.cn/) - - pagemap currently does not support vma(FIXMAP); add support - in pagemap_pte_hole(). - -* [v2: mm: userfaultfd: refactor and add UFFDIO_CONTINUE_MODE_WP](http://lore.kernel.org/linux-mm/20230223005754.2700663-1-axelrasmussen@google.com/) - - The refactors are sorted by increasing controversial-ness, the idea being we - could drop some of the refactors if they are deemed not worth it. - -* [v2: mm/khugepaged: alloc_charge_hpage() take care of mem charge errors](http://lore.kernel.org/linux-mm/20230222195247.791227-1-peterx@redhat.com/) - - If the memory charge fails, instead of returning the hpage but with an error, - allow the function to clean up the folio properly, which is normally what a - function should do in this case - either return successfully, or return - with no side effect of partial runs with an indicated error. - -* [v1: swiotlb: mark swiotlb_memblock_alloc() as __init](http://lore.kernel.org/linux-mm/20230222070411.6186-1-rdunlap@infradead.org/) - - swiotlb_memblock_alloc() calls memblock_alloc(), which calls - (__init) memblock_alloc_try_nid(). However, swiotlb_memblock_alloc() - can be marked as __init since it is only called by swiotlb_init_remap(), - which is already marked as __init.
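The khugepaged patch applies a general error-handling rule: either succeed, or fail with no partial side effects. Below is a minimal sketch of that rule in plain C. It assumes `malloc` stands in for the huge page allocation and a hypothetical `charge()` simulates the memory cgroup charge. None of these names belong to the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

static bool charge_should_fail; /* knob simulating a memcg charge failure */

/* Hypothetical stand-in for charging a page to a memory cgroup. */
static int charge(void *page)
{
    (void)page;
    return charge_should_fail ? -ENOMEM : 0;
}

/* On success, *out holds a charged page and 0 is returned. On failure,
 * nothing stays allocated and *out is NULL: the caller never sees a
 * half-initialized result alongside an error code. */
static int alloc_charge_page(void **out)
{
    void *page = malloc(4096);
    int err;

    *out = NULL;
    if (!page)
        return -ENOMEM;
    err = charge(page);
    if (err) {
        free(page); /* undo the allocation instead of handing it back */
        return err;
    }
    *out = page;
    return 0;
}
```

Because the function cleans up after itself, callers only have to check the return value. They never need to inspect `*out` to decide whether there is something to release.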
- -* [v8: tracing/user_events: Remote write ABI](http://lore.kernel.org/linux-mm/20230221211143.574-1-beaub@linux.microsoft.com/) - - As part of the discussions for user_events aligned with user space - tracers, it was determined that user programs should register an aligned - value to set or clear a bit when an event becomes enabled. Currently a - shared page is being used that requires mmap(). Remove the shared page - implementation and move to a user-registered address implementation. - -* [v1: dmapool: push new blocks in ascending order](http://lore.kernel.org/linux-mm/20230221165400.1595247-1-kbusch@meta.com/) - - Some users of the dmapool need their allocations to happen in ascending - order. The recent optimizations pushed the blocks in reverse order, so - restore the previous behavior by linking the next available block from - low-to-high. - -* [Re: v1: mm/memcontrol: add memory.peak in cgroup root](http://lore.kernel.org/linux-mm/DB4PR02MB93344BAA949FA7E25E298C90FEA59@DB4PR02MB9334.eurprd02.prod.outlook.com/) - - Thanks for the quick response! I think we are just trying to get the same value that was available for us in cgroup v1 memory.max_usage_in_bytes. I guess this value is also incomplete for representing the system memory usage. Is it due to the incompleteness that memory.peak has been left out in the root of cgroup v2? - -* [v1: mm/hwpoison: convert TTU_IGNORE_HWPOISON to TTU_HWPOISON](http://lore.kernel.org/linux-mm/20230221085905.1465385-1-naoya.horiguchi@linux.dev/) - - After a memory error happens on a clean folio, a process unexpectedly - receives SIGBUS when it accesses the error page. This SIGBUS killing - is pointless and simply degrades the level of RAS of the system, because - the clean folio can be dropped without any data loss on memory error - handling as we do for a clean pagecache.
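The dmapool patch is about the order in which free blocks are linked. Below is a minimal sketch, assuming an intrusive singly linked free list. `struct block`, `link_by_push`, `link_ascending` and `pop_block` are hypothetical names rather than dmapool's internals. The sketch contrasts pushing each block onto the list head, which reverses the order, with linking each block to its higher neighbor, which keeps allocations low-to-high.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical intrusive free-list node placed at the start of each free block. */
struct block {
    struct block *next;
};

/* Push-to-head while walking blocks low-to-high: the list ends up reversed,
 * so the highest-addressed block is handed out first. */
static struct block *link_by_push(struct block *blocks, size_t n)
{
    struct block *head = NULL;

    for (size_t i = 0; i < n; i++) {
        blocks[i].next = head;
        head = &blocks[i];
    }
    return head;
}

/* Link each block to the next higher one: allocations come out in
 * ascending address order. */
static struct block *link_ascending(struct block *blocks, size_t n)
{
    if (n == 0)
        return NULL;
    for (size_t i = 0; i + 1 < n; i++)
        blocks[i].next = &blocks[i + 1];
    blocks[n - 1].next = NULL;
    return &blocks[0];
}

/* Pop the next free block, or NULL if the list is empty. */
static struct block *pop_block(struct block **head)
{
    struct block *b = *head;

    if (b)
        *head = b->next;
    return b;
}
```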
* [v1: mm: slub: make kobj_type structure constant](http://lore.kernel.org/linux-mm/20230220-kobj_type-mm-slub-v1-1-5ae49b96d9aa@weissschuh.net/)

    Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.") the driver core allows the usage of const struct kobj_type.

    Take advantage of this to constify the structure definition and prevent modification at runtime.

* [v1: mm/zsmalloc: Split zsdesc from struct page](http://lore.kernel.org/linux-mm/20230220132218.546369-1-42.hyeyoo@gmail.com/)

    The purpose of this series is to define a dedicated memory descriptor for zsmalloc, instead of reusing various fields of struct page. This is part of the effort to reduce the size of struct page to unsigned long and enable dynamic allocation of memory descriptors.

* [v2: Add tests for memblock_alloc_node()](http://lore.kernel.org/linux-mm/59d4745b-7b2-bf6-7b8-f6571d78d336@mail.polimi.it/)

    This test verifies that memblock_alloc_node() works as expected, i.e. that it sets the correct NUMA node for the newly allocated region. memblock_alloc_node() is called directly, without using any stub. The core check compares the requested NUMA node with the `nid` field inside the memblock_region structure.

* [v10: cachestat: a new syscall for page cache state of files](http://lore.kernel.org/linux-mm/20230219073318.366189-1-nphamcs@gmail.com/)

    There is currently no good way to query the page cache state of large file sets and directory trees. There is mincore(), but it scales poorly: the kernel writes out a lot of bitmap data that userspace has to aggregate, when the user really does not care about per-page information in that case. The user also needs to mmap and unmap each file along the way, which can be quite slow as well.

#### File Systems

* [v1: SSDFS: flash-friendly LFS file system for ZNS SSD](http://lore.kernel.org/linux-fsdevel/20230225010927.813929-1-slava@dubeyko.com/)

    I am completely aware that the patchset is big, and I am open to any advice on how to split it into reasonable portions so that SSDFS can be introduced for review. Even now, I have excluded the code of several subsystems to make the patchset slightly smaller. I could potentially introduce SSDFS in smaller portions with limited functionality. However, that could be confusing and make it hard to understand how the declared goals are achieved by the implemented functionality.

* [[RESEND v2 PATCH] init/do_mounts.c: add virtiofs root fs support](http://lore.kernel.org/linux-fsdevel/20230224143751.36863-1-david@ixit.cz/)

    Make it possible to boot directly from a virtiofs file system with tag 'myfs' using the following kernel parameters:

        rootfstype=virtiofs root=myfs rw

    Booting directly from virtiofs makes it possible to use a directory on the host as the root file system. This is convenient for testing and for situations where manipulating disk image files is cumbersome.

* [git pull: vfs.git misc bits](http://lore.kernel.org/linux-fsdevel/Y%2FgxyQA+yKJECwyp@ZenIV/)

    That should cover the rest of what I had in -next; I'd been sick for several weeks, so a lot of pending stuff I hoped to put into -next is going to miss this window ;-/

    Al, off to deal with the remaining pile in the mailbox...

* [GIT PULL: iomap: new code for 6.3](http://lore.kernel.org/linux-fsdevel/167703901677.1909640.1798642413122202835.stg-ugh@magnolia/)

    Please pull this branch with changes for iomap for 6.3-rc1. This is mostly rearranging things to make life easier for gfs2, nothing all that mind-blowing for this release.

    As usual, I did a test-merge with the main upstream branch as of a few minutes ago, and didn't see any conflicts. Please let me know if you encounter any problems.

* [v1: Minor documentation clean-up in fs](http://lore.kernel.org/linux-fsdevel/20230220170210.15677-1-lukas.bulwahn@gmail.com/)

    Please pick this minor documentation clean-up in fs. It is not in the Documentation directory, but I would consider these README files to be part of the unsorted, widely scattered kernel documentation.

* [v7: Implement copy offload support](http://lore.kernel.org/linux-fsdevel/20230220105336.3810-1-nj.shetty@samsung.com/)

    The patch series covers the points discussed in the November 2021 virtual call [LSF/MM/BFP TOPIC] Storage: Copy Offload [0]. This patchset covers the initially agreed requirements, plus additional features suggested by the community. The patchset borrows Mikulas's token-based approach for the two-bdev implementation.

* [v1: blk: optimization for classic polling](http://lore.kernel.org/linux-fsdevel/3578876466-3733-1-git-send-email-nj.shetty@samsung.com/)

    This removes the dependency on interrupts to wake up the task. While polling for I/O completion, set the task state to TASK_RUNNING if need_resched() returns true. Previously, the polling task used to sleep and rely on an interrupt to wake it up, which made some I/O take very long when interrupt coalescing was enabled in NVMe.

#### Network Devices

* [v12: bpf-next: Add skb + xdp dynptrs](http://lore.kernel.org/netdev/20230226085120.3907863-1-joannelkoong@gmail.com/)

    This patchset is the second in the dynptr series. The first can be found here [0].

    Comparing the runtime of packet parsing without dynptrs vs. with dynptrs shows no noticeable difference. Patch 9 contains more details as well as examples of how to use skb and xdp dynptrs.

* [v1: r8169: disable ASPM during NAPI poll](http://lore.kernel.org/netdev/af076f1f-a034-82e5-8f76-f3ec32a14eaa@gmail.com/)

    This is a rework of ideas from Kai-Heng on how to avoid the known ASPM issues while still allowing the maximum ASPM-related power savings. As a prerequisite, some locking is added first.
* [v9: net-next: r8169: Temporarily disable ASPM on NAPI poll](http://lore.kernel.org/netdev/20230225034635.2220386-1-kai.heng.feng@canonical.com/)

    The series temporarily disables ASPM during NAPI poll, so the NIC can regain the performance lost when ASPM is enabled. The idea comes from the "dynamic ASPM" feature of Realtek's vendor driver.

    We have had the "dynamic ASPM" mechanism in the Ubuntu 22.04 LTS kernel for quite a while, and AFAIK it hasn't introduced any regression so far.

* [v1: iproute2: genl: print caps for all families](http://lore.kernel.org/netdev/20230225003754.1726760-1-kuba@kernel.org/)

    Back in 2006, kernel commit 334c29a64507 ("[GENETLINK]: Move command capabilities to flags.") removed some attributes and moved the capabilities to flags. The corresponding iproute2 commit 26328fc3933f ("Add controller support for new features exposed") added the ability to print those caps.

    Printing is gated on the version of the family, but we're checking the version of each individual family rather than that of the control family. The format of attributes in the control family is dictated by the version of the control family alone.

* [v3: Self-encapsulate the thermal zone device structure](http://lore.kernel.org/netdev/20230224210634.3994365-1-daniel.lezcano@linaro.org/)

    The exported thermal headers expose the thermal core structure, although it should be private to the framework. The initial idea was that thermal sensor drivers would use the thermal zone device structure pointer like a handle, passing it from the ops to the thermal framework API.

* [v2: kernel: Clear workqueue to avoid use-after-free](http://lore.kernel.org/netdev/20230224195313.1877313-1-jiangzp@google.com/)

    After the hci_sync rework, cmd_sync_work was cleared when calling hci_unregister_dev, but not when powering off the adapter. Use-after-free errors happen when a work item is still scheduled while cmd is freed by __mgmt_power_off.
* [v5: Another crack at a handshake upcall mechanism](http://lore.kernel.org/netdev/167726551328.5428.13732817493891677975.stgit@91.116.238.104.host.secureserver.net/)

    Here is v5 of a series to add generic support for transport layer security handshakes on behalf of kernel socket consumers (user space consumers use a security library directly, of course).

* [v1: net: avoid indirect memory pressure calls](http://lore.kernel.org/netdev/20230224184606.7101-1-fw@strlen.de/)

    There is a noticeable TCP performance regression (loopback or cross-netns), seen with iperf3 -Z (sendfile mode) when generic retpolines are needed.

    With the SK_RECLAIM_THRESHOLD checks gone, calls to enter/leave memory pressure happen much more often.

* [v1: net: Regressions in Ocelot switch drivers](http://lore.kernel.org/netdev/20230224155235.512695-1-vladimir.oltean@nxp.com/)

    These are 3 patches which resolve a regression in the Seville driver, one in the Felix driver, and a generic one which affects any kernel compiled with 2 Kconfig options enabled. All of them have in common my lack of attention during review/testing. The patches touch the DSA, MFD and MDIO drivers for Ocelot. I think it would be preferable if all patches went through netdev (with Lee's Ack).

* [v1: brcmfmac: pcie: Add 4359C0 firmware definition](http://lore.kernel.org/netdev/20230224-topic-brcm_tone-v1-1-333b0ac67934@linaro.org/)

    Some phones from around 2016, as well as other random devices, have this chip called 43956 or 4359C0 or 43596A0, which is more or less just a revision bump (v9) of the already-supported 4359. Add a corresponding firmware definition to allow choosing the correct blob.

* [v1: net-next: packet: allow MSG_NOSIGNAL in recvmsg](http://lore.kernel.org/netdev/20230224071745.20717-1-equinox@diac24.net/)

    packet_recvmsg() whitelists a bunch of MSG_* flags, which notably does not include MSG_NOSIGNAL. Unfortunately, io_uring always sets MSG_NOSIGNAL, meaning AF_PACKET sockets can't be used with io_uring recvmsg().

* [v1: linux-next: selftests: net: udpgso_bench_tx: Add test for IP fragmentation of UDP packets](http://lore.kernel.org/netdev/202302241438536013777@zte.com.cn/)

    The UDP GSO bench only tests the performance of userspace payload splitting and UDP GSO. But we are also concerned about the performance comparing with I.

* [v2: net-next: sfc: support offloading TC VLAN push/pop actions to the MAE](http://lore.kernel.org/netdev/20230223235026.26066-1-edward.cree@amd.com/)

    EF100 can pop and/or push up to two VLAN tags.

* [v1: net-next: ibmvnic: Assign XPS map to correct queue index](http://lore.kernel.org/netdev/20230223153944.44969-1-nnac123@linux.ibm.com/)

    When setting the XPS map value for TX queues, use the index of the transmit queue. Previously, the function passed the index of the loop that iterates over all queues (RX and TX), which produced invalid XPS map values.

* [v1: net: net/sched: act_connmark: handle errno on tcf_idr_check_alloc](http://lore.kernel.org/netdev/20230223141639.13491-1-pctammela@mojatatu.com/)

    Smatch reports that 'ci' can be used uninitialized. The current code ignores the errno returned by tcf_idr_check_alloc, which leads to incorrect usage of 'ci'. Handle the errno as it should be handled.

* [[net PATCH v2] octeontx2-af: Unlock contexts in the queue context cache in case of fault detection](http://lore.kernel.org/netdev/20230223110125.2172509-1-saikrishnag@marvell.com/)

    NDC caches the contexts of frequently used queues (Rx and Tx queues). Due to a hardware erratum, when NDC detects a fault/poison while accessing contexts, it can go into an illegal state where a cache line gets locked forever. To make sure all cache lines in NDC are available for optimum performance upon fault/lock error/poison errors, scan through all cache lines in NDC and clear the lock bit.
* [v5: Bluetooth: NXP: Add protocol support for NXP Bluetooth chipsets](http://lore.kernel.org/netdev/20230223103614.4137309-4-neeraj.sanjaykale@nxp.com/)

    This adds a serdev-based driver for the NXP BT serial protocol running over H:4, which enables the built-in Bluetooth device inside an NXP BT chip.

    This driver has a power save feature that puts the chip into a sleep state whenever there is no activity for 2000 ms; the chip is woken up when any activity is to be initiated over UART.

* [v5: Add support for NXP bluetooth chipsets](http://lore.kernel.org/netdev/20230223103614.4137309-1-neeraj.sanjaykale@nxp.com/)

    This patch adds a driver for NXP bluetooth chipsets.

    The driver is based on the H4 protocol and uses serdev APIs. It supports a host-to-chip power save feature: the host puts the chip into a sleep state by asserting break on the UART TX line.

* [[REGRESSION PATCH RFC] net: phy: don't resume PHY via MDIO when iface is not up](http://lore.kernel.org/netdev/20230223070519.2211-1-wsa+renesas@sang-engineering.com/)

    TL;DR: Commit 96fb2077a517 ("net: phy: consider that suspend2ram may cut off PHY power") caused regressions for us when resuming an interface which is not up. It turns out the problem is a different one, and the above commit only makes it visible. The attached patch is probably not the right fix, but at least it proves my assumptions AFAICS.

* [v1: net: add no-op for napi_busy_loop if CONFIG_NET_RX_BUSY_POLL=n](http://lore.kernel.org/netdev/20230223012258.1701175-1-jacob.e.keller@intel.com/)

    Commit 7db6b048da3b ("net: Commonize busy polling code to focus on napi_id instead of socket") introduced napi_busy_loop and refactored sk_busy_loop to call this new function. The commit removed the no-op implementation of sk_busy_loop in the #else block for CONFIG_NET_RX_BUSY_POLL, and placed the declaration of napi_busy_poll inside the # block where sk_busy_loop used to be declared.

* [v2: net-next: mlx5 technical debt of hairpin params](http://lore.kernel.org/netdev/20230222230202.523667-1-saeed@kernel.org/)

    As previously discussed, this series switches hairpin configuration from debugfs to devlink params.

    Per the discussion in [1], move the hairpin queues control (number and size) from debugfs to devlink.

    [1] https://lore.kernel.org/all/20230111194608.7f15b9a1@kernel.org/

* [v1: 5.4/5.10: mac80211: mesh: embedd mesh_paths and mpp_paths into ieee80211_if_mesh](http://lore.kernel.org/netdev/20230222200301.254791-1-pchelkin@ispras.ru/)

    The null-ptr-deref problem fixed in the following patch is hit on older branches.

    The patch initially failed to backport into stable branches older than 5.15 because of the spelling-fix commit ab4040df6efb ("mac80211: fix some spelling mistakes").

* [v1: net: sunhme: Return an error when we are out of slots](http://lore.kernel.org/netdev/20230222170935.1820939-1-seanga2@gmail.com/)

    We only allocate enough space for four devices when the parent is a QFE. If we couldn't find a spot (because five devices were created for whatever reason), we would not return an error from probe(). Return ENODEV, which is what we did before.

* [v1: can: esd_usb: Improve code readability by means of replacing struct esd_usb_msg with a union](http://lore.kernel.org/netdev/20230222163754.3711766-1-frank.jungclaus@esd.eu/)

    As suggested by Vincent Mailhol, declare struct esd_usb_msg as a union instead of a struct. Then replace all msg->msg.something constructs that use esd_usb_msg with the simpler and prettier msg->something variants.
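
The esd_usb item above replaces a struct wrapper with a union. Below is a minimal sketch of the pattern. It assumes a hypothetical message layout in which every variant begins with the same `len`/`cmd` bytes. Code can then dispatch on the common header and access a variant directly as `msg->rx.dlc`, without the extra `msg->msg.` level. The type and field names are illustrative, not the driver's real definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command IDs. */
enum { DEMO_CMD_RX = 1, DEMO_CMD_TX = 2 };

/* Common header shared by every message variant. */
struct demo_hdr {
    uint8_t len;
    uint8_t cmd;
};

struct demo_rx_msg {
    uint8_t len;
    uint8_t cmd;
    uint8_t net;
    uint8_t dlc;
    uint32_t id;
};

struct demo_tx_msg {
    uint8_t len;
    uint8_t cmd;
    uint8_t net;
    uint8_t dlc;
    uint32_t hnd;
    uint32_t id;
};

/* Each variant must start with the same len/cmd layout as the header. */
static_assert(offsetof(struct demo_rx_msg, cmd) == offsetof(struct demo_hdr, cmd),
              "rx variant must share the header layout");
static_assert(offsetof(struct demo_tx_msg, cmd) == offsetof(struct demo_hdr, cmd),
              "tx variant must share the header layout");

/* A union rather than a struct wrapping a union: msg->rx, not msg->msg.rx. */
union demo_usb_msg {
    struct demo_hdr hdr;
    struct demo_rx_msg rx;
    struct demo_tx_msg tx;
};

/* Dispatch on the common header, then read the variant directly. */
static int demo_msg_dlc(const union demo_usb_msg *msg)
{
    switch (msg->hdr.cmd) {
    case DEMO_CMD_RX:
        return msg->rx.dlc;
    case DEMO_CMD_TX:
        return msg->tx.dlc;
    default:
        return -1;
    }
}
```

The union is as large as its biggest variant, so a single buffer can hold any message type.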
* [v2: gro: optimise redundant parsing of packets](http://lore.kernel.org/netdev/20230222145917.GA12590@debian/)

    The first commit frees up space in the GRO CB. The second commit reduces the redundant parsing during the complete phase, using the freed CB space.

    In addition, the second commit contains a fix for a potential problem in BIG TCP, which is detailed in the commit message itself.

* [v3: octeontx2-pf: Use correct struct reference in test condition](http://lore.kernel.org/netdev/Y%2FYYkKddeHOt80cO@ubun2204.myguest.virtualbox.org/)

    Fix the typo/copy-paste error by replacing the struct variable name ah_esp_mask with ah_esp_hdr. The issue was identified using the doublebitand.cocci Coccinelle semantic patch.

* [[net PATCH v2] octeontx2-pf: Recalculate UDP checksum for ptp 1-step sync packet](http://lore.kernel.org/netdev/20230222113600.1965116-1-saikrishnag@marvell.com/)

    When checksum offload is disabled in the driver via ethtool, PTP 1-step sync packets carry an incorrect checksum, because the stack calculates the checksum before the driver updates the PTP timestamp field in the packet. As a result, PTP packets get dropped at the other end. This patch fixes the issue by recalculating the UDP checksum after updating the PTP timestamp field in the driver.

* [v3: net-next: net: virtio_net: implement exact header length guest feature](http://lore.kernel.org/netdev/20230222080638.382211-1-jiri@resnulli.us/)

    The virtio spec introduced a feature, VIRTIO_NET_F_GUEST_HDRLEN, which when set indicates that the device benefits from knowing the exact size of the header. For compatibility, the driver also needs to set this feature to signal to the device that the header is reliable. Without this feature set by the driver, the device has to figure out the header size itself.
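
In the octeontx2-pf PTP item above, the checksum computed by the stack becomes stale once the driver rewrites the timestamp, so it has to be recomputed over the modified payload. Below is a minimal sketch of a full IPv4 UDP checksum computation: the RFC 768 pseudo-header plus the RFC 1071 one's-complement sum. It assumes a flat byte buffer whose checksum field is already zeroed and IPv4 addresses in host byte order. This is a generic illustration, not the driver's implementation, and the `demo_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add bytes to a one's-complement accumulator as big-endian 16-bit
 * words; an odd trailing byte is padded with a zero byte. */
static uint32_t demo_csum_add(uint32_t sum, const uint8_t *data, size_t len)
{
    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }
    if (len)
        sum += (uint32_t)data[0] << 8;
    return sum;
}

/* Fold the carries back into 16 bits and take the complement. */
static uint16_t demo_csum_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/*
 * Checksum over the IPv4 pseudo-header plus the UDP header and payload.
 * udp points at udp_len bytes whose checksum field (bytes 6-7) is zero;
 * saddr/daddr are IPv4 addresses in host byte order.
 */
static uint16_t demo_udp4_checksum(uint32_t saddr, uint32_t daddr,
                                   const uint8_t *udp, uint16_t udp_len)
{
    uint32_t sum = 0;
    uint16_t csum;

    assert(udp_len >= 8); /* must contain at least a UDP header */

    /* Pseudo-header: source, destination, protocol 17, UDP length. */
    sum += saddr >> 16;
    sum += saddr & 0xffff;
    sum += daddr >> 16;
    sum += daddr & 0xffff;
    sum += 17;
    sum += udp_len;

    sum = demo_csum_add(sum, udp, udp_len);
    csum = demo_csum_fold(sum);

    /* RFC 768: a computed checksum of zero is transmitted as all ones. */
    return csum ? csum : 0xffff;
}
```

On the receiving side, summing the pseudo-header and the datagram with the checksum in place and folding the result yields zero. That is why a checksum left stale after the timestamp rewrite causes the packet to be dropped.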
* [v3: net: stmmac: Premature loop termination check was ignored](http://lore.kernel.org/netdev/87y1oq5es0.fsf@henneberg-systemdesign.com/)

    The premature loop termination check makes sense only for the jump to read_again, where the count may have been updated. But read_again did not include the check.

* [[net PATCH] octeontx2-af: Unlock contexts in the queue context cache in case of fault detection](http://lore.kernel.org/netdev/20230222065921.1852686-1-saikrishnag@marvell.com/)

    NDC caches the contexts of frequently used queues (Rx and Tx queues). Due to a hardware erratum, when NDC detects a fault/poison while accessing contexts, it can go into an illegal state where a cache line gets locked forever. To make sure all cache lines in NDC are available for optimum performance upon fault/lock error/poison errors, scan through all cache lines in NDC and clear the lock bit.

* [v11: bpf-next: Add skb + xdp dynptrs](http://lore.kernel.org/netdev/20230222060747.2562549-1-joannelkoong@gmail.com/)

    Comparing the runtime of packet parsing without dynptrs vs. with dynptrs shows no noticeable difference. Patch 9 contains more details as well as examples of how to use skb and xdp dynptrs.

#### Security Hardening

* [v1: next: usb: host: oxu210hp-hcd: Replace fake flex-array with flexible-array member](http://lore.kernel.org/linux-hardening/Y%2FgynI9Wv8RZTD8M@work/)

    Zero-length arrays used as fake flexible arrays are deprecated, and we are moving towards adopting C99 flexible-array members instead.

    Transform the zero-length array into a flexible-array member in struct ehci_regs.
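
The oxu210hp-hcd item above converts a zero-length array into a C99 flexible-array member. Below is a minimal user-space sketch of the difference. It uses a hypothetical `struct demo_regs`, not the real struct ehci_regs layout, and `demo_regs_alloc()` is an illustrative helper. Kernel code would typically compute the allocation size with the `struct_size()` helper rather than by hand.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical register block with a variable-length tail.
 * Deprecated "fake" flexible array:   uint32_t port_status[0];
 * C99 flexible-array member (below):  uint32_t port_status[];
 */
struct demo_regs {
    uint32_t command;
    uint32_t status;
    uint32_t port_status[]; /* must be the last member */
};

/* The flexible-array member contributes nothing to sizeof. */
static_assert(sizeof(struct demo_regs) == 2 * sizeof(uint32_t),
              "flexible array adds no size");

/* Allocate the fixed header plus nports zeroed trailing elements. */
static struct demo_regs *demo_regs_alloc(size_t nports)
{
    /* Refuse sizes whose computation would overflow size_t. */
    if (nports > (SIZE_MAX - sizeof(struct demo_regs)) / sizeof(uint32_t))
        return NULL;
    return calloc(1, sizeof(struct demo_regs) + nports * sizeof(uint32_t));
}
```

Unlike `[0]`, the `[]` form is standard C, and it lets compilers and fortified helpers reason about the array bounds.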
* [GIT PULL: flexible-array transformations for 6.3-rc1](http://lore.kernel.org/linux-hardening/Y%2FfnjS5eHNauiUUR@work/)

    The following changes since commit 88603b6dc419445847923fcb7fe5080067a30f98:

        Linux 6.2-rc2 (2023-01-01 13:53:16 -0800)

    are available in the Git repository at:

        git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux.git tags/flex-array-transformations-6.3-rc1

    for you to fetch changes up to b942a520d9e43bc31f0808d2f2267a1ddba75518:

        bcache: Replace zero-length arrays with DECLARE_FLEX_ARRAY() helper (2023-01-05 17:48:45 -0600)

    flexible-array transformations for 6.3-rc1

    Please pull the following patches, which transform zero-length arrays in unions into flexible arrays. These patches have been baking in linux-next for the whole development cycle.

* [v1: wifi: iwlwifi: dvm: Add struct_group for struct iwl_keyinfo keys](http://lore.kernel.org/linux-hardening/20230218191056.never.374-kees@kernel.org/)

    Function iwlagn_send_sta_key() was trying to write across multiple structure members in a single memcpy(). Add a struct group "keys" to let the compiler see the intended bounds of the memcpy, which include the tkip keys as well. This silences a false-positive memcpy() run-time warning:

        memcpy: detected field-spanning write (size 32) of single field "sta_cmd.key.key" at drivers/net/wireless/intel/iwlwifi/dvm/sta.c:1103 (size 16)

#### Async I/O

* [v3: io_uring: Add KASAN support for alloc caches](http://lore.kernel.org/io-uring/20230223164353.2839177-1-leitao@debian.org/)

    This patchset enables KASAN for alloc cache buffers. These buffers are used by the apoll and netmsg code paths. They will now be poisoned when not in use, so if they are randomly touched, a KASAN warning will pop up.
* [v1: for-next: io_uring: registered huge buffer optimisations](http://lore.kernel.org/io-uring/cover.1677041932.git.asml.silence@gmail.com/)

    Improve support for registered buffers consisting of huge pages by keeping them as a single-element bvec instead of chunking them into 4K pages. This improves performance quite a bit, cutting CPU cycles spent on DMA mapping and promoting more efficient use of the hardware.

* [v2: Add io_uring & ebpf based methods to implement zero-copy for ublk](http://lore.kernel.org/io-uring/20230222132534.114574-1-xiaoguang.wang@linux.alibaba.com/)

    Normally, userspace block device implementations need to copy data between the kernel block layer's I/O requests and the userspace daemon of the block device. For example, ublk and tcmu both have similar logic, but this copying consumes CPU resources, especially for large I/O.

* [v1: tools/io_uring: tools/io_uring: correctly set "ret" for sq_poll case](http://lore.kernel.org/io-uring/20230221073736.628851-1-ZiyangZhang@linux.alibaba.com/)

    In the sq_poll case, "ret" is never initialized, cleared, or set. As a result, the output of this test program is incorrect, and the program cannot even be stopped by pressing CTRL-C.

    Reset "ret" to zero in each submission/completion round, and assign "ret" to "this_reap".

* [v1: liburing: test sends with huge pages](http://lore.kernel.org/io-uring/cover.1676941370.git.asml.silence@gmail.com/)

    Add huge page support to the zc send benchmark, and huge page tests to send-zerocopy.c.

* [v1: liburing: test/buf-ring: add test for buf ring occupying exactly one page](http://lore.kernel.org/io-uring/20230218184618.70966-1-wlukowicz01@gmail.com/)

    This shows an issue with how the kernel calculates buffer ring sizes during their registration.

    Allocate two pages and register a buf ring that fully occupies the first one, while protecting the second one to make sure it's not used. The registration should succeed.
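
The buf-ring test above registers a ring that fills exactly one page. Below is a sketch of the size arithmetic it exercises, assuming 4 KiB pages. `struct demo_uring_buf` mirrors the 16-byte layout of `struct io_uring_buf` from the io_uring UAPI header, and `demo_ring_pages()` is a hypothetical helper, not the kernel's registration code. With 16-byte entries, 256 entries fill exactly 4096 bytes, so a correct round-up must give one page, not two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the layout of struct io_uring_buf in the io_uring UAPI header. */
struct demo_uring_buf {
    uint64_t addr;
    uint32_t len;
    uint16_t bid;
    uint16_t resv;
};

static_assert(sizeof(struct demo_uring_buf) == 16, "ring entries are 16 bytes");

/* Number of pages spanned by a ring of 'entries' buffers, rounding up. */
static size_t demo_ring_pages(unsigned int entries, size_t page_size)
{
    size_t bytes = (size_t)entries * sizeof(struct demo_uring_buf);

    return (bytes + page_size - 1) / page_size;
}
```

Any off-by-one in this rounding would make the kernel touch the protected second page, which is exactly what the test's guard page is meant to catch.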
#### Rust For Linux

* [v1: rust: xarray: Add an abstraction for XArray](http://lore.kernel.org/rust-for-linux/20230224-rust-xarray-v1-1-80f0904ce5d3@asahilina.net/)

    The XArray is an abstract data type which behaves like a very large array of pointers. Add a Rust abstraction for this data type.

    The initial implementation uses explicit locking on get operations and returns a guard which blocks mutation, ensuring that the referenced object remains alive.

* [v1: rust: time: New module for timekeeping functions](http://lore.kernel.org/rust-for-linux/20230221-gpu-up-time-v1-1-bf8fe74b7f55@asahilina.net/)

    This module is intended to contain functions related to kernel timekeeping and time. Initially, it just wraps ktime_get() and ktime_get_boottime() and returns their values as core::time::Duration instances. This is useful for drivers that need to implement simple retry loops and timeouts.

#### BPF

* [v3: bpf-next: Add support for kptrs in more BPF maps](http://lore.kernel.org/bpf/20230225154010.391965-1-memxor@gmail.com/)

    This set adds support for kptrs in percpu hashmaps, percpu LRU hashmaps, and local storage maps (covering sk, cgrp, task, inode).

    The tests are expanded to exercise more existing maps at runtime, and also to test the code path for the local storage maps (which is shared by all implementations).

* [v2: bpf-next: Add socket destroy capability](http://lore.kernel.org/bpf/20230223215311.926899-1-aditi.ghag@isovalent.com/)

    This patch adds the capability to destroy sockets in BPF. We plan to use the capability in Cilium to force client sockets to reconnect when their remote load-balancing backends are deleted. The other use case is on-the-fly policy enforcement, where existing socket connections prevented by policies need to be terminated.
* [v3: blk-ioprio: Introduce promote-to-rt policy](http://lore.kernel.org/bpf/20230223134852.3745349-1-houtao@huaweicloud.com/)

    Since commit a78418e6a04c ("block: Always initialize bio IO priority on submit"), bio->bi_ioprio is never IOPRIO_CLASS_NONE when blkcg_set_ioprio() is called. There is therefore no way to promote the I/O priority of a cgroup to IOPRIO_CLASS_RT, because bi_ioprio will always be greater than or equal to IOPRIO_CLASS_RT.

* [bpf: RFC for platform specific BPF helper addition](http://lore.kernel.org/bpf/0838bc96-c8a8-c326-a8f0-80240cf6b31a@linux.intel.com/)

    Some background first: on x86 platforms there is a free-running TSC counter which can be used to generate extremely accurate profiling timestamps. Currently BPF programs can use it by hooking into the perf subsystem and reading the value there. However, this reduces accuracy due to the latency and jitter of the long execution chain. The timebase is also converted to be relative to the start of the program's execution, instead of being an absolute system-level value.

* [v2: bpf-next: Transit between BPF TCP congestion controls.](http://lore.kernel.org/bpf/20230223011238.12313-1-kuifeng@meta.com/)

    Previously, BPF struct_ops did not go away even after the user program that created them was terminated, although none of them were ever pinned. For instance, the TCP congestion control subsystem indirectly maintains a reference count on the struct_ops of any registered BPF-implemented algorithm. Thus, the algorithm won't be deactivated until someone deliberately unregisters it.

* [[RFC/PATCHSET 0/8] perf record: Implement BPF sample filter (v3)](http://lore.kernel.org/bpf/20230222230141.1729048-1-namhyung@kernel.org/)

    There have been requests for more sophisticated perf event sample filtering based on the sample data. The kernel recently added the ability for BPF programs to access perf sample data, and this is the userspace part that enables such filtering.

    This still has some rough edges and needs more improvements, but I'd like to share the current work and get some feedback on the direction and ideas for further improvements.

* [v2: bpf-next: bpf: bpf memory usage](http://lore.kernel.org/bpf/20230222014553.47744-1-laoar.shao@gmail.com/)

    Currently we can't get bpf memory usage reliably. bpftool now shows the bpf memory footprint, which is different from bpf memory usage. The

* [v1: bpf-next: bpf: Add bpf_cgroup_from_id() kfunc](http://lore.kernel.org/bpf/Y%2FVA+jP0mB5cMZEz@slm.duckdns.org/)

    A cgroup ID is a userspace-visible 64-bit value that uniquely identifies a given cgroup. As the IDs are used widely, it's useful to be able to look up the matching cgroups. Add bpf_cgroup_from_id().

* [v1: bpf: Add support for absolute value BPF timers](http://lore.kernel.org/bpf/20230221151846.2218217-1-tero.kristo@linux.intel.com/)

    Add a new flag, BPF_F_TIMER_ABS, that can be passed to bpf_timer_start() to start an absolute value timer instead of the default relative one. This makes the timer expire at an exact point in time, instead of at a time subject to latencies and jitter induced by both the BPF and timer subsystems. This is useful, for example, in certain time-sensitive profiling cases where a timer must expire at an exact point in time.

* [v2: bpf-next: net/smc: Introduce BPF injection capability](http://lore.kernel.org/bpf/1676981919-64884-1-git-send-email-alibuda@linux.alibaba.com/)

    These patches attempt to introduce a BPF injection capability for SMC, and add a selftest to ensure code stability.

    As we all know, the SMC protocol is not suitable for all scenarios, especially short-lived connections. However, most applications cannot guarantee that such scenarios never occur. Therefore, apps may need specific strategies to decide whether to use SMC or not. For example, an app can limit the scope of SMC to a specific IP address or port.

* [v1: net-next: xsk: add linux/vmalloc.h to xsk.c](http://lore.kernel.org/bpf/20230221075140.46988-1-xuanzhuo@linux.alibaba.com/)

    Fix the compilation failure on sh4.

    The earlier introduction of remap_vmalloc_range() caused a compilation failure on the sh4 platform, so include the linux/vmalloc.h header file.

* [v3: bpf-next: libbpf: allow users to set kprobe/uprobe attach mode](http://lore.kernel.org/bpf/20230221025347.389047-1-imagedong@tencent.com/)

    By default, libbpf attaches a kprobe/uprobe eBPF program in the latest mode supported by the kernel. In this series, the 1st patch adds support for letting users manually attach kprobe/uprobe in legacy/perf/link mode.

    In the 2nd patch, we split the 'attach_probe' test into multiple subtests, as Andrii suggested.

    In the 3rd patch, we add the testings for loading kprobe/uprobe in

* [v1: bpf-next: libbpf: Document bpf_{btf,link,map,prog}_get_info_by_fd()](http://lore.kernel.org/bpf/20230220234958.764997-1-iii@linux.ibm.com/)

    Replace the short informal description with proper doc comments.

* [v1: bbpf: usdt arm arg parsing support](http://lore.kernel.org/bpf/20230220212233.13229-1-puranjay12@gmail.com/)

    Parsing of USDT arguments is architecture-specific. On arm it is relatively easy, since the registers used are r[0-10], fp, ip, sp, lr and pc. The format is slightly different from aarch64; the forms are:

    - "size @ [ reg, #offset ]" for dereferences, for example "-8 @ [ sp, #76 ]" ; " -4 @ [ sp ]"
    - "size @ reg" for register values, for example "-4@r0"
    - "size @ #value" for raw values, for example "-8@#1"

    Add support for parsing USDT arguments for the ARM architecture.
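
The USDT item above lists the ARM argument forms. Below is a minimal user-space sketch that recognises them with sscanf(). It assumes that register names are short lowercase alphanumerics (r0-r10, fp, ip, sp, lr, pc) and that offsets and constants fit in a `long`. `struct demo_usdt_arg` and `demo_parse_arm_arg()` are hypothetical, not libbpf's actual parser. The `%7[a-z0-9]` scanset uses a character range, which glibc and musl support but the C standard leaves implementation-defined.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum demo_arg_kind { DEMO_ARG_CONST, DEMO_ARG_REG, DEMO_ARG_REG_DEREF };

/* Hypothetical parsed form of one ARM USDT argument spec. */
struct demo_usdt_arg {
    enum demo_arg_kind kind;
    int size;     /* byte size; negative means signed */
    char reg[8];  /* register name such as "r0" or "sp" */
    long val_off; /* constant value, or offset for a dereference */
};

/*
 * Returns 0 on success, -1 if spec matches none of the forms. Each
 * pattern ends in %n so we can require that the whole spec was consumed.
 */
static int demo_parse_arm_arg(const char *spec, struct demo_usdt_arg *arg)
{
    int n;

    assert(spec != NULL && arg != NULL);

    /* "size @ [ reg, #offset ]": register-relative dereference */
    memset(arg, 0, sizeof(*arg));
    n = -1;
    if (sscanf(spec, " %d @ [ %7[a-z0-9] , # %ld ] %n",
               &arg->size, arg->reg, &arg->val_off, &n) == 3 &&
        n >= 0 && spec[n] == '\0') {
        arg->kind = DEMO_ARG_REG_DEREF;
        return 0;
    }

    /* "size @ [ reg ]": dereference with an implicit zero offset */
    memset(arg, 0, sizeof(*arg));
    n = -1;
    if (sscanf(spec, " %d @ [ %7[a-z0-9] ] %n", &arg->size, arg->reg, &n) == 2 &&
        n >= 0 && spec[n] == '\0') {
        arg->kind = DEMO_ARG_REG_DEREF;
        return 0;
    }

    /* "size @ #value": raw constant */
    memset(arg, 0, sizeof(*arg));
    n = -1;
    if (sscanf(spec, " %d @ # %ld %n", &arg->size, &arg->val_off, &n) == 2 &&
        n >= 0 && spec[n] == '\0') {
        arg->kind = DEMO_ARG_CONST;
        return 0;
    }

    /* "size @ reg": plain register value */
    memset(arg, 0, sizeof(*arg));
    n = -1;
    if (sscanf(spec, " %d @ %7[a-z0-9] %n", &arg->size, arg->reg, &n) == 2 &&
        n >= 0 && spec[n] == '\0') {
        arg->kind = DEMO_ARG_REG;
        return 0;
    }

    return -1;
}
```

The constant form is tried before the plain-register form. The `#` prefix can never match the register scanset anyway, so the order of these two checks does not affect the result.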
- -* [v1: bpf-next: bpf: Check for helper calls in check_subprogs()](http://lore.kernel.org/bpf/20230220163756.753713-1-iii@linux.ibm.com/) - - The condition src_reg != BPF_PSEUDO_CALL && imm == BPF_FUNC_tail_call - may be satisfied by a kfunc call. This would lead to unnecessarily - setting has_tail_call. Use src_reg == 0 instead. - -* [v2: bpf-next: bpf: Allow reads from uninit stack](http://lore.kernel.org/bpf/20230219200427.606541-1-eddyz87@gmail.com/) - - This patch-set modifies BPF verifier to accept programs that read from - uninitialized stack locations, but only if executed in privileged mode. - This provides significant verification performance gains: 30% to 70% less - processed states for big number of test programs. - -### 周边技术动态 - -#### Qemu - -* [v1: Fourth RISC-V PR for QEMU 8.0, Attempt 2](http://lore.kernel.org/qemu-devel/20230224185908.32706-1-palmer@rivosinc.com/) - - The following changes since commit 417296c8d8588f782018d01a317f88957e9786d6: - - tests/qtest/netdev-socket: Raise connection timeout to 60 seconds (2023-02-09 11:23:53 +0000) - - are available in the Git repository at: - - git@github.com:palmer-dabbelt/qemu.git tags/pull-riscv-to-apply-20230224 - - for you to fetch changes up to 8c89d50c10afdd98da82642ca5e9d7af4f1c18bd: - - target/riscv: Fix vslide1up.vf and vslide1down.vf (2023-02-23 14:21:34 -0800) - - Fourth RISC-V PR for QEMU 8.0, Attempt 2 - -* [v8: riscv: Add support for Zicbo[m,z,p] instructions](http://lore.kernel.org/qemu-devel/20230224132536.552293-1-dbarboza@ventanamicro.com/) - - This version has a change in patch 2, proposed by Weiwei Li, where we're - now triggering virt_instruction_fault before triggering illegal_insn - fault from S mode. - -* [v1: target/riscv: Add support for Svadu extension](http://lore.kernel.org/qemu-devel/20230224040852.37109-1-liweiwei@iscas.ac.cn/) - - This patchset adds support svadu extension. It also fixes some relationship between *envcfg fields and Svpbmt/Sstc extensions. 
- - Specification for the Svadu extension can be found at: - - https://github.com/riscv/riscv-svadu - - The port is available here: - https://github.com/plctlab/plct-qemu/tree/plct-svadu-upstream - -* [v2: NUMA: Apply socket-NUMA-node boundary for aarch64 and RiscV machines](http://lore.kernel.org/qemu-devel/20230223081401.248835-1-gshan@redhat.com/) - - For the arm64 and RISC-V architectures, the driver (/base/arch_topology.c) is - used to populate the CPU topology in the Linux guest. It's required that - the CPUs in one socket don't span multiple NUMA nodes. Otherwise, the Linux - scheduling domains can't be sorted out, as the following warning message - indicates. To avoid confusion, this series attempts to - reject such invalid configurations. - -* [v1: target/riscv/vector_helper.c: create vext_set_tail_elems_1s()](http://lore.kernel.org/qemu-devel/20230221184525.140704-1-dbarboza@ventanamicro.com/) - - Commit 752614cab8e6 ("target/riscv: rvv: Add tail agnostic for vector - load / store instructions") added code to set the tail elements to 1 at - the end of vext_ldst_stride(), vext_ldst_us(), vext_ldst_index() and - vext_ldff(). Aside from an env->vl versus an evl value being used in the - first loop, the same code is repeated 4 times. - -* [v1: target/riscv: Add support for Zicond extension](http://lore.kernel.org/qemu-devel/20230221091009.36545-1-liweiwei@iscas.ac.cn/) - - The spec can be found at https://github.com/riscv/riscv-zicond. - Two instructions are added: - - czero.eqz: Moves zero to register rd if the condition rs2 is - equal to zero; otherwise moves rs1 to rd. - - czero.nez: Moves zero to register rd if the condition rs2 is - nonzero; otherwise moves rs1 to rd. - -#### U-Boot - -* [v1: Add StarFive JH7110 PCIe driver support](http://lore.kernel.org/u-boot/20230223105240.15180-1-minda.chen@starfivetech.com/) - - The PCIe driver depends on the gpio, pinctrl, clk and reset drivers to do init.
- The PCIe dts configuration includes all these settings. - - The PCIe driver code has been tested on the VisionFive V2 boards. - The test devices include an M.2 NVMe SSD and a Realtek 8169 Ethernet adapter. - -* [Boot from 64-bit memory address?](http://lore.kernel.org/u-boot/BL3PR11MB5713975ADD19187E59776A2389AB9@BL3PR11MB5713.namprd11.prod.outlook.com/) - - Is it possible to boot from a DRAM memory address beyond the 32-bit boundary? I'm trying to configure a new RISC-V board which has 2GB of DRAM starting at offset 0x40_0000_0000. I started from the settings for an existing RISC-V board and made adjustments for my HW, but when I try to boot, I run into an "out of memory" error. - -* [v1: riscv: Support CONFIG_REMAKE_ELF](http://lore.kernel.org/u-boot/20230220060239.42279-1-samuel@sholland.org/) - - Add flags to tell objcopy what kind of ELF to create. - -## 20230219: Issue 34 - -### Kernel Updates - -#### RISC-V Architecture Support - -* [v1: Add dead syscalls elimination support](http://lore.kernel.org/linux-riscv/cover.1676594211.git.falcon@tinylab.org/) - - CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION allows eliminating dead code - and data; this patchset further eliminates dead syscalls that - are not used in the target system. - -* [v2: Add basic ACPI support for RISC-V](http://lore.kernel.org/linux-riscv/20230216182043.1946553-1-sunilvl@ventanamicro.com/) - - This patch series enables the basic ACPI infrastructure for RISC-V. - Support for external interrupt controllers is in progress, hence it is - tested using a poll-based HVC SBI console and a RAM disk. - -* [v1: dt-bindings: riscv: correct starfive visionfive 2 compatibles](http://lore.kernel.org/linux-riscv/20230216131511.3327943-1-conor.dooley@microchip.com/) - - Using "va" and "vb" doesn't match what's written on the board, or the - communications from StarFive. - Switching to the silkscreened version number will ease confusion and reduce - the risk of another spin of the board containing a "conflicting" version identifier.
- -* [v3: RISC-V: Don't check text_mutex during stop_machine](http://lore.kernel.org/linux-riscv/20230215164317.727657-1-conor@kernel.org/) - - We're currently using stop_machine() to update ftrace, which means that - the thread that takes text_mutex during ftrace_prepare() may not be the - same as the thread that eventually patches the code. This isn't - actually a race because the lock is still held (preventing any other - concurrent accesses) and there is only one thread running during - stop_machine(), but it does trigger a lockdep failure. - -* [v1: riscv: Introduce KASLR](http://lore.kernel.org/linux-riscv/20230215145113.465558-1-alexghiti@rivosinc.com/) - - The following KASLR implementation allows randomizing the kernel mapping: - - - virtually: we expect the bootloader to provide a seed in the device-tree - - physically: only implemented in the EFI stub, it relies on the firmware to - provide a seed using EFI_RNG_PROTOCOL. arm64 has a similar implementation, - hence patch 3 factorizes the KASLR-related functions so that riscv can take advantage of them. - -* [v1: riscv: Avoid enabling interrupts in die()](http://lore.kernel.org/linux-riscv/20230215144828.3370316-1-mnissler@rivosinc.com/) - - While working on something else, I noticed that the kernel would start - accepting interrupts again after crashing in an interrupt handler. Since - the kernel is already in an inconsistent state, enabling interrupts is - dangerous and opens up the risk of the kernel state deteriorating further. - -* [v8: Introduce 64b relocatable kernel](http://lore.kernel.org/linux-riscv/20230215143626.453491-1-alexghiti@rivosinc.com/) - - After multiple attempts, this patchset is now based on the fact that the - 64b kernel mapping was moved outside the linear mapping. - - The first patch allows building relocatable kernels but is not selected - by default. That patch is a requirement for KASLR.
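The KASLR item says the bootloader supplies a seed through the device tree. A common way to turn such a seed into a virtual offset is to count the aligned slots where the whole image still fits and then pick one with a modulo. The sketch below shows only that arithmetic. `kaslr_virt_offset()` is a hypothetical name, and the 1 GiB region, 32 MiB image, and 2 MiB alignment used in the checks are assumptions, not values taken from the RISC-V series.

```python
SZ_2M = 2 << 20
SZ_1G = 1 << 30


def kaslr_virt_offset(seed: int, region_size: int, kernel_size: int, align: int) -> int:
    """Map a random seed to an aligned offset that keeps the whole image in the region.

    Generic sketch only: region_size, kernel_size and align are caller-chosen
    assumptions, not values from the RISC-V patches.
    """
    if kernel_size > region_size:
        raise ValueError("kernel image larger than the randomization region")
    # Count the aligned start positions whose image end stays inside the region.
    nr_pos = (region_size - kernel_size) // align + 1
    return (seed % nr_pos) * align
```

Because the offset is always a multiple of `align`, the image keeps whatever large-page alignment the mapping code relies on. Only the choice of slot is random.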
- -* [v1: bpf-next: Support bpf trampoline for RV64](http://lore.kernel.org/linux-riscv/20230215135205.1411105-1-pulehui@huaweicloud.com/) - - The BPF trampoline is a critical piece of infrastructure in the BPF - subsystem, acting as a mediator between kernel functions - and BPF programs. Numerous important features, such as - using eBPF programs for zero-overhead kernel introspection, - rely on this key component. - -* [v4: StarFive's SDIO/eMMC driver support](http://lore.kernel.org/linux-riscv/20230215113249.47727-1-william.qiu@starfivetech.com/) - - This patchset adds initial rudimentary support for the StarFive - designware mobile storage host controller driver. This driver will - be used on StarFive's VisionFive 2 board. The main purpose of adding - this driver is to accommodate the ultra-high speed mode of eMMC. - -* [v1: MAINTAINERS: repair file entry for STARFIVE JH7110 MMC/SD/SDIO DRIVER](http://lore.kernel.org/linux-riscv/20230215080203.27445-1-lukas.bulwahn@gmail.com/) - - Commit bfde6b3869f5 ("mmc: starfive: Add sdio/emmc driver support") adds a - section in MAINTAINERS referring to the file drivers/mmc/dw_mmc-starfive.c, - but the file is actually located at drivers/mmc/host/dw_mmc-starfive.c. - -* [v1: RISC-V: Guard alternative asm macros with !LINKER_SCRIPT](http://lore.kernel.org/linux-riscv/20230214201358.10647-1-palmer@rivosinc.com/) - - Without this, I get a handful of .macro-related directives that trip up LD. - -* [v2: RISC-V: Enable dead code elimination](http://lore.kernel.org/linux-riscv/20230214150959.49088-1-falcon@tinylab.org/) - - Select CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION for RISC-V, allowing - the user to enable dead code elimination. In order for this to work, - ensure that we keep the alternative table by annotating it with KEEP.
- -* [[PATCH v1 RFC Zisslpcfi 00/20] riscv control-flow integrity for U mode](http://lore.kernel.org/linux-riscv/20230213045351.3945824-1-debug@rivosinc.com/) - - I've been working on Linux support for the shadow stack and landing pad - instructions on RISC-V for a while. - - These are still RFC quality, but at least they're in a shape that can - start a discussion and get me some feedback. So I decided to send out the patches. - -* [v2: Add RISC-V 32 NOMMU support](http://lore.kernel.org/linux-riscv/20230212205506.1992714-1-Mr.Bossman075@gmail.com/) - - This patch-set aims to add NOMMU support to RV32. - Many people want to build simple emulators or HDL - models of RISC-V; this patch makes it possible to run Linux on them. - -* [v1: RISC-V: take text_mutex during alternative patching](http://lore.kernel.org/linux-riscv/20230212194735.491785-1-conor@kernel.org/) - - This issue was exposed by 702e64550b12 ("riscv: fpu: switch has_fpu() to - riscv_has_extension_likely()"), as it is the patching in has_fpu() that - triggers the splats in Guenter's report. - -#### Process Scheduling - -* [v1: sched: Consider task_struct::saved_state in wait_task_inactive().](http://lore.kernel.org/lkml/Y++UzubyNavLKFDP@linutronix.de/) - - wait_task_inactive() waits for a thread to unschedule in a certain task state. - - On PREEMPT_RT, also check task_struct::saved_state if the desired match was not found in - task_struct::__state. If the state was found in saved_state, wait - until the task is idle and the state is visible in task_struct::__state.
- -* [v1: net: sched: sch: null pointer dereference in htb_offload_move_qdisc()](http://lore.kernel.org/lkml/20230216104939.3553390-1-alok.a.tiwari@oracle.com/) - - A possible NULL pointer dereference was detected by a static analyzer: - htb_destroy_class_offload() calls htb_find(), which can return NULL - for an invalid class id (moved_cl = htb_find(classid, sch);). - In that case it should not pass 'moved_cl' to htb_offload_move_qdisc(); if 'moved_cl' is a NULL pointer, return -EINVAL. - -* [v1: sched: sd_llc_id initialized](http://lore.kernel.org/lkml/20230215015435.100559-1-sunshouxin@chinatelecom.cn/) - - In my test, I use isolcpus to isolate specific CPUs, - and then I noticed a different scenario when binding cores. - -* [v1: sched/fair: Interleave cfs bandwidth timers for improved single thread performance at low utilization](http://lore.kernel.org/lkml/9c57c92c-3e0c-b8c5-4be9-8f4df344a347@linux.vnet.ibm.com/) - - The CPU CFS bandwidth controller uses an hrtimer called the period timer. Quota is - refilled when the timer expires, and the timer is restarted when there are running tasks - within the cgroup. Each cgroup has a separate period timer which manages - the period and quota for that cgroup. - -#### Memory Management - -* [v6: Shadow stacks for userspace](http://lore.kernel.org/linux-mm/20230218211433.26859-1-rick.p.edgecombe@intel.com/) - - This series implements Shadow Stacks for userspace using x86's Control-flow - Enforcement Technology (CET). CET consists of two related security features: - shadow stacks and indirect branch tracking. This series implements just the - shadow stack part of this feature, and just for userspace.
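The shadow stack idea behind the CET item (and the RISC-V Zisslpcfi RFC earlier in this issue) fits in a few lines. Every call records the return address twice: once on the ordinary stack and once on a stack that normal stores cannot reach. Every return then compares the two copies. The Python toy below only illustrates that mechanism. It does not reflect how x86 CET or Zisslpcfi are programmed, and `ToyCpu` is a hypothetical name.

```python
class ShadowStackFault(Exception):
    """Raised when a return address disagrees with its shadow copy."""


class ToyCpu:
    """Toy model of call/return with a shadow stack (not how CET is programmed)."""

    def __init__(self) -> None:
        self.stack: list[int] = []   # ordinary stack: writable by buggy or hostile code
        self.shadow: list[int] = []  # shadow stack: only call/ret modify it

    def call(self, return_addr: int) -> None:
        # A call records the return address on both stacks.
        self.stack.append(return_addr)
        self.shadow.append(return_addr)

    def ret(self) -> int:
        # A return pops both copies and faults on a mismatch.
        addr = self.stack.pop()
        expected = self.shadow.pop()
        if addr != expected:
            raise ShadowStackFault(f"return to {addr:#x}, shadow stack says {expected:#x}")
        return addr
```

An overwritten return address on the ordinary stack, the classic ROP primitive, is caught at the `ret()` that would have used it.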
- -* [v1: Add flag as THP allocation hint for memfd_restricted() syscall](http://lore.kernel.org/linux-mm/cover.1676680548.git.ackerleytng@google.com/) - - This patchset builds upon the memfd_restricted() system call that has - been discussed in the ‘KVM: mm: fd-based approach for supporting KVM’ - patch series, at - https://lore.kernel.org/lkml/20221202061347.1070246-1-chao.p.peng@linux.intel.com/T/#m7e944d7892afdd1d62a03a287bd488c56e377b0c - -* [v2: hugetlb: introduce HugeTLB high-granularity mapping](http://lore.kernel.org/linux-mm/20230218002819.1486479-1-jthoughton@google.com/) - - This series introduces the concept of HugeTLB high-granularity mapping - (HGM). This series teaches HugeTLB how to map HugeTLB pages at - high granularity, similar to how THPs can be PTE-mapped. - -* [v1: 5.15: of: reserved_mem: Have kmemleak ignore dynamically allocated reserved mem](http://lore.kernel.org/linux-mm/20230217200731.285514-1-isaacmanjarres@google.com/) - - commit ce4d9a1ea35ac5429e822c4106cb2859d5c71f3e upstream. - - Patch series "Fix kmemleak crashes when scanning CMA regions", v2. - -* [v2: drm-next: v1: DRM GPUVA Manager & Nouveau VM_BIND UAPI](http://lore.kernel.org/linux-mm/20230217134422.14116-1-dakr@redhat.com/) - - This patch series provides a new UAPI for the Nouveau driver in order to - support Vulkan features, such as sparse bindings and sparse residency. - - Furthermore, with the DRM GPUVA manager it provides a new DRM core feature to - keep track of GPU virtual address (VA) mappings in a more generic way. - -* [v5: mm/userfaultfd: Support WP on multiple VMAs](http://lore.kernel.org/linux-mm/20230217105558.832710-1-usama.anjum@collabora.com/) - - This is a simple use case where the user may or may not know if the memory - area has been divided into multiple VMAs. - - We need an implementation which doesn't disrupt the already present - users. So, keeping things simple, stop going over all the VMAs if any one - of the VMAs hasn't been registered in WP mode.
While at it, remove the - un-needed error check as well. - -* [v1: mm-unstable: mm/kvm: lockless accessed bit harvest](http://lore.kernel.org/linux-mm/20230217041230.2417228-1-yuzhao@google.com/) - - This patchset RCU-protects KVM page tables and compare-and-exchanges - KVM PTEs with the accessed bit set by hardware. It significantly - improves the performance of guests when the host is under heavy memory pressure. - -* [RFC for new feature to move pages from one vma to another without split](http://lore.kernel.org/linux-mm/CA+EESO4uO84SSnBhArH4HvLNhaUQ5nZKNKXqxRCyjniNVjp0Aw@mail.gmail.com/) - - Requesting comments on a new feature which remaps pages from one - private anonymous mapping to another, without altering the vmas - involved. Two alternatives exist but both have drawbacks: - 1. userfaultfd ioctls allocate new pages, copy data and free the old - ones even when updates could be done in-place; - 2. mremap results in vma splitting in most of the cases due to 'pgoff' mismatch. - -* [v2: dm-crypt: allocate compound pages if possible](http://lore.kernel.org/linux-mm/alpine.LRH.2.21.2302161619430.5436@file01.intranet.prod.int.rdu2.redhat.com/) - - It was reported that allocating pages for the write buffer in dm-crypt - causes measurable overhead [1]. - - This patch changes dm-crypt to allocate compound pages if they are - available. If not, we fall back to the mempool. - - [1] https://listman.redhat.com/archives/dm-devel/2023-February/053284.html - -* [v2: kasan: call clear_page with a match-all tag instead of changing page tag](http://lore.kernel.org/linux-mm/20230216195924.3287772-1-pcc@google.com/) - - Instead of changing the page's tag solely in order to obtain a pointer - with a match-all tag and then changing it back again, just convert the - pointer that we get from kmap_atomic() into one with a match-all tag - before passing it to clear_page(). 
- -* [v4: mm: ioremap: Convert architectures to take GENERIC_IOREMAP way](http://lore.kernel.org/linux-mm/20230216123419.461016-1-bhe@redhat.com/) - - Currently, many architectures haven't taken the standard GENERIC_IOREMAP - way to implement ioremap_prot(), iounmap(), and ioremap_xx(), but implement - these functions separately in each arch's folder. This causes a lot of duplicated ioremap() and iounmap() code. - -* [v1: mm, page_alloc: reduce page alloc/free sanity checks](http://lore.kernel.org/linux-mm/20230216095131.17336-1-vbabka@suse.cz/) - - Historically, we have performed sanity checks on all struct pages being - allocated or freed, making sure they have no unexpected page flags or - certain field values. This can detect insufficient cleanup and some - cases of use-after-free, although on its own it can't always identify - the culprit. The result is a warning and the "bad page" being leaked. - -#### File Systems - -* [v4: ext4: Convert inode preallocation list to an rbtree](http://lore.kernel.org/linux-fsdevel/cover.1676634592.git.ojaswin@linux.ibm.com/) - - This patch series aims to improve the performance and scalability of - inode preallocation by changing the inode preallocation linked list to an - rbtree. I've run the xfstests quick group on this series and plan to run the auto group - as well to confirm we have no regressions. - -* [GIT PULL: Fsnotify changes for 6.3-rc1](http://lore.kernel.org/linux-fsdevel/20230217112939.daimrvd7uivov5eu@quack3/) - - Since I'm on vacation next week, I'm sending my pull requests for the - merge window a bit earlier. Could you please pull from - - git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git fsnotify_for_v6.3-rc1 - - to get support for auditing decisions regarding fanotify permission events.
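The ext4 series above is about lookup. When preallocations are ordered by logical start block, finding the one that covers a given block no longer requires a linear walk. Python has no built-in red-black tree, so the sketch below uses a sorted list with `bisect` as a substitute. The names are hypothetical, and the ranges are assumed not to overlap.

```python
import bisect
from dataclasses import dataclass
from typing import Optional


@dataclass
class Prealloc:
    lstart: int   # first logical block the preallocation covers
    length: int   # number of blocks


class PreallocIndex:
    """Ordered index of non-overlapping preallocations keyed by logical start.

    A sorted list stands in for the rbtree: lookups are O(log n) as in a tree,
    but list insertion is O(n), which a balanced tree avoids.
    """

    def __init__(self) -> None:
        self._starts: list[int] = []
        self._pas: list[Prealloc] = []

    def insert(self, pa: Prealloc) -> None:
        i = bisect.bisect_left(self._starts, pa.lstart)
        self._starts.insert(i, pa.lstart)
        self._pas.insert(i, pa)

    def lookup(self, lblk: int) -> Optional[Prealloc]:
        # Find the last preallocation starting at or before lblk, then check its end.
        i = bisect.bisect_right(self._starts, lblk) - 1
        if i >= 0 and lblk < self._pas[i].lstart + self._pas[i].length:
            return self._pas[i]
        return None
```

With a linked list, the same query must visit every preallocation before the match. That walk is the scalability cost the series targets.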
- -* [[GIT PULL for-6.3] Make building the legacy dio code conditional](http://lore.kernel.org/linux-fsdevel/754b3cc0-c420-3257-9569-833c42f93808@kernel.dk/) - - The following changes since commit 2241ab53cbb5cdb08a6b2d4688feb13971058f65: - - Linux 6.2-rc5 (2023-01-21 16:27:01 -0800) - - are available in the Git repository at: - - git://git.kernel.dk/linux.git tags/for-6.3/dio-2023-02-16 - - for you to fetch changes up to 9636e650e16f6b01f0044f7662074958c23e4707: - - fs: build the legacy direct I/O code conditionally (2023-01-26 10:30:56 -0700) - -* [v2: eventfd: use wait_event_interruptible_locked_irq() helper](http://lore.kernel.org/linux-fsdevel/tencent_98334C552AB55C90FCE4523A327393DFF606@qq.com/) - - wait_event_interruptible_locked_irq was introduced by commit 22c43c81a51e - ("wait_event_interruptible_locked() interface"), but older code such as - eventfd_{write,read} still uses the open code implementation. - Inspired by commit 8120a8aadb20 - ("fs/timerfd.c: make use of wait_event_interruptible_locked_irq()"), this - patch replaces the open code implementation with a single macro call. 
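The eventfd item concerns how eventfd_read() and eventfd_write() sleep on the wait queue while holding its lock. The user-visible behaviour they implement, documented in eventfd(2), is small enough to model. A read blocks while the counter is zero, then either returns the counter and resets it to zero, or, in EFD_SEMAPHORE mode, returns 1 and decrements it. A write adds to the counter and blocks if the sum would exceed 0xfffffffffffffffe. In the Python model below, `threading.Condition` plays the role of the locked wait queue. The `EventFd` class is a hypothetical illustration, not kernel code, and `BlockingIOError`/`ValueError` stand in for EAGAIN/EINVAL.

```python
import threading

EFD_MAX = 0xFFFF_FFFF_FFFF_FFFE  # largest counter value a write may produce


class EventFd:
    """Toy user-space model of eventfd counter semantics (not the kernel code)."""

    def __init__(self, initval: int = 0, semaphore: bool = False, nonblock: bool = False):
        self._count = initval
        self._semaphore = semaphore
        self._nonblock = nonblock
        self._cond = threading.Condition()  # the lock plus the wait queue

    def read(self) -> int:
        with self._cond:
            if self._count == 0:
                if self._nonblock:
                    raise BlockingIOError("EAGAIN")
                # Sleep, with the lock dropped while waiting, until a writer adds to the counter.
                self._cond.wait_for(lambda: self._count > 0)
            value = 1 if self._semaphore else self._count
            self._count -= value
            self._cond.notify_all()  # wake writers waiting for room
            return value

    def write(self, value: int) -> None:
        if value == 0xFFFF_FFFF_FFFF_FFFF:
            raise ValueError("EINVAL")
        with self._cond:
            if value > EFD_MAX - self._count:
                if self._nonblock:
                    raise BlockingIOError("EAGAIN")
                self._cond.wait_for(lambda: value <= EFD_MAX - self._count)
            self._count += value
            self._cond.notify_all()  # wake readers
```

The check, the sleep, and the counter update all happen under the same lock. A single helper call can express that pattern instead of an open-coded wait loop.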
- -* [GIT PULL: i_version handling changes for v6.3](http://lore.kernel.org/linux-fsdevel/0d67a8a252ef22c6506f45761c2f7d1185a44190.camel@kernel.org/) - - The following changes since commit 948ef7bb70c4acaf74d87420ea3a1190862d4548: - - Merge tag 'modules-6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux (2023-01-24 18:19:44 -0800) - - are available in the Git repository at: - - https://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux.git tags/iversion-v6.3 - - for you to fetch changes up to 58a033c9a3e003e048a0431a296e58c6b363b02b: - - nfsd: remove fetch_iversion export operation (2023-01-26 07:00:06 -0500) - -* [v1: chardev: make kobj_type structures constant](http://lore.kernel.org/linux-fsdevel/20230216-kobj_type-chardev-v1-1-94e213b73e85@weissschuh.net/) - - Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.") - the driver core allows the usage of const struct kobj_type. - - Take advantage of this to constify the structure definitions to prevent - modification at runtime. - -* [v1: Revert boot-breaking changes in fs/](http://lore.kernel.org/linux-fsdevel/20230215-topic-next-20230214-revert-v1-0-c58cd87b9086@linaro.org/) - - next-20230213 introduced commit d9722a475711 ("splice: Do splice read from - a buffered file without using ITER_PIPE") which broke booting on any - Qualcomm ARM64 device I grabbed, dereferencing a null pointer in - generic_filesplice_read+0xf8/x598. Revert it (and its dependency) - (or accept better solutions should anybody come up with such) to make - them bootable again. - -* [v1: mm: userfaultfd: add UFFDIO_CONTINUE_MODE_WP to install WP PTEs](http://lore.kernel.org/linux-fsdevel/20230214215046.1187635-1-axelrasmussen@google.com/) - - UFFDIO_COPY already has UFFDIO_COPY_MODE_WP, so when installing a new - PTE to resolve a missing fault, one can install a write-protected one. - This is useful when using UFFDIO_REGISTER_MODE_{MISSING,WP} in combination. 
- -* [v14: iov_iter: Improve page extraction (pin or just list)](http://lore.kernel.org/linux-fsdevel/20230214171330.2722188-1-dhowells@redhat.com/) - - Here are patches to provide support for extracting pages from an iov_iter - and to use this in the extraction functions in the block layer bio code. - -* [Attending LFS (was: v2: FUSE BPF: A Stacked Filesystem Extension for FUSE)](http://lore.kernel.org/linux-fsdevel/56d5ac0e-4c54-46b7-85d3-5de127562630@app.fastmail.com/) - - I wouldn't be able to get the travel funded by my employer, and I don't think I'm a suitable recipient for the Linux Foundation's travel fund. Therefore, I think it would make more sense for me to attend potentially relevant sessions remotely. - -* [v3: iov_iter: Adjust styling/location of new splice functions](http://lore.kernel.org/linux-fsdevel/20230214083710.2547248-1-dhowells@redhat.com/) - - Here are patches to make some changes that Christoph requested[1] to the new generic file splice functions that I implemented[2]. - - I've also worked the changes into the commits on my iov-extract - branch if that would be preferable, though that means Jens would need to - update his for-6.3/iov-extract again. - -* [v1: blk: optimization for classic polling](http://lore.kernel.org/linux-fsdevel/3578876466-3733-1-git-send-email-nj.shetty@samsung.com/) - - This removes the dependency on interrupts to wake up the task. Set the task - state to TASK_RUNNING if need_resched() returns true - while polling for IO completion. - Earlier, the polling task used to sleep, relying on an interrupt to wake it up. - This made some IO take very long when interrupt coalescing is enabled in NVMe. - -#### Network Devices - -* [v1: net: fec: Allow turning off IRQ coalescing](http://lore.kernel.org/netdev/20230218214037.16977-1-richard@nod.at/) - - Setting tx/rx-frames or tx/rx-usecs to zero is currently possible but - has no effect. - Also, IRQ coalescing is always enabled on supported hardware.
- -* [v1: wifi: iwlwifi: dvm: Add struct_group for struct iwl_keyinfo keys](http://lore.kernel.org/netdev/20230218191056.never.374-kees@kernel.org/) - - The function iwlagn_send_sta_key() was trying to write across multiple - structure members in a single memcpy(). Add a struct group "keys" to - let the compiler see the intended bounds of the memcpy, which include - the tkip keys as well. - -* [v3: bpf-next: xdp: bpf_xdp_metadata use EOPNOTSUPP for no driver support](http://lore.kernel.org/netdev/167673444093.2179692.14745621008776172374.stgit@firesoul/) - - When a driver doesn't implement a bpf_xdp_metadata kfunc, the default - implementation returns EOPNOTSUPP, which indicates that the device driver doesn't - implement this kfunc. - -* [v2: net-next: net: phy: micrel: Add support for PTP_PF_PEROUT for lan8841](http://lore.kernel.org/netdev/20230218123038.2761383-1-horatiu.vultur@microchip.com/) - - Lan8841 has 10 GPIOs and 2 events (EVENT_A and EVENT_B). It is - possible to assign the 2 events to any of the GPIOs, but a GPIO can - have only 1 event at a time. - These events are used to generate periodic signals. It is possible to - configure the length, the start time and the period of the signal by - configuring the event. - -* [v1: mt76: mt7915: expose device tree match table](http://lore.kernel.org/netdev/20230218112946.3039855-1-lorenz@brun.one/) - - On MT7986 the WiFi driver currently does not get loaded automatically, - requiring manual modprobing, because the device tree compatibles are not - exported into the module metadata. - - Add the missing MODULE_DEVICE_TABLE macro to fix this. - -* [v1: bnxt: avoid overflow in bnxt_get_nvram_directory()](http://lore.kernel.org/netdev/20230218095024.23193-1-korotkov.maxim.s@gmail.com/) - - The value of an arithmetic expression is subject - to possible overflow due to a failure to cast operands to a larger data - type before performing arithmetic. Use a macro for the multiplication instead of the - operator to avoid overflow.
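The bnxt fix targets a classic C pitfall. When two u32 operands are multiplied, the product is computed in 32 bits and wraps before any widening happens, even if the result is assigned to a 64-bit variable. The Python functions below emulate both behaviours with explicit masks. The names and the 65536 * 65537 example are hypothetical and are not taken from the driver.

```python
U32_MAX = 0xFFFF_FFFF
U64_MAX = 0xFFFF_FFFF_FFFF_FFFF


def mul_u32(a: int, b: int) -> int:
    """a * b as C computes it when both operands are u32: the result wraps mod 2**32."""
    return (a * b) & U32_MAX


def mul_u64(a: int, b: int) -> int:
    """a * b after widening one operand to u64 first: exact for any pair of u32 values."""
    return (a * b) & U64_MAX
```

For example, 65536 * 65537 is 4295032832, but the 32-bit product wraps to 65536. A buffer sized from the wrapped value would be far too small for the data later copied into it.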
- -* [v1: dt-bindings: net: dsa: mediatek,mt7530: change some descriptions to literal](http://lore.kernel.org/netdev/20230218072348.13089-1-arinc.unal@arinc9.com/) - - The line endings must be preserved on the gpio-controller, io-supply, and - reset-gpios properties to look proper when the YAML file is parsed. - -* [v1: nf: netfilter: use skb len to match in length_mt6](http://lore.kernel.org/netdev/361acd69270a8c2746da5774644dda9147b407a1.1676676177.git.lucien.xin@gmail.com/) - - For IPv6 Jumbo packets, ipv6_hdr(skb)->payload_len is always 0, - and the real payload_len ( > 65535) is saved in the hbh exthdr. Using the 0 - length for jumbo packets may lead to a mismatch. - -* [v3: net-next: pds_core driver](http://lore.kernel.org/netdev/20230217225558.19837-1-shannon.nelson@amd.com/) - - This patchset implements a new driver for use with the AMD/Pensando - Distributed Services Card (DSC), intended to provide core configuration - services through the auxiliary_bus for VFIO and vDPA feature-specific drivers. - -* [v13: net-next: net/sched: cls_api: Support hardware miss to tc action](http://lore.kernel.org/netdev/20230217223620.28508-1-paulb@nvidia.com/) - - This series adds support for hardware miss to instruct tc to continue execution - in a specific tc action instance on a filter's action list. The mlx5 driver patch - (besides the refactors) shows its usage instead of using just chain restore. - -* [v3: page_pool: add a comment explaining the fragment counter usage](http://lore.kernel.org/netdev/20230217222130.85205-1-ilias.apalodimas@linaro.org/) - - When reading the page_pool code, the first impression is that keeping - two separate counters, one being the page refcnt and the other being the fragment counter pp_frag_count, is counter-intuitive. - -* [v6: intel-next: i40e: support XDP multi-buffer](http://lore.kernel.org/netdev/20230217191515.166819-1-tirthendu.sarkar@intel.com/) - - This patchset adds multi-buffer support for XDP. The Tx side already has - support for multi-buffer.
This patchset focuses on the Rx side. The last - patch contains the actual multi-buffer changes, while the previous ones are - preparatory patches. - -* [v1: net-next: net: bcmgenet: Support wake-up from s2idle](http://lore.kernel.org/netdev/20230217183415.3300158-1-f.fainelli@gmail.com/) - - When we suspend into s2idle, we also need to enable the interrupt line - that generates the MPD and HFB interrupts towards the host CPU interrupt - controller (typically the ARM GIC or MIPS L1) to make it exit s2idle. - -* [v1: net-next: scm: add user copy checks to put_cmsg()](http://lore.kernel.org/netdev/20230217182454.2432057-1-edumazet@google.com/) - - This is a follow-up to commit 2558b8039d05 ("net: use a bounce - buffer for copying skb->mark") - - x86 and powerpc define user_access_begin, meaning - that they are not able to perform user copy checks - when using user_write_access_begin() / unsafe_copy_to_user() and friends [1] - -* [v3: net-next: net: lan966x: Use automatic selection of VCAP rule actionset](http://lore.kernel.org/netdev/20230217132831.2508465-1-horatiu.vultur@microchip.com/) - - Since commit 81e164c4aec5 ("net: microchip: sparx5: Add automatic - selection of VCAP rule actionset"), the VCAP API has the capability to - automatically select the actionset based on the actions that are attached - to the rule. So there is no longer any need to hardcode the actionset in the - driver, and it can be removed. - -* [v2: net-next: net: default_rps_mask follow-up](http://lore.kernel.org/netdev/cover.1676635317.git.pabeni@redhat.com/) - - The first patch makes the setting per network namespace. In the common case, once - proper isolation is in place in the main namespace, forwarding - to/from each child netns will always happen on the desired CPUs. - -* [v1: net-next: net: virtio_net: implement exact header length guest feature](http://lore.kernel.org/netdev/20230217121547.3958716-1-jiri@resnulli.us/) - - virtio_net_hdr_from_skb() sets hdr_len to skb_headlen(skb).
- - The virtio spec introduced a feature, VIRTIO_NET_F_GUEST_HDRLEN, which, when - set, indicates that the driver provides the exact size of the header. - -* [v2: bpf-next: xdp: bpf_xdp_metadata use NODEV for no device support](http://lore.kernel.org/netdev/167663589722.1933643.15760680115820248363.stgit@firesoul/) - - With our XDP-hints kfunc approach, where individual drivers overload the - default implementation, it can be hard for API users to determine - whether or not the current device driver has this kfunc available. - -* [v2: net-next: add ethtool categorized statistics](http://lore.kernel.org/netdev/20230217110211.433505-1-rakesh.sankaranarayanan@microchip.com/) - - The patch series contains the following changes: - - Add categorized ethtool statistics for Microchip KSZ series switches, - supporting the "eth-mac", "eth-phy", "eth-ctrl" and "rmon" parameters of the - ethtool statistics command. MIB parameter indexes are the same for all - KSZ family switches except KSZ8830, so functions can be re-used - across all KSZ families (except KSZ8830) and the LAN937x series. Create - separate functions for KSZ8830 with its own MIB parameters. - - Remove the num_alus member from the ksz_chip_data structure since it is unused. - -* [v2: net/core: add optional threading for rps backlog processing](http://lore.kernel.org/netdev/20230217100606.1234-1-nbd@nbd.name/) - - When dealing with few flows or an imbalance in CPU utilization, static RPS - CPU assignment can be too inflexible. Add support for enabling threaded NAPI - for RPS backlog processing in order to allow the scheduler to better balance - processing. This helps spread the load better across idle CPUs.
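The inflexibility described in the RPS threading item follows from how a static RPS map is used. The flow hash alone selects the CPU, so a heavy flow stays on the same CPU no matter how loaded that CPU is. The sketch below models that choice in simplified form (no RFS flow tables). It uses the same multiply-and-shift scaling as the kernel's reciprocal_scale() helper, and `rps_cpu_for_hash()` is a hypothetical name.

```python
def reciprocal_scale(val: int, ep_ro: int) -> int:
    """Scale a 32-bit value into [0, ep_ro) with a multiply and shift instead of a modulo."""
    return ((val & 0xFFFF_FFFF) * ep_ro) >> 32


def rps_cpu_for_hash(flow_hash: int, rps_cpus: list[int]) -> int:
    """Static RPS choice: the same flow hash always maps to the same CPU in the map."""
    return rps_cpus[reciprocal_scale(flow_hash, len(rps_cpus))]
```

Nothing in this mapping depends on CPU load. Handing backlog processing to threads lets the scheduler, rather than the hash, decide where the work runs.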
- -* [v1: wifi: rtl8xxxu: add LEDS_CLASS dependency](http://lore.kernel.org/netdev/20230217095910.2480356-1-arnd@kernel.org/) - - rtl8xxxu now unconditionally uses LEDS_CLASS, so a Kconfig dependency - is required to avoid link errors: - - aarch64-linux-ld: drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.o: in function `rtl8xxxu_disconnect': - rtl8xxxu_core.c:(.text+0x730): undefined reference to `led_classdev_unregister' - -* [v1: bpf-next: selftests/bpf: run mptcp in a dedicated netns](http://lore.kernel.org/netdev/20230217082607.3309391-1-liuhangbin@gmail.com/) - - The current mptcp test is run in the init netns. If the user or the default - system config disabled mptcp, the test will fail. Let's run the mptcp - test in a dedicated netns to avoid being affected by non-default kernel mptcp settings. - -* [v1: Rework MAC drivers EEE support](http://lore.kernel.org/netdev/20230217034230.1249661-1-andrew@lunn.ch/) - - phy_init_eee() is supposed to be called once auto-neg has been - completed to determine if EEE should be used with the current link - mode. The MAC hardware should then be configured to either enable or - disable EEE. Many drivers get this wrong, calling phy_init_eee() once, - or only in the ethtool set_eee callback. - -* [v3: net-next: net/mlx5e: Add GBP VxLAN HW offload support](http://lore.kernel.org/netdev/20230217033925.160195-1-gavinl@nvidia.com/) - - This patch series adds HW offloading support for TC flows with VxLAN GBP encap/decap. - -* [v1: net-next: net: phy: Read EEE abilities when using .features](http://lore.kernel.org/netdev/20230217031520.1249198-1-andrew@lunn.ch/) - - A PHY driver can use a static integer value to indicate what link mode - features it supports, i.e., its abilities. This is the old way, but it is - useful when dynamically determining the device's features does not - work, e.g. for fibre support.
- -* [v1: net-next: Add additional phydev locks](http://lore.kernel.org/netdev/20230217030714.1249009-1-andrew@lunn.ch/) - - The phydev lock should be held when accessing members of phydev, or - calling into the driver. Some of the phy_ethtool_ functions are - missing locks. Add them. The marvell driver is also - modified, since it calls one of the functions that now take the lock, which - would otherwise result in a deadlock. - -* [v1: net-next: Add tc-mqprio and tc-taprio support for preemptible traffic classes](http://lore.kernel.org/netdev/20230216232126.3402975-1-vladimir.oltean@nxp.com/) - - The last RFC in August 2022 contained a proposal for the UAPI of both - TSN standards which together form Frame Preemption (802.1Q and 802.3): - https://patchwork.kernel.org/project/netdevbpf/cover/20220816222920.1952936-1-vladimir.oltean@nxp.com/ - -* [v10: bpf-next: Add skb + xdp dynptrs](http://lore.kernel.org/netdev/20230216225524.1192789-1-joannelkoong@gmail.com/) - - When comparing the differences in runtime for packet parsing without dynptrs - vs. with dynptrs, there is no noticeable difference. Patch 9 contains more - details as well as examples of how to use skb and xdp dynptrs. - -* [v3: can: esd_usb: Some more preparation for supporting esd CAN-USB/3](http://lore.kernel.org/netdev/20230216190450.3901254-1-frank.jungclaus@esd.eu/) - - Another small batch of patches to be seen as preparation for adding - support for the newly available esd CAN-USB/3 to esd_usb.c. - - Due to some unresolved questions, adding support for - CAN_CTRLMODE_BERR_REPORTING has been postponed to a future - patch. - -#### Security Hardening - -* [v3: smb3: Replace smb2pdu 1-element arrays with flex-arrays](http://lore.kernel.org/linux-hardening/20230218002436.give.204-kees@kernel.org/) - - The kernel is globally removing the ambiguous 0-length and 1-element - arrays in favor of flexible arrays, so that we can gain both compile-time - and run-time array bounds checking[1].
- -* [v1: wifi: brcmfmac: p2p: Introduce generic flexible array frame member](http://lore.kernel.org/linux-hardening/20230215224110.never.022-kees@kernel.org/) - - Silence a run-time memcpy() false-positive warning when processing - management frames: - - memcpy: detected field-spanning write (size 27) of single field "&mgmt_frame->u" at drivers/net/wireless/broadcom/brcm80211/brcmfmac/p2p.c:1469 (size 26) - -* [v1: cifs: Replace remaining 1-element arrays](http://lore.kernel.org/linux-hardening/20230215000945.never.734-kees@kernel.org/) - - The kernel is globally removing the ambiguous 0-length and 1-element - arrays in favor of flexible arrays, so that we can gain both compile-time - and run-time array bounds checking[1]. - -* [v1: cifs: Convert struct fealist away from 1-element array](http://lore.kernel.org/linux-hardening/20230215000832.never.591-kees@kernel.org/) - - The kernel is globally removing the ambiguous 0-length and 1-element - arrays in favor of flexible arrays, so that we can gain both compile-time - and run-time array bounds checking[1]. - -#### Asynchronous IO - -* [v1: Cache tctx cancelation state in the ctx](http://lore.kernel.org/io-uring/20230217155600.157041-1-axboe@kernel.dk/) - - One of the more expensive parts of io_req_local_work_add() is that it - has to pull in the remote task tctx to check for the very unlikely event - that we are in a cancelation state. - -* [[GIT PULL for-6.3] Switch io_uring to ITER_UBUF](http://lore.kernel.org/io-uring/7ec9c3d0-1028-4d58-8ef1-0cce3083696c@kernel.dk/) - - Since we now have ITER_UBUF available, switch to using it for single - ranges as it's more efficient than ITER_IOVEC for that. - -* [v2: io_uring: Adjust mapping wrt architecture aliasing requirements](http://lore.kernel.org/io-uring/Y+3kwh8BokobVl6o@p100/) - - Some architectures have memory cache aliasing requirements (e.g. parisc) - if memory is shared between userspace and kernel.
This patch fixes the - kernel to return an aliased address when asked by userspace via mmap(). - -* [v1: Add io_uring & ebpf based methods to implement zero-copy for ublk](http://lore.kernel.org/io-uring/20230215004122.28917-1-xiaoguang.wang@linux.alibaba.com/) - - Normally, userspace block device implementations need to copy data between - the kernel block layer's io requests and the userspace block device's userspace - daemon (for example, ublk and tcmu both have similar logic), but this - operation obviously consumes CPU resources, especially for large io. - -* [v1: test/fsnotify: Skip fsnotify test if sys/fanotify.h not available](http://lore.kernel.org/io-uring/20230214164613.2844230-1-alviro.iskandar@gnuweeb.org/) - - Fix build on Termux (Android). Most Android devices don't have - `<sys/fanotify.h>` on Termux. Skip the test if it's not available. - -#### Rust For Linux - -* [GIT PULL: Rust for 6.3](http://lore.kernel.org/rust-for-linux/20230212183249.162376-1-ojeda@kernel.org/) - - A new set of features for the Rust support. - - By the time you pick this, these commits will have been in linux-next - for quite a while. No conflicts expected. No changes to the C side. - -#### BPF - -* [v1: dwarves: dwarves: change BTF encoding skip logic for functions](http://lore.kernel.org/bpf/1676675433-10583-1-git-send-email-alan.maguire@oracle.com/) - - It has been observed [1] that the recent dwarves changes - that skip BTF encoding for functions that have optimized-out - parameters are too aggressive, leading to missing kfuncs - which generate warnings and a BPF selftest failure. - -* [v1: bpf-next: libbpf: Make uprobe attachment APK aware](http://lore.kernel.org/bpf/20230217191908.1000004-1-deso@posteo.net/) - - On Android, APKs (Android packages; zip packages with somewhat - prescriptive contents) are first-class citizens in the system: the - shared objects contained in them don't exist in unpacked form on the - file system.
Rather, they are mmaped directly from within the archive - and the archive is also what the kernel is aware of. - -* [v1: bpf-next: bpf: Tidy up verifier checking](http://lore.kernel.org/bpf/20230217005451.2438147-1-joannelkoong@gmail.com/) - - This change refactors check_mem_access() to check against the base type of - the register, and uses switch case checking instead of if / else if - checks. This change also uses the existing clear_called_saved_regs() - function for resetting caller saved regs in check_helper_call(). - -* [v1: bpf-next: Allow reads from uninit stack](http://lore.kernel.org/bpf/20230216183606.2483834-1-eddyz87@gmail.com/) - - This patch-set modifies the BPF verifier to accept programs that read from - uninitialized stack locations, but only if executed in privileged mode. - This provides significant verification performance gains: 30% to 70% fewer - processed states for a large number of test programs. - -* [v1: intel-net: ice: xsk: disable txq irq before flushing hw](http://lore.kernel.org/bpf/20230216122839.6878-1-maciej.fijalkowski@intel.com/) - - ice_qp_dis() intends to stop a given queue pair that is a target of xsk - pool attach/detach. One of the steps is to disable interrupts on these - queues. It is currently broken in that the txq irq is turned off - *after* the HW flush, which in turn has no effect. - -* [v4: net-next: xsk: support use vaddr as ring](http://lore.kernel.org/bpf/20230216083047.93525-1-xuanzhuo@linux.alibaba.com/) - - When we try to start AF_XDP on some machines that have been running for a long time, - memory fragmentation can leave too little contiguous physical memory, - which causes the start to fail. - -* [v2: bpf-next: bpf: Only allocate one bpf_mem_cache for bpf_cpumask_ma](http://lore.kernel.org/bpf/20230216024821.2202916-1-houtao@huaweicloud.com/) - - The size of bpf_cpumask is fixed, so there is no need to allocate many - bpf_mem_caches for bpf_cpumask_ma; just one bpf_mem_cache is enough.
- Also add comments for bpf_mem_alloc_init() in bpf_mem_alloc.h to prevent - future misuse. - -* [v1: bpf: xsk: check IFF_UP earlier in Tx path](http://lore.kernel.org/bpf/20230215143309.13145-1-maciej.fijalkowski@intel.com/) - - Xsk Tx can be triggered via either sendmsg() or poll() syscalls. These - two paths share a call to the common function xsk_xmit(), which has two - sanity checks within. - -* [v1: libbpf/bpftool: Support 32-bit Architectures.](http://lore.kernel.org/bpf/CANk7y0joRFw2F4iAuN9r-dWWMvOmbFZz_J4rhGhgVFjdnxPTYw@mail.gmail.com/) - - The BPF selftests fail to compile on 32-bit architectures as the skeleton - generated by bpftool doesn’t take into consideration the size difference of - variables on 32-bit/64-bit architectures. - -* [v1: bpf-next: bpf: Introduce kptr_rcu.](http://lore.kernel.org/bpf/20230215065812.7551-1-alexei.starovoitov@gmail.com/) - - The __kptr_ref turned out to be too limited, since any "trusted" pointer access - requires bpf_kptr_xchg(), which is impractical when the same pointer needs - to be dereferenced by multiple cpus. - The __kptr "untrusted" only access isn't very useful in practice. - Rename __kptr to __kptr_untrusted with the eventual goal to deprecate it, - and rename __kptr_ref to __kptr, since that looks to be the more common use of kptrs. - Introduce __kptr_rcu that can be directly dereferenced and used similarly - to native kernel C code. - -* [v2: bpf-next: Improvements for BPF_ST tracking by verifier](http://lore.kernel.org/bpf/20230214232030.1502829-1-eddyz87@gmail.com/) - - This patch-set is part of the preparation work for the -mcpu=v4 option for - the BPF C compiler (discussed in [1]). Among other things, -mcpu=v4 should - enable generation of BPF_ST instructions by the compiler.
- -* [v1: bpf-next: Transit between BPF TCP congestion controls.](http://lore.kernel.org/bpf/20230214221718.503964-1-kuifeng@meta.com/) - - Previously, BPF struct_ops didn't go off, as even when the user - program creating it was terminated, none of these ever were pinned. - For instance, the TCP congestion control subsystem indirectly - maintains a reference count on the struct_ops of any registered BPF-implemented - algorithm. Thus, the algorithm won't be deactivated until - someone deliberately unregisters it. - -* [v3: bpf-next: bpf: Refactor release_regno searching logic](http://lore.kernel.org/bpf/20230214190551.2264057-1-davemarchevsky@fb.com/) - - Currently the ref_obj_id and OBJ_RELEASE searching is done in the code - that examines each individual arg (check_func_arg for helpers and - check_kfunc_args inner loop for kfuncs). This patch pulls out this - searching to occur before individual arg type handling, resulting in a - cleaner separation of logic and shared logic between kfuncs and helpers. - -* [Attending LFS (was: v2: FUSE BPF: A Stacked Filesystem Extension for FUSE)](http://lore.kernel.org/bpf/56d5ac0e-4c54-46b7-85d3-5de127562630@app.fastmail.com/) - - I wouldn't be able to get the travel funded by my employer, and I don't think I'm a suitable recipient for the Linux Foundation's travel fund. Therefore, I think it would make more sense for me to attend potentially relevant sessions remotely. - -* [v2: bpf-next: selftests/bpf: Cross-compile bpftool](http://lore.kernel.org/bpf/20230214161253.183458-1-bjorn@kernel.org/) - - When the BPF selftests are cross-compiled, only a host version of - bpftool is built. This version of bpftool is used on the host-side to - generate various intermediates, e.g., skeletons.
- -* [v2: LoongArch: BPF: Use 4 instructions for function address in JIT](http://lore.kernel.org/bpf/20230214152633.2265699-1-hengqi.chen@gmail.com/) - - The issue can be reproduced by running the "inline simple bpf_loop call" - verifier test. - - This is because we are emitting 2-4 instructions for 64-bit immediate moves. - During the first pass of JIT, the placeholder address is zero, so two - instructions are emitted for it. In the extra pass, the function address is in XKVRANGE, - so four instructions are emitted for it. This changes the instruction indices in the JIT context. - -* [[RFC/PATCHSET 0/7] perf record: Implement BPF sample filter (v1)](http://lore.kernel.org/bpf/20230214050452.26390-1-namhyung@kernel.org/) - - There have been requests for more sophisticated perf event sample - filtering based on the sample data. Recently the kernel added support for BPF - programs to access perf sample data, and this is the userspace part - to enable such filtering. - -* [v6: bpf-next: BPF rbtree next-gen datastructure](http://lore.kernel.org/bpf/20230214004017.2534011-1-davemarchevsky@fb.com/) - - This series adds an rbtree datastructure following the "next-gen - datastructure" precedent set by the recently-added linked-list [0]. This is - a reimplementation of the previous rbtree RFC [1] to use kfunc + kptr - instead of adding a new map type. - -* [Proposal for patch - Extend bpftool prog run to accept cpu and flags options](http://lore.kernel.org/bpf/CH2PR21MB14309C209861239DB568C3C1FADD9@CH2PR21MB1430.namprd21.prod.outlook.com/) - - The existing bpf_test_run_opts structure exposes additional fields including "flags" and "cpu". I propose extending bpftool prog run to accept options to set these additional fields. - -### Related Technology Updates - -#### Qemu - -* [v6: riscv: Add support for Zicbo[m,z,p] instructions](http://lore.kernel.org/qemu-devel/20230217203445.51077-1-dbarboza@ventanamicro.com/) - - This new version contains a change in patch 2 based on Richard's - feedback in v5 [1].
- -* [v1: Fourth RISC-V PR for QEMU 8.0](http://lore.kernel.org/qemu-devel/20230217175203.19510-1-palmer@rivosinc.com/) - - The following changes since commit 417296c8d8588f782018d01a317f88957e9786d6: - - tests/qtest/netdev-socket: Raise connection timeout to 60 seconds (2023-02-09 11:23:53 +0000) - - are available in the Git repository at: - - https://github.com/palmer-dabbelt/qemu.git tags/pull-riscv-to-apply-20230217 - - for you to fetch changes up to e8c0697d79ef05aa5aefb1121dfede59855556b4: - - target/riscv: Fix vslide1up.vf and vslide1down.vf (2023-02-16 08:10:40 -0800) - -#### Buildroot - -* [board/visionfive2: add link to documentation](http://lore.kernel.org/buildroot/20230212205037.0B3D685B14@busybox.osuosl.org/) - - commit: https://git.buildroot.net/buildroot/commit/?id=8f48b3983cdb32dfcd59e7e549c8eaa1503fe342 - branch: https://git.buildroot.net/buildroot/commit/?id=refs/heads/master - - Add a link to RVspace Documentation Center, which did not exist - when readme.txt was first submitted. It provides datasheet, quick - start, schematics, and so on. - -#### U-Boot - -* [v1: u-boot-riscv/master](http://lore.kernel.org/u-boot/Y+9vIoBzKIo0XKva@ubuntu01/) - - The following changes since commit faac9dee8e0629326dc122f4624fc4897e3f38b0: - - Prepare v2023.04-rc2 (2023-02-13 18:39:15 -0500) - - are available in the Git repository at: - - https://source.denx.de/u-boot/custodians/u-boot-riscv.git - - for you to fetch changes up to 7574b6476afc1fd76816be6567458f6ca4f44234: - - riscv: binman: Add help message for missing blobs (2023-02-17 19:07:48 +0800) - -* [v1: riscv: binman: Add help message for missing blobs](http://lore.kernel.org/u-boot/20230216011945.4833-1-rick@andestech.com/) - - Add the 'missing-msg' for more detailed output - on missing system firmware. 
- -* [Please pull u-boot-dm into -next](http://lore.kernel.org/u-boot/CAPnjgZ1-mw_DJr1Db-4iuNXgv7o_9CPZ9KgoCK9DBDfvKVptKQ@mail.gmail.com/) - - This is for the -next branch - - https://source.denx.de/u-boot/custodians/u-boot-dm/-/pipelines/15198 - - The following changes since commit faac9dee8e0629326dc122f4624fc4897e3f38b0: - - Prepare v2023.04-rc2 (2023-02-13 18:39:15 -0500) - - are available in the Git repository at: - - git://git.denx.de/u-boot-dm.git tags/dm-next-valentine - - for you to fetch changes up to 9a8a27a76ad7ab51f19c7f019d7cdac8a3f9f3c9: - - dm: test: Add a test for the various migration combinations - (2023-02-14 09:43:27 -0700) - -* [v3: doc: arch: Add document for RISC-V architecture](http://lore.kernel.org/u-boot/20230214101851.11648-1-peterlin@andestech.com/) - - This patch adds a brief introduction to the RISC-V architecture and - the typical boot process used on a variety of RISC-V platforms. - -* [v5: dm: Move to new driver model schema for device tree tags](http://lore.kernel.org/u-boot/20230213155641.1208774-1-sjg@chromium.org/) - - Now that a new schema has been accepted upstream, press it into service in U-Boot. - -* [v3: RFC: Migrate to split config](http://lore.kernel.org/u-boot/20230212231638.1134219-1-sjg@chromium.org/) - - U-Boot uses an SPL prefix on CONFIG options to indicate when an option - relates to SPL. For example, while CONFIG_TEXT_BASE is the text base for - U-Boot proper, CONFIG_SPL_TEXT_BASE is the text base for SPL. - -## 20230212: Issue 33 - -### Kernel Updates - -#### RISC-V Architecture Support - -* [v1: RISC-V: add a spin_shadow_stack declaration](http://lore.kernel.org/linux-riscv/20230210185945.915806-1-conor@kernel.org/) - - The patchwork automation reported a sparse complaint that - spin_shadow_stack was not declared and should be static: - ../arch/riscv/kernel/traps.c:335:15: warning: symbol 'spin_shadow_stack' was not declared. Should it be static?
- -* [v3: irqchip/irq-sifive-plic: Add syscore callbacks for hibernation](http://lore.kernel.org/linux-riscv/20230210100122.80255-1-mason.huo@starfivetech.com/) - - The priority and enable registers of the PLIC are reset - during the hibernation power cycle in poweroff mode, so - add syscore callbacks to save/restore those registers. - -* [v1: Add JH7110 MIPI DPHY RX support](http://lore.kernel.org/linux-riscv/20230210061713.6449-1-changhuang.liang@starfivetech.com/) - - This patchset adds a power MIPI DPHY RX driver for the StarFive JH7110 SoC. - It is used to transfer CSI camera data. The series has been tested on - the VisionFive 2 board. - -* [v1: clocksource/drivers/riscv: Refuse to probe on T-Head](http://lore.kernel.org/linux-riscv/20230209232302.25658-1-palmer@rivosinc.com/) - - As of d9f15a9de44a ("Revert "clocksource/drivers/riscv: Events are - stopped during CPU suspend"") this driver no longer functions correctly - for the T-Head firmware. That shouldn't impact any users, as we've got - a functioning driver that's higher priority, but let's just be safe and - ban it from probing at all. - -* [v4: RISC-V: Apply Zicboz to clear_page](http://lore.kernel.org/linux-riscv/20230209152628.129914-1-ajones@ventanamicro.com/) - - When the Zicboz extension is available, we can more rapidly zero naturally - aligned, Zicboz-block-sized chunks of memory. As pages are always page-aligned - and are larger than any Zicboz block size will be, - clear_page() appears to be a good candidate for the extension. - -* [v5: Basic pinctrl support for StarFive JH7110 RISC-V SoC](http://lore.kernel.org/linux-riscv/20230209143702.44408-1-hal.feng@starfivetech.com/) - - This patch series adds basic pinctrl support for the StarFive JH7110 SoC. - -* [v13: riscv, mm: detect svnapot cpu support at runtime](http://lore.kernel.org/linux-riscv/20230209131647.17245-1-panqinglin00@gmail.com/) - - Svnapot is a RISC-V extension for marking contiguous 4K pages as a non-4K - page.
This patch set is for using Svnapot in hugetlb fs and huge vmap. - -* [v1: riscv: hwcap: Don't alphabetize ISA extension IDs](http://lore.kernel.org/linux-riscv/20230209123636.123537-1-ajones@ventanamicro.com/) - - While the comment above the ISA extension ID definitions says - "Entries are sorted alphabetically.", this stopped being good - advice with commit d8a3d8a75206 ("riscv: hwcap: make ISA extension - ids can be used in asm"), as we now use macros instead of enums. - -* [v12: riscv, mm: detect svnapot cpu support at runtime](http://lore.kernel.org/linux-riscv/20230209035343.15282-1-panqinglin00@gmail.com/) - - Svnapot is a RISC-V extension for marking contiguous 4K pages as a non-4K - page. This patch set is for using Svnapot in hugetlb fs and huge vmap. - -* [v1: riscv: dts: nezha-d1: add gpio-line-names](http://lore.kernel.org/linux-riscv/20230208014504.18899-1-twoerner@gmail.com/) - - Add descriptive names so users can associate specific lines with their - respective pins on the 40-pin header according to the schematics found at: - - http://dl.linux-sunxi.org/D1/D1_Nezha_development_board_schematic_diagram_20210224.pdf - -* [GIT PULL: KVM/riscv changes for 6.3](http://lore.kernel.org/linux-riscv/CAAhSdy25NgCY23u=icRgcZpEZzNgJkyEN92KEVL8D-SvUwTBXg@mail.gmail.com/) - - We have the following KVM RISC-V changes for 6.3: - 1) Fix wrong usage of PGDIR_SIZE to check page sizes - 2) Fix privilege mode setting in kvm_riscv_vcpu_trap_redirect() - 3) Redirect illegal instruction traps to guest - 4) SBI PMU support for guest - -* [v6: KVM perf support](http://lore.kernel.org/linux-riscv/20230207095529.1787260-1-atishp@rivosinc.com/) - - This series extends perf support for KVM. The KVM implementation relies - on the SBI PMU extension and trap-and-emulate handling of hpmcounter CSRs. - The KVM implementation exposes the virtual counters to the guest and internally - manages the counters using kernel perf counters.
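For the Svnapot entries above: a NAPOT PTE marks a naturally aligned, power-of-two range of 4K pages, and the size of that range is encoded in the low bits of the PPN. For the 64 KiB (16-page) size that Svnapot defines, PPN[3:0] is set to 0b1000. In other words, the low `order` bits are cleared and then bit `order - 1` is set. The sketch below applies that encoding to a raw PPN value only. The helper names are hypothetical, the code ignores the PTE's N bit and all other fields, it is not the kernel's implementation, and it relies on the GCC/Clang builtin `__builtin_ctzll`.

```c
#include <stdint.h>

/* Hypothetical helpers working on a raw PPN only (no N bit, no other
 * PTE fields). "order" is log2 of the number of 4 KiB pages in the
 * NAPOT range; the 64 KiB size corresponds to order 4. */

/* Clear the low `order` PPN bits, then set bit (order - 1):
 * for order 4 this yields PPN[3:0] = 0b1000. */
static uint64_t napot_encode_ppn(uint64_t ppn, unsigned int order)
{
	uint64_t low_mask = (UINT64_C(1) << order) - 1;

	return (ppn & ~low_mask) | (UINT64_C(1) << (order - 1));
}

/* Recover the order: one more than the index of the lowest set bit.
 * napot_ppn must be non-zero (__builtin_ctzll(0) is undefined). */
static unsigned int napot_order(uint64_t napot_ppn)
{
	return (unsigned int)__builtin_ctzll(napot_ppn) + 1;
}

/* First 4 KiB PPN covered by a NAPOT-encoded PPN. */
static uint64_t napot_base_ppn(uint64_t napot_ppn)
{
	uint64_t low_mask = (UINT64_C(1) << napot_order(napot_ppn)) - 1;

	return napot_ppn & ~low_mask;
}
```

For example, encoding PPN 0x12345 at order 4 gives 0x12348. Decoding 0x12348 recovers order 4 and the base PPN 0x12340.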
- -* [v1: RISC-V: support some cryptography accelerations](http://lore.kernel.org/linux-riscv/20230206225846.1381789-1-heiko@sntech.de/) - - So this was my playground over the last few days. - - The base is v13 of the vector patchset, but the first patches up to doing - the Zbc-based GCM GHash can also run without those. Of course the vector-crypto - extensions are also not ratified yet, hence the marking as RFC. - -* [v2: RISC-V Hardware Probing User Interface](http://lore.kernel.org/linux-riscv/20230206201455.1790329-1-evan@rivosinc.com/) - - These are very much up for discussion, as it's a pretty big new user - interface and it's quite a bit different from how we've historically - done things: this isn't just providing an ISA string to userspace, this - has its own format for providing information to userspace. - -* [v1: Add DMA driver for StarFive JH7110 SoC](http://lore.kernel.org/linux-riscv/20230206113811.23133-1-walker.chen@starfivetech.com/) - - This patch series adds DMA support for the StarFive JH7110 RISC-V SoC. - The first patch adds the device tree binding. The second patch adds the DMA - driver. The last patch adds the DMA device node to the JH7110 dts. - -#### Process Scheduling - -* [v3: sched/fair: sanitize vruntime of entity being placed](http://lore.kernel.org/lkml/20230209193107.1432770-1-rkagan@amazon.de/) - - When a scheduling entity is placed onto cfs_rq, its vruntime is pulled - to the base level (around cfs_rq->min_vruntime), so that the entity - doesn't gain an extra boost when placed backwards. - - However, if the entity being placed hasn't run for a long time, its - vruntime may get too far behind (e.g. while cfs_rq was executing a - low-weight hog), which can invert the vruntime comparison due to s64 overflow. - -* [v1: livepatch,sched: Add livepatch task switching to cond_resched()](http://lore.kernel.org/lkml/cover.1675969869.git.jpoimboe@kernel.org/) - - Fix patching stalls caused by busy kthreads.
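The s64 overflow in the vruntime entry comes from how CFS orders vruntime values. It subtracts two `u64` values and checks the sign of the difference as an `s64`; for example, `entity_before()` returns `(s64)(a->vruntime - b->vruntime) < 0`. That comparison is wraparound-safe only while the two values are less than 2^63 apart, and beyond that the result flips. Here is a minimal userspace sketch of the comparison, with a hypothetical helper name:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper modelled on the CFS ordering check: a is
 * "before" b if (a - b), reinterpreted as a signed 64-bit value, is
 * negative. The u64 subtraction wraps modulo 2^64, so this stays
 * correct across wraparound as long as |a - b| < 2^63. Converting an
 * out-of-range value to int64_t is implementation-defined in ISO C;
 * GCC and Clang use two's complement, which the kernel relies on. */
static bool vruntime_before(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) < 0;
}
```

With `b = 2^63 - 1`, `vruntime_before(0, b)` is true. With `b = 2^63 + 1`, it becomes false even though 0 is still numerically smaller. This is the kind of inversion that can occur when an entity's vruntime has fallen too far behind.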
- -* [v1: sched: show cpu number when sched_show_task](http://lore.kernel.org/lkml/20230208124655.2592560-1-peng.fan@oss.nxp.com/) - - It would be helpful to show the CPU number when dumping a task. For example, - when doing system suspend, we could know which CPU the process - that failed to freeze was running on. - -* [v1: sched: sd_llc_id initialized](http://lore.kernel.org/lkml/20230207103636.13783-1-sunshouxin@chinatelecom.cn/) - - In my test, I used isolcpus to isolate specific CPUs, - and then I noticed a different scenario when binding cores. - -* [v3: sched: Introduce classes of tasks for load balance](http://lore.kernel.org/lkml/20230207051105.11575-1-ricardo.neri-calderon@linux.intel.com/) - - This is the third version of this patchset. Previous versions can be found - here [1] and here [2]. For brevity, I did not include the cover letter - from the original posting. You can read it here [1]. - -* [v3: sched: pick_next_rt_entity(): check list_entry](http://lore.kernel.org/lkml/20230128-list-entry-null-check-sched-v3-1-b1a71bd1ac6b@diag.uniroma1.it/) - - Commit 326587b84078 ("sched: fix goto retry in pick_next_task_rt()") - removed any path which could make pick_next_rt_entity() return NULL. - However, BUG_ON(!rt_se) in _pick_next_task_rt() (the only caller of - pick_next_rt_entity()) still checks the error condition, which can - never happen, since list_entry() never returns NULL. - -* [v2: sched/deadline: Add more reschedule cases to prio_changed_dl()](http://lore.kernel.org/lkml/20230206140612.701871-1-vschneid@redhat.com/) - - On that kernel, it is quite easy to trigger using rt-tests's deadline_test - [1] with the test running on isolated CPUs (this reduces the chance of - something unrelated setting TIF_NEED_RESCHED on the idle tasks, making the - issue even more obvious as the hung task detector chimes in).
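The pick_next_rt_entity() entry relies on the fact that list_entry() is just container_of(). It subtracts the member's offset from a pointer, so it never produces NULL. On an empty list, it instead yields a bogus pointer computed from the list head. Below is a minimal userspace sketch of that behaviour. The `_sketch` macros, `struct rt_entity_sketch`, and the small list helpers are hypothetical re-implementations modelled on `<linux/list.h>`, not copied from it.

```c
#include <stddef.h>

/* Minimal userspace re-implementation modelled on <linux/list.h>. */
struct list_head {
	struct list_head *next, *prev;
};

/* container_of: step back from a member pointer to its enclosing
 * struct. Pure pointer arithmetic, so it never yields NULL. */
#define container_of_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define list_entry_sketch(ptr, type, member) \
	container_of_sketch(ptr, type, member)

/* Hypothetical stand-in for an RT scheduling entity. */
struct rt_entity_sketch {
	int prio;
	struct list_head run_list;
};

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static int list_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Insert node just before head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}
```

On an empty list, `list_entry_sketch(head.next, ...)` returns a non-NULL pointer derived from the list head itself, and dereferencing that pointer would read unrelated memory. For this reason, emptiness has to be tested with `list_empty()` rather than by checking the result for NULL.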
- -#### Memory Management - -* [GIT PULL: memblock: Revert "mm: Always release pages to the buddy allocator in memblock_free_late()."](http://lore.kernel.org/linux-mm/Y+dqPRXSqoP1x7u5@kernel.org/) - - The following changes since commit 4ec5183ec48656cec489c49f989c508b68b518e3: - - Linux 6.2-rc7 (2023-02-05 13:13:28 -0800) - - are available in the Git repository at: - - https://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock tags/fixes-2023-02-11 - -* [v1: New arch interfaces for manipulating multiple pages](http://lore.kernel.org/linux-mm/20230211033948.891959-1-willy@infradead.org/) - - Here's my latest draft of a new set of page table manipulation APIs. I've - only done alpha, arc and x86 (other than x86, I'm going alphabetically). - Before I go much further, some feedback might be a good idea. Or if - someone wants to volunteer to do their architecture ;-) - -* [v1: psi: reduce min window size to 50ms](http://lore.kernel.org/linux-mm/8b7a3270fe253de1cd2b71473e29394409b2a0f7.1676067791.git.quic_sudaraja@quicinc.com/) - - A few systems require much finer-grained tracking of memory - pressure in the system using the PSI mechanism. Reduce the minimum - allowable window size to 50ms to increase the sampling rate - of the PSI monitor for much faster response and reaction to memory - pressures in the system. With a 50ms window size, the smallest - resolution of memory pressure that can be tracked is now 5ms. - -* [v1: mm: add tracepoints to ksm](http://lore.kernel.org/linux-mm/20230210214645.2720847-1-shr@devkernel.io/) - - This adds the following tracepoints to ksm: - - start / stop scan - - ksm enter / exit - - merge a page - - merge a page with ksm - - remove a page - - remove a rmap item - - This patch has been split off from the RFC patch series "mm: - process/cgroup ksm support".
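For the PSI window entry: on the userspace side, a PSI trigger is a string of the form `<some|full> <stall threshold in us> <window in us>`. It is written to a file such as `/proc/pressure/memory`, and the resulting fd is then polled for `POLLPRI`, as described in the kernel's PSI documentation. Below is a minimal C sketch under stated assumptions. The helper names are hypothetical. The 50 ms minimum window is the value proposed by this patch, not a limit every kernel accepts. The 10 s maximum and the threshold check mirror the kernel's own validation at the time.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Window bounds in microseconds. 50 ms is the minimum proposed by
 * the patch above (an assumption, not a limit every kernel accepts);
 * 10 s is the kernel's maximum window. */
#define PSI_WINDOW_MIN_US 50000u
#define PSI_WINDOW_MAX_US 10000000u

/* Hypothetical helper: format "<some|full> <threshold_us> <window_us>"
 * into buf after checking the kind, window and threshold. Returns 0,
 * or -EINVAL on bad input or truncation. */
static int psi_format_trigger(char *buf, size_t len, const char *kind,
			      unsigned int threshold_us, unsigned int window_us)
{
	int n;

	if (strcmp(kind, "some") != 0 && strcmp(kind, "full") != 0)
		return -EINVAL;
	if (window_us < PSI_WINDOW_MIN_US || window_us > PSI_WINDOW_MAX_US)
		return -EINVAL;
	/* The stall threshold must be non-zero and fit in the window. */
	if (threshold_us == 0 || threshold_us > window_us)
		return -EINVAL;

	n = snprintf(buf, len, "%s %u %u", kind, threshold_us, window_us);
	if (n < 0 || (size_t)n >= len)
		return -EINVAL;
	return 0;
}

/* Hypothetical helper: register the trigger and return an fd that
 * the caller poll()s for POLLPRI, or -1 on error. The write includes
 * the terminating NUL, as the kernel's PSI documentation example does. */
static int psi_open_memory_trigger(const char *trig)
{
	int fd = open("/proc/pressure/memory", O_RDWR | O_NONBLOCK);

	if (fd < 0)
		return -1;
	if (write(fd, trig, strlen(trig) + 1) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

A monitor would build a trigger string such as `"some 5000 50000"` with `psi_format_trigger()` and pass it to `psi_open_memory_trigger()`. On kernels without this patch, the kernel itself rejects a 50 ms window, so the write fails.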
- -* [v2: bpf-next: bpf, mm: introduce cgroup.memory=nobpf](http://lore.kernel.org/linux-mm/20230210154734.4416-1-laoar.shao@gmail.com/) - - The bpf memory accounting has some known problems in a container - environment: - - - The container memory usage is not consistent if there's a pinned bpf - program - After the container restarts, the leftover bpf programs won't be accounted - to the new generation, so the memory usage of the container is not - consistent. This issue can be resolved by introducing selectable - memcg, but we don't have an agreement on the solution yet. - -* [v1: mm/memcg: Skip high limit check in root memcg](http://lore.kernel.org/linux-mm/20230210094550.5125-1-haifeng.xu@shopee.com/) - - The high limit check walks the memory usage from the given memcg up to the root memcg. - However, the root memcg has no limit. So this check makes no sense - there and we can skip it. - -* [v2: fold per-CPU vmstats remotely](http://lore.kernel.org/linux-mm/20230209150150.380060673@redhat.com/) - - This is done using cmpxchg to manipulate the counters, - both CPU locally (via the account functions), - and remotely (via cpu_vm_stats_fold). - - Thanks to Aaron Tomlin for diagnosing issue 1 and writing - the initial patch series. - -* [v1: Writeback handling of pinned pages](http://lore.kernel.org/linux-mm/20230209121046.25360-1-jack@suse.cz/) - - Since we are slowly getting into a state where folios used as buffers for - [R]DMA are detectable by folio_maybe_dma_pinned(), I figured it is time we also - address the original problems filesystems had with these pages [1], namely - that page/folio private data can get reclaimed from the page while it is being - written to by the DMA and also that page contents can be modified while the - page is under writeback.
- -* [v13: iov_iter: Improve page extraction (pin or just list)](http://lore.kernel.org/linux-mm/20230209102954.528942-1-dhowells@redhat.com/) - - Here are patches to provide support for extracting pages from an iov_iter - and to use this in the extraction functions in the block layer bio code. - -* [v2: mm/page_alloc: optimize find_suitable_fallback() and fallbacks array](http://lore.kernel.org/linux-mm/20230209101144.496144-1-yajun.deng@linux.dev/) - - There is no need to execute the next loop if it does not return in the first - loop. So add a break at the end of the loop. - - At the same time, add !migratetype_is_mergeable() before the loop and - reduce the first index size from MIGRATE_TYPES to MIGRATE_PCPTYPES in - the fallbacks array. - -* [v1: mm/page_alloc: optimize the loop in find_suitable_fallback()](http://lore.kernel.org/linux-mm/20230209024435.3392916-1-yajun.deng@linux.dev/) - - There is no need to execute the next loop if it does not return in the first - loop. So add a break at the end of the loop. - - There are only three rows in fallbacks, so reduce the first index size - from MIGRATE_TYPES to MIGRATE_PCPTYPES. - -* [v1: Revert "slub: force on no_hash_pointers when slub_debug is enabled"](http://lore.kernel.org/linux-mm/20230208194712.never.999-kees@kernel.org/) - - Linking no_hash_pointers() to slub_debug has had a chilling effect - on using slub_debug features for security hardening, since system - builders are forced to choose between redzoning and heap address location - exposures. Instead, just require that the "no_hash_pointers" boot param - be used to expose pointers during slub_debug reports.
- -* [v1: Prevent ->map_pages from sleeping](http://lore.kernel.org/linux-mm/20230208145335.307287-1-willy@infradead.org/) - - In preparation for a larger patch series which will handle (some, easy) - page faults protected only by RCU, change the two filesystems which have - sleeping locks to not take them, and hold the RCU lock around calls to - ->map_pages to prevent other filesystems from adding sleeping locks. - -* [v1: Memory access profiler(IBS) driven NUMA balancing](http://lore.kernel.org/linux-mm/20230208073533.715-1-bharata@amd.com/) - - Some hardware platforms can provide information about memory accesses - that can be used to do optimal page and task placement on NUMA - systems. AMD processors have a hardware facility called Instruction-Based - Sampling (IBS) that can be used to gather specific metrics - related to instruction fetch and execution activity. - -* [v1: mm/damon/sysfs: make kobj_type structures constant](http://lore.kernel.org/linux-mm/20230207-kobj_type-damon-v1-1-9d4fea6a465b@weissschuh.net/) - - Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.") - the driver core allows the usage of const struct kobj_type. - - Take advantage of this to constify the structure definitions to prevent - modification at runtime. - -* [v1: mm: kfence: export kfence_enabled as global variables](http://lore.kernel.org/linux-mm/1675750519-1064-1-git-send-email-quic_zhenhuah@quicinc.com/) - - Export the variable to make it easier to determine whether kfence is enabled - at runtime. This is more precise than checking the kernel config - "CONFIG_KFENCE". - -* [v1: tmpfs: add the option to disable swap](http://lore.kernel.org/linux-mm/20230207025259.2522793-1-mcgrof@kernel.org/) - - Many folks suggest that using tmpfs is not great because it can use swap. - That's not a good reason to *not* use tmpfs; what's missing is just - the option to let you disable it. And so this adds that option, and also - lets users experiment with it.
- -#### File Systems - -* [v1: io_uring: add IORING_OP_READ[WRITE]_SPLICE_BUF](http://lore.kernel.org/linux-fsdevel/20230210153212.733006-1-ming.lei@redhat.com/) - - Add two OPs whose buffers are retrieved via kernel splice, to support - fuse/ublk zero copy. - - The 1st patch enhances direct pipe & splice for moving pages in the kernel, - so that the two added OPs won't be misused, avoiding a potential security hole. - -* [v1: zonefs: make kobj_type structure constant](http://lore.kernel.org/linux-fsdevel/20230210-kobj_type-zonefs-v1-1-9a9c5b40e037@weissschuh.net/) - - Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.") - the driver core allows the usage of const struct kobj_type. - - Take advantage of this to constify the structure definition to prevent - modification at runtime. - -* [v1: Add the test_dummy_encryption key on-demand](http://lore.kernel.org/linux-fsdevel/20230208062107.199831-1-ebiggers@kernel.org/) - - This series eliminates the call to fscrypt_destroy_keyring() from - __put_super(), which is causing confusion because it looks like (but - actually isn't) a sleep-in-atomic bug. See the thread "block: sleeping - in atomic warnings", i.e. - https://lore.kernel.org/linux-fsdevel/CAHk-=wg6ohuyrmLJYTfEpDbp2Jwnef54gkcpZ3-BYgy4C6UxRQ@mail.gmail.com - and its responses. - -* [v12: iov_iter: Improve page extraction (pin or just list)](http://lore.kernel.org/linux-fsdevel/20230207171305.3716974-1-dhowells@redhat.com/) - - Here are patches to provide support for extracting pages from an iov_iter - and to use this in the extraction functions in the block layer bio code. - -* [v4: Introduce Copy-On-Write to Page Table](http://lore.kernel.org/linux-fsdevel/20230207035139.272707-1-shiyn.lin@gmail.com/) - - [RUN] vmsplice() + unmap in child ...
with hugetlb (2048 kB) - not ok 33 No leak from parent into child - - See more information about the anon COW hugetlb tests: - https://patchwork.kernel.org/project/linux-mm/patch/20220927110120.106906-5-david@redhat.com/ - -* [v1: vfs: Delay root FS switch after UMH completion](http://lore.kernel.org/linux-fsdevel/20230206171032.12801-1-mkoutny@suse.com/) - - We want to make sure no UMHs started with an old root survive into the - world with the new root (they may fail when it is not expected). - Therefore, insert a wait for existing UMHs to terminate (this assumes UMH - runtime is finite). - -* [v1: blk: optimization for classic polling](http://lore.kernel.org/linux-fsdevel/3578876466-3733-1-git-send-email-nj.shetty@samsung.com/) - - This removes the dependency on interrupts to wake up the task. Set the task - state to TASK_RUNNING if need_resched() returns true - while polling for IO completion. - -#### Network Devices - -* [v1: net-next: sock_map: dump socket map id via diag](http://lore.kernel.org/netdev/20230211201954.256230-1-xiyou.wangcong@gmail.com/) - - Currently there is no way to know from outside which sockmap a socket has been added - to, especially since a socket can be added to multiple - sockmaps. We could dump this via socket diag. - -* [v2: net-next: net: dsa: mt7530: add support for changing DSA master](http://lore.kernel.org/netdev/20230211184101.651462-1-richard@routerhints.com/) - - Add support for changing the master of a port on the MT7530 DSA subdriver. - -* [v5: net: ethernet: mtk_eth_soc: various enhancements](http://lore.kernel.org/netdev/cover.1676128246.git.daniel@makrotopia.org/) - - This series brings a variety of fixes and enhancements for mtk_eth_soc, - adds support for the MT7981 SoC and facilitates sharing the SGMII PCS - code between mtk_eth_soc and mt7530.
- -* [v3: net-next: net: wwan: tmi: PCIe driver for MediaTek M.2 modem](http://lore.kernel.org/netdev/20230211083732.193650-1-yanchao.yang@mediatek.com/) - - TMI(T-series Modem Interface) is the PCIe host device driver for MediaTek's - modem. The driver uses the WWAN framework infrastructure to create the - following control ports and network interfaces for data transactions. - -* [v1: net-next: net: Kconfig.debug: wrap socket refcnt debug into an option](http://lore.kernel.org/netdev/20230211065153.54116-1-kerneljasonxing@gmail.com/) - - Since commit 463c84b97f24 ("[NET]: Introduce inet_connection_sock") - commented out the definition of SOCK_REFCNT_DEBUG and later another - patch deleted it, we need to enable it through defining it manually - somewhere. Wrapping it into an option in Kconfig.debug could make - it much clearer and easier for some developers to do things based on this change. - -* [v1: net: make kobj_type structures constant](http://lore.kernel.org/netdev/20230211-kobj_type-net-v1-0-e3bdaa5d8a78@weissschuh.net/) - - Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.") - the driver core allows the usage of const struct kobj_type. - - Take advantage of this to constify the structure definitions to prevent - modification at runtime. - -* [v4: net-next: ionic: on-chip descriptors](http://lore.kernel.org/netdev/20230211005017.48134-1-shannon.nelson@amd.com/) - - We start with a couple of house-keeping patches that were originally - presented for 'net', then we add support for on-chip descriptor rings - for tx-push, as well as adding support for rx-push. - -* [v1: net-next: selftests: forwarding: add a test for MAC Merge layer](http://lore.kernel.org/netdev/20230210221243.228932-1-vladimir.oltean@nxp.com/) - - The MAC Merge layer (IEEE 802.3-2018 clause 99) does all the heavy - lifting for Frame Preemption (IEEE 802.1Q-2018 clause 6.7.2), a TSN - feature for minimizing latency. 
- - Preemptible traffic is different on the wire from normal traffic in - incompatible ways. - -* [v1: iproute2-next: tc: m_ct: add support for helper](http://lore.kernel.org/netdev/ab1e6bfbefff74b2b4fe230162b198c38cf5b394.1676065393.git.lucien.xin@gmail.com/) - - This patch adds setup and dump support for the helper in the tc ct action - in userspace; the kernel support was added in: - - https://lore.kernel.org/netdev/cover.1667766782.git.lucien.xin@gmail.com/ - -* [v1: net-next: net/sched: transition actions to pcpu stats and rcu](http://lore.kernel.org/netdev/20230210202725.446422-1-pctammela@mojatatu.com/) - - Following the work done for act_pedit[0], transition the remaining tc - actions to percpu stats and rcu, whenever possible. - Percpu stats make updating the action stats very cheap, while combining - them with rcu action parameters makes it possible to get rid of the per - action lock in the datapath. - -* [v1: net: net/sched: act_ctinfo: use percpu stats](http://lore.kernel.org/netdev/20230210200824.444856-1-pctammela@mojatatu.com/) - - The tc action act_ctinfo was using shared stats; fix it to use percpu stats, - since bstats_update() must be called with locks or with a percpu pointer argument. - -* [v4: spi: Add support for stacked/parallel memories](http://lore.kernel.org/netdev/20230210193647.4159467-1-amit.kumar-mahapatra@amd.com/) - - This patch continues the discussions which happened on - 'commit f89504300e94 ("spi: Stacked/parallel memories bindings")' about - adding dt-binding support for stacked/parallel memories. - -* [v1: net-next: net: ipa: determine GSI register offsets differently](http://lore.kernel.org/netdev/20230210193655.460225-1-elder@linaro.org/) - - This series changes the way GSI register offsets are specified, using - the "reg" mechanism currently used for IPA registers. A follow-on - series will extend this work so fields within GSI registers are also - specified this way.
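The per-CPU statistics idea in the two tc items above can be sketched in plain C. This is a single-threaded userspace model, not kernel code. `NR_SLOTS`, `struct action_stats`, `stats_update()` and `stats_read()` are hypothetical names. The kernel itself would typically use `alloc_percpu()`/`this_cpu_ptr()` together with `u64_stats_sync`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU count; the kernel would allocate with alloc_percpu() instead. */
#define NR_SLOTS 4

/* One counter block per CPU, so updates on different CPUs never share state. */
struct pcpu_stats {
	uint64_t packets;
	uint64_t bytes;
};

struct action_stats {
	struct pcpu_stats cpu[NR_SLOTS];
};

/* Fast path: touches only the calling CPU's slot, so no shared lock is taken. */
static void stats_update(struct action_stats *s, unsigned int cpu, uint64_t len)
{
	assert(cpu < NR_SLOTS);
	s->cpu[cpu].packets++;
	s->cpu[cpu].bytes += len;
}

/* Slow path (e.g. a stats dump): fold every per-CPU slot into one total. */
static struct pcpu_stats stats_read(const struct action_stats *s)
{
	struct pcpu_stats total = {0, 0};

	for (size_t i = 0; i < NR_SLOTS; i++) {
		total.packets += s->cpu[i].packets;
		total.bytes += s->cpu[i].bytes;
	}
	return total;
}
```

Each update touches only its own slot, so the hot path needs no lock. The cost moves to the reader, which has to walk every slot. A real per-CPU allocation also keeps the slots on separate cache lines, which this plain array does not.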
- -* [v1: bpf-next: net: lan966x: set xdp_features flag](http://lore.kernel.org/netdev/01f4412f28899d97b0054c9c1a63694201301b42.1676055718.git.lorenzo@kernel.org/) - - Set the xdp_features netdevice flag if the lan966x NIC supports XDP mode. - -* [v1: net-next: net: pcs: tse: port to pcs-lynx](http://lore.kernel.org/netdev/20230210190949.1115836-1-maxime.chevallier@bootlin.com/) - - When submitting the initial driver for the Altera TSE PCS, Russell King - noted that the register layout for the TSE PCS is very similar to the - Lynx PCS. The main difference is that the TSE PCS's register space is - memory-mapped, whereas Lynx's is exposed over MDIO. - -* [v1: net: ethernet: efct Add x3 ethernet driver](http://lore.kernel.org/netdev/20230210130321.2898-1-h.jain@amd.com/) - - This patch series adds a new Ethernet network driver for Alveo X3522[1]. - X3 is a low-latency NIC with an aim to deliver the lowest possible - latency. It accelerates a range of diverse trading strategies - and financial applications. - - [1] https://www.xilinx.com/x3 - -* [v1: net-next: devlink: don't allow to change net namespace for FW_ACTIVATE reload action](http://lore.kernel.org/netdev/20230210115827.3099567-1-jiri@resnulli.us/) - - A network namespace change only makes sense for the driver re-init reload - action. It is not applicable to FW activation. So check whether the user passed - an ATTR requesting a network namespace change and forbid it. - -* [v5: Introduce ICSSG based ethernet Driver](http://lore.kernel.org/netdev/20230210114957.2667963-1-danishanwar@ti.com/) - - The Programmable Real-time Unit and Industrial Communication Subsystem - Gigabit (PRU_ICSSG) is a low-latency microcontroller subsystem in the TI - SoCs. This subsystem is provided for use cases such as the implementation - of custom peripheral interfaces, offloading of tasks from the other - processor cores of the SoC, etc.
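The devlink change above amounts to an argument check that runs before a reload starts. Below is a minimal sketch of that check. The enum values and `reload_validate()` are invented for illustration, and the choice of `-EOPNOTSUPP` as the error code is also an assumption. None of them are taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Modelled on the two reload actions named in the devlink item; values are illustrative. */
enum reload_action {
	RELOAD_ACTION_DRIVER_REINIT,
	RELOAD_ACTION_FW_ACTIVATE,
};

/* Hypothetical validation helper: a namespace change only makes sense when the
 * driver is re-initialised, so reject it for any other action. */
static int reload_validate(enum reload_action action, bool netns_change_requested)
{
	if (netns_change_requested && action != RELOAD_ACTION_DRIVER_REINIT)
		return -EOPNOTSUPP;
	return 0;
}
```

Rejecting the request up front means a firmware activation never runs with a namespace argument that it would silently ignore.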
- -* [v1: b43legacy: Add checking for null for ssb_get_devtypedata(dev)](http://lore.kernel.org/netdev/20230210111228.370513-1-n.petrova@fintech.ru/) - - The function ssb_get_devtypedata(dev) may return NULL (the next call, - B43legacy_WARN_ON(!wl), is used for error handling, including the NULL value). - Therefore, a check is added before calling b43legacy_wireless_exit(), - where the argument containing this value is expected to be dereferenced. - -* [v1: net-next: net: micrel: Add PHC support for lan8841](http://lore.kernel.org/netdev/20230210102701.703569-1-horatiu.vultur@microchip.com/) - - Add support for PHC and timestamping operations for the lan8841 PHY. - PTP 1-step and 2-step modes are supported, over Ethernet and over UDP for both - IPv4 and IPv6. - -* [v1: net-next: nfp: ethtool: supplement nfp link modes supported](http://lore.kernel.org/netdev/20230210095319.603867-1-simon.horman@corigine.com/) - - Add support for the following modes to the nfp driver: - - NFP_MEDIA_10GBASE_LR - NFP_MEDIA_25GBASE_LR - NFP_MEDIA_25GBASE_ER - - These modes are supported by the hardware, and - support for them was recently added to the firmware. - -* [v1: bpf: selftests/bpf: enable mptcp before testing](http://lore.kernel.org/netdev/20230210093205.1378597-1-liuhangbin@gmail.com/) - - Some distros may not enable mptcp by default. Enable it before starting the - mptcp server. To use the {read/write}_int_sysctl() functions, I moved - them to test_progs.c. - -* [v1: bpf-next: selftests/bpf: Cross-compile bpftool](http://lore.kernel.org/netdev/20230210084326.1802597-1-bjorn@kernel.org/) - - When the BPF selftests are cross-compiled, only a host version of - bpftool is built. This version of bpftool is used to generate various - intermediates, e.g., skeletons.
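The mptcp selftest fix above depends on writing an integer into a sysctl file before the server starts. The helpers below are hypothetical sketches in the spirit of `{read/write}_int_sysctl()`. Their real signatures in test_progs.c are not reproduced here. On a live system the target would be `/proc/sys/net/mptcp/enabled`, and writing it requires root.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write an integer (plus newline) to a sysctl-style file.
 * Returns 0 on success, -1 on any open/write/close failure. */
static int write_int_sysctl(const char *path, int value)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	if (fprintf(f, "%d\n", value) < 0) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}

/* Hypothetical helper: read an integer back from a sysctl-style file. */
static int read_int_sysctl(const char *path, int *value)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%d", value) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

A test would typically read the old value first, write `1`, run, and then restore the saved value so that the host configuration is left unchanged.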
- -* [v1: net/usb: kalmia: Don't pass act_len in usb_bulk_msg error path](http://lore.kernel.org/netdev/2f74aab82a40e4c11c91ccba40f5b620f6cb209c.camel@gmail.com/) - - syzbot reported that act_len in kalmia_send_init_packet() is - uninitialized when passing it to the first usb_bulk_msg error path. Jiri - Pirko noted that it's pointless to pass it in the error path, and that - the value that would be printed in the second error path would be the - value of act_len from the first call to usb_bulk_msg.[1] - -* [v1: net-next: xsk: support use vaddr as ring](http://lore.kernel.org/netdev/20230210021232.108211-1-xuanzhuo@linux.alibaba.com/) - - When we try to start AF_XDP on machines that have been running for a long time, - memory fragmentation can leave insufficient contiguous - physical memory, which causes the start to fail. - -* [v1: iproute2-next: iplink: support IPv4 BIG TCP](http://lore.kernel.org/netdev/cover.1675985919.git.lucien.xin@gmail.com/) - - Patch 1 fixes some typos in the documents, and Patch 2 adds two - attributes to allow userspace to enable IPv4 BIG TCP. - -#### Security hardening - -* [v1: ASoC: Intel: Skylake: Replace 1-element array with flex-array](http://lore.kernel.org/linux-hardening/20230210051447.never.204-kees@kernel.org/) - - The kernel is globally removing the ambiguous 0-length and 1-element - arrays in favor of flexible arrays, so that we can gain both compile-time - and run-time array bounds checking[1]. In this instance, struct - skl_cpr_cfg contains struct skl_cpr_gtw_cfg, which defined "config_data" - as a 1-element array. - -* [v1: bpf: Deprecate "data" member of bpf_lpm_trie_key](http://lore.kernel.org/linux-hardening/20230209192337.never.690-kees@kernel.org/) - - The kernel is globally removing the ambiguous 0-length and 1-element - arrays in favor of flexible arrays, so that we can gain both compile-time - and run-time array bounds checking[1].
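The 1-element-to-flexible-array conversions in this section all follow one pattern. The userspace sketch below shows why a trailing `[1]` is ambiguous and what the replacement looks like. The struct and function names are hypothetical. In the kernel, the allocation size would come from `struct_size()`, which adds overflow checking.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Old style: a "fake" flexible array declared with one element. sizeof() counts
 * that element, and bounds checkers treat data[] as exactly one byte long. */
struct cfg_old {
	unsigned int size;
	unsigned char data[1];
};

/* C99 flexible array member: sizeof() covers only the header, and the array's
 * real length comes from the allocation. */
struct cfg_new {
	unsigned int size;
	unsigned char data[];
};

/* Allocate a header plus n payload bytes. The kernel would compute this size
 * with struct_size(p, data, n), which also saturates on overflow. */
static struct cfg_new *cfg_new_alloc(const unsigned char *src, unsigned int n)
{
	struct cfg_new *p = malloc(sizeof(*p) + n);

	if (!p)
		return NULL;
	p->size = n;
	memcpy(p->data, src, n);
	return p;
}
```

With `data[1]`, every size calculation has to remember to subtract the phantom element. With `data[]`, the header size and the payload size stay separate. This lets `-fstrict-flex-arrays` and `CONFIG_UBSAN_BOUNDS` reason about the real bounds.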
- -* [v1: RDMA/cma: Distinguish between sockaddr_in and sockaddr_in6 by size](http://lore.kernel.org/linux-hardening/20230208232549.never.139-kees@kernel.org/) - - Clang can do some aggressive inlining, which provides it with greater - visibility into the sizes of various objects that are passed into - helpers. Specifically, compare_netdev_and_ip() can see through the type - given to the "sa" argument, which means it can generate code for "struct - sockaddr_in" that would have been passed to ipv6_addr_cmp() (which expects - to operate on the larger "struct sockaddr_in6"), which would result in a - compile-time buffer overflow condition detected by memcmp(). - -* [v1: randstruct: disable Clang 15 support](http://lore.kernel.org/linux-hardening/20230208065133.220589-1-ebiggers@kernel.org/) - - The randstruct support released in Clang 15 is unsafe to use due to a - bug that can cause miscompilations: "-frandomize-layout-seed - inconsistently randomizes all-function-pointers structs" - (https://github.com/llvm/llvm-project/issues/60349). It has been fixed - on the Clang 16 release branch, so add a Clang version check. - -* [v3: next: scsi: smartpqi: Replace one-element array with flexible-array member](http://lore.kernel.org/linux-hardening/Y+LJz%2Fr6+UeLqnV3@work/) - - One-element arrays are deprecated, and we are replacing them with flexible - array members instead. So, replace the one-element array with a flexible-array - member in struct report_log_lun_list. - -* [v2: pstore/blk: Export a method to implemente panic_write()](http://lore.kernel.org/linux-hardening/20230206061813.44506-1-victor@allwinnertech.com/) - - panic_write() is necessary to write the pstore frontend message - to blk devices on panic. Here is a way to register panic_write when - we use the "best_effort" way to register the pstore blk-backend.
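The RDMA/cma item above concerns code that reads a `struct sockaddr` as the larger `struct sockaddr_in6` when the object may only be a `struct sockaddr_in`. Below is a minimal sketch of a comparison helper that avoids this by dispatching on `sa_family` before casting. This is a common pattern but not necessarily what the patch does, since its title says it distinguishes the two cases by size. `sockaddr_ip_equal()` is a hypothetical name.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: compare the IP addresses of two sockaddrs, only ever
 * reading as many bytes as the concrete address type actually has. */
static bool sockaddr_ip_equal(const struct sockaddr *a, const struct sockaddr *b)
{
	if (a->sa_family != b->sa_family)
		return false;

	switch (a->sa_family) {
	case AF_INET: {
		const struct sockaddr_in *a4 = (const struct sockaddr_in *)a;
		const struct sockaddr_in *b4 = (const struct sockaddr_in *)b;

		return a4->sin_addr.s_addr == b4->sin_addr.s_addr;
	}
	case AF_INET6: {
		const struct sockaddr_in6 *a6 = (const struct sockaddr_in6 *)a;
		const struct sockaddr_in6 *b6 = (const struct sockaddr_in6 *)b;

		/* Both objects are known to be full sockaddr_in6 structures here. */
		return memcmp(&a6->sin6_addr, &b6->sin6_addr,
			      sizeof(a6->sin6_addr)) == 0;
	}
	default:
		return false;
	}
}
```

The family check comes first, so an IPv4 address is never passed down the 16-byte IPv6 comparison path. That mismatch is exactly what the compiler flagged in the original code.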
- -* [v1: media: imx-jpeg: Bounds check sizeimage access](http://lore.kernel.org/linux-hardening/20230204183804.never.323-kees@kernel.org/) - - The call of mxc_jpeg_get_plane_size() from mxc_jpeg_dec_irq() sets - plane_no argument to 1. - - Silence the warning by bounds checking comp_planes for future robustness. - -* [v1: scsi: mpi3mr: Replace 1-element array with flex-array](http://lore.kernel.org/linux-hardening/20230204183715.never.937-kees@kernel.org/) - - Nothing else defined MPI3_NVME_ENCAP_CMD_MAX, so the "command" - buffer was being defined as a fake flexible array of size 1. Replace - this with a proper flex array. Avoids this GCC 13 warning under - -fstrict-flex-arrays=3: - -* [v1: usb: host: xhci: mvebu: Iterate over array indexes instead of using pointer math](http://lore.kernel.org/linux-hardening/20230204183651.never.663-kees@kernel.org/) - - Walking the dram->cs array was seen as accesses beyond the first array - item by the compiler. Instead, use the array index directly. This allows - for run-time bounds checking under CONFIG_UBSAN_BOUNDS as well. - -* [v1: btrfs: sysfs: Handle NULL return values](http://lore.kernel.org/linux-hardening/20230204183510.never.909-kees@kernel.org/) - - Each of to_fs_info(), discard_to_fs_info(), and to_space_info() can - return NULL values. Check for these so it's not possible to perform - calculations against NULL pointers. - -* [v1: jfs: Use unsigned variable for length calculations](http://lore.kernel.org/linux-hardening/20230204183355.never.877-kees@kernel.org/) - - To avoid confusing the compiler about possible negative sizes, switch - "ssize" which can never be negative from int to u32. - -* [v1: bpf: Replace bpf_lpm_trie_key 0-length array with flexible array](http://lore.kernel.org/linux-hardening/20230204183241.never.481-kees@kernel.org/) - - This includes fixing the selftest which was incorrectly using a - variable length struct as a header, identified earlier[1]. 
Avoid this - by just explicitly including the prefixlen member instead of struct - bpf_lpm_trie_key. - - [1] https://lore.kernel.org/all/202206281009.4332AA33@keescook/ - -#### Asynchronous IO - -* [v8: io_uring: add napi busy polling support](http://lore.kernel.org/io-uring/20230209230144.465620-1-shr@devkernel.io/) - - This adds napi busy polling support in io_uring.c. It adds a new - napi_list to the io_ring_ctx structure. This list contains the - napi IDs that are currently enabled for busy polling. This list is - used to determine which napi IDs have busy polling enabled. For faster - access, it also adds a hash table. - -* [v1: for-next: io_uring: mark task TASK_RUNNING before handling resume/task work](http://lore.kernel.org/io-uring/fdbc0707-ace4-d565-402a-4927fe0b9947@kernel.dk/) - - Just like for task_work, set the task state to TASK_RUNNING before doing - any potential resume work. We're not holding any locks at this point, - but we may have already set the task state to TASK_INTERRUPTIBLE in - preparation for going to sleep waiting for events. - -* [v7: liburing: add api for napi busy poll](http://lore.kernel.org/io-uring/20230205002424.102422-1-shr@devkernel.io/) - - The patch series also contains the documentation for the two new functions - and two example programs. The client program is called napi-busy-poll-client - and the server program napi-busy-poll-server. The client measures the - roundtrip times of requests. - -#### Rust For Linux - -* [v1: rust: allow to use INIT_STACK_ALL_ZERO](http://lore.kernel.org/rust-for-linux/20230210172203.101331-1-andrea.righi@canonical.com/) - - This flag should be dropped in clang-17, but at the moment it seems more - reasonable to add it to the bindgen CFLAGS to prevent the resulting build error. - - In this way we can enable CONFIG_INIT_STACK_ALL_ZERO with CONFIG_RUST - without triggering any build error.
- -#### BPF - -* [v4: bpf-next: BPF rbtree next-gen datastructure](http://lore.kernel.org/bpf/20230209174144.3280955-1-davemarchevsky@fb.com/) - - This series adds an rbtree datastructure following the "next-gen - datastructure" precedent set by the recently added linked-list [0]. This is - a reimplementation of the previous rbtree RFC [1] to use kfunc + kptr - instead of adding a new map type. - -* [v1: bpf-next: tools/resolve_btfids: Pass HOSTCFLAGS as EXTRA_CFLAGS to prepare targets](http://lore.kernel.org/bpf/20230209143735.4112845-1-jolsa@kernel.org/) - - Thorsten reported a build issue with a command line that defined extra - HOSTCFLAGS that were not passed into 'prepare' targets, but were - used to build resolve_btfids objects. - -* [v1: bpf-next: bpf: add --skip_encoding_btf_inconsistent_proto, --btf_gen_optimized to pahole flags for v1.25](http://lore.kernel.org/bpf/1675949331-27935-1-git-send-email-alan.maguire@oracle.com/) - - v1.25 of pahole supports filtering out functions with multiple - inconsistent function prototypes or optimized-out parameters - from the BTF representation. These present problems because - there is no additional info in BTF saying which inconsistent - prototype matches which function instance to help guide - attachment, and functions with optimized-out parameters can - lead to incorrect assumptions about register contents. - -* [v1: dwarves: btf_encoder: ensure elf function representation is fully initialized](http://lore.kernel.org/bpf/1675896868-26339-1-git-send-email-alan.maguire@oracle.com/) - - New fields in the BTF encoder state's ELF function representation (used to - support saving and later adding functions) need to - be initialized. There is no need to set parameter names to NULL, as - got_parameter_names guards their use.
- -* [v1: bpf-next: sfc: move xdp_features configuration in efx_pci_probe_post_io()](http://lore.kernel.org/bpf/9bd31c9a29bcf406ab90a249a28fc328e5578fd1.1675875404.git.lorenzo@kernel.org/) - - Move xdp_features configuration from efx_pci_probe() to - efx_pci_probe_post_io(), since that is where all the other basic netdev - features are initialised. - -* [v1: nf-next: bpf, netfilter: minimal support for bpf progs](http://lore.kernel.org/bpf/20230208160307.27534-1-fw@strlen.de/) - - Add minimal support for hooking bpf programs into netfilter hooks, - e.g. PREROUTING or FORWARD. - - Hooking is currently possible for all supported protocols, i.e. - arp, bridge, ip, ip6 and the inet (both ipv4/ipv6) pseudo-family. - -* [v2: bpf-next: samples: bpf: syscall_tp: Add syscall openat2 enter/exit tracepoint](http://lore.kernel.org/bpf/tencent_9381CB1A158ED7ADD12C4406034E21A3AC07@qq.com/) - - Commit fe3300897cbf ("samples: bpf: fix syscall_tp due to unused syscall") - added openat() syscall tracepoints; this patch adds support for openat2(). - -* [v2: Add ftrace direct call for arm64](http://lore.kernel.org/bpf/20230207182135.2671106-1-revest@chromium.org/) - - This series adds ftrace direct call support to arm64. - This makes BPF tracing programs (fentry/fexit/fmod_ret/lsm) work on arm64. - - It is meant to apply on top of the arm64 tree, which contains Mark Rutland's - series on CALL_OPS [1] under the for-next/ftrace tag. - -* [v4: bpf-next: libbpf: Add sample_period to creation options](http://lore.kernel.org/bpf/20230207081916.3398417-1-arilou@gmail.com/) - - Add an option to set when the perf buffer should wake up. By default, the - perf buffer becomes signaled for every event that is pushed to it. - - In case of a high throughput of events, it is more efficient to wake - up only once X events are ready to be read.
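The libbpf change above trades wakeup latency for fewer wakeups. Below is a hypothetical model of that trade-off. The names are invented, and this is neither the libbpf API nor the perf API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a buffer that signals its reader only once every
 * `batch` events instead of after every single event. */
struct batched_buf {
	unsigned int batch;   /* wake the reader when this many events are pending */
	unsigned int pending; /* events pushed since the last wakeup */
	unsigned int wakeups; /* number of reader wakeups so far */
};

/* Push one event; returns true if this push wakes the reader. */
static bool buf_push(struct batched_buf *b)
{
	assert(b->batch > 0);
	if (++b->pending < b->batch)
		return false;
	b->pending = 0;
	b->wakeups++;
	return true;
}
```

With `batch = 4`, ten events cause two wakeups and leave two events pending. With `batch = 1`, every event wakes the reader. A real consumer therefore also needs a timeout or explicit flush so that trailing events are not stranded.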
- -* [v1: samples: bpf: syscall_tp: Add syscall openat2 enter/exit tracepoint](http://lore.kernel.org/bpf/tencent_FB3E886D062242FF59A997492A3BAF2BA308@qq.com/) - - Commit fe3300897cbf ("samples: bpf: fix syscall_tp due to unused syscall") - added openat() syscall tracepoints; this patch adds support for openat2(). - -* [[RFC/PATCH 0/3] perf lock contention: Track lock owner (v2)](http://lore.kernel.org/bpf/20230207002403.63590-1-namhyung@kernel.org/) - - When there are many lock contentions in the system, people sometimes - want to know who caused the contention, IOW who's the owner of the locks. - - This patchset adds a -o/--lock-owner option to track the owner info - if it's available. Right now, it supports mutex and rwsem as they - have owner fields in themselves. Please see patch 2 for the details. - -* [v2: bpf-next: net: add missing xdp_features description](http://lore.kernel.org/bpf/7878544903d855b49e838c9d59f715bde0b5e63b.1675705948.git.lorenzo@kernel.org/) - - Add the missing xdp_features field description in the struct net_device - documentation. This patch fixes the following warning: - - ./include/linux/netdevice.h:2375: warning: Function parameter or member 'xdp_features' not described in 'net_device' - -* [v1: bpf-next: selftests: bpf: Use BTF map in sk_assign](http://lore.kernel.org/bpf/4ebd4e68dec83863c51a9114e6507524c8feafb7.1675698070.git.fmaurer@redhat.com/) - - The sk_assign selftest uses tc to load the BPF object file for the test. If - tc is linked against libbpf 1.0+, this test fails because the BPF file - uses the legacy maps section. This approach is considered legacy by libbpf - and tc (see examples/bpf/README in the iproute2 repo). - -* [v1: bpf-next: samples: bpf: Add macro SYSCALL() for aarch64](http://lore.kernel.org/bpf/tencent_9E0636426959DE97692A50AF79A3D9888B08@qq.com/) - - The __SYSCALL() macro in arch/arm64/kernel/sys.c adds an __arm64_ prefix, so the samples - should support it for aarch64.
The following is the output of the bpftrace - script: - - $ sudo bpftrace -l | grep sys_write - ... - kprobe:__arm64_sys_write - kprobe:__arm64_sys_writev - ... - -* [v1: net-next: NXP ENETC AF_XDP zero-copy sockets](http://lore.kernel.org/bpf/20230206100837.451300-1-vladimir.oltean@nxp.com/) - - This is an RFC because there are a few things I'm not 100% certain about. - I've tested this with the xdpsock test application, but I don't have very - detailed knowledge about the internals of AF_XDP sockets. - - Patches where I'd appreciate if people took a look are 02/11, 05/11, - -### Related technology news - -#### Qemu - -* [v1: target/riscv: avoid env_archcpu() in cpu_get_tb_cpu_state()](http://lore.kernel.org/qemu-devel/20230210123836.506286-1-dbarboza@ventanamicro.com/) - - We have a RISCVCPU *cpu pointer available at the start of the function. - -* [v1: target/riscv: Smepmp: Skip applying default rules when address matches](http://lore.kernel.org/qemu-devel/20230209055206.229392-1-hchauhan@ventanamicro.com/) - - When MSECCFG.MML is set, after checking the address range in PMP, if the - requested permissions are not the same as those programmed in PMP, the default - permissions are applied. This should only happen when - no matching address is found. - -* [v1: MAINTAINERS: Add some RISC-V reviewers](http://lore.kernel.org/qemu-devel/20230209003308.738237-1-alistair.francis@opensource.wdc.com/) - - This patch adds some active RISC-V members as reviewers to the - MAINTAINERS file. - -* [v1: hw/riscv: virt: Simplify virt_{get,set}_aclint()](http://lore.kernel.org/qemu-devel/20230206085007.3618715-1-bmeng@tinylab.org/) - - There is no need to declare an intermediate "MachineState *ms". - -* [v1: configure: normalize riscv* cpu types too](http://lore.kernel.org/qemu-devel/20230204112502.2558739-1-mjt@msgid.tls.msk.ru/) - - For most CPU types out there, ./configure normalizes all - variations into base form plus, optionally, variations, - to find the proper arch-specific code.
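The QEMU Smepmp fix is about ordering: the default permissions are a fallback, not an override. The simplified sketch below models a PMP lookup in which a matching entry always decides the outcome and the default is consulted only when nothing matches. It ignores address-matching modes (TOR/NA4/NAPOT), entry locking and the full MML rule table. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PERM_R 1u
#define PERM_W 2u
#define PERM_X 4u

/* Simplified PMP entry covering [base, base + size) with R/W/X permission bits. */
struct pmp_entry {
	uint64_t base;
	uint64_t size;
	unsigned int perms;
};

/* Returns true if the access is allowed. The first entry whose range contains
 * addr decides; default_allow is used only when no entry matches at all. */
static bool pmp_check(const struct pmp_entry *e, size_t n, uint64_t addr,
		      unsigned int want, bool default_allow)
{
	for (size_t i = 0; i < n; i++) {
		if (addr >= e[i].base && addr - e[i].base < e[i].size)
			return (e[i].perms & want) == want;
	}
	return default_allow;
}
```

A write to an address inside a read-only entry must be denied even when the default would allow it. Falling through to the default in that case is the behaviour the patch removes.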
- -#### Buildroot - -* [package/wolfssl: disable assembly when not supported](http://lore.kernel.org/buildroot/20230207213844.DCD8484378@busybox.osuosl.org/) - - commit: https://git.buildroot.net/buildroot/commit/?id=d8dc5315eb712eca0a5cbf793a6714a47ab6e57e - branch: https://git.buildroot.net/buildroot/commit/?id=refs/heads/master - - wolfssl contains some assembly code and its configure.ac script - enables the assembly code depending on the CPU architecture. However, - the detection logic is not sufficient and leads to using the assembly - code in situations where it should not. - -* [package/python-spake2: new package](http://lore.kernel.org/buildroot/20230207132558.94BFC84150@busybox.osuosl.org/) - - commit: https://git.buildroot.net/buildroot/commit/?id=9aaef2a07780a200512ccadb2381c559c4ffd8e6 - branch: https://git.buildroot.net/buildroot/commit/?id=refs/heads/master - - SPAKE2 password-authenticated key exchange (in pure Python). - -* [package/python-hkdf: new package](http://lore.kernel.org/buildroot/20230207115045.1C17C840AE@busybox.osuosl.org/) - - commit: https://git.buildroot.net/buildroot/commit/?id=433ce2966f787248d5b8a62c46634e2f86250f8e - branch: https://git.buildroot.net/buildroot/commit/?id=refs/heads/master - - HMAC-based Extract-and-Expand Key Derivation Function (HKDF). - -* [package/rdma-core: new package](http://lore.kernel.org/buildroot/20230205125237.A6202837ED@busybox.osuosl.org/) - - commit: https://git.buildroot.net/buildroot/commit/?id=ea47e177f093d7378e8e8e1f50d6f4e3fce0a088 - branch: https://git.buildroot.net/buildroot/commit/?id=refs/heads/master - -#### U-Boot - -* [v1: riscv: add sbi v0.2 or later support](http://lore.kernel.org/u-boot/CAJ8bkywohgE_njAYmLJUUDpHW2+R=NYPSqp1G0rCksrCHCdpDw@mail.gmail.com/) - - Add the RFENCE and IPI extensions for SBI v0.2 or later, and add support in sbi_ipi - for SBI v0.2 or later.
This lets sbi_ipi get past the limit that - the number of cores must be less than or equal to XLEN. - -* [v1: semihosting: use assembly conduit functions](http://lore.kernel.org/u-boot/20230207152105.2167641-1-andre.przywara@arm.com/) - - To trigger the actual semihosting action in the debugger, we used a - carefully constructed inline assembly sequence. This was motivated by - the trigger being really just a single instruction, so originally this - could be neatly inlined by the compiler. - However, we now have a separate function anyway, so inlining is no longer - happening. On top of that, the inline assembly was really fragile and - hard to read. - -* [v3: RFC: Migrate to split config](http://lore.kernel.org/u-boot/20230206190550.1692420-1-sjg@chromium.org/) - - U-Boot uses an SPL prefix on CONFIG options to indicate when an option - relates to SPL. For example, while CONFIG_TEXT_BASE is the text base for - U-Boot proper, CONFIG_SPL_TEXT_BASE is the text base for SPL. - -## 20230205: Issue 32 - -### Kernel news - -#### RISC-V architecture support - -* [v5: KVM perf support](http://lore.kernel.org/linux-riscv/20230205011515.1284674-1-atishp@rivosinc.com/) - - This series extends perf support for KVM. The KVM implementation relies - on the SBI PMU extension and trap-and-emulate of hpmcounter CSRs. - The KVM implementation exposes the virtual counters to the guest and internally - manages the counters using kernel perf counters. - -* [v4: Basic pinctrl support for StarFive JH7110 RISC-V SoC](http://lore.kernel.org/linux-riscv/20230203141801.59083-1-hal.feng@starfivetech.com/) - - This patch series adds basic pinctrl support for the StarFive JH7110 SoC. - -* [v3: StarFive's SDIO/eMMC driver support](http://lore.kernel.org/linux-riscv/20230203081913.81968-1-william.qiu@starfivetech.com/) - - This patchset adds initial rudimentary support for the StarFive - DesignWare mobile storage host controller driver. This driver will - be used in StarFive's VisionFive 2 board.
The main purpose of adding - this driver is to accommodate the ultra-high speed mode of eMMC. - -* [v4: RISC-V kasan rework](http://lore.kernel.org/linux-riscv/20230203075232.274282-1-alexghiti@rivosinc.com/) - - As described in patch 2, our current kasan implementation is intricate, - so I tried to simplify the implementation and mimic what arm64/x86 are doing. - -* [v1: Documentation: RISC-V: Define Xlinuxs{s,m}aia](http://lore.kernel.org/linux-riscv/20230203001201.14770-1-palmer@rivosinc.com/) - - The AIA specification was only partially frozen, but provides no way to - refer to the subset of behavior that has been frozen. It seems like - there's not a whole lot of interest in the non-frozen behavior, so let's - just define an extension that only consists of the frozen behavior. - -* [v1: RISC-V: Only provide the single-letter extensions in HWCAP](http://lore.kernel.org/linux-riscv/20230202233832.11036-1-palmer@rivosinc.com/) - - The recent refactoring led to us leaking some HWCAP bits to userspace - that didn't make much sense. With any luck we'll have a better scheme - soon, but for now just mask off those bits to avoid polluting userspace. - -* [v3: spi: Add support for stacked/parallel memories](http://lore.kernel.org/linux-riscv/20230202152258.512973-1-amit.kumar-mahapatra@amd.com/) - - This patch continues the discussions which happened on - 'commit f89504300e94 ("spi: Stacked/parallel memories bindings")' about - adding dt-binding support for stacked/parallel memories. - -* [v1: RESEND: dt-bindings: timer: sifive,clint: add comaptibles for T-Head's C9xx](http://lore.kernel.org/linux-riscv/20230202072814.319903-1-uwu@icenowy.me/) - - The T-Head C906/C910 CLINT is not compliant with SiFive's (nor even - with the upcoming ACLINT spec) because it lacks an mtime register.
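The HWCAP item above can be pictured as a bitmap in which single-letter extensions occupy the low bits and multi-letter extensions sit above them. The sketch below works under that assumption. The per-letter layout `1 << (letter - 'A')` follows the `COMPAT_HWCAP_ISA_*` macros in the RISC-V UAPI hwcap header. The mask name and the multi-letter extension ID are hypothetical.

```c
#include <assert.h>
#include <ctype.h>

/* One bit per single-letter extension: 'A' -> bit 0, ..., 'Z' -> bit 25. */
static unsigned long isa_letter_bit(char c)
{
	int up = toupper((unsigned char)c);

	assert(up >= 'A' && up <= 'Z');
	return 1UL << (up - 'A');
}

/* Bits 0..25 cover 'A'..'Z'; anything higher belongs to multi-letter extensions. */
#define HWCAP_SINGLE_LETTER_MASK ((1UL << 26) - 1)

/* Hypothetical ID of a multi-letter extension sharing the same internal bitmap. */
#define HYPOTHETICAL_EXT_ZXYZ 27

/* What userspace should see: only the single-letter bits of the internal bitmap. */
static unsigned long hwcap_for_userspace(unsigned long isa_bitmap)
{
	return isa_bitmap & HWCAP_SINGLE_LETTER_MASK;
}
```

Masking at the boundary keeps internal extension IDs out of `AT_HWCAP`, so userspace never starts depending on bit positions that were never meant to be ABI.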
- -* [v1: clocksource: riscv: Patch riscv_clock_next_event() jump before first use](http://lore.kernel.org/linux-riscv/512FC581-4097-4433-9C3D-CBCB7CD61954@rivosinc.com/) - - A static key is used to select between SBI and Sstc timer usage in - riscv_clock_next_event(), but currently the direction is resolved - after cpuhp_setup_state() is called (which sets the next event). - -* [v1: riscv: disable generation of unwind tables](http://lore.kernel.org/linux-riscv/mvmzg9xybqu.fsf@suse.de/) - - GCC 13 will enable -fasynchronous-unwind-tables by default on riscv. In - the kernel, we don't have any use for unwind tables yet, so disable them. - More importantly, the .eh_frame section brings relocations - (R_RISCV_32_PCREL, R_RISCV_SET{6,8,16}, R_RISCV_SUB{6,8,16}) into modules - that we are not prepared to handle. - -* [v3: riscv: mm: hugetlb: Enable ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP](http://lore.kernel.org/linux-riscv/20230201015259.3222524-1-guoren@kernel.org/) - - Add HVO support for RISC-V; see commit 6be24bed9da3 ("mm: hugetlb: - introduce a new config HUGETLB_PAGE_FREE_VMEMMAP"). This patch is - similar to commit 1e63ac088f20 ("arm64: mm: hugetlb: enable - HUGETLB_PAGE_FREE_VMEMMAP for arm64"), and riscv's motivation is the same as arm64's. - -* [v4: riscv: Allow to downgrade paging mode from the command line](http://lore.kernel.org/linux-riscv/20230131151115.1972740-1-alexghiti@rivosinc.com/) - - This new version gets rid of the limitation that prevented KASAN kernels - from using the newly introduced parameters. - - While looking into KASLR, I came across commit aacd149b6238 ("arm64: head: - avoid relocating the kernel twice for KASLR"): it allows using the fdt - functions very early in the boot process with KASAN enabled by simply - compiling a new version of those functions without instrumentation.
- -* [v1: Add basic ACPI support for RISC-V](http://lore.kernel.org/linux-riscv/20230130182225.2471414-1-sunilvl@ventanamicro.com/) - - This patch series enables the basic ACPI infrastructure for RISC-V. - Support for external interrupt controllers is in progress, and hence it is - tested using a polling-based HVC SBI console and a RAM disk. - -* [v3: RISC-V: Apply Zicboz to clear_page](http://lore.kernel.org/linux-riscv/20230130120128.1349464-1-ajones@ventanamicro.com/) - - When the Zicboz extension is available, we can more rapidly zero naturally - aligned, Zicboz-block-sized chunks of memory. As pages are always page - aligned and are larger than any Zicboz block size will be, - clear_page() appears to be a good candidate for the extension. - -* [v2: Change PWM-controlled LED pin active mode and algorithm](http://lore.kernel.org/linux-riscv/20230130093229.27489-1-nylon.chen@sifive.com/) - - According to the circuit diagrams of the User LEDs - RGB described in the - manuals hifive-unleashed-a00.pdf[0] and hifive-unmatched-schematics-v3.pdf[1], - the PWM behavior is active-high. - -* [v2: riscv: mm: Implement pmdp_collapse_flush for THP](http://lore.kernel.org/linux-riscv/20230130074815.1694055-1-mchitale@ventanamicro.com/) - - When THP is enabled, 4K pages are collapsed into a single huge - page using the generic pmdp_collapse_flush(), which in turn - uses flush_tlb_range() to shoot down stale TLB entries. - -* [v2: mm, arch: add generic implementation of pfn_valid() for FLATMEM](http://lore.kernel.org/linux-riscv/20230129124235.209895-1-rppt@kernel.org/) - - Every architecture that supports the FLATMEM memory model defines its own - version of pfn_valid() that essentially compares a pfn to max_mapnr. - -* [v1: riscv: Add header include guards to insn.h](http://lore.kernel.org/linux-riscv/20230129094242.282620-1-liaochang1@huawei.com/) - - Add header include guards to insn.h to prevent repeated declarations of - any identifiers in insn.h.
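The FLATMEM item says every architecture's `pfn_valid()` essentially compares a PFN against `max_mapnr`. The sketch below shows that comparison for memory that starts at a non-zero PFN offset. The layout values are made up, and the code illustrates the check described in the cover letter rather than code copied from the series.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical layout: RAM starts at 2 GiB (PFN 0x80000 with 4 KiB pages) and
 * spans 0x40000 pages (1 GiB). In the kernel these roles are played by
 * ARCH_PFN_OFFSET and max_mapnr. */
static const unsigned long arch_pfn_offset = 0x80000UL;
static const unsigned long max_mapnr = 0x40000UL;

/* FLATMEM: one mem_map array covers [arch_pfn_offset, arch_pfn_offset + max_mapnr). */
static bool flatmem_pfn_valid(unsigned long pfn)
{
	/* Testing the lower bound first keeps the subtraction from wrapping. */
	return pfn >= arch_pfn_offset && pfn - arch_pfn_offset < max_mapnr;
}
```

Because FLATMEM keeps one contiguous `mem_map`, validity reduces to a single range check. This is why one generic definition can replace the per-architecture copies.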
- -* [v1: riscv: support arch_has_hw_pte_young()](http://lore.kernel.org/linux-riscv/20230129064956.143664-1-tjytimi@163.com/) - - arch_has_hw_pte_young() returns false on riscv by default. When it is - false, page table walks are mostly skipped during MGLRU reclaim, and it - also causes a useless step in __wp_page_copy_user(). - -#### Process Scheduling - -* [v1: sched/isolation: Prep work for pcp cache draining isolation](http://lore.kernel.org/lkml/20230203232409.163847-1-frederic@kernel.org/) - - For reference: https://lore.kernel.org/lkml/20230125073502.743446-1-leobras@redhat.com/ - And the latest proposal: https://lore.kernel.org/lkml/Y90mZQhW89HtYfT9@dhcp22.suse.cz/ - -* [v1: cpu,sched: Mark arch_cpu_idle_dead() __noreturn](http://lore.kernel.org/lkml/cover.1675461757.git.jpoimboe@kernel.org/) - - These are some minor changes to enable the __noreturn attribute for - arch_cpu_idle_dead(). (If there are no objections, I can merge the - entire set through the tip tree.) - - Until recently [1], in Xen, when a previously offlined CPU was brought - back online, it unexpectedly resumed execution where it left off in the - middle of the idle loop by returning from play_dead() and its caller - arch_cpu_idle_dead(). - -* [v1: sched/deadline: Add more reschedule cases to prio_changed_dl()](http://lore.kernel.org/lkml/20230202182854.3696665-1-vschneid@redhat.com/) - - On that kernel, it is quite easy to trigger using rt-tests' deadline_test - [1] with the test running on isolated CPUs (this reduces the chance of - something unrelated setting TIF_NEED_RESCHED on the idle tasks, making the - issue even more obvious as the hung task detector chimes in).
- -* [v1: kernel/sched/core: adjust rt_priority accordingly when prio is changed](http://lore.kernel.org/lkml/1675245680-2811-1-git-send-email-chensong_2000@189.cn/) - - When a high priority process is acquiring an rtmutex which is held by a - low priority process, the latter's priority will be boosted by calling - rt_mutex_setprio->__setscheduler_prio. - -* [v2: sched/numa: Enhance vma scanning](http://lore.kernel.org/lkml/cover.1675159422.git.raghavendra.kt@amd.com/) - - The patchset proposes one of the enhancements to numa vma scanning - suggested by Mel. This is a continuation of [2]. Though I have dropped the - RFC tag, I do think some parts need more feedback and refinement. - - The existing mechanism derives the scan period from - per-thread stats. Process Adaptive autoNUMA [1] proposed gathering NUMA - fault stats at the per-process level to better capture application behaviour. - -* [v1: sched: Consider capacity for certain load balancing decisions](http://lore.kernel.org/lkml/20230201012032.2874481-1-xii@google.com/) - - After load balancing was split into different scenarios, CPU capacity - is ignored for the "migrate_task" case, which means a thread can stay - on a softirq-heavy CPU for an extended amount of time. - -* [v2: sched: pick_next_rt_entity(): checked list_entry](http://lore.kernel.org/lkml/20230128-list-entry-null-check-sched-v2-1-d8e010cce91b@diag.uniroma1.it/) - - Commit 326587b84078 ("sched: fix goto retry in pick_next_task_rt()") - removed any path which could make pick_next_rt_entity() return NULL. - However, BUG_ON(!rt_se) in _pick_next_task_rt() (the only caller of - pick_next_rt_entity()) still checks the error condition, which can - never happen, since list_entry() never returns NULL. - -#### Memory Management - -* [v1: bpf-next: bpf, mm: introduce cgroup.memory=nobpf](http://lore.kernel.org/linux-mm/20230205065805.19598-1-laoar.shao@gmail.com/) - - So let's give the user an option to disable bpf memory accounting.
- - The idea of "cgroup.memory=nobpf" originally came from Tejun[1]. - - [1]. https://lwn.net/ml/linux-mm/YxjOawzlgE458ezL@slm.duckdns.org/ - -* [v9: cachestat: a new syscall for page cache state of files](http://lore.kernel.org/linux-mm/20230203190413.2559707-1-nphamcs@gmail.com/) - - There is currently no good way to query the page cache state of large - file sets and directory trees. There is mincore(), but it scales poorly: - the kernel writes out a lot of bitmap data that userspace has to - aggregate, when the user really does not care about per-page information - in that case. - -* [v3: folio based filemap_map_pages()](http://lore.kernel.org/linux-mm/20230203131636.1648662-1-fengwei.yin@intel.com/) - - The current filemap_map_pages() uses page granularity even when the - underlying folio is a large folio. Making it use folio-based - granularity allows batched refcount, rmap and mm counter - updates, which brings a performance gain. - -* [v1: mm/page_alloc: reduce fallbacks to (MIGRATE_PCPTYPES - 1)](http://lore.kernel.org/linux-mm/20230203100132.1627787-1-yajun.deng@linux.dev/) - - Commit 1dd214b8f21c ("mm: page_alloc: avoid merging non-fallbackable - pageblocks with others") removed MIGRATE_CMA and MIGRATE_ISOLATE from the - fallbacks list, so there is no need to add an element at the end of every type. - -* [v7: shoot lazy tlbs (lazy tlb refcount scalability improvement)](http://lore.kernel.org/linux-mm/20230203071837.1136453-1-npiggin@gmail.com/) - - This series improves the scalability of context switching between user and - kernel threads on large systems with a threaded process spread across a lot of CPUs. - -* [v1: Ignore non-LRU-based reclaim in memcg reclaim](http://lore.kernel.org/linux-mm/20230202233229.3895713-1-yosryahmed@google.com/) - - Pages reclaimed through means other than LRU-based reclaim are tracked - through reclaim_state in struct scan_control, which is stashed in the - current task_struct.
These pages are added to the number of pages - reclaimed through LRUs. - -* [v1: mm: memcontrol: don't account swap failures not due to cgroup limits](http://lore.kernel.org/linux-mm/20230202155626.1829121-1-hannes@cmpxchg.org/) - - Upon closer examination, this is an ARM64 machine that doesn't support - swapping out THPs. In that case, the first get_swap_page() fails, and - the kernel falls back to splitting the THP and swapping the 4k - constituents one by one. /proc/vmstat confirms this with a high rate - of thp_swpout_fallback events. - -* [v2: Introduce cmpxchg128() -- aka. the demise of cmpxchg_double()](http://lore.kernel.org/linux-mm/20230202145030.223740842@infradead.org/) - - Since Linus hated on cmpxchg_double(), a few patches to get rid of it, as - proposed here: - - https://lkml.kernel.org/r/Y2U3WdU61FvYlpUh@hirez.programming.kicks-ass.net - - These patches are based on 6.2.0-rc6 + cryptodev-2.6, but also apply to next/master. - - Available here: - - git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git core/wip-u128 - -* [v10: Implement IOCTL to get and/or the clear info about PTEs](http://lore.kernel.org/linux-mm/20230202112915.867409-1-usama.anjum@collabora.com/) - - Historically, soft-dirty PTE bit tracking has been used in the CRIU - project. The procfs interface is enough for finding the soft-dirty bit - status and clearing the soft-dirty bit of all the pages of a process. - We have a use case where we need to track the soft-dirty PTE bit for - only specific pages on demand. We need this tracking and clear mechanism - for a region of memory while the process is running to emulate the - GetWriteWatch() API of Windows. - -* [v1: mm: introduce entrance for root_mem_cgroup's current](http://lore.kernel.org/linux-mm/1675312377-4782-1-git-send-email-zhaoyang.huang@unisoc.com/) - - Introduce memory.root_current for the memory charges on root_mem_cgroup.
- -* [v1: mm/bpf/perf: Store build id in file object](http://lore.kernel.org/linux-mm/20230201135737.800527-1-jolsa@kernel.org/) - - This RFC patchset adds a new CONFIG_FILE_BUILD_ID option, which adds a - build id object pointer to the file object when enabled. The build id is - read/populated when the file is mmap-ed. - -* [v4: mm/vmalloc: replace BUG_ON to a simple if statement](http://lore.kernel.org/linux-mm/20230201115142.GA7772@min-iamroot/) - - As per the coding standards, in the event of an abnormal condition that - should not occur under normal circumstances, the kernel should attempt - recovery and proceed with execution, rather than halting the machine. - -* [v4: mm/vmalloc.c: allow vread() to read out vm_map_ram areas](http://lore.kernel.org/linux-mm/20230201091339.61761-1-bhe@redhat.com/) - - Stephen reported that vread() skips vm_map_ram areas when reading out - /proc/kcore with the drgn utility. - -* [v4: mm: hwposion: support recovery from ksm_might_need_to_copy()](http://lore.kernel.org/linux-mm/20230201074433.96641-1-wangkefeng.wang@huawei.com/) - - When the kernel copies a page in ksm_might_need_to_copy() and runs - into an uncorrectable error, it crashes because the poisoned page is - consumed by the kernel. This is similar to the issue recently fixed by - Copy-on-write poison recovery.
- -* [v1: kasan: use %zd format for printing size_t](http://lore.kernel.org/linux-mm/20230201071312.2224452-1-arnd@kernel.org/) - - The size_t type depends on the architecture, so %lu does not work - on most 32-bit ones: - - In file included from include/kunit/assert.h:13, - from include/kunit/test.h:12, - from mm/kasan/report.c:12: - mm/kasan/report.c: In function 'describe_object_addr': - include/linux/kern_levels.h:5:25: error: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'size_t' {aka 'unsigned int'} [-Werror=format=] - mm/kasan/report.c:270:9: note: in expansion of macro 'pr_err' - 270 | pr_err("The buggy address is located %d bytes %s of\n" - | ^ - -* [v1: mm/khugepaged: skip shmem with armed userfaultfd](http://lore.kernel.org/linux-mm/20230201034137.2463113-1-stevensd@google.com/) - - Collapsing memory in a vma that has an armed userfaultfd results in - zero-filling any missing pages, which breaks user-space paging for those - filled pages. Avoid khugepaged bypassing userfaultfd by not collapsing - pages in shmem reached via scanning a vma with an armed userfaultfd if - doing so would zero-fill any pages. - -* [v1: mm: move FOLL_PIN debug accounting under CONFIG_DEBUG_VM](http://lore.kernel.org/linux-mm/54b0b07a-c178-9ffe-b5af-088f3c21696c@kernel.dk/) - - which wasn't there before. The node page state counters are percpu, but - with a very low threshold. On my setup, every 108th update ends up - needing to punt to two atomic_long_add() calls, which is causing this regression. - -* [v1: mm,page_alloc,cma: configurable CMA utilization](http://lore.kernel.org/linux-mm/20230131071052.GB19285@hu-sbhattip-lv.qualcomm.com/) - - Commit 16867664936e ("mm,page_alloc,cma: conditionally prefer cma pageblocks for movable allocations") - added support for using CMA pages when more than 50% of total free pages in - the zone are free CMA pages.
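This note relates to the KASAN build error above. `%lu` only matches `size_t` on ABIs where `size_t` is `unsigned long`. On 32-bit targets where `size_t` is `unsigned int`, the mismatch fails under `-Werror=format`. The C99 `z` length modifier avoids this portably: `%zu` prints a `size_t`, and `%zd` prints the corresponding signed type. The following is a small sketch. `describe_size()` is a hypothetical helper, not KASAN code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper that formats an object size into buf.
 * "%zu" matches size_t on every ABI. "%lu" would only be correct where
 * size_t is unsigned long, not where it is unsigned int (most 32-bit ABIs). */
static int describe_size(char *buf, size_t len, size_t object_size)
{
	return snprintf(buf, len, "object of size %zu bytes", object_size);
}
```

snprintf() returns the number of characters it would have written. A caller can therefore compare the result against `len` to detect truncation.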
- -* [v1: mm/gup: Add folio to list when folio_isolate_lru() succeed](http://lore.kernel.org/linux-mm/20230131063206.28820-1-Kuan-Ying.Lee@mediatek.com/) - - When folio_isolate_lru() succeeds, it - returns 0; in that case we need to add this folio to the movable_pages_list. - -* [v2: mm-unstable: Convert a couple migrate functions to use folios](http://lore.kernel.org/linux-mm/20230130214352.40538-1-vishal.moola@gmail.com/) - - This patch set introduces folio_movable_ops() and converts 3 functions - in mm/migrate.c to use folios. It also introduces - folio_get_nontail_page() for folio conversions which may want to - distinguish between head and tail pages. - -#### File Systems - -* [v2: Support negative dentries on case-insensitive ext4 and f2fs](http://lore.kernel.org/linux-fsdevel/20230203210039.16289-1-krisman@suse.de/) - - This patchset enables negative dentries for case-insensitive directories - in ext4/f2fs. It solves the corner cases for this feature, including - those already tested by fstests (generic/556). It also fixes a - bug in the existing implementation where old negative - dentries are left behind after a directory conversion to case-insensitive. - -* [v2: fsdax: dax_unshare_iter() should return a valid length](http://lore.kernel.org/linux-fsdevel/1675388906-50-1-git-send-email-ruansy.fnst@fujitsu.com/) - - copy_mc_to_kernel() returns 0 on success, - so the return value should then be set to the length it copied. - -* [v1: RESEND: pipe: avoid creating empty pipe buffers](http://lore.kernel.org/linux-fsdevel/20230131121127.466443-1-wiktorg@google.com/) - - pipe_write cannot be called on notification pipes, so - post_one_notification cannot race it. - The locking and the second pipe_full check are thus redundant. - -* [v9: DEPT(Dependency Tracker)](http://lore.kernel.org/linux-fsdevel/1675154394-25598-1-git-send-email-max.byungchul.park@gmail.com/) - - Nevertheless, I apologize for the lack of documentation.
I promise to add it - before users need to call DEPT's APIs directly. For now, you can use - DEPT simply by turning CONFIG_DEPT on. - -* [GIT PULL: iov_iter: Improve page extraction (pin or just list)](http://lore.kernel.org/linux-fsdevel/3351099.1675077249@warthog.procyon.org.uk/) - - Could you consider pulling this patchset into the block tree? I think that - Al's fears wrt pinned pages being removed from page tables causing deadlock - have been answered. Granted, there is still the issue of how to handle - vmsplice and a bunch of other places to fix, not least skbuff handling. - -* [v11: iov_iter: Improve page extraction (pin or just list)](http://lore.kernel.org/linux-fsdevel/20230130074129.28120-1-naresh.kamboju@linaro.org/) - - Build tests pass on arm, arm64, i386, mips, parisc, powerpc, riscv, s390, sh, - sparc and x86_64. - Boot and LTP smoke tests pass on qemu-arm64, qemu-armv7, qemu-i386 and qemu-x86_64. - -* [v4: RESEND: fs: coredump: using preprocessor directives for dump_emit_page](http://lore.kernel.org/linux-fsdevel/20230130013347.17654-1-xiehongyu1@kylinos.cn/) - - When CONFIG_COREDUMP is set and CONFIG_ELF_CORE is not, you'll get warnings - like: - fs/coredump.c:841:12: error: ‘dump_emit_page’ defined but not used - [-Werror=unused-function] - 841 | static int dump_emit_page(struct coredump_params *cprm, struct - page *page) - -* [v1: fscrypt: Copy the memcg information to the ciphertext page](http://lore.kernel.org/linux-fsdevel/20230129121851.2248378-1-willy@infradead.org/) - - Both f2fs and ext4 end up passing the ciphertext page to - wbc_account_cgroup_owner(). At the moment, the ciphertext page appears - to belong to no cgroup, so it is accounted to the root_mem_cgroup instead of whatever cgroup the original page was in. - -* [v1: blk: optimization for classic polling](http://lore.kernel.org/linux-fsdevel/3578876466-3733-1-git-send-email-nj.shetty@samsung.com/) - - This removes the dependency on interrupts to wake up the task.
Set the task - state to TASK_RUNNING if need_resched() returns true - while polling for IO completion. - Previously, the polling task would sleep and rely on an interrupt to wake it up. - This made some IO take very long when interrupt coalescing is enabled in NVMe. - -#### Network Devices - -* [v2: net-next: add support for per action hw stats](http://lore.kernel.org/netdev/20230205135525.27760-1-ozsh@nvidia.com/) - - This series provides the platform to query per action stats for in_hw flows. - - The first four patches are preparation patches with no functionality change. - The fifth patch re-uses the existing flow action stats api to query action - stats for both classifier and action dumps. - The rest of the patches add per action stats support to the Mellanox driver. - -* [v1: net-next: net: move more duplicate code of ovs and tc conntrack into nf_conntrack_ovs](http://lore.kernel.org/netdev/cover.1675548023.git.lucien.xin@gmail.com/) - - We've moved some duplicate code into nf_nat_ovs in: - - "net: eliminate the duplicate code in the ct nat functions of ovs and tc" - -* [v3: net-next: tuntap: correctly initialize socket uid](http://lore.kernel.org/netdev/20230131-tuntap-sk-uid-v3-0-81188b909685@diag.uniroma1.it/) - - sock_init_data() assumes that the `struct socket` passed as input is - contained in a `struct socket_alloc` allocated with sock_alloc(). - However, tap_open() and tun_chr_open() pass a `struct socket` embedded - in a `struct tap_queue` and `struct tun_file` respectively, both - allocated with sk_alloc(). - This causes a type confusion when issuing a container_of() with - SOCK_INODE() in sock_init_data(), which results in assigning the wrong - sk_uid to the `struct sock` passed in.
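The tuntap entry hinges on how SOCK_INODE() works. It uses container_of() to step from a `struct socket` back to the enclosing `struct socket_alloc`. That only yields a valid pointer when the socket really is embedded in one. The sketch below shows the valid case using simplified stand-in structures (not the kernel's definitions) and a user-space container_of(). Calling sock_inode() on a socket embedded in some other structure, such as the `struct tap_queue` or `struct tun_file` mentioned above, would compute a pointer into unrelated memory.

```c
#include <assert.h>
#include <stddef.h>

/* User-space version of the kernel's container_of(): go from a pointer to a
 * member back to the start of the structure that contains it. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-ins for the kernel structures; fields are illustrative. */
struct inode {
	unsigned int i_uid;
};

struct socket {
	int state;
};

struct socket_alloc {
	struct socket socket;
	struct inode vfs_inode;
};

/* Modeled on SOCK_INODE(): only correct when sock is the 'socket' member
 * of a struct socket_alloc. */
static struct inode *sock_inode(struct socket *sock)
{
	return &container_of(sock, struct socket_alloc, socket)->vfs_inode;
}
```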
- -* [v1: net-next: vxlan: Add MDB support](http://lore.kernel.org/netdev/20230204170801.3897900-1-idosch@nvidia.com/) - - This patchset implements MDB support in the VXLAN driver, allowing it to - selectively forward IP multicast traffic to VTEPs with interested - receivers instead of flooding it to all the VTEPs as BUM. - -* [v1: net-next:pull request: implement devlink reload in ice](http://lore.kernel.org/netdev/20230203211456.705649-1-anthony.l.nguyen@intel.com/) - - Michal Swiatkowski says: - - This is part of the changes done in patchset [0]. Resource management is - a somewhat controversial part, so I split it into two patchsets. - - This is the first one, covering the refactoring and the implementation of the reload API call. - -* [v1: firmware: qcom_scm: Move qcom_scm.h to include/linux/firmware/qcom/](http://lore.kernel.org/netdev/20230203210956.3580811-1-quic_eberman@quicinc.com/) - - Move include/linux/qcom_scm.h to include/linux/firmware/qcom/qcom_scm.h. - This moves one of the few remaining Qualcomm-specific headers into a more - appropriate subdirectory under include/. - -* [v1: net-next: ionic: rx buffers and on-chip descriptors](http://lore.kernel.org/netdev/20230203210016.36606-1-shannon.nelson@amd.com/) - - We start with a couple of house-keeping patches that were - originally presented for 'net', then we add support for on-chip - descriptor rings and Rx buffer page caching. - -* [v2: 9p/client: don't assume signal_pending() clears on recalc_sigpending()](http://lore.kernel.org/netdev/9422b998-5bab-85cc-5416-3bb5cf6dd853@kernel.dk/) - - signal_pending() really means that an exit to userspace is required to - clear the condition, as it could be either an actual signal, or it could - be TWA_SIGNAL based task_work that needs processing. The 9p client - does a recalc_sigpending() to take care of the former, but that still - leaves TWA_SIGNAL task_work.
The result is that if we do have TWA_SIGNAL - task_work pending, then we'll sit in a tight loop spinning as - signal_pending() remains true even after recalc_sigpending(). - -* [v11: nvme-tcp receive offloads](http://lore.kernel.org/netdev/20230203132705.627232-1-aaptel@nvidia.com/) - - Here is the next iteration of our nvme-tcp receive offload series. - - The main changes are in patch 3 (netlink). - - Rebased on top of today's net-next - - The changes are also available through git: - - Repo: https://github.com/aaptel/linux.git branch nvme-rx-offload-v11 - Web: https://github.com/aaptel/linux/tree/nvme-rx-offload-v11 - - The NVMeTCP offload was presented in netdev 0x16 (video now available): - - https://netdevconf.info/0x16/session.html?NVMeTCP-Offload-%E2%80%93-Implementation-and-Performance-Gains - - https://youtu.be/W74TR-SNgi4 - -* [v1: atm: eni: replace DPRINTK macro with pr_debug()](http://lore.kernel.org/netdev/00f95478-c9cc-1f4b-820e-d427a9113418@icloud.com/) - - The macro DPRINTK is in use in lots of different source files, varying in - their implementation. One of those files is drivers/atm/eni.c. - - Replacing them with pr_debug() and their counterparts makes it more - consistent and easier to read. - -* [v1: Bluetooth: Make sure LE create conn cancel is sent when timeout](http://lore.kernel.org/netdev/20230203173900.1.I9ca803e2f809e339da43c103860118e7381e4871@changeid/) - - When sending the LE create conn command, we set a timer with a duration of - HCI_LE_CONN_TIMEOUT before timing out and calling - create_le_conn_complete. Additionally, when receiving the command - complete, we also set a timer with the same duration to call le_conn_timeout.
- -* [v1: Bluetooth: Free potentially unfreed SCO connection](http://lore.kernel.org/netdev/20230203173024.1.Ieb6662276f3bd3d79e9134ab04523d584c300c45@changeid/) - - When it happens, hci_cs_setup_sync_conn won't be able to obtain the - reference to the SCO connection, so it will be stuck and potentially hinder subsequent connections to the same device. - - This patch prevents that by also deleting the SCO connection if it is - still not established when the corresponding ACL connection is deleted. - -* [v3: net-next: Wangxun interrupt and RxTx support](http://lore.kernel.org/netdev/20230203091135.3294377-1-jiawenwu@trustnetic.com/) - - Configure interrupts, set up RxTx rings, and support receiving and transmitting packets. - -* [v1: net: ethernet: mtk_eth_soc: various enhancements](http://lore.kernel.org/netdev/cover.1675407169.git.daniel@makrotopia.org/) - - This series brings a variety of fixes and enhancements for mtk_eth_soc, - adds support for the MT7981 SoC and facilitates sharing the SGMII PCS - code between mtk_eth_soc and mt7530. - -* [v7: io_uring: add napi busy polling support](http://lore.kernel.org/netdev/20230203060850.3060238-1-shr@devkernel.io/) - - This adds napi busy polling support in io_uring.c. It adds a new - napi_list to the io_ring_ctx structure. This list contains the - napi_ids that are currently enabled for busy polling and is - used to determine which napi ids have busy polling enabled. For faster - access it also adds a hash table. - -* [v1: next: wifi: mwifiex: Replace one-element array with flexible-array member](http://lore.kernel.org/netdev/Y9xkjXeElSEQ0FPY@work/) - - One-element arrays are deprecated, and we are replacing them with flexible - array members instead. So, replace one-element array with flexible-array - member in struct mwifiex_ie_types_rates_param_set.
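The mwifiex entries, and the mpi3mr, xen and xfs entries further down, replace trailing one-element arrays with C99 flexible array members. The following is a minimal sketch of the difference. It uses a hypothetical rates structure rather than any driver's real layout:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Old style (deprecated): a one-element array standing in for a
 * variable-length tail. sizeof() already counts one entry, and with
 * -fstrict-flex-arrays=2 or higher, bounds checking treats rates[] as
 * having exactly one element. */
struct rates_old {
	uint16_t len;
	uint8_t rates[1];
};

/* New style: a C99 flexible array member; sizeof() covers only the header. */
struct rates_new {
	uint16_t len;
	uint8_t rates[];
};

/* Allocate the header plus n trailing entries. In the kernel this size would
 * come from struct_size(), which also saturates on overflow. */
static struct rates_new *rates_alloc(uint16_t n, const uint8_t *src)
{
	struct rates_new *r = malloc(sizeof(*r) + n * sizeof(r->rates[0]));

	if (!r)
		return NULL;
	r->len = n;
	memcpy(r->rates, src, n * sizeof(r->rates[0]));
	return r;
}
```

`sizeof(struct rates_new)` no longer includes a phantom element. The allocation is therefore exactly the header plus `n` entries, with no off-by-one adjustment.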
- -* [v1: next: wifi: mwifiex: Replace one-element arrays with flexible-array members](http://lore.kernel.org/netdev/Y9xkECG3uTZ6T1dN@work/) - - One-element arrays are deprecated, and we are replacing them with flexible - array members instead. So, replace one-element arrays with flexible-array - members in multiple structures. - -* [v2: net-next: net: page_pool: use in_softirq() instead](http://lore.kernel.org/netdev/20230203011612.194701-1-dqfext@gmail.com/) - - We use BH context only for synchronization, so we don't care if it's - actually serving softirq or not. - - As a side note, in the case of threaded NAPI, in_serving_softirq() will - return false because it's in process context with BH off, making - page_pool_recycle_in_cache() unreachable. - -#### Security Enhancements - -* [v1: media: imx-jpeg: Bounds check sizeimage access](http://lore.kernel.org/linux-hardening/20230204183804.never.323-kees@kernel.org/) - - The call of mxc_jpeg_get_plane_size() from mxc_jpeg_dec_irq() sets the - plane_no argument to 1. - -* [v1: scsi: mpi3mr: Replace 1-element array with flex-array](http://lore.kernel.org/linux-hardening/20230204183715.never.937-kees@kernel.org/) - - Nothing else defined MPI3_NVME_ENCAP_CMD_MAX, so the "command" - buffer was being defined as a fake flexible array of size 1. Replace - this with a proper flex array. - -* [v1: USB: ene_usb6250: Allocate enough memory for full object](http://lore.kernel.org/linux-hardening/20230204183546.never.849-kees@kernel.org/) - - The allocation of PageBuffer is 512 bytes in size, but the dereferencing - of struct ms_bootblock_idi (also size 512) happens at a calculated offset - within the allocation, which means the object could potentially extend - beyond the end of the allocation. Avoid this case by just allocating - enough space to catch any accesses beyond the end.
- -* [v1: btrfs: sysfs: Handle NULL return values](http://lore.kernel.org/linux-hardening/20230204183510.never.909-kees@kernel.org/) - - Each of to_fs_info(), discard_to_fs_info(), and to_space_info() can - return NULL values. Check for these so it's not possible to perform - calculations against NULL pointers. - -* [v1: bpf: Replace bpf_lpm_trie_key 0-length array with flexible array](http://lore.kernel.org/linux-hardening/20230204183241.never.481-kees@kernel.org/) - - Replace deprecated 0-length array in struct bpf_lpm_trie_key with flexible array. - - This includes fixing the selftest which was incorrectly using a - variable length struct as a header, identified earlier[1]. Avoid this - by just explicitly including the prefixlen member instead of struct - bpf_lpm_trie_key. - - [1] https://lore.kernel.org/all/202206281009.4332AA33@keescook/ - -* [v2: lm85: Bounds check to_sensor_dev_attr()->index usage](http://lore.kernel.org/linux-hardening/20230203223250.gonna.713-kees@kernel.org/) - - The index into various register arrays was not bounds checked. Provide a - simple wrapper to bounds check the index, adding robustness in the face - of memory corruption, unexpected index manipulation, etc. - -* [v1: randstruct: temporarily disable clang support](http://lore.kernel.org/linux-hardening/20230203194201.92015-1-ebiggers@kernel.org/) - - Randstruct with clang is currently unsafe to use in any clang release - that supports it, due to a clang bug that is causing miscompilations: - "-frandomize-layout-seed inconsistently randomizes all-function-pointers - structs" (https://github.com/llvm/llvm-project/issues/60349). Disable - it temporarily until the bug is fixed and the fix is released in a clang - version that can be checked for. 
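The lm85 entry above wraps uses of the sysfs attribute index in a bounds check. The general pattern is to validate the index against the array size before using it. The sketch below illustrates this with a hypothetical register cache and helper name. The driver's actual wrapper may handle the failure differently.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical register cache, indexed by a sysfs attribute index. */
static const int fan_min[4] = { 100, 200, 300, 400 };

/* Bounds-checked accessor: reject an out-of-range index with -EINVAL
 * instead of reading past the end of the array. */
static int fan_min_get(size_t idx, int *out)
{
	if (idx >= ARRAY_SIZE(fan_min))
		return -EINVAL;
	*out = fan_min[idx];
	return 0;
}
```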
- -* [v1: uaccess: Add minimum bounds check on kernel buffer size](http://lore.kernel.org/linux-hardening/20230203193523.never.667-kees@kernel.org/) - - While there is logic about the difference between ksize and usize, - copy_struct_from_user() didn't check the size of the destination buffer - (when it was known) against ksize. Add this check so there is an upper - bounds check on the possible memset() call; otherwise lower bounds - checks made by callers will trigger bounds warnings under -Warray-bounds. - -* [v2: arm64: Support Clang UBSAN trap codes for better reporting](http://lore.kernel.org/linux-hardening/20230203173946.gonna.972-kees@kernel.org/) - - When building with CONFIG_UBSAN_TRAP=y on arm64, Clang encodes the UBSAN - check (handler) type in the esr. Extract this and actually report these - traps as coming from the specific UBSAN check that tripped. - -* [v1: pstore/blk: Export a method to implemente panic_write()](http://lore.kernel.org/linux-hardening/20230203113515.93540-1-victor@allwinnertech.com/) - - panic_write() is needed to write the pstore frontend message - to blk devices on panic. This provides a way to register panic_write when - the "best_effort" mode is used to register the pstore blk-backend. - -* [v1: next: xen: Replace one-element array with flexible-array member](http://lore.kernel.org/linux-hardening/Y9xjN6Wa3VslgXeX@work/) - - One-element arrays are deprecated, and we are replacing them with flexible - array members instead. So, replace one-element array with flexible-array - member in struct xen_page_directory. - - This helps with the ongoing efforts to tighten the FORTIFY_SOURCE - routines on memcpy() and helps us make progress towards globally - enabling -fstrict-flex-arrays=3 [1]. - -* [v1: next: xfs: Replace one-element arrays with flexible-array members](http://lore.kernel.org/linux-hardening/Y9xiYmVLRIKdpJcC@work/) - - One-element arrays are deprecated, and we are replacing them with flexible - array members instead.
So, replace one-element arrays with flexible-array - members in the structures xfs_attr_leaf_name_local and - xfs_attr_leaf_name_remote. - -* [v2: 4.14: Backport oops_limit to 4.14](http://lore.kernel.org/linux-hardening/20230203003354.85691-1-ebiggers@kernel.org/) - - This series backports the patchset - "exit: Put an upper limit on how often we can oops" - (https://lore.kernel.org/linux-mm/20221117233838.give.484-kees@kernel.org/T/#u) - to 4.14, as recommended at - https://googleprojectzero.blogspot.com/2023/01/exploiting-null-dereferences-in-linux.html - -* [v2: 4.19: Backport oops_limit to 4.19](http://lore.kernel.org/linux-hardening/20230203002717.49198-1-ebiggers@kernel.org/) - - This series backports the patchset - "exit: Put an upper limit on how often we can oops" - (https://lore.kernel.org/linux-mm/20221117233838.give.484-kees@kernel.org/T/#u) - to 4.19, as recommended at - https://googleprojectzero.blogspot.com/2023/01/exploiting-null-dereferences-in-linux.html - -* [v1: 5.4: Backport oops_limit to 5.4](http://lore.kernel.org/linux-hardening/20230202044255.128815-1-ebiggers@kernel.org/) - - This series backports the patchset - "exit: Put an upper limit on how often we can oops" - (https://lore.kernel.org/linux-mm/20221117233838.give.484-kees@kernel.org/T/#u) - to 5.4, as recommended at - https://googleprojectzero.blogspot.com/2023/01/exploiting-null-dereferences-in-linux.html - This follows the backports to 5.10 and 5.15, which have already been released. - -* [v1: use canonical ftrace path whenever possible](http://lore.kernel.org/linux-hardening/20230130181915.1113313-1-zwisler@google.com/) - - The canonical location for the tracefs filesystem is at /sys/kernel/tracing. - - But, from Documentation/trace/ftrace.rst: - - Before 4.1, all ftrace tracing control files were within the debugfs - file system, which is typically located at /sys/kernel/debug/tracing.
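This note relates to the uaccess entry above. copy_struct_from_user(dst, ksize, src, usize) reconciles struct sizes between the kernel and userspace. If usize < ksize, the missing tail of dst is zero-filled. If usize > ksize, the extra user bytes must all be zero; otherwise the call fails with -E2BIG.

The sketch below models that logic on plain memory, so there are no user pointers and no -EFAULT path. Its dst_size parameter is a hypothetical stand-in for the destination-size check the patch describes. Treat this as an illustration under those assumptions, not as the kernel implementation:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if all len bytes at p are zero, else 0. */
static int all_zero(const unsigned char *p, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (p[i])
			return 0;
	return 1;
}

/* Model of copy_struct_from_user(dst, ksize, src, usize).
 * dst_size is hypothetical: the known size of the destination buffer. */
static int copy_struct_model(void *dst, size_t dst_size, size_t ksize,
			     const void *src, size_t usize)
{
	size_t size = usize < ksize ? usize : ksize;
	size_t rest = (usize > ksize ? usize : ksize) - size;

	/* Upper bound: dst must hold a full ksize-byte struct, since the
	 * zero-fill below may write up to ksize bytes. */
	if (dst_size < ksize)
		return -E2BIG;

	if (usize < ksize) {
		/* Older userspace, smaller struct: zero the fields it lacks. */
		memset((unsigned char *)dst + size, 0, rest);
	} else if (usize > ksize) {
		/* Newer userspace, larger struct: unknown trailing bytes must be zero. */
		if (!all_zero((const unsigned char *)src + size, rest))
			return -E2BIG;
	}
	memcpy(dst, src, size);
	return 0;
}
```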
- -#### Asynchronous I/O - -* [v7: liburing: add api for napi busy poll](http://lore.kernel.org/io-uring/20230205002424.102422-1-shr@devkernel.io/) - - This adds two new APIs to set/clear the napi busy poll settings. The two - new functions are called: - - io_uring_register_napi - - io_uring_unregister_napi - - The patch series also contains the documentation for the two new functions - and two example programs. The client program is called napi-busy-poll-client - and the server program napi-busy-poll-server. The client measures the - roundtrip times of requests. - -* [v2: io_uring,audit: don't log IORING_OP_MADVISE](http://lore.kernel.org/io-uring/b5dfdcd541115c86dbc774aa9dd502c964849c5f.1675282642.git.rgb@redhat.com/) - - fadvise and madvise both provide hints about caching or access patterns for - files and memory, respectively. Skip them. - -* [GIT PULL: Upgrade to clang-17 (for liburing's CI)](http://lore.kernel.org/io-uring/a9aac5c7-425d-8011-3c7c-c08dfd7d7c2f@gnuweeb.org/) - - clang-17 is now available. Upgrade the clang version in liburing's - CI to clang-17. - - Two prep patches to address `-Wextra-semi-stmt` warnings: - - - Remove unnecessary semicolon (Alviro) - - - Wrap the CHECK() macro with a do-while statement (Alviro) - -#### Rust For Linux - -* [v1: rust: sync: Arc: Implement Debug and Display](http://lore.kernel.org/rust-for-linux/20230201232244.212908-1-boqun.feng@gmail.com/) - - I found that our Arc doesn't implement `Debug` or `Display` when I tried - to play with them, so add these implementations. - - Wedson, I know that you are considering getting rid of `ArcBorrow`, so - patch #3 may have some conflicts with what you may be working on.
- -* [v3: rust: MAINTAINERS: Add the zulip link](http://lore.kernel.org/rust-for-linux/20230201184525.272909-1-boqun.feng@gmail.com/) - - The Zulip organization "rust-for-linux" was created 2 years ago[1] and has - proven to be a great place for Rust-related discussion, so - add the information to the MAINTAINERS file so that newcomers have more - options for finding guidance and help. - -* [v1: rust: add this_module macro](http://lore.kernel.org/rust-for-linux/20230131130841.318301-1-yakoyoku@gmail.com/) - - Adds a Rust equivalent to the handy THIS_MODULE macro from C. - -#### BPF - -* [v2: bpf-next: Add support for tracing programs in BPF_PROG_RUN](http://lore.kernel.org/bpf/20230203182812.20657-1-grantseltzer@gmail.com/) - - This patch changes the behavior of how BPF_PROG_RUN treats tracing - (fentry/fexit) programs. Previously only a return value was injected - and the actual program was not run. The new behavior mirrors that of - running raw tracepoint BPF programs, which actually run the - instructions of the program via `bpf_prog_run()`. - -* [v1: uapi: add missing ip/ipv6 header dependencies for linux/stddef.h](http://lore.kernel.org/bpf/20230203160448.1314205-1-herton@redhat.com/) - - Since commit 58e0be1ef6118 ("net: use struct_group to copy ip/ipv6 - header addresses"), ip and ipv6 headers started to use the __struct_group - definition, which is defined in include/uapi/linux/stddef.h. However, - linux/stddef.h isn't explicitly included in include/uapi/linux/{ip,ipv6}.h, - -* [v3: bpf-next: Document kfunc lifecycle / stability expectations](http://lore.kernel.org/bpf/20230203155727.793518-1-void@manifault.com/) - - This is v3 of the proposal for documenting BPF kfunc lifecycle and - stability. - -* [v1: bpf-next: libbpf: allow users to set kprobe/uprobe attach mode](http://lore.kernel.org/bpf/20230203031742.1730761-1-imagedong@tencent.com/) - - By default, libbpf will attach the kprobe/uprobe eBPF program in the - latest mode supported by the kernel.
    In this series, the 1st patch adds support for letting users manually attach kprobes/uprobes in legacy or perf mode, and the 2nd patch adds selftests for it.

* [v2: perf lock contention: Improve aggr x filter combination](http://lore.kernel.org/bpf/20230203021324.143540-1-namhyung@kernel.org/)

    The callstack filter can be useful for debugging lock issues, but it only works in caller aggregation mode (the default setting). In other words, it cannot filter by callstack when showing tasks or lock addresses/names.

* [v1: bpf-next: selftests/bpf: Initialize tc in xdp_synproxy](http://lore.kernel.org/bpf/20230202235335.3403781-1-iii@linux.ibm.com/)

    xdp_synproxy/xdp fails in CI with:

        Error: bpf_tc_hook_create: File exists

    The XDP version of the test should not be calling bpf_tc_hook_create(). It happens anyway because, if --tc is not specified on the command line, the tc variable remains uninitialized.

* [v1: tools/resolve_btfids: Tidy HOST_OVERRIDES](http://lore.kernel.org/bpf/20230202224253.40283-1-irogers@google.com/)

    Don't set EXTRA_CFLAGS to HOSTCFLAGS, and ensure CROSS_COMPILE isn't passed through.

    This patch is based on top of:
    https://lore.kernel.org/bpf/20230202112839.1131892-1-jolsa@kernel.org/

* [v1: net: virtio-net: Keep stop() to follow mirror sequence of open()](http://lore.kernel.org/bpf/20230202163516.12559-1-parav@nvidia.com/)

    The commit cited in the Fixes tag frees the rxq XDP info while RQ NAPI is still enabled and packet processing may be ongoing.

    Follow the mirror sequence of open() in the stop() callback. This ensures that no rx packet processing is ongoing when the rxq info is unregistered.
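
The virtio-net stop() item above relies on a general rule: teardown should undo setup in exactly the reverse order. Below is a toy model of that rule. The three step names are hypothetical simplifications, not virtio-net's actual functions, and the logs exist only so the ordering can be inspected.

```c
#include <stddef.h>

/* Hypothetical setup steps, in the order a driver's open() performs them. */
enum step { RXQ_INFO_REG, NAPI_ENABLE, QUEUE_START, NSTEPS };

static enum step open_log[NSTEPS], stop_log[NSTEPS];
static size_t open_len, stop_len;

/* open(): register rxq info, then enable NAPI, then start the queue. */
static void dev_open(void)
{
	open_len = 0;
	for (int s = 0; s < NSTEPS; s++)
		open_log[open_len++] = (enum step)s;
}

/*
 * stop(): undo the same steps in reverse. NAPI is disabled before the rxq
 * info is unregistered, so no rx processing can still be using it.
 */
static void dev_stop(void)
{
	stop_len = 0;
	for (int s = NSTEPS - 1; s >= 0; s--)
		stop_log[stop_len++] = (enum step)s;
}

/* Returns 1 if the last stop() exactly mirrored the last open(). */
static int stop_mirrors_open(void)
{
	if (stop_len != open_len)
		return 0;
	for (size_t i = 0; i < open_len; i++)
		if (stop_log[i] != open_log[open_len - 1 - i])
			return 0;
	return 1;
}
```

In a real driver, each logged step would be a call such as registering rxq info or enabling NAPI. Reverse ordering guarantees that anything a later step depends on is still alive while that step is being undone.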

* [v1: bpf-next: tools/resolve_btfids: Compile resolve_btfids as host program](http://lore.kernel.org/bpf/20230202112839.1131892-1-jolsa@kernel.org/)

    Compile resolve_btfids as a host program, so we can avoid the cross-compile issues reported by Nathan.

    Also, HOST_OVERRIDES is no longer needed for the BINARY target, only for the 'prepare' targets.

* [v1: virtio-net: support AF_XDP zero copy](http://lore.kernel.org/bpf/20230202110058.130695-1-xuanzhuo@linux.alibaba.com/)

    XDP socket (AF_XDP) is an excellent kernel-bypass networking framework. The zero-copy feature of xsk (XDP socket) needs to be supported by the driver, and its performance is very good. mlx5 and Intel ixgbe already support this feature. This patch set allows virtio-net to support xsk's zero-copy xmit feature.

* [v2: bpf-next: libbpf: Add wakeup_events to creation options](http://lore.kernel.org/bpf/20230202062549.632425-1-arilou@gmail.com/)

    Add an option to set when the perf buffer should wake up. By default, the perf buffer becomes signaled for every event pushed to it.

* [v1: virtio-net: close() to follow mirror of open()](http://lore.kernel.org/bpf/20230202050038.3187-1-parav@nvidia.com/)

    These two small patches make the ndo_close() callback follow the mirror sequence of the ndo_open() callback. This makes the code easier to audit and also ensures that the XDP rxq info is not unregistered while NAPI on the RXQ is ongoing.

* [v1: bpf-next: bpf, mm: bpf memory usage](http://lore.kernel.org/bpf/20230202014158.19616-1-laoar.shao@gmail.com/)

    Currently we can't get BPF memory usage reliably. bpftool now shows the BPF memory footprint, which is different from BPF memory usage.

* [v2: tools/resolve_btfids: Tidy host CFLAGS forcing](http://lore.kernel.org/bpf/20230201213743.44674-1-irogers@google.com/)

    Avoid passing CROSS_COMPILE to submakes and ensure CFLAGS is forced to HOSTCFLAGS for submake builds.
    This fixes problems with cross-compilation.

    Tidy up so that CFLAGS is not unnecessarily modified/exported, and make the override for prepare and build clearer.

* [v3: Documentation/bpf: Document API stability expectations for kfuncs](http://lore.kernel.org/bpf/20230201174449.94650-1-toke@redhat.com/)

    Following up on the discussion at the BPF office hours (and subsequent discussion), this patch adds a description of API stability expectations for kfuncs. The goal is to manage user expectations about what kind of stability can be expected for kfuncs exposed by the kernel.

* [v1: Add ftrace direct call for arm64](http://lore.kernel.org/bpf/20230201163420.1579014-1-revest@chromium.org/)

    This series adds ftrace direct call support to arm64, which makes BPF tracing programs (fentry/fexit/fmod_ret/lsm) work on arm64.

    It is meant to apply on top of the arm64 tree, which contains Mark Rutland's series on CALL_OPS [1] under the for-next/ftrace tag.

* [v1: bpf-next: bpf: Replace BPF_ALU and BPF_JMP with BPF_ALU32 and BPF_JMP64](http://lore.kernel.org/bpf/1675254998-4951-1-git-send-email-yangtiezhu@loongson.cn/)

    The intention of this patchset, based on bpf-next, is to make the code more readable, with no functional changes.

    If this patchset makes no sense, please ignore it, and sorry for that.

* [v5: bpf-next: xdp: introduce xdp-feature support](http://lore.kernel.org/bpf/cover.1675245257.git.lorenzo@kernel.org/)

    Introduce the capability to export the XDP features supported by the NIC. Introduce an XDP compliance test tool (xdp_features) to check that the features exported by the NIC match the real features supported by the driver. Allow XDP_REDIRECT of non-linear XDP frames into a devmap.
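
The xdp-feature item above describes checking that the features a NIC exports match the ones that actually work. Below is a minimal sketch of that comparison as a bitmask check. The `XDP_FEAT_*` flag names are hypothetical placeholders, not the series' real definitions. The sketch also assumes the observed set is already known, whereas the real xdp_features tool obtains it by exercising the NIC.

```c
#include <stdint.h>

/* Hypothetical feature bits; not the flag names used by the series. */
enum {
	XDP_FEAT_BASIC    = 1u << 0,
	XDP_FEAT_REDIRECT = 1u << 1,
	XDP_FEAT_NDO_XMIT = 1u << 2,
	XDP_FEAT_ZC       = 1u << 3,
};

/* Bits the NIC advertises but that did not work when tested. */
static uint32_t overclaimed(uint32_t advertised, uint32_t observed)
{
	return advertised & ~observed;
}

/* Bits that worked in testing but that the NIC does not advertise. */
static uint32_t underclaimed(uint32_t advertised, uint32_t observed)
{
	return observed & ~advertised;
}

/* The export is compliant only when both difference sets are empty. */
static int features_match(uint32_t advertised, uint32_t observed)
{
	return overclaimed(advertised, observed) == 0 &&
	       underclaimed(advertised, observed) == 0;
}
```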

* [v1: bpf-next: ice: add XDP mbuf support](http://lore.kernel.org/bpf/20230131204506.219292-1-maciej.fijalkowski@intel.com/)

    Although this work started as an effort to add multi-buffer XDP support to the ice driver, as usual it turned out that some other side issues needed to be addressed, so let me give you an overview.

* [v3: bpf-next: BPF rbtree next-gen datastructure](http://lore.kernel.org/bpf/20230131180016.3368305-1-davemarchevsky@fb.com/)

    This series adds an rbtree data structure following the "next-gen datastructure" precedent set by the recently added linked list [0]. It reimplements the previous rbtree RFC [1] using kfunc + kptr instead of adding a new map type.

* [v2: bpf-next: bpf: Refactor release_regno searching logic](http://lore.kernel.org/bpf/20230131171038.2648165-1-davemarchevsky@fb.com/)

    Kfuncs marked KF_RELEASE release some previously acquired arg. The verifier assumes that such a function has only one arg reg with ref_obj_id set, and that this arg is the one to be released. Having multiple kfunc arg regs with ref_obj_id set is considered an invalid state.

* [v1: dwarves: dwarves: sync with libbpf-1.1](http://lore.kernel.org/bpf/1675169241-32559-1-git-send-email-alan.maguire@oracle.com/)

    This will pull in these BTF dedup improvements:

        de048b6 libbpf: Resolve enum fwd as full enum64 and vice versa
        f3c51fe libbpf: Btf dedup identical struct test needs check for nested structs/arrays

* [v2: net-next: vsock: add support for sockmap](http://lore.kernel.org/bpf/20230118-support-vsock-sockmap-connectible-v2-0-58ffafde0965@bytedance.com/)

    Add sockmap support to vsock.

    We're testing vsock as a way to redirect guest-local UDS requests to the host, and this patch series greatly improves the performance of such a setup.
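
The release_regno item above states the verifier's rule for KF_RELEASE kfuncs: exactly one argument register may carry a ref_obj_id, and more than one is invalid. Below is a sketch of that search over a hypothetical, simplified `struct reg`. It is not the verifier's code, and the -1/-2 return convention is an assumption made for illustration.

```c
/* Hypothetical, simplified view of a verifier argument register. */
struct reg {
	unsigned int ref_obj_id;   /* nonzero if the reg holds an acquired reference */
};

/*
 * For a KF_RELEASE kfunc, find the single argument that carries a
 * reference. Returns its index, -1 if no argument has one, or -2 if more
 * than one does (the invalid state described above).
 */
static int find_release_regno(const struct reg *args, int nargs)
{
	int regno = -1;

	for (int i = 0; i < nargs; i++) {
		if (!args[i].ref_obj_id)
			continue;
		if (regno >= 0)
			return -2;      /* a second referenced arg: invalid */
		regno = i;
	}
	return regno;
}
```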

* [v3: net: ixgbe: allow to increase MTU to 3K with XDP enabled](http://lore.kernel.org/bpf/20230131032357.34029-1-kerneljasonxing@gmail.com/)

    Recently I encountered a case where I could not increase the MTU directly from 1500 to a much bigger value with XDP enabled on servers equipped with an IXGBE card. This happened on thousands of servers in a production environment. After applying this patch, we can set the MTU to a maximum of 3K.

* [v1: bpf-next: selftests/bpf: Try to address xdp_metadata crashes](http://lore.kernel.org/bpf/20230130215137.3473320-1-sdf@google.com/)

    Commit e04ce9f4040b ("selftests/bpf: Make crashes more debuggable in test_progs") hasn't uncovered anything interesting, besides confirming that the test passes but then eventually crashes [0].

* [v1: bpf: add bpf_link support for BPF_NETFILTER programs](http://lore.kernel.org/bpf/20230130150432.24924-1-fw@strlen.de/)

    Doesn't apply, doesn't work; there is no BPF_NETFILTER program type.

    nf_hook_run_bpf() (the C function that creates the program context and calls the real BPF prog) would be "updated" to use the BPF dispatcher to avoid the indirect call overhead.

    Does that seem OK to you? I'd ignore the BPF dispatcher for now and work on the needed verifier changes first.

### Related Technology Updates

#### Qemu

* [v10: riscv: Allow user to set the satp mode](http://lore.kernel.org/qemu-devel/20230203055812.257458-1-alexghiti@rivosinc.com/)

    This introduces new properties that allow the user to set the satp mode; see patch 3 for the full syntax. In addition, it prevents CPUs from booting in a satp mode they do not support (see patch 4).

* [v10: hw/riscv: handle kernel_entry high bits with 32bit CPUs](http://lore.kernel.org/qemu-devel/20230202135810.1657792-1-dbarboza@ventanamicro.com/)

    This new version removes translate_fn() from patch 1, because it wasn't removing the sign extension for pentry as we thought it would.
    A more detailed explanation is given in the commit message of patch 1.

    We're now retrieving the 'lowaddr' value from load_elf_ram_sym() and using it when running a 32-bit CPU. This worked when booting a 32-bit 'virt' machine with the -kernel option.

* [v1: Add RISC-V vector cryptography extensions](http://lore.kernel.org/qemu-devel/20230202124230.295997-1-lawrence.hunter@codethink.co.uk/)

    This patch series introduces an implementation of the six instruction sets in the draft RISC-V vector cryptography extensions specification.

    It implements the instruction sets as per the 20221202 version of the specification (1). We plan to update to the latest spec once it has stabilised.

* [v1: Add basic ACPI support for risc-v virt](http://lore.kernel.org/qemu-devel/20230202045223.2594627-1-sunilvl@ventanamicro.com/)

    This series adds basic ACPI support for the RISC-V virt machine. Currently only the INTC interrupt controller specification has been approved by the UEFI forum. External interrupt controller support in ACPI is in progress.

    The basic infrastructure changes are mostly leveraged from ARM.

* [v1: target/riscv: Add RVV registers to log](http://lore.kernel.org/qemu-devel/20230201142454.109260-1-ivan.klokov@syntacore.com/)

    Add a QEMU option 'rvv' that adds the RISC-V RVV registers to the log, like the regular registers.

* [v2: target/riscv: set tval for triggered watchpoints](http://lore.kernel.org/qemu-devel/20230131170955.752743-1-geomatsi@gmail.com/)

    According to the privileged spec, if [sm]tval is written with a nonzero value when a breakpoint exception occurs, then [sm]tval will contain the faulting virtual address. Set tval to the hit address when a breakpoint exception is triggered by a hardware watchpoint.

#### U-Boot

* [v2: Migrate to split config](http://lore.kernel.org/u-boot/20230204002619.938387-1-sjg@chromium.org/)

    U-Boot uses an SPL prefix on CONFIG options to indicate when an option relates to SPL.
    For example, while CONFIG_TEXT_BASE is the text base for U-Boot proper, CONFIG_SPL_TEXT_BASE is the text base for SPL.

* [v1: RFC: Migrate to split config](http://lore.kernel.org/u-boot/20230131152702.249197-1-sjg@chromium.org/)

    U-Boot uses an SPL prefix on CONFIG options to indicate when an option relates to SPL. For example, while CONFIG_TEXT_BASE is the text base for U-Boot proper, CONFIG_SPL_TEXT_BASE is the text base for SPL.

    Within the code, it is possible to do things like CONFIG_VAL(TEXT_BASE) to get that value. It returns the appropriate option, depending on the phase being built.

* [v2: riscv: cpu: ax25: Simplify cache enabling logic in harts_early_init()](http://lore.kernel.org/u-boot/20230131094034.12423-1-peterlin@andestech.com/)

    This patch simplifies cache enabling in harts_early_init(), moves the CSR definitions to include/asm/arch-andes/csr.h, and drops unnecessary i/d-cache disable functions from cleanup_before_linux().

## 20230129: Issue 31

### Kernel Updates

#### RISC-V Architecture Support

* [v2: mm, arch: add generic implementation of pfn_valid() for FLATMEM](http://lore.kernel.org/linux-riscv/20230129124235.209895-1-rppt@kernel.org/)

    Every architecture that supports the FLATMEM memory model defines its own version of pfn_valid(), which essentially compares a pfn to max_mapnr.

* [v1: riscv: Add header include guards to insn.h](http://lore.kernel.org/linux-riscv/20230129094242.282620-1-liaochang1@huawei.com/)

    Add header include guards to insn.h to prevent repeated declarations of the identifiers in insn.h.

* [v1: riscv: support arch_has_hw_pte_young()](http://lore.kernel.org/linux-riscv/20230129064956.143664-1-tjytimi@163.com/)

    arch_has_hw_pte_young() is false for riscv by default. When it is false, the page table walk is almost entirely skipped during MGLRU reclaim, and it also causes a useless step in __wp_page_copy_user().
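
The generic pfn_valid() item above explains that, under FLATMEM, validity comes down to comparing a pfn against max_mapnr. Below is a minimal userspace sketch of that check. `arch_pfn_offset` and `max_mapnr` are stand-ins for the kernel's ARCH_PFN_OFFSET and max_mapnr, and their values (RAM at 0x80000000, 1 GiB, 4 KiB pages) are illustrative assumptions.

```c
#include <stdbool.h>

/*
 * Stand-ins for kernel symbols: under FLATMEM, mem_map is a single array
 * covering max_mapnr pages, starting at pfn ARCH_PFN_OFFSET. Values are
 * illustrative: RAM at 0x80000000 (pfn 0x80000 with 4 KiB pages), 1 GiB.
 */
static unsigned long arch_pfn_offset = 0x80000;
static unsigned long max_mapnr = 0x40000;

/* A pfn is valid iff it falls inside the single mem_map array. */
static bool flatmem_pfn_valid(unsigned long pfn)
{
	return pfn >= arch_pfn_offset && pfn - arch_pfn_offset < max_mapnr;
}
```

The lower-bound test comes first, so the unsigned subtraction never wraps for pfns below the start of RAM.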

* [v5: riscv: improve boot time isa extensions handling](http://lore.kernel.org/linux-riscv/20230128172856.3814-1-jszhang@kernel.org/)

    Generally, RISC-V ISA extensions are fixed for a specific hardware platform, so a hart's features won't change after boot. This characteristic makes it straightforward to use a static branch to check whether a specific ISA extension is supported, which optimizes performance.

* [v2: RISC-V KVM virtualize AIA CSRs](http://lore.kernel.org/linux-riscv/20230128072737.2995881-1-apatel@ventanamicro.com/)

    The RISC-V AIA specification is now frozen as per the RISC-V International process. The latest frozen specification can be found at:
    https://github.com/riscv/riscv-aia/releases/download/1.0-RC1/riscv-interrupts-1.0-RC1.pdf

* [v3: KVM perf support](http://lore.kernel.org/linux-riscv/20230127182558.2416400-1-atishp@rivosinc.com/)

    This series extends perf support for KVM. The KVM implementation relies on the SBI PMU extension and on trap-and-emulate of the hpmcounter CSRs. It exposes virtual counters to the guest and internally manages them using kernel perf counters.

* [v2: RISC-V: KVM: Redirect illegal instruction traps to guest](http://lore.kernel.org/linux-riscv/20230127112934.2749592-1-apatel@ventanamicro.com/)

    M-mode redirects an unhandled illegal instruction trap back to S-mode. However, KVM running in HS-mode terminates the VS-mode software when it receives an illegal instruction trap.

* [v1: hwrng: starfive - Enable compile testing](http://lore.kernel.org/linux-riscv/Y9OveVKTkX8cRhyP@gondor.apana.org.au/)

    Enable compile testing for jh7110. Also remove the dependency on HW_RANDOM.

* [v3: RISC-V Hibernation Support](http://lore.kernel.org/linux-riscv/20230127091051.1465278-1-jeeheng.sia@starfivetech.com/)

    This series adds RISC-V hibernation (suspend-to-disk) support. Low-level arch functions were created to support hibernation.
    swsusp_arch_suspend() relies on code from __cpu_suspend_enter() to write the CPU state onto the stack, then calls swsusp_save() to save the memory image.

* [v2: -next: riscv: mm: hugetlb: Enable ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP](http://lore.kernel.org/linux-riscv/20230127050421.1920048-1-guoren@kernel.org/)

    Add HVO support for RISC-V; see commit 6be24bed9da3 ("mm: hugetlb: introduce a new config HUGETLB_PAGE_FREE_VMEMMAP"). This patch is similar to commit 1e63ac088f20 ("arm64: mm: hugetlb: enable HUGETLB_PAGE_FREE_VMEMMAP for arm64"), and the motivation for riscv is the same as for arm64. riscv has been ready to enable HVO since the fix in commit d33deda095d3 ("riscv/mm: hugepage's PG_dcache_clean flag is only set in head page").

* [v2: KVM: Add a common API for range-based TLB invalidation](http://lore.kernel.org/linux-riscv/20230126184025.2294823-1-dmatlack@google.com/)

    This series introduces a common API for performing range-based TLB invalidation. It is based on patches 29-33 from (2.), but I made some further cleanups after looking at it a second time.

    Tested on x86_64 and ARM64 using KVM selftests.

* [GIT PULL: RISC-V Devicetrees for v6.3](http://lore.kernel.org/linux-riscv/Y9LP+Za1h0fkBa58@spud/)

    DT stuff here for v6.3! I was kinda hoping to have a VisionFive 2 DT for you, but alas no. The changelog looks a bit odd since it's filled with unreviewed commits of my own, but they went as a PR to Palmer & are in riscv's for-next too:
    https://lore.kernel.org/all/167225428483.14530.3368527680488639805.b4-ty@rivosinc.com/
    They might also pop up as part of the Allwinner DT PR if the D1 stuff lands for v6.3, which I hope does happen!

* [GIT PULL: RISC-V SoC drivers for v6.3](http://lore.kernel.org/linux-riscv/Y9LNIm9pkr+Owv%2Fe@spud/)

    I'm sending this one perhaps earlier than needed, given there's going to be an -rc8 this time around, just in case something about the PMU driver isn't to your liking. It'd be nice if there were a subsystem for these power management units, as I wasn't sure whether the API usage was correct. Heiko, who has experience from the rockchip driver, reviewed it, so I am happy with that.

* [v15: -next: riscv: Add GENERIC_ENTRY support](http://lore.kernel.org/linux-riscv/20230126172516.1580058-1-guoren@kernel.org/)

    The patches convert riscv to use the generic entry infrastructure from kernel/entry/*. They also optimize entry.S with a new .macro and merge ret_from_kernel_thread into ret_from_fork.

* [v1: riscv: kprobe: Optimize kprobe with accurate atomicity](http://lore.kernel.org/linux-riscv/20230126161559.1467374-1-guoren@kernel.org/)

    The previous implementation was based on the stop_machine mechanism, which slowed down arm/disarm_kprobe. Using the minimal ebreak instruction gives accurate atomicity.

    This patch removes riscv's stop_machine-based patch_text. riscv then keeps only patch_text_nosync, so developers need to be more careful about patch_text atomicity.

* [v2: Allwinner D1 power domain support](http://lore.kernel.org/linux-riscv/20230126063419.15971-1-samuel@sholland.org/)

    This series adds support for the power controller found in D1 and other recent Allwinner SoCs. There is no first-party documentation, but there are a couple of vendor drivers for different hardware revisions[1][2], and the register definitions were easy to verify empirically.

* [v5: riscv: Allwinner D1/D1s platform support](http://lore.kernel.org/linux-riscv/20230126045738.47903-1-samuel@sholland.org/)

    This series adds the Kconfig/defconfig plumbing and devicetrees for a range of Allwinner D1 and D1s-based boards. Many features are already enabled, including USB, Ethernet, and WiFi.

    This version drops all boards/nodes with missing YAML bindings, so that at least some support can get merged for v6.3.

* [v13: -next: riscv: Add vector ISA support](http://lore.kernel.org/linux-riscv/20230125142056.18356-1-andy.chiu@sifive.com/)

    This patchset, based on the vector 1.0 spec, adds vector support to the RISC-V Linux kernel. The implementation makes some assumptions.

* [v1: riscv: mm: Implement pmdp_collapse_flush for THP](http://lore.kernel.org/linux-riscv/20230125125512.2494577-1-mchitale@ventanamicro.com/)

    When THP is enabled, 4K pages are collapsed into a single huge page using the generic pmdp_collapse_flush(), which in turn uses flush_tlb_range() to shoot down stale TLB entries. Unfortunately, the generic pmdp_collapse_flush() only invalidates cached leaf PTEs using address-specific SFENCEs. This results in repetitive (or unpredictable) page faults on RISC-V implementations that cache non-leaf PTEs.

* [v3: RISC-V kasan rework](http://lore.kernel.org/linux-riscv/20230125082333.1577572-1-alexghiti@rivosinc.com/)

    As described in patch 2, our current KASAN implementation is intricate, so I tried to simplify it and mimic what arm64/x86 are doing.

* [v5: riscv: Use PUD/P4D/PGD pages for the linear mapping](http://lore.kernel.org/linux-riscv/20230125081214.1576313-1-alexghiti@rivosinc.com/)

    This patchset aims to improve TLB utilization by using huge pages for the linear mapping.
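
The PUD/P4D/PGD linear-mapping item above depends on choosing the largest leaf page that fits at each point of the mapping. Below is a sketch of that selection. It assumes Sv39-style leaf sizes (1 GiB, 2 MiB, 4 KiB) and a `best_map_size()` signature of its own, so it illustrates the idea rather than reproducing the kernel's helper.

```c
#include <stdint.h>

/* Assumed Sv39-style leaf sizes: 1 GiB (PUD), 2 MiB (PMD), 4 KiB (PTE). */
#define SZ_4K (1ULL << 12)
#define SZ_2M (1ULL << 21)
#define SZ_1G (1ULL << 30)

/*
 * Pick the largest leaf mapping usable at this point of the linear map:
 * both addresses must be aligned to it and the remaining size must cover it.
 */
static uint64_t best_map_size(uint64_t pa, uint64_t va, uint64_t size)
{
	static const uint64_t sizes[] = { SZ_1G, SZ_2M, SZ_4K };

	for (unsigned i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
		uint64_t s = sizes[i];

		if (!(pa & (s - 1)) && !(va & (s - 1)) && size >= s)
			return s;
	}
	return SZ_4K;   /* fall back to base pages */
}
```

A larger leaf covers more memory per TLB entry, which is where the series expects its TLB-utilization gain. Deeper levels (P4D/PGD under Sv48/Sv57) would simply extend the `sizes` table.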

* [v2: dt-bindings: Introduce dual-link panels & panel-vendors](http://lore.kernel.org/linux-riscv/20230124101238.4542-1-a-bhatia1@ti.com/)

    The third patch introduces a dt-binding for generic dual-link LVDS panels. These panels have no documented constraints except for their timing characteristics. Further, these panels have 2 pixel sinks.

* [v3: resend: riscv: Allow to downgrade paging mode from the command line](http://lore.kernel.org/linux-riscv/20230123105135.814154-1-alexghiti@rivosinc.com/)

    Add 2 early command line parameters that allow downgrading the satp mode (using the same naming as x86):

    - "no5lvl": use a 4-level page table (down from sv57 to sv48)
    - "no4lvl": use a 3-level page table (down from sv57/sv48 to sv39)

    Note that going through the device tree to get the kernel command line works with ACPI too, since the EFI stub creates a device tree with the command line anyway.

* [v2: RISC-V: Apply Zicboz to clear_page](http://lore.kernel.org/linux-riscv/20230122191328.1193885-1-ajones@ventanamicro.com/)

    When the Zicboz extension is available, we can zero naturally aligned, Zicboz-block-sized chunks of memory more rapidly. As pages are always page-aligned and larger than any Zicboz block size will be, clear_page() appears to be a good candidate for the extension.

#### Process Scheduling

* [v1: net: sched: sch: Bounds check priority](http://lore.kernel.org/lkml/20230127224036.never.561-kees@kernel.org/)

    Nothing was explicitly bounds-checking the priority index used to access clpriop[]. WARN and bail out early if it's pathological.

* [v1: sched/fair: sanitize vruntime of entity being placed](http://lore.kernel.org/lkml/20230127163230.3339408-1-rkagan@amazon.de/)

    When a scheduling entity is placed onto cfs_rq, its vruntime is pulled to the base level (around cfs_rq->min_vruntime), so that the entity doesn't gain an extra boost when placed backwards.

    However, if the entity being placed hasn't run for a long time, its vruntime may fall too far behind (e.g. while cfs_rq was executing a low-weight hog), which can invert the vruntime comparison due to s64 overflow. The entity is then placed with its original vruntime far ahead, so it effectively never gets the CPU.

* [v3: sched: Store restrict_cpus_allowed_ptr() call state](http://lore.kernel.org/lkml/20230127015527.466367-1-longman@redhat.com/)

    The user_cpus_ptr field was originally added by commit b90ca8badbd1 ("sched: Introduce task_struct::user_cpus_ptr to track requested affinity"). It was used only by arm64, because of possible asymmetric CPU setups.

    Since commit 8f9ea86fdf99 ("sched: Always preserve the user requested cpumask"), task_struct::user_cpus_ptr has been repurposed to store the user-requested CPU affinity specified via sched_setaffinity().

* [v1: sched/rt: Add a comment for the existence of task_is_realtime()](http://lore.kernel.org/lkml/20230123042729.30268-1-dave@stgolabs.net/)

    ... such that users don't wonder about it when we have rt_task().

* [GIT PULL: sched/urgent for v6.2-rc6](http://lore.kernel.org/lkml/Y80pqpsa%2Ff2eEcYP@zn.tnic/)

    Please pull a couple of urgent scheduler fixes for 6.2.

#### Memory Management

* [v2: Some small improvements for memblock.](http://lore.kernel.org/linux-mm/20230129090034.12310-1-zhangpeng.00@bytedance.com/)

    Some small optimizations for memblock.

* [v1: -next: memory tier: release the new_memtier in find_create_memory_tier()](http://lore.kernel.org/linux-mm/20230129040651.1329208-1-tongtiangen@huawei.com/)

    In find_create_memory_tier(), if registering the device fails, we should release new_memtier from the tier list and put the device rather than the memtier.
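
The sched/fair item above describes how a long-idle entity's vruntime can wrap the s64 comparison. Below is a small userspace model of that failure. `vruntime_before()` and `max_vruntime()` follow the signed-difference pattern CFS uses, while `place_vruntime()` is a hypothetical simplification of the placement step.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * vruntime is a u64 that is allowed to wrap, so two values are ordered by
 * their signed difference. That is only correct while they are less than
 * 2^63 apart.
 */
static bool vruntime_before(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) < 0;
}

/* The later of two vruntimes, using the same signed-difference test. */
static uint64_t max_vruntime(uint64_t max, uint64_t v)
{
	if ((int64_t)(v - max) > 0)
		max = v;
	return max;
}

/* Hypothetical placement step: pull a lagging entity up to the base level. */
static uint64_t place_vruntime(uint64_t se_vruntime, uint64_t base)
{
	return max_vruntime(se_vruntime, base);
}
```

When the entity lags by more than 2^63, `place_vruntime()` keeps the stale value, and `vruntime_before(base, stale)` reports the entity as far ahead of everyone else. That is the "never gets the CPU" case the patch sanitizes.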

* [v2: mm/migrate: Continue to migrate for non-hugetlb folios](http://lore.kernel.org/linux-mm/20230129033910.1327277-1-chenwandun@huawei.com/)

    migrate_hugetlbs() returns -ENOMEM when there are not enough hugetlb pages. However, free non-hugetlb folios may still be available, so continue migrating non-hugetlb folios.

* [v1: mm/migrate: Continue to migrate for small pages](http://lore.kernel.org/linux-mm/20230129025404.1262745-1-chenwandun@huawei.com/)

    migrate_hugetlbs() returns -ENOMEM when there are not enough huge pages. However, free small pages may still be available, so continue migrating small pages.

* [v4: kasan: infer allocation size by scanning metadata](http://lore.kernel.org/linux-mm/20230129021437.18812-1-Kuan-Ying.Lee@mediatek.com/)

    Make KASAN scan metadata to infer the requested allocation size instead of printing cache->object_size.

    This patch fixes the confusing slab-out-of-bounds reports described in:

    https://bugzilla.kernel.org/show_bug.cgi?id=216457

* [v1: mm/swapfile: add cond_resched() in get_swap_pages()](http://lore.kernel.org/linux-mm/20230128094757.1060525-1-xialonglong1@huawei.com/)

    A soft lockup still occurs in get_swap_pages() under memory pressure. The setup: 64 CPU cores, 64 GB of memory, and 28 zram devices, each with a disksize of 50 MB and the same priority as si. The stress-ng tool is used to increase memory pressure, causing the system to OOM frequently.

* [v3: Add overflow checks for several syscalls](http://lore.kernel.org/linux-mm/20230128063229.989058-1-mawupeng1@huawei.com/)

    While testing mlock, we hit a problem when the len passed to mlock is ULONG_MAX. The return value of mlock is zero.
    But nothing will be locked, since len in do_mlock overflows to zero due to the following code in mlock:

        len = PAGE_ALIGN(len + (offset_in_page(start)));

* [v2: Per-VMA locks](http://lore.kernel.org/linux-mm/20230127194110.533103-1-surenb@google.com/)

    Previous version:

* [v8: DEPT(Dependency Tracker)](http://lore.kernel.org/linux-mm/1674782358-25542-1-git-send-email-max.byungchul.park@gmail.com/)

    Nevertheless, I apologize for the lack of documentation. I promise to add it before users need DEPT's APIs. For now, you can use DEPT just by enabling CONFIG_DEPT.

* [v1: ipc/shm: Introduce new do_vma_munmap() to munmap](http://lore.kernel.org/linux-mm/20230126212049.980501-1-Liam.Howlett@oracle.com/)

    The shm code already has the VMA iterator in position for a write. do_vmi_munmap() searches for the correct position and aligns the write, so it is not the right function to use in this case.

* [v1: mm: Add memcpy_from_file_folio()](http://lore.kernel.org/linux-mm/20230126201552.1681588-1-willy@infradead.org/)

    This is the equivalent of memcpy_from_page(). It differs in that it takes a position in a file instead of an offset in a folio, accepts the total number of bytes to be copied (instead of the number of bytes to copy from this folio), and returns how many bytes were copied from the folio, rather than making the caller calculate that and then checking whether the caller got it right.

* [v1: Convert writepage_t to use a folio](http://lore.kernel.org/linux-mm/20230126201255.1681189-1-willy@infradead.org/)

    Against next-20230125. More folioisation. I split out the mpage work from everything else because it completely dominated the patch, but some implementations I just converted outright.
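
The mlock item above shows `len = PAGE_ALIGN(len + (offset_in_page(start)));` wrapping to zero for `len == ULONG_MAX`. Below is a userspace reproduction, alongside one possible guard. The PAGE_* macros are local redefinitions assuming 4 KiB pages. `checked_aligned_len()` is a hypothetical helper, not the check the series actually adds.

```c
#include <limits.h>

/* Local redefinitions of the kernel macros, assuming 4 KiB pages. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))
#define PAGE_ALIGN(x)      (((x) + PAGE_SIZE - 1) & PAGE_MASK)
#define offset_in_page(p)  ((unsigned long)(p) & ~PAGE_MASK)

/* The computation quoted above: wraps to 0 when len is close to ULONG_MAX. */
static unsigned long aligned_len(unsigned long start, unsigned long len)
{
	return PAGE_ALIGN(len + offset_in_page(start));
}

/*
 * Hypothetical guard: refuse requests whose rounded-up length would wrap.
 * Returns 0 and stores the aligned length on success, -1 on overflow.
 */
static int checked_aligned_len(unsigned long start, unsigned long len,
			       unsigned long *out)
{
	unsigned long off = offset_in_page(start);

	if (len > ULONG_MAX - off - (PAGE_SIZE - 1))
		return -1;
	*out = PAGE_ALIGN(len + off);
	return 0;
}
```

Checking against `ULONG_MAX - off - (PAGE_SIZE - 1)` before adding means neither the offset addition nor the round-up can wrap.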

* [v1: highmem: Round down the address passed to kunmap_flush_on_unmap()](http://lore.kernel.org/linux-mm/20230126200727.1680362-1-willy@infradead.org/)

    We already round down the address in kunmap_local_indexed(), which is the other implementation of __kunmap_local(). The only implementation of kunmap_flush_on_unmap() is PA-RISC's, which expects a page-aligned address. This may currently be causing PA-RISC to flush the wrong addresses.

* [v4: introduce vm_flags modifier functions](http://lore.kernel.org/linux-mm/20230126193752.297968-1-surenb@google.com/)

    This patchset was originally published as part of per-VMA locking [1]. It was split out after it was suggested that it is viable on its own, and to ease review. It is now a prerequisite for the next version of the per-VMA lock patchset, which reuses the vm_flags modifier functions to lock the VMA when vm_flags are updated.

* [v8: cachestat: a new syscall for page cache state of files](http://lore.kernel.org/linux-mm/20230126175356.1582123-1-nphamcs@gmail.com/)

    There is currently no good way to query the page cache state of large file sets and directory trees. There is mincore(), but it scales poorly: the kernel writes out a lot of bitmap data that userspace has to aggregate, when the user really does not care about per-page information in that case. The user also needs to mmap and unmap each file along the way, which can be quite slow as well.

* [v1: mm/highmem: Align-down to page the address for kunmap_flush_on_unmap()](http://lore.kernel.org/linux-mm/20230126143346.12086-1-fmdefrancesco@gmail.com/)

    If ARCH_HAS_FLUSH_ON_KUNMAP is defined (the PA-RISC case), __kunmap_local() calls kunmap_flush_on_unmap(), which currently flushes the wrong address (as confirmed by Matthew Wilcox and Helge Deller). Al Viro proposed calling kunmap_flush_on_unmap() on the address aligned down to its page to fix this issue. Consensus has been reached on this
Consensus has been reached on this - solution. - -* [v11: iov_iter: Improve page extraction (pin or just list)](http://lore.kernel.org/linux-mm/20230126141626.2809643-1-dhowells@redhat.com/) - - Here are patches to provide support for extracting pages from an iov_iter - and to use this in the extraction functions in the block layer bio code. - -* [v1: iov_iter: Use __bitwise with the extraction_flags](http://lore.kernel.org/linux-mm/2638928.1674729230@warthog.procyon.org.uk/) - - Interestingly, things like __be32 are __bitwise. I wonder if that actually - makes sense or if it was just convenient so stop people doing arithmetic on - them. I guess doing AND/OR/XOR on them isn't a problem provided both - arguments are appropriately byte-swapped. - -* [v2: mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups](http://lore.kernel.org/linux-mm/20230125225358.2576151-1-zokeefe@google.com/) - - This was for-use by MADV_COLLAPSE file/shmem codepaths, where MADV_COLLAPSE - might identify a pte-mapped hugepage, only to have khugepaged race-in, free - the pte table, and clear the pmd. - -* [v10: iov_iter: Improve page extraction (pin or just list)](http://lore.kernel.org/linux-mm/20230125210657.2335748-1-dhowells@redhat.com/) - - Here are patches to provide support for extracting pages from an iov_iter - and to use this in the extraction functions in the block layer bio code. - -* [v2: nvdimm: Support sizeof(struct page) > MAX_STRUCT_PAGE_SIZE](http://lore.kernel.org/linux-mm/167467815773.463042.7022545814443036382.stgit@dwillia2-xfh.jf.intel.com/) - - Commit 6e9f05dc66f9 ("libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE") - - ...updated MAX_STRUCT_PAGE_SIZE to account for sizeof(struct page) - potentially doubling in the case of CONFIG_KMSAN=y. Unfortunately this - doubles the amount of capacity stolen from user addressable capacity for - everyone, regardless of whether they are using the debug option. 
Revert - that change, mandate that MAX_STRUCT_PAGE_SIZE never exceed 64, but - allow for debug scenarios to proceed with creating debug sized page maps - with a compile option to support debug scenarios. - -* [v2: mm/madvise: add vmstat statistics for madvise_[cold|pageout]](http://lore.kernel.org/linux-mm/20230125005457.4139289-1-minchan@kernel.org/) - - madvise LRU manipulation APIs need to scan address ranges to find - present pages at page table and provides advice hints for them. - - Likewise pg[scan/steal] count on vmstat, madvise_pg[scanned/hinted] - shows the proactive reclaim efficiency so this patch adds those - two statistics in vmstat. - -* [v1: Revert "mm: kmemleak: alloc gray object for reserved region with direct map"](http://lore.kernel.org/linux-mm/20230124230254.295589-1-isaacmanjarres@google.com/) - - Kmemleak operates by periodically scanning memory regions for pointers - to allocated memory blocks to determine if they are leaked or not. - However, reserved memory regions can be used for DMA transactions - between a device and a CPU, and thus, wouldn't contain pointers to - allocated memory blocks, making them inappropriate for kmemleak to - scan. Thus, revert this commit. - -* [v2: mm: kasan: reset page tags properly with sampling](http://lore.kernel.org/linux-mm/5dbd866714b4839069e2d8469ac45b60953db290.1674592780.git.andreyknvl@google.com/) - - The implementation of page_alloc poisoning sampling assumed that - tag_clear_highpage resets page tags for __GFP_ZEROTAGS allocations. - However, this is no longer the case since commit 70c248aca9e7 - ("mm: kasan: Skip unpoisoning of user pages"). - - This leads to kernel crashes when MTE-enabled userspace mappings are - used with Hardware Tag-Based KASAN enabled. 

#### File Systems

* [v4: pipe: use __pipe_{lock,unlock} instead of spinlock](http://lore.kernel.org/linux-fsdevel/20230129060452.7380-1-zhanghongchen@loongson.cn/)

  Using a spinlock in pipe_{read,write} costs too much time; IMO,
  pipe->{head,tail} can be protected by __pipe_{lock,unlock}.
  On the other hand, we can use __pipe_{lock,unlock} to protect
  the pipe->{head,tail} in pipe_resize_ring and
  post_one_notification.

* [v1: fscrypt: support decrypting data from large folios](http://lore.kernel.org/linux-fsdevel/20230127224202.355629-1-ebiggers@kernel.org/)

  Try to make the filesystem-level decryption functions in fs/crypto/
  aware of large folios. This includes making fscrypt_decrypt_bio()
  support the case where the bio contains large folios, and making
  fscrypt_decrypt_pagecache_blocks() take a folio instead of a page.

* [v1: fsverity: support verifying data from large folios](http://lore.kernel.org/linux-fsdevel/20230127221529.299560-1-ebiggers@kernel.org/)

  Try to make fs/verity/verify.c aware of large folios. This includes
  making fsverity_verify_bio() support the case where the bio contains
  large folios, and adding a function fsverity_verify_folio() which is the
  equivalent of fsverity_verify_page().

* [v1: multiblock allocator improvements](http://lore.kernel.org/linux-fsdevel/cover.1674822311.git.ojaswin@linux.ibm.com/)

  This patchset intends to improve some of the shortcomings of the mb allocator
  that we had noticed while running various tests and workloads on a
  POWERPC machine with 64k block size.

* [v1: Convert most of ext4 to folios](http://lore.kernel.org/linux-fsdevel/20230126202415.1682629-1-willy@infradead.org/)

  This, on top of a number of patches currently in next and a few patches
  sent to the mailing lists earlier today, converts most of ext4 to use
  folios instead of pages. It does not add support for large folios.
  It does not convert mballoc to use folios.
  write_begin() and write_end() still take a page parameter instead of a folio.

* [v1: fs: gracefully handle ->get_block not mapping bh in __mpage_writepage](http://lore.kernel.org/linux-fsdevel/20230126085155.26395-1-jack@suse.cz/)

  When a filesystem's ->get_block function does not map the buffer head when
  called from __mpage_writepage(), the function will happily go and pass a
  bogus bdev and block number to bio allocation routines, which leads to
  crashes sooner or later. E.g. UDF can do this because it doesn't want to
  allocate blocks from ->writepages callbacks.

* [v1: proc: Add allowlist for procfs files](http://lore.kernel.org/linux-fsdevel/cover.1674660533.git.legion@kernel.org/)

  The patch expands the subset= option. If proc is mounted with the
  subset=allowlist option, the /proc/allowlist file will appear. This file
  contains the filenames and directories that are allowed for this
  mountpoint. By default, /proc/allowlist contains only its own name.

* [v2: udf: Unify aops](http://lore.kernel.org/linux-fsdevel/20230125093914.24627-1-jack@suse.cz/)

  This patch series makes UDF use the same address_space_operations for both
  normal and in-ICB files, as switching aops on live files is prone to races as
  spotted by syzbot. While already dealing with this code, it switches the readpage,
  writepage and in-ICB expanding functions from kmap_atomic() to
  kmap_local_page().

#### Network Devices

* [v1: net-next: netlink: provide an ability to set default extack message](http://lore.kernel.org/netdev/d4843760219f20367c27472f084bd8aa729cf321.1674995155.git.leon@kernel.org/)

  In a common netdev pattern, the extack pointer is forwarded to the drivers
  to be filled with an error message. However, the caller can easily
  overwrite the filled message.

  Instead of adding multiple "if (!extack->_msg)" checks before any
  NL_SET_ERR_MSG() call that appears after the call to the driver, let's
  add a new macro to common code.
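
  A minimal sketch of the pattern described in this netlink extack item, assuming a simplified stand-in for `struct netlink_ext_ack` that only models the message pointer. The macro, struct and function names are hypothetical and are not the macro added by the patch; the sketch only shows how core code can attach a generic message after a driver call without overwriting a more specific one the driver already set.

  ```c
  #include <assert.h>
  #include <stddef.h>

  /* Hypothetical stand-in for struct netlink_ext_ack: only the message is modelled. */
  struct ext_ack_sketch {
      const char *msg;
  };

  static const char DRIVER_MSG[] = "driver: queue count out of range";
  static const char CORE_MSG[] = "core: operation failed";

  /* Unconditional setter, like a plain NL_SET_ERR_MSG(): overwrites any message. */
  #define SKETCH_SET_ERR_MSG(ext, m)         \
      do {                                   \
          if (ext)                           \
              (ext)->msg = (m);              \
      } while (0)

  /* Hypothetical "default" setter: only fills the message if none was set yet. */
  #define SKETCH_SET_DEFAULT_ERR_MSG(ext, m) \
      do {                                   \
          if ((ext) && !(ext)->msg)          \
              (ext)->msg = (m);              \
      } while (0)

  /* Hypothetical driver callback that fails, optionally with its own message. */
  static int sketch_driver_op(struct ext_ack_sketch *ext, int report_detail)
  {
      if (report_detail)
          SKETCH_SET_ERR_MSG(ext, DRIVER_MSG);
      return -1; /* always fails, for illustration */
  }

  /* Core code adds a generic message without clobbering the driver's one. */
  static int sketch_core_op(struct ext_ack_sketch *ext, int report_detail)
  {
      int err = sketch_driver_op(ext, report_detail);

      if (err)
          SKETCH_SET_DEFAULT_ERR_MSG(ext, CORE_MSG);
      return err;
  }
  ```

  With the unconditional setter in `sketch_core_op()`, the driver's more specific message would be lost whenever the driver reported one.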

* [v6: net-next: net/sched: cls_api: Support hardware miss to tc action](http://lore.kernel.org/netdev/20230129101613.17201-1-paulb@nvidia.com/)

  This series adds support for hardware miss to instruct tc to continue execution
  in a specific tc action instance on a filter's action list. The mlx5 driver patch
  (besides the refactors) shows its usage instead of using just chain restore.

  Currently a filter's action list must be executed all together or
  not at all, as drivers are only able to tell tc to continue executing from a
  specific tc chain, and not a specific filter/action.

* [v1: vhost-scsi: convert sysfs snprintf and sprintf to sysfs_emit](http://lore.kernel.org/netdev/20230129091145.2837-1-liubo03@inspur.com/)

  Follow the advice of Documentation/filesystems/sysfs.rst:
  show() should only use sysfs_emit() or sysfs_emit_at()
  when formatting the value to be returned to user space.

* [v1: net-next: net/tls: tls_is_tx_ready() checked list_entry](http://lore.kernel.org/netdev/20230128-list-entry-null-check-tls-v1-1-525bbfe6f0d0@diag.uniroma1.it/)

  tls_is_tx_ready() checks that list_first_entry() does not return NULL.
  This condition can never happen. For empty lists, list_first_entry()
  returns the list_entry() of the head, which is a type confusion.
  Use list_first_entry_or_null(), which returns NULL in case of empty lists.

* [v1: can: etas_es58x: do not send disable channel command if device is unplugged](http://lore.kernel.org/netdev/20230128133815.1796221-1-mailhol.vincent@wanadoo.fr/)

  When turning the network interface down, es58x_stop() is called and
  will send a command to the ES58x device to disable the channel,
  cf. es58x_ops::disable_channel().

  However, if the device gets unplugged while the network interface is
  still up, es58x_ops::disable_channel() will obviously fail to send the
  URB command and the driver emits the error message below:

      es58x_submit_urb: USB send urb failure: -ENODEV

  Check the USB device state before sending the disable channel command
  in order to silence the above error message.

* [v1: net: ethernet: mtk_eth_soc: disable hardware DSA untagging for second MAC](http://lore.kernel.org/netdev/20230128094232.2451947-1-arinc.unal@arinc9.com/)

  According to my tests on MT7621AT and MT7623NI SoCs, hardware DSA untagging
  won't work on the second MAC. Therefore, disable this feature when the
  second MAC of the MT7621 and MT7623 SoCs is being used.

* [v1: net-next: sh: checksum: add missing linux/uaccess.h include](http://lore.kernel.org/netdev/20230128073108.1603095-1-kuba@kernel.org/)

  SuperH does not include uaccess.h, even though it calls access_ok().

* [v1: net-next: net: phy: motorcomm: change the phy id of yt8521 and yt8531s to lowercase](http://lore.kernel.org/netdev/20230128063558.5850-2-Frank.Sae@motor-comm.com/)

  The phy id is usually defined in lower case.

* [v1: vhost/vdpa: Add MSI translation tables to iommu for software-managed MSI](http://lore.kernel.org/netdev/20230128031740.166743-1-sunnanyong@huawei.com/)

  Once an IOMMU domain is enabled for a device, the MSI
  translation tables have to be there for software-managed MSI.
  Otherwise, a platform with software-managed MSI and without an
  IRQ bypass function cannot get a correct memory write event
  from PCIe and will not get IRQs.
  The solution is to obtain the MSI physical base address from the
  IOMMU reserved region and set it as the IOMMU MSI cookie;
  the translation tables will then be created when the IRQ is requested.

* [v1: ixgbe: Panic during XDP_TX with > 64 CPUs](http://lore.kernel.org/netdev/20230128011213.150171-1-jjh@daedalian.us/)

  In commit 'ixgbe: let the xdpdrv work with more than 64 cpus'
  (4fe815850bdc8d4cc94e06fe1de069424a895826), support was added to allow
  XDP programs to run on systems with more than 64 CPUs by locking the
  XDP TX rings and indexing them using cpu % 64 (IXGBE_MAX_XDP_QS).

  Upon trying this patch out via the Intel 5.18.6 out-of-tree driver
  on a system with more than 64 cores, the kernel panicked with an
  array-index-out-of-bounds at the return in ixgbe_determine_xdp_ring in
  ixgbe.h, which means ixgbe_determine_xdp_q_idx was just returning the
  cpu instead of cpu % IXGBE_MAX_XDP_QS.

* [v1: Bluetooth: hci_conn: Refactor hci_bind_bis() since it always succeeds](http://lore.kernel.org/netdev/20230128005150.never.909-kees@kernel.org/)

  The compiler thinks "conn" might be NULL after a call to hci_bind_bis(),
  which cannot happen. Avoid any confusion by just making it not return a
  value since it cannot fail. Fixes the warnings seen with GCC 13:

      In function 'arch_atomic_dec_and_test',
          inlined from 'atomic_dec_and_test' at ../include/linux/atomic/atomic-instrumented.h:576:9,
          inlined from 'hci_conn_drop' at ../include/net/bluetooth/hci_core.h:1391:6,
          inlined from 'hci_connect_bis' at ../net/bluetooth/hci_conn.c:2124:3:
      ../arch/x86/include/asm/rmwcc.h:37:9: warning: array subscript 0 is outside array bounds of 'atomic_t[0]' [-Warray-bounds=]
         37 |         asm volatile (fullop CC_SET(cc) \
            |         ^

      ...
      In function 'hci_connect_bis':
      cc1: note: source object is likely at address zero

* [v1: net: ethernet: mtk_eth_soc: Avoid truncating allocation](http://lore.kernel.org/netdev/20230127223853.never.014-kees@kernel.org/)

  There doesn't appear to be a reason to truncate the allocation used for
  flow_info, so do a full allocation and remove the unused empty struct.
  GCC does not like having a reference to an object that has been
  partially allocated, as bounds checking may become impossible when
  such an object is passed to other code.

* [v1: net: dsa: microchip: ptp: add one more PTP dependency](http://lore.kernel.org/netdev/20230127221323.2522421-1-arnd@kernel.org/)

  When only NET_DSA_MICROCHIP_KSZ8863_SMI is built-in but
  PTP is a loadable module, the ksz_ptp support still causes
  a link failure:

      ld.lld-16: error: undefined symbol: ptp_clock_index
      >>> referenced by ksz_ptp.c
      >>>               drivers/net/dsa/microchip/ksz_ptp.o:(ksz_get_ts_info) in archive vmlinux.a

  Add the same dependency here that exists with the KSZ9477_I2C
  and KSZ_SPI drivers.

* [v2: net-next: ibmvnic: Toggle between queue types in affinity mapping](http://lore.kernel.org/netdev/20230127214358.318152-1-nnac123@linux.ibm.com/)

  Previously, ibmvnic IRQs were assigned to CPU numbers by assigning all
  the IRQs for transmit queues then assigning all the IRQs for receive
  queues. With multi-threaded processors, in a heavy RX or TX environment,
  physical cores would either be overloaded or underutilized (due to the
  IRQ assignment algorithm). This approach is sub-optimal because IRQs for
  the same subprocess (RX or TX) would be bound to adjacent CPU numbers,
  meaning they were more likely to be contending for the same core.

* [v3: net-next: net/sched: transition act_pedit to rcu and percpu stats](http://lore.kernel.org/netdev/20230127192752.3643015-1-pctammela@mojatatu.com/)

  The software pedit action didn't get the same love as some of the
  other actions and it's still using spinlocks and shared stats.
  Transition the action to rcu and percpu stats, which improves the
  action's performance dramatically.

* [v9: bpf-next: Add skb + xdp dynptrs](http://lore.kernel.org/netdev/20230127191703.3864860-1-joannelkoong@gmail.com/)

  This patchset is the 2nd in the dynptr series. The 1st can be found here [0].

  When comparing the differences in runtime for packet parsing without dynptrs
  vs. with dynptrs, there is no noticeable difference. Patch 5 contains more
  details as well as examples of how to use skb and xdp dynptrs.

* [v1: net-next: gve: Introduce a way to disable queue formats](http://lore.kernel.org/netdev/20230127190744.3721063-1-jeroendb@google.com/)

  The device is capable of simultaneously supporting multiple
  queue formats. With this change the driver can deliberately pick a queue format.

* [v5: net-next: Allow offloading of UDP NEW connections via act_ct](http://lore.kernel.org/netdev/20230127183845.597861-1-vladbu@nvidia.com/)

  Currently only bidirectional established connections can be offloaded
  via act_ct. Such an approach allows hardcoding a lot of assumptions into
  act_ct, flow_table and flow_offload intermediate layer code.

* [v1: selftests: net: udpgso_bench_tx: Introduce exponential back-off retries](http://lore.kernel.org/netdev/20230127181625.286546-1-andrei.gherzan@canonical.com/)

  The tx and rx test programs are used in a couple of test scripts including
  "udpgro_bench.sh". Taking this as an example, when the rx/tx programs
  are invoked subsequently, there is a chance that the rx one is not ready to
  accept socket connections.

* [v3: Introduce STM32 system bus](http://lore.kernel.org/netdev/20230127164040.1047583-1-gatien.chevallier@foss.st.com/)

  Document the STM32 System Bus. This bus is intended to control firewall
  access for the peripherals connected to it.

  For every peripheral, the bus checks the firewall registers to see
  if the peripheral is configured as non-secure. If the peripheral
  is configured as secure, the node is marked populated, so the
  device won't be probed.

* [v15: net-next: vmxnet3: Add XDP support.](http://lore.kernel.org/netdev/20230127163027.60672-1-u9012063@gmail.com/)

  The patch adds native-mode XDP support: XDP DROP, PASS, TX, and REDIRECT.

  Background:
  The vmxnet3 rx consists of three rings: ring0, ring1, and dataring.
  For r0 and r1, buffers at r0 are allocated using alloc_skb APIs and dma
  mapped to the ring's descriptor. If LRO is enabled and the packet size is larger
  than 3K (VMXNET3_MAX_SKB_BUF_SIZE), then r1 is used to map the rest of
  the buffer beyond VMXNET3_MAX_SKB_BUF_SIZE.

* [[PATCH bpf-next RFC V1] selftests/bpf: xdp_hw_metadata clear metadata when -EOPNOTSUPP](http://lore.kernel.org/netdev/167482734243.892262.18210955230092032606.stgit@firesoul/)

  The AF_XDP userspace part of xdp_hw_metadata sees non-zero as a signal of
  the availability of rx_timestamp and rx_hash in the data_meta area. The
  kernel-side BPF-prog code doesn't initialize these members when the kernel
  returns an error, e.g. -EOPNOTSUPP. This memory area is not guaranteed to
  be zeroed, and can contain garbage/previous values, which will be read
  and interpreted by the AF_XDP userspace side.

* [v1: net-next: Adding Sparx5 ES2 VCAP support](http://lore.kernel.org/netdev/20230127130830.1481526-1-steen.hegelund@microchip.com/)

  This provides the Egress Stage 2 (ES2) VCAP (Versatile Content-Aware
  Processor) support for the Sparx5 platform.

  The ES2 VCAP is an Egress Access Control VCAP that uses frame keyfields and
  previously classified keyfields to apply e.g. policing, trapping or
  mirroring to frames.

* [v2: net: ixgbe: allow to increase MTU to some extent with XDP enabled](http://lore.kernel.org/netdev/20230127122018.2839-1-kerneljasonxing@gmail.com/)

  I encountered one case where I cannot increase the MTU size directly
  from 1500 to 2000 with XDP enabled if the server is equipped with an
  IXGBE card, which happened on thousands of servers in a production environment.

* [v1: pull request for net-next: batman-adv 2023-01-27](http://lore.kernel.org/netdev/20230127102133.700173-1-sw@simonwunderlich.de/)

  The following changes since commit 88603b6dc419445847923fcb7fe5080067a30f98:

      Linux 6.2-rc2 (2023-01-01 13:53:16 -0800)

  are available in the Git repository at:

      git://git.open-mesh.org/linux-merge.git tags/batadv-next-pullrequest-20230127

  for you to fetch changes up to 0c4061c0d0e2c381ffe4d8b7c62ea69ad8132071:

      batman-adv: tvlv: prepare for tvlv enabled multicast packet type (2023-01-21 19:01:59 +0100)

* [v1: page_pool: add a comment explaining the fragment counter usage](http://lore.kernel.org/netdev/20230127101627.891614-1-ilias.apalodimas@linaro.org/)

  When reading the page_pool code the first impression is that keeping
  two separate counters, one being the page refcnt and the other being
  fragment pp_frag_count, is counter-intuitive.

  However, without that fragment counter we don't know when to reliably
  destroy or sync the outstanding DMA mappings. So let's add a comment
  explaining this part.

* [v1: net-next: net: netlink: recommend policy range validation](http://lore.kernel.org/netdev/20230127084506.09f280619d64.I5dece85f06efa8ab0f474ca77df9e26d3553d4ab@changeid/)

  For large ranges (outside of s16) the documentation currently
  recommends open-coding the validation, but it's better to use
  the NLA_POLICY_FULL_RANGE() or NLA_POLICY_FULL_RANGE_SIGNED()
  policy validation instead; recommend that.

* [v1: net-next: net: bcmgenet: Add a check for oversized packets](http://lore.kernel.org/netdev/20230127000819.3934-1-f.fainelli@gmail.com/)

  Occasionally we may get oversized packets from the hardware which
  exceed the nominal 2KiB buffer size we allocate SKBs with. Add an early
  check which drops the packet to avoid invoking skb_over_panic() and move
  on to processing the next packet.
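
A self-contained sketch for the tls_is_tx_ready() item above, using simplified re-implementations of the kernel's `list_first_entry()` and `list_first_entry_or_null()` (not their exact definitions). Because `list_first_entry()` is just `container_of()` applied to `head->next`, an empty list, where `head->next == head`, yields the head reinterpreted as an entry rather than NULL, so a NULL check on its result can never fire. `struct tx_record` is a hypothetical payload type.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified copy of the kernel's intrusive doubly linked list. */
struct list_head {
    struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Always computes an "entry": for an empty list this is the head itself
 * reinterpreted as the containing type, never NULL. */
#define list_first_entry(head, type, member) \
    container_of((head)->next, type, member)

/* Returns NULL when the list is empty (head->next points back at head). */
#define list_first_entry_or_null(head, type, member) \
    ((head)->next != (head) ? list_first_entry(head, type, member) : NULL)

static void init_list_head(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/* Hypothetical payload type embedding a list node. */
struct tx_record {
    int seq;
    struct list_head list;
};
```

Only the `_or_null` variant can be NULL-checked meaningfully; calling `list_first_entry()` on an empty list computes a pointer in front of the list head that does not refer to a `tx_record` at all.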

#### Security Hardening

* [v1: scsi: aacraid: Allocate cmd_priv with scsicmd](http://lore.kernel.org/linux-hardening/20230128000409.never.976-kees@kernel.org/)

  The aac_priv() helper assumes that the private cmd area immediately
  follows struct scsi_cmnd. Allocate this space as part of scsicmd,
  else there is a risk of heap overflow.

* [v1: regulator: max77802: Bounds check regulator id against opmode](http://lore.kernel.org/linux-hardening/20230127225203.never.864-kees@kernel.org/)

  Explicitly bounds-check the id before accessing the opmode array. Seen
  with GCC 13:

      ../drivers/regulator/max77802-regulator.c: In function 'max77802_enable':
      ../drivers/regulator/max77802-regulator.c:217:29: warning: array subscript [0, 41] is outside array bounds of 'unsigned int[42]' [-Warray-bounds=]
        217 |         if (max77802->opmode[id] == MAX77802_OFF_PWRREQ)
            |                             ^
      ../drivers/regulator/max77802-regulator.c:62:22: note: while referencing 'opmode'
         62 |         unsigned int opmode[MAX77802_REG_MAX];
            |                      ^

* [v1: ASoC: kirkwood: Iterate over array indexes instead of using pointer math](http://lore.kernel.org/linux-hardening/20230127224128.never.410-kees@kernel.org/)

  Walking the dram->cs array was seen as accesses beyond the first array
  item by the compiler. Instead, use the array index directly. This allows
  for run-time bounds checking under CONFIG_UBSAN_BOUNDS as well.

* [v1: scripts/dtc: Replace 0-length arrays with flexible arrays](http://lore.kernel.org/linux-hardening/20230127224101.never.746-kees@kernel.org/)

  Replace the 0-length array with a C99 flexible array.
  Seen with GCC 13 under -fstrict-flex-arrays:

      In file included from ../lib/fdt_ro.c:2:
      ../lib/../scripts/dtc/libfdt/fdt_ro.c: In function 'fdt_get_name':
      ../lib/../scripts/dtc/libfdt/fdt_ro.c:319:24: warning: 'strrchr' reading 1 or more bytes from a region of size 0 [-Wstringop-overread]
        319 |                 leaf = strrchr(nameptr, '/');
            |                        ^

* [v1: coda: Avoid partial allocation of sig_inputArgs](http://lore.kernel.org/linux-hardening/20230127223921.never.882-kees@kernel.org/)

  GCC does not like having a partially allocated object, since it cannot
  reason about it for bounds checking when it is passed to other code.
  Instead, fully allocate sig_inputArgs.

* [v1: iommufd: Add top-level bounds check on kernel buffer size](http://lore.kernel.org/linux-hardening/20230127223816.never.413-kees@kernel.org/)

  While the op->size assignments are already bounds-checked at static
  initializer time, these limits aren't aggregated and tracked when doing
  later variable range checking under -Warray-bounds. Help the compiler
  see that we know what we're talking about, and we'll never ask to
  write more than sizeof(ucmd.cmd) bytes during the memset() inside
  copy_struct_from_user().

* [v1: lm85: Bounds check to_sensor_dev_attr()->index usage](http://lore.kernel.org/linux-hardening/20230127223744.never.113-kees@kernel.org/)

  The index into various register arrays was not bounds checked. Add checking.

* [v2: ACPICA: Replace fake flexible arrays with flexible array members](http://lore.kernel.org/linux-hardening/20230127191621.gonna.262-kees@kernel.org/)

  One-element arrays (and multi-element arrays being treated as
  dynamically sized) are deprecated[1] and are being replaced with
  flexible array members in support of the ongoing efforts to tighten the
  FORTIFY_SOURCE routines on memcpy(), correctly instrument array indexing
  with UBSAN_BOUNDS, and to globally enable -fstrict-flex-arrays=3.
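
  A minimal sketch of the change this ACPICA item describes (the powerpc/rtas item that follows does the same), using a hypothetical `table` struct. With a one-element array the declared bound `[1]` disagrees with how many entries are really allocated, so bounds-checking instrumentation cannot rely on it; with `-fstrict-flex-arrays=3` only a true `[]` member is treated as flexible. A C99 flexible array member has no bound, and the allocation size is computed explicitly (the kernel uses `struct_size()` for this, with overflow checks that this sketch omits).

  ```c
  #include <assert.h>
  #include <stddef.h>
  #include <stdlib.h>
  #include <string.h>

  /* Deprecated style: a one-element array used as a variable-length tail.
   * Its declared bound (1) does not match the real number of entries. */
  struct table_old {
      unsigned int count;
      unsigned int entries[1];
  };

  /* C99 style: a flexible array member with no declared bound. */
  struct table_new {
      unsigned int count;
      unsigned int entries[];
  };

  /* Bytes needed for n entries; no overflow check, unlike struct_size(). */
  static size_t table_new_size(size_t n)
  {
      return sizeof(struct table_new) + n * sizeof(unsigned int);
  }

  /* Allocate a table with n zeroed entries, or NULL on allocation failure. */
  static struct table_new *table_new_alloc(unsigned int n)
  {
      struct table_new *t = malloc(table_new_size(n));

      if (!t)
          return NULL;
      t->count = n;
      memset(t->entries, 0, n * sizeof(unsigned int));
      return t;
  }
  ```

  Note that `sizeof(struct table_old)` already counts one entry, which is a common source of off-by-one allocation sizes when converting such structs.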

* [v1: powerpc/rtas: Replace one-element arrays with flexible arrays](http://lore.kernel.org/linux-hardening/20230127085023.271674-1-ajd@linux.ibm.com/)

  Using a one-element array as a fake flexible array is deprecated.

  Replace the one-element flexible arrays in rtas-types.h with C99 standard
  flexible array members instead.

  This helps us move towards enabling -fstrict-flex-arrays=3 in the future.

* [v1: x86: enable Data Operand Independent Timing Mode](http://lore.kernel.org/linux-hardening/20230125012801.362496-1-ebiggers@kernel.org/)

  According to documentation that Intel published recently [1], Intel CPUs
  based on the Ice Lake and later microarchitectures don't guarantee "data
  operand independent timing" by default. I.e., instruction execution
  times may depend on the values of data operated on.

* [v1: 5.10: Backport oops_limit to 5.10](http://lore.kernel.org/linux-hardening/20230124193004.206841-1-ebiggers@kernel.org/)

  This series backports the patchset
  "exit: Put an upper limit on how often we can oops"
  (https://lore.kernel.org/linux-mm/20221117233838.give.484-kees@kernel.org/T/#u)
  to 5.10, as recommended at
  https://googleprojectzero.blogspot.com/2023/01/exploiting-null-dereferences-in-linux.html

* [v1: 5.15: Backport oops_limit to 5.15](http://lore.kernel.org/linux-hardening/20230124185110.143857-1-ebiggers@kernel.org/)

  This series backports the patchset
  "exit: Put an upper limit on how often we can oops"
  (https://lore.kernel.org/linux-mm/20221117233838.give.484-kees@kernel.org/T/#u)
  to 5.15, as recommended at
  https://googleprojectzero.blogspot.com/2023/01/exploiting-null-dereferences-in-linux.html

#### Async IO

* [v1: liburing: liburing: patches for drain bug](http://lore.kernel.org/io-uring/20230127111133.2551653-1-dylany@meta.com/)

  Two patches for the drain bug I just sent a patch for. Patch 1 definitely
  fails, but patch 2 I am sending just in case as it exercises some more code paths.

* [v1: io_uring: always prep_async for drain requests](http://lore.kernel.org/io-uring/20230127105911.2420061-1-dylany@meta.com/)

  Drain requests all go through io_drain_req, which has a quick exit in case
  there is nothing pending (i.e. the drain is not useful). In that case it can
  issue the request immediately.

  However, for safety it queues it through task work.
  The problem is that in this case the request is run asynchronously,
  but the async work has not been prepared through io_req_prep_async.

* [v1: io_uring: handle TIF_NOTIFY_RESUME when checking for task_work](http://lore.kernel.org/io-uring/be6fa09b-8a27-412b-52af-1cd3bc896ad4@kernel.dk/)

  If TIF_NOTIFY_RESUME is set, then we need to call resume_user_mode_work()
  for PF_IO_WORKER threads. They never return to usermode, hence never get
  a chance to process any items that are marked by this flag. Most notably
  this includes the final put of files, but also any throttling markers set
  by block cgroups.

* [v1: io_uring: initialize count variable to 0](http://lore.kernel.org/io-uring/20230124125805.630359-1-trix@redhat.com/)

  The clang build fails with:

      io_uring/io_uring.c:1240:3: error: variable 'count' is uninitialized
            when used here [-Werror,-Wuninitialized]
                      count += handle_tw_list(node, &ctx, &uring_locked, &fake);
                      ^

  The commit listed in the Fixes: tag removed the initialization of count.

* [v1: liburing: deferred tw msg_ring tests](http://lore.kernel.org/io-uring/cover.1674523156.git.asml.silence@gmail.com/)

  Add a regression test for a recent null deref regression with
  disabled deferred ring and cover a couple more deferred tw cases.

* [v1: for-next: normal tw optimisation + refactoring](http://lore.kernel.org/io-uring/cover.1674484266.git.asml.silence@gmail.com/)

  * 1-5 are random refactoring patches
  * 6 is a prep patch, which also helps to inline handle_tw_list
  * 7 returns a link tw run optimisation for normal tw

* [v2: io_uring/net: cache provided buffer group value for multishot receives](http://lore.kernel.org/io-uring/f1a1ba93-1adf-63fa-6f0f-f3182f165841@kernel.dk/)

  If we're using ring provided buffers with multishot receive, and we end
  up doing an io-wq based issue at some points that also needs to select
  a buffer, we'll lose the initially assigned buffer group as
  io_ring_buffer_select() correctly clears the buffer group list as the
  issue isn't serialized by the ctx uring_lock. This is fine for normal
  receives as the request puts the buffer and finishes, but for multishot,
  we will re-arm and do further receives.

#### Rust For Linux

* [v2: rust: MAINTAINERS: Add the zulip link](http://lore.kernel.org/rust-for-linux/20230128072258.3384037-1-boqun.feng@gmail.com/)

  The Zulip organization "rust-for-linux" was created about 2 years
  ago[1] and has proven to be a great place for Rust-related discussion,
  therefore add the information to the MAINTAINERS file so that newcomers have
  more options to find guidance and help.

  [1]: https://lore.kernel.org/rust-for-linux/CANiq72=xVaMQkgCA9rspjV8bhWDGqAn4x78B0_4U1WBJYj1PiA@mail.gmail.com/

* [v1: Rust enablement for AArch64](http://lore.kernel.org/rust-for-linux/20230125163739.3798252-1-Jamie.Cunliffe@arm.com/)

  The first patch is from Miguel's tree to enable Rust support for
  AArch64. This has been tested with the Rust samples, and the generated
  code has also been manually inspected.

* [v1: x86/insn_decoder_test: allow longer symbol-names](http://lore.kernel.org/rust-for-linux/320c4dba-9919-404b-8a26-a8af16be1845@app.fastmail.com/)

  Increase the allowed line-length of the insn-decoder-test to 4k to allow
  for symbol-names longer than 256 characters.

  The insn-decoder-test takes objdump output as input, which may contain
  symbol-names as instruction arguments.

#### BPF

* [v1: bpf-next: bpf: Build-time assert that cpumask offset is zero](http://lore.kernel.org/bpf/20230128141537.100777-1-void@manifault.com/)

  The first element of a struct bpf_cpumask is a cpumask_t. This is done
  to allow struct bpf_cpumask to be cast to a struct cpumask. If this
  element were ever moved to another field, any BPF program passing a
  struct bpf_cpumask * to a kfunc expecting a const struct cpumask * would
  immediately fail to load. Add a build-time assertion so this
  assumption is captured and verified.

* [v4: bpf-next: xdp: introduce xdp-feature support](http://lore.kernel.org/bpf/cover.1674913191.git.lorenzo@kernel.org/)

  Introduce the capability to export the XDP features supported by the NIC.
  Introduce an XDP compliance test tool (xdp_features) to check that the features
  exported by the NIC match the real features supported by the driver.
  Allow XDP_REDIRECT of non-linear XDP frames into a devmap.
  Export XDP features for each XDP capable driver.
  Extend the libbpf netlink implementation in order to support the netlink_generic protocol.

* [v1: bpf-next: selftest/bpf: Make crashes more debuggable in test_progs](http://lore.kernel.org/bpf/20230127215705.1254316-1-sdf@google.com/)

  Reset stdio before printing the verbose log of the SIGSEGV'ed test.
  Otherwise, it's hard to understand what's going on in cases like [0].

  0: https://github.com/kernel-patches/bpf/actions/runs/4019879316/jobs/6907358876

* [v1: bpf-next: Add support for tracing programs in BPF_PROG_RUN](http://lore.kernel.org/bpf/20230127214353.628551-1-grantseltzer@gmail.com/)

  This patch changes the behavior of how BPF_PROG_RUN treats tracing
  (fentry/fexit) programs. Previously only a return value was injected,
  but the actual program was not run. The new behavior mirrors that of
  running raw tracepoint BPF programs, which actually run the
  instructions of the program via `bpf_prog_run()`.

* [v1: bpf-next: New benchmark for hashmap lookups](http://lore.kernel.org/bpf/20230127181457.21389-1-aspsk@isovalent.com/)

  Add a new benchmark for hashmap lookups and fix several typos. See individual
  commits for descriptions.

  One thing to mention here is that in commit 3 I've patched bench so that now
  command line options can be reused by different benchmarks.

* [v2: bpf-next: selftests/bpf: Properly enable hwtstamp in xdp_hw_metadata](http://lore.kernel.org/bpf/20230126225030.510629-1-sdf@google.com/)

  The existing timestamping_enable() is a no-op because it applies
  to the socket-related path that we are not verifying here anymore.

* [v1: perf lock contention: Add -S/--callstack-filter option](http://lore.kernel.org/bpf/20230126000936.3017683-1-namhyung@kernel.org/)

  The -S/--callstack-filter option limits the displayed entries to those having the given
  string in the callstack (not only in the caller in the output).

  The following example shows lock contention results if the callstack
  has the 'net' substring somewhere.

* [v3: bpf-next: Enable bpf_setsockopt() on ktls enabled sockets.](http://lore.kernel.org/bpf/20230125201608.908230-1-kuifeng@meta.com/)

  This patchset implements a change to bpf_setsockopt() which allows
  ktls enabled sockets to be used with the SOL_TCP level.
  This is necessary because, when ktls is enabled, it changes the function pointer of
  setsockopt of the socket, which bpf_setsockopt() checks in order to
  make sure that the socket is a TCP socket. Checking sk_protocol
  instead of the function pointer will ensure that bpf_setsockopt() with
  the SOL_TCP level still works on sockets with ktls enabled.

* [v4: bpf-next: Enable struct_ops programs to be sleepable](http://lore.kernel.org/bpf/20230125164735.785732-1-void@manifault.com/)

  This is part 4 of https://lore.kernel.org/bpf/20230123232228.646563-1-void@manifault.com/

  * Part 3: https://lore.kernel.org/all/20230125050359.339273-1-void@manifault.com/
  * Part 2: https://lore.kernel.org/all/20230124160802.1122124-1-void@manifault.com/

* [v2: net: xdp: execute xdp_do_flush() before napi_complete_done()](http://lore.kernel.org/bpf/20230125074901.2737-1-magnus.karlsson@gmail.com/)

  Make sure that xdp_do_flush() is always executed before
  napi_complete_done(). This is important for two reasons. First, a
  redirect to an XSKMAP assumes that a call to xdp_do_redirect() from
  napi context X on CPU Y will be followed by an xdp_do_flush() from the
  same napi context and CPU. This is not guaranteed if
  napi_complete_done() is executed before xdp_do_flush(), as it tells
  the napi logic that it is fine to schedule napi context X on another CPU.

* [v1: bpf-next: bpftool: disable bpfilter kernel config checks](http://lore.kernel.org/bpf/20230125025516.5603-1-chethan.suresh@sony.com/)

  We've experienced issues with bpfilter similar to the ones below:

  * https://github.com/moby/moby/issues/43755
  * https://lore.kernel.org/bpf/CAADnVQJ5MxGkq=ng214aYoH-NmZ1gjoS=ZTY1eU-Fag4RwZjdg@mail.gmail.com/

  Considering the current development status of bpfilter,
  disable bpfilter kernel config checks in bpftool feature.

* [v1: tracing: Have bpf and perf reuse the tracefs TRACE_EVENT macros](http://lore.kernel.org/bpf/20230124202238.563854686@goodmis.org/)

  When reviewing Linyu Yuan's patches[1], where the change was to move most of
  the macros from perf and bpf into stages, I realized that the macros
  that make up perf and bpf events are duplicated from the tracefs
  macros that were moved into the stages directory. One reason to move
  them into that directory was to remove duplicate code.

### Related Technology Updates

#### Qemu

* [v8: riscv: Allow user to set the satp mode](http://lore.kernel.org/qemu-devel/20230125162010.1615787-1-alexghiti@rivosinc.com/)

  This introduces new properties to allow the user to set the satp mode;
  see patch 3 for the full syntax. In addition, it prevents CPUs from booting in a
  satp mode they do not support (see patch 4).

* [v1: hw/riscv: boot: Don't use CSRs if they are disabled](http://lore.kernel.org/qemu-devel/20230123035754.75553-1-alistair.francis@opensource.wdc.com/)

  If the CSRs and CSR instructions are disabled because the Zicsr
  extension isn't enabled, then we want to make sure we don't run any CSR
  instructions in the boot ROM.

  This patch removes the CSR instructions from the reset-vec if the
  extension isn't enabled. We replace the instruction with a NOP instead.

#### U-Boot

* [v2: dm: Move to new driver model schema for device tree tags](http://lore.kernel.org/u-boot/20230129012652.83432-1-sjg@chromium.org/)

  Now that a new schema has been accepted upstream, press it into service in
  U-Boot.

* [v1: elf: add Elf64_Sym](http://lore.kernel.org/u-boot/20230122190453.45033-1-kalle.wachsmuth@gmail.com/)

  Required as Elf_Sym in tools/prelink-riscv.inc. I assume people have
  been using an OS-supplied elf.h, but macOS doesn't have that.

  Taken from https://github.com/torvalds/linux/blob/v6.1/include/uapi/linux/elf.h

* [v2: net: sun8i-emac: Allwinner D1 Support](http://lore.kernel.org/u-boot/20230122225107.62464-1-samuel@sholland.org/)

  D1 is a RISC-V SoC containing an EMAC compatible with the A64 EMAC. In a very roundabout way, this series finishes adding support for the D1 EMAC: patch 4 resolves a compiler warning when building the driver for RISC-V. The rest of the series is just cleanup requested by Jagan.

## 20230122: Issue 30

### Kernel Updates

#### RISC-V Architecture Support

* [v2: Upstream kvx Linux port](http://lore.kernel.org/linux-riscv/20230120141002.2442-1-ysionneau@kalray.eu/)

  This patch series adds support for the kv3-1 CPU architecture of the kvx family, found in Kalray's Coolidge (aka MPPA3-80) SoC.

  This is an RFC: since kvx support is not yet upstreamed into gcc/binutils, this patch series cannot be merged into Linux for now.

* [v1: Add new partial clock and reset drivers for StarFive JH7110](http://lore.kernel.org/linux-riscv/20230120024445.244345-1-xingyu.wu@starfivetech.com/)

  This patch series adds new partial clock drivers and reset support for the System-Top-Group (STG), Image-Signal-Process (ISP) and Video-Output (VOUT) blocks of the StarFive JH7110 RISC-V SoC.

* [v4: riscv: elf: add .riscv.attributes parsing](http://lore.kernel.org/linux-riscv/20230119221833.3629409-1-vineetg@rivosinc.com/)

  This implements the ELF loader hook to parse the RISC-V-specific .riscv.attributes section. This section is inserted by compilers (gcc/llvm) with build-related information, such as -march, organized as tag/value attribute pairs.
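
The tag/value layout mentioned in the `.riscv.attributes` entry above can be illustrated with a small decoder. This is a minimal userspace sketch, not the kernel patch: it assumes the generic ELF build-attributes convention used by the RISC-V psABI, where tags are ULEB128 integers, even-numbered tags carry a ULEB128 value and odd-numbered tags carry a NUL-terminated string (for example Tag_RISCV_arch = 5 holds the `-march` string). The helper names `read_uleb128()` and `parse_attr()` are hypothetical, and the enclosing section/subsection headers (format version, vendor name, lengths) are deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One decoded attribute (hypothetical representation). */
struct attr {
    uint64_t tag;
    int is_string;        /* odd tag: value is a NUL-terminated string */
    uint64_t int_val;     /* even tag: value is a ULEB128 integer */
    const char *str_val;
};

/* Decode one ULEB128 number at *p and advance *p past it.
 * Returns 0 on success, -1 if the buffer ends mid-number or it is too long. */
static int read_uleb128(const uint8_t **p, const uint8_t *end, uint64_t *out)
{
    uint64_t result = 0;
    unsigned int shift = 0;

    while (*p < end) {
        uint8_t byte = *(*p)++;

        if (shift >= 64)
            return -1;
        result |= (uint64_t)(byte & 0x7f) << shift;
        if (!(byte & 0x80)) {   /* high bit clear: this was the last byte */
            *out = result;
            return 0;
        }
        shift += 7;
    }
    return -1;
}

/* Parse one tag/value pair at *p; returns 0 on success. */
static int parse_attr(const uint8_t **p, const uint8_t *end, struct attr *a)
{
    if (read_uleb128(p, end, &a->tag))
        return -1;

    a->is_string = (int)(a->tag & 1);
    a->int_val = 0;
    a->str_val = NULL;

    if (a->is_string) {
        const uint8_t *nul = memchr(*p, '\0', (size_t)(end - *p));

        if (!nul)
            return -1;
        a->str_val = (const char *)*p;
        *p = nul + 1;
        return 0;
    }
    return read_uleb128(p, end, &a->int_val);
}
```

For example, the bytes `04 10 05 "rv64i2p0" 00` decode to Tag_RISCV_stack_align (4) = 16 followed by Tag_RISCV_arch (5) = "rv64i2p0".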

* [v2: spi: Add support for stacked/parallel memories](http://lore.kernel.org/linux-riscv/20230119185342.2093323-1-amit.kumar-mahapatra@amd.com/)

  This patch continues the discussions that took place on 'commit f89504300e94 ("spi: Stacked/parallel memories bindings")' about adding dt-binding support for stacked/parallel memories.

* [v1: riscv: uapi: Lie about having futex()](http://lore.kernel.org/linux-riscv/20230119193924.21186-1-palmer@rivosinc.com/)

  Without this, libstdc++ correctly detects the lack of a futex() syscall on rv32 and uses a fallback that doesn't work because it depends on 64-bit atomics.

* [v1: KVM: Add a common API for range-based TLB invalidation](http://lore.kernel.org/linux-riscv/20230119173559.2517103-1-dmatlack@google.com/)

  This series introduces a common API for performing range-based TLB invalidation. This is then used to supplant kvm_arch_flush_remote_tlbs_memslot() and to pave the way for two other patch series.

* [v4: Generic IPI sending tracepoint](http://lore.kernel.org/linux-riscv/20230119143619.2733236-1-vschneid@redhat.com/)

  Detecting IPI *reception* is relatively easy, e.g. using trace_irq_handler_{entry,exit} or even just function-tracing flush_smp_call_function_queue() for SMP calls.

  Figuring out their *origin* is trickier, as there is no generic tracepoint tied to e.g. smp_call_function():

  * AFAIA x86 has no tracepoint tied to sending IPIs, only receiving them (cf. trace_call_function{_single}_entry()).

* [v4: JH7110 PMU Support](http://lore.kernel.org/linux-riscv/20230119094447.21939-1-walker.chen@starfivetech.com/)

  This patchset adds a PMU (Power Management Unit) controller driver for the StarFive JH7110 SoC. To meet low-power requirements, the PMU is designed to include multiple PM domains that can be used for power gating of selected IP blocks, saving power by reducing leakage current.

* [v3: riscv: Dump faulting instructions in oops handler](http://lore.kernel.org/linux-riscv/20230119074738.708301-1-bjorn@kernel.org/)

  RISC-V does not dump faulting instructions in the oops handler. This series adds "Code:" dumps to the oops output, together with scripts/decodecode support.

* [v1: Add RISC-V 32 NOMMU support](http://lore.kernel.org/linux-riscv/20230119052642.1112171-1-Mr.Bossman075@gmail.com/)

  This patch set aims to add NOMMU support to RV32. Many people want to build simple emulators or HDL models of RISC-V; this patch makes it possible to run Linux on them.

* [v2: PATCH: riscv: Introduce system suspend support](http://lore.kernel.org/linux-riscv/20230118180338.6484-1-ajones@ventanamicro.com/)

  Booting with an OpenSBI that includes the RFC series[1] implementing the draft proposal for SBI system suspend[2], we can add system suspend support to Linux. This support implements "suspend-to-RAM", which means that when a kernel is built with CONFIG_SUSPEND, `echo mem > /sys/power/state` will initiate a suspension.

* [v5: Introduce __xchg, non-atomic xchg](http://lore.kernel.org/linux-riscv/20230118153529.57695-1-andrzej.hajda@intel.com/)

  There are many places where it can be used; I have just chosen some of them. I can provide a cocci script to detect others (not all), if necessary.

* [v4: Add Ethernet driver for StarFive JH7110 SoC](http://lore.kernel.org/linux-riscv/20230118061701.30047-1-yanhong.wang@starfivetech.com/)

  This series adds Ethernet support for the StarFive JH7110 RISC-V SoC. The series includes the MAC driver. The MAC version is dwmac-5.20 (from Synopsys DesignWare). For more information and support, you can visit the RVspace wiki[1].

* [v5: hwrng: starfive: Add driver for TRNG module](http://lore.kernel.org/linux-riscv/20230117015445.32500-1-jiajie.ho@starfivetech.com/)

  This patch series adds kernel support for the StarFive JH7110 hardware random number generator.
  The first 2 patches add binding docs and a device driver for this module. Patch 3 adds a devicetree entry for the VisionFive 2 SoC.

* [v1: riscv: alternative: proceed one more instruction for auipc/jalr pair](http://lore.kernel.org/linux-riscv/20230115162811.3146-1-jszhang@kernel.org/)

  If we patched an auipc + jalr pair, we'd better advance one more instruction. Andrew pointed out: "There's not a problem now, since we're only adding a fixup for jal, not jalr, but we should future-proof this and there's no reason to revisit an already fixed-up instruction anyway."

* [v4: riscv: improve boot time isa extensions handling](http://lore.kernel.org/linux-riscv/20230115154953.831-1-jszhang@kernel.org/)

  Generally, RISC-V ISA extensions are fixed for any specific hardware platform, so a hart's features won't change after booting. This characteristic makes it straightforward to use a static branch to check whether a specific ISA extension is supported, which optimizes performance.

#### Process Scheduling

* [v1: RESEND: sched: cpumask: improve on cpumask_local_spread() locality](http://lore.kernel.org/lkml/20230121042436.2661843-1-yury.norov@gmail.com/)

  This has significant performance implications on NUMA machines, for example when using NUMA-aware allocated memory together with NUMA-aware IRQ affinity hints.

* [v2: sched: Store restrict_cpus_allowed_ptr() call state](http://lore.kernel.org/lkml/20230121021749.55313-1-longman@redhat.com/)

  The user_cpus_ptr field was originally added by commit b90ca8badbd1 ("sched: Introduce task_struct::user_cpus_ptr to track requested affinity"). It was used only by the arm64 arch, due to possible asymmetric CPU setups.

* [v2: sched: cpuset: Don't rebuild sched domains on suspend-resume](http://lore.kernel.org/lkml/20230120194822.962958-1-qyousef@layalina.io/)

  Commit f9a25f776d78 ("cpusets: Rebuild root domain deadline accounting information") enabled rebuilding sched domains on cpuset and hotplug operations to correct deadline accounting.

* [v2: sched/debug: Put sched/domains files under the verbose flag](http://lore.kernel.org/lkml/20230120163330.1334128-1-pauld@redhat.com/)

  The debug files under sched/domains can take a long time to regenerate, especially when updates are done one at a time. Move these files under the sched verbose debug flag, and allow changes to verbose to trigger generation of the files.

* [v4: sched/fair: unlink misfit task from cpu overutilized](http://lore.kernel.org/lkml/20230119174244.2059628-1-vincent.guittot@linaro.org/)

  By taking uclamp_min into account, the 1:1 relation between task misfit and CPU overutilized is no longer true, as a task with a small util_avg may not fit a high-capacity CPU because of the uclamp_min constraint.

* [v2: sched: print parent comm in sched_show_task()](http://lore.kernel.org/lkml/20230119110642.GA6463@didi-ThinkCentre-M930t-N000/)

  Knowing who the parent is might be useful for debugging. For example, we can sometimes resolve kernel hung tasks by stopping whoever started those hung tasks. With the parent's name printed in sched_show_task(), it might help people know which "service" should be acted on.

* [v1: sched: Pass flags to cpufreq governor for RT tasks](http://lore.kernel.org/lkml/CAKns5cVijC_o13H7UM7WS2ckexP2y1aYJviqNcKeCE-y_2mcXQ@mail.gmail.com/)

  Right now only CFS tasks can pass flags to the cpufreq governor; RT tasks cannot. This limits the ability of the cpufreq governor to handle RT tasks if it needs to. Passing flags for RT tasks will increase the flexibility of the cpufreq governor.

* [v1: sched/numa: Enhance vma scanning](http://lore.kernel.org/lkml/20230116022508.9ll5S8Dns4XZ2BB0GB_N7d_2xSQRlFyoWuvqInl536w@z/)

  The patchset proposes one of the enhancements to NUMA VMA scanning suggested by Mel.

  In the existing mechanism, the scan period is derived from per-thread stats. Process Adaptive autoNUMA [1] proposed gathering NUMA fault stats at the per-process level to better capture application behaviour.

  During the course of that discussion, Mel proposed several ideas to enhance current NUMA balancing.

#### Memory Management

* [v1: mm-unstable: lib/Kconfig.debug: do not enable DEBUG_PREEMPT by default](http://lore.kernel.org/linux-mm/20230121033942.350387-1-42.hyeyoo@gmail.com/)

  In workloads where this_cpu operations are performed frequently, enabling DEBUG_PREEMPT may significantly increase runtime overhead due to frequent invocation of the __this_cpu_preempt_check() function.

* [v1: Convert a couple migrate functions to use folios](http://lore.kernel.org/linux-mm/20230121005622.57808-1-vishal.moola@gmail.com/)

  This patch set introduces folio_movable_ops() and converts 3 functions in mm/migrate.c to use folios.

* [v2: drivers/base/memory: Use array to show memory block state](http://lore.kernel.org/linux-mm/20230120233814.368803-1-gshan@redhat.com/)

  Use an array to show the memory block state from '/sys/devices/system/memory/memoryX/state', to simplify the code. In addition, WARN_ON() is removed, since the error can be caught from the return value, which is "ERROR-UNKNOWN-%ld\n". As Greg mentioned, a system reboot caused by WARN_ON() is definitely unexpected.

* [[RFC RESEND PATCH 0/2] Add support for sharing page tables across processes (Previously mshare)](http://lore.kernel.org/linux-mm/20230120160816.AydRnPHkAimCUUAJa06mi8Hyi_bW0rS-DsXXwfFNLyo@z/)

  Memory pages shared between processes require a page table entry (PTE) for each process.
  Each of these PTEs consumes some memory, and as long as the number of mappings being maintained is small enough, the space consumed by page tables is not objectionable.

* [v1: memcpy_from_folio()](http://lore.kernel.org/linux-mm/Y8qr8c3+SJLGWhUo@casper.infradead.org/)

  I think this is probably the best option. We could have a loop that kmaps each page in the folio, but that seems like excessive complexity. I'm happy to have highmem systems be less efficient, since they are anyway. Another potential area of concern is that folios can be quite large, and having preemption disabled while we copy 2MB of data might be a bad thing.

* [v1: ASoC: SOF: sof-audio: prepare_widgets: Check swidget for NULL on sink failure](http://lore.kernel.org/linux-mm/20230120102125.30653-1-peter.ujfalusi@linux.intel.com/)

  If the swidget is NULL, we skip preparing the widget and jump to handling the sink path of the widget. If the prepare fails in this case, we would undo the prepare, but the swidget is NULL (we skipped the prepare for the widget).

  To avoid a NULL pointer dereference in this case, we must check swidget against NULL once again.

* [v2: Introduce per NUMA node memory error statistics](http://lore.kernel.org/linux-mm/20230120034622.2698268-1-jiaqiyan@google.com/)

  In the RFC for Kernel Support of Memory Error Detection [1], one advantage of software-based scanning over a hardware patrol scrubber is the ability to make statistics visible to system administrators.

* [v1: linux-next: mm/hugetlb: replace get_hwpoison_huge_page() with get_hwpoison_hugetlb_folio() when !CONFIG_HUGETLBFS](http://lore.kernel.org/linux-mm/202301201036092738081@zte.com.cn/)

  When CONFIG_HUGETLBFS is not set, there are two problems: one is an implicit declaration of the function get_hwpoison_hugetlb_folio(), and the other is that get_hwpoison_huge_page() is defined but not used.
  Fix both by defining get_hwpoison_hugetlb_folio() instead of get_hwpoison_huge_page() when !CONFIG_HUGETLB_PAGE.

* [v5: Shadow stacks for userspace](http://lore.kernel.org/linux-mm/20230119212317.8324-1-rick.p.edgecombe@intel.com/)

  This series implements Shadow Stacks for userspace using x86's Control-flow Enforcement Technology (CET). CET consists of two related security features: shadow stacks and indirect branch tracking. This series implements just the shadow stack part of this feature, and just for userspace.

* [v1: convert hugetlb fault functions to folios](http://lore.kernel.org/linux-mm/20230119211446.54165-1-sidhartha.kumar@oracle.com/)

  This series converts the hugetlb page faulting functions to operate on folios. These include hugetlb_no_page(), hugetlb_wp(), copy_hugetlb_page_range(), and hugetlb_mcopy_atomic_pte().

* [v2: mm: In-kernel support for memory-deny-write-execute (MDWE)](http://lore.kernel.org/linux-mm/20230119160344.54358-1-joey.gouly@arm.com/)

  This is v2 of the MDWE patchset.

* [v1: iov_iter: Add a function to extract a page list from an iterator](http://lore.kernel.org/linux-mm/20230119152926.2899954-1-dhowells@redhat.com/)

  Add a function, iov_iter_extract_pages(), to extract a list of pages from an iterator. The pages may be returned with a reference added, a pin added, or neither, depending on the type of iterator and the direction of transfer. The caller must pass FOLL_READ_FROM_MEM or FOLL_WRITE_TO_MEM as part of gup_flags to indicate how the iterator contents are to be used.

* [v3: mm/hugetlb: convert get_hwpoison_huge_page() to folios](http://lore.kernel.org/linux-mm/20230119011057.91349-1-sidhartha.kumar@oracle.com/)

  A straightforward conversion of get_hwpoison_huge_page() to get_hwpoison_hugetlb_folio(). This removes two references to a head page in
Reduces two references to a head page in - memory-failure.c - -#### 文件系统 - -* [v7: iov_iter: Improve page extraction (ref, pin or just list)](http://lore.kernel.org/linux-fsdevel/20230120175556.3556978-1-dhowells@redhat.com/) - - Here are patches to provide support for extracting pages from an iov_iter - and a patch to use the primary extraction function in the block layer bio code. - -* [v3: Composefs: an opportunistically sharing verified image filesystem](http://lore.kernel.org/linux-fsdevel/cover.1674227308.git.alexl@redhat.com/) - - Giuseppe Scrivano and I have recently been working on a new project we - call composefs. This is the first time we propose this publically and - we would like some feedback on it. - -* [v1: Revert "gfs2: stop using generic_writepages in gfs2_ail1_start_one"](http://lore.kernel.org/linux-fsdevel/20230120141150.1278819-1-agruenba@redhat.com/) - - Commit b2b0a5e97855 switched from generic_writepages() to - filemap_fdatawrite_wbc() in gfs2_ail1_start_one() on the path to - replacing ->writepage() with ->writepages() and eventually eliminating - the former. Function gfs2_ail1_start_one() is called from - gfs2_log_flush(), our main function for flushing the filesystem log. - -* [v2: fs/aio: obey min_nr when doing wakeups](http://lore.kernel.org/linux-fsdevel/20230120140347.2133611-1-kent.overstreet@linux.dev/) - - I've been observing workloads where IPIs due to wakeups in - aio_complete() are - 15% of total CPU time in the profile. Most of those - wakeups are unnecessary when completion batching is in use in io_getevents(). - -* [v1: shmem: support idmapped mounts for tmpfs](http://lore.kernel.org/linux-fsdevel/20230120094346.3182328-1-gscrivan@redhat.com/) - - This patch enables idmapped mounts for tmpfs when CONFIG_SHMEM is defined. - Since all dedicated helpers for this functionality exist, in this - patch we just pass down the idmap argument from the VFS methods to the relevant helpers. 

* [v1: RESEND: fs/namespace: defer free_mount from namespace_unlock](http://lore.kernel.org/linux-fsdevel/20230119211455.498968-1-echanude@redhat.com/)

  With the following patch, namespace_unlock will queue up the resources that need to be released and defer the operation through call_rcu, so it returns without waiting for the grace period.

* [v3: fs/aio: Replace kmap{,_atomic}() with kmap_local_page()](http://lore.kernel.org/linux-fsdevel/20230119162055.20944-1-fmdefrancesco@gmail.com/)

  kmap() and kmap_atomic() are being deprecated in favor of kmap_local_page().

  With kmap_local_page(), the mappings are per thread and CPU local, can take page faults, and can be called from any context (including interrupts). It is faster than kmap() in kernels with HIGHMEM enabled.

* [v1: dax: use switch statement over chained ifs](http://lore.kernel.org/linux-fsdevel/CAPOgqxF_xEgKspetRJ=wq1_qSG3h8mkyXC58TXkUvx0agzEm_A@mail.gmail.com/)

  This patch uses a switch statement for pe_order, which improves readability and, on some platforms, may slightly improve performance. To further improve readability, it also recognizes that `PAGE_SHIFT - PAGE_SHIFT` is a constant and uses 0 in its place.

* [v1: fs: Use CHECK_DATA_CORRUPTION() when kernel bugs are detected](http://lore.kernel.org/linux-fsdevel/20230116191425.458864-1-jannh@google.com/)

  Currently, filp_close() and generic_shutdown_super() use printk() to log messages when bugs are detected. This is problematic because infrastructure like syzkaller has no idea that this message indicates a bug. In addition, some people explicitly want their kernels to BUG() when kernel data corruption has been detected (CONFIG_BUG_ON_DATA_CORRUPTION).
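
The `dax: use switch statement over chained ifs` entry above describes replacing an if/else chain on pe_order with a switch and writing `PAGE_SHIFT - PAGE_SHIFT` as 0. A sketch of that shape follows. The shift values are assumed x86-64-style constants (4 KiB pages, 2 MiB PMD, 1 GiB PUD), and `fault_level()` with its enum is a hypothetical helper, not the fs/dax.c code.

```c
/* Assumed x86-64-style shifts, for illustration only. */
#define PAGE_SHIFT 12
#define PMD_SHIFT  21
#define PUD_SHIFT  30

enum pt_level { LEVEL_PTE, LEVEL_PMD, LEVEL_PUD, LEVEL_UNKNOWN };

/* Map a fault order (log2 of the number of base pages) to the page-table
 * level that would map it. Case labels must be integer constant
 * expressions, which these shift differences are. */
static enum pt_level fault_level(unsigned int pe_order)
{
    switch (pe_order) {
    case 0:                         /* PAGE_SHIFT - PAGE_SHIFT */
        return LEVEL_PTE;
    case PMD_SHIFT - PAGE_SHIFT:    /* 9 */
        return LEVEL_PMD;
    case PUD_SHIFT - PAGE_SHIFT:    /* 18 */
        return LEVEL_PUD;
    default:
        return LEVEL_UNKNOWN;
    }
}
```

Besides readability, a switch lets the compiler choose a jump table or a comparison tree instead of being tied to the order of the if/else tests.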

* [v3: ext4: Convert inode preallocation list to an rbtree](http://lore.kernel.org/linux-fsdevel/20230116080216.249195-1-ojaswin@linux.ibm.com/)

  This patch series aims to improve the performance and scalability of inode preallocation by changing the inode preallocation linked list to an rbtree. I've run xfstests quick on this series and plan to run the auto group as well, to confirm we have no regressions.

* [v2: eventfd: use a generic helper instead of an open coded wait_event](http://lore.kernel.org/linux-fsdevel/tencent_B0E8F40B6620BFE2E79CAA06EAADA085C907@qq.com/)

  Use wait_event_interruptible_locked_irq() in eventfd_{write,read} to avoid the longer, open-coded equivalent.

* [v1: blk: optimization for classic polling](http://lore.kernel.org/linux-fsdevel/3578876466-3733-1-git-send-email-nj.shetty@samsung.com/)

  This removes the dependency on interrupts to wake up the task. Set the task state to TASK_RUNNING if need_resched() returns true while polling for IO completion. Previously, the polling task used to sleep, relying on an interrupt to wake it up. This made some IO take very long when interrupt coalescing is enabled in NVMe.

#### Security Hardening

* [v1: gcc-plugins: Reorganize gimple includes for GCC 13](http://lore.kernel.org/linux-hardening/20230118202355.never.520-kees@kernel.org/)

  Starting with GCC 13, the gimple-iterator.h header must be included before gimple-fold.h. Reorganize the gimple headers to work for all GCC versions.

* [v3: kunit: memcpy: Split slow memcpy tests into MEMCPY_SLOW_KUNIT_TEST](http://lore.kernel.org/linux-hardening/20230118200653.give.574-kees@kernel.org/)

  Since the long memcpy tests may stall a system for tens of seconds in virtualized architecture environments, split those tests off under CONFIG_MEMCPY_SLOW_KUNIT_TEST so they can be disabled separately.

  Reviewed-and-tested-by: Guenter Roeck

* [v2: KVM: x86: Replace 0-length arrays with flexible arrays](http://lore.kernel.org/linux-hardening/20230118195905.gonna.693-kees@kernel.org/)

  Zero-length arrays are deprecated[1]. Replace the 0-length arrays in struct kvm_nested_state's "data" union with flexible arrays. (How are the sizes of these arrays verified?) Detected with GCC 13, using -fstrict-flex-arrays=3:

#### Async IO

* [v1: io_uring/poll: don't reissue in case of poll race on multishot request](http://lore.kernel.org/io-uring/8997c26b-c498-166d-d130-2caca08a3abb@kernel.dk/)

  A previous commit fixed a poll race that can occur, but it's only applicable for multishot requests. For a multishot request, we can safely ignore a spurious wakeup, as we never leave the waitqueue to begin with.

* [v1: for-next: random for-next patches](http://lore.kernel.org/io-uring/cover.1673887636.git.asml.silence@gmail.com/)

  1/5 brings back an old, lost optimisation; the others are small cleanups.

* [v1: liburing: test lazy poll wq activation](http://lore.kernel.org/io-uring/cover.1673886955.git.asml.silence@gmail.com/)

  Some tests around DEFER_TASKRUN and lazy poll activation, with 3/3 specifically testing the feature with disabled.

* [v1: io_uring: make io_sqpoll_wait_sq return void](http://lore.kernel.org/io-uring/20230115071519.554282-1-quanfafu@gmail.com/)

  Change the return type to void, since it always returns 0 and there is no need to do the check in the io_uring_enter syscall.

#### Rust For Linux

* [v1: scripts: `make rust-analyzer` for out-of-tree modules](http://lore.kernel.org/rust-for-linux/20230118160220.776302-1-varmavinaym@gmail.com/)

  Adds support for out-of-tree Rust modules to use the `rust-analyzer` make target to generate the rust-project.json file.

### Related Technologies

#### Qemu

* [v1: riscv-to-apply queue](http://lore.kernel.org/qemu-devel/20230120073913.1028407-1-alistair.francis@opensource.wdc.com/)

  The following changes since commit 239b8b0699a222fd21da1c5fdeba0a2456085a47:

  Merge tag 'trivial-branch-for-8.0-pull-request' of https://gitlab.com/laurent_vivier/qemu into staging (2023-01-19 15:05:29 +0000)

#### U-Boot

* [v2: spl: spl_nor: add alternative SPL_LOAD_IMAGE_METHOD](http://lore.kernel.org/u-boot/20230119152822.1214202-1-dev@kicherer.org/)

  Add a second SPL_LOAD_IMAGE_METHOD, BOOT_DEVICE_NOR2, to enable booting from an alternative NOR address in case loading from the first address fails, e.g. if no valid header is found.

* [v2: Basic StarFive JH7110 RISC-V SoC support](http://lore.kernel.org/u-boot/20230118081132.31403-1-yanhong.wang@starfivetech.com/)

  This series of patches is based on the latest branch/master and adds support for the StarFive JH7110 RISC-V SoC and the VisionFive V2 board. To achieve this, the respective DT nodes have been added, and the required defconfigs have been added to the boards' defconfig. In addition, the basic required DM drivers have been added, such as reset, clock, pinctrl, uart, ram, etc.

* [v1: event: Correct dependencies on the EVENT framework](http://lore.kernel.org/u-boot/20230116191207.151545-1-trini@konsulko.com/)

  The event framework is just that, a framework. Enabling it by itself does nothing, so we shouldn't ask the user about it. Reword (and correct typos in) the option and help text around this. This also applies to DM_EVENT and EVENT_DYNAMIC. Only EVENT_DEBUG and CMD_EVENT should be visible to the user to select, when EVENT is selected.

## 20230115: Issue 29

### Kernel Updates

#### RISC-V Architecture Support

* [v1: Zbb + fast-unaligned string optimization](http://lore.kernel.org/linux-riscv/20230113212351.3534769-1-heiko@sntech.de/)

  For this, it uses Palmer's series for hw-feature probing, which would read this property from firmware (devicetree), as the performance of unaligned accesses is an implementation detail of the relevant CPU core.

* [v5: Zbb string optimizations](http://lore.kernel.org/linux-riscv/20230113212301.3534711-1-heiko@sntech.de/)

  This series still tries to allow optimized string functions for specific extensions. The last approach of using an inline base function to hold the alternative calls did cause some issues in a number of places.

* [v1: RISC-V: move some stray __RISCV_INSN_FUNCS definitions from kprobes](http://lore.kernel.org/linux-riscv/20230113211955.3534431-1-heiko@sntech.de/)

  The __RISCV_INSN_FUNCS originally declared `riscv_insn_is_*` functions inside the kprobes implementation. This got moved into a central header in commit ec5f90877516 ("RISC-V: Move riscv_insn_is_* macros into a common header").

  However, it looks like I overlooked two of them, so fix that. FENCE itself is an instruction defined directly by its own opcode, while the created riscv_isn_is_system function covers all instructions defined under the SYSTEM opcode.

* [v5: dt-bindings: riscv: add SBI PMU event mappings](http://lore.kernel.org/linux-riscv/20230113205435.122712-1-conor@kernel.org/)

  The SBI PMU extension requires the firmware to be aware of the event-to-counter/mhpmevent mappings supported by the hardware. OpenSBI may use DeviceTree to describe the PMU mappings. This binding is currently described in markdown in OpenSBI (since v1.0 in Dec 2021) and used by QEMU since v7.2.0.

* [v1: mm-unstable: mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap PTEs](http://lore.kernel.org/linux-riscv/20230113171026.582290-1-david@redhat.com/)

  This is the follow-up on [1]: "v2: mm: COW fixes part 3: reliable GUP R/W FOLL_GET of anonymous pages".

  After we implemented __HAVE_ARCH_PTE_SWP_EXCLUSIVE on the most prominent enterprise architectures, implement it on all remaining architectures that support swap PTEs.

* [v1: riscv: Add "Code:", and decodecode support](http://lore.kernel.org/linux-riscv/20230113144552.138081-1-bjorn@kernel.org/)

  From: Björn Töpel

  RISC-V does not have "Code:" dumps in the Oops output. This series adds them, together with scripts/decodecode support.

* [v1: irqchip/irq-sifive-plic: Add syscore callbacks for hibernation](http://lore.kernel.org/linux-riscv/20230113094216.116036-1-mason.huo@starfivetech.com/)

  The priority and enable registers of the PLIC will be reset during the hibernation power cycle in poweroff mode, so add syscore callbacks to save/restore those registers.

* [v1: Change PWM-controlled LED pin active mode and algorithm](http://lore.kernel.org/linux-riscv/20230113083115.2590-1-nylon.chen@sifive.com/)

  According to the circuit diagram of User LEDs - RGB described in the manual hifive-unmatched-schematics-v3.pdf[0], the behavior of the PWM is active-high.

  According to the description of PWM for pwmcmp in the SiFive FU740-C000 Manual[1], the PWM algorithm is (PW) pulse active time = (D) duty * (T) period[2]. The `frac` variable is the pulse "inactive" time, so we need to invert it.

* [v2: riscv: elf: add .riscv.attributes parsing](http://lore.kernel.org/linux-riscv/20230112210622.2337254-1-vineetg@rivosinc.com/)

  This implements the ELF loader hook to parse the RISC-V-specific .riscv.attributes section. This section is inserted by compilers (gcc/llvm) with build-related information, such as -march, organized as tag/value attribute pairs.

  It identifies the various attribute tags (and their corresponding values) as currently specified in the psABI specification.

* [v1: RISC-V KVM virtualize AIA CSRs](http://lore.kernel.org/linux-riscv/20230112140304.1830648-1-apatel@ventanamicro.com/)

  This series implements the first phase of AIA virtualization, which targets virtualizing AIA CSRs. This also provides a foundation for the second phase of AIA virtualization, which will target the in-kernel AIA irqchip (including both IMSIC and APLIC).

  The first two patches are shared with the "Linux RISC-V AIA Support" series, which adds AIA driver support.

* [v14: -next: riscv: Add GENERIC_ENTRY support](http://lore.kernel.org/linux-riscv/20230112095848.1464404-1-guoren@kernel.org/)

  The patches convert riscv to use the generic entry infrastructure from kernel/entry/*. They also include some entry.S optimizations with a new .macro, and merge ret_from_kernel_thread into ret_from_fork.

  Patches 1 and 2 prepare for generic entry; patches 3-7 are the main part of generic entry.

  All were tested with rv64, rv32, and rv64 + 32rootfs; all passed.

* [v7: -next: riscv: Optimize function trace](http://lore.kernel.org/linux-riscv/20230112090603.1295340-1-guoren@kernel.org/)

  The previous ftrace detour implementation fc76b8b8011 ("riscv: Using PATCHABLE_FUNCTION_ENTRY instead of MCOUNT") contains three problems.

  This series adds DYNAMIC_FTRACE_WITH_DIRECT_CALLS support for RISC-V. SAMPLE_FTRACE_DIRECT and SAMPLE_FTRACE_DIRECT_MULTI are also included here as samples for testing the DIRECT_CALLS-related interface.

* [v4: hwrng: starfive: Add driver for TRNG module](http://lore.kernel.org/linux-riscv/20230112043812.150393-1-jiajie.ho@starfivetech.com/)

  This patch series adds kernel support for the StarFive hardware random number generator. The first 2 patches add binding docs and a device driver for this module. Patch 3 adds a devicetree entry for the VisionFive 2 SoC.

* [v3: riscv: improve boot time isa extensions handling](http://lore.kernel.org/linux-riscv/20230111171027.2392-1-jszhang@kernel.org/)

  Generally, RISC-V ISA extensions are fixed for any specific hardware platform, so a hart's features won't change after booting. This characteristic makes it straightforward to use a static branch to check whether a specific ISA extension is supported, which optimizes performance.

* [v3: PCI: microchip: Partition address translations](http://lore.kernel.org/linux-riscv/20230111125323.1911373-1-daire.mcnamara@microchip.com/)

  Microchip PolarFire SoC is a 64-bit device and has DDR starting at Coreplex via an FPGA fabric. The AXI connections between the Coreplex and the fabric are 64-bit, and the AXI connections between the fabric and the rootport are 32-bit. For the CPU CorePlex to act as an AXI-Master to the PCIe devices, and for the PCIe devices to act as bus masters to DDR at these base addresses, the fabric can be customised to add/remove offsets for bits customer's design.

* [v2: Add a devicetree for the Aldec PolarFire SoC TySoM](http://lore.kernel.org/linux-riscv/20230111124106.2417152-1-conor.dooley@microchip.com/)

  The board has 32 GB of DDR, but the DT I have access to only has a small part of that mapped. I tried accessing more DDR, but it was not possible with the FPGA design as things stand. I'd rather have the devicetree match what the vendor is shipping, so I left the design/DDR as it was.

* [v2: RISC-V Hibernation Support](http://lore.kernel.org/linux-riscv/20230109062407.3235-1-jeeheng.sia@starfivetech.com/)

  This series adds RISC-V hibernation (suspend-to-disk) support. Low-level arch functions were created to support hibernation.

* [v13: -next: riscv: Add GENERIC_ENTRY support](http://lore.kernel.org/linux-riscv/20230107113838.3969149-1-guoren@kernel.org/)

  The patches convert riscv to use the generic entry infrastructure from kernel/entry/*.
  They also include some entry.S optimizations with a new .macro, and merge ret_from_kernel_thread into ret_from_fork.

#### Process Scheduling

* [v3: sched/fair: unlink misfit task from cpu overutilized](http://lore.kernel.org/lkml/20230113134056.257691-1-vincent.guittot@linaro.org/)

  By taking uclamp_min into account, the 1:1 relation between task misfit and CPU overutilized is no longer true, as a task with a small util_avg may not fit a high-capacity CPU because of the uclamp_min constraint.

* [v4: sched/fair: limit sched slice duration](http://lore.kernel.org/lkml/20230113133613.257342-1-vincent.guittot@linaro.org/)

  In the presence of many small-weight tasks, such as sched_idle tasks, normal- or high-weight tasks can see their ideal runtime (sched_slice) increase to hundreds of ms, whereas it normally stays below sysctl_sched_latency.

  Such a long sched_slice can significantly delay the release of resources, as the tasks can wait hundreds of ms before their next running slot just because of idle tasks queued on the rq.

* [v1: sched: print parent comm in sched_show_task()](http://lore.kernel.org/lkml/20230113105413.GA30243@didi-ThinkCentre-M930t-N000/)

  Knowing who the parent is might be useful for debugging. For example, we can sometimes resolve kernel hung tasks by stopping whoever started those hung tasks. With the parent's name printed in sched_show_task(), it might help people know which "service" should be acted on. Also, we move the parent info to a new line that follows.

* [v1: sched/idle: Make idle poll dynamic per-cpu](http://lore.kernel.org/lkml/20230112162426.217522-1-bristot@kernel.org/)

  idle=poll is frequently used on ultra-low-latency systems. Examples of such systems are high-performance trading and 5G NVRAM. The performance gain comes from avoiding the idle driver machinery and keeping the CPU always in an active state, avoiding (odd) hardware heuristics that are outside the control of the OS.
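
The `sched/fair: limit sched slice duration` entry above says normal tasks can get slices of hundreds of milliseconds when many low-weight tasks are queued. A simplified C model of the pre-EEVDF CFS arithmetic shows why. It assumes one CPU (so the tunables are not scaled by log2 of the CPU count), the default base tunables (6 ms latency, 0.75 ms minimum granularity, sched_nr_latency = 8), nice-0 weight 1024 and SCHED_IDLE weight 3, and a flat runqueue without group scheduling. It models the problem only, not the fix in the patch.

```c
#include <stdint.h>

/* Assumed defaults: single CPU, so no log2(ncpus) scaling of the tunables. */
#define SCHED_LATENCY_NS     6000000ULL  /* sysctl_sched_latency: 6 ms */
#define MIN_GRANULARITY_NS    750000ULL  /* sysctl_sched_min_granularity: 0.75 ms */
#define SCHED_NR_LATENCY           8ULL  /* latency / min_granularity */

#define NICE_0_WEIGHT           1024ULL
#define SCHED_IDLE_WEIGHT          3ULL

/* Period in which every runnable task should run once; it is stretched
 * when more tasks are runnable than fit into the latency target. */
static uint64_t sched_period_ns(uint64_t nr_running)
{
    if (nr_running > SCHED_NR_LATENCY)
        return nr_running * MIN_GRANULARITY_NS;
    return SCHED_LATENCY_NS;
}

/* An entity's slice is its weight share of the period (flat runqueue). */
static uint64_t sched_slice_ns(uint64_t nr_running, uint64_t se_weight,
                               uint64_t rq_weight)
{
    return sched_period_ns(nr_running) * se_weight / rq_weight;
}
```

For one nice-0 task sharing the runqueue with 1000 SCHED_IDLE tasks, `sched_slice_ns(1001, NICE_0_WEIGHT, NICE_0_WEIGHT + 1000 * SCHED_IDLE_WEIGHT)` gives 191045725 ns, roughly 191 ms against a 6 ms latency target: the idle tasks stretch the period to about 750 ms but contribute little weight, so the nice-0 task keeps about a quarter of it.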

* [v1: net: sched: disallow noqueue for qdisc classes](http://lore.kernel.org/lkml/20230109163906.706000-1-fred@cloudflare.com/)

  While experimenting with applying noqueue to a classful queue discipline, …

  Fix this by not allowing classes to be assigned to the noqueue discipline. Linux TC Notes states that classes cannot be set to the noqueue discipline [1]; enforce that here.

#### 内存管理 (Memory Management)

* [v1: memory pressure detection in VMs using PSI mechanism for dynamically inflating/deflating VM memory](http://lore.kernel.org/linux-mm/DS0PR02MB90787835F5B9CB9771A20329C4C09@DS0PR02MB9078.namprd02.prod.outlook.com/)

  We're from the Linux memory team at Qualcomm. We are currently devising a VM memory resizing feature that dynamically inflates or deflates the Linux VM based on ongoing memory demands in the VM. We wanted to propose a few details about this userspace daemon in the form of an RFC and hear upstream's opinion.

* [v5: selftest/vm: add mremap expand merge offset test](http://lore.kernel.org/linux-mm/8ff3ba3cadc0b6c1b2688ae5c851bf73aa062d57.1673701836.git.lstoakes@gmail.com/)

  Add a test to assert that we can mremap() and expand a mapping starting from an offset within an existing mapping. We unmap the last page in a 3-page mapping to ensure that the remap always succeeds, before remapping from the 2nd page.

* [v3: mm-unstable: continue hugetlb folio conversion](http://lore.kernel.org/linux-mm/20230113223057.173292-1-sidhartha.kumar@oracle.com/)

  This series continues the conversion of core hugetlb functions to use folios, converting many helper functions in the hugetlb fault path. This is in preparation for another series that converts the hugetlb fault code paths to operate on folios.

* [v4: Secure prandom_u32 invocations](http://lore.kernel.org/linux-mm/cover.1673470326.git.david.keisarschm@mail.huji.ac.il/)

  The security improvements for prandom_u32 done in commits c51f8f88d705 (October 2020) and d4150779e60f (May 2022) didn't handle the cases where prandom_bytes_state() and prandom_u32_state() are used.

* [v5: scripts/gdb: add mm introspection utils](http://lore.kernel.org/linux-mm/20230113175151.22278-1-dmitrii.bundin.a@gmail.com/)

  This command provides a way to traverse the entire page hierarchy for a given virtual address on x86. In addition to QEMU's `info tlb`/`info mem` commands, it provides complete information about the paging structure for an arbitrary virtual address. It supports 4KB/2MB/1GB pages and 5-level paging.

* [v1: mm: populate multiple PTEs if file page is large folio](http://lore.kernel.org/linux-mm/20230113163538.23412-1-fengwei.yin@intel.com/)

  The number of page faults can be reduced by populating PTEs in batches. A batch of PTE population is not allowed to cross:

  - page table boundaries
  - the VMA range
  - the large folio size
  - fault_around_bytes

* [v1: Add tests for memblock_alloc_node()](http://lore.kernel.org/linux-mm/0c3fdce6-3180-89c6-ba9e-77b7e98a5691@mail.polimi.it/)

  These tests verify that memblock_alloc_node() works as expected, i.e. that it sets the correct NUMA node for the newly allocated region. memblock_alloc_node() is mimicked by running the already implemented test function run_memblock_alloc_try_nid() with the flags that memblock_alloc_node() uses internally. The core check compares the requested NUMA node with the `nid` field of the memblock_region structure; the two must be equal for the test to succeed.

* [v3: mm/page_ext: Do not allocate space for page_ext->flags if not needed](http://lore.kernel.org/linux-mm/20230113154253.92480-1-pasha.tatashin@soleen.com/)

  An 8-byte page_ext->flags field is allocated per page whenever CONFIG_PAGE_EXTENSION is enabled. However, not every user of page_ext uses flags. Therefore, check whether flags is needed by at least one user, and only then allocate space for it.

* [v3: 0/6: Discard __GFP_ATOMIC](http://lore.kernel.org/linux-mm/20230113111217.14134-1-mgorman@techsingularity.net/)

  This replaces the "Discard __GFP_ATOMIC v2" series in mm-unstable. There are changelog and patch replacements that make -fix patches impractical.

* [v1: Some small improvements for memblock.](http://lore.kernel.org/linux-mm/20230113082659.65276-1-zhangpeng.00@bytedance.com/)

  I found some small optimizations while reading the memblock code. Please help to review. Thanks.

* [v3: mm/vmalloc.c: allow vread() to read out vm_map_ram areas](http://lore.kernel.org/linux-mm/20230113031921.64716-1-bhe@redhat.com/)

  The normal vmalloc API uses struct vmap_area to manage the allocated virtual kernel area, and associates a vm_struct with it to store and pass out more information. However, an area reserved through the vm_map_ram() interface has no associated vm_struct, so the current code in vread() skips vm_map_ram areas via the `if (!va->vm)` check.

* [v1: mm-unstable: convert hugepage memory failure functions to folios](http://lore.kernel.org/linux-mm/20230112204608.80136-1-sidhartha.kumar@oracle.com/)

  This series contains a straightforward 1:1 page-to-folio conversion for the memory failure functions that deal with huge pages. I renamed a few functions to match how other folio-operating functions are named.

* [v2: bpf-next: mm, bpf: Add BPF into /proc/meminfo](http://lore.kernel.org/linux-mm/20230112155326.26902-1-laoar.shao@gmail.com/)

  Currently there's no way to get BPF memory usage; we can only estimate it with bpftool or memcg, and neither is reliable.

* [v1: shmem: Convert shmem_write_end() to use a folio](http://lore.kernel.org/linux-mm/20230112131031.1209553-1-willy@infradead.org/)

  Use a folio internally in shmem_write_end(). This saves a number of calls to compound_head(), lets us get rid of the custom code to zero out the rest of a THP, and supports folios of arbitrary size.

* [v1: -next: mm: madvise: use vm_normal_folio() in madvise_free_pte_range()](http://lore.kernel.org/linux-mm/20230112124028.16964-1-wangkefeng.wang@huawei.com/)

  vm_normal_folio() already exists; use it so that madvise_free_pte_range() only uses a folio.

* [v1: zsmalloc: turn chain size config option into UL constant](http://lore.kernel.org/linux-mm/20230112071443.1933880-1-senozhatsky@chromium.org/)

  This fixes

      >> mm/zsmalloc.c:122:59: warning: right shift count >= width of type [-Wshift-count-overflow]

  and

      >> mm/zsmalloc.c:224:28: error: variably modified 'size_class' at file scope
         224 | struct size_class *size_class[ZS_SIZE_CLASSES];

* [v1: Get rid of tail page fields](http://lore.kernel.org/linux-mm/20230111142915.1001531-1-willy@infradead.org/)

  Continue shrinking the struct page definition by getting rid of the 'first tail page' and 'second tail page' fields. I originally wrote this patch set before Hugh's rewrite of subpages_mapcount, so it needed substantial updates; I hope I didn't miss anything.

#### 文件系统 (File Systems)

* [v3: exfat: handle unreconized benign secondary entries](http://lore.kernel.org/linux-fsdevel/20230114041900.4458-1-linkinjeon@kernel.org/)

  The Sony PXW-Z280 camera adds vendor allocation entries to the picture directory. Linux exFAT currently does not support them, so the file is not visible. This patch handles vendor extension and allocation entries as unrecognized benign secondary entries. As described in the specification, they are recognized but ignored, and when a directory entry set is deleted, the associated cluster allocations are removed along with the benign secondary directory entries.

* [v3: vfs: provide automatic kernel freeze / resume](http://lore.kernel.org/linux-fsdevel/20230114003409.1168311-1-mcgrof@kernel.org/)

  Darrick J. Wong poked me about the status of the fs freeze work, and he's right: it's been too long since the last spin. The last v2 attempt happened in April 2021 [0]; this just takes the feedback from Christoph and spins it again. I've only done basic build tests on x86_64 and haven't run-time tested it yet, but given the size of this set it's better to review early before getting stuck on details. So this is what I've ended up with so far.

* [v1: lockref: stop doing cpu_relax in the cmpxchg loop](http://lore.kernel.org/linux-fsdevel/20230113184447.1707316-1-mjguzik@gmail.com/)

  On the x86-64 architecture, even a failing cmpxchg grants exclusive access to the cacheline, making it preferable to retry the failed op immediately instead of stalling with the pause instruction.

* [v2: Composefs: an opportunistically sharing verified image filesystem](http://lore.kernel.org/linux-fsdevel/cover.1673623253.git.alexl@redhat.com/)

  Giuseppe Scrivano and I have recently been working on a new project we call composefs. This is the first time we are proposing it publicly, and we would like some feedback on it.

  At its core, composefs is a way to construct and use read-only images that are used similarly to how you would use, e.g., loop-back mounted squashfs images. On top of this, composefs has two fundamental features.

* [v1: fs: finish conversion to mnt_idmap](http://lore.kernel.org/linux-fsdevel/20230113-fs-idmapped-mnt_idmap-conversion-v1-0-fc84fa7eba67@kernel.org/)

  This series converts all places that still pass around a plain namespace attached to a mount so that they pass around a separate type instead, eliminating all bugs that can arise from conflating filesystem and mount idmappings. After this series, nothing will have changed semantically.

* [v3: RESEND: coredump: Use vmsplice_to_pipe() for pipes in dump_emit_page()](http://lore.kernel.org/linux-fsdevel/20230112224348.5384-1-yepeilin.cs@gmail.com/)

  Tested by dumping a 32-GByte core into a simple handler that splice()s from stdin to disk in a loop, PIPE_DEF_BUFFERS (16) pages at a time.

* [v6: Implement copy offload support](http://lore.kernel.org/linux-fsdevel/20230112115908.23662-1-nj.shetty@samsung.com/)

  The patch series covers the points discussed in the November 2021 virtual call "[LSF/MM/BFP TOPIC] Storage: Copy Offload" [0]. This patchset covers the initially agreed requirements as well as additional features suggested by the community.

* [v3: RESEND: nsfs: add compat ioctl handler](http://lore.kernel.org/linux-fsdevel/20221214-nsfs-ioctl-compat-v3-1-dce2d26e1fec@weissschuh.net/)

  As all parameters and return values of the ioctls have the same representation on both 32-bit and 64-bit, we can reuse the normal ioctl handler for the compat handler via compat_ptr_ioctl().

  All nsfs ioctls return a plain "int" file descriptor, which is a signed 4-byte integer type on both 32-bit and 64-bit.

* [v5: iov_iter: Add extraction helpers](http://lore.kernel.org/linux-fsdevel/167344725490.2425628.13771289553670112965.stgit@warthog.procyon.org.uk/)

  Here are patches to clean up some uses of READ/WRITE and ITER_SOURCE/DEST, patches to provide support for extracting pages from an iov_iter, and a patch to use the primary extraction function in the block layer bio code. Could you take a look?

* [v2: erofs: support page cache sharing between EROFS images in fscache mode](http://lore.kernel.org/linux-fsdevel/20230111083158.23462-1-jefflexu@linux.alibaba.com/)

  Changes since RFC:

  - patch 2: allocate an anonymous file (realfile) when the file is opened, rather than allocating a single anonymous file for each blob at mount time
  - patch 7: add a 'sharecache' mount option to control whether page cache sharing is enabled

* [v4: Introduce daemon failover mechanism to recover from crashing](http://lore.kernel.org/linux-fsdevel/20230111052515.53941-1-zhujia.zj@bytedance.com/)

  In on-demand read mode, if the user daemon closes the anonymous fd (e.g. because the daemon crashes), subsequent reads and in-flight requests based on that fd return -EIO. Even if this is tolerable for some individual users, when it happens in a real cloud service production environment, such I/O errors are passed on to cloud service users and impact their jobs.

* [v1: proc: introduce proc_statfs()](http://lore.kernel.org/linux-fsdevel/20230110152003.1118777-1-chao@kernel.org/)

  Introduce proc_statfs() to replace simple_statfs(), so that the f_bsize queried via statfs() is consistent with the value set in s_blocksize.

* [v1: Reduce zonefs memory usage](http://lore.kernel.org/linux-fsdevel/20230110130830.246019-1-damien.lemoal@opensource.wdc.com/)

  This series improves memory usage by switching to dynamically allocated inodes and dentries, similarly to regular file systems. This drastically reduces the memory consumption of zonefs when the file system is mounted. E.g., for a 26 TB SMR HDD with over 95000 zones, memory usage decreases from about 130 MB to a little over 5 MB.

* [v1: fs: kill old ms_* flags for internal sb](http://lore.kernel.org/linux-fsdevel/20230110022554.1186499-1-mcgrof@kernel.org/)

  David started the sb flag split for internal flags with commit e462ec50cb5 ("VFS: Differentiate mount flags (MS_*) from internal superblock flags"), but it seems we never removed the old flag usage.

* [v2: fs/aio: Replace kmap{,_atomic}() with kmap_local_page()](http://lore.kernel.org/linux-fsdevel/20230109175629.9482-1-fmdefrancesco@gmail.com/)

  The use of kmap_local_page() in fs/aio.c is "safe" in the sense that the code doesn't hand the returned kernel virtual addresses to other threads, and there is no nesting that would need to follow the stack-based (LIFO) mapping/unmapping order.

* [v2: fs/sysv: Replace kmap() with kmap_local_page()](http://lore.kernel.org/linux-fsdevel/20230109170639.19757-1-fmdefrancesco@gmail.com/)

  kmap() is deprecated in favor of kmap_local_page().

  There are two main problems with kmap(): (1) it comes with overhead, as the mapping space is restricted and protected by a global lock for synchronization, and (2) it requires global TLB invalidation when kmap's pool wraps, and it might block until a slot becomes available when the mapping space is fully utilized.

* [v1: Checkpoint Support for Syscall User Dispatch](http://lore.kernel.org/linux-fsdevel/20230109153348.5625-1-gregory.price@memverge.com/)

  Syscall user dispatch makes it possible to cleanly intercept system calls from user-land. However, most transparent checkpoint software presently leverages some combination of ptrace and system call injection to place software in a ready-to-checkpoint state.

* [v7: Implement IOCTL to get and/or the clear info about PTEs](http://lore.kernel.org/linux-fsdevel/20230109064519.3555250-1-usama.anjum@collabora.com/)

  Stop using the soft-dirty flags to find which pages have been written to. They are too delicate and wrong, as they show more soft-dirty pages than are actually soft-dirty. There is no interest in correcting this [A][B], as this is how the feature was written years ago, and its behaviour shouldn't be changed. Peter Xu has suggested using the async version of UFFD WP [C], as it is inherently based on the PTEs.

* [v7: DEPT(Dependency Tracker)](http://lore.kernel.org/linux-fsdevel/1673235231-30302-1-git-send-email-byungchul.park@lge.com/)

  I've been developing a tool for detecting deadlock possibilities by tracking wait/event rather than lock(?) acquisition order, to try to cover all synchronization mechanisms. It's based on v6.2-rc2.

  https://github.com/lgebyungchulpark/linux-dept/commits/dept2.0_on_v6.2-rc2

#### 网络设备 (Network Devices)

* [v1: net: usb: sr9700: Handle negative len](http://lore.kernel.org/netdev/20230114182326.30479-1-szymon.heidrich@gmail.com/)

  The packet len, computed as the difference between the length word extracted from the skb data and four, may be negative. In that case, processing of the buffer should be stopped rather than setting sr_skb->len to an unexpectedly large value (due to the cast from signed to unsigned integer) and passing sr_skb to usbnet_skb_return.

* [v6: net-next: net: ethernet: mtk_wed: introduce reset support](http://lore.kernel.org/netdev/cover.1673715298.git.lorenzo@kernel.org/)

  Introduce proper reset integration between the ethernet and wlan drivers in order to schedule a wlan driver reset when the ethernet/wed driver is resetting. Introduce the mtk_hw_reset_monitor work in order to detect possible DMA hangs.

* [v2: bpf-next: xdp: introduce xdp-feature support](http://lore.kernel.org/netdev/cover.1673710866.git.lorenzo@kernel.org/)

  Introduce the capability to export the XDP features supported by the NIC. Introduce an XDP compliance test tool (xdp_features) to check that the features exported by the NIC match the features actually supported by the driver. Allow XDP_REDIRECT of non-linear XDP frames into a devmap.

* [v4: net-next: Add support for two classes of VCAP rules](http://lore.kernel.org/netdev/20230114134242.3737446-1-steen.hegelund@microchip.com/)

  For this to work, the VCAP lookups must be enabled from boot, so that "internal" clients like PTP can add rules that are always active.

  When the TC tool adds a flower filter, the VCAP rule corresponding to this filter is disabled (kept in memory) until a TC matchall filter creates a link from chain 0 to the chain (lookup) where the flower filter was added.

* [v2: net: tcp: avoid the lookup process failing to get sk in ehash table](http://lore.kernel.org/netdev/20230114132705.78400-1-kerneljasonxing@gmail.com/)

  While one CPU is looking up the right socket in the ehash table, another CPU has finished deleting the request socket and is about to add (or is adding) the big socket to the table. This means we could miss both of them, even though the chance is small.

* [v2: net-next: unix: Improve locking scheme in unix_show_fdinfo()](http://lore.kernel.org/netdev/c6c7084c-56c7-cd37-befe-df718e080597@ya.ru/)

  After switching to the TCP_ESTABLISHED or TCP_LISTEN sk_state, live SOCK_STREAM and SOCK_SEQPACKET sockets can't change it anymore (since commit 3ff8bff704f4 "unix: Fix race in SOCK_SEQPACKET's unix_dgram_sendmsg()").

  Thus, we do not need to take the lock here.

* [v1: net-next: net: support ipv4 big tcp](http://lore.kernel.org/netdev/cover.1673666803.git.lucien.xin@gmail.com/)

  Unlike IPv6, the IPv4 tot_len field is only 16 bits long, and the IPv4 header has no exthdrs (options) to carry the BIG TCP packet length. To keep it simple, as David and Paolo suggested, we set IPv4 tot_len to 0 to indicate that this might be a BIG TCP packet and use skb->len as the real IPv4 total length.

* [v1: net-next: Small packet processing handling changes](http://lore.kernel.org/netdev/20230113223619.162405-1-parav@nvidia.com/)

  These two changes improve small packet handling.

  Patch summary:

  - patch-1 fixes the length check by taking the 60B Ethernet frame size into account
  - patch-2 avoids code duplication by reusing the existing buffer free helper

* [v10: net-next: virtio/vsock: replace virtio_vsock_pkt with sk_buff](http://lore.kernel.org/netdev/20230113222137.2490173-1-bobby.eshleman@bytedance.com/)

  This commit changes virtio/vsock to use sk_buff instead of virtio_vsock_pkt. Beyond better conforming to other net code, using sk_buff allows vsock to use sk_buff-dependent features in the future (such as sockmap) and improves throughput.

* [v2: net-next: Allow offloading of UDP NEW connections via act_ct](http://lore.kernel.org/netdev/20230113165548.2692720-1-vladbu@nvidia.com/)

  With all the necessary infrastructure in place, modify act_ct to offload UDP NEW as a unidirectional connection. Pass reply-direction traffic to CT and promote the connection to bidirectional when the UDP connection state changes to "assured". Rely on the refresh mechanism to propagate the connection state change to supporting drivers.

* [v3: net-next: net: dsa: mv88e6xxx: Enable PTP receive for mv88e6390](http://lore.kernel.org/netdev/20230113151258.196828-1-kurt@linutronix.de/)

  The switch receives management traffic such as STP and LLDP. However, PTP messages are not received, only transmitted.

  Ideally, the switch would trap all PTP messages to the management CPU. This particular switch has a PTP block which identifies PTP messages and traps them to a dedicated port. There is a register to program this destination, but it is not used at the moment.

* [v1: 5.10: mt76: move mt76_init_tx_queue in common code](http://lore.kernel.org/netdev/20230113150445.39286-1-n.zhandarovich@fintech.ru/)

  My apologies, I should have explained my reasoning better.

  My issue with the 5.10 version of mt7615_init_tx_queues() in drivers/net/wireless/mediatek/mt76/mt7615/dma.c is that the return value of the final call to mt7615_init_tx_queue() is not taken into account when returning the result of mt7615_init_tx_queues(). So, if the last mt7615_init_tx_queue() fails (due to memory issues, for instance), the parent function will still erroneously return 0.

* [v1: ARM: imx: make Ethernet refclock configurable](http://lore.kernel.org/netdev/20230113142718.3038265-1-o.rempel@pengutronix.de/)

  Most i.MX SoC variants have a configurable FEC/Ethernet reference clock used by the RMII specification. This functionality is located in the general purpose registers (GRPx) and until now was not implemented as part of the SoC clock tree.

* [v1: wireless/at76c50x-usb.c: Use devm_kmalloc replaces kmalloc](http://lore.kernel.org/netdev/20230113141231.71892-1-sensor1010@163.com/)

  Use devm_kmalloc() to replace kmalloc().

* [v2: net-next: net: use kmem_cache_free_bulk in kfree_skb_list](http://lore.kernel.org/netdev/167361788585.531803.686364041841425360.stgit@firesoul/)

  The kfree_skb_list function walks the SKB list (via skb->next) and frees the SKBs individually to the SLUB/SLAB allocator (kmem_cache). It is more efficient to bulk free them via the kmem_cache_free_bulk API.

  The netstack NAPI fast path already uses the kmem_cache bulk alloc and free APIs for SKBs.

* [v1: wireless/at76c50x-usb.c: Use devm_kzalloc replaces kmalloc](http://lore.kernel.org/netdev/20230113133503.58336-1-sensor1010@163.com/)

  Use devm_kzalloc() to replace kmalloc().

* [v3: bpf-next: bpf: Add ipip6 and ip6ip decap support for bpf_skb_adjust_room()](http://lore.kernel.org/netdev/cover.1673574419.git.william.xuanziyang@huawei.com/)

  Add ipip6 and ip6ip decap support for bpf_skb_adjust_room(). The main use case is using cls_bpf on the ingress hook to decapsulate IPv4-over-IPv6 and IPv6-over-IPv4 tunnel packets.

  Also add ipip6 and ip6ip decap test cases to verify that bpf_skb_adjust_room() correctly decapsulates ipip6 and ip6ip tunnel packets.

* [v4: net-next: virtio-net: support multi buffer xdp](http://lore.kernel.org/netdev/20230113080016.45505-1-hengqi@linux.alibaba.com/)

  Currently, virtio-net only supports XDP for single-buffer packets or linearized multi-buffer packets. This patchset adds XDP support for multi-buffer packets, so a larger MTU can be used if the XDP program sets xdp.frags. This does not affect single-buffer handling.

* [v1: net-next: r8169: reset bus if NIC isn't accessible after tx timeout](http://lore.kernel.org/netdev/85f2b5e5-ea85-3a84-1a5e-c4f84897ac04@gmail.com/)

  ASPM issues may result in the NIC no longer being accessible. In this case, disabling ASPM may not work. Therefore, detect this case by checking whether register reads return ~0, and try to make the NIC accessible again by resetting the secondary bus.

* [v2: net: octeontx2-pf: Avoid use of GFP_KERNEL in atomic context](http://lore.kernel.org/netdev/20230113061902.6061-1-gakula@marvell.com/)

  Using GFP_KERNEL in a preemption-disabled context triggers a warning when CONFIG_DEBUG_ATOMIC_SLEEP is enabled.

  To avoid using GFP_ATOMIC for memory allocation, disable preemption only after all memory allocation is done.

  Fixes: 4af1b64f80fb ("octeontx2-pf: Fix lmtst ID used in aura free")

* [v1: net: sched: gred: prevent races when adding offloads to stats](http://lore.kernel.org/netdev/20230113044137.1383067-1-kuba@kernel.org/)

  Naresh reports seeing a warning that gred is calling u64_stats_update_begin() with preemption enabled. Arnd points out it's coming from _bstats_update().

  We should be holding the qdisc lock when writing to stats, since they are also updated from the datapath.

* [v2: Add eqos and fec support for imx93](http://lore.kernel.org/netdev/20230113033347.264135-1-xiaoning.wang@nxp.com/)

  This patchset adds imx93 support to the dwmac-imx glue driver, with some changes to the GPR implementation. It also adds fec and eqos nodes to the imx93 dts.

* [v1: net-next: add some vf fault detect patch for hns](http://lore.kernel.org/netdev/20230113020829.48451-1-lanhao@huawei.com/)

  The hns3 driver currently supports the VF fault detection feature. Patch #1 adds hns3 VF fault detect capability bit support. Patch #2 adds VF fault processing to hns3 RAS.

* [v3: net: sch_htb: Avoid grafting on htb_destroy_class_offload when destroying htb](http://lore.kernel.org/netdev/20230113005528.302625-1-rrameshbabu@nvidia.com/)

  Peek at the old qdisc and graft only when deleting a leaf class in the htb, rather than when deleting the htb itself. Do not peek at the qdisc of the netdev queue when destroying the htb. The caller may already have grafted a new qdisc that is not part of the htb structure being destroyed.

#### 安全增强 (Security Hardening)

* [v3: firmware: coreboot: Check size of table entry and split memcpy](http://lore.kernel.org/linux-hardening/20230112230312.give.446-kees@kernel.org/)

  The memcpy() of the data following a coreboot_table_entry couldn't be evaluated by the compiler under CONFIG_FORTIFY_SOURCE. To make it easier to reason about, add an explicit flexible array member to struct coreboot_device so the entire entry can be copied at once. Additionally, validate the sizes before copying.

* [v3: kmod: harden user namespaces with new kernel.ns_modules_allowed sysctl](http://lore.kernel.org/linux-hardening/20230112131911.7684-1-vegard.nossum@oracle.com/)

  This mitigation obviously offers no protection if the vulnerable module is already loaded, but for many of these exploits the vast majority of users will never actually load or use these modules on purpose; in other words, for the vast majority of users, this would block exploits for the above list of vulnerabilities.

* [v1: pstore/ram: Rework logic for detecting ramoops](http://lore.kernel.org/linux-hardening/1673428065-22356-1-git-send-email-quic_mojha@quicinc.com/)

  The reserved memory region for ramoops is assumed to be at a fixed and known location when read from the devicetree. This is not desirable in environments where the region should preferably be allocated dynamically at runtime, as opposed to being fixed at compile time.

* [v1: next: x86/fpu: Replace zero-length array with flexible-array member](http://lore.kernel.org/linux-hardening/Y7zCFpa2XNs%2Fo9YQ@work/)

  Zero-length arrays are deprecated[1] and we are moving towards adopting C99 flexible-array members instead. So, replace the zero-length array declaration in struct xregs_state with a flex-array member.

* [v1: next: RDMA/erdma: Replace zero-length arrays with flexible-array members](http://lore.kernel.org/linux-hardening/Y7zCBqwC1LtabRJ9@work/)

  Zero-length arrays are deprecated[1] and we are moving towards adopting C99 flexible-array members instead. So, replace zero-length arrays in a couple of structures with flex-array members.

* [v1: next: nvmem: u-boot-env: replace zero-length array with flexible-array member](http://lore.kernel.org/linux-hardening/Y7zB+s2AC6O+CRR+@work/)

  Zero-length arrays are deprecated[1] and we are moving towards adopting C99 flexible-array members instead. So, replace the zero-length array declaration in struct u_boot_env_image_broadcom with a flex-array member.

* [v1: next: habanalabs: Replace zero-length arrays with flexible-array members](http://lore.kernel.org/linux-hardening/Y7zB4z5cxpFkPXKV@work/)

  Zero-length arrays are deprecated[1] and we are moving towards adopting C99 flexible-array members instead. So, replace zero-length arrays in a couple of structures with flex-array members.

* [v1: next: cifs: Replace zero-length arrays with flexible-array members](http://lore.kernel.org/linux-hardening/Y7zBtCZ%2FeRJCpjBf@work/)

  Zero-length arrays are deprecated[1] and we are moving towards adopting C99 flexible-array members instead. So, replace zero-length arrays in a couple of structures with flex-array members.

  This helps with the ongoing efforts to tighten the FORTIFY_SOURCE routines on memcpy() and helps us make progress towards globally enabling -fstrict-flex-arrays=3 [2].

* [v6: arm64: dts: qcom: sm6125: UFS and xiaomi-laurel-sprout support](http://lore.kernel.org/linux-hardening/20230108195336.388349-1-they@mint.lgbt/)

  Introduce Universal Flash Storage support on SM6125 and add support for the Xiaomi Mi A3, which is based on that platform. The name xiaomi-laurel-sprout is used instead of the official codename (laurel_sprout) due to naming limitations in the kernel.

#### 异步 IO (Async I/O)

* [v1: liburing: liburing.map: Export `io_uring_{enable_rings,register_restrictions}`](http://lore.kernel.org/io-uring/20230114035405.429608-1-ammar.faizi@intel.com/)

  When adding these two functions, Stefano didn't add io_uring_enable_rings() and io_uring_register_restrictions() to liburing.map, which causes a linking problem. Add them to liburing.map.
- -* [v1: io_uring: Add NULL checks for current->io_uring](http://lore.kernel.org/io-uring/20230111101907.600820-1-baijiaju1990@gmail.com/) - - As described in a previous commit 998b30c3948e, current->io_uring could - be NULL, and thus a NULL check is required for this variable. - -* [v1: io_uring/poll: add hash if ready poll request can't complete inline](http://lore.kernel.org/io-uring/559d2a90-25c5-626c-c643-25a86cf15e6a@kernel.dk/) - - If we don't, then we may lose access to it completely, leading to a - request leak. This will eventually stall the ring exit process as well. - - Fixes: 49f1c68e048f ("io_uring: optimise submission side poll_refs") - -#### Rust For Linux - -* [v3: scripts: Exclude Rust CUs with pahole](http://lore.kernel.org/rust-for-linux/20230111152050.559334-1-yakoyoku@gmail.com/) - - Version 1.24 of pahole has the capability to exclude compilation units - (CUs) of specific languages [1]: 2: . Rust, as of writing, is not - currently supported by pahole and if it's used with a build that has - BTF debugging enabled it results in malformed kernel and module - binaries [3]. So it's better for pahole to exclude Rust CUs until - support for it arrives. - -* [v1: rust: print: avoid evaluating arguments in `pr_*` macros in `unsafe` blocks](http://lore.kernel.org/rust-for-linux/20230109204912.539790-1-ojeda@kernel.org/) - - At the moment it is possible to perform unsafe operations in - the arguments of `pr_*` macros since they are evaluated inside - an `unsafe` block: - - let x = &10u32 as *const u32; - pr_info!("{}", *x); - - In other words, this is a soundness issue. - - Fix it so that it requires an explicit `unsafe` block. - -* [Fwd: v1: bpf: scripts: Exclude Rust CUs with pahole](http://lore.kernel.org/rust-for-linux/0ca4ad02-af27-0d1f-8750-1ff6b34e8d2a@gmail.com/) - - I see, I was making a dependency on `auto.conf` in `pahole-flags.sh` but - the former gets generated after the latter is called, so that's the - reason behind the `grep` errors. 
Sent a new version of the patch. - -* [v2: kbuild: rust: move rust/target.json to scripts/](http://lore.kernel.org/rust-for-linux/20230107094545.3384745-1-masahiroy@kernel.org/) - - scripts/ is a better place to generate files used treewide. - - With target.json moved to scripts/, you do not need to add target.json - to no-clean-files or MRPROPER_FILES. - - 'make clean' does not visit scripts/, but 'make mrproper' does. - -#### BPF - -* [v1: bpf: Add CONFIG_BPF_HELPER_STRICT](http://lore.kernel.org/bpf/SJ0PR04MB7248C599DE6F006F94997CF180C39@SJ0PR04MB7248.namprd04.prod.outlook.com/) - - In container environment, ebpf helpers could be used maliciously to - leak information, DOS, even escape from containers. - CONFIG_BPF_HELPER_STRICT is as a mitigation of it. - Related Link: https://rolandorange.zone/report.html - -* [v1: bpftool: Always disable stack protection for clang](http://lore.kernel.org/bpf/74cd9d2e-6052-312a-241e-2b514a75c92c@applied-asynchrony.com/) - - When the clang toolchain has stack protection enabled in order to be consistent - with gcc - which just happens to be the case on Gentoo - the bpftool build fails - -* [v1: tools/resolve_btfids: Install subcmd headers](http://lore.kernel.org/bpf/20230112004024.1934601-1-irogers@google.com/) - - Previously tools/lib/subcmd was added to the include path, switch to - installing the headers and then including from that directory. This - avoids dependencies on headers internal to tools/lib/subcmd. Add the - missing subcmd directory to the affected #include. - -* [v7: bpf-next: xdp: hints via kfuncs](http://lore.kernel.org/bpf/20230112003230.3779451-1-sdf@google.com/) - - Please see the first patch in the series for the overall - design and use-cases. 
- - See the following email from Toke for the per-packet metadata overhead: - https://lore.kernel.org/bpf/20221206024554.3826186-1-sdf@google.com/T/#m49d48ea08d525ec88360c7d14c4d34fb0e45e798 - -* [v1: Add and use run_command_strbuf](http://lore.kernel.org/bpf/20230110222003.1591436-1-irogers@google.com/) - - It is commonly useful to run a command using "/bin/sh -c" (like popen) - and to place the output in a string. Move strbuf to libapi, add a new - run_command that places output in a strbuf, then use it in help and - llvm in perf. Some small strbuf efficiency improvements are - included. Whilst adding a new function should increase lines-of-code, - by sharing two similar usages in perf llvm and perf help, the overall - lines-of-code is moderately reduced. - -* [v1: bpf-next: bpftool: Add missing quotes to libbpf bootstrap submake vars](http://lore.kernel.org/bpf/20230110014504.3120711-1-james.hilliard1@gmail.com/) - - When passing compiler variables like CC=$(HOSTCC) to a submake - we must ensure the variable is quoted in order to handle cases - where $(HOSTCC) may be multiple binaries. - - For example, when using ccache, $HOSTCC may be: - "/usr/bin/ccache /usr/bin/gcc" - - If we pass CC without quotes like CC=$(HOSTCC) only the first - "/usr/bin/ccache" part will be assigned to the CC variable which - will cause an error due to dropping the "/usr/bin/gcc" part of - the variable in the submake invocation. - -* [v1: Assume libbpf 1.0 in build](http://lore.kernel.org/bpf/20230109203424.1157561-1-irogers@google.com/) - - Rather than build a binary that would fail at runtime it is - preferable just to build libbpf statically and link against - that. The static version is in the kernel tools tree and newer than 1.0. - - These patches change the libbpf test to only pass when at least - version 1.0 is installed, then remove the conditional build and feature logic.
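The quoting problem in the bpftool submake item comes down to shell word splitting, because make passes each recipe line to `/bin/sh`. Below is a minimal sketch, assuming POSIX splitting is the only step that matters. Python's `shlex.split` applies those rules, so it shows what the submake receives with and without quotes. The `make -C libbpf` command line and the `submake_args` helper are hypothetical; only the `HOSTCC` value comes from the item.

```python
import shlex

# HOSTCC value quoted in the patch description: a ccache wrapper plus the real compiler.
HOSTCC = "/usr/bin/ccache /usr/bin/gcc"


def submake_args(hostcc: str, quoted: bool) -> list[str]:
    """Split a hypothetical submake recipe line the way a POSIX shell would."""
    value = shlex.quote(hostcc) if quoted else hostcc
    return shlex.split(f"make -C libbpf CC={value}")


unquoted = submake_args(HOSTCC, quoted=False)
quoted = submake_args(HOSTCC, quoted=True)

# Unquoted: CC only receives "/usr/bin/ccache"; "/usr/bin/gcc" is left over
# as a separate argument to make.
print(unquoted)
# Quoted: the whole wrapper command survives as the value of CC.
print(quoted)
```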
- -* [v1: bpf-next: bpf: Do not allow to load sleepable BPF_TRACE_RAW_TP program](http://lore.kernel.org/bpf/20230109143716.2332415-1-jolsa@kernel.org/) - - Currently we allow loading any tracing program as sleepable, but BPF_TRACE_RAW_TP can't sleep. Make the check explicit for tracing program attach types, so a sleepable BPF_TRACE_RAW_TP program will fail to load. - - Update the verifier error to mention iter programs as well. - -* [v1: libbpf: resolve kernel function name optimization for kprobe](http://lore.kernel.org/bpf/20230109094247.1464856-1-imagedong@tencent.com/) - - The function name in the kernel may be changed by the compiler. For example, - the function 'ip_rcv_core' can be compiled to 'ip_rcv_core.isra.0'. - - This kind of optimization can happen in any kernel function. Therefore, we should consider this case. - -* [Fwd: v1: bpf: scripts: Exclude Rust CUs with pahole](http://lore.kernel.org/bpf/0ca4ad02-af27-0d1f-8750-1ff6b34e8d2a@gmail.com/) - - I see, I was making a dependency on `auto.conf` in `pahole-flags.sh` but - the former gets generated after the latter is called, so that's the - reason behind the `grep` errors. Sent a new version of the patch. - -### Related Technology Updates - -#### Qemu - -* [v7: hw/riscv: clear kernel_entry high bits with 32bit CPUs](http://lore.kernel.org/qemu-devel/20230113171805.470252-1-dbarboza@ventanamicro.com/) - - In this version I followed Bin Meng's suggestion and reverted patch 1 - back from what it was in the v5, acks included, and added a new patch - (3) to fix the problem detected with the Xvisor use case. I believe this - reflects that there is nothing particularly wrong with what we - did in the v5 patch and we're going the extra mile to fix what, at first - glance, is a bug somewhere else. - -* [v5: riscv: Allow user to set the satp mode](http://lore.kernel.org/qemu-devel/20230113103453.42776-1-alexghiti@rivosinc.com/) - - This introduces new properties to allow the user to set the satp mode, - see patch 1 for full syntax.
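The first Qemu item above clears the high bits of `kernel_entry` on 32-bit CPUs. Below is a minimal sketch, assuming the stray high bits come from sign-extending a 32-bit entry address into a 64-bit variable. Under that assumption, an address with bit 31 set gets its upper half filled with 1s, and masking with `0xffffffff` recovers the address a 32-bit CPU can actually use. The entry address and both helpers are hypothetical and are not QEMU code.

```python
MASK32 = 0xFFFF_FFFF


def sign_extend_32(addr: int) -> int:
    """Sign-extend a 32-bit value into a 64-bit unsigned representation."""
    addr &= MASK32
    # If bit 31 is set, fill the upper 32 bits with 1s.
    return addr | 0xFFFF_FFFF_0000_0000 if addr & 0x8000_0000 else addr


def clear_high_bits(entry: int) -> int:
    """Keep only the low 32 bits, i.e. the address a 32-bit CPU would jump to."""
    return entry & MASK32


entry = sign_extend_32(0x8020_0000)  # hypothetical RV32 kernel entry address
print(hex(entry))                    # 0xffffffff80200000
print(hex(clear_high_bits(entry)))   # 0x80200000
```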
- -* [v6: hw/riscv: consolidate kernel init in riscv_load_kernel()](http://lore.kernel.org/qemu-devel/20230112223444.484879-1-dbarboza@ventanamicro.com/) - - The first 9 patches are already available in riscv-to-apply.next. - - The only change made was in patch 10 where we're now handling the case - where load_elf_ram_sym is padding the resulting kernel_entry with 1s for - 32 bits. Patch 11 is unchanged. - -* [v1: target/riscv: Use TARGET_FMT_lx for env->mhartid](http://lore.kernel.org/qemu-devel/20230109152655.340114-1-bmeng@tinylab.org/) - - env->mhartid is currently cast to long before being printed, which drops - the high 32 bits for rv64 on a 32-bit host. Use TARGET_FMT_lx instead. - -## 20230109: Issue 28 - -### Kernel Updates - -#### RISC-V Architecture Support - -* [v6: -next: riscv: Optimize function trace](http://lore.kernel.org/linux-riscv/20230107133549.4192639-1-guoren@kernel.org/) - - The previous ftrace detour implementation fc76b8b8011 ("riscv: Using - PATCHABLE_FUNCTION_ENTRY instead of MCOUNT") contains three problems. - - Patches 1, 2, 3 fix the above problems. Patches 4, 5, 6, 7 are features based on the reduced detour code - patch; we include them in the series for testing and maintenance. - -* [v13: -next: riscv: Add GENERIC_ENTRY support](http://lore.kernel.org/linux-riscv/20230107113838.3969149-1-guoren@kernel.org/) - - The patches convert riscv to use the generic entry infrastructure from - kernel/entry/*. Some optimization for entry.S with new .macro and merge - ret_from_kernel_thread into ret_from_fork. - -* [v6: RISC-V non-coherent function pointer based cache management operations + non-coherent DMA support for AX45MP](http://lore.kernel.org/linux-riscv/20230106185526.260163-1-prabhakar.mahadev-lad.rj@bp.renesas.com/) - - RISC-V non-coherent function pointer based cache management operations: - - This v6 version of the patch series adds support to use function pointers for CMO - and switches the current CMO implementations for zicbom and T-HEAD to use function - pointers.
- - non-coherent DMA support for AX45MP: - - On the Andes AX45MP core, cache coherency is a specification option so it - may not be supported. In this case DMA will fail. To work around this - issue, this patch series does the following: - -* [v1: MAINTAINERS: add an IRC entry for RISC-V](http://lore.kernel.org/linux-riscv/20230106125344.1685266-1-conor@kernel.org/) - - I remember being told "Just ping me on IRC" about patches, but googling - at the time was not helpful. #riscv on libera is not linux specific, - but a bunch of contributors etc do hang out there. - Add a link to the maintainers entry to help others find it in the future! - -* [v1: riscv: Introduce system suspend support](http://lore.kernel.org/linux-riscv/20230106113216.443057-1-ajones@ventanamicro.com/) - - Booting with an OpenSBI including the RFC series[1] implementing the - draft proposal for SBI system suspend[2], we can add system suspend support to - Linux. This support implements "suspend-to-RAM", which means when a - kernel is built with CONFIG_SUSPEND, 'echo mem > /sys/power/state' will - initiate a suspension. - - This has only been tested on QEMU using the OpenSBI system suspend - test. The test just waits 5 seconds and then resumes. To truly use - system suspend a platform must have a low-level firmware implementation - and provide at least one wake-up event, such as from a wakeup-capable - RTC alarm, to resume. - -* [v1: RISC-V Hibernation Support](http://lore.kernel.org/linux-riscv/20230106060535.104321-1-jeeheng.sia@starfivetech.com/) - - This series adds RISC-V Hibernation/suspend to disk support. - Low level Arch functions were created to support hibernation. - swsusp_arch_suspend() relies on code from __cpu_suspend_enter() to write - cpu state onto the stack, then calls swsusp_save() to save the memory - image.
- -* [v3: Add Ethernet driver for StarFive JH7110 SoC](http://lore.kernel.org/linux-riscv/20230106030001.1952-1-yanhong.wang@starfivetech.com/) - - This series adds ethernet support for the StarFive JH7110 RISC-V SoC. The series - includes the MAC driver. The MAC version is dwmac-5.20 (from Synopsys DesignWare). - For more information and support, you can visit RVspace wiki[1]. - - -* [v1: Support using physical addresses for RISC-V CMO](http://lore.kernel.org/linux-riscv/20230104074146.578485-1-uwu@icenowy.me/) - - Although the official Zicbom extension only supports virtual addresses, - some vendor-specific extensions, e.g. Xtheadcmo, support using the - physical address directly. - - This patchset tries to provide a CMO alternative macro variant that is - fed with both VA and PA (and the used one can be picked at runtime), - implement it with PA on T-Head cores, and utilize this variant for some - situations where PA is easily accessible. - -* [v4: arch: rename all internal names __xchg to __arch_xchg](http://lore.kernel.org/linux-riscv/20230105095426.2163354-1-andrzej.hajda@intel.com/) - - __xchg will be used for the non-atomic xchg macro. - -* [v2: bpf-next: bpf, x86: Simplify the parsing logic of structure parameters](http://lore.kernel.org/linux-riscv/20230105035026.3091988-1-pulehui@huaweicloud.com/) - - Extra_nregs of structure parameters and nr_args can be - added directly at the beginning, using a flip flag - to identify structure parameters. Meanwhile, rename - some variables so that they make more sense. - -* [v2: riscv: Move call to init_cpu_topology() to later initialization stage](http://lore.kernel.org/linux-riscv/20230105033705.3946130-1-leyfoon.tan@starfivetech.com/) - - If "capacity-dmips-mhz" is present in a CPU DT node, - topology_parse_cpu_capacity() will fail to allocate memory. - ARM64, with which this code path is shared, does not call - topology_parse_cpu_capacity() until later in boot where memory allocation - is available.
- - Move init_cpu_topology(), which calls topology_parse_cpu_capacity(), to a - later initialization stage, to match ARM64. - -* [v4: arch_topology: Build cacheinfo from primary CPU](http://lore.kernel.org/linux-riscv/20230104183033.755668-1-pierre.gondois@arm.com/) - -* [v1: dt-bindings: Add a cpu-capacity property for RISC-V](http://lore.kernel.org/linux-riscv/20230104180513.1379453-1-conor@kernel.org/) - - Ever since RISC-V started using generic arch topology code, the code - paths for cpu-capacity have been there but there's no binding defined to - actually convey the information. Defining the same property as used on - arm seems to be the only logical thing to do, so do it. - -* [v1: Upstream kvx Linux port](http://lore.kernel.org/linux-riscv/20230103164359.24347-1-ysionneau@kalray.eu/) - - This patch series adds support for the kv3-1 CPU architecture of the kvx family - found in the Coolidge (aka MPPA3-80) SoC of Kalray. - - This is an RFC, since kvx support is not yet upstreamed into gcc/binutils; - therefore this patch series cannot be merged into Linux for now. - -* [v2: Linux RISC-V AIA Support](http://lore.kernel.org/linux-riscv/20230103141409.772298-1-apatel@ventanamicro.com/) - - The RISC-V AIA specification is now frozen as per the RISC-V International - process. The latest frozen specification can be found at: - https://github.com/riscv/riscv-aia/releases/download/1.0-RC1/riscv-interrupts-1.0-RC1.pdf - - This series adds required Linux irqchip drivers for AIA and it depends on - the recent "RISC-V IPI Improvements". - (Refer to https://lore.kernel.org/lkml/20221101143400.690000-1-apatel@ventanamicro.com/t/) - - To test this series, use QEMU v7.2 (or higher) and OpenSBI v1.2 (or higher).
- -* [v16: RISC-V IPI Improvements](http://lore.kernel.org/linux-riscv/20230103141221.772261-1-apatel@ventanamicro.com/) - - This series aims to improve IPI support in Linux RISC-V in the following ways: - 1) Treat IPIs as normal per-CPU interrupts instead of having custom RISC-V - specific hooks. This also makes Linux RISC-V IPI support aligned with - other architectures. - 2) Remote TLB flushes and icache flushes should prefer local IPIs instead - of SBI calls whenever we have specialized hardware (such as RISC-V AIA - IMSIC and RISC-V SWI) which allows S-mode software to directly inject - IPIs without any assistance from M-mode runtime firmware. - -* [v6: Improve CLOCK_EVT_FEAT_C3STOP feature setting](http://lore.kernel.org/linux-riscv/20230103141102.772228-1-apatel@ventanamicro.com/) - - This series improves the RISC-V timer driver to set the CLOCK_EVT_FEAT_C3STOP - feature based on RISC-V platform capabilities. - -* [v2: bpf-next: Support bpf trampoline for RV64](http://lore.kernel.org/linux-riscv/20230103090756.1993820-1-pulehui@huaweicloud.com/) - - BPF trampoline is the critical infrastructure of the bpf - subsystem, acting as a mediator between kernel functions - and BPF programs. Numerous important features, such as - using eBPF programs for zero overhead kernel introspection, - rely on this key component. We can't wait to support bpf - trampoline on RV64. The implementation of bpf trampoline - was kept close to the x86 and arm64 ones for future development.
- - -* [Patch "riscv: add support for TIF_NOTIFY_SIGNAL" has been added to the 5.10-stable tree](http://lore.kernel.org/linux-riscv/1672731390244212@kroah.com/) - - This is a note to let you know that I've just added the patch titled - - riscv: add support for TIF_NOTIFY_SIGNAL - - to the 5.10-stable tree which can be found at: - http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary - -* [v1: dt-bindings: Introduce dual-link panels & panel-vendors](http://lore.kernel.org/linux-riscv/20230103064615.5311-1-a-bhatia1@ti.com/) - - Microtips Technology Solutions USA, and Lincoln Technology Solutions are - 2 display panel vendors, and the first 2 patches add their vendor - prefixes. - -* [v12: -next: riscv: Add GENERIC_ENTRY support](http://lore.kernel.org/linux-riscv/20230103033531.2011112-1-guoren@kernel.org/) - - The patches convert riscv to use the generic entry infrastructure from - kernel/entry/*. Some optimization for entry.S with new .macro and merge - ret_from_kernel_thread into ret_from_fork. - -* [v1: Temperature sensor support for StarFive JH7110 RISC-V SoC](http://lore.kernel.org/linux-riscv/20230103013145.9570-1-hal.feng@starfivetech.com/) - - This patch series adds temperature sensor support for StarFive JH7110 SoC. - The last two patches depend on series - -* [v2: riscv: dts: renesas: rzfive-smarc-som: Enable OSTM nodes](http://lore.kernel.org/linux-riscv/20230102222233.274021-1-prabhakar.mahadev-lad.rj@bp.renesas.com/) - - Enable OSTM{1,2} nodes on RZ/Five SMARC SoM. - - Note, OSTM{1,2} nodes are enabled in the RZ/G2UL SMARC SoM DTSI [0] hence - deleting the disabled nodes from RZ/Five SMARC SoM DTSI enables it here - too as we include in RZ/Five SMARC SoM DTSI. - -* [v3: dt-bindings: riscv: add SBI PMU event mappings](http://lore.kernel.org/linux-riscv/20230102165551.1564960-1-conor@kernel.org/) - - The SBI PMU extension requires a firmware to be aware of the event to - counter/mhpmevent mappings supported by the hardware. 
OpenSBI may use - DeviceTree to describe the PMU mappings. This binding is currently - described in markdown in OpenSBI (since v1.0 in Dec 2021) & used by QEMU - since v7.2.0. - - Import the binding for use while validating dtb dumps from QEMU and - upcoming hardware (e.g. JH7110 SoC) that will make use of the event - mapping. - -* [v1: riscv, kprobes: Stricter c.jr/c.jalr decoding](http://lore.kernel.org/linux-riscv/20230102160748.1307289-1-bjorn@kernel.org/) - - In the compressed instruction extension, c.jr, c.jalr, c.mv, and c.add - are encoded the following way (each instruction is 16 bits): - - 100 0 rs1[4:0]!=0 00000 10: c.jr - 100 1 rs1[4:0]!=0 00000 10: c.jalr - 100 0 rd[4:0]!=0 rs2[4:0]!=0 10: c.mv - 100 1 rd[4:0]!=0 rs2[4:0]!=0 10: c.add - - The following logic is used to decode c.jr and c.jalr: - - insn & 0xf007 == 0x8002 => instruction is a c.jr - insn & 0xf007 == 0x9002 => instruction is a c.jalr - - When 0xf007 is used to mask the instruction, c.mv can be incorrectly - decoded as c.jr, and c.add as c.jalr. - - Correct the decoding by changing the mask from 0xf007 to 0xf07f. - -* [v1: RISC-V: define RUNTIME_DISCARD_EXIT](http://lore.kernel.org/linux-riscv/20230102124936.1363533-1-conor@kernel.org/) - - Masahiro noted: - > arch/riscv/kernel/vmlinux.lds.S clearly says: - > /* we have to discard exit text and such at runtime, not link time */ - > [...] - > so riscv should define RUNTIME_DISCARD_EXIT like x86, arm64. - - As things stand, no ill comes of this - but if "DISCARDS" were to be - re-ordered in the linker script, linking would fail. - Do as suggested by Masahiro and define RUNTIME_DISCARD_EXIT.
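The c.jr/c.jalr decoding fix above can be checked by hand-assembling a few instructions from its encoding table. The old mask `0xf007` examines only the lowest bit of the `rs2` field (bit 2). As a result, any `c.mv`/`c.add` whose `rs2` register number is even looks like `c.jr`/`c.jalr`.

The sketch uses these samples:

- `0x8082` is `c.jr ra`
- `0x9502` is `c.jalr a0`
- `0x8532` is `c.mv a0, a2`
- `0x9532` is `c.add a0, a2`

These were assembled by hand from the table and are worth re-checking with an assembler. The `rs1 != 0` guard is an addition of this sketch, taken from the table's `rs1[4:0]!=0` column; it is not a copy of the kernel's decoder. In C, the table's `insn & 0xf007 == 0x8002` needs explicit parentheses because `==` binds tighter than `&`. Python parses it as intended, but the code parenthesizes anyway.

```python
OLD_MASK = 0xF007  # funct4 (bits 15:12) + bits 2:0: only rs2[0] of the rs2 field is checked
NEW_MASK = 0xF07F  # funct4 + the whole rs2 field (bits 6:2) + opcode (bits 1:0)


def rs1(insn: int) -> int:
    """Extract the rs1/rd field, bits 11:7."""
    return (insn >> 7) & 0x1F


def decode(insn: int, mask: int) -> str:
    """Classify a 16-bit instruction as 'c.jr', 'c.jalr' or 'other'."""
    if rs1(insn) == 0:
        return "other"  # the table requires rs1 != 0 for both c.jr and c.jalr
    if (insn & mask) == 0x8002:
        return "c.jr"
    if (insn & mask) == 0x9002:
        return "c.jalr"
    return "other"


# Hand-assembled samples (see the lead-in above).
SAMPLES = {
    "c.jr ra": 0x8082,
    "c.jalr a0": 0x9502,
    "c.mv a0, a2": 0x8532,
    "c.add a0, a2": 0x9532,
}

for name, insn in SAMPLES.items():
    print(f"{name:13} {insn:#06x}  old={decode(insn, OLD_MASK):7} new={decode(insn, NEW_MASK)}")
```

With the old mask, the `c.mv` and `c.add` samples are reported as `c.jr` and `c.jalr`. With `0xf07f`, they fall through to `other`.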
- -#### Process Scheduling - -* [v1: sched/topology: Add __init for sched_init_domains](http://lore.kernel.org/lkml/20230105014943.9857-1-huangbing775@126.com/) - - sched_init_domains is only used during initialization. - -#### Memory Management - -* [v7: DEPT(Dependency Tracker)](http://lore.kernel.org/linux-mm/1673235231-30302-1-git-send-email-byungchul.park@lge.com/) - - Just for those who want to try the latest version of DEPT. - -* [v1: mm-unstable: selftests/mm: convert missing vm->mm changes](http://lore.kernel.org/linux-mm/20230107230643.252273-1-sj@kernel.org/) - - Commit 6b380799d251 ("selftests/vm: rename selftests/vm to - selftests/mm") in mm-unstable is missing some files that need to be - updated for the renaming. This commit adds the changes. - -* [v4: iov_iter: Add extraction helpers](http://lore.kernel.org/linux-mm/167305160937.1521586.133299343565358971.stgit@warthog.procyon.org.uk/) - - Here are patches to clean up some use of READ/WRITE and ITER_SOURCE/DEST, - patches to provide support for extracting pages from an iov_iter and a - patch to use the primary extraction function in the block layer bio code. Could - you take a look? - -* [v3: Pages not released from memblock to the buddy allocator](http://lore.kernel.org/linux-mm/01010185892dd125-7738e4af-55c6-43b6-9cd9-d52dfea959d9-000000@us-west-2.amazonses.com/) - -* [v1: mm-unstable: mm: introduce folio_is_pfmemalloc](http://lore.kernel.org/linux-mm/20230106215251.599222-1-sidhartha.kumar@oracle.com/) - - Add a folio equivalent for page_is_pfmemalloc. This removes two instances - of page_is_pfmemalloc(folio_page(folio, 0)) so the folio can be used - directly. - -* [v1: add folio_headpage() macro](http://lore.kernel.org/linux-mm/20230106174028.151384-1-sj@kernel.org/) - - The standard idiom for getting the head page of a given folio is - '&folio->page'. It is efficient and safe even if the folio is NULL, - because the offset of the page field in the folio is zero.
However, it makes - the code not that easy to understand at the first glance, especially the - NULL safety. Also, sometimes people forget the idiom and use - 'folio_page(folio, 0)' instead. To make it easier to read and remember, - add a new macro function called 'folio_headpage()' with the NULL case - explanation. Then, replace the 'folio_page(folio, 0)' calls with - 'folio_headpage(folio)'. - -* [v1: shmem: optimize shmem_huge_enabled() and shmem_is_huge() when !CONFIG_TRANSPARENT_HUGEPAGE](http://lore.kernel.org/linux-mm/20230105230417.966438-1-ydroneaud@opteya.com/) - - When CONFIG_TRANSPARENT_HUGEPAGE is not set, shmem_is_huge() is not needed - outside of shmem.c. - -* [v1: memremap: Replace 0-length array with flexible array](http://lore.kernel.org/linux-mm/20230105220151.never.343-kees@kernel.org/) - - Zero-length arrays are deprecated[1]. Replace struct ethtool_rxnfc's - "rule_locs" 0-length array with a flexible array. Detected with GCC 13, - using -fstrict-flex-arrays=3 - -* [v2: Split netmem from struct page](http://lore.kernel.org/linux-mm/20230105214631.3939268-1-willy@infradead.org/) - - The MM subsystem is trying to reduce struct page to a single pointer. - The first step towards that is splitting struct page by its individual - users, as has already been done with folio and slab. This patchset does - that for netmem which is used for page pools. - - There are some relatively significant reductions in kernel text size - from these changes. They don't appear to affect performance at all, - but it's nice to save a bit of memory. - -* [v1: Based on latest mm-unstable (85b44c25cd1e).](http://lore.kernel.org/linux-mm/20230105101844.1893104-1-jthoughton@google.com/) - - This series introduces the concept of HugeTLB high-granularity mapping - (HGM). This series teaches HugeTLB how to map HugeTLB pages at - high-granularity, similar to how THPs can be PTE-mapped. 
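The folio_headpage() item rests on one layout fact. Because `page` sits at offset zero inside the folio, `&folio->page` is the folio address plus zero. It therefore equals the folio pointer, and a NULL folio yields a NULL page pointer. Below is a minimal sketch of that arithmetic using Python's ctypes. `Page`, `Folio` and `headpage_addr` are hypothetical stand-ins, not the kernel's real `struct page`/`struct folio` definitions.

```python
import ctypes


class Page(ctypes.Structure):
    """Hypothetical stand-in for struct page (fields are illustrative only)."""
    _fields_ = [("flags", ctypes.c_ulong), ("private", ctypes.c_void_p)]


class Folio(ctypes.Structure):
    """Hypothetical stand-in for struct folio, with the page embedded first."""
    _fields_ = [("page", Page), ("extra", ctypes.c_ulong)]


def headpage_addr(folio_addr: int) -> int:
    """Model of '&folio->page': base address plus the offset of the page member."""
    return folio_addr + Folio.page.offset


folio = Folio()
print(Folio.page.offset)                                        # 0
print(ctypes.addressof(folio.page) == ctypes.addressof(folio))  # True
print(headpage_addr(0))                                         # 0, i.e. NULL stays NULL
```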
- -#### File Systems - -* [v6: Turn iomap_page_ops into iomap_folio_ops](http://lore.kernel.org/linux-fsdevel/20230108194034.1444764-1-agruenba@redhat.com/) - - Here's an updated version of this patch queue. Changes since v5 [*]: - - * A new iomap-internal __iomap_get_folio() helper was added. - - * The previous iomap-internal iomap_put_folio() helper was renamed to - __iomap_put_folio() to mirror __iomap_get_folio(). - - * The comment describing struct iomap_folio_ops was still referring to - pages instead of folios in two places. - - Is this good enough for iomap-for-next now, please? - -* [v3: pipe: use __pipe_{lock,unlock} instead of spinlock](http://lore.kernel.org/linux-fsdevel/20230107012324.30698-1-zhanghongchen@loongson.cn/) - - Using a spinlock in pipe_read/write costs too much time. IMO, - pipe->{head,tail} can be protected by __pipe_{lock,unlock}. - On the other hand, we can use __pipe_{lock,unlock} to protect - the pipe->{head,tail} in pipe_resize_ring and - post_one_notification. - -* [v1: erofs: support page cache sharing between EROFS images in fscache mode](http://lore.kernel.org/linux-fsdevel/20230106125330.55529-1-jefflexu@linux.alibaba.com/) - - Since v6.1, EROFS has supported chunk deduplication across different - images to minimize disk usage. - -* [v2: filelock: move file locking definitions to separate header file](http://lore.kernel.org/linux-fsdevel/20230105211937.1572384-1-jlayton@kernel.org/) - - The file locking definitions have lived in fs.h since the dawn of time, - but they are only used by a small subset of the source files that - include it. - - Move the file locking definitions to a new header file, and add the - appropriate #include directives to the source files that need them. By - doing this we trim down fs.h a bit and limit the amount of rebuilding - that has to be done when we make changes to the file locking APIs.
- -* [v1: filesystems: Simplify if conditional statements](http://lore.kernel.org/linux-fsdevel/20230105061831.3516-1-zeming@nfschina.com/) - - When the * p pointer is null, assign a value to res; otherwise, do not - execute the content in the conditional statement block. - -* [v5: Convert to filemap_get_folios_tag()](http://lore.kernel.org/linux-fsdevel/20230104211448.4804-1-vishal.moola@gmail.com/) - - This patch series replaces find_get_pages_range_tag() with - filemap_get_folios_tag(). This also allows the removal of multiple - calls to compound_head() throughout. - It also makes a good chunk of the straightforward conversions to folios, - and takes the opportunity to introduce a function that grabs a folio - from the pagecache. - - I've run xfstests on xfs, btrfs, ext4, f2fs, and nilfs2, but more testing may - be beneficial. The page-writeback and filemap changes implicitly work. Still - looking for review of cifs, gfs2, and ext4. - -* [v1: xfstests: add fuse support](http://lore.kernel.org/linux-fsdevel/20230104193932.984531-1-jakobunt@gmail.com/) - - This allows using any fuse filesystem that can be mounted with - - mount -t fuse.$FUSE_SUBTYP ... - -* [v1: fs: don't allocate blocks beyond EOF from __mpage_writepage](http://lore.kernel.org/linux-fsdevel/20230103104430.27749-1-jack@suse.cz/) - - When __mpage_writepage() is called for a page beyond EOF, it will go and - allocate all blocks underlying the page. This is not only unnecessary - but this way blocks can get leaked (e.g. if a page beyond EOF is marked - dirty but in the end write fails and i_size is not extended). - -* [v1: mm-unstable: mm/nommu: don't use VM_MAYSHARE for MAP_PRIVATE mappings](http://lore.kernel.org/linux-fsdevel/20230102160856.500584-1-david@redhat.com/) - - Trying to reduce the confusion around VM_SHARED and VM_MAYSHARE first - requires !CONFIG_MMU to stop using VM_MAYSHARE for MAP_PRIVATE mappings. - CONFIG_MMU only sets VM_MAYSHARE for MAP_SHARED mappings. 
- - This paves the way for further VM_MAYSHARE and VM_SHARED cleanups: for - example, renaming VM_MAYSHARE to VM_MAP_SHARED to make it clearer what - it actually means. - - Let's first get the weird case out of the way and not use VM_MAYSHARE in - MAP_PRIVATE mappings, using a new VM_MAYOVERLAY flag instead. - -* [v3: Add new open(2) flag - O_EMPTY_PATH](http://lore.kernel.org/linux-fsdevel/20230101153752.20165-1-ahamza@ixsystems.com/) - - This patch adds a new flag O_EMPTY_PATH that allows openat and open - system calls to open a file referenced by fd if the path is empty, - and it is very similar to the FreeBSD O_EMPTY_PATH flag. This can be - beneficial in some cases since it would avoid having to grant /proc - -* [v1: blk: optimization for classic polling](http://lore.kernel.org/linux-fsdevel/3578876466-3733-1-git-send-email-nj.shetty@samsung.com/) - - This removes the dependency on interrupts to wake up the task. Set the task - state to TASK_RUNNING if need_resched() returns true - while polling for IO completion. - Earlier, the polling task used to sleep, relying on an interrupt to wake it up. - This made some IO take very long when interrupt coalescing is enabled in - NVMe. - -#### Network Devices - -* [v3: net-next: add PLCA RS support and onsemi NCN26000](http://lore.kernel.org/netdev/cover.1673222807.git.piergiorgio.beruto@gmail.com/) - - This patchset adds support for getting/setting the Physical Layer - Collision Avoidance (PLCA) Reconciliation Sublayer (RS) configuration and - status on Ethernet PHYs that support it. - -* [v1: RFC: wifi: rtw88: Validate the eFuse structs](http://lore.kernel.org/netdev/20230108213114.547135-1-martin.blumenstingl@googlemail.com/) - - Add static assertions for the PCIe/USB offsets inside the eFuse structs - to ensure that the compiler doesn't add padding anywhere (relevant) - inside the structs.
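The rtw88 item relies on compile-time checks that each struct field sits at the offset the eFuse layout expects. In C such checks are typically written with `static_assert` and `offsetof`. As a rough, hypothetical analog, the sketch below uses Python's ctypes, which lays out structures with the platform's C alignment rules. It reports every field that did not land directly after its predecessor. `Padded`, `Tight` and `padding_gaps` are illustrative names, not the driver's eFuse structs.

```python
import ctypes


def padding_gaps(struct_type):
    """Return (field, expected_offset, actual_offset) for every place where the
    C layout rules inserted padding, including trailing padding ('<tail>')."""
    gaps = []
    expected = 0
    for name, _ctype in struct_type._fields_:
        field = getattr(struct_type, name)
        if field.offset != expected:
            gaps.append((name, expected, field.offset))
        expected = field.offset + field.size
    if ctypes.sizeof(struct_type) != expected:
        gaps.append(("<tail>", expected, ctypes.sizeof(struct_type)))
    return gaps


class Padded(ctypes.Structure):
    """Hypothetical layout: a 1-byte field followed by a 2-byte field."""
    _fields_ = [("flag", ctypes.c_uint8), ("word", ctypes.c_uint16)]


class Tight(ctypes.Structure):
    """Same data reordered plus an explicit reserved byte: no implicit padding."""
    _fields_ = [("word", ctypes.c_uint16), ("flag", ctypes.c_uint8), ("rsvd", ctypes.c_uint8)]


print(padding_gaps(Padded))  # [('word', 1, 2)]
print(padding_gaps(Tight))   # []
```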
- -* [v1: net: Revert "r8169: disable detection of chip version 36"](http://lore.kernel.org/netdev/42e9674c-d5d0-a65a-f578-e5c74f244739@gmail.com/) - - This chip version seems to be very rare, but it exists in consumer - devices; see the linked report. - - https://stackoverflow.com/questions/75049473/cant-setup-a-wired-network-in-archlinux-fresh-install - -* [v11: net-next: vmxnet3: Add XDP support.](http://lore.kernel.org/netdev/20230108181826.88882-1-u9012063@gmail.com/) - - The patch adds native-mode XDP support: XDP DROP, PASS, TX, and REDIRECT. - -* [v1: net: selftests/net: Isolate l2_tos_ttl_inherit.sh in its own netns.](http://lore.kernel.org/netdev/cover.1673191942.git.gnault@redhat.com/) - - l2_tos_ttl_inherit.sh uses a veth pair to run its tests, but only one - of the veth interfaces runs in a dedicated netns. The other one remains - in the initial namespace where the existing network configuration can - interfere with the setup used for the tests. - - Isolate both veth devices in their own netns and ensure everything gets - cleaned up when the script exits. - -* [v1: net: ena: initialize dim_sample](http://lore.kernel.org/netdev/20230108143843.2987732-1-trix@redhat.com/) - - clang static analysis reports this problem: - drivers/net/ethernet/amazon/ena/ena_netdev.c:1821:2: warning: Passed-by-value struct - argument contains uninitialized data (e.g., field: 'comp_ctr') [core.CallAndMessage] - net_dim(&ena_napi->dim, dim_sample); - ^ - - net_dim can call dim_calc_stats() which uses the comp_ctr element, - so it must be initialized. - -* [v1: net-next: Add devlink support to ena](http://lore.kernel.org/netdev/20230108103533.10104-1-darinzon@amazon.com/) - - This patchset adds devlink support to the ena driver. - -* [v4: net-next: mv88e6xxx: Add MAB offload support](http://lore.kernel.org/netdev/20230108094849.1789162-1-netdev@kapio-technology.com/) - - This patch-set adds MAB [1] offload support in mv88e6xxx.
- -* [v6: net-next: net: ngbe: Add ngbe mdio bus driver.](http://lore.kernel.org/netdev/20230108093903.27054-1-mengyuanlou@net-swift.com/) - - Add mdio bus register for ngbe. - The internal phy and external phy need to be handled separately. - Add phy changed event detection. - -* [v1: Add Auxiliary driver support](http://lore.kernel.org/netdev/20230108030208.26390-1-ajit.khaparde@broadcom.com/) - - Add auxiliary device driver for Broadcom devices. - The bnxt_en driver will register and initialize an aux device - if RDMA is enabled in the underlying device. - The bnxt_re driver will then probe and initialize the - RoCE interfaces with the infiniband stack. - - We got rid of the bnxt_en_ops which the bnxt_re driver used to - communicate with bnxt_en. - Similarly, we have tried to clean up most of the bnxt_ulp_ops. - In most of the cases we used the functions and entry points provided - by the auxiliary bus driver framework. - And now these are the minimal functions needed to support the functionality. - -* [v4: sock: add tracepoint for send recv length](http://lore.kernel.org/netdev/20230108025545.338-1-cuiyunhui@bytedance.com/) - - Add 2 tracepoints to monitor the tcp/udp traffic - per process and per cgroup. - - Regarding monitoring the tcp/udp traffic of each process, there are two - existing solutions: the first one is https://www.atoptool.nl/netatop.php. - The second is via kprobe/kretprobe. - - The netatop solution is implemented by registering the hook function at the - hook point provided by the netfilter framework. - - These hook functions may be in the soft interrupt context and cannot - directly obtain the pid. Some data structures are added to bind packets - and processes. For example, struct taskinfobucket, struct taskinfo ... - - Every time the process sends and receives packets it needs multiple - hashmaps, resulting in low performance, and it has the problem of inaccurate - tcp/udp traffic statistics (for example: multiple threads share sockets).
- - We can obtain the information with kretprobe, but as we know, kprobe gets - the result by trapping in an exception, which loses performance compared - to a tracepoint. - -* [v1: mt76: add wed reset callbacks](http://lore.kernel.org/netdev/cover.1673103214.git.lorenzo@kernel.org/) - - Introduce Wireless Ethernet Dispatcher reset callbacks in order to complete - the reset requested by the ethernet NIC. - - This patch is based on the following mtk_eth_soc series: - https://lore.kernel.org/netdev/cover.1673102767.git.lorenzo@kernel.org/T/#m830c78ce34a4383ae1dedc5349bed19a74dbf4af - -* [v3: net-next: net: ethernet: mtk_wed: introduce reset support](http://lore.kernel.org/netdev/cover.1673102767.git.lorenzo@kernel.org/) - - Introduce proper reset integration between ethernet and wlan drivers in order - to schedule a wlan driver reset when the ethernet/wed driver is resetting. - Introduce mtk_hw_reset_monitor work in order to detect possible DMA hangs. - -* [v1: net-next: net: ethernet: mtk_wed: get rid of queue lock for rx queue](http://lore.kernel.org/netdev/bff65ff7f9a269b8a066cae0095b798ad5b37065.1673102426.git.lorenzo@kernel.org/) - - mtk_wed_wo_queue_rx_clean and mtk_wed_wo_queue_refill routines can't run - concurrently, so get rid of the spinlock for rx queues. - -* [[net PATCH] octeontx2-pf: Use GFP_ATOMIC in atomic context](http://lore.kernel.org/netdev/20230107044139.25787-1-gakula@marvell.com/) - - Use GFP_ATOMIC flag instead of GFP_KERNEL while allocating memory - in atomic context. - -* [v9: net-next: virtio/vsock: replace virtio_vsock_pkt with sk_buff](http://lore.kernel.org/netdev/20230107002937.899605-1-bobby.eshleman@bytedance.com/) - - This commit changes virtio/vsock to use sk_buff instead of - virtio_vsock_pkt. Beyond better conforming to other net code, using - sk_buff allows vsock to use sk_buff-dependent features in the future - (such as sockmap) and improves throughput.
- -* [v1: net: lan966x: Allow to add rules in TCAM even if not enabled](http://lore.kernel.org/netdev/20230106201507.2206113-1-horatiu.vultur@microchip.com/) - - The blamed commit implemented the vcap_operations to allow adding an - entry in the TCAM. One of the callbacks is to validate the supported - keysets. If the TCAM lookup was not enabled, then this will return - failure so no entries could be added. - This doesn't make much sense, as you can enable the TCAM at a later - point. Therefore, change it to allow entries in the TCAM even if it is not - enabled. - -* [v1: net: ipv6: prevent only DAD and RS sending for IFF_NO_ADDRCONF](http://lore.kernel.org/netdev/ab8f8ce5b99b658483214f3a9887c0c32efcca80.1673023907.git.lucien.xin@gmail.com/) - - Currently IFF_NO_ADDRCONF is used to prevent all ipv6 addrconf for the - slave ports of team, bonding and failover devices and it means no ipv6 - packets can be sent out through these slave ports. However, for the team - device, "nsna_ping" link_watch requires ipv6 addrconf. Otherwise, the - link will be marked as failed. - -* [v1: Let iommufd charge IOPTE allocations to the memory cgroup](http://lore.kernel.org/netdev/0-v1-6e8b3997c46d+89e-iommu_map_gfp_jgg@nvidia.com/) - - iommufd follows the same design as KVM and uses memory cgroups to limit - the amount of kernel memory an iommufd file descriptor can pin down. The - various internal data structures already use GFP_KERNEL_ACCOUNT to charge - their own memory. - - However, one of the biggest consumers of kernel memory is the IOPTEs - stored under the iommu_domain and these allocations are not tracked. - -* [v3: net-next: net: wwan: t7xx: fw flashing & coredump support](http://lore.kernel.org/netdev/cover.1673016069.git.m.chetan.kumar@linux.intel.com/) - - This patch series brings in support for FM350 wwan device firmware - flashing & coredump collection using the devlink interface.
- -* [v1: r8152: allow firmwares with NCM support](http://lore.kernel.org/netdev/20230106160739.100708-1-bjorn@mork.no/) - - Some device and firmware combinations with NCM support will - end up using the cdc_ncm driver by default. This is - suboptimal for the same reasons we've previously accepted the - blacklist hack in cdc_ether. - - The recent support for subclassing the generic USB device - driver allows us to create a very slim driver with the same - functionality. This patch set uses that to implement a - device-specific configuration default which is independent - of any USB interface drivers. This means that it works - equally whether the device initially ends up in NCM or ECM - mode, without depending on any code in the respective class - drivers. - -* [v1: net: gro: take care of DODGY packets](http://lore.kernel.org/netdev/20230106142523.1234476-1-edumazet@google.com/) - - Jaroslav reported a recent throughput regression with virtio_net - caused by the blamed commit. - - It is unclear if DODGY GSO packets coming from user space - can be accepted by the GRO engine in the future with minimal - changes, and if there is any expected gain from it. - - In the meantime, make sure to detect and flush DODGY packets. - -* [v1: net: lan966x: check for ptp to be enabled in lan966x_ptp_deinit()](http://lore.kernel.org/netdev/20230106134830.333494-1-clement.leger@bootlin.com/) - - If ptp was not enabled, due to a missing IRQ for instance, - lan966x_ptp_deinit() will dereference NULL pointers. - -* [v2: net: ipa: correct IPA v4.7 IMEM offset](http://lore.kernel.org/netdev/20230106132502.3307220-1-elder@linaro.org/) - - Commit b310de784bacd ("net: ipa: add IPA v4.7 support") was merged - despite an unresolved comment made by Konrad Dybcio. Konrad - observed that the IMEM region specified for IPA v4.7 did not match - the one used downstream for the SM7225 SoC.
In "lagoon.dtsi", present - in a Sony Xperia source tree, an ipa_smmu_ap node was defined with a - "qcom,additional-mapping" property that defined the IPA IMEM area - starting at offset 0x146a8000 (not the committed 0x146a9000). - - The IPA v4.7 target system used for testing uses the SM7225 SoC, so - we'll adhere to what the downstream code specifies as the address of - the IMEM region used for IPA. - -* [v2: brcmfmac: Prefer DT board type over DMI board type](http://lore.kernel.org/netdev/20230106131905.81854-1-iivanov@suse.de/) - - The introduction of support for Apple board types inadvertently changed - the precedence order, causing hybrid SMBIOS+DT platforms to look up the - firmware using the DMI information instead of the device tree compatible - to generate the board type. Revert to the old behavior, - as affected platforms use firmwares named after the DT compatible. - -* [v1: wpan-next: ieee802154: Beaconing support](http://lore.kernel.org/netdev/20230106113129.694750-1-miquel.raynal@bootlin.com/) - - Now that scanning is supported, we can, e.g., play with hwsim to verify - that everything works as soon as this series, which includes beaconing support, gets - merged. - -* [v1: ath10k USB support (QCA9377)](http://lore.kernel.org/netdev/20230106105853.3484381-1-alexander.stein@ew.tq-group.com/) - - Apparently there have been several attempts at adding ath10k USB support, see - [1] & [2]. There are probably even more. - This series is a first step towards supporting my actual device, - a Silex SX-USBAC. This is a Bluetooth & WiFi combo device. - - I picked commit 131da4f5a5b9 ("HACK: ath10k: add start_once support") from - [2] and extracted the ath10k_hw_params_list entry from [3]. - Since v5.9, the base of [3], other required changes have already been - integrated. - So far I have tested a very simple STA mode usage profile, using - wpa_supplicant on a WPA interface.
AP mode is untested, and module unloading is not - supported, probably because of the firmware start/stop workaround that patch 1 - adds. - - Reading the other, older series, apparently a lot has been merged already, - but I do not know what is still missing for proper USB support. - I would like to have a discussion about how to add support so the device is - at least probing and can be used in a rudimentary way. - -* [v4: net-next: usbnet: optimize usbnet_bh() to reduce CPU load](http://lore.kernel.org/netdev/20230106104950.22741-1-lsahn@ooseel.net/) - - The current source pushes the skb into the dev->done queue by calling - skb_dequeue_tail() and then pops it with skb_dequeue() to branch to the - rx_cleanup state for freeing the urb/skb in usbnet_bh(). - -* [v1: net-next: Add IP_LOCAL_PORT_RANGE socket option](http://lore.kernel.org/netdev/20221221-sockopt-port-range-v1-0-e2b094b60ffd@cloudflare.com/) - - This patch set is a follow-up to the "How to share IPv4 addresses by - partitioning the port space" talk given at LPC 2022 [1]. - -#### Security Hardening - -* [v6: arm64: dts: qcom: sm6125: UFS and xiaomi-laurel-sprout support](http://lore.kernel.org/linux-hardening/20230108195336.388349-1-they@mint.lgbt/) - - Introduce Universal Flash Storage support on SM6125 and add support for the Xiaomi Mi A3, which is based on that platform. The name xiaomi-laurel-sprout is used instead of the official codename (laurel_sprout) due to naming limitations in the kernel. - -* [v1: kunit: memcpy: Split slow memcpy tests into MEMCPY_SLOW_KUNIT_TEST](http://lore.kernel.org/linux-hardening/20230107040203.never.112-kees@kernel.org/) - - Since the long memcpy tests may stall a system for tens of seconds - in virtualized architecture environments, split those tests off under - CONFIG_MEMCPY_SLOW_KUNIT_TEST so they can be separately disabled.
- -* [v2: firmware: coreboot: Check size of table entry and split memcpy](http://lore.kernel.org/linux-hardening/20230107031406.gonna.761-kees@kernel.org/) - - The memcpy() of the data following a coreboot_table_entry couldn't - be evaluated by the compiler under CONFIG_FORTIFY_SOURCE. To make it - easier to reason about, add an explicit flexible array member to struct - coreboot_device so the entire entry can be copied at once. Additionally, - validate the sizes before copying. This avoids the following run-time - false-positive warning: - - memcpy: detected field-spanning write (size 168) of single field "&device->entry" at drivers/firmware/google/coreboot_table.c:103 (size 8) - -* [v1: scsi: megaraid_sas: Add flexible array member for SGLs](http://lore.kernel.org/linux-hardening/20230106053153.never.999-kees@kernel.org/) - - struct MPI2_RAID_SCSI_IO_REQUEST ends with a single SGL, but the driver expects to - copy multiple. Add a flexible array member so the compiler can reason - about the size of the memcpy(). This avoids the following run-time - false-positive warning: - - memcpy: detected field-spanning write (size 128) of single field "&r1_cmd->io_request->SGL" at drivers/scsi/megaraid/megaraid_sas_fusion.c:3326 (size 16) - - This change results in no binary output differences. - -#### Asynchronous I/O - -* [v1: liburing: Always enable CONFIG_NOLIBC if supported and deprecate --nolibc option](http://lore.kernel.org/io-uring/20230106155202.558533-1-ammar.faizi@intel.com/) - - This is an RFC patchset. It's already build-tested. - - Currently, the default liburing build uses libc as a dependency. - liburing doesn't depend on libc when it's compiled on x86-64, x86 - (32-bit), and aarch64. There is no benefit to having libc.so linked to - liburing.so on those architectures. - - Always enable CONFIG_NOLIBC if the arch is supported. If the - architecture is not supported, fall back to libc.
- -* [v1: liburing: liburing micro-optimzation](http://lore.kernel.org/io-uring/20230106154259.556542-1-ammar.faizi@intel.com/) - - This series contains liburing micro-optimizations, split into two - patches. - -* [v1: io_uring: move 'poll_multi_queue' bool in io_ring_ctx](http://lore.kernel.org/io-uring/5c3b0571-ee3b-5bf1-50ce-a2009ee219d5@kernel.dk/) - - The cacheline section holding this variable has two gaps, one of which is - caused by this bool not packing well with structs. This causes it to - spill into the next cacheline. Move the variable, shrinking io_ring_ctx - by a full cacheline in size. - -* [v1: io_uring/io-wq: free worker if task_work creation is canceled](http://lore.kernel.org/io-uring/1d287a8e-3c3f-4a8d-f6cc-8199b53ae886@kernel.dk/) - - If we cancel the task_work, the worker will never come into existence. - As this is the last reference to it, ensure that it gets freed - appropriately. - -#### Rust For Linux - -* [Fwd: v1: bpf: scripts: Exclude Rust CUs with pahole](http://lore.kernel.org/rust-for-linux/0ca4ad02-af27-0d1f-8750-1ff6b34e8d2a@gmail.com/) - - I see, I was making a dependency on `auto.conf` in `pahole-flags.sh`, but - the former gets generated after the latter is called, so that's the - reason behind the `grep` errors. I sent a new version of the patch. - -* [v2: scripts: Exclude Rust CUs with pahole](http://lore.kernel.org/rust-for-linux/20230108021450.120791-1-yakoyoku@gmail.com/) - - Version 1.24 of pahole has the capability to exclude compilation units - (CUs) of specific languages. As of this writing, Rust is not - supported by pahole, and if it's used with a build that has BTF debugging - enabled, it results in malformed kernel and module binaries (see - Rust-for-Linux/linux#735). So it's better for pahole to exclude Rust - CUs until support for it arrives.
- -* [v2: kbuild: rust: move rust/target.json to scripts/](http://lore.kernel.org/rust-for-linux/20230107094545.3384745-1-masahiroy@kernel.org/) - - scripts/ is a better place to generate files used treewide. - - With target.json moved to scripts/, you do not need to add target.json - to no-clean-files or MRPROPER_FILES. - - 'make clean' does not visit scripts/, but 'make mrproper' does. - -* [v1: scripts: store Makefiles in dictionary](http://lore.kernel.org/rust-for-linux/20230103210219.5690-1-apantykhin@gmail.com/) - -#### BPF - -* [v3: bpf-next: bpf: btf: limit logging of ignored BTF mismatches](http://lore.kernel.org/bpf/20230107025331.3240536-1-connoro@google.com/) - - Enabling CONFIG_MODULE_ALLOW_BTF_MISMATCH is an indication that BTF - mismatches are expected and module loading should proceed - anyway. Logging with pr_warn() on every one of these "benign" - mismatches creates unnecessary noise when many such modules are - loaded. Instead, handle this case with a single log warning that BTF - info may be unavailable. - - Mismatches also result in calls to __btf_verifier_log() via - __btf_verifier_log_type() or btf_verifier_log_member(), adding several - additional lines of logging per mismatched module. Add checks to these - paths to skip logging for module BTF mismatches in the "allow - mismatch" case. - - All existing logging behavior is preserved in the default - CONFIG_MODULE_ALLOW_BTF_MISMATCH=n case. - -* [v1: bpf-next: Annotate kfuncs with new __bpf_kfunc macro](http://lore.kernel.org/bpf/20230106195130.1216841-1-void@manifault.com/) - - BPF kfuncs are kernel functions that can be invoked by BPF programs. - kfuncs can be kernel functions which are also called elsewhere in the - main kernel (such as crash_kexec()), or may be functions that are only - meant to be used by BPF programs, such as bpf_task_acquire(), and which - are not called from anywhere else in the kernel. 
- - While thus far we haven't observed any issues such as kfuncs being - elided by the compiler, at some point we could easily run into problems - such as the following: - - - static kernel functions that are also used as kfuncs could be inlined - and/or elided by the compiler. - - BPF-specific kfuncs with external linkage may at some point be elided - by the compiler in LTO builds, when it's determined that they aren't - called anywhere. - - To address this, this patch set introduces a new __bpf_kfunc macro which - should be added to all kfuncs, and which will protect kfuncs from such - problems. Note that some kfuncs already try to do this, more or less, by - specifying noinline or __used, but this is applied inconsistently. - __bpf_kfunc should provide a uniform and more future-proof way - to do this. - -* [v1: bpf: skip task with pid=1 in send_signal_common()](http://lore.kernel.org/bpf/20230106084838.12690-1-sunhao.th@gmail.com/) - - The following kernel panic can be triggered when a task with pid=1 - attaches a prog that attempts to send a kill signal to itself; also - see [1] for more details: - - Kernel panic - not syncing: Attempted to kill init!
exitcode=0x0000000b - CPU: 3 PID: 1 Comm: systemd Not tainted 6.1.0-09652-g59fe41b5255f #148 - Call Trace: - - __dump_stack lib/dump_stack.c:88 [inline] - dump_stack_lvl+0x100/0x178 lib/dump_stack.c:106 - panic+0x2c4/0x60f kernel/panic.c:275 - do_exit.cold+0x63/0xe4 kernel/exit.c:789 - do_group_exit+0xd4/0x2a0 kernel/exit.c:950 - get_signal+0x2460/0x2600 kernel/signal.c:2858 - arch_do_signal_or_restart+0x78/0x5d0 arch/x86/kernel/signal.c:306 - exit_to_user_mode_loop kernel/entry/common.c:168 [inline] - exit_to_user_mode_prepare+0x15f/0x250 kernel/entry/common.c:203 - __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline] - syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:296 - do_syscall_64+0x44/0xb0 arch/x86/entry/common.c:86 - entry_SYSCALL_64_after_hwframe+0x63/0xcd - - So skip the task with pid=1 in bpf_send_signal_common() to avoid the panic. - - [1] https://lore.kernel.org/bpf/20221222043507.33037-1-sunhao.th@gmail.com - -* [v1: bpf-next: bpf: Add ipip6 and ip6ip decap support for bpf_skb_adjust_room()](http://lore.kernel.org/bpf/cover.1672976410.git.william.xuanziyang@huawei.com/) - - Add ipip6 and ip6ip decap support for bpf_skb_adjust_room(). - The main use case is using cls_bpf on the ingress hook to decapsulate - IPv4 over IPv6 and IPv6 over IPv4 tunnel packets. - - Also add ipip6 and ip6ip decap test cases to verify that - bpf_skb_adjust_room() correctly decapsulates ipip6 and ip6ip - tunnel packets. - -* [RFC: perf lock contention: Add -o/--lock-owner option](http://lore.kernel.org/bpf/20230105203231.1598936-1-namhyung@kernel.org/) - - When there is heavy lock contention in the system, people sometimes - want to know who caused the contention, in other words, who owns the - locks. - - The -o/--lock-owner option tries to follow the lock owners for the - contended mutexes and rwsems from BPF, and then attributes the - contention time to the owner instead of the waiter.
It's a best-effort approach to get the owner info at the time of the contention - and doesn't guarantee precise tracking of owners if they - change over time. - - Currently it only handles mutexes and rwsems, which have an owner field in - their struct that basically points to the task_struct owning the - lock at the moment. - -* [v1: bpf-next: libbpf: poison strlcpy()](http://lore.kernel.org/bpf/tencent_5695A257C4D16B4413036BA1DAACDECB0B07@qq.com/) - - Since commit 9fc205b413b3 ("libbpf: Add sane strncpy alternative and use - it internally") introduced libbpf_strlcpy(), add strlcpy() to the poison - list to prevent accidental use of it. - -* [v6: bpf-next: xdp: hints via kfuncs](http://lore.kernel.org/bpf/20230104215949.529093-1-sdf@google.com/) - - Please see the first patch in the series for the overall - design and use cases. - -* [v1: bpf: skip invalid kfunc call in backtrack_insn](http://lore.kernel.org/bpf/20230104014709.9375-1-sunhao.th@gmail.com/) - - The verifier skips an invalid kfunc call in check_kfunc_call(), which - would be caught in fixup_kfunc_call() if such an insn is not - eliminated by dead code elimination. However, this can lead to the - following warning in backtrack_insn() - -* [v3: virtio-net: support multi buffer xdp](http://lore.kernel.org/bpf/20230103064012.108029-1-hengqi@linux.alibaba.com/) - - Currently, virtio net only supports xdp for single-buffer packets - or linearized multi-buffer packets. This patchset adds xdp support for - multi-buffer packets, so a larger MTU can be used if xdp sets - xdp.frags. This does not affect single-buffer handling. - - In order to build multi-buffer xdp neatly, we integrated the code - into virtnet_build_xdp_buff_mrg() for xdp. The first buffer is used - for the prepared xdp buff, and the rest of the buffers are added to - its skb_shared_info structure. This structure can also be - conveniently converted during XDP_PASS to get the corresponding skb.
- - Since virtio net uses comp pages, and bpf_xdp_frags_increase_tail() - is based on the assumption of the page pool, - (rxq->frag_size - skb_frag_size(frag) - skb_frag_off(frag)) - is negative in most cases. So we didn't set xdp_rxq->frag_size in - virtnet_open() to disable the tail increase. - -### Related Technology News - -#### Qemu - -* [v3: riscv-to-apply queue](http://lore.kernel.org/qemu-devel/20230106031357.777790-1-alistair.francis@opensource.wdc.com/) - - The following changes since commit d1852caab131ea898134fdcea8c14bc2ee75fbe9: - - Merge tag 'python-pull-request' of https://gitlab.com/jsnow/qemu into staging (2023-01-05 16:59:22 +0000) - - are available in the Git repository at: - - https://github.com/alistair23/qemu.git tags/pull-riscv-to-apply-20230106 - - for you to fetch changes up to bc92f261519d5c77c70cf2ebcf0a3b9a414d82d0: - - hw/intc: sifive_plic: Fix the pending register range check (2023-01-06 10:42:55 +1000) - - First RISC-V PR for QEMU 8.0 - - * Fix PMP propagation for tlb - * Collection of bug fixes - * Bump the OpenTitan supported version - * Add smstateen support - * Support native debug icount trigger - * Remove the redundant ipi-id property in the virt machine - * Support cache-related PMU events in virtual mode - * Add some missing PolarFire SoC io regions - * Fix mret exception cause when no pmp rule is configured - * Fix bug where disabling compressed instructions would crash QEMU - * Add Zawrs ISA extension support - * A range of code refactoring and cleanups - -#### U-Boot - -* [v4: riscv: ae350: support OpenSBI 1.0+ which enable FW_PIC](http://lore.kernel.org/u-boot/20230104023748.6109-1-rick@andestech.com/) - - The original OpenSBI (without FW_PIC) relocates itself - from 0x1000000 to 0x0. After OpenSBI added FW_PIC code, - it no longer relocates and always runs at 0x1000000. - Hence, it may overlap with the kernel memory region, so it is - necessary to change the OpenSBI address from 0x1000000 to 0x0.
- - For more details, refer to commit cb052d771200 - ("riscv: qemu: spl: Fix booting Linux kernel with OpenSBI 1.0+") - -* [v3: riscv: ae350: support openSBI 1.0+ which enable FW_PIC](http://lore.kernel.org/u-boot/20230104020743.30046-1-rick@andestech.com/) - - The original OpenSBI (without FW_PIC) relocates itself - from 0x1000000 to 0x0. After OpenSBI added FW_PIC code, - it no longer relocates and always runs at 0x1000000. - Hence, it may overlap with the kernel memory region, so it is - necessary to change the OpenSBI address from 0x1000000 to 0x0. - - For more details, refer to commit cb052d771200 - ("riscv: qemu: spl: Fix booting Linux kernel with OpenSBI 1.0+") - -* [v2: riscv: ax25: bypass malloc when spl fit boots from ram](http://lore.kernel.org/u-boot/20230103082012.15379-1-rick@andestech.com/) - - When a FIT image boots from RAM, the payload is - prepared at the address SPL_LOAD_FIT_ADDRESS. - The generic SPL FIT flow mallocs another - memory region and copies the whole FIT image to this - malloc'ed address, which is unnecessary when booting - from RAM. - - This patch improves this flow by declaring - board_spl_fit_buffer_addr() to replace the original one. - The larger the image (e.g. a 10-20MB kernel Image), the - more boot time it saves. - - It also enhances the memcpy function by checking the source and - destination addresses. If they are the same, - it just returns without copying any data. - -* [v2: riscv: ae350: Enable CCTL_SUEN](http://lore.kernel.org/u-boot/20230103081713.15220-1-rick@andestech.com/) - - CCTL operations are available to Supervisor/User-mode - software under the control of the mcache_ctl.CCTL_SUEN - control bit. Enable it to support Supervisor (and User) - CCTL operations.
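The ax25 item above mentions a memcpy enhancement that returns early when the source and destination are the same address. A minimal sketch of that idea in C follows. `demo_memcpy()` is a hypothetical name, and the byte loop is an assumption standing in for whatever optimized copy U-Boot's real memcpy uses.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch, not U-Boot's implementation: a memcpy() variant
 * that skips the copy when source and destination are the same buffer,
 * e.g. a FIT image that already sits at its load address.
 */
void *demo_memcpy(void *dest, const void *src, size_t n)
{
	unsigned char *d = dest;
	const unsigned char *s = src;

	assert(n == 0 || (dest != NULL && src != NULL));

	/* Same address: the data is already in place, nothing to copy. */
	if (d == s)
		return dest;

	/* Plain byte-by-byte copy; as with memcpy(), regions must not overlap. */
	while (n--)
		*d++ = *s++;

	return dest;
}
```

When the image is already at its load address, a self-copy would be pure overhead. The pointer comparison turns that case into a no-op.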
- - ## 20230101: Issue 27 - -### Kernel News - -#### RISC-V Architecture Support - -* [v7: RESEND: leds: Allwinner A100 LED controller support](http://lore.kernel.org/linux-riscv/20221231235541.13568-1-samuel@sholland.org/) - - This series adds bindings and a driver for the RGB LED controller found - in some Allwinner SoCs, starting with A100. The hardware in the R329 and - D1 SoCs appears to be identical. - -* [v4: riscv: Allwinner D1/D1s platform support](http://lore.kernel.org/linux-riscv/20221231233851.24923-1-samuel@sholland.org/) - - This series adds the Kconfig/defconfig plumbing and devicetrees for a - range of Allwinner D1 and D1s-based boards. Many features are already enabled, including USB, Ethernet, and WiFi. - -* [v2: clk: sunxi-ng: Allwinner R528/T113 clock support](http://lore.kernel.org/linux-riscv/20221231231429.18357-1-samuel@sholland.org/) - - R528 and T113 are SoCs based on the same design as D1/D1s, but with ARM - CPUs instead of RISC-V. They use the same CCU implementation, meaning - the CCU has gates/resets for all peripherals present on any SoC in this - family. I verified that the CAN bus bits are also present on D1/D1s. - -* [v1: Allwinner D1 video engine support](http://lore.kernel.org/linux-riscv/20221231164628.19688-1-samuel@sholland.org/) - - This series finishes adding Cedrus support for Allwinner D1. I had - tested the hardware and documented the compatible string a while back, - but at the time I had a dummy SRAM section in the devicetree. Further - testing shows that there is no switchable SRAM section: there is no - need for it, I was unable to guess the address, and the usual bits in - the SRAM controller register have no effect on the video engine. - -* [v3: arch: rename all internal names __xchg to __arch_xchg](http://lore.kernel.org/linux-riscv/20221230141552.128508-1-andrzej.hajda@intel.com/) - - __xchg will be used for the non-atomic xchg macro.
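The __xchg rename above frees that name for a non-atomic xchg macro. The sketch below shows what such a helper could look like, assuming GCC/Clang extensions (statement expressions and `__auto_type`). `demo_xchg()` and `take_token()` are hypothetical names, not the kernel's eventual definitions, and C11 `atomic_exchange()` is shown only for contrast.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Hypothetical non-atomic exchange (GNU C: statement expression and
 * __auto_type): store val into *ptr and evaluate to the old value.
 * Only safe when nothing else can access *ptr concurrently.
 */
#define demo_xchg(ptr, val) ({			\
	__auto_type xchg_ptr_ = (ptr);		\
	__auto_type xchg_old_ = *xchg_ptr_;	\
	*xchg_ptr_ = (val);			\
	xchg_old_;				\
})

/* For shared data, an atomic exchange is needed instead (C11). */
int take_token(atomic_int *slot)
{
	return atomic_exchange(slot, 0);
}
```

The non-atomic form is a plain load followed by a plain store. It is only appropriate for data owned by a single context; anything shared still needs an atomic exchange.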
- -* [v1: riscv: dts: renesas: rzfive-smarc-som: Enable OSTM nodes](http://lore.kernel.org/linux-riscv/20221229230300.104524-1-prabhakar.mahadev-lad.rj@bp.renesas.com/) - - Enable the OSTM{1,2} nodes on the RZ/Five SMARC SoM. - - Note that the OSTM{1,2} nodes are enabled in the RZ/G2UL SMARC SoM DTSI [0], hence - deleting the disabled nodes from the RZ/Five SMARC SoM DTSI enables them here - too, as we include [0] in the RZ/Five SMARC SoM DTSI. - -* [v2: clocksource/drivers/riscv: Get rid of clocksource_arch_init() callback](http://lore.kernel.org/linux-riscv/20221229224601.103851-1-prabhakar.mahadev-lad.rj@bp.renesas.com/) - - The clocksource_arch_init() callback always sets vdso_clock_mode to - VDSO_CLOCKMODE_ARCHTIMER if GENERIC_GETTIMEOFDAY is enabled; this is - required for the riscv-timer. - - This works for platforms where only the riscv-timer clocksource is present. - On platforms where other clock sources are available, we want them to - register with vdso_clock_mode set to VDSO_CLOCKMODE_NONE. - -* [v1: riscv: sbi: Switch to the sys-off handler API](http://lore.kernel.org/linux-riscv/20221228161915.13194-1-samuel@sholland.org/) - - I want to convert the axp20x PMIC poweroff handler to use the sys-off - API, so it can be used as a fallback if the SBI poweroff handler - is unavailable. But the SBI poweroff handler still uses pm_power_off, so, - done alone, this would cause the axp20x callback to be called first, - before the SBI poweroff handler has a chance to run. - -* [v2: hwrng: starfive - Add driver for TRNG module](http://lore.kernel.org/linux-riscv/20221228071103.91797-1-jiajie.ho@starfivetech.com/) - - This patch series adds kernel support for the StarFive hardware random - number generator. The first 2 patches add bindings documentation and a driver - for this module. Patch 3 adds a devicetree entry for the VisionFive v2 SoC.
- -* [v1: clocksource/drivers/riscv: Increase the clock source rating](http://lore.kernel.org/linux-riscv/20221228004444.61568-1-samuel@sholland.org/) - - RISC-V provides an architectural clock source via the time CSR. This - clock source exposes a 64-bit counter synchronized across all CPUs. - Because it is accessed using a CSR, it is much more efficient to read - than MMIO clock sources. For example, on the Allwinner D1, reading the - sun4i timer in a loop takes 131 cycles/iteration, while reading the RISC-V time CSR takes only 5 cycles/iteration. - -* [v2: dt-bindings: riscv: add SBI PMU event mappings](http://lore.kernel.org/linux-riscv/20221227194056.3891216-1-conor@kernel.org/) - - The SBI PMU extension requires the firmware to be aware of the event to - counter/mhpmevent mappings supported by the hardware. OpenSBI may use - DeviceTree to describe the PMU mappings. This binding is currently - described in markdown in OpenSBI (since v1.0 in Dec 2021) and used by QEMU since v7.2.0. - -* [v2: StarFive's SDIO/eMMC driver support](http://lore.kernel.org/linux-riscv/20221227122227.460921-1-william.qiu@starfivetech.com/) - - This patchset adds initial rudimentary support for the StarFive - designware mobile storage host controller driver. This driver will - be used in StarFive's VisionFive 2 board. The main purpose of adding - this driver is to accommodate the ultra-high speed mode of eMMC. - -* [v5: Add OPTPROBES feature on RISCV](http://lore.kernel.org/linux-riscv/20221224114315.850130-1-chenguokai17@mails.ucas.ac.cn/) - - Add jump optimization support for RISC-V. - - It replaces the ebreak instructions used by normal kprobes with an - auipc+jalr instruction pair, with the aim of suppressing the probe-hit overhead. - - All known optprobe-capable RISC architectures use a single - jump or branch instruction, while this patch does not.
RISC-V has a - quite limited jump range (4KB or 2MB) for both its branch and jump instructions, which prevents optimizing probes spread all over the kernel. - -#### Process Scheduling - -* [v2: sched/fair: unlink misfit task from cpu overutilized](http://lore.kernel.org/lkml/20221228165415.3436-1-vincent.guittot@linaro.org/) - - By taking uclamp_min into account, the 1:1 relation between task misfit - and cpu overutilized no longer holds, as a task with a small util_avg - may or may not fit a high-capacity cpu because of the uclamp_min constraint. - - Add a new state in util_fits_cpu() to reflect the case where a task would fit - a CPU except for the uclamp_min hint, which is a performance requirement. - -* [v1: sched: print parent comm in sched_show_task()](http://lore.kernel.org/lkml/20221227161400.GA7646@didi-ThinkCentre-M930t-N000/) - - Knowing who the parent is might be useful for debugging. - For example, we can sometimes resolve kernel hung tasks by stopping - whatever started those hung tasks. - With the parent's name printed in sched_show_task(), people can more easily tell which "service" should be acted on. - -* [v1: sched/cputime: Make cputime_adjust() more accurate](http://lore.kernel.org/lkml/20221226031010.4079885-1-maxing.lan@bytedance.com/) - - In the current algorithm of cputime_adjust(), the accumulated stime and - utime are used to divide the accumulated rtime. When these values are very - large, stime or utime can easily fail to be updated. - -#### Memory Management - -* [v1: Get rid of first tail page fields](http://lore.kernel.org/linux-mm/20221231214610.2800682-1-willy@infradead.org/) - - Continue the shrinkage of the struct page definition by getting rid of the - 'first tail page' fields. I originally did this patch set before Hugh's - rewrite of the subpages_mapcount, so it needed substantial updates; - I hope I didn't miss anything.
- -* [v2: scripts/gdb: add mm introspection utils](http://lore.kernel.org/linux-mm/20221231171258.7907-1-dmitrii.bundin.a@gmail.com/) - - This command provides a way to traverse the entire page hierarchy for a - given virtual address on x86. In addition to qemu's `info tlb`/`info mem` commands, - it provides complete information about the paging structure for an arbitrary virtual address. It supports 4KB/2MB/1GB pages and 5-level paging. - -* [v2: mm: huge_memory: convert split_huge_pages_all() to use a folio](http://lore.kernel.org/linux-mm/20221230093020.9664-1-wangkefeng.wang@huawei.com/) - - Straightforwardly convert split_huge_pages_all() to use a folio. - -* [v4: -next: mm: convert page_idle/damon to use folios](http://lore.kernel.org/linux-mm/20221230070849.63358-1-wangkefeng.wang@huawei.com/) - -* [v3: mm/page_reporting: replace rcu_access_pointer() with rcu_dereference_protected()](http://lore.kernel.org/linux-mm/20221228175942.149491-1-sj@kernel.org/) - - Page reporting fetches pr_dev_info using rcu_access_pointer(), which is - for safely fetching a pointer that will not be dereferenced but could be - concurrently updated. The code indeed does not dereference pr_dev_info - after fetching it using rcu_access_pointer(), but it fetches the pointer - while concurrent updates to the pointer are prevented by holding the update-side - lock, page_reporting_mutex. - -* [v1: mm, slab: periodically resched in drain_freelist()](http://lore.kernel.org/linux-mm/b1808b92-86df-9f53-bfb2-8862a9c554e9@google.com/) - - drain_freelist() can be called with a very large number of slabs to free, - such as for kmem_cache_shrink(), or depending on various settings of the - slab cache when doing periodic reaping. - - If there is a potentially long list of slabs to drain, periodically - schedule to ensure we aren't saturating the cpu for too long.
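The scripts/gdb item above walks the x86 page-table hierarchy for a given virtual address. On x86-64, each paging level is indexed by 9 bits of the address above a 12-bit page offset. This illustrative C sketch extracts those indices for 4- and 5-level paging; it is not the gdb script's code, and `split_va()` and `struct va_indices` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Per-level page-table indices for an x86-64 virtual address. */
struct va_indices {
	unsigned pml5;   /* bits 48..56, only used with 5-level paging */
	unsigned pml4;   /* bits 39..47 */
	unsigned pdpt;   /* bits 30..38 (a 1GB page ends the walk here) */
	unsigned pd;     /* bits 21..29 (a 2MB page ends the walk here) */
	unsigned pt;     /* bits 12..20 (4KB pages) */
	unsigned offset; /* bits 0..11 within a 4KB page */
};

/* Each table has 512 entries, so each level consumes 9 address bits. */
static unsigned level_index(uint64_t va, unsigned shift)
{
	return (unsigned)((va >> shift) & 0x1ff);
}

struct va_indices split_va(uint64_t va)
{
	struct va_indices ix = {
		.pml5 = level_index(va, 48),
		.pml4 = level_index(va, 39),
		.pdpt = level_index(va, 30),
		.pd = level_index(va, 21),
		.pt = level_index(va, 12),
		.offset = (unsigned)(va & 0xfff),
	};
	return ix;
}
```

With a 2MB page, the walk stops at the PD entry and the low 21 bits form the offset. With a 1GB page, it stops at the PDPT entry with a 30-bit offset. Under 4-level paging, the PML5 field is simply ignored.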
- -* [v1: arm64/vmalloc: use module region only for module_alloc() if CONFIG_RANDOMIZE_BASE is set](http://lore.kernel.org/linux-mm/20221227092634.445212-1-liushixin2@huawei.com/) - - After adding a 10GB pmem device, I got the following error message when - inserting a module: - - insmod: vmalloc error: size 16384, vm_struct allocation failed, - mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0 - - Skip the module region if not called from module_alloc(). - -* [v1: migrate_pages(): batch TLB flushing](http://lore.kernel.org/linux-mm/20221227002859.27740-1-ying.huang@intel.com/) - - If multiple folios are passed to migrate_pages(), there are - opportunities to batch the TLB flushing and copying. That is, we can - change the code to something as follows, - - The total number of TLB flushing IPIs can be reduced considerably. We may also use a hardware accelerator such as DSA to accelerate the folio copying. - -#### File Systems - -* [v1: fs/ext4: Replace kmap_atomic() with kmap_local_page()](http://lore.kernel.org/linux-fsdevel/20221231174439.8557-1-fmdefrancesco@gmail.com/) - - However, the code within the mappings and un-mappings in ext4/inline.c - does not depend on the above-mentioned side effects. - - Therefore, a mere replacement of the old API with the new one is all that - is required (i.e., there is no need to explicitly add any calls to pagefault_disable() and/or preempt_disable()). - -* [v1: fs/ext2: Replace kmap_atomic() with kmap_local_page()](http://lore.kernel.org/linux-fsdevel/20221231174205.8492-1-fmdefrancesco@gmail.com/) - - However, the code within the mapping and un-mapping in ext2_make_empty() - does not depend on the above-mentioned side effects. - - Therefore, a mere replacement of the old API with the new one is all that - is required (i.e., there is no need to explicitly add any calls to - pagefault_disable() and/or preempt_disable()).
- -* [v5: Turn iomap_page_ops into iomap_folio_ops](http://lore.kernel.org/linux-fsdevel/20221231150919.659533-1-agruenba@redhat.com/) - - The patches are split up into relatively small pieces. That may seem - unnecessary, but at least it makes reviewing the patches easier. - -* [v1: fs/sysv: Replace kmap() with kmap_local_page()](http://lore.kernel.org/linux-fsdevel/20221231075717.10258-1-fmdefrancesco@gmail.com/) - - kmap() is deprecated in favor of kmap_local_page(). - - There are two main problems with kmap(): (1) it comes with overhead, as - the mapping space is restricted and protected by a global lock for - synchronization, and (2) it requires global TLB invalidation when the - kmap’s pool wraps, and it might block when the mapping space is fully - utilized until a slot becomes available. - -* [v4: -next: fs: coredump: using preprocessor directives for dump_emit_page](http://lore.kernel.org/linux-fsdevel/20221230022446.448179-1-xiehongyu1@kylinos.cn/) - - When CONFIG_COREDUMP is set and CONFIG_ELF_CORE is not, you'll get warnings - like: - fs/coredump.c:841:12: error: ‘dump_emit_page’ defined but not used - [-Werror=unused-function] - 841 | static int dump_emit_page(struct coredump_params *cprm, struct - page *page) - - dump_emit_page is only called in dump_user_range; since dump_user_range - is guarded by #ifdef preprocessor directives, use #ifdef for dump_emit_page too. - -* [v5: fs/ufs: Replace kmap() with kmap_local_page](http://lore.kernel.org/linux-fsdevel/20221229225100.22141-1-fmdefrancesco@gmail.com/) - - With kmap_local_page(), the mappings are per-thread and CPU-local, can take - page faults, and can be called from any context (including interrupts). - It is faster than kmap() in kernels with HIGHMEM enabled. Furthermore, - the tasks can be preempted and, when they are scheduled to run again, the - kernel virtual addresses are restored and still valid.
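The coredump fix above follows a common C pattern. A `static` helper whose only caller is compiled conditionally must sit under the same guard. Otherwise, builds without the option trip `-Wunused-function`, which becomes an error under `-Werror`. A minimal sketch of the pattern; `DEMO_ELF_CORE`, `emit_page_demo()` and `user_range_demo()` are hypothetical stand-ins for CONFIG_ELF_CORE, dump_emit_page() and dump_user_range(), and the page logic is invented for illustration.

```c
#include <assert.h>

/* Stand-in for CONFIG_ELF_CORE=y; delete this line to model =n. */
#define DEMO_ELF_CORE

#ifdef DEMO_ELF_CORE
/*
 * Only called from user_range_demo(), so it lives under the same guard.
 * Outside the guard, a build without DEMO_ELF_CORE would define an
 * unused static function and trigger -Wunused-function.
 */
static int emit_page_demo(int page)
{
	return page >= 0; /* pretend only non-negative page ids can be emitted */
}

/* Emit `count` pages starting at `first`; return how many succeeded. */
int user_range_demo(int first, int count)
{
	int emitted = 0;

	for (int i = 0; i < count; i++)
		emitted += emit_page_demo(first + i);
	return emitted;
}
#endif /* DEMO_ELF_CORE */
```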
- -* [v2: Introduce provisioning primitives for thinly provisioned storage](http://lore.kernel.org/linux-fsdevel/20221229081252.452240-1-sarthakkukreti@chromium.org/) - - This patch series adds a mechanism to pass through provision requests on - stacked thinly provisioned storage devices/filesystems. - - The Linux kernel provides several mechanisms to set up thinly provisioned - block storage abstractions (e.g. dm-thin, loop devices over sparse files), - either directly as block devices or as backing storage for filesystems. - -* [v1: Add new open(2) flag - O_EMPTY_PATH](http://lore.kernel.org/linux-fsdevel/20221228160249.428399-1-ahamza@ixsystems.com/) - - This patch adds a new flag, O_EMPTY_PATH, that allows the openat and open - system calls to open the file referenced by fd if the path is empty; - it is very similar to the FreeBSD O_EMPTY_PATH flag. - -* [v1: fs: nls: Simplification of ASCII and ISO-8859-1](http://lore.kernel.org/linux-fsdevel/20221226144301.16382-1-pali@kernel.org/) - - This is an RFC patch series which simplifies the ASCII and ISO-8859-1 tables. - I'm not sure what direction the nls code and the duplicated - default/iso88591 tables should take, so I'm sending this series as an RFC. - -* [v2: eventfd: use a generic helper instead of an open coded wait_event](http://lore.kernel.org/linux-fsdevel/tencent_1D2E4866B2223D9A19DF4FFB79AFAA955A05@qq.com/) - - Use wait_event_interruptible_locked_irq() in eventfd_{write,read}() to - avoid the longer, open-coded equivalent. - -* [v1: blk: optimization for classic polling](http://lore.kernel.org/linux-fsdevel/3578876466-3733-1-git-send-email-nj.shetty@samsung.com/) - - This removes the dependency on interrupts to wake up the task. Set the task - state to TASK_RUNNING if need_resched() returns true - while polling for IO completion. - Previously, the polling task slept, relying on an interrupt to wake it up. - This made some IO take very long when interrupt coalescing is enabled in NVMe.
- -#### Network Devices - -* [v1: net-next: net: ipa: simplify IPA interrupt handling](http://lore.kernel.org/netdev/20221230232230.2348757-1-elder@linaro.org/) - - One of the IPA's two IRQs fires when data on a suspended channel is - available (to request that the channel, or the system, be resumed to - receive the pending data). This interrupt also handles a few - conditions signaled by the embedded microcontroller. - - For this "IPA interrupt", the current code requires a handler to be - dynamically registered for each interrupt condition. Any condition - that has no registered handler is quietly ignored. This design is derived from the downstream IPA driver implementation. - -* [v1: net: ipa: use proper endpoint mask for suspend](http://lore.kernel.org/netdev/20221230223304.2137471-1-elder@linaro.org/) - - It is now possible for a system to have more than 32 endpoints. As - a result, registers related to endpoint suspend are parameterized, with 32 endpoints represented in each of one or more registers. - -* [v2: net-next: r8169: disable ASPM in case of tx timeout](http://lore.kernel.org/netdev/06bab827-be4a-606e-7a01-52379b1e1a91@gmail.com/) - - There are still isolated reports of systems where ASPM incompatibilities - cause tx timeouts. It's not clear whom to blame, so let's disable - ASPM in case of a tx timeout. - -* [v1: igc: Mask replay rollover/timeout errors in I225_LMVP](http://lore.kernel.org/netdev/20221229122640.239859-1-rajat.khandelwal@linux.intel.com/) - - The CPU logs get flooded with replay rollover/timeout AER errors on - systems with an i225_lmvp connected, usually inside Thunderbolt devices. - - One of the prominent TBT4 docks we use is the HP G4 Hook2, which incorporates - an Intel Foxville chipset that uses the igc driver. - When Ethernet is connected, the CPU logs get inundated with these errors.
- -* [v1: tcp/udp: add tracepoint for send recv length](http://lore.kernel.org/netdev/20221229080207.1029-1-cuiyunhui@bytedance.com/) - - Add a tracepoint for capturing TCP segments with - a send or receive length. This makes it easy to obtain - per-process packet send and receive information in user mode, e.g. with the netatop tool. - -* [v1: wifi: ath9k: htc_hst: free skb in ath9k_htc_rx_msg() if there is no callback function](http://lore.kernel.org/netdev/20221228224047.146399-1-pchelkin@ispras.ru/) - - It is stated that ath9k_htc_rx_msg() either frees the provided skb or - passes its management to another callback function. However, the skb is - not freed when there is no other callback function, and Syzkaller was - able to cause a memory leak. The patch also includes a minor comment fix. - -* [v2: net: qed: allow sleep in qed_mcp_trace_dump()](http://lore.kernel.org/netdev/20221228220045.101647-1-csander@purestorage.com/) - - By default, qed_mcp_cmd_and_union() delays 10us at a time in a loop - that can run 500K times, so calls to qed_mcp_nvm_rd_cmd() - may block the current thread for over 5s. - We observed thread scheduling delays over 700ms in production, - with stacktraces pointing to this code as the culprit. - -* [v1: net: amd-xgbe: add missed tasklet_kill](http://lore.kernel.org/netdev/20221228081447.3400369-1-jiguang.xiao@windriver.com/) - - The driver does not call tasklet_kill in several places. - Add the calls to fix it. - -* [v2: net: hns3: refine the handling for VF heartbeat](http://lore.kernel.org/netdev/20221228062749.58809-1-lanhao@huawei.com/) - - Currently, the PF checks whether the VF is alive via the KEEP_ALIVE - mailbox from the VF. The VF keeps sending the mailbox every 2 - seconds. Once the PF has missed the mailbox for more than 8 - seconds, it regards the VF as abnormal and stops - notifying the VF of state changes, including link state, - VF MAC, and reset, even if it receives the KEEP_ALIVE mailbox again.
- -* [v1: rtw88: Add SDIO support](http://lore.kernel.org/netdev/20221227233020.284266-1-martin.blumenstingl@googlemail.com/) - - Recently the rtw88 driver has gained locking support for the "slow" bus - types (USB, SDIO) as part of USB support. Thanks to everyone who helped - make this happen! - - Based on the USB work (especially the locking part and various - bugfixes), this series adds support for SDIO-based cards. It's the - result of a collaboration between Jernej and me. Neither of us has - access to the rtw88 datasheets. All of our work is based on studying - the RTL8822BS and RTL8822CS vendor drivers and on trial and error. - - Jernej and I have tested this with RTL8822BS and RTL8822CS cards. - Other users have confirmed that RTL8821CS support is working as well. - RTL8723DS may also work (we tried our best to handle rtw_chip_wcpu_11n - where needed) but has not been tested at this point. - - Jernej's results with an RTL8822BS: - - Main functionality works - - Had a case where no traffic got across the link until he issued a - scan - - My results with an RTL8822CS: - - 2.4GHz and 5GHz bands are both working - - TX throughput on a 5GHz network is between 50 Mbit/s and 90 Mbit/s - - RX throughput on a 5GHz network is 19 Mbit/s - - Sometimes there are frequent reconnects (once every 1-5 minutes) - after the link has been up for a long time (multiple hours). Today - I was unable to reproduce this, though (I had only one reconnect in 8 - hours). - - Why is this an RFC? - - It needs a thorough review, especially by the rtw88 maintainers - - It's not clear to me how the "mmc: sdio" patch will be merged (will - Ulf take this or can we merge - -* [v1: net: dpaa2-mac: Get serdes only for backplane links](http://lore.kernel.org/netdev/20221227230918.2440351-1-sean.anderson@seco.com/) - - This implies that Linux only manages the SerDes when the link type is - backplane.
- From my testing, the link fails to come up when the link type is - phy, but does come up when it is backplane. Modify the condition in dpaa2_mac_connect to reflect this, moving the existing conditions to more appropriate places. - -* [v2: net-next: net: mdio: Start separating C22 and C45](http://lore.kernel.org/netdev/20221227-v6-2-rc1-c45-seperation-v2-0-ddb37710e5a7@walle.cc/) - - This patch set starts the separation of C22 and C45 MDIO bus - transactions at the API level to the MDIO bus drivers. C45 read and - write ops are added to the MDIO bus driver structure, and the MDIO - core will try to use these ops if requested to perform a C45 - transfer. - -* [v1: ethtool-next: JSON output support for Netlink implementation of --show-ring option](http://lore.kernel.org/netdev/20221227175221.7762-1-glipus@gmail.com/) - - Add --json support to the Netlink implementation of the --show-ring option. Non-JSON output is unchanged by this feature. - -* [v1: s390/qeth: convert sysfs snprintf to sysfs_emit](http://lore.kernel.org/netdev/20221227110352.1436120-1-zhangxuezhi3@gmail.com/) - - Follow the advice in Documentation/filesystems/sysfs.rst - that show() should only use sysfs_emit() or sysfs_emit_at() - when formatting the value to be returned to user space. - -* [v1: iproute2: dcb: Do not leave ACKs in socket receive buffer](http://lore.kernel.org/netdev/20221227110318.2899056-1-idosch@nvidia.com/) - - Originally, the dcb utility only stopped receiving messages from a - socket when it found the attribute it was looking for. The cited commit - changed that, so that the utility will also stop when seeing an ACK - (NLMSG_ERROR message), by setting the NLM_F_ACK flag on requests. - - This is problematic because it means a successful request will leave an ACK in the socket receive buffer, causing the next request to bail out before reading its response.
- -* [v1: Introduce a vringh accessor for IO memory](http://lore.kernel.org/netdev/20221227022528.609839-1-mie@igel.co.jp/) - - Vringh is a host-side implementation of virtio rings, and supports - vrings located in three kinds of memory: userspace, kernel space, and - space translated through an IOTLB. - - The goal of this patchset is to refactor vringh and introduce a new vringh - accessor for vrings located in an IO memory region. - -* [v2: Add some USB hotspot IDs](http://lore.kernel.org/netdev/20221226234751.444917-1-mjg59@srcf.ucam.org/) - - Add a few additional IDs to support a couple of hotspots I had lying - around. V2 avoids reserving the PPP modem endpoint for the MDM9207 - devices. - -* [v2: net: net/ethtool/ioctl: split ethtool_get_phy_stats into multiple helpers](http://lore.kernel.org/netdev/20221226114825.1937189-1-d-tatianin@yandex-team.ru/) - - This series fixes a potential NULL dereference in ethtool_get_phy_stats - while also attempting to refactor/split said function into multiple - helpers so that it's easier to reason about what's going on. - -* [v2: wireless-next: wl18xx: use strscpy() to instead of strncpy()](http://lore.kernel.org/netdev/202212261914060599112@zte.com.cn/) - - The implementation of strscpy() is more robust and safer. - It is now the recommended way to copy NUL-terminated strings. - -* [v1: virtio-net: don't busy poll for cvq command](http://lore.kernel.org/netdev/20221226074908.8154-1-jasowang@redhat.com/) - - The code used to busy poll for the cvq command, which turns out to have - several side effects: - - 1) infinite polling for buggy devices - 2) bad interaction with the scheduler - - So this series tries to use sleep + timeout instead of busy polling. - -* [v1: batman-adv: Check return value](http://lore.kernel.org/netdev/20221224233311.48678-1-artem.chernyshev@red-soft.ru/) - - Check whether the rtnl_link_register() call in batadv_init() was successful. - - Found by the Linux Verification Center (linuxtesting.org) with SVACE.
- -#### BPF - -* [v1: bpf-next: Support for BPF_ST instruction in LLVM C compiler](http://lore.kernel.org/bpf/20221231163122.1360813-1-eddyz87@gmail.com/) - - Currently the LLVM BPF back-end does not emit the BPF_ST instruction and does not allow one to be specified in inline assembly. - - Recently I've been exploring ways to port some of the verifier test - cases from tools/testing/selftests/bpf/verifier/*.c to use inline assembly - and the machinery provided in tools/testing/selftests/bpf/test_loader.c - (which should hopefully simplify test maintenance). - -* [v1: bpf-next: libbpf: Add LoongArch support to bpf_tracing.h](http://lore.kernel.org/bpf/20221231100757.3177034-1-hengqi.chen@gmail.com/) - - Add PT_REGS macros for LoongArch ([0]). - -* [v1: bpf-next: bpf: Handle reuse in bpf memory alloc](http://lore.kernel.org/bpf/20221230041151.1231169-1-houtao@huaweicloud.com/) - - This series handles element reuse in the bpf memory allocator. The immediate reuse of - freed elements may lead to two problems in htab map: - reuse will reinitialize special fields (e.g., bpf_spin_lock) in the htab map value, which may corrupt a lookup with the BPF_F_LOCK flag, since such a lookup acquires the bpf_spin_lock during value copying. The corruption of the bpf_spin_lock may result in a hard lockup. - -* [bpf helpers freeze. Was: v2: bpf-next: Dynptr convenience helpers](http://lore.kernel.org/bpf/20221225215210.ekmfhyczgubx4rih@macbook-pro-6.dhcp.thefacebook.com/) - - The uapi helpers vs. kfuncs argument is not a black-and-white comparison. - It's not just stable vs. unstable. - uapi has strict rules, and helpers in uapi/bpf.h have to follow those rules, - while kfuncs, in terms of stability, are equivalent to EXPORT_SYMBOL_GPL.
- -### Related Technologies - -#### Qemu - -* [v1: riscv: do not set the rounding mode via `gen_set_rm`](http://lore.kernel.org/qemu-devel/20221229172734.119600-1-abdulras@google.com/) - - Setting the rounding mode via the `gen_set_rm` call would alter the - state of the disassembler, resetting the `TransOp` in the assembler - context. When we subsequently set the rounding mode to the desired - value, we would trigger an assertion in `decode_save_opc`. - -* [v2: hw/riscv: Improve Spike HTIF emulation fidelity](http://lore.kernel.org/qemu-devel/20221229091828.1945072-1-bmeng@tinylab.org/) - - At present the 32-bit OpenSBI generic firmware image does not boot on - Spike; only the 64-bit image does. This is because the HTIF emulation does - not implement the proxy syscall interface, which is required for - 32-bit HTIF console output. - - An OpenSBI bug fix [1] is also needed when booting the plain binary image. - -#### Buildroot - -* [v2: package/qemu: refactor target emulator selection](http://lore.kernel.org/buildroot/20221227114842.2620182-1-unixmania@gmail.com/) - - Since CUSTOM_TARGETS does not select FDT, we can get build errors like - this: - - ../meson.build:2778:2: ERROR: Problem encountered: fdt not available but required by targets x86_64-softmmu - - We could select FDT when CUSTOM_TARGETS is set, but this would force an - unnecessary dependency on dtc, as BR2_PACKAGE_QEMU_SYSTEM does. - -#### U-Boot - -* [v1: Pull request for efi-2023-01-rc5-2](http://lore.kernel.org/u-boot/80393d33-8a51-2840-b5c5-112298e4c5aa@gmx.de/) - - Pull request for efi-2023-01-rc5-2 - - Documentation: - - * Reorganize existing TI docs and add K3 generation page - * Add texinfodocs and infodocs targets - * Update qemu-ppce500 documentation - * Use "changesets" instead of "csets" in statistics pages - - UEFI: - - * Fix merging of preseeded non-volatile variables - * Fix a return value in the EFI_HII_DATABASE_PROTOCOL - * Set UEFI specification version to 2.10