From 819dc58be770781b65c9fc63093882e077f0591b Mon Sep 17 00:00:00 2001 From: Josh Hunt Date: Tue, 10 Sep 2024 15:08:22 -0400 Subject: [PATCH 01/51] tcp: check skb is non-NULL in tcp_rto_delta_us() stable inclusion from stable-6.6.54 commit 570f7d8c9bf14f041152ba8353d4330ef7575915 category: bugfix issue: #IB55S9 CVE: CVE-2024-47684 Signed-off-by: zhangshuqi --------------------------------------- [ Upstream commit c8770db2d54437a5f49417ae7b46f7de23d14db6 ] We have some machines running stock Ubuntu 20.04.6 which is their 5.4.0-174-generic kernel that are running ceph and recently hit a null ptr dereference in tcp_rearm_rto(). Initially hitting it from the TLP path, but then later we also saw it getting hit from the RACK case as well. Here are examples of the oops messages we saw in each of those cases: Jul 26 15:05:02 rx [11061395.780353] BUG: kernel NULL pointer dereference, address: 0000000000000020 Jul 26 15:05:02 rx [11061395.787572] #PF: supervisor read access in kernel mode Jul 26 15:05:02 rx [11061395.792971] #PF: error_code(0x0000) - not-present page Jul 26 15:05:02 rx [11061395.798362] PGD 0 P4D 0 Jul 26 15:05:02 rx [11061395.801164] Oops: 0000 [#1] SMP NOPTI Jul 26 15:05:02 rx [11061395.805091] CPU: 0 PID: 9180 Comm: msgr-worker-1 Tainted: G W 5.4.0-174-generic #193-Ubuntu Jul 26 15:05:02 rx [11061395.814996] Hardware name: Supermicro SMC 2x26 os-gen8 64C NVME-Y 256G/H12SSW-NTR, BIOS 2.5.V1.2U.NVMe.UEFI 05/09/2023 Jul 26 15:05:02 rx [11061395.825952] RIP: 0010:tcp_rearm_rto+0xe4/0x160 Jul 26 15:05:02 rx [11061395.830656] Code: 87 ca 04 00 00 00 5b 41 5c 41 5d 5d c3 c3 49 8b bc 24 40 06 00 00 eb 8d 48 bb cf f7 53 e3 a5 9b c4 20 4c 89 ef e8 0c fe 0e 00 <48> 8b 78 20 48 c1 ef 03 48 89 f8 41 8b bc 24 80 04 00 00 48 f7 e3 Jul 26 15:05:02 rx [11061395.849665] RSP: 0018:ffffb75d40003e08 EFLAGS: 00010246 Jul 26 15:05:02 rx [11061395.855149] RAX: 0000000000000000 RBX: 20c49ba5e353f7cf RCX: 0000000000000000 Jul 26 15:05:02 rx [11061395.862542] RDX: 0000000062177c30 RSI: 000000000000231c RDI: ffff9874ad283a60 Jul 26 15:05:02 rx [11061395.869933] RBP: ffffb75d40003e20 R08: 0000000000000000 R09: ffff987605e20aa8 Jul 26 15:05:02 rx [11061395.877318] R10: ffffb75d40003f00 R11: ffffb75d4460f740 R12: ffff9874ad283900 Jul 26 15:05:02 rx [11061395.884710] R13: ffff9874ad283a60 R14: ffff9874ad283980 R15: ffff9874ad283d30 Jul 26 15:05:02 rx [11061395.892095] FS: 00007f1ef4a2e700(0000) GS:ffff987605e00000(0000) knlGS:0000000000000000 Jul 26 15:05:02 rx [11061395.900438] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jul 26 15:05:02 rx [11061395.906435] CR2: 0000000000000020 CR3: 0000003e450ba003 CR4: 0000000000760ef0 Jul 26 15:05:02 rx [11061395.913822] PKRU: 55555554 Jul 26 15:05:02 rx [11061395.916786] Call Trace: Jul 26 15:05:02 rx [11061395.919488] Jul 26 15:05:02 rx [11061395.921765] ? show_regs.cold+0x1a/0x1f Jul 26 15:05:02 rx [11061395.925859] ? __die+0x90/0xd9 Jul 26 15:05:02 rx [11061395.929169] ? no_context+0x196/0x380 Jul 26 15:05:02 rx [11061395.933088] ? ip6_protocol_deliver_rcu+0x4e0/0x4e0 Jul 26 15:05:02 rx [11061395.938216] ? ip6_sublist_rcv_finish+0x3d/0x50 Jul 26 15:05:02 rx [11061395.943000] ? __bad_area_nosemaphore+0x50/0x1a0 Jul 26 15:05:02 rx [11061395.947873] ? bad_area_nosemaphore+0x16/0x20 Jul 26 15:05:02 rx [11061395.952486] ? do_user_addr_fault+0x267/0x450 Jul 26 15:05:02 rx [11061395.957104] ? ipv6_list_rcv+0x112/0x140 Jul 26 15:05:02 rx [11061395.961279] ? __do_page_fault+0x58/0x90 Jul 26 15:05:02 rx [11061395.965458] ? do_page_fault+0x2c/0xe0 Jul 26 15:05:02 rx [11061395.969465] ? page_fault+0x34/0x40 Jul 26 15:05:02 rx [11061395.973217] ? tcp_rearm_rto+0xe4/0x160 Jul 26 15:05:02 rx [11061395.977313] ? tcp_rearm_rto+0xe4/0x160 Jul 26 15:05:02 rx [11061395.981408] tcp_send_loss_probe+0x10b/0x220 Jul 26 15:05:02 rx [11061395.985937] tcp_write_timer_handler+0x1b4/0x240 Jul 26 15:05:02 rx [11061395.990809] tcp_write_timer+0x9e/0xe0 Jul 26 15:05:02 rx [11061395.994814] ? tcp_write_timer_handler+0x240/0x240 Jul 26 15:05:02 rx [11061395.999866] call_timer_fn+0x32/0x130 Jul 26 15:05:02 rx [11061396.003782] __run_timers.part.0+0x180/0x280 Jul 26 15:05:02 rx [11061396.008309] ? recalibrate_cpu_khz+0x10/0x10 Jul 26 15:05:02 rx [11061396.012841] ? native_x2apic_icr_write+0x30/0x30 Jul 26 15:05:02 rx [11061396.017718] ? lapic_next_event+0x21/0x30 Jul 26 15:05:02 rx [11061396.021984] ? clockevents_program_event+0x8f/0xe0 Jul 26 15:05:02 rx [11061396.027035] run_timer_softirq+0x2a/0x50 Jul 26 15:05:02 rx [11061396.031212] __do_softirq+0xd1/0x2c1 Jul 26 15:05:02 rx [11061396.035044] do_softirq_own_stack+0x2a/0x40 Jul 26 15:05:02 rx [11061396.039480] Jul 26 15:05:02 rx [11061396.041840] do_softirq.part.0+0x46/0x50 Jul 26 15:05:02 rx [11061396.046022] __local_bh_enable_ip+0x50/0x60 Jul 26 15:05:02 rx [11061396.050460] _raw_spin_unlock_bh+0x1e/0x20 Jul 26 15:05:02 rx [11061396.054817] nf_conntrack_tcp_packet+0x29e/0xbe0 [nf_conntrack] Jul 26 15:05:02 rx [11061396.060994] ? get_l4proto+0xe7/0x190 [nf_conntrack] Jul 26 15:05:02 rx [11061396.066220] nf_conntrack_in+0xe9/0x670 [nf_conntrack] Jul 26 15:05:02 rx [11061396.071618] ipv6_conntrack_local+0x14/0x20 [nf_conntrack] Jul 26 15:05:02 rx [11061396.077356] nf_hook_slow+0x45/0xb0 Jul 26 15:05:02 rx [11061396.081098] ip6_xmit+0x3f0/0x5d0 Jul 26 15:05:02 rx [11061396.084670] ? ipv6_anycast_cleanup+0x50/0x50 Jul 26 15:05:02 rx [11061396.089282] ? __sk_dst_check+0x38/0x70 Jul 26 15:05:02 rx [11061396.093381] ? inet6_csk_route_socket+0x13b/0x200 Jul 26 15:05:02 rx [11061396.098346] inet6_csk_xmit+0xa7/0xf0 Jul 26 15:05:02 rx [11061396.102263] __tcp_transmit_skb+0x550/0xb30 Jul 26 15:05:02 rx [11061396.106701] tcp_write_xmit+0x3c6/0xc20 Jul 26 15:05:02 rx [11061396.110792] ? __alloc_skb+0x98/0x1d0 Jul 26 15:05:02 rx [11061396.114708] __tcp_push_pending_frames+0x37/0x100 Jul 26 15:05:02 rx [11061396.119667] tcp_push+0xfd/0x100 Jul 26 15:05:02 rx [11061396.123150] tcp_sendmsg_locked+0xc70/0xdd0 Jul 26 15:05:02 rx [11061396.127588] tcp_sendmsg+0x2d/0x50 Jul 26 15:05:02 rx [11061396.131245] inet6_sendmsg+0x43/0x70 Jul 26 15:05:02 rx [11061396.135075] __sock_sendmsg+0x48/0x70 Jul 26 15:05:02 rx [11061396.138994] ____sys_sendmsg+0x212/0x280 Jul 26 15:05:02 rx [11061396.143172] ___sys_sendmsg+0x88/0xd0 Jul 26 15:05:02 rx [11061396.147098] ? __seccomp_filter+0x7e/0x6b0 Jul 26 15:05:02 rx [11061396.151446] ? __switch_to+0x39c/0x460 Jul 26 15:05:02 rx [11061396.155453] ? __switch_to_asm+0x42/0x80 Jul 26 15:05:02 rx [11061396.159636] ? __switch_to_asm+0x5a/0x80 Jul 26 15:05:02 rx [11061396.163816] __sys_sendmsg+0x5c/0xa0 Jul 26 15:05:02 rx [11061396.167647] __x64_sys_sendmsg+0x1f/0x30 Jul 26 15:05:02 rx [11061396.171832] do_syscall_64+0x57/0x190 Jul 26 15:05:02 rx [11061396.175748] entry_SYSCALL_64_after_hwframe+0x5c/0xc1 Jul 26 15:05:02 rx [11061396.181055] RIP: 0033:0x7f1ef692618d Jul 26 15:05:02 rx [11061396.184893] Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 ca ee ff ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2f 44 89 c7 48 89 44 24 08 e8 fe ee ff ff 48 Jul 26 15:05:02 rx [11061396.203889] RSP: 002b:00007f1ef4a26aa0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e Jul 26 15:05:02 rx [11061396.211708] RAX: ffffffffffffffda RBX: 000000000000084b RCX: 00007f1ef692618d Jul 26 15:05:02 rx [11061396.219091] RDX: 0000000000004000 RSI: 00007f1ef4a26b10 RDI: 0000000000000275 Jul 26 15:05:02 rx [11061396.226475] RBP: 0000000000004000 R08: 0000000000000000 R09: 0000000000000020 Jul 26 15:05:02 rx [11061396.233859] R10: 0000000000000000 R11: 0000000000000293 R12: 000000000000084b Jul 26 15:05:02 rx [11061396.241243] R13: 00007f1ef4a26b10 R14: 0000000000000275 R15: 000055592030f1e8 Jul 26 15:05:02 rx [11061396.248628] Modules linked in: vrf bridge stp llc vxlan ip6_udp_tunnel udp_tunnel nls_iso8859_1 amd64_edac_mod edac_mce_amd kvm_amd kvm crct10dif_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper wmi_bmof ipmi_ssif input_leds joydev rndis_host cdc_ether usbnet mii ast drm_vram_helper ttm drm_kms_helper i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt ccp mac_hid ipmi_si ipmi_devintf ipmi_msghandler nft_ct sch_fq_codel nf_tables_set nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink ramoops reed_solomon efi_pstore drm ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear mlx5_ib ib_uverbs ib_core raid1 mlx5_core hid_generic pci_hyperv_intf crc32_pclmul tls usbhid ahci mlxfw bnxt_en libahci hid nvme i2c_piix4 nvme_core wmi Jul 26 15:05:02 rx [11061396.324334] CR2: 0000000000000020 Jul 26 15:05:02 rx [11061396.327944] ---[ end trace 68a2b679d1cfb4f1 ]--- Jul 26 15:05:02 rx [11061396.433435] RIP: 0010:tcp_rearm_rto+0xe4/0x160 Jul 26 15:05:02 rx [11061396.438137] Code: 87 ca 04 00 00 00 5b 41 5c 41 5d 5d c3 c3 49 8b bc 24 40 06 00 00 eb 8d 48 bb cf f7 53 e3 a5 9b c4 20 4c 89 ef e8 0c fe 0e 00 <48> 8b 78 20 48 c1 ef 03 48 89 f8 41 8b bc 24 80 04 00 00 48 f7 e3 Jul 26 15:05:02 rx [11061396.457144] RSP: 0018:ffffb75d40003e08 EFLAGS: 00010246 Jul 26 15:05:02 rx [11061396.462629] RAX: 0000000000000000 RBX: 20c49ba5e353f7cf RCX: 0000000000000000 Jul 26 15:05:02 rx [11061396.470012] RDX: 0000000062177c30 RSI: 000000000000231c RDI: ffff9874ad283a60 Jul 26 15:05:02 rx [11061396.477396] RBP: ffffb75d40003e20 R08: 0000000000000000 R09: ffff987605e20aa8 Jul 26 15:05:02 rx [11061396.484779] R10: ffffb75d40003f00 R11: ffffb75d4460f740 R12: ffff9874ad283900 Jul 26 15:05:02 rx [11061396.492164] R13: ffff9874ad283a60 R14: ffff9874ad283980 R15: ffff9874ad283d30 Jul 26 15:05:02 rx [11061396.499547] FS: 00007f1ef4a2e700(0000) GS:ffff987605e00000(0000) knlGS:0000000000000000 Jul 26 15:05:02 rx [11061396.507886] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jul 26 15:05:02 rx [11061396.513884] CR2: 0000000000000020 CR3: 0000003e450ba003 CR4: 0000000000760ef0 Jul 26 15:05:02 rx [11061396.521267] PKRU: 55555554 Jul 26 15:05:02 rx [11061396.524230] Kernel panic - not syncing: Fatal exception in interrupt Jul 26 15:05:02 rx [11061396.530885] Kernel Offset: 0x1b200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) Jul 26 15:05:03 rx [11061396.660181] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- After we hit this we disabled TLP by setting tcp_early_retrans to 0 and then hit the crash in the RACK case: Aug 7 07:26:16 rx [1006006.265582] BUG: kernel NULL pointer dereference, address: 0000000000000020 Aug 7 07:26:16 rx [1006006.272719] #PF: supervisor read access in kernel mode Aug 7 07:26:16 rx [1006006.278030] #PF: error_code(0x0000) - not-present page Aug 7 07:26:16 rx [1006006.283343] PGD 0 P4D 0 Aug 7 07:26:16 rx [1006006.286057] Oops: 0000 [#1] SMP NOPTI Aug 7 07:26:16 rx [1006006.289896] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G W 5.4.0-174-generic #193-Ubuntu Aug 7 07:26:16 rx [1006006.299107] Hardware name: Supermicro SMC 2x26 os-gen8 64C NVME-Y 256G/H12SSW-NTR, BIOS 2.5.V1.2U.NVMe.UEFI 05/09/2023 Aug 7 07:26:16 rx [1006006.309970] RIP: 0010:tcp_rearm_rto+0xe4/0x160 Aug 7 07:26:16 rx [1006006.314584] Code: 87 ca 04 00 00 00 5b 41 5c 41 5d 5d c3 c3 49 8b bc 24 40 06 00 00 eb 8d 48 bb cf f7 53 e3 a5 9b c4 20 4c 89 ef e8 0c fe 0e 00 <48> 8b 78 20 48 c1 ef 03 48 89 f8 41 8b bc 24 80 04 00 00 48 f7 e3 Aug 7 07:26:16 rx [1006006.333499] RSP: 0018:ffffb42600a50960 EFLAGS: 00010246 Aug 7 07:26:16 rx [1006006.338895] RAX: 0000000000000000 RBX: 20c49ba5e353f7cf RCX: 0000000000000000 Aug 7 07:26:16 rx [1006006.346193] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff92d687ed8160 Aug 7 07:26:16 rx [1006006.353489] RBP: ffffb42600a50978 R08: 0000000000000000 R09: 00000000cd896dcc Aug 7 07:26:16 rx [1006006.360786] R10: ffff92dc3404f400 R11: 0000000000000001 R12: ffff92d687ed8000 Aug 7 07:26:16 rx [1006006.368084] R13: ffff92d687ed8160 R14: 00000000cd896dcc R15: 00000000cd8fca81 Aug 7 07:26:16 rx [1006006.375381] FS: 0000000000000000(0000) GS:ffff93158ad40000(0000) knlGS:0000000000000000 Aug 7 07:26:16 rx [1006006.383632] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Aug 7 07:26:16 rx [1006006.389544] CR2: 0000000000000020 CR3: 0000003e775ce006 CR4: 0000000000760ee0 Aug 7 07:26:16 rx [1006006.396839] PKRU: 55555554 Aug 7 07:26:16 rx [1006006.399717] Call Trace: Aug 7 07:26:16 rx [1006006.402335] Aug 7 07:26:16 rx [1006006.404525] ? show_regs.cold+0x1a/0x1f Aug 7 07:26:16 rx [1006006.408532] ? __die+0x90/0xd9 Aug 7 07:26:16 rx [1006006.411760] ? no_context+0x196/0x380 Aug 7 07:26:16 rx [1006006.415599] ? __bad_area_nosemaphore+0x50/0x1a0 Aug 7 07:26:16 rx [1006006.420392] ? _raw_spin_lock+0x1e/0x30 Aug 7 07:26:16 rx [1006006.424401] ? bad_area_nosemaphore+0x16/0x20 Aug 7 07:26:16 rx [1006006.428927] ? do_user_addr_fault+0x267/0x450 Aug 7 07:26:16 rx [1006006.433450] ? __do_page_fault+0x58/0x90 Aug 7 07:26:16 rx [1006006.437542] ? do_page_fault+0x2c/0xe0 Aug 7 07:26:16 rx [1006006.441470] ? page_fault+0x34/0x40 Aug 7 07:26:16 rx [1006006.445134] ? tcp_rearm_rto+0xe4/0x160 Aug 7 07:26:16 rx [1006006.449145] tcp_ack+0xa32/0xb30 Aug 7 07:26:16 rx [1006006.452542] tcp_rcv_established+0x13c/0x670 Aug 7 07:26:16 rx [1006006.456981] ? sk_filter_trim_cap+0x48/0x220 Aug 7 07:26:16 rx [1006006.461419] tcp_v6_do_rcv+0xdb/0x450 Aug 7 07:26:16 rx [1006006.465257] tcp_v6_rcv+0xc2b/0xd10 Aug 7 07:26:16 rx [1006006.468918] ip6_protocol_deliver_rcu+0xd3/0x4e0 Aug 7 07:26:16 rx [1006006.473706] ip6_input_finish+0x15/0x20 Aug 7 07:26:16 rx [1006006.477710] ip6_input+0xa2/0xb0 Aug 7 07:26:16 rx [1006006.481109] ? ip6_protocol_deliver_rcu+0x4e0/0x4e0 Aug 7 07:26:16 rx [1006006.486151] ip6_sublist_rcv_finish+0x3d/0x50 Aug 7 07:26:16 rx [1006006.490679] ip6_sublist_rcv+0x1aa/0x250 Aug 7 07:26:16 rx [1006006.494779] ? ip6_rcv_finish_core.isra.0+0xa0/0xa0 Aug 7 07:26:16 rx [1006006.499828] ipv6_list_rcv+0x112/0x140 Aug 7 07:26:16 rx [1006006.503748] __netif_receive_skb_list_core+0x1a4/0x250 Aug 7 07:26:16 rx [1006006.509057] netif_receive_skb_list_internal+0x1a1/0x2b0 Aug 7 07:26:16 rx [1006006.514538] gro_normal_list.part.0+0x1e/0x40 Aug 7 07:26:16 rx [1006006.519068] napi_complete_done+0x91/0x130 Aug 7 07:26:16 rx [1006006.523352] mlx5e_napi_poll+0x18e/0x610 [mlx5_core] Aug 7 07:26:16 rx [1006006.528481] net_rx_action+0x142/0x390 Aug 7 07:26:16 rx [1006006.532398] __do_softirq+0xd1/0x2c1 Aug 7 07:26:16 rx [1006006.536142] irq_exit+0xae/0xb0 Aug 7 07:26:16 rx [1006006.539452] do_IRQ+0x5a/0xf0 Aug 7 07:26:16 rx [1006006.542590] common_interrupt+0xf/0xf Aug 7 07:26:16 rx [1006006.546421] Aug 7 07:26:16 rx [1006006.548695] RIP: 0010:native_safe_halt+0xe/0x10 Aug 7 07:26:16 rx [1006006.553399] Code: 7b ff ff ff eb bd 90 90 90 90 90 90 e9 07 00 00 00 0f 00 2d 36 2c 50 00 f4 c3 66 90 e9 07 00 00 00 0f 00 2d 26 2c 50 00 fb f4 90 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 e8 dd 5e 61 ff 65 Aug 7 07:26:16 rx [1006006.572309] RSP: 0018:ffffb42600177e70 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffc2 Aug 7 07:26:16 rx [1006006.580040] RAX: ffffffff8ed08b20 RBX: 0000000000000005 RCX: 0000000000000001 Aug 7 07:26:16 rx [1006006.587337] RDX: 00000000f48eeca2 RSI: 0000000000000082 RDI: 0000000000000082 Aug 7 07:26:16 rx [1006006.594635] RBP: ffffb42600177e90 R08: 0000000000000000 R09: 000000000000020f Aug 7 07:26:16 rx [1006006.601931] R10: 0000000000100000 R11: 0000000000000000 R12: 0000000000000005 Aug 7 07:26:16 rx [1006006.609229] R13: ffff93157deb5f00 R14: 0000000000000000 R15: 0000000000000000 Aug 7 07:26:16 rx [1006006.616530] ? __cpuidle_text_start+0x8/0x8 Aug 7 07:26:16 rx [1006006.620886] ? default_idle+0x20/0x140 Aug 7 07:26:16 rx [1006006.624804] arch_cpu_idle+0x15/0x20 Aug 7 07:26:16 rx [1006006.628545] default_idle_call+0x23/0x30 Aug 7 07:26:16 rx [1006006.632640] do_idle+0x1fb/0x270 Aug 7 07:26:16 rx [1006006.636035] cpu_startup_entry+0x20/0x30 Aug 7 07:26:16 rx [1006006.640126] start_secondary+0x178/0x1d0 Aug 7 07:26:16 rx [1006006.644218] secondary_startup_64+0xa4/0xb0 Aug 7 07:26:17 rx [1006006.648568] Modules linked in: vrf bridge stp llc vxlan ip6_udp_tunnel udp_tunnel nls_iso8859_1 nft_ct amd64_edac_mod edac_mce_amd kvm_amd kvm crct10dif_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper wmi_bmof ipmi_ssif input_leds joydev rndis_host cdc_ether usbnet ast mii drm_vram_helper ttm drm_kms_helper i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt ccp mac_hid ipmi_si ipmi_devintf ipmi_msghandler sch_fq_codel nf_tables_set nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink ramoops reed_solomon efi_pstore drm ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear mlx5_ib ib_uverbs ib_core raid1 hid_generic mlx5_core pci_hyperv_intf crc32_pclmul usbhid ahci tls mlxfw bnxt_en hid libahci nvme i2c_piix4 nvme_core wmi [last unloaded: cpuid] Aug 7 07:26:17 rx [1006006.726180] CR2: 0000000000000020 Aug 7 07:26:17 rx [1006006.729718] ---[ end trace e0e2e37e4e612984 ]--- Prior to seeing the first crash and on other machines we also see the warning in tcp_send_loss_probe() where packets_out is non-zero, but both transmit and retrans queues are empty so we know the box is seeing some accounting issue in this area: Jul 26 09:15:27 kernel: ------------[ cut here ]------------ Jul 26 09:15:27 kernel: invalid inflight: 2 state 1 cwnd 68 mss 8988 Jul 26 09:15:27 kernel: WARNING: CPU: 16 PID: 0 at net/ipv4/tcp_output.c:2605 tcp_send_loss_probe+0x214/0x220 Jul 26 09:15:27 kernel: Modules linked in: vrf bridge stp llc vxlan ip6_udp_tunnel udp_tunnel nls_iso8859_1 nft_ct amd64_edac_mod edac_mce_amd kvm_amd kvm crct10dif_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper wmi_bmof ipmi_ssif joydev input_leds rndis_host cdc_ether usbnet mii ast drm_vram_helper ttm drm_kms_he> Jul 26 09:15:27 kernel: CPU: 16 PID: 0 Comm: swapper/16 Not tainted 5.4.0-174-generic #193-Ubuntu Jul 26 09:15:27 kernel: Hardware name: Supermicro SMC 2x26 os-gen8 64C NVME-Y 256G/H12SSW-NTR, BIOS 2.5.V1.2U.NVMe.UEFI 05/09/2023 Jul 26 09:15:27 kernel: RIP: 0010:tcp_send_loss_probe+0x214/0x220 Jul 26 09:15:27 kernel: Code: 08 26 01 00 75 e2 41 0f b6 54 24 12 41 8b 8c 24 c0 06 00 00 45 89 f0 48 c7 c7 e0 b4 20 a7 c6 05 8d 08 26 01 01 e8 4a c0 0f 00 <0f> 0b eb ba 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 Jul 26 09:15:27 kernel: RSP: 0018:ffffb7838088ce00 EFLAGS: 00010286 Jul 26 09:15:27 kernel: RAX: 0000000000000000 RBX: ffff9b84b5630430 RCX: 0000000000000006 Jul 26 09:15:27 kernel: RDX: 0000000000000007 RSI: 0000000000000096 RDI: ffff9b8e4621c8c0 Jul 26 09:15:27 kernel: RBP: ffffb7838088ce18 R08: 0000000000000927 R09: 0000000000000004 Jul 26 09:15:27 kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffff9b84b5630000 Jul 26 09:15:27 kernel: R13: 0000000000000000 R14: 000000000000231c R15: ffff9b84b5630430 Jul 26 09:15:27 kernel: FS: 0000000000000000(0000) GS:ffff9b8e46200000(0000) knlGS:0000000000000000 Jul 26 09:15:27 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jul 26 09:15:27 kernel: CR2: 000056238cec2380 CR3: 0000003e49ede005 CR4: 0000000000760ee0 Jul 26 09:15:27 kernel: PKRU: 55555554 Jul 26 09:15:27 kernel: Call Trace: Jul 26 09:15:27 kernel: Jul 26 09:15:27 kernel: ? show_regs.cold+0x1a/0x1f Jul 26 09:15:27 kernel: ? __warn+0x98/0xe0 Jul 26 09:15:27 kernel: ? tcp_send_loss_probe+0x214/0x220 Jul 26 09:15:27 kernel: ? report_bug+0xd1/0x100 Jul 26 09:15:27 kernel: ? do_error_trap+0x9b/0xc0 Jul 26 09:15:27 kernel: ? do_invalid_op+0x3c/0x50 Jul 26 09:15:27 kernel: ? tcp_send_loss_probe+0x214/0x220 Jul 26 09:15:27 kernel: ? invalid_op+0x1e/0x30 Jul 26 09:15:27 kernel: ? tcp_send_loss_probe+0x214/0x220 Jul 26 09:15:27 kernel: tcp_write_timer_handler+0x1b4/0x240 Jul 26 09:15:27 kernel: tcp_write_timer+0x9e/0xe0 Jul 26 09:15:27 kernel: ? tcp_write_timer_handler+0x240/0x240 Jul 26 09:15:27 kernel: call_timer_fn+0x32/0x130 Jul 26 09:15:27 kernel: __run_timers.part.0+0x180/0x280 Jul 26 09:15:27 kernel: ? timerqueue_add+0x9b/0xb0 Jul 26 09:15:27 kernel: ? enqueue_hrtimer+0x3d/0x90 Jul 26 09:15:27 kernel: ? do_error_trap+0x9b/0xc0 Jul 26 09:15:27 kernel: ? do_invalid_op+0x3c/0x50 Jul 26 09:15:27 kernel: ? tcp_send_loss_probe+0x214/0x220 Jul 26 09:15:27 kernel: ? invalid_op+0x1e/0x30 Jul 26 09:15:27 kernel: ? tcp_send_loss_probe+0x214/0x220 Jul 26 09:15:27 kernel: tcp_write_timer_handler+0x1b4/0x240 Jul 26 09:15:27 kernel: tcp_write_timer+0x9e/0xe0 Jul 26 09:15:27 kernel: ? tcp_write_timer_handler+0x240/0x240 Jul 26 09:15:27 kernel: call_timer_fn+0x32/0x130 Jul 26 09:15:27 kernel: __run_timers.part.0+0x180/0x280 Jul 26 09:15:27 kernel: ? timerqueue_add+0x9b/0xb0 Jul 26 09:15:27 kernel: ? enqueue_hrtimer+0x3d/0x90 Jul 26 09:15:27 kernel: ? recalibrate_cpu_khz+0x10/0x10 Jul 26 09:15:27 kernel: ? ktime_get+0x3e/0xa0 Jul 26 09:15:27 kernel: ? native_x2apic_icr_write+0x30/0x30 Jul 26 09:15:27 kernel: run_timer_softirq+0x2a/0x50 Jul 26 09:15:27 kernel: __do_softirq+0xd1/0x2c1 Jul 26 09:15:27 kernel: irq_exit+0xae/0xb0 Jul 26 09:15:27 kernel: smp_apic_timer_interrupt+0x7b/0x140 Jul 26 09:15:27 kernel: apic_timer_interrupt+0xf/0x20 Jul 26 09:15:27 kernel: Jul 26 09:15:27 kernel: RIP: 0010:native_safe_halt+0xe/0x10 Jul 26 09:15:27 kernel: Code: 7b ff ff ff eb bd 90 90 90 90 90 90 e9 07 00 00 00 0f 00 2d 36 2c 50 00 f4 c3 66 90 e9 07 00 00 00 0f 00 2d 26 2c 50 00 fb f4 90 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 e8 dd 5e 61 ff 65 Jul 26 09:15:27 kernel: RSP: 0018:ffffb783801cfe70 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13 Jul 26 09:15:27 kernel: RAX: ffffffffa6908b20 RBX: 0000000000000010 RCX: 0000000000000001 Jul 26 09:15:27 kernel: RDX: 000000006fc0c97e RSI: 0000000000000082 RDI: 0000000000000082 Jul 26 09:15:27 kernel: RBP: ffffb783801cfe90 R08: 0000000000000000 R09: 0000000000000225 Jul 26 09:15:27 kernel: R10: 0000000000100000 R11: 0000000000000000 R12: 0000000000000010 Jul 26 09:15:27 kernel: R13: ffff9b8e390b0000 R14: 0000000000000000 R15: 0000000000000000 Jul 26 09:15:27 kernel: ? __cpuidle_text_start+0x8/0x8 Jul 26 09:15:27 kernel: ? default_idle+0x20/0x140 Jul 26 09:15:27 kernel: arch_cpu_idle+0x15/0x20 Jul 26 09:15:27 kernel: default_idle_call+0x23/0x30 Jul 26 09:15:27 kernel: do_idle+0x1fb/0x270 Jul 26 09:15:27 kernel: cpu_startup_entry+0x20/0x30 Jul 26 09:15:27 kernel: start_secondary+0x178/0x1d0 Jul 26 09:15:27 kernel: secondary_startup_64+0xa4/0xb0 Jul 26 09:15:27 kernel: ---[ end trace e7ac822987e33be1 ]--- The NULL ptr deref is coming from tcp_rto_delta_us() attempting to pull an skb off the head of the retransmit queue and then dereferencing that skb to get the skb_mstamp_ns value via tcp_skb_timestamp_us(skb). The crash is the same one that was reported a # of years ago here: https://lore.kernel.org/netdev/86c0f836-9a7c-438b-d81a-839be45f1f58@gmail.com/T/#t and the kernel we're running has the fix which was added to resolve this issue. Unfortunately we've been unsuccessful so far in reproducing this problem in the lab and do not have the luxury of pushing out a new kernel to try and test if newer kernels resolve this issue at the moment. I realize this is a report against both an Ubuntu kernel and also an older 5.4 kernel. I have reported this issue to Ubuntu here: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2077657 however I feel like since this issue has possibly cropped up again it makes sense to build in some protection in this path (even on the latest kernel versions) since the code in question just blindly assumes there's a valid skb without testing if it's NULL b/f it looks at the timestamp. Given we have seen crashes in this path before and now this case it seems like we should protect ourselves for when packets_out accounting is incorrect. While we should fix that root cause we should also just make sure the skb is not NULL before dereferencing it. Also add a warn once here to capture some information if/when the problem case is hit again. Fixes: e1a10ef7fa87 ("tcp: introduce tcp_rto_delta_us() helper for xmit timer fix") Signed-off-by: Josh Hunt Acked-by: Neal Cardwell Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: zhangshuqi --- include/net/tcp.h | 21 +++++++++++++++++++-- 1 file changed, 19 insertions(+), 2 deletions(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index a3840a2749c1..35be7fd996e5 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -2228,9 +2228,26 @@ static inline s64 tcp_rto_delta_us(const struct sock *sk) { const struct sk_buff *skb = tcp_rtx_queue_head(sk); u32 rto = inet_csk(sk)->icsk_rto; - u64 rto_time_stamp_us = tcp_skb_timestamp_us(skb) + jiffies_to_usecs(rto); - return rto_time_stamp_us - tcp_sk(sk)->tcp_mstamp; + if (likely(skb)) { + u64 rto_time_stamp_us = tcp_skb_timestamp_us(skb) + jiffies_to_usecs(rto); + + return rto_time_stamp_us - tcp_sk(sk)->tcp_mstamp; + } else { + WARN_ONCE(1, + "rtx queue emtpy: " + "out:%u sacked:%u lost:%u retrans:%u " + "tlp_high_seq:%u sk_state:%u ca_state:%u " + "advmss:%u mss_cache:%u pmtu:%u\n", + tcp_sk(sk)->packets_out, tcp_sk(sk)->sacked_out, + tcp_sk(sk)->lost_out, tcp_sk(sk)->retrans_out, + tcp_sk(sk)->tlp_high_seq, sk->sk_state, + inet_csk(sk)->icsk_ca_state, + tcp_sk(sk)->advmss, tcp_sk(sk)->mss_cache, + inet_csk(sk)->icsk_pmtu_cookie); + return jiffies_to_usecs(rto); + } + } /* -- Gitee From 19ff15c6194da7d3b03ac2b961f03fd0278f5cad Mon Sep 17 00:00:00 2001 From: Julian Sun Date: Fri, 23 Aug 2024 21:07:30 +0800 Subject: [PATCH 02/51] vfs: fix race between evice_inodes() and find_inode()&iput() stable inclusion from stable-6.6.54 commit 0eed942bc65de1f93eca7bda51344290f9c573bb category: bugfix issue: #IB55S9 CVE: CVE-2024-47679 Signed-off-by: zhangshuqi --------------------------------------- commit 88b1afbf0f6b221f6c5bb66cc80cd3b38d696687 upstream. Hi, all Recently I noticed a bug[1] in btrfs, after digged it into and I believe it'a race in vfs. Let's assume there's a inode (ie ino 261) with i_count 1 is called by iput(), and there's a concurrent thread calling generic_shutdown_super(). cpu0: cpu1: iput() // i_count is 1 ->spin_lock(inode) ->dec i_count to 0 ->iput_final() generic_shutdown_super() ->__inode_add_lru() ->evict_inodes() // cause some reason[2] ->if (atomic_read(inode->i_count)) continue; // return before // inode 261 passed the above check // list_lru_add_obj() // and then schedule out ->spin_unlock() // note here: the inode 261 // was still at sb list and hash list, // and I_FREEING|I_WILL_FREE was not been set btrfs_iget() // after some function calls ->find_inode() // found the above inode 261 ->spin_lock(inode) // check I_FREEING|I_WILL_FREE // and passed ->__iget() ->spin_unlock(inode) // schedule back ->spin_lock(inode) // check (I_NEW|I_FREEING|I_WILL_FREE) flags, // passed and set I_FREEING iput() ->spin_unlock(inode) ->spin_lock(inode) ->evict() // dec i_count to 0 ->iput_final() ->spin_unlock() ->evict() Now, we have two threads simultaneously evicting the same inode, which may trigger the BUG(inode->i_state & I_CLEAR) statement both within clear_inode() and iput(). To fix the bug, recheck the inode->i_count after holding i_lock. Because in the most scenarios, the first check is valid, and the overhead of spin_lock() can be reduced. If there is any misunderstanding, please let me know, thanks. [1]: https://lore.kernel.org/linux-btrfs/000000000000eabe1d0619c48986@google.com/ [2]: The reason might be 1. SB_ACTIVE was removed or 2. mapping_shrinkable() return false when I reproduced the bug. Reported-by: syzbot+67ba3c42bcbb4665d3ad@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=67ba3c42bcbb4665d3ad CC: stable@vger.kernel.org Fixes: 63997e98a3be ("split invalidate_inodes()") Signed-off-by: Julian Sun Link: https://lore.kernel.org/r/20240823130730.658881-1-sunjunchao2870@gmail.com Reviewed-by: Jan Kara Signed-off-by: Christian Brauner Signed-off-by: Greg Kroah-Hartman Signed-off-by: zhangshuqi --- fs/inode.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/fs/inode.c b/fs/inode.c index ae1a6410b53d..e794d82ee43c 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -722,6 +722,10 @@ void evict_inodes(struct super_block *sb) continue; spin_lock(&inode->i_lock); + if (atomic_read(&inode->i_count)) { + spin_unlock(&inode->i_lock); + continue; + } if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) { spin_unlock(&inode->i_lock); continue; -- Gitee From d643b6d6f794a62c03aa286f51a2386702db2234 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 13 Sep 2024 17:06:15 +0000 Subject: [PATCH 03/51] netfilter: nf_reject_ipv6: fix nf_reject_ip6_tcphdr_put() stable inclusion from stable-6.6.54 commit af4b8a704f26f38310655bad67fd8096293275a2 category: bugfix issue: #IB55S9 CVE: CVE-2024-47685 Signed-off-by: zhangshuqi --------------------------------------- [ Upstream commit 9c778fe48d20ef362047e3376dee56d77f8500d4 ] syzbot reported that nf_reject_ip6_tcphdr_put() was possibly sending garbage on the four reserved tcp bits (th->res1) Use skb_put_zero() to clear the whole TCP header, as done in nf_reject_ip_tcphdr_put() BUG: KMSAN: uninit-value in nf_reject_ip6_tcphdr_put+0x688/0x6c0 net/ipv6/netfilter/nf_reject_ipv6.c:255 nf_reject_ip6_tcphdr_put+0x688/0x6c0 net/ipv6/netfilter/nf_reject_ipv6.c:255 nf_send_reset6+0xd84/0x15b0 net/ipv6/netfilter/nf_reject_ipv6.c:344 nft_reject_inet_eval+0x3c1/0x880 net/netfilter/nft_reject_inet.c:48 expr_call_ops_eval net/netfilter/nf_tables_core.c:240 [inline] nft_do_chain+0x438/0x22a0 net/netfilter/nf_tables_core.c:288 nft_do_chain_inet+0x41a/0x4f0 net/netfilter/nft_chain_filter.c:161 nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline] nf_hook_slow+0xf4/0x400 net/netfilter/core.c:626 nf_hook include/linux/netfilter.h:269 [inline] NF_HOOK include/linux/netfilter.h:312 [inline] ipv6_rcv+0x29b/0x390 net/ipv6/ip6_input.c:310 __netif_receive_skb_one_core net/core/dev.c:5661 [inline] __netif_receive_skb+0x1da/0xa00 net/core/dev.c:5775 process_backlog+0x4ad/0xa50 net/core/dev.c:6108 __napi_poll+0xe7/0x980 net/core/dev.c:6772 napi_poll net/core/dev.c:6841 [inline] net_rx_action+0xa5a/0x19b0 net/core/dev.c:6963 handle_softirqs+0x1ce/0x800 kernel/softirq.c:554 __do_softirq+0x14/0x1a kernel/softirq.c:588 do_softirq+0x9a/0x100 kernel/softirq.c:455 __local_bh_enable_ip+0x9f/0xb0 kernel/softirq.c:382 local_bh_enable include/linux/bottom_half.h:33 [inline] rcu_read_unlock_bh include/linux/rcupdate.h:908 [inline] __dev_queue_xmit+0x2692/0x5610 net/core/dev.c:4450 dev_queue_xmit include/linux/netdevice.h:3105 [inline] neigh_resolve_output+0x9ca/0xae0 net/core/neighbour.c:1565 neigh_output include/net/neighbour.h:542 [inline] ip6_finish_output2+0x2347/0x2ba0 net/ipv6/ip6_output.c:141 __ip6_finish_output net/ipv6/ip6_output.c:215 [inline] ip6_finish_output+0xbb8/0x14b0 net/ipv6/ip6_output.c:226 NF_HOOK_COND include/linux/netfilter.h:303 [inline] ip6_output+0x356/0x620 net/ipv6/ip6_output.c:247 dst_output include/net/dst.h:450 [inline] NF_HOOK include/linux/netfilter.h:314 [inline] ip6_xmit+0x1ba6/0x25d0 net/ipv6/ip6_output.c:366 inet6_csk_xmit+0x442/0x530 net/ipv6/inet6_connection_sock.c:135 __tcp_transmit_skb+0x3b07/0x4880 net/ipv4/tcp_output.c:1466 tcp_transmit_skb net/ipv4/tcp_output.c:1484 [inline] tcp_connect+0x35b6/0x7130 net/ipv4/tcp_output.c:4143 tcp_v6_connect+0x1bcc/0x1e40 net/ipv6/tcp_ipv6.c:333 __inet_stream_connect+0x2ef/0x1730 net/ipv4/af_inet.c:679 inet_stream_connect+0x6a/0xd0 net/ipv4/af_inet.c:750 __sys_connect_file net/socket.c:2061 [inline] __sys_connect+0x606/0x690 net/socket.c:2078 __do_sys_connect net/socket.c:2088 [inline] __se_sys_connect net/socket.c:2085 [inline] __x64_sys_connect+0x91/0xe0 net/socket.c:2085 x64_sys_call+0x27a5/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:43 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Uninit was stored to memory at: nf_reject_ip6_tcphdr_put+0x60c/0x6c0 net/ipv6/netfilter/nf_reject_ipv6.c:249 nf_send_reset6+0xd84/0x15b0 net/ipv6/netfilter/nf_reject_ipv6.c:344 nft_reject_inet_eval+0x3c1/0x880 net/netfilter/nft_reject_inet.c:48 expr_call_ops_eval net/netfilter/nf_tables_core.c:240 [inline] nft_do_chain+0x438/0x22a0 net/netfilter/nf_tables_core.c:288 nft_do_chain_inet+0x41a/0x4f0 net/netfilter/nft_chain_filter.c:161 nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline] nf_hook_slow+0xf4/0x400 net/netfilter/core.c:626 nf_hook include/linux/netfilter.h:269 [inline] NF_HOOK include/linux/netfilter.h:312 [inline] ipv6_rcv+0x29b/0x390 net/ipv6/ip6_input.c:310 __netif_receive_skb_one_core net/core/dev.c:5661 [inline] __netif_receive_skb+0x1da/0xa00 net/core/dev.c:5775 process_backlog+0x4ad/0xa50 net/core/dev.c:6108 __napi_poll+0xe7/0x980 net/core/dev.c:6772 napi_poll net/core/dev.c:6841 [inline] net_rx_action+0xa5a/0x19b0 net/core/dev.c:6963 handle_softirqs+0x1ce/0x800 kernel/softirq.c:554 __do_softirq+0x14/0x1a kernel/softirq.c:588 Uninit was stored to memory at: nf_reject_ip6_tcphdr_put+0x2ca/0x6c0 net/ipv6/netfilter/nf_reject_ipv6.c:231 nf_send_reset6+0xd84/0x15b0 net/ipv6/netfilter/nf_reject_ipv6.c:344 nft_reject_inet_eval+0x3c1/0x880 net/netfilter/nft_reject_inet.c:48 expr_call_ops_eval net/netfilter/nf_tables_core.c:240 [inline] nft_do_chain+0x438/0x22a0 net/netfilter/nf_tables_core.c:288 nft_do_chain_inet+0x41a/0x4f0 net/netfilter/nft_chain_filter.c:161 nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline] nf_hook_slow+0xf4/0x400 net/netfilter/core.c:626 nf_hook include/linux/netfilter.h:269 [inline] NF_HOOK include/linux/netfilter.h:312 [inline] ipv6_rcv+0x29b/0x390 net/ipv6/ip6_input.c:310 __netif_receive_skb_one_core net/core/dev.c:5661 [inline] __netif_receive_skb+0x1da/0xa00 net/core/dev.c:5775 process_backlog+0x4ad/0xa50 net/core/dev.c:6108 __napi_poll+0xe7/0x980 net/core/dev.c:6772 napi_poll net/core/dev.c:6841 [inline] net_rx_action+0xa5a/0x19b0 net/core/dev.c:6963 handle_softirqs+0x1ce/0x800 kernel/softirq.c:554 __do_softirq+0x14/0x1a kernel/softirq.c:588 Uninit was created at: slab_post_alloc_hook mm/slub.c:3998 [inline] slab_alloc_node mm/slub.c:4041 [inline] kmem_cache_alloc_node_noprof+0x6bf/0xb80 mm/slub.c:4084 kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:583 __alloc_skb+0x363/0x7b0 net/core/skbuff.c:674 alloc_skb include/linux/skbuff.h:1320 [inline] nf_send_reset6+0x98d/0x15b0 net/ipv6/netfilter/nf_reject_ipv6.c:327 nft_reject_inet_eval+0x3c1/0x880 net/netfilter/nft_reject_inet.c:48 expr_call_ops_eval net/netfilter/nf_tables_core.c:240 [inline] nft_do_chain+0x438/0x22a0 net/netfilter/nf_tables_core.c:288 nft_do_chain_inet+0x41a/0x4f0 net/netfilter/nft_chain_filter.c:161 nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline] nf_hook_slow+0xf4/0x400 net/netfilter/core.c:626 nf_hook include/linux/netfilter.h:269 [inline] NF_HOOK include/linux/netfilter.h:312 [inline] ipv6_rcv+0x29b/0x390 net/ipv6/ip6_input.c:310 __netif_receive_skb_one_core net/core/dev.c:5661 [inline] __netif_receive_skb+0x1da/0xa00 net/core/dev.c:5775 process_backlog+0x4ad/0xa50 net/core/dev.c:6108 __napi_poll+0xe7/0x980 net/core/dev.c:6772 napi_poll net/core/dev.c:6841 [inline] net_rx_action+0xa5a/0x19b0 net/core/dev.c:6963 handle_softirqs+0x1ce/0x800 kernel/softirq.c:554 __do_softirq+0x14/0x1a kernel/softirq.c:588 Fixes: c8d7b98bec43 ("netfilter: move nf_send_resetX() code to nf_reject_ipvX modules") Reported-by: syzbot Signed-off-by: Eric Dumazet Reviewed-by: Simon Horman Reviewed-by: Pablo Neira Ayuso Link: https://patch.msgid.link/20240913170615.3670897-1-edumazet@google.com Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin Signed-off-by: zhangshuqi --- net/ipv6/netfilter/nf_reject_ipv6.c | 14 ++------------ 1 file changed, 2 insertions(+), 12 deletions(-) diff --git a/net/ipv6/netfilter/nf_reject_ipv6.c b/net/ipv6/netfilter/nf_reject_ipv6.c index 71d692728230..690d1c047691 100644 --- a/net/ipv6/netfilter/nf_reject_ipv6.c +++ b/net/ipv6/netfilter/nf_reject_ipv6.c @@ -223,33 +223,23 @@ void nf_reject_ip6_tcphdr_put(struct sk_buff *nskb, const struct tcphdr *oth, unsigned int otcplen) { struct tcphdr *tcph; - int needs_ack; skb_reset_transport_header(nskb); - tcph = skb_put(nskb, sizeof(struct tcphdr)); + tcph = skb_put_zero(nskb, sizeof(struct tcphdr)); /* Truncate to length (no data) */ tcph->doff = sizeof(struct tcphdr)/4; tcph->source = oth->dest; tcph->dest = oth->source; if (oth->ack) { - needs_ack = 0; tcph->seq = oth->ack_seq; - tcph->ack_seq = 0; } else { - needs_ack = 1; tcph->ack_seq = htonl(ntohl(oth->seq) + oth->syn + oth->fin + otcplen - (oth->doff<<2)); - tcph->seq = 0; + tcph->ack = 1; } - /* Reset flags */ - ((u_int8_t *)tcph)[13] = 0; tcph->rst = 1; - tcph->ack = needs_ack; - tcph->window = 0; - tcph->urg_ptr = 0; - tcph->check = 0; /* Adjust TCP checksum */ tcph->check = csum_ipv6_magic(&ipv6_hdr(nskb)->saddr, -- Gitee From 65db677e9c71adef62a53932a7de465479d151c2 Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Fri, 13 Sep 2024 21:17:48 +0200 Subject: [PATCH 04/51] bpf: Fix helper writes to read-only maps stable inclusion from stable-6.6.54 commit a2c8dc7e21803257e762b0bf067fd13e9c995da0 category: bugfix issue: #IB55S9 CVE: CVE-2024-49861 Signed-off-by: zhangshuqi --------------------------------------- [ Upstream commit 32556ce93bc45c730829083cb60f95a2728ea48b ] Lonial found an issue that despite user- and BPF-side frozen BPF map (like in case of .rodata), it was still possible to write into it from a BPF program side through specific helpers having ARG_PTR_TO_{LONG,INT} as arguments. In check_func_arg() when the argument is as mentioned, the meta->raw_mode is never set. Later, check_helper_mem_access(), under the case of PTR_TO_MAP_VALUE as register base type, it assumes BPF_READ for the subsequent call to check_map_access_type() and given the BPF map is read-only it succeeds. The helpers really need to be annotated as ARG_PTR_TO_{LONG,INT} | MEM_UNINIT when results are written into them as opposed to read out of them. The latter indicates that it's okay to pass a pointer to uninitialized memory as the memory is written to anyway. However, ARG_PTR_TO_{LONG,INT} is a special case of ARG_PTR_TO_FIXED_SIZE_MEM just with additional alignment requirement. So it is better to just get rid of the ARG_PTR_TO_{LONG,INT} special cases altogether and reuse the fixed size memory types. For this, add MEM_ALIGNED to additionally ensure alignment given these helpers write directly into the args via * = val. The .arg*_size has been initialized reflecting the actual sizeof(*). MEM_ALIGNED can only be used in combination with MEM_FIXED_SIZE annotated argument types, since in !MEM_FIXED_SIZE cases the verifier does not know the buffer size a priori and therefore cannot blindly write * = val. Fixes: 57c3bb725a3d ("bpf: Introduce ARG_PTR_TO_{INT,LONG} arg types") Reported-by: Lonial Con Signed-off-by: Daniel Borkmann Acked-by: Andrii Nakryiko Acked-by: Shung-Hsi Yu Link: https://lore.kernel.org/r/20240913191754.13290-3-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov Signed-off-by: Sasha Levin Signed-off-by: zhangshuqi --- include/linux/bpf.h | 7 +++++-- kernel/bpf/helpers.c | 6 ++++-- kernel/bpf/syscall.c | 3 ++- kernel/bpf/verifier.c | 41 +++++----------------------------------- kernel/trace/bpf_trace.c | 6 ++++-- net/core/filter.c | 6 ++++-- 6 files changed, 24 insertions(+), 45 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 2ebb5d4d43dc..16f590237c39 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -673,6 +673,11 @@ enum bpf_type_flag { /* DYNPTR points to xdp_buff */ DYNPTR_TYPE_XDP = BIT(16 + BPF_BASE_TYPE_BITS), + /* Memory must be aligned on some architectures, used in combination with + * MEM_FIXED_SIZE. + */ + MEM_ALIGNED = BIT(17 + BPF_BASE_TYPE_BITS), + __BPF_TYPE_FLAG_MAX, __BPF_TYPE_LAST_FLAG = __BPF_TYPE_FLAG_MAX - 1, }; @@ -709,8 +714,6 @@ enum bpf_arg_type { ARG_ANYTHING, /* any (initialized) argument is ok */ ARG_PTR_TO_SPIN_LOCK, /* pointer to bpf_spin_lock */ ARG_PTR_TO_SOCK_COMMON, /* pointer to sock_common */ - ARG_PTR_TO_INT, /* pointer to int */ - ARG_PTR_TO_LONG, /* pointer to long */ ARG_PTR_TO_SOCKET, /* pointer to bpf_sock (fullsock) */ ARG_PTR_TO_BTF_ID, /* pointer to in-kernel struct */ ARG_PTR_TO_RINGBUF_MEM, /* pointer to dynamically reserved ringbuf memory */ diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index 31da67703307..6951a2346a3c 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -537,7 +537,8 @@ const struct bpf_func_proto bpf_strtol_proto = { .arg1_type = ARG_PTR_TO_MEM | MEM_RDONLY, .arg2_type = ARG_CONST_SIZE, .arg3_type = ARG_ANYTHING, - .arg4_type = ARG_PTR_TO_LONG, + .arg4_type = ARG_PTR_TO_FIXED_SIZE_MEM | MEM_UNINIT | MEM_ALIGNED, + .arg4_size = sizeof(s64), }; BPF_CALL_4(bpf_strtoul, const char *, buf, size_t, buf_len, u64, flags, @@ -565,7 +566,8 @@ const struct bpf_func_proto bpf_strtoul_proto = { .arg1_type = ARG_PTR_TO_MEM | MEM_RDONLY, .arg2_type = ARG_CONST_SIZE, .arg3_type = ARG_ANYTHING, - .arg4_type = ARG_PTR_TO_LONG, + .arg4_type = ARG_PTR_TO_FIXED_SIZE_MEM | MEM_UNINIT | MEM_ALIGNED, + .arg4_size = sizeof(u64), }; BPF_CALL_3(bpf_strncmp, const char *, s1, u32, s1_sz, const char *, s2) diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 4902a7487f07..286377e6878e 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -5653,7 +5653,8 @@ static const struct bpf_func_proto bpf_kallsyms_lookup_name_proto = { .arg1_type = ARG_PTR_TO_MEM, .arg2_type = ARG_CONST_SIZE_OR_ZERO, .arg3_type = ARG_ANYTHING, - .arg4_type = ARG_PTR_TO_LONG, + .arg4_type = ARG_PTR_TO_FIXED_SIZE_MEM | MEM_UNINIT | MEM_ALIGNED, + .arg4_size = sizeof(u64), }; static const struct bpf_func_proto * diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index c9fc734989c6..b42b05b8cb5d 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -7978,16 +7978,6 @@ static bool arg_type_is_dynptr(enum bpf_arg_type type) return base_type(type) == ARG_PTR_TO_DYNPTR; } -static int int_ptr_type_to_size(enum bpf_arg_type type) -{ - if (type == ARG_PTR_TO_INT) - return sizeof(u32); - else if (type == ARG_PTR_TO_LONG) - return sizeof(u64); - - return -EINVAL; -} - static int resolve_map_arg_type(struct bpf_verifier_env *env, const struct bpf_call_arg_meta *meta, enum bpf_arg_type *arg_type) @@ -8060,16 +8050,6 @@ static const struct bpf_reg_types mem_types = { }, }; -static const struct bpf_reg_types int_ptr_types = { - .types = { - PTR_TO_STACK, - PTR_TO_PACKET, - PTR_TO_PACKET_META, - PTR_TO_MAP_KEY, - PTR_TO_MAP_VALUE, - }, -}; - static const struct bpf_reg_types spin_lock_types = { .types = { PTR_TO_MAP_VALUE, @@ -8124,8 +8104,6 @@ static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = { [ARG_PTR_TO_SPIN_LOCK] = &spin_lock_types, [ARG_PTR_TO_MEM] = &mem_types, [ARG_PTR_TO_RINGBUF_MEM] = &ringbuf_mem_types, - [ARG_PTR_TO_INT] = &int_ptr_types, - [ARG_PTR_TO_LONG] = &int_ptr_types, [ARG_PTR_TO_PERCPU_BTF_ID] = &percpu_btf_ptr_types, [ARG_PTR_TO_FUNC] = &func_ptr_types, [ARG_PTR_TO_STACK] = &stack_ptr_types, @@ -8632,9 +8610,11 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg, */ meta->raw_mode = arg_type & MEM_UNINIT; if (arg_type & MEM_FIXED_SIZE) { - err = check_helper_mem_access(env, regno, - fn->arg_size[arg], false, - meta); + err = check_helper_mem_access(env, regno, fn->arg_size[arg], false, meta); + if (err) + return err; + if (arg_type & MEM_ALIGNED) + err = check_ptr_alignment(env, reg, 0, fn->arg_size[arg], true); } break; case ARG_CONST_SIZE: @@ -8659,17 +8639,6 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg, if (err) return err; break; - case ARG_PTR_TO_INT: - case ARG_PTR_TO_LONG: - { - int size = int_ptr_type_to_size(arg_type); - - err = check_helper_mem_access(env, regno, size, false, meta); - if (err) - return err; - err = check_ptr_alignment(env, reg, 0, size, true); - break; - } case ARG_PTR_TO_CONST_STR: { struct bpf_map *map = reg->map_ptr; diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index 1e79084a9d9d..7a1d6cd15a81 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -1220,7 +1220,8 @@ static const struct bpf_func_proto bpf_get_func_arg_proto = { .ret_type = RET_INTEGER, .arg1_type = ARG_PTR_TO_CTX, .arg2_type = ARG_ANYTHING, - .arg3_type = ARG_PTR_TO_LONG, + .arg3_type = ARG_PTR_TO_FIXED_SIZE_MEM | MEM_UNINIT | MEM_ALIGNED, + .arg3_size = sizeof(u64), }; BPF_CALL_2(get_func_ret, void *, ctx, u64 *, value) @@ -1236,7 +1237,8 @@ static const struct bpf_func_proto bpf_get_func_ret_proto = { .func = get_func_ret, .ret_type = RET_INTEGER, .arg1_type = ARG_PTR_TO_CTX, - .arg2_type = ARG_PTR_TO_LONG, + .arg2_type = ARG_PTR_TO_FIXED_SIZE_MEM | MEM_UNINIT | MEM_ALIGNED, + .arg2_size = sizeof(u64), }; BPF_CALL_1(get_func_arg_cnt, void *, ctx) diff --git a/net/core/filter.c b/net/core/filter.c index 24f23a30c945..3b4a98d2b9c1 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -6272,7 +6272,8 @@ static const struct bpf_func_proto bpf_skb_check_mtu_proto = { .ret_type = RET_INTEGER, .arg1_type = ARG_PTR_TO_CTX, .arg2_type = ARG_ANYTHING, - .arg3_type = ARG_PTR_TO_INT, + .arg3_type = ARG_PTR_TO_FIXED_SIZE_MEM | MEM_UNINIT | MEM_ALIGNED, + .arg3_size = sizeof(u32), .arg4_type = ARG_ANYTHING, .arg5_type = ARG_ANYTHING, }; @@ -6283,7 +6284,8 @@ static const struct bpf_func_proto bpf_xdp_check_mtu_proto = { .ret_type = RET_INTEGER, .arg1_type = ARG_PTR_TO_CTX, .arg2_type = ARG_ANYTHING, - .arg3_type = ARG_PTR_TO_INT, + .arg3_type = ARG_PTR_TO_FIXED_SIZE_MEM | MEM_UNINIT | MEM_ALIGNED, + .arg3_size = sizeof(u32), .arg4_type = ARG_ANYTHING, .arg5_type = ARG_ANYTHING, }; -- Gitee From 8285444b946b19cc17bda2c7a6aa4ef1eca8f96e Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 29 Aug 2024 14:46:39 +0000 Subject: [PATCH 05/51] icmp: change the order of rate limits stable inclusion from stable-6.6.54 commit 662ec52260cc07b9ae53ecd3925183c29d34288b category: bugfix issue: #IB55S9 CVE: CVE-2024-47678 Signed-off-by: zhangshuqi --------------------------------------- commit 8c2bd38b95f75f3d2a08c93e35303e26d480d24e upstream. ICMP messages are ratelimited : After the blamed commits, the two rate limiters are applied in this order: 1) host wide ratelimit (icmp_global_allow()) 2) Per destination ratelimit (inetpeer based) In order to avoid side-channels attacks, we need to apply the per destination check first. This patch makes the following change : 1) icmp_global_allow() checks if the host wide limit is reached. But credits are not yet consumed. This is deferred to 3) 2) The per destination limit is checked/updated. This might add a new node in inetpeer tree. 3) icmp_global_consume() consumes tokens if prior operations succeeded. This means that host wide ratelimit is still effective in keeping inetpeer tree small even under DDOS. As a bonus, I removed icmp_global.lock as the fast path can use a lock-free operation. Fixes: c0303efeab73 ("net: reduce cycles spend on ICMP replies that gets rate limited") Fixes: 4cdf507d5452 ("icmp: add a global rate limitation") Reported-by: Keyu Man Signed-off-by: Eric Dumazet Reviewed-by: David Ahern Cc: Jesper Dangaard Brouer Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20240829144641.3880376-2-edumazet@google.com Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman Signed-off-by: zhangshuqi --- include/net/ip.h | 2 + net/ipv4/icmp.c | 103 ++++++++++++++++++++++++++--------------------- net/ipv6/icmp.c | 28 ++++++++----- 3 files changed, 76 insertions(+), 57 deletions(-) diff --git a/include/net/ip.h b/include/net/ip.h index 162cf2d2f841..0263f40f3ae9 100644 --- a/include/net/ip.h +++ b/include/net/ip.h @@ -787,6 +787,8 @@ static inline void ip_cmsg_recv(struct msghdr *msg, struct sk_buff *skb) } bool icmp_global_allow(void); +void icmp_global_consume(void); + extern int sysctl_icmp_msgs_per_sec; extern int sysctl_icmp_msgs_burst; diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c index 3b221643206d..9dffdd876fef 100644 --- a/net/ipv4/icmp.c +++ b/net/ipv4/icmp.c @@ -222,57 +222,59 @@ int sysctl_icmp_msgs_per_sec __read_mostly = 1000; int sysctl_icmp_msgs_burst __read_mostly = 50; static struct { - spinlock_t lock; - u32 credit; + atomic_t credit; u32 stamp; -} icmp_global = { - .lock = __SPIN_LOCK_UNLOCKED(icmp_global.lock), -}; +} icmp_global; /** * icmp_global_allow - Are we allowed to send one more ICMP message ? * * Uses a token bucket to limit our ICMP messages to ~sysctl_icmp_msgs_per_sec. * Returns false if we reached the limit and can not send another packet. - * Note: called with BH disabled + * Works in tandem with icmp_global_consume(). */ bool icmp_global_allow(void) { - u32 credit, delta, incr = 0, now = (u32)jiffies; - bool rc = false; + u32 delta, now, oldstamp; + int incr, new, old; - /* Check if token bucket is empty and cannot be refilled - * without taking the spinlock. The READ_ONCE() are paired - * with the following WRITE_ONCE() in this same function. + /* Note: many cpus could find this condition true. + * Then later icmp_global_consume() could consume more credits, + * this is an acceptable race. */ - if (!READ_ONCE(icmp_global.credit)) { - delta = min_t(u32, now - READ_ONCE(icmp_global.stamp), HZ); - if (delta < HZ / 50) - return false; - } + if (atomic_read(&icmp_global.credit) > 0) + return true; - spin_lock(&icmp_global.lock); - delta = min_t(u32, now - icmp_global.stamp, HZ); - if (delta >= HZ / 50) { - incr = READ_ONCE(sysctl_icmp_msgs_per_sec) * delta / HZ; - if (incr) - WRITE_ONCE(icmp_global.stamp, now); - } - credit = min_t(u32, icmp_global.credit + incr, - READ_ONCE(sysctl_icmp_msgs_burst)); - if (credit) { - /* We want to use a credit of one in average, but need to randomize - * it for security reasons. - */ - credit = max_t(int, credit - get_random_u32_below(3), 0); - rc = true; + now = jiffies; + oldstamp = READ_ONCE(icmp_global.stamp); + delta = min_t(u32, now - oldstamp, HZ); + if (delta < HZ / 50) + return false; + + incr = READ_ONCE(sysctl_icmp_msgs_per_sec) * delta / HZ; + if (!incr) + return false; + + if (cmpxchg(&icmp_global.stamp, oldstamp, now) == oldstamp) { + old = atomic_read(&icmp_global.credit); + do { + new = min(old + incr, READ_ONCE(sysctl_icmp_msgs_burst)); + } while (!atomic_try_cmpxchg(&icmp_global.credit, &old, new)); } - WRITE_ONCE(icmp_global.credit, credit); - spin_unlock(&icmp_global.lock); - return rc; + return true; } EXPORT_SYMBOL(icmp_global_allow); +void icmp_global_consume(void) +{ + int credits = get_random_u32_below(3); + + /* Note: this might make icmp_global.credit negative. */ + if (credits) + atomic_sub(credits, &icmp_global.credit); +} +EXPORT_SYMBOL(icmp_global_consume); + static bool icmpv4_mask_allow(struct net *net, int type, int code) { if (type > NR_ICMP_TYPES) @@ -289,14 +291,16 @@ static bool icmpv4_mask_allow(struct net *net, int type, int code) return false; } -static bool icmpv4_global_allow(struct net *net, int type, int code) +static bool icmpv4_global_allow(struct net *net, int type, int code, + bool *apply_ratelimit) { if (icmpv4_mask_allow(net, type, code)) return true; - if (icmp_global_allow()) + if (icmp_global_allow()) { + *apply_ratelimit = true; return true; - + } __ICMP_INC_STATS(net, ICMP_MIB_RATELIMITGLOBAL); return false; } @@ -306,15 +310,16 @@ static bool icmpv4_global_allow(struct net *net, int type, int code) */ static bool icmpv4_xrlim_allow(struct net *net, struct rtable *rt, - struct flowi4 *fl4, int type, int code) + struct flowi4 *fl4, int type, int code, + bool apply_ratelimit) { struct dst_entry *dst = &rt->dst; struct inet_peer *peer; bool rc = true; int vif; - if (icmpv4_mask_allow(net, type, code)) - goto out; + if (!apply_ratelimit) + return true; /* No rate limit on loopback */ if (dst->dev && (dst->dev->flags&IFF_LOOPBACK)) @@ -329,6 +334,8 @@ static bool icmpv4_xrlim_allow(struct net *net, struct rtable *rt, out: if (!rc) __ICMP_INC_STATS(net, ICMP_MIB_RATELIMITHOST); + else + icmp_global_consume(); return rc; } @@ -400,6 +407,7 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb) struct ipcm_cookie ipc; struct rtable *rt = skb_rtable(skb); struct net *net = dev_net(rt->dst.dev); + bool apply_ratelimit = false; struct flowi4 fl4; struct sock *sk; struct inet_sock *inet; @@ -411,11 +419,11 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb) if (ip_options_echo(net, &icmp_param->replyopts.opt.opt, skb)) return; - /* Needed by both icmp_global_allow and icmp_xmit_lock */ + /* Needed by both icmpv4_global_allow and icmp_xmit_lock */ local_bh_disable(); - /* global icmp_msgs_per_sec */ - if (!icmpv4_global_allow(net, type, code)) + /* is global icmp_msgs_per_sec exhausted ? */ + if (!icmpv4_global_allow(net, type, code, &apply_ratelimit)) goto out_bh_enable; sk = icmp_xmit_lock(net); @@ -448,7 +456,7 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb) rt = ip_route_output_key(net, &fl4); if (IS_ERR(rt)) goto out_unlock; - if (icmpv4_xrlim_allow(net, rt, &fl4, type, code)) + if (icmpv4_xrlim_allow(net, rt, &fl4, type, code, apply_ratelimit)) icmp_push_reply(sk, icmp_param, &fl4, &ipc, &rt); ip_rt_put(rt); out_unlock: @@ -592,6 +600,7 @@ void __icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info, int room; struct icmp_bxm icmp_param; struct rtable *rt = skb_rtable(skb_in); + bool apply_ratelimit = false; struct ipcm_cookie ipc; struct flowi4 fl4; __be32 saddr; @@ -673,7 +682,7 @@ void __icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info, } } - /* Needed by both icmp_global_allow and icmp_xmit_lock */ + /* Needed by both icmpv4_global_allow and icmp_xmit_lock */ local_bh_disable(); /* Check global sysctl_icmp_msgs_per_sec ratelimit, unless @@ -681,7 +690,7 @@ void __icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info, * loopback, then peer ratelimit still work (in icmpv4_xrlim_allow) */ if (!(skb_in->dev && (skb_in->dev->flags&IFF_LOOPBACK)) && - !icmpv4_global_allow(net, type, code)) + !icmpv4_global_allow(net, type, code, &apply_ratelimit)) goto out_bh_enable; sk = icmp_xmit_lock(net); @@ -740,7 +749,7 @@ void __icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info, goto out_unlock; /* peer icmp_ratelimit */ - if (!icmpv4_xrlim_allow(net, rt, &fl4, type, code)) + if (!icmpv4_xrlim_allow(net, rt, &fl4, type, code, apply_ratelimit)) goto ende; /* RFC says return as much as we can without exceeding 576 bytes. */ diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c index 93a594a901d1..a790294d3104 100644 --- a/net/ipv6/icmp.c +++ b/net/ipv6/icmp.c @@ -175,14 +175,16 @@ static bool icmpv6_mask_allow(struct net *net, int type) return false; } -static bool icmpv6_global_allow(struct net *net, int type) +static bool icmpv6_global_allow(struct net *net, int type, + bool *apply_ratelimit) { if (icmpv6_mask_allow(net, type)) return true; - if (icmp_global_allow()) + if (icmp_global_allow()) { + *apply_ratelimit = true; return true; - + } __ICMP_INC_STATS(net, ICMP_MIB_RATELIMITGLOBAL); return false; } @@ -191,13 +193,13 @@ static bool icmpv6_global_allow(struct net *net, int type) * Check the ICMP output rate limit */ static bool icmpv6_xrlim_allow(struct sock *sk, u8 type, - struct flowi6 *fl6) + struct flowi6 *fl6, bool apply_ratelimit) { struct net *net = sock_net(sk); struct dst_entry *dst; bool res = false; - if (icmpv6_mask_allow(net, type)) + if (!apply_ratelimit) return true; /* @@ -228,6 +230,8 @@ static bool icmpv6_xrlim_allow(struct sock *sk, u8 type, if (!res) __ICMP6_INC_STATS(net, ip6_dst_idev(dst), ICMP6_MIB_RATELIMITHOST); + else + icmp_global_consume(); dst_release(dst); return res; } @@ -452,6 +456,7 @@ void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info, struct net *net; struct ipv6_pinfo *np; const struct in6_addr *saddr = NULL; + bool apply_ratelimit = false; struct dst_entry *dst; struct icmp6hdr tmp_hdr; struct flowi6 fl6; @@ -533,11 +538,12 @@ void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info, return; } - /* Needed by both icmp_global_allow and icmpv6_xmit_lock */ + /* Needed by both icmpv6_global_allow and icmpv6_xmit_lock */ local_bh_disable(); /* Check global sysctl_icmp_msgs_per_sec ratelimit */ - if (!(skb->dev->flags & IFF_LOOPBACK) && !icmpv6_global_allow(net, type)) + if (!(skb->dev->flags & IFF_LOOPBACK) && + !icmpv6_global_allow(net, type, &apply_ratelimit)) goto out_bh_enable; mip6_addr_swap(skb, parm); @@ -575,7 +581,7 @@ void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info, np = inet6_sk(sk); - if (!icmpv6_xrlim_allow(sk, type, &fl6)) + if (!icmpv6_xrlim_allow(sk, type, &fl6, apply_ratelimit)) goto out; tmp_hdr.icmp6_type = type; @@ -717,6 +723,7 @@ static enum skb_drop_reason icmpv6_echo_reply(struct sk_buff *skb) struct ipv6_pinfo *np; const struct in6_addr *saddr = NULL; struct icmp6hdr *icmph = icmp6_hdr(skb); + bool apply_ratelimit = false; struct icmp6hdr tmp_hdr; struct flowi6 fl6; struct icmpv6_msg msg; @@ -781,8 +788,9 @@ static enum skb_drop_reason icmpv6_echo_reply(struct sk_buff *skb) goto out; /* Check the ratelimit */ - if ((!(skb->dev->flags & IFF_LOOPBACK) && !icmpv6_global_allow(net, ICMPV6_ECHO_REPLY)) || - !icmpv6_xrlim_allow(sk, ICMPV6_ECHO_REPLY, &fl6)) + if ((!(skb->dev->flags & IFF_LOOPBACK) && + !icmpv6_global_allow(net, ICMPV6_ECHO_REPLY, &apply_ratelimit)) || + !icmpv6_xrlim_allow(sk, ICMPV6_ECHO_REPLY, &fl6, apply_ratelimit)) goto out_dst_release; idev = __in6_dev_get(skb->dev); -- Gitee From 49cda1267f3718140cb621da0f88b901a5139b8e Mon Sep 17 00:00:00 2001 From: Jann Horn Date: Wed, 28 Aug 2024 01:45:48 +0200 Subject: [PATCH 06/51] firmware_loader: Block path traversal stable inclusion from stable-6.6.54 commit 7420c1bf7fc784e587b87329cc6dfa3dca537aa4 category: bugfix issue: #IB55S9 CVE: CVE-2024-47742 Signed-off-by: zhangshuqi --------------------------------------- commit f0e5311aa8022107d63c54e2f03684ec097d1394 upstream. Most firmware names are hardcoded strings, or are constructed from fairly constrained format strings where the dynamic parts are just some hex numbers or such. However, there are a couple codepaths in the kernel where firmware file names contain string components that are passed through from a device or semi-privileged userspace; the ones I could find (not counting interfaces that require root privileges) are: - lpfc_sli4_request_firmware_update() seems to construct the firmware filename from "ModelName", a string that was previously parsed out of some descriptor ("Vital Product Data") in lpfc_fill_vpd() - nfp_net_fw_find() seems to construct a firmware filename from a model name coming from nfp_hwinfo_lookup(pf->hwinfo, "nffw.partno"), which I think parses some descriptor that was read from the device. (But this case likely isn't exploitable because the format string looks like "netronome/nic_%s", and there shouldn't be any *folders* starting with "netronome/nic_". The previous case was different because there, the "%s" is *at the start* of the format string.) - module_flash_fw_schedule() is reachable from the ETHTOOL_MSG_MODULE_FW_FLASH_ACT netlink command, which is marked as GENL_UNS_ADMIN_PERM (meaning CAP_NET_ADMIN inside a user namespace is enough to pass the privilege check), and takes a userspace-provided firmware name. (But I think to reach this case, you need to have CAP_NET_ADMIN over a network namespace that a special kind of ethernet device is mapped into, so I think this is not a viable attack path in practice.) Fix it by rejecting any firmware names containing ".." path components. For what it's worth, I went looking and haven't found any USB device drivers that use the firmware loader dangerously. Cc: stable@vger.kernel.org Reviewed-by: Danilo Krummrich Fixes: abb139e75c2c ("firmware: teach the kernel to load firmware files directly from the filesystem") Signed-off-by: Jann Horn Acked-by: Luis Chamberlain Link: https://lore.kernel.org/r/20240828-firmware-traversal-v3-1-c76529c63b5f@google.com Signed-off-by: Greg Kroah-Hartman Signed-off-by: zhangshuqi --- drivers/base/firmware_loader/main.c | 30 +++++++++++++++++++++++++++++ 1 file changed, 30 insertions(+) diff --git a/drivers/base/firmware_loader/main.c b/drivers/base/firmware_loader/main.c index b58c42f1b1ce..0b18c6b46e65 100644 --- a/drivers/base/firmware_loader/main.c +++ b/drivers/base/firmware_loader/main.c @@ -844,6 +844,26 @@ static void fw_log_firmware_info(const struct firmware *fw, const char *name, {} #endif +/* + * Reject firmware file names with ".." path components. + * There are drivers that construct firmware file names from device-supplied + * strings, and we don't want some device to be able to tell us "I would like to + * be sent my firmware from ../../../etc/shadow, please". + * + * Search for ".." surrounded by either '/' or start/end of string. + * + * This intentionally only looks at the firmware name, not at the firmware base + * directory or at symlink contents. + */ +static bool name_contains_dotdot(const char *name) +{ + size_t name_len = strlen(name); + + return strcmp(name, "..") == 0 || strncmp(name, "../", 3) == 0 || + strstr(name, "/../") != NULL || + (name_len >= 3 && strcmp(name+name_len-3, "/..") == 0); +} + /* called from request_firmware() and request_firmware_work_func() */ static int _request_firmware(const struct firmware **firmware_p, const char *name, @@ -864,6 +884,14 @@ _request_firmware(const struct firmware **firmware_p, const char *name, goto out; } + if (name_contains_dotdot(name)) { + dev_warn(device, + "Firmware load for '%s' refused, path contains '..' component\n", + name); + ret = -EINVAL; + goto out; + } + ret = _request_firmware_prepare(&fw, name, device, buf, size, offset, opt_flags); if (ret <= 0) /* error or already assigned */ @@ -941,6 +969,8 @@ _request_firmware(const struct firmware **firmware_p, const char *name, * @name will be used as $FIRMWARE in the uevent environment and * should be distinctive enough not to be confused with any other * firmware image for this or any other device. + * It must not contain any ".." path components - "foo/bar..bin" is + * allowed, but "foo/../bar.bin" is not. * * Caller must hold the reference count of @device. * -- Gitee From cef1eed71400e4547d095e2fcc8ffd607da90d9e Mon Sep 17 00:00:00 2001 From: "Jiri Slaby (SUSE)" Date: Mon, 5 Aug 2024 12:20:35 +0200 Subject: [PATCH 07/51] serial: protect uart_port_dtr_rts() in uart_shutdown() too stable inclusion from stable-6.6.57 commit e418d91195d29d5f9c9685ff309b92b04b41dc40 category: bugfix issue: #IB55S9 CVE: CVE-2024-50058 Signed-off-by: zhangshuqi --------------------------------------- [ Upstream commit 602babaa84d627923713acaf5f7e9a4369e77473 ] Commit af224ca2df29 (serial: core: Prevent unsafe uart port access, part 3) added few uport == NULL checks. It added one to uart_shutdown(), so the commit assumes, uport can be NULL in there. But right after that protection, there is an unprotected "uart_port_dtr_rts(uport, false);" call. That is invoked only if HUPCL is set, so I assume that is the reason why we do not see lots of these reports. Or it cannot be NULL at this point at all for some reason :P. Until the above is investigated, stay on the safe side and move this dereference to the if too. I got this inconsistency from Coverity under CID 1585130. Thanks. Signed-off-by: Jiri Slaby (SUSE) Cc: Peter Hurley Cc: Greg Kroah-Hartman Link: https://lore.kernel.org/r/20240805102046.307511-3-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin Signed-off-by: zhangshuqi --- drivers/tty/serial/serial_core.c | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c index 2eceef54e0b3..8192ba8048a6 100644 --- a/drivers/tty/serial/serial_core.c +++ b/drivers/tty/serial/serial_core.c @@ -374,14 +374,16 @@ static void uart_shutdown(struct tty_struct *tty, struct uart_state *state) /* * Turn off DTR and RTS early. */ - if (uport && uart_console(uport) && tty) { - uport->cons->cflag = tty->termios.c_cflag; - uport->cons->ispeed = tty->termios.c_ispeed; - uport->cons->ospeed = tty->termios.c_ospeed; - } + if (uport) { + if (uart_console(uport) && tty) { + uport->cons->cflag = tty->termios.c_cflag; + uport->cons->ispeed = tty->termios.c_ispeed; + uport->cons->ospeed = tty->termios.c_ospeed; + } - if (!tty || C_HUPCL(tty)) - uart_port_dtr_rts(uport, false); + if (!tty || C_HUPCL(tty)) + uart_port_dtr_rts(uport, false); + } uart_port_shutdown(port); } -- Gitee From 1bc9be77fdbe45072fb4b863777385c756df2e50 Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Thu, 19 Sep 2024 15:28:53 +0200 Subject: [PATCH 08/51] bpf: Fix use-after-free in bpf_uprobe_multi_link_attach() stable inclusion from stable-6.6.54 commit 790c630ab0e7d7aba6d186581d4627c09fce60f3 category: bugfix issue: #IB55S9 CVE: CVE-2024-47675 Signed-off-by: zhangshuqi --------------------------------------- commit 5fe6e308abaea082c20fbf2aa5df8e14495622cf upstream. If bpf_link_prime() fails, bpf_uprobe_multi_link_attach() goes to the error_free label and frees the array of bpf_uprobe's without calling bpf_uprobe_unregister(). This leaks bpf_uprobe->uprobe and worse, this frees bpf_uprobe->consumer without removing it from the uprobe->consumers list. Fixes: 89ae89f53d20 ("bpf: Add multi uprobe link") Closes: https://lore.kernel.org/all/000000000000382d39061f59f2dd@google.com/ Reported-by: syzbot+f7a1c2c2711e4a780f19@syzkaller.appspotmail.com Signed-off-by: Oleg Nesterov Signed-off-by: Peter Zijlstra (Intel) Acked-by: Andrii Nakryiko Acked-by: Jiri Olsa Tested-by: syzbot+f7a1c2c2711e4a780f19@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240813152524.GA7292@redhat.com Signed-off-by: Greg Kroah-Hartman Signed-off-by: zhangshuqi --- kernel/trace/bpf_trace.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index 7a1d6cd15a81..9a4758bedbff 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -3291,18 +3291,21 @@ int bpf_uprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr ref_ctr_offsets ? ref_ctr_offsets[i] : 0, &uprobes[i].consumer); if (err) { - bpf_uprobe_unregister(&path, uprobes, i); - goto error_free; + link->cnt = i; + goto error_unregister; } } err = bpf_link_prime(&link->link, &link_primer); if (err) - goto error_free; + goto error_unregister; kvfree(ref_ctr_offsets); return bpf_link_settle(&link_primer); +error_unregister: + bpf_uprobe_unregister(&path, uprobes, link->cnt); + error_free: kvfree(ref_ctr_offsets); kvfree(uprobes); -- Gitee From e81603109f75ed36ed423d490237a513b3c17071 Mon Sep 17 00:00:00 2001 From: Baokun Li Date: Thu, 22 Aug 2024 10:35:24 +0800 Subject: [PATCH 09/51] ext4: avoid use-after-free in ext4_ext_show_leaf() stable inclusion from stable-6.6.55 commit 34b2096380ba475771971a778a478661a791aa15 category: bugfix issue: #IB55S9 CVE: CVE-2024-49889 Signed-off-by: zhangshuqi --------------------------------------- [ Upstream commit 4e2524ba2ca5f54bdbb9e5153bea00421ef653f5 ] In ext4_find_extent(), path may be freed by error or be reallocated, so using a previously saved *ppath may have been freed and thus may trigger use-after-free, as follows: ext4_split_extent path = *ppath; ext4_split_extent_at(ppath) path = ext4_find_extent(ppath) ext4_split_extent_at(ppath) // ext4_find_extent fails to free path // but zeroout succeeds ext4_ext_show_leaf(inode, path) eh = path[depth].p_hdr // path use-after-free !!! Similar to ext4_split_extent_at(), we use *ppath directly as an input to ext4_ext_show_leaf(). Fix a spelling error by the way. Same problem in ext4_ext_handle_unwritten_extents(). Since 'path' is only used in ext4_ext_show_leaf(), remove 'path' and use *ppath directly. This issue is triggered only when EXT_DEBUG is defined and therefore does not affect functionality. Signed-off-by: Baokun Li Reviewed-by: Jan Kara Reviewed-by: Ojaswin Mujoo Tested-by: Ojaswin Mujoo Link: https://patch.msgid.link/20240822023545.1994557-5-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o Signed-off-by: Sasha Levin Signed-off-by: zhangshuqi --- fs/ext4/extents.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index cce0362f6139..daee225c5477 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -3287,7 +3287,7 @@ static int ext4_split_extent_at(handle_t *handle, } /* - * ext4_split_extents() splits an extent and mark extent which is covered + * ext4_split_extent() splits an extent and mark extent which is covered * by @map as split_flags indicates * * It may result in splitting the extent into multiple extents (up to three) @@ -3363,7 +3363,7 @@ static int ext4_split_extent(handle_t *handle, goto out; } - ext4_ext_show_leaf(inode, path); + ext4_ext_show_leaf(inode, *ppath); out: return err ? err : allocated; } @@ -3827,14 +3827,13 @@ ext4_ext_handle_unwritten_extents(handle_t *handle, struct inode *inode, struct ext4_ext_path **ppath, int flags, unsigned int allocated, ext4_fsblk_t newblock) { - struct ext4_ext_path __maybe_unused *path = *ppath; int ret = 0; int err = 0; ext_debug(inode, "logical block %llu, max_blocks %u, flags 0x%x, allocated %u\n", (unsigned long long)map->m_lblk, map->m_len, flags, allocated); - ext4_ext_show_leaf(inode, path); + ext4_ext_show_leaf(inode, *ppath); /* * When writing into unwritten space, we should not fail to @@ -3931,7 +3930,7 @@ ext4_ext_handle_unwritten_extents(handle_t *handle, struct inode *inode, if (allocated > map->m_len) allocated = map->m_len; map->m_len = allocated; - ext4_ext_show_leaf(inode, path); + ext4_ext_show_leaf(inode, *ppath); out2: return err ? err : allocated; } -- Gitee From 948cd4616454b1e23b7af356af88fb4bb294c295 Mon Sep 17 00:00:00 2001 From: Christian Marangi Date: Fri, 4 Oct 2024 20:27:58 +0200 Subject: [PATCH 10/51] net: phy: Remove LED entry from LEDs list on unregister stable inclusion from stable-6.6.57 commit 143ffa7878e2d9d9c3836ee8304ce4930f7852a3 category: bugfix issue: #IB55S9 CVE: CVE-2024-50023 Signed-off-by: zhangshuqi --------------------------------------- commit f50b5d74c68e551667e265123659b187a30fe3a5 upstream. Commit c938ab4da0eb ("net: phy: Manual remove LEDs to ensure correct ordering") correctly fixed a problem with using devm_ but missed removing the LED entry from the LEDs list. This cause kernel panic on specific scenario where the port for the PHY is torn down and up and the kmod for the PHY is removed. On setting the port down the first time, the assosiacted LEDs are correctly unregistered. The associated kmod for the PHY is now removed. The kmod is now added again and the port is now put up, the associated LED are registered again. On putting the port down again for the second time after these step, the LED list now have 4 elements. With the first 2 already unregistered previously and the 2 new one registered again. This cause a kernel panic as the first 2 element should have been removed. Fix this by correctly removing the element when LED is unregistered. Reported-by: Daniel Golle Tested-by: Daniel Golle Cc: stable@vger.kernel.org Fixes: c938ab4da0eb ("net: phy: Manual remove LEDs to ensure correct ordering") Signed-off-by: Christian Marangi Reviewed-by: Andrew Lunn Link: https://patch.msgid.link/20241004182759.14032-1-ansuelsmth@gmail.com Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman Signed-off-by: zhangshuqi --- drivers/net/phy/phy_device.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c index c895cd178e6a..bc05ae6e2fbb 100644 --- a/drivers/net/phy/phy_device.c +++ b/drivers/net/phy/phy_device.c @@ -3082,10 +3082,11 @@ static __maybe_unused int phy_led_hw_is_supported(struct led_classdev *led_cdev, static void phy_leds_unregister(struct phy_device *phydev) { - struct phy_led *phyled; + struct phy_led *phyled, *tmp; - list_for_each_entry(phyled, &phydev->leds, list) { + list_for_each_entry_safe(phyled, tmp, &phydev->leds, list) { led_classdev_unregister(&phyled->led_cdev); + list_del(&phyled->list); } } -- Gitee From e92523917bc7daca75b0982c4e7bb384ed9879ec Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Mon, 7 Oct 2024 19:46:01 +0200 Subject: [PATCH 11/51] uprobes: fix kernel info leak via "[uprobes]" vma stable inclusion from stable-6.6.55 commit 5b981d8335e18aef7908a068529a3287258ff6d8 category: bugfix issue: #IB55S9 CVE: CVE-2024-49975 Signed-off-by: zhangshuqi --------------------------------------- commit 34820304cc2cd1804ee1f8f3504ec77813d29c8e upstream. xol_add_vma() maps the uninitialized page allocated by __create_xol_area() into userspace. On some architectures (x86) this memory is readable even without VM_READ, VM_EXEC results in the same pgprot_t as VM_EXEC|VM_READ, although this doesn't really matter, debugger can read this memory anyway. Link: https://lore.kernel.org/all/20240929162047.GA12611@redhat.com/ Reported-by: Will Deacon Fixes: d4b3b6384f98 ("uprobes/core: Allocate XOL slots for uprobes use") Cc: stable@vger.kernel.org Acked-by: Masami Hiramatsu (Google) Signed-off-by: Oleg Nesterov Signed-off-by: Masami Hiramatsu (Google) Signed-off-by: Sasha Levin Signed-off-by: zhangshuqi --- kernel/events/uprobes.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 3048589e2e85..c7ba0c6356d5 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -1492,7 +1492,7 @@ static struct xol_area *__create_xol_area(unsigned long vaddr) area->xol_mapping.name = "[uprobes]"; area->xol_mapping.fault = NULL; area->xol_mapping.pages = area->pages; - area->pages[0] = alloc_page(GFP_HIGHUSER); + area->pages[0] = alloc_page(GFP_HIGHUSER | __GFP_ZERO); if (!area->pages[0]) goto free_bitmap; area->pages[1] = NULL; -- Gitee From 56e73de251ee5fc95fa5c33a4f3ebba4434c0047 Mon Sep 17 00:00:00 2001 From: Anastasia Kovaleva Date: Thu, 3 Oct 2024 13:44:31 +0300 Subject: [PATCH 12/51] net: Fix an unsafe loop on the list stable inclusion from stable-6.6.57 commit 3be342e0332a7c83eb26fbb22bf156fdca467a5d category: bugfix issue: #IB55S9 CVE: CVE-2024-50024 Signed-off-by: zhangshuqi --------------------------------------- commit 1dae9f1187189bc09ff6d25ca97ead711f7e26f9 upstream. The kernel may crash when deleting a genetlink family if there are still listeners for that family: Oops: Kernel access of bad area, sig: 11 [#1] ... NIP [c000000000c080bc] netlink_update_socket_mc+0x3c/0xc0 LR [c000000000c0f764] __netlink_clear_multicast_users+0x74/0xc0 Call Trace: __netlink_clear_multicast_users+0x74/0xc0 genl_unregister_family+0xd4/0x2d0 Change the unsafe loop on the list to a safe one, because inside the loop there is an element removal from this list. Fixes: b8273570f802 ("genetlink: fix netns vs. netlink table locking (2)") Cc: stable@vger.kernel.org Signed-off-by: Anastasia Kovaleva Reviewed-by: Dmitry Bogdanov Reviewed-by: Kuniyuki Iwashima Link: https://patch.msgid.link/20241003104431.12391-1-a.kovaleva@yadro.com Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman Signed-off-by: zhangshuqi --- include/net/sock.h | 2 ++ net/netlink/af_netlink.c | 3 ++- 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/include/net/sock.h b/include/net/sock.h index 53b81e0a8981..b5e09790a88b 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -902,6 +902,8 @@ static inline void sk_add_bind2_node(struct sock *sk, struct hlist_head *list) hlist_for_each_entry(__sk, list, sk_bind_node) #define sk_for_each_bound_bhash2(__sk, list) \ hlist_for_each_entry(__sk, list, sk_bind2_node) +#define sk_for_each_bound_safe(__sk, tmp, list) \ + hlist_for_each_entry_safe(__sk, tmp, list, sk_bind_node) /** * sk_for_each_entry_offset_rcu - iterate over a list at a given struct offset diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index 6ae782efb1ee..09be2c351d2e 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -2147,8 +2147,9 @@ void __netlink_clear_multicast_users(struct sock *ksk, unsigned int group) { struct sock *sk; struct netlink_table *tbl = &nl_table[ksk->sk_protocol]; + struct hlist_node *tmp; - sk_for_each_bound(sk, &tbl->mc_list) + sk_for_each_bound_safe(sk, tmp, &tbl->mc_list) netlink_update_socket_mc(nlk_sk(sk), group, 0); } -- Gitee From 9621b4c48da55f6ec4f3d9bb8c26c515b30f1bd4 Mon Sep 17 00:00:00 2001 From: Baokun Li Date: Thu, 22 Aug 2024 10:35:23 +0800 Subject: [PATCH 13/51] ext4: fix slab-use-after-free in ext4_split_extent_at() stable inclusion from stable-6.6.55 commit 8fe117790b37c84c651e2bad9efc0e7fda73c0e3 category: bugfix issue: #IB55S9 CVE: CVE-2024-49884 Signed-off-by: zhangshuqi --------------------------------------- commit c26ab35702f8cd0cdc78f96aa5856bfb77be798f upstream. We hit the following use-after-free: ================================================================== BUG: KASAN: slab-use-after-free in ext4_split_extent_at+0xba8/0xcc0 Read of size 2 at addr ffff88810548ed08 by task kworker/u20:0/40 CPU: 0 PID: 40 Comm: kworker/u20:0 Not tainted 6.9.0-dirty #724 Call Trace: kasan_report+0x93/0xc0 ext4_split_extent_at+0xba8/0xcc0 ext4_split_extent.isra.0+0x18f/0x500 ext4_split_convert_extents+0x275/0x750 ext4_ext_handle_unwritten_extents+0x73e/0x1580 ext4_ext_map_blocks+0xe20/0x2dc0 ext4_map_blocks+0x724/0x1700 ext4_do_writepages+0x12d6/0x2a70 [...] Allocated by task 40: __kmalloc_noprof+0x1ac/0x480 ext4_find_extent+0xf3b/0x1e70 ext4_ext_map_blocks+0x188/0x2dc0 ext4_map_blocks+0x724/0x1700 ext4_do_writepages+0x12d6/0x2a70 [...] Freed by task 40: kfree+0xf1/0x2b0 ext4_find_extent+0xa71/0x1e70 ext4_ext_insert_extent+0xa22/0x3260 ext4_split_extent_at+0x3ef/0xcc0 ext4_split_extent.isra.0+0x18f/0x500 ext4_split_convert_extents+0x275/0x750 ext4_ext_handle_unwritten_extents+0x73e/0x1580 ext4_ext_map_blocks+0xe20/0x2dc0 ext4_map_blocks+0x724/0x1700 ext4_do_writepages+0x12d6/0x2a70 [...] ================================================================== The flow of issue triggering is as follows: ext4_split_extent_at path = *ppath ext4_ext_insert_extent(ppath) ext4_ext_create_new_leaf(ppath) ext4_find_extent(orig_path) path = *orig_path read_extent_tree_block // return -ENOMEM or -EIO ext4_free_ext_path(path) kfree(path) *orig_path = NULL a. If err is -ENOMEM: ext4_ext_dirty(path + path->p_depth) // path use-after-free !!! b. If err is -EIO and we have EXT_DEBUG defined: ext4_ext_show_leaf(path) eh = path[depth].p_hdr // path also use-after-free !!! So when trying to zeroout or fix the extent length, call ext4_find_extent() to update the path. In addition we use *ppath directly as an ext4_ext_show_leaf() input to avoid possible use-after-free when EXT_DEBUG is defined, and to avoid unnecessary path updates. Fixes: dfe5080939ea ("ext4: drop EXT4_EX_NOFREE_ON_ERR from rest of extents handling code") Cc: stable@kernel.org Signed-off-by: Baokun Li Reviewed-by: Jan Kara Reviewed-by: Ojaswin Mujoo Tested-by: Ojaswin Mujoo Link: https://patch.msgid.link/20240822023545.1994557-4-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o Signed-off-by: Greg Kroah-Hartman Signed-off-by: zhangshuqi --- fs/ext4/extents.c | 21 ++++++++++++++++++++- 1 file changed, 20 insertions(+), 1 deletion(-) diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index daee225c5477..ac8f5fd66b94 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -3230,6 +3230,25 @@ static int ext4_split_extent_at(handle_t *handle, if (err != -ENOSPC && err != -EDQUOT && err != -ENOMEM) goto out; + /* + * Update path is required because previous ext4_ext_insert_extent() + * may have freed or reallocated the path. Using EXT4_EX_NOFAIL + * guarantees that ext4_find_extent() will not return -ENOMEM, + * otherwise -ENOMEM will cause a retry in do_writepages(), and a + * WARN_ON may be triggered in ext4_da_update_reserve_space() due to + * an incorrect ee_len causing the i_reserved_data_blocks exception. + */ + path = ext4_find_extent(inode, ee_block, ppath, + flags | EXT4_EX_NOFAIL); + if (IS_ERR(path)) { + EXT4_ERROR_INODE(inode, "Failed split extent on %u, err %ld", + split, PTR_ERR(path)); + return PTR_ERR(path); + } + depth = ext_depth(inode); + ex = path[depth].p_ext; + *ppath = path; + if (EXT4_EXT_MAY_ZEROOUT & split_flag) { if (split_flag & (EXT4_EXT_DATA_VALID1|EXT4_EXT_DATA_VALID2)) { if (split_flag & EXT4_EXT_DATA_VALID1) { @@ -3282,7 +3301,7 @@ static int ext4_split_extent_at(handle_t *handle, ext4_ext_dirty(handle, inode, path + path->p_depth); return err; out: - ext4_ext_show_leaf(inode, path); + ext4_ext_show_leaf(inode, *ppath); return err; } -- Gitee From 92a738e1e8ff8b51c8ef0fa2b99c245e68bda937 Mon Sep 17 00:00:00 2001 From: Willem de Bruijn Date: Tue, 1 Oct 2024 13:17:46 -0400 Subject: [PATCH 14/51] gso: fix udp gso fraglist segmentation after pull from frag_list stable inclusion from stable-6.6.55 commit af3122f5fdc0d00581d6e598a668df6bf54c9daa category: bugfix issue: #IB55S9 CVE: CVE-2024-49978 Signed-off-by: zhangshuqi --------------------------------------- commit a1e40ac5b5e9077fe1f7ae0eb88034db0f9ae1ab upstream. Detect gso fraglist skbs with corrupted geometry (see below) and pass these to skb_segment instead of skb_segment_list, as the first can segment them correctly. Valid SKB_GSO_FRAGLIST skbs - consist of two or more segments - the head_skb holds the protocol headers plus first gso_size - one or more frag_list skbs hold exactly one segment - all but the last must be gso_size Optional datapath hooks such as NAT and BPF (bpf_skb_pull_data) can modify these skbs, breaking these invariants. In extreme cases they pull all data into skb linear. For UDP, this causes a NULL ptr deref in __udpv4_gso_segment_list_csum at udp_hdr(seg->next)->dest. Detect invalid geometry due to pull, by checking head_skb size. Don't just drop, as this may blackhole a destination. Convert to be able to pass to regular skb_segment. Link: https://lore.kernel.org/netdev/20240428142913.18666-1-shiming.cheng@mediatek.com/ Fixes: 9fd1ff5d2ac7 ("udp: Support UDP fraglist GRO/GSO.") Signed-off-by: Willem de Bruijn Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20241001171752.107580-1-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman Signed-off-by: zhangshuqi --- net/ipv4/udp_offload.c | 22 ++++++++++++++++++++-- 1 file changed, 20 insertions(+), 2 deletions(-) diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c index c3d67423ae18..bf2a77b3682a 100644 --- a/net/ipv4/udp_offload.c +++ b/net/ipv4/udp_offload.c @@ -285,8 +285,26 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb, return NULL; } - if (skb_shinfo(gso_skb)->gso_type & SKB_GSO_FRAGLIST) - return __udp_gso_segment_list(gso_skb, features, is_ipv6); + if (skb_shinfo(gso_skb)->gso_type & SKB_GSO_FRAGLIST) { + /* Detect modified geometry and pass those to skb_segment. */ + if (skb_pagelen(gso_skb) - sizeof(*uh) == skb_shinfo(gso_skb)->gso_size) + return __udp_gso_segment_list(gso_skb, features, is_ipv6); + + /* Setup csum, as fraglist skips this in udp4_gro_receive. */ + gso_skb->csum_start = skb_transport_header(gso_skb) - gso_skb->head; + gso_skb->csum_offset = offsetof(struct udphdr, check); + gso_skb->ip_summed = CHECKSUM_PARTIAL; + + uh = udp_hdr(gso_skb); + if (is_ipv6) + uh->check = ~udp_v6_check(gso_skb->len, + &ipv6_hdr(gso_skb)->saddr, + &ipv6_hdr(gso_skb)->daddr, 0); + else + uh->check = ~udp_v4_check(gso_skb->len, + ip_hdr(gso_skb)->saddr, + ip_hdr(gso_skb)->daddr, 0); + } skb_pull(gso_skb, sizeof(*uh)); -- Gitee From a3429421d8a707fb1fdd7f295df9334cd2529984 Mon Sep 17 00:00:00 2001 From: Jann Horn Date: Mon, 7 Oct 2024 23:42:04 +0200 Subject: [PATCH 15/51] mm/mremap: fix move_normal_pmd/retract_page_tables race stable inclusion from stable-6.6.58 commit 17396e32f975130b3e6251f024c8807d192e4c3e category: bugfix issue: #IB55S9 CVE: CVE-2024-50066 Signed-off-by: zhangshuqi --------------------------------------- commit 6fa1066fc5d00cb9f1b0e83b7ff6ef98d26ba2aa upstream. In mremap(), move_page_tables() looks at the type of the PMD entry and the specified address range to figure out by which method the next chunk of page table entries should be moved. At that point, the mmap_lock is held in write mode, but no rmap locks are held yet. For PMD entries that point to page tables and are fully covered by the source address range, move_pgt_entry(NORMAL_PMD, ...) is called, which first takes rmap locks, then does move_normal_pmd(). move_normal_pmd() takes the necessary page table locks at source and destination, then moves an entire page table from the source to the destination. The problem is: The rmap locks, which protect against concurrent page table removal by retract_page_tables() in the THP code, are only taken after the PMD entry has been read and it has been decided how to move it. So we can race as follows (with two processes that have mappings of the same tmpfs file that is stored on a tmpfs mount with huge=advise); note that process A accesses page tables through the MM while process B does it through the file rmap: process A process B ========= ========= mremap mremap_to move_vma move_page_tables get_old_pmd alloc_new_pmd *** PREEMPT *** madvise(MADV_COLLAPSE) do_madvise madvise_walk_vmas madvise_vma_behavior madvise_collapse hpage_collapse_scan_file collapse_file retract_page_tables i_mmap_lock_read(mapping) pmdp_collapse_flush i_mmap_unlock_read(mapping) move_pgt_entry(NORMAL_PMD, ...) take_rmap_locks move_normal_pmd drop_rmap_locks When this happens, move_normal_pmd() can end up creating bogus PMD entries in the line `pmd_populate(mm, new_pmd, pmd_pgtable(pmd))`. The effect depends on arch-specific and machine-specific details; on x86, you can end up with physical page 0 mapped as a page table, which is likely exploitable for user->kernel privilege escalation. Fix the race by letting process B recheck that the PMD still points to a page table after the rmap locks have been taken. Otherwise, we bail and let the caller fall back to the PTE-level copying path, which will then bail immediately at the pmd_none() check. Bug reachability: Reaching this bug requires that you can create shmem/file THP mappings - anonymous THP uses different code that doesn't zap stuff under rmap locks. File THP is gated on an experimental config flag (CONFIG_READ_ONLY_THP_FOR_FS), so on normal distro kernels you need shmem THP to hit this bug. As far as I know, getting shmem THP normally requires that you can mount your own tmpfs with the right mount flags, which would require creating your own user+mount namespace; though I don't know if some distros maybe enable shmem THP by default or something like that. Bug impact: This issue can likely be used for user->kernel privilege escalation when it is reachable. Link: https://lkml.kernel.org/r/20241007-move_normal_pmd-vs-collapse-fix-2-v1-1-5ead9631f2ea@google.com Fixes: 1d65b771bc08 ("mm/khugepaged: retract_page_tables() without mmap or vma lock") Signed-off-by: Jann Horn Signed-off-by: David Hildenbrand Co-developed-by: David Hildenbrand Closes: https://project-zero.issues.chromium.org/371047675 Acked-by: Qi Zheng Reviewed-by: Lorenzo Stoakes Cc: Hugh Dickins Cc: Joel Fernandes Cc: Matthew Wilcox Cc: Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman Signed-off-by: zhangshuqi --- mm/mremap.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/mm/mremap.c b/mm/mremap.c index 382e81c33fc4..df71010baabe 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -238,6 +238,7 @@ static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr, { spinlock_t *old_ptl, *new_ptl; struct mm_struct *mm = vma->vm_mm; + bool res = false; pmd_t pmd; if (!arch_supports_page_table_move()) @@ -277,19 +278,25 @@ static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr, if (new_ptl != old_ptl) spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING); - /* Clear the pmd */ pmd = *old_pmd; + + /* Racing with collapse? */ + if (unlikely(!pmd_present(pmd) || pmd_leaf(pmd))) + goto out_unlock; + /* Clear the pmd */ pmd_clear(old_pmd); + res = true; VM_BUG_ON(!pmd_none(*new_pmd)); pmd_populate(mm, new_pmd, pmd_pgtable(pmd)); flush_tlb_range(vma, old_addr, old_addr + PMD_SIZE); +out_unlock: if (new_ptl != old_ptl) spin_unlock(new_ptl); spin_unlock(old_ptl); - return true; + return res; } #else static inline bool move_normal_pmd(struct vm_area_struct *vma, -- Gitee From d3b99ed93e3c7f6e2c90ea3390b06a0d8ac5b470 Mon Sep 17 00:00:00 2001 From: Eduard Zingerman Date: Thu, 22 Aug 2024 01:01:23 -0700 Subject: [PATCH 16/51] bpf: correctly handle malformed BPF_CORE_TYPE_ID_LOCAL relos stable inclusion from stable-6.6.54 commit 2288b54b96dcb55bedebcef3572bb8821fc5e708 category: bugfix issue: #IB55S9 CVE: CVE-2024-49850 Signed-off-by: zhangshuqi --------------------------------------- [ Upstream commit 3d2786d65aaa954ebd3fcc033ada433e10da21c4 ] In case of malformed relocation record of kind BPF_CORE_TYPE_ID_LOCAL referencing a non-existing BTF type, function bpf_core_calc_relo_insn would cause a null pointer deference. Fix this by adding a proper check upper in call stack, as malformed relocation records could be passed from user space. Simplest reproducer is a program: r0 = 0 exit With a single relocation record: .insn_off = 0, /* patch first instruction */ .type_id = 100500, /* this type id does not exist */ .access_str_off = 6, /* offset of string "0" */ .kind = BPF_CORE_TYPE_ID_LOCAL, See the link for original reproducer or next commit for a test case. Fixes: 74753e1462e7 ("libbpf: Replace btf__type_by_id() with btf_type_by_id().") Reported-by: Liu RuiTong Closes: https://lore.kernel.org/bpf/CAK55_s6do7C+DVwbwY_7nKfUz0YLDoiA1v6X3Y9+p0sWzipFSA@mail.gmail.com/ Acked-by: Andrii Nakryiko Signed-off-by: Eduard Zingerman Link: https://lore.kernel.org/r/20240822080124.2995724-2-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov Signed-off-by: Sasha Levin Signed-off-by: zhangshuqi --- kernel/bpf/btf.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index a31704a6bb61..cc66d48be546 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -8421,6 +8421,7 @@ int bpf_core_apply(struct bpf_core_ctx *ctx, const struct bpf_core_relo *relo, struct bpf_core_cand_list cands = {}; struct bpf_core_relo_res targ_res; struct bpf_core_spec *specs; + const struct btf_type *type; int err; /* ~4k of temp memory necessary to convert LLVM spec like "0:1:0:5" @@ -8430,6 +8431,13 @@ int bpf_core_apply(struct bpf_core_ctx *ctx, const struct bpf_core_relo *relo, if (!specs) return -ENOMEM; + type = btf_type_by_id(ctx->btf, relo->type_id); + if (!type) { + bpf_log(ctx->log, "relo #%u: bad type id %u\n", + relo_idx, relo->type_id); + return -EINVAL; + } + if (need_cands) { struct bpf_cand_cache *cc; int i; -- Gitee From f0a90b7c89cdd03b2adc4cadaa4ebc496e923d05 Mon Sep 17 00:00:00 2001 From: Yanjun Zhang Date: Tue, 1 Oct 2024 16:39:30 +0800 Subject: [PATCH 17/51] NFSv4: Prevent NULL-pointer dereference in nfs42_complete_copies() stable inclusion from stable-6.6.57 commit fca41e5fa4914d12b2136c25f9dad69520b52683 category: bugfix issue: #IB55S9 CVE: CVE-2024-50046 Signed-off-by: zhangshuqi --------------------------------------- [ Upstream commit a848c29e3486189aaabd5663bc11aea50c5bd144 ] On the node of an NFS client, some files saved in the mountpoint of the NFS server were copied to another location of the same NFS server. Accidentally, the nfs42_complete_copies() got a NULL-pointer dereference crash with the following syslog: [232064.838881] NFSv4: state recovery failed for open file nfs/pvc-12b5200d-cd0f-46a3-b9f0-af8f4fe0ef64.qcow2, error = -116 [232064.839360] NFSv4: state recovery failed for open file nfs/pvc-12b5200d-cd0f-46a3-b9f0-af8f4fe0ef64.qcow2, error = -116 [232066.588183] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000058 [232066.588586] Mem abort info: [232066.588701] ESR = 0x0000000096000007 [232066.588862] EC = 0x25: DABT (current EL), IL = 32 bits [232066.589084] SET = 0, FnV = 0 [232066.589216] EA = 0, S1PTW = 0 [232066.589340] FSC = 0x07: level 3 translation fault [232066.589559] Data abort info: [232066.589683] ISV = 0, ISS = 0x00000007 [232066.589842] CM = 0, WnR = 0 [232066.589967] user pgtable: 64k pages, 48-bit VAs, pgdp=00002000956ff400 [232066.590231] [0000000000000058] pgd=08001100ae100003, p4d=08001100ae100003, pud=08001100ae100003, pmd=08001100b3c00003, pte=0000000000000000 [232066.590757] Internal error: Oops: 96000007 [#1] SMP [232066.590958] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm vhost_net vhost vhost_iotlb tap tun ipt_rpfilter xt_multiport ip_set_hash_ip ip_set_hash_net xfrm_interface xfrm6_tunnel tunnel4 tunnel6 esp4 ah4 wireguard libcurve25519_generic veth xt_addrtype xt_set nf_conntrack_netlink ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_bitmap_port ip_set_hash_ipport dummy ip_set ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs iptable_filter sch_ingress nfnetlink_cttimeout vport_gre ip_gre ip_tunnel gre vport_geneve geneve vport_vxlan vxlan ip6_udp_tunnel udp_tunnel openvswitch nf_conncount dm_round_robin dm_service_time dm_multipath xt_nat xt_MASQUERADE nft_chain_nat nf_nat xt_mark xt_conntrack xt_comment nft_compat nft_counter nf_tables nfnetlink ocfs2 ocfs2_nodemanager ocfs2_stackglue iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipmi_ssif nbd overlay 8021q garp mrp bonding tls rfkill sunrpc ext4 mbcache jbd2 [232066.591052] vfat fat cas_cache cas_disk ses enclosure scsi_transport_sas sg acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler ip_tables vfio_pci vfio_pci_core vfio_virqfd vfio_iommu_type1 vfio dm_mirror dm_region_hash dm_log dm_mod nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc fuse xfs libcrc32c ast drm_vram_helper qla2xxx drm_kms_helper syscopyarea crct10dif_ce sysfillrect ghash_ce sysimgblt sha2_ce fb_sys_fops cec sha256_arm64 sha1_ce drm_ttm_helper ttm nvme_fc igb sbsa_gwdt nvme_fabrics drm nvme_core i2c_algo_bit i40e scsi_transport_fc megaraid_sas aes_neon_bs [232066.596953] CPU: 6 PID: 4124696 Comm: 10.253.166.125- Kdump: loaded Not tainted 5.15.131-9.cl9_ocfs2.aarch64 #1 [232066.597356] Hardware name: Great Wall .\x93\x8e...RF6260 V5/GWMSSE2GL1T, BIOS T656FBE_V3.0.18 2024-01-06 [232066.597721] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [232066.598034] pc : nfs4_reclaim_open_state+0x220/0x800 [nfsv4] [232066.598327] lr : nfs4_reclaim_open_state+0x12c/0x800 [nfsv4] [232066.598595] sp : ffff8000f568fc70 [232066.598731] x29: ffff8000f568fc70 x28: 0000000000001000 x27: ffff21003db33000 [232066.599030] x26: ffff800005521ae0 x25: ffff0100f98fa3f0 x24: 0000000000000001 [232066.599319] x23: ffff800009920008 x22: ffff21003db33040 x21: ffff21003db33050 [232066.599628] x20: ffff410172fe9e40 x19: ffff410172fe9e00 x18: 0000000000000000 [232066.599914] x17: 0000000000000000 x16: 0000000000000004 x15: 0000000000000000 [232066.600195] x14: 0000000000000000 x13: ffff800008e685a8 x12: 00000000eac0c6e6 [232066.600498] x11: 0000000000000000 x10: 0000000000000008 x9 : ffff8000054e5828 [232066.600784] x8 : 00000000ffffffbf x7 : 0000000000000001 x6 : 000000000a9eb14a [232066.601062] x5 : 0000000000000000 x4 : ffff70ff8a14a800 x3 : 0000000000000058 [232066.601348] x2 : 0000000000000001 x1 : 54dce46366daa6c6 x0 : 0000000000000000 [232066.601636] Call trace: [232066.601749] nfs4_reclaim_open_state+0x220/0x800 [nfsv4] [232066.601998] nfs4_do_reclaim+0x1b8/0x28c [nfsv4] [232066.602218] nfs4_state_manager+0x928/0x10f0 [nfsv4] [232066.602455] nfs4_run_state_manager+0x78/0x1b0 [nfsv4] [232066.602690] kthread+0x110/0x114 [232066.602830] ret_from_fork+0x10/0x20 [232066.602985] Code: 1400000d f9403f20 f9402e61 91016003 (f9402c00) [232066.603284] SMP: stopping secondary CPUs [232066.606936] Starting crashdump kernel... [232066.607146] Bye! Analysing the vmcore, we know that nfs4_copy_state listed by destination nfs_server->ss_copies was added by the field copies in handle_async_copy(), and we found a waiting copy process with the stack as: PID: 3511963 TASK: ffff710028b47e00 CPU: 0 COMMAND: "cp" #0 [ffff8001116ef740] __switch_to at ffff8000081b92f4 #1 [ffff8001116ef760] __schedule at ffff800008dd0650 #2 [ffff8001116ef7c0] schedule at ffff800008dd0a00 #3 [ffff8001116ef7e0] schedule_timeout at ffff800008dd6aa0 #4 [ffff8001116ef860] __wait_for_common at ffff800008dd166c #5 [ffff8001116ef8e0] wait_for_completion_interruptible at ffff800008dd1898 #6 [ffff8001116ef8f0] handle_async_copy at ffff8000055142f4 [nfsv4] #7 [ffff8001116ef970] _nfs42_proc_copy at ffff8000055147c8 [nfsv4] #8 [ffff8001116efa80] nfs42_proc_copy at ffff800005514cf0 [nfsv4] #9 [ffff8001116efc50] __nfs4_copy_file_range.constprop.0 at ffff8000054ed694 [nfsv4] The NULL-pointer dereference was due to nfs42_complete_copies() listed the nfs_server->ss_copies by the field ss_copies of nfs4_copy_state. So the nfs4_copy_state address ffff0100f98fa3f0 was offset by 0x10 and the data accessed through this pointer was also incorrect. Generally, the ordered list nfs4_state_owner->so_states indicate open(O_RDWR) or open(O_WRITE) states are reclaimed firstly by nfs4_reclaim_open_state(). When destination state reclaim is failed with NFS_STATE_RECOVERY_FAILED and copies are not deleted in nfs_server->ss_copies, the source state may be passed to the nfs42_complete_copies() process earlier, resulting in this crash scene finally. To solve this issue, we add a list_head nfs_server->ss_src_copies for a server-to-server copy specially. Fixes: 0e65a32c8a56 ("NFS: handle source server reboot") Signed-off-by: Yanjun Zhang Reviewed-by: Trond Myklebust Signed-off-by: Anna Schumaker Signed-off-by: Sasha Levin Signed-off-by: zhangshuqi --- fs/nfs/client.c | 1 + fs/nfs/nfs42proc.c | 2 +- fs/nfs/nfs4state.c | 2 +- include/linux/nfs_fs_sb.h | 1 + 4 files changed, 4 insertions(+), 2 deletions(-) diff --git a/fs/nfs/client.c b/fs/nfs/client.c index 44eca51b2808..94bfa1b30ca4 100644 --- a/fs/nfs/client.c +++ b/fs/nfs/client.c @@ -986,6 +986,7 @@ struct nfs_server *nfs_alloc_server(void) INIT_LIST_HEAD(&server->layouts); INIT_LIST_HEAD(&server->state_owners_lru); INIT_LIST_HEAD(&server->ss_copies); + INIT_LIST_HEAD(&server->ss_src_copies); atomic_set(&server->active, 0); diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c index 28704f924612..531c9c20ef1d 100644 --- a/fs/nfs/nfs42proc.c +++ b/fs/nfs/nfs42proc.c @@ -218,7 +218,7 @@ static int handle_async_copy(struct nfs42_copy_res *res, if (dst_server != src_server) { spin_lock(&src_server->nfs_client->cl_lock); - list_add_tail(©->src_copies, &src_server->ss_copies); + list_add_tail(©->src_copies, &src_server->ss_src_copies); spin_unlock(&src_server->nfs_client->cl_lock); } diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c index 9a5d911a7edc..32d4a352d06d 100644 --- a/fs/nfs/nfs4state.c +++ b/fs/nfs/nfs4state.c @@ -1597,7 +1597,7 @@ static void nfs42_complete_copies(struct nfs4_state_owner *sp, struct nfs4_state complete(©->completion); } } - list_for_each_entry(copy, &sp->so_server->ss_copies, src_copies) { + list_for_each_entry(copy, &sp->so_server->ss_src_copies, src_copies) { if ((test_bit(NFS_CLNT_SRC_SSC_COPY_STATE, &state->flags) && !nfs4_stateid_match_other(&state->stateid, ©->parent_src_state->stateid))) diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h index cd628c4b011e..86d96e00c2e3 100644 --- a/include/linux/nfs_fs_sb.h +++ b/include/linux/nfs_fs_sb.h @@ -238,6 +238,7 @@ struct nfs_server { struct list_head layouts; struct list_head delegations; struct list_head ss_copies; + struct list_head ss_src_copies; unsigned long mig_gen; unsigned long mig_status; -- Gitee From 9f16bbbe2a02ea76cb1523c27c47fc53024be99f Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 13 Sep 2024 08:31:47 +0000 Subject: [PATCH 18/51] ipv6: avoid possible NULL deref in rt6_uncached_list_flush_dev() stable inclusion from stable-6.6.54 commit 0ceb2f2b5c813f932d6e60d3feec5e7e713da783 category: bugfix issue: #IB55S9 CVE: CVE-2024-47707 Signed-off-by: zhangshuqi --------------------------------------- [ Upstream commit 04ccecfa959d3b9ae7348780d8e379c6486176ac ] Blamed commit accidentally removed a check for rt->rt6i_idev being NULL, as spotted by syzbot: Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN PTI KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] CPU: 1 UID: 0 PID: 10998 Comm: syz-executor Not tainted 6.11.0-rc6-syzkaller-00208-g625403177711 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024 RIP: 0010:rt6_uncached_list_flush_dev net/ipv6/route.c:177 [inline] RIP: 0010:rt6_disable_ip+0x33e/0x7e0 net/ipv6/route.c:4914 Code: 41 80 3c 04 00 74 0a e8 90 d0 9b f7 48 8b 7c 24 08 48 8b 07 48 89 44 24 10 4c 89 f0 48 c1 e8 03 48 b9 00 00 00 00 00 fc ff df <80> 3c 08 00 74 08 4c 89 f7 e8 64 d0 9b f7 48 8b 44 24 18 49 39 06 RSP: 0018:ffffc900047374e0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 1ffff1100fdf8f33 RCX: dffffc0000000000 RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff88807efc78c0 RBP: ffffc900047375d0 R08: 0000000000000003 R09: fffff520008e6e8c R10: dffffc0000000000 R11: fffff520008e6e8c R12: 1ffff1100fdf8f18 R13: ffff88807efc7998 R14: 0000000000000000 R15: ffff88807efc7930 FS: 0000000000000000(0000) GS:ffff8880b8900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020002a80 CR3: 0000000022f62000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: addrconf_ifdown+0x15d/0x1bd0 net/ipv6/addrconf.c:3856 addrconf_notify+0x3cb/0x1020 notifier_call_chain+0x19f/0x3e0 kernel/notifier.c:93 call_netdevice_notifiers_extack net/core/dev.c:2032 [inline] call_netdevice_notifiers net/core/dev.c:2046 [inline] unregister_netdevice_many_notify+0xd81/0x1c40 net/core/dev.c:11352 unregister_netdevice_many net/core/dev.c:11414 [inline] unregister_netdevice_queue+0x303/0x370 net/core/dev.c:11289 unregister_netdevice include/linux/netdevice.h:3129 [inline] __tun_detach+0x6b9/0x1600 drivers/net/tun.c:685 tun_detach drivers/net/tun.c:701 [inline] tun_chr_close+0x108/0x1b0 drivers/net/tun.c:3510 __fput+0x24a/0x8a0 fs/file_table.c:422 task_work_run+0x24f/0x310 kernel/task_work.c:228 exit_task_work include/linux/task_work.h:40 [inline] do_exit+0xa2f/0x27f0 kernel/exit.c:882 do_group_exit+0x207/0x2c0 kernel/exit.c:1031 __do_sys_exit_group kernel/exit.c:1042 [inline] __se_sys_exit_group kernel/exit.c:1040 [inline] __x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1040 x64_sys_call+0x2634/0x2640 arch/x86/include/generated/asm/syscalls_64.h:232 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f1acc77def9 Code: Unable to access opcode bytes at 0x7f1acc77decf. RSP: 002b:00007ffeb26fa738 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1acc77def9 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000043 RBP: 00007f1acc7dd508 R08: 00007ffeb26f84d7 R09: 0000000000000003 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001 R13: 0000000000000003 R14: 00000000ffffffff R15: 00007ffeb26fa8e0 Modules linked in: ---[ end trace 0000000000000000 ]--- RIP: 0010:rt6_uncached_list_flush_dev net/ipv6/route.c:177 [inline] RIP: 0010:rt6_disable_ip+0x33e/0x7e0 net/ipv6/route.c:4914 Code: 41 80 3c 04 00 74 0a e8 90 d0 9b f7 48 8b 7c 24 08 48 8b 07 48 89 44 24 10 4c 89 f0 48 c1 e8 03 48 b9 00 00 00 00 00 fc ff df <80> 3c 08 00 74 08 4c 89 f7 e8 64 d0 9b f7 48 8b 44 24 18 49 39 06 RSP: 0018:ffffc900047374e0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 1ffff1100fdf8f33 RCX: dffffc0000000000 RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff88807efc78c0 RBP: ffffc900047375d0 R08: 0000000000000003 R09: fffff520008e6e8c R10: dffffc0000000000 R11: fffff520008e6e8c R12: 1ffff1100fdf8f18 R13: ffff88807efc7998 R14: 0000000000000000 R15: ffff88807efc7930 FS: 0000000000000000(0000) GS:ffff8880b8900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020002a80 CR3: 0000000022f62000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Fixes: e332bc67cf5e ("ipv6: Don't call with rt6_uncached_list_flush_dev") Signed-off-by: Eric Dumazet Reviewed-by: Simon Horman Reviewed-by: David Ahern Acked-by: Martin KaFai Lau Link: https://patch.msgid.link/20240913083147.3095442-1-edumazet@google.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin Signed-off-by: zhangshuqi --- net/ipv6/route.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 236a45557ba1..b664549de1ad 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -174,7 +174,7 @@ static void rt6_uncached_list_flush_dev(struct net_device *dev) struct net_device *rt_dev = rt->dst.dev; bool handled = false; - if (rt_idev->dev == dev) { + if (rt_idev && rt_idev->dev == dev) { rt->rt6i_idev = in6_dev_get(blackhole_netdev); in6_dev_put(rt_idev); handled = true; -- Gitee From 0ff6c6c65f3518c79c077528ba7951be973ab697 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Tue, 8 Oct 2024 14:31:10 +0000 Subject: [PATCH 19/51] net: do not delay dst_entries_add() in dst_release() stable inclusion from stable-6.6.57 commit eae7435b48ffc8e9be0ff9cfeae40af479a609dd category: bugfix issue: #IB55S9 CVE: CVE-2024-50036 Signed-off-by: zhangshuqi --------------------------------------- [ Upstream commit ac888d58869bb99753e7652be19a151df9ecb35d ] dst_entries_add() uses per-cpu data that might be freed at netns dismantle from ip6_route_net_exit() calling dst_entries_destroy() Before ip6_route_net_exit() can be called, we release all the dsts associated with this netns, via calls to dst_release(), which waits an rcu grace period before calling dst_destroy() dst_entries_add() use in dst_destroy() is racy, because dst_entries_destroy() could have been called already. Decrementing the number of dsts must happen sooner. Notes: 1) in CONFIG_XFRM case, dst_destroy() can call dst_release_immediate(child), this might also cause UAF if the child does not have DST_NOCOUNT set. IPSEC maintainers might take a look and see how to address this. 2) There is also discussion about removing this count of dst, which might happen in future kernels. Fixes: f88649721268 ("ipv4: fix dst race in sk_dst_get()") Closes: https://lore.kernel.org/lkml/CANn89iLCCGsP7SFn9HKpvnKu96Td4KD08xf7aGtiYgZnkjaL=w@mail.gmail.com/T/ Reported-by: Naresh Kamboju Tested-by: Linux Kernel Functional Testing Tested-by: Naresh Kamboju Signed-off-by: Eric Dumazet Cc: Xin Long Cc: Steffen Klassert Reviewed-by: Xin Long Link: https://patch.msgid.link/20241008143110.1064899-1-edumazet@google.com Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin Signed-off-by: zhangshuqi --- net/core/dst.c | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/net/core/dst.c b/net/core/dst.c index 980e2fd2f013..137b8d1c7220 100644 --- a/net/core/dst.c +++ b/net/core/dst.c @@ -109,9 +109,6 @@ struct dst_entry *dst_destroy(struct dst_entry * dst) child = xdst->child; } #endif - if (!(dst->flags & DST_NOCOUNT)) - dst_entries_add(dst->ops, -1); - if (dst->ops->destroy) dst->ops->destroy(dst); netdev_put(dst->dev, &dst->dev_tracker); @@ -161,17 +158,27 @@ void dst_dev_put(struct dst_entry *dst) } EXPORT_SYMBOL(dst_dev_put); +static void dst_count_dec(struct dst_entry *dst) +{ + if (!(dst->flags & DST_NOCOUNT)) + dst_entries_add(dst->ops, -1); +} + void dst_release(struct dst_entry *dst) { - if (dst && rcuref_put(&dst->__rcuref)) + if (dst && rcuref_put(&dst->__rcuref)) { + dst_count_dec(dst); call_rcu_hurry(&dst->rcu_head, dst_destroy_rcu); + } } EXPORT_SYMBOL(dst_release); void dst_release_immediate(struct dst_entry *dst) { - if (dst && rcuref_put(&dst->__rcuref)) + if (dst && rcuref_put(&dst->__rcuref)) { + dst_count_dec(dst); dst_destroy(dst); + } } EXPORT_SYMBOL(dst_release_immediate); -- Gitee From 9b573cbfa54ea23523be4e5590308bcd84f8e294 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 9 Oct 2024 09:11:32 +0000 Subject: [PATCH 20/51] slip: make slhc_remember() more robust against malicious packets stable inclusion from stable-6.6.57 commit 8bb79eb1db85a10865f0d4dd15b013def3f2d246 category: bugfix issue: #IB55S9 CVE: CVE-2024-50033 Signed-off-by: zhangshuqi --------------------------------------- [ Upstream commit 7d3fce8cbe3a70a1c7c06c9b53696be5d5d8dd5c ] syzbot found that slhc_remember() was missing checks against malicious packets [1]. slhc_remember() only checked the size of the packet was at least 20, which is not good enough. We need to make sure the packet includes the IPv4 and TCP header that are supposed to be carried. Add iph and th pointers to make the code more readable. [1] BUG: KMSAN: uninit-value in slhc_remember+0x2e8/0x7b0 drivers/net/slip/slhc.c:666 slhc_remember+0x2e8/0x7b0 drivers/net/slip/slhc.c:666 ppp_receive_nonmp_frame+0xe45/0x35e0 drivers/net/ppp/ppp_generic.c:2455 ppp_receive_frame drivers/net/ppp/ppp_generic.c:2372 [inline] ppp_do_recv+0x65f/0x40d0 drivers/net/ppp/ppp_generic.c:2212 ppp_input+0x7dc/0xe60 drivers/net/ppp/ppp_generic.c:2327 pppoe_rcv_core+0x1d3/0x720 drivers/net/ppp/pppoe.c:379 sk_backlog_rcv+0x13b/0x420 include/net/sock.h:1113 __release_sock+0x1da/0x330 net/core/sock.c:3072 release_sock+0x6b/0x250 net/core/sock.c:3626 pppoe_sendmsg+0x2b8/0xb90 drivers/net/ppp/pppoe.c:903 sock_sendmsg_nosec net/socket.c:729 [inline] __sock_sendmsg+0x30f/0x380 net/socket.c:744 ____sys_sendmsg+0x903/0xb60 net/socket.c:2602 ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2656 __sys_sendmmsg+0x3c1/0x960 net/socket.c:2742 __do_sys_sendmmsg net/socket.c:2771 [inline] __se_sys_sendmmsg net/socket.c:2768 [inline] __x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2768 x64_sys_call+0xb6e/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:308 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Uninit was created at: slab_post_alloc_hook mm/slub.c:4091 [inline] slab_alloc_node mm/slub.c:4134 [inline] kmem_cache_alloc_node_noprof+0x6bf/0xb80 mm/slub.c:4186 kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:587 __alloc_skb+0x363/0x7b0 net/core/skbuff.c:678 alloc_skb include/linux/skbuff.h:1322 [inline] sock_wmalloc+0xfe/0x1a0 net/core/sock.c:2732 pppoe_sendmsg+0x3a7/0xb90 drivers/net/ppp/pppoe.c:867 sock_sendmsg_nosec net/socket.c:729 [inline] __sock_sendmsg+0x30f/0x380 net/socket.c:744 ____sys_sendmsg+0x903/0xb60 net/socket.c:2602 ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2656 __sys_sendmmsg+0x3c1/0x960 net/socket.c:2742 __do_sys_sendmmsg net/socket.c:2771 [inline] __se_sys_sendmmsg net/socket.c:2768 [inline] __x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2768 x64_sys_call+0xb6e/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:308 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f CPU: 0 UID: 0 PID: 5460 Comm: syz.2.33 Not tainted 6.12.0-rc2-syzkaller-00006-g87d6aab2389e #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 Fixes: b5451d783ade ("slip: Move the SLIP drivers") Reported-by: syzbot+2ada1bc857496353be5a@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/670646db.050a0220.3f80e.0027.GAE@google.com/T/#u Signed-off-by: Eric Dumazet Link: https://patch.msgid.link/20241009091132.2136321-1-edumazet@google.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin Signed-off-by: zhangshuqi --- drivers/net/slip/slhc.c | 57 ++++++++++++++++++++++++----------------- 1 file changed, 34 insertions(+), 23 deletions(-) diff --git a/drivers/net/slip/slhc.c b/drivers/net/slip/slhc.c index ba93bab948e0..bf9e801cc61c 100644 --- a/drivers/net/slip/slhc.c +++ b/drivers/net/slip/slhc.c @@ -643,46 +643,57 @@ slhc_uncompress(struct slcompress *comp, unsigned char *icp, int isize) int slhc_remember(struct slcompress *comp, unsigned char *icp, int isize) { - struct cstate *cs; - unsigned ihl; - + const struct tcphdr *th; unsigned char index; + struct iphdr *iph; + struct cstate *cs; + unsigned int ihl; - if(isize < 20) { - /* The packet is shorter than a legal IP header */ + /* The packet is shorter than a legal IP header. + * Also make sure isize is positive. + */ + if (isize < (int)sizeof(struct iphdr)) { +runt: comp->sls_i_runt++; - return slhc_toss( comp ); + return slhc_toss(comp); } + iph = (struct iphdr *)icp; /* Peek at the IP header's IHL field to find its length */ - ihl = icp[0] & 0xf; - if(ihl < 20 / 4){ - /* The IP header length field is too small */ - comp->sls_i_runt++; - return slhc_toss( comp ); - } - index = icp[9]; - icp[9] = IPPROTO_TCP; + ihl = iph->ihl; + /* The IP header length field is too small, + * or packet is shorter than the IP header followed + * by minimal tcp header. + */ + if (ihl < 5 || isize < ihl * 4 + sizeof(struct tcphdr)) + goto runt; + + index = iph->protocol; + iph->protocol = IPPROTO_TCP; if (ip_fast_csum(icp, ihl)) { /* Bad IP header checksum; discard */ comp->sls_i_badcheck++; - return slhc_toss( comp ); + return slhc_toss(comp); } - if(index > comp->rslot_limit) { + if (index > comp->rslot_limit) { comp->sls_i_error++; return slhc_toss(comp); } - + th = (struct tcphdr *)(icp + ihl * 4); + if (th->doff < sizeof(struct tcphdr) / 4) + goto runt; + if (isize < ihl * 4 + th->doff * 4) + goto runt; /* Update local state */ cs = &comp->rstate[comp->recv_current = index]; comp->flags &=~ SLF_TOSS; - memcpy(&cs->cs_ip,icp,20); - memcpy(&cs->cs_tcp,icp + ihl*4,20); + memcpy(&cs->cs_ip, iph, sizeof(*iph)); + memcpy(&cs->cs_tcp, th, sizeof(*th)); if (ihl > 5) - memcpy(cs->cs_ipopt, icp + sizeof(struct iphdr), (ihl - 5) * 4); - if (cs->cs_tcp.doff > 5) - memcpy(cs->cs_tcpopt, icp + ihl*4 + sizeof(struct tcphdr), (cs->cs_tcp.doff - 5) * 4); - cs->cs_hsize = ihl*2 + cs->cs_tcp.doff*2; + memcpy(cs->cs_ipopt, &iph[1], (ihl - 5) * 4); + if (th->doff > 5) + memcpy(cs->cs_tcpopt, &th[1], (th->doff - 5) * 4); + cs->cs_hsize = ihl*2 + th->doff*2; cs->initialized = true; /* Put headers back on packet * Neither header checksum is recalculated -- Gitee From b2aa089428bd7fa4560b5189f2a002d286f45bf6 Mon Sep 17 00:00:00 2001 From: Martin Wilck Date: Thu, 12 Sep 2024 15:43:08 +0200 Subject: [PATCH 21/51] scsi: sd: Fix off-by-one error in sd_read_block_characteristics() stable inclusion from stable-6.6.54 commit 568c7c4c77eee6df7677bb861b7cee7398a3255d category: bugfix issue: #IB55S9 CVE: CVE-2024-47682 Signed-off-by: zhangshuqi --------------------------------------- commit f81eaf08385ddd474a2f41595a7757502870c0eb upstream. Ff the device returns page 0xb1 with length 8 (happens with qemu v2.x, for example), sd_read_block_characteristics() may attempt an out-of-bounds memory access when accessing the zoned field at offset 8. Fixes: 7fb019c46eee ("scsi: sd: Switch to using scsi_device VPD pages") Cc: stable@vger.kernel.org Signed-off-by: Martin Wilck Link: https://lore.kernel.org/r/20240912134308.282824-1-mwilck@suse.com Signed-off-by: Martin K. Petersen Signed-off-by: Greg Kroah-Hartman Signed-off-by: zhangshuqi --- drivers/scsi/sd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index c62f677084b4..05dcec586e1b 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -3116,7 +3116,7 @@ static void sd_read_block_characteristics(struct scsi_disk *sdkp) rcu_read_lock(); vpd = rcu_dereference(sdkp->device->vpd_pgb1); - if (!vpd || vpd->len < 8) { + if (!vpd || vpd->len <= 8) { rcu_read_unlock(); return; } -- Gitee From 89dd476dad0fdedeb8c449e6e2a68ea4d4fd83e4 Mon Sep 17 00:00:00 2001 From: Frederic Weisbecker Date: Fri, 13 Sep 2024 23:46:34 +0200 Subject: [PATCH 22/51] kthread: unpark only parked kthread stable inclusion from stable-6.6.57 commit 19a5029981c87c2ad0845e713837faa88f5d8e2b category: bugfix issue: #IB55S9 CVE: CVE-2024-50019 Signed-off-by: zhangshuqi --------------------------------------- commit 214e01ad4ed7158cab66498810094fac5d09b218 upstream. Calling into kthread unparking unconditionally is mostly harmless when the kthread is already unparked. The wake up is then simply ignored because the target is not in TASK_PARKED state. However if the kthread is per CPU, the wake up is preceded by a call to kthread_bind() which expects the task to be inactive and in TASK_PARKED state, which obviously isn't the case if it is unparked. As a result, calling kthread_stop() on an unparked per-cpu kthread triggers such a warning: WARNING: CPU: 0 PID: 11 at kernel/kthread.c:525 __kthread_bind_mask kernel/kthread.c:525 kthread_stop+0x17a/0x630 kernel/kthread.c:707 destroy_workqueue+0x136/0xc40 kernel/workqueue.c:5810 wg_destruct+0x1e2/0x2e0 drivers/net/wireguard/device.c:257 netdev_run_todo+0xe1a/0x1000 net/core/dev.c:10693 default_device_exit_batch+0xa14/0xa90 net/core/dev.c:11769 ops_exit_list net/core/net_namespace.c:178 [inline] cleanup_net+0x89d/0xcc0 net/core/net_namespace.c:640 process_one_work kernel/workqueue.c:3231 [inline] process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3312 worker_thread+0x86d/0xd70 kernel/workqueue.c:3393 kthread+0x2f0/0x390 kernel/kthread.c:389 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 Fix this with skipping unecessary unparking while stopping a kthread. Link: https://lkml.kernel.org/r/20240913214634.12557-1-frederic@kernel.org Fixes: 5c25b5ff89f0 ("workqueue: Tag bound workers with KTHREAD_IS_PER_CPU") Signed-off-by: Frederic Weisbecker Reported-by: syzbot+943d34fa3cf2191e3068@syzkaller.appspotmail.com Tested-by: syzbot+943d34fa3cf2191e3068@syzkaller.appspotmail.com Suggested-by: Thomas Gleixner Cc: Hillf Danton Cc: Tejun Heo Cc: Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman Signed-off-by: zhangshuqi --- kernel/kthread.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/kernel/kthread.c b/kernel/kthread.c index 1eea53050bab..270ab22f6854 100644 --- a/kernel/kthread.c +++ b/kernel/kthread.c @@ -622,6 +622,8 @@ void kthread_unpark(struct task_struct *k) { struct kthread *kthread = to_kthread(k); + if (!test_bit(KTHREAD_SHOULD_PARK, &kthread->flags)) + return; /* * Newly created kthread was parked when the CPU was offline. * The binding was lost and we need to set it again. -- Gitee From f5c878fffb369116150cf6daa14d1b6a3760b5ce Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 9 Oct 2024 18:58:02 +0000 Subject: [PATCH 23/51] ppp: fix ppp_async_encode() illegal access stable inclusion from stable-6.6.57 commit 8fe992ff3df493d1949922ca234419f3ede08dff category: bugfix issue: #IB55S9 CVE: CVE-2024-50035 Signed-off-by: zhangshuqi --------------------------------------- [ Upstream commit 40dddd4b8bd08a69471efd96107a4e1c73fabefc ] syzbot reported an issue in ppp_async_encode() [1] In this case, pppoe_sendmsg() is called with a zero size. Then ppp_async_encode() is called with an empty skb. BUG: KMSAN: uninit-value in ppp_async_encode drivers/net/ppp/ppp_async.c:545 [inline] BUG: KMSAN: uninit-value in ppp_async_push+0xb4f/0x2660 drivers/net/ppp/ppp_async.c:675 ppp_async_encode drivers/net/ppp/ppp_async.c:545 [inline] ppp_async_push+0xb4f/0x2660 drivers/net/ppp/ppp_async.c:675 ppp_async_send+0x130/0x1b0 drivers/net/ppp/ppp_async.c:634 ppp_channel_bridge_input drivers/net/ppp/ppp_generic.c:2280 [inline] ppp_input+0x1f1/0xe60 drivers/net/ppp/ppp_generic.c:2304 pppoe_rcv_core+0x1d3/0x720 drivers/net/ppp/pppoe.c:379 sk_backlog_rcv+0x13b/0x420 include/net/sock.h:1113 __release_sock+0x1da/0x330 net/core/sock.c:3072 release_sock+0x6b/0x250 net/core/sock.c:3626 pppoe_sendmsg+0x2b8/0xb90 drivers/net/ppp/pppoe.c:903 sock_sendmsg_nosec net/socket.c:729 [inline] __sock_sendmsg+0x30f/0x380 net/socket.c:744 ____sys_sendmsg+0x903/0xb60 net/socket.c:2602 ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2656 __sys_sendmmsg+0x3c1/0x960 net/socket.c:2742 __do_sys_sendmmsg net/socket.c:2771 [inline] __se_sys_sendmmsg net/socket.c:2768 [inline] __x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2768 x64_sys_call+0xb6e/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:308 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Uninit was created at: slab_post_alloc_hook mm/slub.c:4092 [inline] slab_alloc_node mm/slub.c:4135 [inline] kmem_cache_alloc_node_noprof+0x6bf/0xb80 mm/slub.c:4187 kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:587 __alloc_skb+0x363/0x7b0 net/core/skbuff.c:678 alloc_skb include/linux/skbuff.h:1322 [inline] sock_wmalloc+0xfe/0x1a0 net/core/sock.c:2732 pppoe_sendmsg+0x3a7/0xb90 drivers/net/ppp/pppoe.c:867 sock_sendmsg_nosec net/socket.c:729 [inline] __sock_sendmsg+0x30f/0x380 net/socket.c:744 ____sys_sendmsg+0x903/0xb60 net/socket.c:2602 ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2656 __sys_sendmmsg+0x3c1/0x960 net/socket.c:2742 __do_sys_sendmmsg net/socket.c:2771 [inline] __se_sys_sendmmsg net/socket.c:2768 [inline] __x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2768 x64_sys_call+0xb6e/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:308 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f CPU: 1 UID: 0 PID: 5411 Comm: syz.1.14 Not tainted 6.12.0-rc1-syzkaller-00165-g360c1f1f24c6 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+1d121645899e7692f92a@syzkaller.appspotmail.com Signed-off-by: Eric Dumazet Reviewed-by: Simon Horman Link: https://patch.msgid.link/20241009185802.3763282-1-edumazet@google.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin Signed-off-by: zhangshuqi --- drivers/net/ppp/ppp_async.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ppp/ppp_async.c b/drivers/net/ppp/ppp_async.c index e94a4b08fd63..7d9201ef925f 100644 --- a/drivers/net/ppp/ppp_async.c +++ b/drivers/net/ppp/ppp_async.c @@ -541,7 +541,7 @@ ppp_async_encode(struct asyncppp *ap) * and 7 (code-reject) must be sent as though no options * had been negotiated. */ - islcp = proto == PPP_LCP && 1 <= data[2] && data[2] <= 7; + islcp = proto == PPP_LCP && count >= 3 && 1 <= data[2] && data[2] <= 7; if (i == 0) { if (islcp) -- Gitee From 2d3a86fe473ddd724d48d21bad06ff5055fc2386 Mon Sep 17 00:00:00 2001 From: Jann Horn Date: Tue, 6 Aug 2024 16:07:16 +0200 Subject: [PATCH 24/51] f2fs: Require FMODE_WRITE for atomic write ioctls stable inclusion from stable-6.6.54 commit 5e0de753bfe87768ebe6744d869caa92f35e5731 category: bugfix issue: #IB55S9 CVE: CVE-2024-47740 Signed-off-by: zhangshuqi --------------------------------------- commit 4f5a100f87f32cb65d4bb1ad282a08c92f6f591e upstream. The F2FS ioctls for starting and committing atomic writes check for inode_owner_or_capable(), but this does not give LSMs like SELinux or Landlock an opportunity to deny the write access - if the caller's FSUID matches the inode's UID, inode_owner_or_capable() immediately returns true. There are scenarios where LSMs want to deny a process the ability to write particular files, even files that the FSUID of the process owns; but this can currently partially be bypassed using atomic write ioctls in two ways: - F2FS_IOC_START_ATOMIC_REPLACE + F2FS_IOC_COMMIT_ATOMIC_WRITE can truncate an inode to size 0 - F2FS_IOC_START_ATOMIC_WRITE + F2FS_IOC_ABORT_ATOMIC_WRITE can revert changes another process concurrently made to a file Fix it by requiring FMODE_WRITE for these operations, just like for F2FS_IOC_MOVE_RANGE. Since any legitimate caller should only be using these ioctls when intending to write into the file, that seems unlikely to break anything. Fixes: 88b88a667971 ("f2fs: support atomic writes") Cc: stable@vger.kernel.org Signed-off-by: Jann Horn Reviewed-by: Chao Yu Reviewed-by: Eric Biggers Signed-off-by: Jaegeuk Kim Signed-off-by: Greg Kroah-Hartman Signed-off-by: zhangshuqi --- fs/f2fs/file.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index 68f8f319d125..a4f0394491a9 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -2046,6 +2046,9 @@ static int f2fs_ioc_start_atomic_write(struct file *filp, bool truncate) loff_t isize; int ret; + if (!(filp->f_mode & FMODE_WRITE)) + return -EBADF; + if (!inode_owner_or_capable(idmap, inode)) return -EACCES; @@ -2150,6 +2153,9 @@ static int f2fs_ioc_commit_atomic_write(struct file *filp) struct mnt_idmap *idmap = file_mnt_idmap(filp); int ret; + if (!(filp->f_mode & FMODE_WRITE)) + return -EBADF; + if (!inode_owner_or_capable(idmap, inode)) return -EACCES; @@ -2182,6 +2188,9 @@ static int f2fs_ioc_abort_atomic_write(struct file *filp) struct mnt_idmap *idmap = file_mnt_idmap(filp); int ret; + if (!(filp->f_mode & FMODE_WRITE)) + return -EBADF; + if (!inode_owner_or_capable(idmap, inode)) return -EACCES; -- Gitee From 9cb30a9475c54522830bcbf270fe9c57eba731dc Mon Sep 17 00:00:00 2001 From: Leo Yan Date: Mon, 7 Oct 2024 15:47:24 +0100 Subject: [PATCH 25/51] tracing: Consider the NULL character when validating the event length stable inclusion from stable-6.6.59 commit a14a075a14af8d622c576145455702591bdde09d category: bugfix issue: #IB55S9 CVE: CVE-2024-50131 Signed-off-by: zhangshuqi --------------------------------------- [ Upstream commit 0b6e2e22cb23105fcb171ab92f0f7516c69c8471 ] strlen() returns a string length excluding the null byte. If the string length equals to the maximum buffer length, the buffer will have no space for the NULL terminating character. This commit checks this condition and returns failure for it. Link: https://lore.kernel.org/all/20241007144724.920954-1-leo.yan@arm.com/ Fixes: dec65d79fd26 ("tracing/probe: Check event name length correctly") Signed-off-by: Leo Yan Reviewed-by: Steven Rostedt (Google) Signed-off-by: Masami Hiramatsu (Google) Signed-off-by: Sasha Levin Signed-off-by: zhangshuqi --- kernel/trace/trace_probe.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c index 34289f9c6707..63223395a8cd 100644 --- a/kernel/trace/trace_probe.c +++ b/kernel/trace/trace_probe.c @@ -275,7 +275,7 @@ int traceprobe_parse_event_name(const char **pevent, const char **pgroup, } trace_probe_log_err(offset, NO_EVENT_NAME); return -EINVAL; - } else if (len > MAX_EVENT_NAME_LEN) { + } else if (len >= MAX_EVENT_NAME_LEN) { trace_probe_log_err(offset, EVENT_TOO_LONG); return -EINVAL; } -- Gitee From 140647e33787977c9933708b3d75a42f488e0389 Mon Sep 17 00:00:00 2001 From: Ojaswin Mujoo Date: Fri, 30 Aug 2024 12:50:57 +0530 Subject: [PATCH 26/51] ext4: check stripe size compatibility on remount as well stable inclusion from stable-6.6.54 commit faeff8b1ee2eaa5969c8e994d66c3337298cefed category: bugfix issue: #IB55S9 CVE: CVE-2024-47700 Signed-off-by: zhangshuqi --------------------------------------- [ Upstream commit ee85e0938aa8f9846d21e4d302c3cf6a2a75110d ] We disable stripe size in __ext4_fill_super if it is not a multiple of the cluster ratio however this check is missed when trying to remount. This can leave us with cases where stripe < cluster_ratio after remount:set making EXT4_B2C(sbi->s_stripe) become 0 that can cause some unforeseen bugs like divide by 0. Fix that by adding the check in remount path as well. Reported-by: syzbot+1ad8bac5af24d01e2cbd@syzkaller.appspotmail.com Tested-by: syzbot+1ad8bac5af24d01e2cbd@syzkaller.appspotmail.com Reviewed-by: Kemeng Shi Reviewed-by: Ritesh Harjani (IBM) Fixes: c3defd99d58c ("ext4: treat stripe in block unit") Signed-off-by: Ojaswin Mujoo Link: https://patch.msgid.link/3a493bb503c3598e25dcfbed2936bb2dff3fece7.1725002410.git.ojaswin@linux.ibm.com Signed-off-by: Theodore Ts'o Signed-off-by: Sasha Levin Signed-off-by: zhangshuqi --- fs/ext4/super.c | 29 ++++++++++++++++++++++------- 1 file changed, 22 insertions(+), 7 deletions(-) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index a1efc64c05c4..942e476f7f4b 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -5205,6 +5205,18 @@ static int ext4_block_group_meta_init(struct super_block *sb, int silent) return 0; } +/* + * It's hard to get stripe aligned blocks if stripe is not aligned with + * cluster, just disable stripe and alert user to simplify code and avoid + * stripe aligned allocation which will rarely succeed. + */ +static bool ext4_is_stripe_incompatible(struct super_block *sb, unsigned long stripe) +{ + struct ext4_sb_info *sbi = EXT4_SB(sb); + return (stripe > 0 && sbi->s_cluster_ratio > 1 && + stripe % sbi->s_cluster_ratio != 0); +} + static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb) { struct ext4_super_block *es = NULL; @@ -5312,13 +5324,7 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb) goto failed_mount3; sbi->s_stripe = ext4_get_stripe_size(sbi); - /* - * It's hard to get stripe aligned blocks if stripe is not aligned with - * cluster, just disable stripe and alert user to simpfy code and avoid - * stripe aligned allocation which will rarely successes. - */ - if (sbi->s_stripe > 0 && sbi->s_cluster_ratio > 1 && - sbi->s_stripe % sbi->s_cluster_ratio != 0) { + if (ext4_is_stripe_incompatible(sb, sbi->s_stripe)) { ext4_msg(sb, KERN_WARNING, "stripe (%lu) is not aligned with cluster size (%u), " "stripe is disabled", @@ -6484,6 +6490,15 @@ static int __ext4_remount(struct fs_context *fc, struct super_block *sb) } + if ((ctx->spec & EXT4_SPEC_s_stripe) && + ext4_is_stripe_incompatible(sb, ctx->s_stripe)) { + ext4_msg(sb, KERN_WARNING, + "stripe (%lu) is not aligned with cluster size (%u), " + "stripe is disabled", + ctx->s_stripe, sbi->s_cluster_ratio); + ctx->s_stripe = 0; + } + /* * Changing the DIOREAD_NOLOCK or DELALLOC mount options may cause * two calls to ext4_should_dioread_nolock() to return inconsistent -- Gitee From a7f37d8194d9561fec038b046de2ea0d70bec3e3 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 27 Sep 2024 07:45:53 +0000 Subject: [PATCH 27/51] ppp: do not assume bh is held in ppp_channel_bridge_input() stable inclusion from stable-6.6.55 commit f9620e2a665aa642625bd2501282bbddff556bd7 category: bugfix issue: #IB55S9 CVE: CVE-2024-49946 Signed-off-by: zhangshuqi --------------------------------------- [ Upstream commit aec7291003df78cb71fd461d7b672912bde55807 ] Networking receive path is usually handled from BH handler. However, some protocols need to acquire the socket lock, and packets might be stored in the socket backlog is the socket was owned by a user process. In this case, release_sock(), __release_sock(), and sk_backlog_rcv() might call the sk->sk_backlog_rcv() handler in process context. sybot caught ppp was not considering this case in ppp_channel_bridge_input() : WARNING: inconsistent lock state 6.11.0-rc7-syzkaller-g5f5673607153 #0 Not tainted -------------------------------- inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. ksoftirqd/1/24 [HC0[0]:SC1[1]:HE1:SE0] takes: ffff0000db7f11e0 (&pch->downl){+.?.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] ffff0000db7f11e0 (&pch->downl){+.?.}-{2:2}, at: ppp_channel_bridge_input drivers/net/ppp/ppp_generic.c:2272 [inline] ffff0000db7f11e0 (&pch->downl){+.?.}-{2:2}, at: ppp_input+0x16c/0x854 drivers/net/ppp/ppp_generic.c:2304 {SOFTIRQ-ON-W} state was registered at: lock_acquire+0x240/0x728 kernel/locking/lockdep.c:5759 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline] _raw_spin_lock+0x48/0x60 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:351 [inline] ppp_channel_bridge_input drivers/net/ppp/ppp_generic.c:2272 [inline] ppp_input+0x16c/0x854 drivers/net/ppp/ppp_generic.c:2304 pppoe_rcv_core+0xfc/0x314 drivers/net/ppp/pppoe.c:379 sk_backlog_rcv include/net/sock.h:1111 [inline] __release_sock+0x1a8/0x3d8 net/core/sock.c:3004 release_sock+0x68/0x1b8 net/core/sock.c:3558 pppoe_sendmsg+0xc8/0x5d8 drivers/net/ppp/pppoe.c:903 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg net/socket.c:745 [inline] __sys_sendto+0x374/0x4f4 net/socket.c:2204 __do_sys_sendto net/socket.c:2216 [inline] __se_sys_sendto net/socket.c:2212 [inline] __arm64_sys_sendto+0xd8/0xf8 net/socket.c:2212 __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline] invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49 el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132 do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151 el0_svc+0x54/0x168 arch/arm64/kernel/entry-common.c:712 el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598 irq event stamp: 282914 hardirqs last enabled at (282914): [] __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:151 [inline] hardirqs last enabled at (282914): [] _raw_spin_unlock_irqrestore+0x38/0x98 kernel/locking/spinlock.c:194 hardirqs last disabled at (282913): [] __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:108 [inline] hardirqs last disabled at (282913): [] _raw_spin_lock_irqsave+0x2c/0x7c kernel/locking/spinlock.c:162 softirqs last enabled at (282904): [] softirq_handle_end kernel/softirq.c:400 [inline] softirqs last enabled at (282904): [] handle_softirqs+0xa3c/0xbfc kernel/softirq.c:582 softirqs last disabled at (282909): [] run_ksoftirqd+0x70/0x158 kernel/softirq.c:928 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&pch->downl); lock(&pch->downl); *** DEADLOCK *** 1 lock held by ksoftirqd/1/24: #0: ffff80008f74dfa0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire+0x10/0x4c include/linux/rcupdate.h:325 stack backtrace: CPU: 1 UID: 0 PID: 24 Comm: ksoftirqd/1 Not tainted 6.11.0-rc7-syzkaller-g5f5673607153 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024 Call trace: dump_backtrace+0x1b8/0x1e4 arch/arm64/kernel/stacktrace.c:319 show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:326 __dump_stack lib/dump_stack.c:93 [inline] dump_stack_lvl+0xe4/0x150 lib/dump_stack.c:119 dump_stack+0x1c/0x28 lib/dump_stack.c:128 print_usage_bug+0x698/0x9ac kernel/locking/lockdep.c:4000 mark_lock_irq+0x980/0xd2c mark_lock+0x258/0x360 kernel/locking/lockdep.c:4677 __lock_acquire+0xf48/0x779c kernel/locking/lockdep.c:5096 lock_acquire+0x240/0x728 kernel/locking/lockdep.c:5759 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline] _raw_spin_lock+0x48/0x60 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:351 [inline] ppp_channel_bridge_input drivers/net/ppp/ppp_generic.c:2272 [inline] ppp_input+0x16c/0x854 drivers/net/ppp/ppp_generic.c:2304 ppp_async_process+0x98/0x150 drivers/net/ppp/ppp_async.c:495 tasklet_action_common+0x318/0x3f4 kernel/softirq.c:785 tasklet_action+0x68/0x8c kernel/softirq.c:811 handle_softirqs+0x2e4/0xbfc kernel/softirq.c:554 run_ksoftirqd+0x70/0x158 kernel/softirq.c:928 smpboot_thread_fn+0x4b0/0x90c kernel/smpboot.c:164 kthread+0x288/0x310 kernel/kthread.c:389 ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:860 Fixes: 4cf476ced45d ("ppp: add PPPIOCBRIDGECHAN and PPPIOCUNBRIDGECHAN ioctls") Reported-by: syzbot+bd8d55ee2acd0a71d8ce@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/66f661e2.050a0220.38ace9.000f.GAE@google.com/T/#u Signed-off-by: Eric Dumazet Cc: Tom Parkin Cc: James Chapman Link: https://patch.msgid.link/20240927074553.341910-1-edumazet@google.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin Signed-off-by: zhangshuqi --- drivers/net/ppp/ppp_generic.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c index a9beacd552cf..12a6716d90a9 100644 --- a/drivers/net/ppp/ppp_generic.c +++ b/drivers/net/ppp/ppp_generic.c @@ -2254,7 +2254,7 @@ static bool ppp_channel_bridge_input(struct channel *pch, struct sk_buff *skb) if (!pchb) goto out_rcu; - spin_lock(&pchb->downl); + spin_lock_bh(&pchb->downl); if (!pchb->chan) { /* channel got unregistered */ kfree_skb(skb); @@ -2266,7 +2266,7 @@ static bool ppp_channel_bridge_input(struct channel *pch, struct sk_buff *skb) kfree_skb(skb); outl: - spin_unlock(&pchb->downl); + spin_unlock_bh(&pchb->downl); out_rcu: rcu_read_unlock(); -- Gitee From bea6af707fb3838c227fd62d1f35cde12ad53661 Mon Sep 17 00:00:00 2001 From: Baokun Li Date: Thu, 22 Aug 2024 10:35:26 +0800 Subject: [PATCH 28/51] ext4: aovid use-after-free in ext4_ext_insert_extent() stable inclusion from stable-6.6.55 commit 8162ee5d94b8c0351be0a9321be134872a7654a1 category: bugfix issue: #IB55S9 CVE: CVE-2024-49883 Signed-off-by: zhangshuqi --------------------------------------- commit a164f3a432aae62ca23d03e6d926b122ee5b860d upstream. As Ojaswin mentioned in Link, in ext4_ext_insert_extent(), if the path is reallocated in ext4_ext_create_new_leaf(), we'll use the stale path and cause UAF. Below is a sample trace with dummy values: ext4_ext_insert_extent path = *ppath = 2000 ext4_ext_create_new_leaf(ppath) ext4_find_extent(ppath) path = *ppath = 2000 if (depth > path[0].p_maxdepth) kfree(path = 2000); *ppath = path = NULL; path = kcalloc() = 3000 *ppath = 3000; return path; /* here path is still 2000, UAF! */ eh = path[depth].p_hdr ================================================================== BUG: KASAN: slab-use-after-free in ext4_ext_insert_extent+0x26d4/0x3330 Read of size 8 at addr ffff8881027bf7d0 by task kworker/u36:1/179 CPU: 3 UID: 0 PID: 179 Comm: kworker/u6:1 Not tainted 6.11.0-rc2-dirty #866 Call Trace: ext4_ext_insert_extent+0x26d4/0x3330 ext4_ext_map_blocks+0xe22/0x2d40 ext4_map_blocks+0x71e/0x1700 ext4_do_writepages+0x1290/0x2800 [...] Allocated by task 179: ext4_find_extent+0x81c/0x1f70 ext4_ext_map_blocks+0x146/0x2d40 ext4_map_blocks+0x71e/0x1700 ext4_do_writepages+0x1290/0x2800 ext4_writepages+0x26d/0x4e0 do_writepages+0x175/0x700 [...] Freed by task 179: kfree+0xcb/0x240 ext4_find_extent+0x7c0/0x1f70 ext4_ext_insert_extent+0xa26/0x3330 ext4_ext_map_blocks+0xe22/0x2d40 ext4_map_blocks+0x71e/0x1700 ext4_do_writepages+0x1290/0x2800 ext4_writepages+0x26d/0x4e0 do_writepages+0x175/0x700 [...] ================================================================== So use *ppath to update the path to avoid the above problem. Reported-by: Ojaswin Mujoo Closes: https://lore.kernel.org/r/ZqyL6rmtwl6N4MWR@li-bb2b2a4c-3307-11b2-a85c-8fa5c3a69313.ibm.com Fixes: 10809df84a4d ("ext4: teach ext4_ext_find_extent() to realloc path if necessary") Cc: stable@kernel.org Signed-off-by: Baokun Li Reviewed-by: Jan Kara Link: https://patch.msgid.link/20240822023545.1994557-7-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o Signed-off-by: Greg Kroah-Hartman Signed-off-by: zhangshuqi --- fs/ext4/extents.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index ac8f5fd66b94..814a2cbe361d 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -2103,6 +2103,7 @@ int ext4_ext_insert_extent(handle_t *handle, struct inode *inode, ppath, newext); if (err) goto cleanup; + path = *ppath; depth = ext_depth(inode); eh = path[depth].p_hdr; -- Gitee From 0a51d9f8da854fd80a63e1bc83ecf966fdfb7cc3 Mon Sep 17 00:00:00 2001 From: Jonathan McDowell Date: Fri, 16 Aug 2024 12:55:46 +0100 Subject: [PATCH 29/51] tpm: Clean up TPM space after command failure stable inclusion from stable-6.6.54 commit 82478cb8a23bd4f97935bbe60d64528c6d9918b4 category: bugfix issue: #IB55S9 CVE: CVE-2024-49851 Signed-off-by: zhangshuqi --------------------------------------- [ Upstream commit e3aaebcbb7c6b403416f442d1de70d437ce313a7 ] tpm_dev_transmit prepares the TPM space before attempting command transmission. However if the command fails no rollback of this preparation is done. This can result in transient handles being leaked if the device is subsequently closed with no further commands performed. Fix this by flushing the space in the event of command transmission failure. Fixes: 745b361e989a ("tpm: infrastructure for TPM spaces") Signed-off-by: Jonathan McDowell Reviewed-by: Jarkko Sakkinen Signed-off-by: Jarkko Sakkinen Signed-off-by: Sasha Levin Signed-off-by: zhangshuqi --- drivers/char/tpm/tpm-dev-common.c | 2 ++ drivers/char/tpm/tpm2-space.c | 3 +++ 2 files changed, 5 insertions(+) diff --git a/drivers/char/tpm/tpm-dev-common.c b/drivers/char/tpm/tpm-dev-common.c index 30b4c288c1bb..c3fbbf4d3db7 100644 --- a/drivers/char/tpm/tpm-dev-common.c +++ b/drivers/char/tpm/tpm-dev-common.c @@ -47,6 +47,8 @@ static ssize_t tpm_dev_transmit(struct tpm_chip *chip, struct tpm_space *space, if (!ret) ret = tpm2_commit_space(chip, space, buf, &len); + else + tpm2_flush_space(chip); out_rc: return ret ? ret : len; diff --git a/drivers/char/tpm/tpm2-space.c b/drivers/char/tpm/tpm2-space.c index 363afdd4d1d3..d4d1007fe811 100644 --- a/drivers/char/tpm/tpm2-space.c +++ b/drivers/char/tpm/tpm2-space.c @@ -166,6 +166,9 @@ void tpm2_flush_space(struct tpm_chip *chip) struct tpm_space *space = &chip->work_space; int i; + if (!space) + return; + for (i = 0; i < ARRAY_SIZE(space->context_tbl); i++) if (space->context_tbl[i] && ~space->context_tbl[i]) tpm2_flush_context(chip, space->context_tbl[i]); -- Gitee From a0691567ea8d72881543dd34de2256ea627c7d32 Mon Sep 17 00:00:00 2001 From: Baokun Li Date: Thu, 22 Aug 2024 10:35:28 +0800 Subject: [PATCH 30/51] ext4: fix double brelse() the buffer of the extents path stable inclusion from stable-6.6.55 commit 68a69cf60660c73990c1875f94a5551600b04775 category: bugfix issue: #IB55S9 CVE: CVE-2024-49882 Signed-off-by: zhangshuqi --------------------------------------- commit dcaa6c31134c0f515600111c38ed7750003e1b9c upstream. In ext4_ext_try_to_merge_up(), set path[1].p_bh to NULL after it has been released, otherwise it may be released twice. An example of what triggers this is as follows: split2 map split1 |--------|-------|--------| ext4_ext_map_blocks ext4_ext_handle_unwritten_extents ext4_split_convert_extents // path->p_depth == 0 ext4_split_extent // 1. do split1 ext4_split_extent_at |ext4_ext_insert_extent | ext4_ext_create_new_leaf | ext4_ext_grow_indepth | le16_add_cpu(&neh->eh_depth, 1) | ext4_find_extent | // return -ENOMEM |// get error and try zeroout |path = ext4_find_extent | path->p_depth = 1 |ext4_ext_try_to_merge | ext4_ext_try_to_merge_up | path->p_depth = 0 | brelse(path[1].p_bh) ---> not set to NULL here |// zeroout success // 2. update path ext4_find_extent // 3. do split2 ext4_split_extent_at ext4_ext_insert_extent ext4_ext_create_new_leaf ext4_ext_grow_indepth le16_add_cpu(&neh->eh_depth, 1) ext4_find_extent path[0].p_bh = NULL; path->p_depth = 1 read_extent_tree_block ---> return err // path[1].p_bh is still the old value ext4_free_ext_path ext4_ext_drop_refs // path->p_depth == 1 brelse(path[1].p_bh) ---> brelse a buffer twice Finally got the following WARRNING when removing the buffer from lru: ============================================ VFS: brelse: Trying to free free buffer WARNING: CPU: 2 PID: 72 at fs/buffer.c:1241 __brelse+0x58/0x90 CPU: 2 PID: 72 Comm: kworker/u19:1 Not tainted 6.9.0-dirty #716 RIP: 0010:__brelse+0x58/0x90 Call Trace: __find_get_block+0x6e7/0x810 bdev_getblk+0x2b/0x480 __ext4_get_inode_loc+0x48a/0x1240 ext4_get_inode_loc+0xb2/0x150 ext4_reserve_inode_write+0xb7/0x230 __ext4_mark_inode_dirty+0x144/0x6a0 ext4_ext_insert_extent+0x9c8/0x3230 ext4_ext_map_blocks+0xf45/0x2dc0 ext4_map_blocks+0x724/0x1700 ext4_do_writepages+0x12d6/0x2a70 [...] ============================================ Fixes: ecb94f5fdf4b ("ext4: collapse a single extent tree block into the inode if possible") Cc: stable@kernel.org Signed-off-by: Baokun Li Reviewed-by: Jan Kara Reviewed-by: Ojaswin Mujoo Tested-by: Ojaswin Mujoo Link: https://patch.msgid.link/20240822023545.1994557-9-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o Signed-off-by: Greg Kroah-Hartman Signed-off-by: zhangshuqi --- fs/ext4/extents.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 814a2cbe361d..f7143d2be3a7 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -1877,6 +1877,7 @@ static void ext4_ext_try_to_merge_up(handle_t *handle, path[0].p_hdr->eh_max = cpu_to_le16(max_root); brelse(path[1].p_bh); + path[1].p_bh = NULL; ext4_free_blocks(handle, inode, NULL, blk, 1, EXT4_FREE_BLOCKS_METADATA | EXT4_FREE_BLOCKS_FORGET); } -- Gitee From 9277ef09302475ea78a1127a657ace52e2d1b607 Mon Sep 17 00:00:00 2001 From: Dmitry Antipov Date: Fri, 6 Sep 2024 15:31:51 +0300 Subject: [PATCH 31/51] wifi: mac80211: use two-phase skb reclamation in ieee80211_do_stop() stable inclusion from stable-6.6.54 commit 058c9026ad79dc98572442fd4c7e9a36aba6f596 category: bugfix issue: #IB55S9 CVE: CVE-2024-47713 Signed-off-by: zhangshuqi --------------------------------------- [ Upstream commit 9d301de12da6e1bb069a9835c38359b8e8135121 ] Since '__dev_queue_xmit()' should be called with interrupts enabled, the following backtrace: ieee80211_do_stop() ... spin_lock_irqsave(&local->queue_stop_reason_lock, flags) ... ieee80211_free_txskb() ieee80211_report_used_skb() ieee80211_report_ack_skb() cfg80211_mgmt_tx_status_ext() nl80211_frame_tx_status() genlmsg_multicast_netns() genlmsg_multicast_netns_filtered() nlmsg_multicast_filtered() netlink_broadcast_filtered() do_one_broadcast() netlink_broadcast_deliver() __netlink_sendskb() netlink_deliver_tap() __netlink_deliver_tap_skb() dev_queue_xmit() __dev_queue_xmit() ; with IRQS disabled ... spin_unlock_irqrestore(&local->queue_stop_reason_lock, flags) issues the warning (as reported by syzbot reproducer): WARNING: CPU: 2 PID: 5128 at kernel/softirq.c:362 __local_bh_enable_ip+0xc3/0x120 Fix this by implementing a two-phase skb reclamation in 'ieee80211_do_stop()', where actual work is performed outside of a section with interrupts disabled. Fixes: 5061b0c2b906 ("mac80211: cooperate more with network namespaces") Reported-by: syzbot+1a3986bbd3169c307819@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=1a3986bbd3169c307819 Signed-off-by: Dmitry Antipov Link: https://patch.msgid.link/20240906123151.351647-1-dmantipov@yandex.ru Signed-off-by: Johannes Berg Signed-off-by: Sasha Levin Signed-off-by: zhangshuqi --- net/mac80211/iface.c | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c index 6e3bfb46af44..6cc17ea15ae2 100644 --- a/net/mac80211/iface.c +++ b/net/mac80211/iface.c @@ -445,6 +445,7 @@ static void ieee80211_do_stop(struct ieee80211_sub_if_data *sdata, bool going_do { struct ieee80211_local *local = sdata->local; unsigned long flags; + struct sk_buff_head freeq; struct sk_buff *skb, *tmp; u32 hw_reconf_flags = 0; int i, flushed; @@ -631,18 +632,32 @@ static void ieee80211_do_stop(struct ieee80211_sub_if_data *sdata, bool going_do skb_queue_purge(&sdata->status_queue); } + /* + * Since ieee80211_free_txskb() may issue __dev_queue_xmit() + * which should be called with interrupts enabled, reclamation + * is done in two phases: + */ + __skb_queue_head_init(&freeq); + + /* unlink from local queues... */ spin_lock_irqsave(&local->queue_stop_reason_lock, flags); for (i = 0; i < IEEE80211_MAX_QUEUES; i++) { skb_queue_walk_safe(&local->pending[i], skb, tmp) { struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb); if (info->control.vif == &sdata->vif) { __skb_unlink(skb, &local->pending[i]); - ieee80211_free_txskb(&local->hw, skb); + __skb_queue_tail(&freeq, skb); } } } spin_unlock_irqrestore(&local->queue_stop_reason_lock, flags); + /* ... and perform actual reclamation with interrupts enabled. */ + skb_queue_walk_safe(&freeq, skb, tmp) { + __skb_unlink(skb, &freeq); + ieee80211_free_txskb(&local->hw, skb); + } + if (sdata->vif.type == NL80211_IFTYPE_AP_VLAN) ieee80211_txq_remove_vlan(local, sdata); -- Gitee From 32ffd1959aaa04f4d57981dcb708e6f43b3e0d97 Mon Sep 17 00:00:00 2001 From: Roman Smirnov Date: Tue, 17 Sep 2024 18:54:53 +0300 Subject: [PATCH 32/51] KEYS: prevent NULL pointer dereference in find_asymmetric_key() stable inclusion from stable-6.6.54 commit a3765b497a4f5224cb2f7a6a2d3357d3066214ee category: bugfix issue: #IB55S9 CVE: CVE-2024-47743 Signed-off-by: zhangshuqi --------------------------------------- commit 70fd1966c93bf3bfe3fe6d753eb3d83a76597eef upstream. In find_asymmetric_key(), if all NULLs are passed in the id_{0,1,2} arguments, the kernel will first emit WARN but then have an oops because id_2 gets dereferenced anyway. Add the missing id_2 check and move WARN_ON() to the final else branch to avoid duplicate NULL checks. Found by Linux Verification Center (linuxtesting.org) with Svace static analysis tool. Cc: stable@vger.kernel.org # v5.17+ Fixes: 7d30198ee24f ("keys: X.509 public key issuer lookup without AKID") Suggested-by: Sergey Shtylyov Signed-off-by: Roman Smirnov Reviewed-by: Sergey Shtylyov Reviewed-by: Jarkko Sakkinen Signed-off-by: Jarkko Sakkinen Signed-off-by: Greg Kroah-Hartman Signed-off-by: zhangshuqi --- crypto/asymmetric_keys/asymmetric_type.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/crypto/asymmetric_keys/asymmetric_type.c b/crypto/asymmetric_keys/asymmetric_type.c index a5da8ccd353e..43af5fa510c0 100644 --- a/crypto/asymmetric_keys/asymmetric_type.c +++ b/crypto/asymmetric_keys/asymmetric_type.c @@ -60,17 +60,18 @@ struct key *find_asymmetric_key(struct key *keyring, char *req, *p; int len; - WARN_ON(!id_0 && !id_1 && !id_2); - if (id_0) { lookup = id_0->data; len = id_0->len; } else if (id_1) { lookup = id_1->data; len = id_1->len; - } else { + } else if (id_2) { lookup = id_2->data; len = id_2->len; + } else { + WARN_ON(1); + return ERR_PTR(-EINVAL); } /* Construct an identifier "id:". */ -- Gitee From cbcc67179840c89882673b41f0340fe57a428510 Mon Sep 17 00:00:00 2001 From: Calvin Owens Date: Mon, 29 Jul 2024 17:05:51 +0100 Subject: [PATCH 33/51] ARM: 9410/1: vfp: Use asm volatile in fmrx/fmxr macros stable inclusion from stable-6.6.54 commit 9fc60f2bdd43e758bdf0305c0fc83221419ddb3f category: bugfix issue: #IB55S9 CVE: CVE-2024-47716 Signed-off-by: zhangshuqi --------------------------------------- [ Upstream commit 89a906dfa8c3b21b3e5360f73c49234ac1eb885b ] Floating point instructions in userspace can crash some arm kernels built with clang/LLD 17.0.6: BUG: unsupported FP instruction in kernel mode FPEXC == 0xc0000780 Internal error: Oops - undefined instruction: 0 [#1] ARM CPU: 0 PID: 196 Comm: vfp-reproducer Not tainted 6.10.0 #1 Hardware name: BCM2835 PC is at vfp_support_entry+0xc8/0x2cc LR is at do_undefinstr+0xa8/0x250 pc : [] lr : [] psr: a0000013 sp : dc8d1f68 ip : 60000013 fp : bedea19c r10: ec532b17 r9 : 00000010 r8 : 0044766c r7 : c0000780 r6 : ec532b17 r5 : c1c13800 r4 : dc8d1fb0 r3 : c10072c4 r2 : c0101c88 r1 : ec532b17 r0 : 0044766c Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none Control: 00c5387d Table: 0251c008 DAC: 00000051 Register r0 information: non-paged memory Register r1 information: vmalloc memory Register r2 information: non-slab/vmalloc memory Register r3 information: non-slab/vmalloc memory Register r4 information: 2-page vmalloc region Register r5 information: slab kmalloc-cg-2k Register r6 information: vmalloc memory Register r7 information: non-slab/vmalloc memory Register r8 information: non-paged memory Register r9 information: zero-size pointer Register r10 information: vmalloc memory Register r11 information: non-paged memory Register r12 information: non-paged memory Process vfp-reproducer (pid: 196, stack limit = 0x61aaaf8b) Stack: (0xdc8d1f68 to 0xdc8d2000) 1f60: 0000081f b6f69300 0000000f c10073f4 c10072c4 dc8d1fb0 1f80: ec532b17 0c532b17 0044766c b6f9ccd8 00000000 c010a80c 00447670 60000010 1fa0: ffffffff c1c13800 00c5387d c0100f10 b6f68af8 00448fc0 00000000 bedea188 1fc0: bedea314 00000001 00448ebc b6f9d000 00447608 b6f9ccd8 00000000 bedea19c 1fe0: bede9198 bedea188 b6e1061c 0044766c 60000010 ffffffff 00000000 00000000 Call trace: [] (vfp_support_entry) from [] (do_undefinstr+0xa8/0x250) [] (do_undefinstr) from [] (__und_usr+0x70/0x80) Exception stack(0xdc8d1fb0 to 0xdc8d1ff8) 1fa0: b6f68af8 00448fc0 00000000 bedea188 1fc0: bedea314 00000001 00448ebc b6f9d000 00447608 b6f9ccd8 00000000 bedea19c 1fe0: bede9198 bedea188 b6e1061c 0044766c 60000010 ffffffff Code: 0a000061 e3877202 e594003c e3a09010 (eef16a10) ---[ end trace 0000000000000000 ]--- Kernel panic - not syncing: Fatal exception in interrupt ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- This is a minimal userspace reproducer on a Raspberry Pi Zero W: #include #include int main(void) { double v = 1.0; printf("%fn", NAN + *(volatile double *)&v); return 0; } Another way to consistently trigger the oops is: calvin@raspberry-pi-zero-w ~$ python -c "import json" The bug reproduces only when the kernel is built with DYNAMIC_DEBUG=n, because the pr_debug() calls act as barriers even when not activated. This is the output from the same kernel source built with the same compiler and DYNAMIC_DEBUG=y, where the userspace reproducer works as expected: VFP: bounce: trigger ec532b17 fpexc c0000780 VFP: emulate: INST=0xee377b06 SCR=0x00000000 VFP: bounce: trigger eef1fa10 fpexc c0000780 VFP: emulate: INST=0xeeb40b40 SCR=0x00000000 VFP: raising exceptions 30000000 calvin@raspberry-pi-zero-w ~$ ./vfp-reproducer nan Crudely grepping for vmsr/vmrs instructions in the otherwise nearly idential text for vfp_support_entry() makes the problem obvious: vmlinux.llvm.good [0xc0101cb8] <+48>: vmrs r7, fpexc vmlinux.llvm.good [0xc0101cd8] <+80>: vmsr fpexc, r0 vmlinux.llvm.good [0xc0101d20] <+152>: vmsr fpexc, r7 vmlinux.llvm.good [0xc0101d38] <+176>: vmrs r4, fpexc vmlinux.llvm.good [0xc0101d6c] <+228>: vmrs r0, fpscr vmlinux.llvm.good [0xc0101dc4] <+316>: vmsr fpexc, r0 vmlinux.llvm.good [0xc0101dc8] <+320>: vmrs r0, fpsid vmlinux.llvm.good [0xc0101dcc] <+324>: vmrs r6, fpscr vmlinux.llvm.good [0xc0101e10] <+392>: vmrs r10, fpinst vmlinux.llvm.good [0xc0101eb8] <+560>: vmrs r10, fpinst2 vmlinux.llvm.bad [0xc0101cb8] <+48>: vmrs r7, fpexc vmlinux.llvm.bad [0xc0101cd8] <+80>: vmsr fpexc, r0 vmlinux.llvm.bad [0xc0101d20] <+152>: vmsr fpexc, r7 vmlinux.llvm.bad [0xc0101d30] <+168>: vmrs r0, fpscr vmlinux.llvm.bad [0xc0101d50] <+200>: vmrs r6, fpscr <== BOOM! vmlinux.llvm.bad [0xc0101d6c] <+228>: vmsr fpexc, r0 vmlinux.llvm.bad [0xc0101d70] <+232>: vmrs r0, fpsid vmlinux.llvm.bad [0xc0101da4] <+284>: vmrs r10, fpinst vmlinux.llvm.bad [0xc0101df8] <+368>: vmrs r4, fpexc vmlinux.llvm.bad [0xc0101e5c] <+468>: vmrs r10, fpinst2 I think LLVM's reordering is valid as the code is currently written: the compiler doesn't know the instructions have side effects in hardware. Fix by using "asm volatile" in fmxr() and fmrx(), so they cannot be reordered with respect to each other. The original compiler now produces working kernels on my hardware with DYNAMIC_DEBUG=n. This is the relevant piece of the diff of the vfp_support_entry() text, from the original oopsing kernel to a working kernel with this patch: vmrs r0, fpscr tst r0, #4096 bne 0xc0101d48 tst r0, #458752 beq 0xc0101ecc orr r7, r7, #536870912 ldr r0, [r4, #0x3c] mov r9, #16 -vmrs r6, fpscr orr r9, r9, #251658240 add r0, r0, #4 str r0, [r4, #0x3c] mvn r0, #159 sub r0, r0, #-1207959552 and r0, r7, r0 vmsr fpexc, r0 vmrs r0, fpsid +vmrs r6, fpscr and r0, r0, #983040 cmp r0, #65536 bne 0xc0101d88 Fixes: 4708fb041346 ("ARM: vfp: Reimplement VFP exception entry in C code") Signed-off-by: Calvin Owens Signed-off-by: Russell King (Oracle) Signed-off-by: Sasha Levin Signed-off-by: zhangshuqi --- arch/arm/vfp/vfpinstr.h | 48 ++++++++++++++++++++++------------------- 1 file changed, 26 insertions(+), 22 deletions(-) diff --git a/arch/arm/vfp/vfpinstr.h b/arch/arm/vfp/vfpinstr.h index 3c7938fd40aa..32090b0fb250 100644 --- a/arch/arm/vfp/vfpinstr.h +++ b/arch/arm/vfp/vfpinstr.h @@ -64,33 +64,37 @@ #ifdef CONFIG_AS_VFP_VMRS_FPINST -#define fmrx(_vfp_) ({ \ - u32 __v; \ - asm(".fpu vfpv2\n" \ - "vmrs %0, " #_vfp_ \ - : "=r" (__v) : : "cc"); \ - __v; \ - }) - -#define fmxr(_vfp_,_var_) \ - asm(".fpu vfpv2\n" \ - "vmsr " #_vfp_ ", %0" \ - : : "r" (_var_) : "cc") +#define fmrx(_vfp_) ({ \ + u32 __v; \ + asm volatile (".fpu vfpv2\n" \ + "vmrs %0, " #_vfp_ \ + : "=r" (__v) : : "cc"); \ + __v; \ +}) + +#define fmxr(_vfp_, _var_) ({ \ + asm volatile (".fpu vfpv2\n" \ + "vmsr " #_vfp_ ", %0" \ + : : "r" (_var_) : "cc"); \ +}) #else #define vfpreg(_vfp_) #_vfp_ -#define fmrx(_vfp_) ({ \ - u32 __v; \ - asm("mrc p10, 7, %0, " vfpreg(_vfp_) ", cr0, 0 @ fmrx %0, " #_vfp_ \ - : "=r" (__v) : : "cc"); \ - __v; \ - }) - -#define fmxr(_vfp_,_var_) \ - asm("mcr p10, 7, %0, " vfpreg(_vfp_) ", cr0, 0 @ fmxr " #_vfp_ ", %0" \ - : : "r" (_var_) : "cc") +#define fmrx(_vfp_) ({ \ + u32 __v; \ + asm volatile ("mrc p10, 7, %0, " vfpreg(_vfp_) "," \ + "cr0, 0 @ fmrx %0, " #_vfp_ \ + : "=r" (__v) : : "cc"); \ + __v; \ +}) + +#define fmxr(_vfp_, _var_) ({ \ + asm volatile ("mcr p10, 7, %0, " vfpreg(_vfp_) "," \ + "cr0, 0 @ fmxr " #_vfp_ ", %0" \ + : : "r" (_var_) : "cc"); \ +}) #endif -- Gitee From 0cc6abff3f5d6d3e2133f0a634124ee0717eec0f Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 6 Sep 2024 15:44:49 +0000 Subject: [PATCH 34/51] sock_map: Add a cond_resched() in sock_hash_free() stable inclusion from stable-6.6.54 commit 80bd490ac0a3b662a489e17d8eedeb1e905a3d40 category: bugfix issue: #IB55S9 CVE: CVE-2024-47710 Signed-off-by: zhangshuqi --------------------------------------- [ Upstream commit b1339be951ad31947ae19bc25cb08769bf255100 ] Several syzbot soft lockup reports all have in common sock_hash_free() If a map with a large number of buckets is destroyed, we need to yield the cpu when needed. Fixes: 75e68e5bf2c7 ("bpf, sockhash: Synchronize delete from bucket list on map free") Reported-by: syzbot Signed-off-by: Eric Dumazet Signed-off-by: Daniel Borkmann Acked-by: Martin KaFai Lau Acked-by: John Fastabend Link: https://lore.kernel.org/bpf/20240906154449.3742932-1-edumazet@google.com Signed-off-by: Sasha Levin Signed-off-by: zhangshuqi --- net/core/sock_map.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/core/sock_map.c b/net/core/sock_map.c index 8598466a3805..ecafc07d836f 100644 --- a/net/core/sock_map.c +++ b/net/core/sock_map.c @@ -1177,6 +1177,7 @@ static void sock_hash_free(struct bpf_map *map) sock_put(elem->sk); sock_hash_free_elem(htab, elem); } + cond_resched(); } /* wait for psock readers accessing its map link */ -- Gitee From 1ef9a38a4e7a3dbc6ef65ed882cdeb729baac42f Mon Sep 17 00:00:00 2001 From: Zijun Hu Date: Sat, 27 Jul 2024 16:34:01 +0800 Subject: [PATCH 35/51] driver core: bus: Fix double free in driver API bus_register() stable inclusion from stable-6.6.57 commit d885c464c25018b81a6b58f5d548fc2e3ef87dd1 category: bugfix issue: #IB55S9 CVE: CVE-2024-50055 Signed-off-by: zhangshuqi --------------------------------------- [ Upstream commit bfa54a793ba77ef696755b66f3ac4ed00c7d1248 ] For bus_register(), any error which happens after kset_register() will cause that @priv are freed twice, fixed by setting @priv with NULL after the first free. Signed-off-by: Zijun Hu Link: https://lore.kernel.org/r/20240727-bus_register_fix-v1-1-fed8dd0dba7a@quicinc.com Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin Signed-off-by: zhangshuqi --- drivers/base/bus.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/base/bus.c b/drivers/base/bus.c index 84a21084d67d..cb5704675bc0 100644 --- a/drivers/base/bus.c +++ b/drivers/base/bus.c @@ -913,6 +913,8 @@ int bus_register(const struct bus_type *bus) bus_remove_file(bus, &bus_attr_uevent); bus_uevent_fail: kset_unregister(&priv->subsys); + /* Above kset_unregister() will kfree @priv */ + priv = NULL; out: kfree(priv); return retval; -- Gitee From dee1dbb0b6c870e70a48023f405725821c1140fb Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 26 Sep 2024 16:58:36 +0000 Subject: [PATCH 36/51] net: test for not too small csum_start in virtio_net_hdr_to_skb() stable inclusion from stable-6.6.55 commit d9dfd41e32ccc5198033ddd1ff1516822dfefa5a category: bugfix issue: #IB55S9 CVE: CVE-2024-49947 Signed-off-by: zhangshuqi --------------------------------------- [ Upstream commit 49d14b54a527289d09a9480f214b8c586322310a ] syzbot was able to trigger this warning [1], after injecting a malicious packet through af_packet, setting skb->csum_start and thus the transport header to an incorrect value. We can at least make sure the transport header is after the end of the network header (with a estimated minimal size). [1] [ 67.873027] skb len=4096 headroom=16 headlen=14 tailroom=0 mac=(-1,-1) mac_len=0 net=(16,-6) trans=10 shinfo(txflags=0 nr_frags=1 gso(size=0 type=0 segs=0)) csum(0xa start=10 offset=0 ip_summed=3 complete_sw=0 valid=0 level=0) hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=0 priority=0x0 mark=0x0 alloc_cpu=10 vlan_all=0x0 encapsulation=0 inner(proto=0x0000, mac=0, net=0, trans=0) [ 67.877172] dev name=veth0_vlan feat=0x000061164fdd09e9 [ 67.877764] sk family=17 type=3 proto=0 [ 67.878279] skb linear: 00000000: 00 00 10 00 00 00 00 00 0f 00 00 00 08 00 [ 67.879128] skb frag: 00000000: 0e 00 07 00 00 00 28 00 08 80 1c 00 04 00 00 02 [ 67.879877] skb frag: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.880647] skb frag: 00000020: 00 00 02 00 00 00 08 00 1b 00 00 00 00 00 00 00 [ 67.881156] skb frag: 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.881753] skb frag: 00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.882173] skb frag: 00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.882790] skb frag: 00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.883171] skb frag: 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.883733] skb frag: 00000080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.884206] skb frag: 00000090: 00 00 00 00 00 00 00 00 00 00 69 70 76 6c 61 6e [ 67.884704] skb frag: 000000a0: 31 00 00 00 00 00 00 00 00 00 2b 00 00 00 00 00 [ 67.885139] skb frag: 000000b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.885677] skb frag: 000000c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.886042] skb frag: 000000d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.886408] skb frag: 000000e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.887020] skb frag: 000000f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.887384] skb frag: 00000100: 00 00 [ 67.887878] ------------[ cut here ]------------ [ 67.887908] offset (-6) >= skb_headlen() (14) [ 67.888445] WARNING: CPU: 10 PID: 2088 at net/core/dev.c:3332 skb_checksum_help (net/core/dev.c:3332 (discriminator 2)) [ 67.889353] Modules linked in: macsec macvtap macvlan hsr wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 libchacha poly1305_x86_64 dummy bridge sr_mod cdrom evdev pcspkr i2c_piix4 9pnet_virtio 9p 9pnet netfs [ 67.890111] CPU: 10 UID: 0 PID: 2088 Comm: b363492833 Not tainted 6.11.0-virtme #1011 [ 67.890183] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 67.890309] RIP: 0010:skb_checksum_help (net/core/dev.c:3332 (discriminator 2)) [ 67.891043] Call Trace: [ 67.891173] [ 67.891274] ? __warn (kernel/panic.c:741) [ 67.891320] ? skb_checksum_help (net/core/dev.c:3332 (discriminator 2)) [ 67.891333] ? report_bug (lib/bug.c:180 lib/bug.c:219) [ 67.891348] ? handle_bug (arch/x86/kernel/traps.c:239) [ 67.891363] ? exc_invalid_op (arch/x86/kernel/traps.c:260 (discriminator 1)) [ 67.891372] ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:621) [ 67.891388] ? skb_checksum_help (net/core/dev.c:3332 (discriminator 2)) [ 67.891399] ? skb_checksum_help (net/core/dev.c:3332 (discriminator 2)) [ 67.891416] ip_do_fragment (net/ipv4/ip_output.c:777 (discriminator 1)) [ 67.891448] ? __ip_local_out (./include/linux/skbuff.h:1146 ./include/net/l3mdev.h:196 ./include/net/l3mdev.h:213 net/ipv4/ip_output.c:113) [ 67.891459] ? __pfx_ip_finish_output2 (net/ipv4/ip_output.c:200) [ 67.891470] ? ip_route_output_flow (./arch/x86/include/asm/preempt.h:84 (discriminator 13) ./include/linux/rcupdate.h:96 (discriminator 13) ./include/linux/rcupdate.h:871 (discriminator 13) net/ipv4/route.c:2625 (discriminator 13) ./include/net/route.h:141 (discriminator 13) net/ipv4/route.c:2852 (discriminator 13)) [ 67.891484] ipvlan_process_v4_outbound (drivers/net/ipvlan/ipvlan_core.c:445 (discriminator 1)) [ 67.891581] ipvlan_queue_xmit (drivers/net/ipvlan/ipvlan_core.c:542 drivers/net/ipvlan/ipvlan_core.c:604 drivers/net/ipvlan/ipvlan_core.c:670) [ 67.891596] ipvlan_start_xmit (drivers/net/ipvlan/ipvlan_main.c:227) [ 67.891607] dev_hard_start_xmit (./include/linux/netdevice.h:4916 ./include/linux/netdevice.h:4925 net/core/dev.c:3588 net/core/dev.c:3604) [ 67.891620] __dev_queue_xmit (net/core/dev.h:168 (discriminator 25) net/core/dev.c:4425 (discriminator 25)) [ 67.891630] ? skb_copy_bits (./include/linux/uaccess.h:233 (discriminator 1) ./include/linux/uaccess.h:260 (discriminator 1) ./include/linux/highmem-internal.h:230 (discriminator 1) net/core/skbuff.c:3018 (discriminator 1)) [ 67.891645] ? __pskb_pull_tail (net/core/skbuff.c:2848 (discriminator 4)) [ 67.891655] ? skb_partial_csum_set (net/core/skbuff.c:5657) [ 67.891666] ? virtio_net_hdr_to_skb.constprop.0 (./include/linux/skbuff.h:2791 (discriminator 3) ./include/linux/skbuff.h:2799 (discriminator 3) ./include/linux/virtio_net.h:109 (discriminator 3)) [ 67.891684] packet_sendmsg (net/packet/af_packet.c:3145 (discriminator 1) net/packet/af_packet.c:3177 (discriminator 1)) [ 67.891700] ? _raw_spin_lock_bh (./arch/x86/include/asm/atomic.h:107 (discriminator 4) ./include/linux/atomic/atomic-arch-fallback.h:2170 (discriminator 4) ./include/linux/atomic/atomic-instrumented.h:1302 (discriminator 4) ./include/asm-generic/qspinlock.h:111 (discriminator 4) ./include/linux/spinlock.h:187 (discriminator 4) ./include/linux/spinlock_api_smp.h:127 (discriminator 4) kernel/locking/spinlock.c:178 (discriminator 4)) [ 67.891716] __sys_sendto (net/socket.c:730 (discriminator 1) net/socket.c:745 (discriminator 1) net/socket.c:2210 (discriminator 1)) [ 67.891734] ? do_sock_setsockopt (net/socket.c:2335) [ 67.891747] ? __sys_setsockopt (./include/linux/file.h:34 net/socket.c:2355) [ 67.891761] __x64_sys_sendto (net/socket.c:2222 (discriminator 1) net/socket.c:2218 (discriminator 1) net/socket.c:2218 (discriminator 1)) [ 67.891772] do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1) arch/x86/entry/common.c:83 (discriminator 1)) [ 67.891785] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) Fixes: 9181d6f8a2bb ("net: add more sanity check in virtio_net_hdr_to_skb()") Signed-off-by: Eric Dumazet Reviewed-by: Willem de Bruijn Link: https://patch.msgid.link/20240926165836.3797406-1-edumazet@google.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin Signed-off-by: zhangshuqi --- include/linux/virtio_net.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h index 4dfa9b69ca8d..24d51e194e3f 100644 --- a/include/linux/virtio_net.h +++ b/include/linux/virtio_net.h @@ -103,8 +103,10 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb, if (!skb_partial_csum_set(skb, start, off)) return -EINVAL; + if (skb_transport_offset(skb) < nh_min_len) + return -EINVAL; - nh_min_len = max_t(u32, nh_min_len, skb_transport_offset(skb)); + nh_min_len = skb_transport_offset(skb); p_off = nh_min_len + thlen; if (!pskb_may_pull(skb, p_off)) return -EINVAL; -- Gitee From cf89621d04ca70d1cf97516b5502c2c4e138842b Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Fri, 13 Sep 2024 21:17:50 +0200 Subject: [PATCH 37/51] bpf: Zero former ARG_PTR_TO_{LONG,INT} args in case of error stable inclusion from stable-6.6.54 commit a634fa8e480ac2423f86311a602f6295df2c8ed0 category: bugfix issue: #IB55S9 CVE: CVE-2024-47728 Signed-off-by: zhangshuqi --------------------------------------- [ Upstream commit 4b3786a6c5397dc220b1483d8e2f4867743e966f ] For all non-tracing helpers which formerly had ARG_PTR_TO_{LONG,INT} as input arguments, zero the value for the case of an error as otherwise it could leak memory. For tracing, it is not needed given CAP_PERFMON can already read all kernel memory anyway hence bpf_get_func_arg() and bpf_get_func_ret() is skipped in here. Also, the MTU helpers mtu_len pointer value is being written but also read. Technically, the MEM_UNINIT should not be there in order to always force init. Removing MEM_UNINIT needs more verifier rework though: MEM_UNINIT right now implies two things actually: i) write into memory, ii) memory does not have to be initialized. If we lift MEM_UNINIT, it then becomes: i) read into memory, ii) memory must be initialized. This means that for bpf_*_check_mtu() we're readding the issue we're trying to fix, that is, it would then be able to write back into things like .rodata BPF maps. Follow-up work will rework the MEM_UNINIT semantics such that the intent can be better expressed. For now just clear the *mtu_len on error path which can be lifted later again. Fixes: 8a67f2de9b1d ("bpf: expose bpf_strtol and bpf_strtoul to all program types") Fixes: d7a4cb9b6705 ("bpf: Introduce bpf_strtol and bpf_strtoul helpers") Signed-off-by: Daniel Borkmann Link: https://lore.kernel.org/bpf/e5edd241-59e7-5e39-0ee5-a51e31b6840a@iogearbox.net Link: https://lore.kernel.org/r/20240913191754.13290-5-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov Signed-off-by: Sasha Levin Signed-off-by: zhangshuqi --- kernel/bpf/helpers.c | 2 ++ kernel/bpf/syscall.c | 1 + net/core/filter.c | 44 +++++++++++++++++++++++--------------------- 3 files changed, 26 insertions(+), 21 deletions(-) diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index 6951a2346a3c..4e8cefcd7da6 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -521,6 +521,7 @@ BPF_CALL_4(bpf_strtol, const char *, buf, size_t, buf_len, u64, flags, long long _res; int err; + *res = 0; err = __bpf_strtoll(buf, buf_len, flags, &_res); if (err < 0) return err; @@ -548,6 +549,7 @@ BPF_CALL_4(bpf_strtoul, const char *, buf, size_t, buf_len, u64, flags, bool is_negative; int err; + *res = 0; err = __bpf_strtoull(buf, buf_len, flags, &_res, &is_negative); if (err < 0) return err; diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 286377e6878e..c5beed88b96f 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -5633,6 +5633,7 @@ static const struct bpf_func_proto bpf_sys_close_proto = { BPF_CALL_4(bpf_kallsyms_lookup_name, const char *, name, int, name_sz, int, flags, u64 *, res) { + *res = 0; if (flags) return -EINVAL; diff --git a/net/core/filter.c b/net/core/filter.c index 3b4a98d2b9c1..e4effe2fae01 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -6188,20 +6188,25 @@ BPF_CALL_5(bpf_skb_check_mtu, struct sk_buff *, skb, int ret = BPF_MTU_CHK_RET_FRAG_NEEDED; struct net_device *dev = skb->dev; int skb_len, dev_len; - int mtu; + int mtu = 0; - if (unlikely(flags & ~(BPF_MTU_CHK_SEGS))) - return -EINVAL; + if (unlikely(flags & ~(BPF_MTU_CHK_SEGS))) { + ret = -EINVAL; + goto out; + } - if (unlikely(flags & BPF_MTU_CHK_SEGS && (len_diff || *mtu_len))) - return -EINVAL; + if (unlikely(flags & BPF_MTU_CHK_SEGS && (len_diff || *mtu_len))) { + ret = -EINVAL; + goto out; + } dev = __dev_via_ifindex(dev, ifindex); - if (unlikely(!dev)) - return -ENODEV; + if (unlikely(!dev)) { + ret = -ENODEV; + goto out; + } mtu = READ_ONCE(dev->mtu); - dev_len = mtu + dev->hard_header_len; /* If set use *mtu_len as input, L3 as iph->tot_len (like fib_lookup) */ @@ -6219,15 +6224,12 @@ BPF_CALL_5(bpf_skb_check_mtu, struct sk_buff *, skb, */ if (skb_is_gso(skb)) { ret = BPF_MTU_CHK_RET_SUCCESS; - if (flags & BPF_MTU_CHK_SEGS && !skb_gso_validate_network_len(skb, mtu)) ret = BPF_MTU_CHK_RET_SEGS_TOOBIG; } out: - /* BPF verifier guarantees valid pointer */ *mtu_len = mtu; - return ret; } @@ -6237,19 +6239,21 @@ BPF_CALL_5(bpf_xdp_check_mtu, struct xdp_buff *, xdp, struct net_device *dev = xdp->rxq->dev; int xdp_len = xdp->data_end - xdp->data; int ret = BPF_MTU_CHK_RET_SUCCESS; - int mtu, dev_len; + int mtu = 0, dev_len; /* XDP variant doesn't support multi-buffer segment check (yet) */ - if (unlikely(flags)) - return -EINVAL; + if (unlikely(flags)) { + ret = -EINVAL; + goto out; + } dev = __dev_via_ifindex(dev, ifindex); - if (unlikely(!dev)) - return -ENODEV; + if (unlikely(!dev)) { + ret = -ENODEV; + goto out; + } mtu = READ_ONCE(dev->mtu); - - /* Add L2-header as dev MTU is L3 size */ dev_len = mtu + dev->hard_header_len; /* Use *mtu_len as input, L3 as iph->tot_len (like fib_lookup) */ @@ -6259,10 +6263,8 @@ BPF_CALL_5(bpf_xdp_check_mtu, struct xdp_buff *, xdp, xdp_len += len_diff; /* minus result pass check */ if (xdp_len > dev_len) ret = BPF_MTU_CHK_RET_FRAG_NEEDED; - - /* BPF verifier guarantees valid pointer */ +out: *mtu_len = mtu; - return ret; } -- Gitee From 7a0601f2371b5bcea6a867212605d69f56238347 Mon Sep 17 00:00:00 2001 From: Luiz Augusto von Dentz Date: Mon, 30 Sep 2024 13:26:21 -0400 Subject: [PATCH 38/51] Bluetooth: RFCOMM: FIX possible deadlock in rfcomm_sk_state_change stable inclusion from stable-6.6.57 commit 38b2d5a57d125e1c17661b8308c0240c4a43b534 category: bugfix issue: #IB55S9 CVE: CVE-2024-50044 Signed-off-by: zhangshuqi --------------------------------------- [ Upstream commit 08d1914293dae38350b8088980e59fbc699a72fe ] rfcomm_sk_state_change attempts to use sock_lock so it must never be called with it locked but rfcomm_sock_ioctl always attempt to lock it causing the following trace: ====================================================== WARNING: possible circular locking dependency detected 6.8.0-syzkaller-08951-gfe46a7dd189e #0 Not tainted ------------------------------------------------------ syz-executor386/5093 is trying to acquire lock: ffff88807c396258 (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1671 [inline] ffff88807c396258 (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0}, at: rfcomm_sk_state_change+0x5b/0x310 net/bluetooth/rfcomm/sock.c:73 but task is already holding lock: ffff88807badfd28 (&d->lock){+.+.}-{3:3}, at: __rfcomm_dlc_close+0x226/0x6a0 net/bluetooth/rfcomm/core.c:491 Reported-by: syzbot+d7ce59b06b3eb14fd218@syzkaller.appspotmail.com Tested-by: syzbot+d7ce59b06b3eb14fd218@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d7ce59b06b3eb14fd218 Fixes: 3241ad820dbb ("[Bluetooth] Add timestamp support to L2CAP, RFCOMM and SCO") Signed-off-by: Luiz Augusto von Dentz Signed-off-by: Sasha Levin Signed-off-by: zhangshuqi --- net/bluetooth/rfcomm/sock.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/net/bluetooth/rfcomm/sock.c b/net/bluetooth/rfcomm/sock.c index b54e8a530f55..1e215b6cf1ed 100644 --- a/net/bluetooth/rfcomm/sock.c +++ b/net/bluetooth/rfcomm/sock.c @@ -869,9 +869,7 @@ static int rfcomm_sock_ioctl(struct socket *sock, unsigned int cmd, unsigned lon if (err == -ENOIOCTLCMD) { #ifdef CONFIG_BT_RFCOMM_TTY - lock_sock(sk); err = rfcomm_dev_ioctl(sk, cmd, (void __user *) arg); - release_sock(sk); #else err = -EOPNOTSUPP; #endif -- Gitee From fd3f1822ded6f37e790f8c034edca21193df4b69 Mon Sep 17 00:00:00 2001 From: Baokun Li Date: Thu, 22 Aug 2024 10:35:25 +0800 Subject: [PATCH 39/51] ext4: update orig_path in ext4_find_extent() stable inclusion from stable-6.6.55 commit f55ecc58d07a6c1f6d6d5b5af125c25f8da0bda2 category: bugfix issue: #IB55S9 CVE: CVE-2024-49881 Signed-off-by: zhangshuqi --------------------------------------- commit 5b4b2dcace35f618fe361a87bae6f0d13af31bc1 upstream. In ext4_find_extent(), if the path is not big enough, we free it and set *orig_path to NULL. But after reallocating and successfully initializing the path, we don't update *orig_path, in which case the caller gets a valid path but a NULL ppath, and this may cause a NULL pointer dereference or a path memory leak. For example: ext4_split_extent path = *ppath = 2000 ext4_find_extent if (depth > path[0].p_maxdepth) kfree(path = 2000); *orig_path = path = NULL; path = kcalloc() = 3000 ext4_split_extent_at(*ppath = NULL) path = *ppath; ex = path[depth].p_ext; // NULL pointer dereference! ================================================================== BUG: kernel NULL pointer dereference, address: 0000000000000010 CPU: 6 UID: 0 PID: 576 Comm: fsstress Not tainted 6.11.0-rc2-dirty #847 RIP: 0010:ext4_split_extent_at+0x6d/0x560 Call Trace: ext4_split_extent.isra.0+0xcb/0x1b0 ext4_ext_convert_to_initialized+0x168/0x6c0 ext4_ext_handle_unwritten_extents+0x325/0x4d0 ext4_ext_map_blocks+0x520/0xdb0 ext4_map_blocks+0x2b0/0x690 ext4_iomap_begin+0x20e/0x2c0 [...] ================================================================== Therefore, *orig_path is updated when the extent lookup succeeds, so that the caller can safely use path or *ppath. Fixes: 10809df84a4d ("ext4: teach ext4_ext_find_extent() to realloc path if necessary") Cc: stable@kernel.org Signed-off-by: Baokun Li Reviewed-by: Jan Kara Link: https://patch.msgid.link/20240822023545.1994557-6-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o Signed-off-by: Greg Kroah-Hartman Signed-off-by: zhangshuqi --- fs/ext4/extents.c | 3 ++- fs/ext4/move_extent.c | 1 - 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index f7143d2be3a7..9765bf095a8a 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -957,6 +957,8 @@ ext4_find_extent(struct inode *inode, ext4_lblk_t block, ext4_ext_show_path(inode, path); + if (orig_path) + *orig_path = path; return path; err: @@ -3249,7 +3251,6 @@ static int ext4_split_extent_at(handle_t *handle, } depth = ext_depth(inode); ex = path[depth].p_ext; - *ppath = path; if (EXT4_EXT_MAY_ZEROOUT & split_flag) { if (split_flag & (EXT4_EXT_DATA_VALID1|EXT4_EXT_DATA_VALID2)) { diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c index e6976716e85d..0bfd5ff103aa 100644 --- a/fs/ext4/move_extent.c +++ b/fs/ext4/move_extent.c @@ -36,7 +36,6 @@ get_ext_path(struct inode *inode, ext4_lblk_t lblock, *ppath = NULL; return -ENODATA; } - *ppath = path; return 0; } -- Gitee From a8ca52bfbaf38ea1513685f78e89d8ad65eb781c Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky Date: Wed, 9 Oct 2024 13:51:39 +0900 Subject: [PATCH 40/51] zram: free secondary algorithms names stable inclusion from stable-6.6.57 commit 6272936fd242ca1f784c3e21596dfb3859dff276 category: bugfix issue: #IB55S9 CVE: CVE-2024-50064 Signed-off-by: zhangshuqi --------------------------------------- [ Upstream commit 684826f8271ad97580b138b9ffd462005e470b99 ] We need to kfree() secondary algorithms names when reset zram device that had multi-streams, otherwise we leak memory. [senozhatsky@chromium.org: kfree(NULL) is legal] Link: https://lkml.kernel.org/r/20240917013021.868769-1-senozhatsky@chromium.org Link: https://lkml.kernel.org/r/20240911025600.3681789-1-senozhatsky@chromium.org Fixes: 001d92735701 ("zram: add recompression algorithm sysfs knob") Signed-off-by: Sergey Senozhatsky Cc: Minchan Kim Cc: Signed-off-by: Andrew Morton Signed-off-by: Sasha Levin Signed-off-by: zhangshuqi --- drivers/block/zram/zram_drv.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c index e3e0da787dea..a0212a342012 100644 --- a/drivers/block/zram/zram_drv.c +++ b/drivers/block/zram/zram_drv.c @@ -2006,6 +2006,11 @@ static void zram_destroy_comps(struct zram *zram) zcomp_destroy(comp); zram->num_active_comps--; } + + for (prio = ZRAM_SECONDARY_COMP; prio < ZRAM_MAX_COMPS; prio++) { + kfree(zram->comp_algs[prio]); + zram->comp_algs[prio] = NULL; + } } static void zram_reset_device(struct zram *zram) -- Gitee From 685689261a1d0e2ece2156e74b04788e19e7ab2d Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Thu, 10 Oct 2024 18:34:05 +0200 Subject: [PATCH 41/51] netfilter: bpf: must hold reference on net namespace stable inclusion from stable-6.12 commit 1230fe7ad3974f7bf6c78901473e039b34d4fb1f category: bugfix issue: #IB55S9 CVE: CVE-2024-50130 Signed-off-by: zhangshuqi --------------------------------------- BUG: KASAN: slab-use-after-free in __nf_unregister_net_hook+0x640/0x6b0 Read of size 8 at addr ffff8880106fe400 by task repro/72= bpf_nf_link_release+0xda/0x1e0 bpf_link_free+0x139/0x2d0 bpf_link_release+0x68/0x80 __fput+0x414/0xb60 Eric says: It seems that bpf was able to defer the __nf_unregister_net_hook() after exit()/close() time. Perhaps a netns reference is missing, because the netns has been dismantled/freed already. bpf_nf_link_attach() does : link->net = net; But I do not see a reference being taken on net. Add such a reference and release it after hook unreg. Note that I was unable to get syzbot reproducer to work, so I do not know if this resolves this splat. Fixes: 84601d6ee68a ("bpf: add bpf_link support for BPF_NETFILTER programs") Diagnosed-by: Eric Dumazet Reported-by: Lai, Yi Signed-off-by: Florian Westphal Reviewed-by: Eric Dumazet Signed-off-by: Pablo Neira Ayuso Signed-off-by: zhangshuqi --- net/netfilter/nf_bpf_link.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/net/netfilter/nf_bpf_link.c b/net/netfilter/nf_bpf_link.c index 0e4beae421f8..ebb4e36659a1 100644 --- a/net/netfilter/nf_bpf_link.c +++ b/net/netfilter/nf_bpf_link.c @@ -23,6 +23,7 @@ static unsigned int nf_hook_run_bpf(void *bpf_prog, struct sk_buff *skb, struct bpf_nf_link { struct bpf_link link; struct nf_hook_ops hook_ops; + netns_tracker ns_tracker; struct net *net; u32 dead; const struct nf_defrag_hook *defrag_hook; @@ -120,6 +121,7 @@ static void bpf_nf_link_release(struct bpf_link *link) if (!cmpxchg(&nf_link->dead, 0, 1)) { nf_unregister_net_hook(nf_link->net, &nf_link->hook_ops); bpf_nf_disable_defrag(nf_link); + put_net_track(nf_link->net, &nf_link->ns_tracker); } } @@ -257,6 +259,8 @@ int bpf_nf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog) return err; } + get_net_track(net, &link->ns_tracker, GFP_KERNEL); + return bpf_link_settle(&link_primer); } -- Gitee From bf9f151f455eb11c5c793b8c1109688bbde05f01 Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Mon, 14 Oct 2024 15:33:12 -0700 Subject: [PATCH 42/51] tcp/dccp: Don't use timer_pending() in reqsk_queue_unlink(). stable inclusion from stable-6.12 commit e8c526f2bdf1845bedaf6a478816a3d06fa78b8f category: bugfix issue: #IB55S9 CVE: CVE-2024-50154 Signed-off-by: zhangshuqi --------------------------------------- Martin KaFai Lau reported use-after-free [0] in reqsk_timer_handler(). """ We are seeing a use-after-free from a bpf prog attached to trace_tcp_retransmit_synack. The program passes the req->sk to the bpf_sk_storage_get_tracing kernel helper which does check for null before using it. """ The commit 83fccfc3940c ("inet: fix potential deadlock in reqsk_queue_unlink()") added timer_pending() in reqsk_queue_unlink() not to call del_timer_sync() from reqsk_timer_handler(), but it introduced a small race window. Before the timer is called, expire_timers() calls detach_timer(timer, true) to clear timer->entry.pprev and marks it as not pending. If reqsk_queue_unlink() checks timer_pending() just after expire_timers() calls detach_timer(), TCP will miss del_timer_sync(); the reqsk timer will continue running and send multiple SYN+ACKs until it expires. The reported UAF could happen if req->sk is close()d earlier than the timer expiration, which is 63s by default. The scenario would be 1. inet_csk_complete_hashdance() calls inet_csk_reqsk_queue_drop(), but del_timer_sync() is missed 2. reqsk timer is executed and scheduled again 3. req->sk is accept()ed and reqsk_put() decrements rsk_refcnt, but reqsk timer still has another one, and inet_csk_accept() does not clear req->sk for non-TFO sockets 4. sk is close()d 5. reqsk timer is executed again, and BPF touches req->sk Let's not use timer_pending() by passing the caller context to __inet_csk_reqsk_queue_drop(). Note that reqsk timer is pinned, so the issue does not happen in most use cases. [1] [0] BUG: KFENCE: use-after-free read in bpf_sk_storage_get_tracing+0x2e/0x1b0 Use-after-free read at 0x00000000a891fb3a (in kfence-#1): bpf_sk_storage_get_tracing+0x2e/0x1b0 bpf_prog_5ea3e95db6da0438_tcp_retransmit_synack+0x1d20/0x1dda bpf_trace_run2+0x4c/0xc0 tcp_rtx_synack+0xf9/0x100 reqsk_timer_handler+0xda/0x3d0 run_timer_softirq+0x292/0x8a0 irq_exit_rcu+0xf5/0x320 sysvec_apic_timer_interrupt+0x6d/0x80 asm_sysvec_apic_timer_interrupt+0x16/0x20 intel_idle_irq+0x5a/0xa0 cpuidle_enter_state+0x94/0x273 cpu_startup_entry+0x15e/0x260 start_secondary+0x8a/0x90 secondary_startup_64_no_verify+0xfa/0xfb kfence-#1: 0x00000000a72cc7b6-0x00000000d97616d9, size=2376, cache=TCPv6 allocated by task 0 on cpu 9 at 260507.901592s: sk_prot_alloc+0x35/0x140 sk_clone_lock+0x1f/0x3f0 inet_csk_clone_lock+0x15/0x160 tcp_create_openreq_child+0x1f/0x410 tcp_v6_syn_recv_sock+0x1da/0x700 tcp_check_req+0x1fb/0x510 tcp_v6_rcv+0x98b/0x1420 ipv6_list_rcv+0x2258/0x26e0 napi_complete_done+0x5b1/0x2990 mlx5e_napi_poll+0x2ae/0x8d0 net_rx_action+0x13e/0x590 irq_exit_rcu+0xf5/0x320 common_interrupt+0x80/0x90 asm_common_interrupt+0x22/0x40 cpuidle_enter_state+0xfb/0x273 cpu_startup_entry+0x15e/0x260 start_secondary+0x8a/0x90 secondary_startup_64_no_verify+0xfa/0xfb freed by task 0 on cpu 9 at 260507.927527s: rcu_core_si+0x4ff/0xf10 irq_exit_rcu+0xf5/0x320 sysvec_apic_timer_interrupt+0x6d/0x80 asm_sysvec_apic_timer_interrupt+0x16/0x20 cpuidle_enter_state+0xfb/0x273 cpu_startup_entry+0x15e/0x260 start_secondary+0x8a/0x90 secondary_startup_64_no_verify+0xfa/0xfb Fixes: 83fccfc3940c ("inet: fix potential deadlock in reqsk_queue_unlink()") Reported-by: Martin KaFai Lau Closes: https://lore.kernel.org/netdev/eb6684d0-ffd9-4bdc-9196-33f690c25824@linux.dev/ Link: https://lore.kernel.org/netdev/b55e2ca0-42f2-4b7c-b445-6ffd87ca74a0@linux.dev/ [1] Signed-off-by: Kuniyuki Iwashima Reviewed-by: Eric Dumazet Reviewed-by: Martin KaFai Lau Link: https://patch.msgid.link/20241014223312.4254-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski Signed-off-by: zhangshuqi --- net/ipv4/inet_connection_sock.c | 21 ++++++++++++++++----- 1 file changed, 16 insertions(+), 5 deletions(-) diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index a018981b4514..97f0675ef75f 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -980,21 +980,31 @@ static bool reqsk_queue_unlink(struct request_sock *req) found = __sk_nulls_del_node_init_rcu(sk); spin_unlock(lock); } - if (timer_pending(&req->rsk_timer) && del_timer_sync(&req->rsk_timer)) - reqsk_put(req); + return found; } -bool inet_csk_reqsk_queue_drop(struct sock *sk, struct request_sock *req) +static bool __inet_csk_reqsk_queue_drop(struct sock *sk, + struct request_sock *req, + bool from_timer) { bool unlinked = reqsk_queue_unlink(req); + if (!from_timer && timer_delete_sync(&req->rsk_timer)) + reqsk_put(req); + if (unlinked) { reqsk_queue_removed(&inet_csk(sk)->icsk_accept_queue, req); reqsk_put(req); } + return unlinked; } + +bool inet_csk_reqsk_queue_drop(struct sock *sk, struct request_sock *req) +{ + return __inet_csk_reqsk_queue_drop(sk, req, false); +} EXPORT_SYMBOL(inet_csk_reqsk_queue_drop); void inet_csk_reqsk_queue_drop_and_put(struct sock *sk, struct request_sock *req) @@ -1087,7 +1097,7 @@ static void reqsk_timer_handler(struct timer_list *t) if (!inet_ehash_insert(req_to_sk(nreq), req_to_sk(oreq), NULL)) { /* delete timer */ - inet_csk_reqsk_queue_drop(sk_listener, nreq); + __inet_csk_reqsk_queue_drop(sk_listener, nreq, true); goto no_ownership; } @@ -1113,7 +1123,8 @@ static void reqsk_timer_handler(struct timer_list *t) } drop: - inet_csk_reqsk_queue_drop_and_put(oreq->rsk_listener, oreq); + __inet_csk_reqsk_queue_drop(sk_listener, oreq, true); + reqsk_put(req); } static void reqsk_queue_hash_req(struct request_sock *req, -- Gitee From 402165fcb71646dad1040d9e845438a3e5455f22 Mon Sep 17 00:00:00 2001 From: Baokun Li Date: Thu, 18 Jul 2024 19:53:36 +0800 Subject: [PATCH 43/51] jbd2: stop waiting for space when jbd2_cleanup_journal_tail() returns error stable inclusion from stable-6.12 commit f5cacdc6f2bb2a9bf214469dd7112b43dd2dd68a category: bugfix issue: #IB55S9 CVE: CVE-2024-49959 Signed-off-by: zhangshuqi --------------------------------------- In __jbd2_log_wait_for_space(), we might call jbd2_cleanup_journal_tail() to recover some journal space. But if an error occurs while executing jbd2_cleanup_journal_tail() (e.g., an EIO), we don't stop waiting for free space right away, we try other branches, and if j_committing_transaction is NULL (i.e., the tid is 0), we will get the following complain: ============================================ JBD2: I/O error when updating journal superblock for sdd-8. __jbd2_log_wait_for_space: needed 256 blocks and only had 217 space available __jbd2_log_wait_for_space: no way to get more journal space in sdd-8 ------------[ cut here ]------------ WARNING: CPU: 2 PID: 139804 at fs/jbd2/checkpoint.c:109 __jbd2_log_wait_for_space+0x251/0x2e0 Modules linked in: CPU: 2 PID: 139804 Comm: kworker/u8:3 Not tainted 6.6.0+ #1 RIP: 0010:__jbd2_log_wait_for_space+0x251/0x2e0 Call Trace: add_transaction_credits+0x5d1/0x5e0 start_this_handle+0x1ef/0x6a0 jbd2__journal_start+0x18b/0x340 ext4_dirty_inode+0x5d/0xb0 __mark_inode_dirty+0xe4/0x5d0 generic_update_time+0x60/0x70 [...] ============================================ So only if jbd2_cleanup_journal_tail() returns 1, i.e., there is nothing to clean up at the moment, continue to try to reclaim free space in other ways. Note that this fix relies on commit 6f6a6fda2945 ("jbd2: fix ocfs2 corrupt when updating journal superblock fails") to make jbd2_cleanup_journal_tail return the correct error code. Fixes: 8c3f25d8950c ("jbd2: don't give up looking for space so easily in __jbd2_log_wait_for_space") Cc: stable@kernel.org Signed-off-by: Baokun Li Reviewed-by: Jan Kara Link: https://patch.msgid.link/20240718115336.2554501-1-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o Signed-off-by: zhangshuqi --- fs/jbd2/checkpoint.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c index 118699fff2f9..27a8931647ea 100644 --- a/fs/jbd2/checkpoint.c +++ b/fs/jbd2/checkpoint.c @@ -86,8 +86,11 @@ __releases(&journal->j_state_lock) write_unlock(&journal->j_state_lock); if (chkpt) { jbd2_log_do_checkpoint(journal); - } else if (jbd2_cleanup_journal_tail(journal) == 0) { - /* We were able to recover space; yay! */ + } else if (jbd2_cleanup_journal_tail(journal) <= 0) { + /* + * We were able to recover space or the + * journal was aborted due to an error. + */ ; } else if (tid) { /* -- Gitee From d651a3d2b82ccab20a21401b8a099f4937510e7a Mon Sep 17 00:00:00 2001 From: Aaron Thompson Date: Fri, 4 Oct 2024 23:04:08 +0000 Subject: [PATCH 44/51] Bluetooth: ISO: Fix multiple init when debugfs is disabled stable inclusion from stable-6.12 commit a9b7b535ba192c6b77e6c15a4c82d853163eab8c category: bugfix issue: #IB55S9 CVE: CVE-2024-50077 Signed-off-by: zhangshuqi --------------------------------------- If bt_debugfs is not created successfully, which happens if either CONFIG_DEBUG_FS or CONFIG_DEBUG_FS_ALLOW_ALL is unset, then iso_init() returns early and does not set iso_inited to true. This means that a subsequent call to iso_init() will result in duplicate calls to proto_register(), bt_sock_register(), etc. With CONFIG_LIST_HARDENED and CONFIG_BUG_ON_DATA_CORRUPTION enabled, the duplicate call to proto_register() triggers this BUG(): list_add double add: new=ffffffffc0b280d0, prev=ffffffffbab56250, next=ffffffffc0b280d0. ------------[ cut here ]------------ kernel BUG at lib/list_debug.c:35! Oops: invalid opcode: 0000 [#1] PREEMPT SMP PTI CPU: 2 PID: 887 Comm: bluetoothd Not tainted 6.10.11-1-ao-desktop #1 RIP: 0010:__list_add_valid_or_report+0x9a/0xa0 ... __list_add_valid_or_report+0x9a/0xa0 proto_register+0x2b5/0x340 iso_init+0x23/0x150 [bluetooth] set_iso_socket_func+0x68/0x1b0 [bluetooth] kmem_cache_free+0x308/0x330 hci_sock_sendmsg+0x990/0x9e0 [bluetooth] __sock_sendmsg+0x7b/0x80 sock_write_iter+0x9a/0x110 do_iter_readv_writev+0x11d/0x220 vfs_writev+0x180/0x3e0 do_writev+0xca/0x100 ... This change removes the early return. The check for iso_debugfs being NULL was unnecessary, it is always NULL when iso_inited is false. Cc: stable@vger.kernel.org Fixes: ccf74f2390d6 ("Bluetooth: Add BTPROTO_ISO socket type") Signed-off-by: Aaron Thompson Signed-off-by: Luiz Augusto von Dentz Signed-off-by: zhangshuqi --- net/bluetooth/iso.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/net/bluetooth/iso.c b/net/bluetooth/iso.c index 2f63ea9e62ec..fb1e06694f51 100644 --- a/net/bluetooth/iso.c +++ b/net/bluetooth/iso.c @@ -2125,13 +2125,9 @@ int iso_init(void) hci_register_cb(&iso_cb); - if (IS_ERR_OR_NULL(bt_debugfs)) - return 0; - - if (!iso_debugfs) { + if (!IS_ERR_OR_NULL(bt_debugfs)) iso_debugfs = debugfs_create_file("iso", 0444, bt_debugfs, NULL, &iso_debugfs_fops); - } iso_inited = true; -- Gitee From 5c68e86f87eb00a62b433d1bd33b5e1459b53f60 Mon Sep 17 00:00:00 2001 From: Zhihao Cheng Date: Fri, 9 Aug 2024 20:15:32 +0800 Subject: [PATCH 45/51] ext4: dax: fix overflowing extents beyond inode size when partially writing stable inclusion from stable-6.12 commit dda898d7ffe85931f9cca6d702a51f33717c501e category: bugfix issue: #IB55S9 CVE: CVE-2024-50015 Signed-off-by: zhangshuqi --------------------------------------- The dax_iomap_rw() does two things in each iteration: map written blocks and copy user data to blocks. If the process is killed by user(See signal handling in dax_iomap_iter()), the copied data will be returned and added on inode size, which means that the length of written extents may exceed the inode size, then fsck will fail. An example is given as: dd if=/dev/urandom of=file bs=4M count=1 dax_iomap_rw iomap_iter // round 1 ext4_iomap_begin ext4_iomap_alloc // allocate 0~2M extents(written flag) dax_iomap_iter // copy 2M data iomap_iter // round 2 iomap_iter_advance iter->pos += iter->processed // iter->pos = 2M ext4_iomap_begin ext4_iomap_alloc // allocate 2~4M extents(written flag) dax_iomap_iter fatal_signal_pending done = iter->pos - iocb->ki_pos // done = 2M ext4_handle_inode_extension ext4_update_inode_size // inode size = 2M fsck reports: Inode 13, i_size is 2097152, should be 4194304. Fix? Fix the problem by truncating extents if the written length is smaller than expected. Fixes: 776722e85d3b ("ext4: DAX iomap write support") CC: stable@vger.kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=219136 Signed-off-by: Zhihao Cheng Reviewed-by: Jan Kara Reviewed-by: Zhihao Cheng Link: https://patch.msgid.link/20240809121532.2105494-1-chengzhihao@huaweicloud.com Signed-off-by: Theodore Ts'o Signed-off-by: zhangshuqi --- fs/ext4/file.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/fs/ext4/file.c b/fs/ext4/file.c index 6aa15dafc677..c71af675e310 100644 --- a/fs/ext4/file.c +++ b/fs/ext4/file.c @@ -334,10 +334,10 @@ static ssize_t ext4_handle_inode_extension(struct inode *inode, loff_t offset, * Clean up the inode after DIO or DAX extending write has completed and the * inode size has been updated using ext4_handle_inode_extension(). */ -static void ext4_inode_extension_cleanup(struct inode *inode, ssize_t count) +static void ext4_inode_extension_cleanup(struct inode *inode, bool need_trunc) { lockdep_assert_held_write(&inode->i_rwsem); - if (count < 0) { + if (need_trunc) { ext4_truncate_failed_write(inode); /* * If the truncate operation failed early, then the inode may @@ -586,7 +586,7 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from) * writeback of delalloc blocks. */ WARN_ON_ONCE(ret == -EIOCBQUEUED); - ext4_inode_extension_cleanup(inode, ret); + ext4_inode_extension_cleanup(inode, ret < 0); } out: @@ -670,7 +670,7 @@ ext4_dax_write_iter(struct kiocb *iocb, struct iov_iter *from) if (extend) { ret = ext4_handle_inode_extension(inode, offset, ret); - ext4_inode_extension_cleanup(inode, ret); + ext4_inode_extension_cleanup(inode, ret < (ssize_t)count); } out: inode_unlock(inode); -- Gitee From 3ff9fb1f5249a313a8ec860df758b4a08d24ed3f Mon Sep 17 00:00:00 2001 From: Yuezhang Mo Date: Tue, 3 Sep 2024 15:01:09 +0800 Subject: [PATCH 46/51] exfat: fix memory leak in exfat_load_bitmap() stable inclusion from stable-6.12 commit d2b537b3e533f28e0d97293fe9293161fe8cd137 category: bugfix issue: #IB55S9 CVE: CVE-2024-50013 Signed-off-by: zhangshuqi --------------------------------------- If the first directory entry in the root directory is not a bitmap directory entry, 'bh' will not be released and reassigned, which will cause a memory leak. Fixes: 1e49a94cf707 ("exfat: add bitmap operations") Cc: stable@vger.kernel.org Signed-off-by: Yuezhang Mo Reviewed-by: Aoyama Wataru Signed-off-by: Namjae Jeon Signed-off-by: zhangshuqi --- fs/exfat/balloc.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/fs/exfat/balloc.c b/fs/exfat/balloc.c index e918decb3735..5b547a596380 100644 --- a/fs/exfat/balloc.c +++ b/fs/exfat/balloc.c @@ -110,11 +110,8 @@ int exfat_load_bitmap(struct super_block *sb) return -EIO; type = exfat_get_entry_type(ep); - if (type == TYPE_UNUSED) - break; - if (type != TYPE_BITMAP) - continue; - if (ep->dentry.bitmap.flags == 0x0) { + if (type == TYPE_BITMAP && + ep->dentry.bitmap.flags == 0x0) { int err; err = exfat_allocate_bitmap(sb, ep); @@ -122,6 +119,9 @@ int exfat_load_bitmap(struct super_block *sb) return err; } brelse(bh); + + if (type == TYPE_UNUSED) + return -EINVAL; } if (exfat_get_next_cluster(sb, &clu.dir)) -- Gitee From e4df8d07ef4b02a2c2b399c1783b61da8812e671 Mon Sep 17 00:00:00 2001 From: Chao Yu Date: Wed, 4 Sep 2024 11:20:47 +0800 Subject: [PATCH 47/51] f2fs: fix to check atomic_file in f2fs ioctl interfaces stable inclusion from stable-6.12 commit bfe5c02654261bfb8bd9cb174a67f3279ea99e58 category: bugfix issue: #IB55S9 CVE: CVE-2024-49859 Signed-off-by: zhangshuqi --------------------------------------- Some f2fs ioctl interfaces like f2fs_ioc_set_pin_file(), f2fs_move_file_range(), and f2fs_defragment_range() missed to check atomic_write status, which may cause potential race issue, fix it. Cc: stable@vger.kernel.org Signed-off-by: Chao Yu Signed-off-by: Jaegeuk Kim Signed-off-by: zhangshuqi --- fs/f2fs/file.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index a4f0394491a9..8034b5388f9f 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -2595,7 +2595,8 @@ static int f2fs_defragment_range(struct f2fs_sb_info *sbi, inode_lock(inode); - if (is_inode_flag_set(inode, FI_COMPRESS_RELEASED)) { + if (is_inode_flag_set(inode, FI_COMPRESS_RELEASED) || + f2fs_is_atomic_file(inode)) { err = -EINVAL; goto unlock_out; } @@ -2822,6 +2823,11 @@ static int f2fs_move_file_range(struct file *file_in, loff_t pos_in, goto out_unlock; } + if (f2fs_is_atomic_file(src) || f2fs_is_atomic_file(dst)) { + ret = -EINVAL; + goto out_unlock; + } + ret = -EINVAL; if (pos_in + len > src->i_size || pos_in + len < pos_in) goto out_unlock; @@ -3205,6 +3211,11 @@ static int f2fs_ioc_set_pin_file(struct file *filp, unsigned long arg) inode_lock(inode); + if (f2fs_is_atomic_file(inode)) { + ret = -EINVAL; + goto out; + } + if (!pin) { clear_inode_flag(inode, FI_PIN_FILE); f2fs_i_gc_failures_write(inode, 0); -- Gitee From 0977dab1cb766f9c26a7337ef5a70220a2f3b31e Mon Sep 17 00:00:00 2001 From: Edward Adam Davis Date: Mon, 1 Jul 2024 22:25:03 +0800 Subject: [PATCH 48/51] ext4: no need to continue when the number of entries is 1 stable inclusion from stable-6.12 commit 1a00a393d6a7fb1e745a41edd09019bd6a0ad64c category: bugfix issue: #IB55S9 CVE: CVE-2024-49967 Signed-off-by: zhangshuqi --------------------------------------- Fixes: ac27a0ec112a ("[PATCH] ext4: initial copy of files from ext3") Reported-by: syzbot+ae688d469e36fb5138d0@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=ae688d469e36fb5138d0 Signed-off-by: Edward Adam Davis Reported-and-tested-by: syzbot+ae688d469e36fb5138d0@syzkaller.appspotmail.com Link: https://patch.msgid.link/tencent_BE7AEE6C7C2D216CB8949CE8E6EE7ECC2C0A@qq.com Signed-off-by: Theodore Ts'o Cc: stable@kernel.org Signed-off-by: zhangshuqi --- fs/ext4/namei.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c index bbda587f76b8..609fd8ec0481 100644 --- a/fs/ext4/namei.c +++ b/fs/ext4/namei.c @@ -2044,7 +2044,7 @@ static struct ext4_dir_entry_2 *do_split(handle_t *handle, struct inode *dir, split = count/2; hash2 = map[split].hash; - continued = hash2 == map[split - 1].hash; + continued = split > 0 ? hash2 == map[split - 1].hash : 0; dxtrace(printk(KERN_INFO "Split block %lu at %x, %i/%i\n", (unsigned long)dx_get_block(frame->at), hash2, split, count-split)); -- Gitee From 519a8f77a0c8ff74b4b84bd92f27707a26a60d49 Mon Sep 17 00:00:00 2001 From: Artem Sadovnikov Date: Thu, 29 Aug 2024 15:22:09 +0000 Subject: [PATCH 49/51] ext4: fix i_data_sem unlock order in ext4_ind_migrate() stable inclusion from stable-6.12 commit cc749e61c011c255d81b192a822db650c68b313f category: bugfix issue: #IB55S9 CVE: CVE-2024-50006 Signed-off-by: zhangshuqi --------------------------------------- Fuzzing reports a possible deadlock in jbd2_log_wait_commit. This issue is triggered when an EXT4_IOC_MIGRATE ioctl is set to require synchronous updates because the file descriptor is opened with O_SYNC. This can lead to the jbd2_journal_stop() function calling jbd2_might_wait_for_commit(), potentially causing a deadlock if the EXT4_IOC_MIGRATE call races with a write(2) system call. This problem only arises when CONFIG_PROVE_LOCKING is enabled. In this case, the jbd2_might_wait_for_commit macro locks jbd2_handle in the jbd2_journal_stop function while i_data_sem is locked. This triggers lockdep because the jbd2_journal_start function might also lock the same jbd2_handle simultaneously. Found by Linux Verification Center (linuxtesting.org) with syzkaller. Reviewed-by: Ritesh Harjani (IBM) Co-developed-by: Mikhail Ukhin Signed-off-by: Mikhail Ukhin Signed-off-by: Artem Sadovnikov Rule: add Link: https://lore.kernel.org/stable/20240404095000.5872-1-mish.uxin2012%40yandex.ru Link: https://patch.msgid.link/20240829152210.2754-1-ancowi69@gmail.com Signed-off-by: Theodore Ts'o Signed-off-by: zhangshuqi --- fs/ext4/migrate.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/ext4/migrate.c b/fs/ext4/migrate.c index d98ac2af8199..a5e1492bbaaa 100644 --- a/fs/ext4/migrate.c +++ b/fs/ext4/migrate.c @@ -663,8 +663,8 @@ int ext4_ind_migrate(struct inode *inode) if (unlikely(ret2 && !ret)) ret = ret2; errout: - ext4_journal_stop(handle); up_write(&EXT4_I(inode)->i_data_sem); + ext4_journal_stop(handle); out_unlock: ext4_writepages_up_write(inode->i_sb, alloc_ctx); return ret; -- Gitee From 6e35f94f9e5941f536119ff6092d8827a5286107 Mon Sep 17 00:00:00 2001 From: Jonathan Cameron Date: Sun, 25 Feb 2024 14:27:11 +0000 Subject: [PATCH 50/51] of: Add cleanup.h based auto release via __free(device_node) markings stable inclusion from stable-6.9 commit 9448e55d032d99af8e23487f51a542d51b2f1a48 category: bugfix issue: #IB55S9 CVE: CVE-2024-50012 Signed-off-by: zhangshuqi --------------------------------------- The recent addition of scope based cleanup support to the kernel provides a convenient tool to reduce the chances of leaking reference counts where of_node_put() should have been called in an error path. This enables struct device_node *child __free(device_node) = NULL; for_each_child_of_node(np, child) { if (test) return test; } with no need for a manual call of of_node_put(). A following patch will reduce the scope of the child variable to the for loop, to avoid an issues with ordering of autocleanup, and make it obvious when this assigned a non NULL value. In this simple example the gains are small but there are some very complex error handling cases buried in these loops that will be greatly simplified by enabling early returns with out the need for this manual of_node_put() call. Note that there are coccinelle checks in scripts/coccinelle/iterators/for_each_child.cocci to detect a failure to call of_node_put(). This new approach does not cause false positives. Longer term we may want to add scripting to check this new approach is done correctly with no double of_node_put() calls being introduced due to the auto cleanup. It may also be useful to script finding places this new approach is useful. Signed-off-by: Jonathan Cameron Reviewed-by: Rob Herring Link: https://lore.kernel.org/r/20240225142714.286440-2-jic23@kernel.org Signed-off-by: Rob Herring Signed-off-by: zhangshuqi --- include/linux/of.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/include/linux/of.h b/include/linux/of.h index 6a9ddf20e79a..50e882ee91da 100644 --- a/include/linux/of.h +++ b/include/linux/of.h @@ -13,6 +13,7 @@ */ #include #include +#include #include #include #include @@ -134,6 +135,7 @@ static inline struct device_node *of_node_get(struct device_node *node) } static inline void of_node_put(struct device_node *node) { } #endif /* !CONFIG_OF_DYNAMIC */ +DEFINE_FREE(device_node, struct device_node *, if (_T) of_node_put(_T)) /* Pointer for first entry in chain of all nodes. */ extern struct device_node *of_root; -- Gitee From 2109af7c05271dd4412ecfcb77edb54cd7216eac Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Miquel=20Sabat=C3=A9=20Sol=C3=A0?= Date: Tue, 17 Sep 2024 15:42:46 +0200 Subject: [PATCH 51/51] cpufreq: Avoid a bad reference count on CPU node MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit stable inclusion from stable-6.6.55 commit 0f41f383b5a61a2bf6429a449ebba7fb08179d81 category: bugfix issue: #IB55S9 CVE: CVE-2024-50012 Signed-off-by: zhangshuqi --------------------------------------- commit c0f02536fffbbec71aced36d52a765f8c4493dc2 upstream. In the parse_perf_domain function, if the call to of_parse_phandle_with_args returns an error, then the reference to the CPU device node that was acquired at the start of the function would not be properly decremented. Address this by declaring the variable with the __free(device_node) cleanup attribute. Signed-off-by: Miquel SabatĂ© SolĂ  Acked-by: Viresh Kumar Link: https://patch.msgid.link/20240917134246.584026-1-mikisabate@gmail.com Cc: All applicable Signed-off-by: Rafael J. Wysocki Signed-off-by: Greg Kroah-Hartman Signed-off-by: zhangshuqi --- include/linux/cpufreq.h | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h index 3a4cefb25ba6..9ca4211c063f 100644 --- a/include/linux/cpufreq.h +++ b/include/linux/cpufreq.h @@ -1124,10 +1124,9 @@ static inline int parse_perf_domain(int cpu, const char *list_name, const char *cell_name, struct of_phandle_args *args) { - struct device_node *cpu_np; int ret; - cpu_np = of_cpu_device_node_get(cpu); + struct device_node *cpu_np __free(device_node) = of_cpu_device_node_get(cpu); if (!cpu_np) return -ENODEV; @@ -1135,9 +1134,6 @@ static inline int parse_perf_domain(int cpu, const char *list_name, args); if (ret < 0) return ret; - - of_node_put(cpu_np); - return 0; } -- Gitee