linux-stable

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2024-08-21 08:19:28 +00:00

Author	SHA1	Message	Date
Martin KaFai Lau	9c1292eca2	net/bpf: Avoid unused "sin_addr_len" warning when CONFIG_CGROUP_BPF is not set It was reported that there is a compiler warning on the unused variable "sin_addr_len" in af_inet.c when CONFIG_CGROUP_BPF is not set. This patch is to address it similar to the ipv6 counterpart in inet6_getname(). It is to "return sin_addr_len;" instead of "return sizeof(*sin);". Fixes: `fefba7d1ae` ("bpf: Propagate modified uaddrlen from cgroup sockaddr programs") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/bpf/20231013185702.3993710-1-martin.lau@linux.dev Closes: https://lore.kernel.org/bpf/20231013114007.2fb09691@canb.auug.org.au/	2023-10-13 12:35:43 -07:00
Linus Torvalds	a1ef447dee	Fixes for an overreaching WARN_ON, two error paths and a switch to kernel_connect() which recently grown protection against someone using BPF to rewrite the address. All but one marked for stable. -----BEGIN PGP SIGNATURE----- iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmUpYoQTHGlkcnlvbW92 QGdtYWlsLmNvbQAKCRBKf944AhHziwvmCACK13dkAaupcHyteYPloBgtJLNixR3X 6++nHCOXGtE7cK6n1snobFQgp/d5BqSKAeymyLqDjJOJVqG/5n8FR1gwcuY/Ogdj Aju2Mkt7R/R/V6kvmCbqGSwiOvxZP1gBFkKcluRNQkFNP3boKw4vmJGq29Rabbyr 66NfZSETsR/H4JhAWvUyVmffrvxIx11THnvmrAnprGVKoK72HwOQFx0H4KLD2Hio aN/yiiR15yDS6hL8dfilt+QGO6o+ZWgvRl1GOzqjwISgfeUUYVknMJXrEf+20kHs kX3tWaLWVZsU1FyVFRO5HPYLAREjALXstFO4K1OHPERM7EipWNinIHoS =wyYP -----END PGP SIGNATURE----- Merge tag 'ceph-for-6.6-rc6' of https://github.com/ceph/ceph-client Pull ceph fixes from Ilya Dryomov: "Fixes for an overreaching WARN_ON, two error paths and a switch to kernel_connect() which recently grown protection against someone using BPF to rewrite the address. All but one marked for stable" * tag 'ceph-for-6.6-rc6' of https://github.com/ceph/ceph-client: ceph: fix type promotion bug on 32bit systems libceph: use kernel_connect() ceph: remove unnecessary IS_ERR() check in ceph_fname_to_usr() ceph: fix incorrect revoked caps assert in ceph_fill_file_size()	2023-10-13 11:27:31 -07:00
Kuniyuki Iwashima	8702cf12e6	tcp: Fix listen() warning with v4-mapped-v6 address. syzbot reported a warning [0] introduced by commit `c48ef9c4ae` ("tcp: Fix bind() regression for v4-mapped-v6 non-wildcard address."). After the cited commit, a v4 socket's address matches the corresponding v4-mapped-v6 tb2 in inet_bind2_bucket_match_addr(), not vice versa. During X.X.X.X -> ::ffff:X.X.X.X order bind()s, the second bind() uses bhash and conflicts properly without checking bhash2 so that we need not check if a v4-mapped-v6 sk matches the corresponding v4 address tb2 in inet_bind2_bucket_match_addr(). However, the repro shows that we need to check that in a no-conflict case. The repro bind()s two sockets to the 2-tuples using SO_REUSEPORT and calls listen() for the first socket: from socket import * s1 = socket() s1.setsockopt(SOL_SOCKET, SO_REUSEPORT, 1) s1.bind(('127.0.0.1', 0)) s2 = socket(AF_INET6) s2.setsockopt(SOL_SOCKET, SO_REUSEPORT, 1) s2.bind(('::ffff:127.0.0.1', s1.getsockname()[1])) s1.listen() The second socket should belong to the first socket's tb2, but the second bind() creates another tb2 bucket because inet_bind2_bucket_find() returns NULL in inet_csk_get_port() as the v4-mapped-v6 sk does not match the corresponding v4 address tb2. bhash2[] -> tb2(::ffff:X.X.X.X) -> tb2(X.X.X.X) Then, listen() for the first socket calls inet_csk_get_port(), where the v4 address matches the v4-mapped-v6 tb2 and WARN_ON() is triggered. To avoid that, we need to check if v4-mapped-v6 sk address matches with the corresponding v4 address tb2 in inet_bind2_bucket_match(). The same checks are needed in inet_bind2_bucket_addr_match() too, so we can move all checks there and call it from inet_bind2_bucket_match(). Note that now tb->family is just an address family of tb->(v6_)?rcv_saddr and not of sockets in the bucket. This could be refactored later by defining tb->rcv_saddr as tb->v6_rcv_saddr.s6_addr32[3] and prepending ::ffff: when creating v4 tb2. [0]: WARNING: CPU: 0 PID: 5049 at net/ipv4/inet_connection_sock.c:587 inet_csk_get_port+0xf96/0x2350 net/ipv4/inet_connection_sock.c:587 Modules linked in: CPU: 0 PID: 5049 Comm: syz-executor288 Not tainted 6.6.0-rc2-syzkaller-00018-g2cf0f7156238 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/04/2023 RIP: 0010:inet_csk_get_port+0xf96/0x2350 net/ipv4/inet_connection_sock.c:587 Code: 7c 24 08 e8 4c b6 8a 01 31 d2 be 88 01 00 00 48 c7 c7 e0 94 ae 8b e8 59 2e a3 f8 2e 2e 2e 31 c0 e9 04 fe ff ff e8 ca 88 d0 f8 <0f> 0b e9 0f f9 ff ff e8 be 88 d0 f8 49 8d 7e 48 e8 65 ca 5a 00 31 RSP: 0018:ffffc90003abfbf0 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff888026429100 RCX: 0000000000000000 RDX: ffff88807edcbb80 RSI: ffffffff88b73d66 RDI: ffff888026c49f38 RBP: ffff888026c49f30 R08: 0000000000000005 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff9260f200 R13: ffff888026c49880 R14: 0000000000000000 R15: ffff888026429100 FS: 00005555557d5380(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000045ad50 CR3: 0000000025754000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> inet_csk_listen_start+0x155/0x360 net/ipv4/inet_connection_sock.c:1256 __inet_listen_sk+0x1b8/0x5c0 net/ipv4/af_inet.c:217 inet_listen+0x93/0xd0 net/ipv4/af_inet.c:239 __sys_listen+0x194/0x270 net/socket.c:1866 __do_sys_listen net/socket.c:1875 [inline] __se_sys_listen net/socket.c:1873 [inline] __x64_sys_listen+0x53/0x80 net/socket.c:1873 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f3a5bce3af9 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 c1 17 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffc1a1c79e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000032 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3a5bce3af9 RDX: 00007f3a5bce3af9 RSI: 0000000000000000 RDI: 0000000000000003 RBP: 00007f3a5bd565f0 R08: 0000000000000006 R09: 0000000000000006 R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000000001 R13: 431bde82d7b634db R14: 0000000000000001 R15: 0000000000000001 </TASK> Fixes: `c48ef9c4ae` ("tcp: Fix bind() regression for v4-mapped-v6 non-wildcard address.") Reported-by: syzbot+71e724675ba3958edb31@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=71e724675ba3958edb31 Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20231010013814.70571-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-13 10:01:49 -07:00
Heng Guo	cf8b49fbd0	net: fix IPSTATS_MIB_OUTFORWDATAGRAMS increment after fragment check Reproduce environment: network with 3 VM linuxs is connected as below: VM1<---->VM2(latest kernel 6.5.0-rc7)<---->VM3 VM1: eth0 ip: 192.168.122.207 MTU 1800 VM2: eth0 ip: 192.168.122.208, eth1 ip: 192.168.123.224 MTU 1500 VM3: eth0 ip: 192.168.123.240 MTU 1800 Reproduce: VM1 send 1600 bytes UDP data to VM3 using tools scapy with flags='DF'. scapy command: send(IP(dst="192.168.123.240",flags='DF')/UDP()/str('0'*1600),count=1, inter=1.000000) Result: Before IP data is sent. ---------------------------------------------------------------------- root@qemux86-64:~# cat /proc/net/snmp Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes ReasmTimeout ReasmReqdss Ip: 1 64 6 0 2 2 0 0 2 4 0 0 0 0 0 0 0 0 0 ...... root@qemux86-64:~# ---------------------------------------------------------------------- After IP data is sent. ---------------------------------------------------------------------- root@qemux86-64:~# cat /proc/net/snmp Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes ReasmTimeout ReasmReqdss Ip: 1 64 7 0 2 2 0 0 2 5 0 0 0 0 0 0 0 1 0 ...... root@qemux86-64:~# ---------------------------------------------------------------------- ForwDatagrams is always keeping 2 without increment. Issue description and patch: ip_exceeds_mtu() in ip_forward() drops this IP datagram because skb len (1600 sending by scapy) is over MTU(1500 in VM2) if "DF" is set. According to RFC 4293 "3.2.3. IP Statistics Tables", +-------+------>------+----->-----+----->-----+ \| InForwDatagrams (6) \| OutForwDatagrams (6) \| \| V +->-+ OutFragReqds \| InNoRoutes \| \| (packets) / (local packet (3) \| \| \| IF is that of the address \| +--> OutFragFails \| and may not be the receiving IF) \| \| (packets) the IPSTATS_MIB_OUTFORWDATAGRAMS should be counted before fragment check. The existing implementation, instead, would incease the counter after fragment check: ip_exceeds_mtu() in ipv4 and ip6_pkt_too_big() in ipv6. So do patch to move IPSTATS_MIB_OUTFORWDATAGRAMS counter to ip_forward() for ipv4 and ip6_forward() for ipv6. Test result with patch: Before IP data is sent. ---------------------------------------------------------------------- root@qemux86-64:~# cat /proc/net/snmp Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes ReasmTimeout ReasmReqdss Ip: 1 64 6 0 2 2 0 0 2 4 0 0 0 0 0 0 0 0 0 ...... root@qemux86-64:~# ---------------------------------------------------------------------- After IP data is sent. ---------------------------------------------------------------------- root@qemux86-64:~# cat /proc/net/snmp Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes ReasmTimeout ReasmReqdss Ip: 1 64 7 0 2 3 0 0 2 5 0 0 0 0 0 0 0 1 0 ...... root@qemux86-64:~# ---------------------------------------------------------------------- ForwDatagrams is updated from 2 to 3. Reviewed-by: Filip Pudak <filip.pudak@windriver.com> Signed-off-by: Heng Guo <heng.guo@windriver.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20231011015137.27262-1-heng.guo@windriver.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-13 09:58:45 -07:00
Sabrina Dubroca	9f0c824551	tls: use fixed size for tls_offload_context_{tx,rx}.driver_state driver_state is a flex array, but is always allocated by the tls core to a fixed size (TLS_DRIVER_STATE_SIZE_{TX,RX}). Simplify the code by making that size explicit so that sizeof(struct tls_offload_context_{tx,rx}) works. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-13 11:26:10 +01:00
Sabrina Dubroca	1cf7fbcee6	tls: validate crypto_info in a separate helper Simplify do_tls_setsockopt_conf a bit. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-13 11:26:10 +01:00
Sabrina Dubroca	4f48669918	tls: remove tls_context argument from tls_set_device_offload It's not really needed since we end up refetching it as tls_ctx. We can also remove the NULL check, since we have already dereferenced ctx in do_tls_setsockopt_conf. While at it, fix up the reverse xmas tree ordering. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-13 11:26:10 +01:00
Sabrina Dubroca	b6a30ec923	tls: remove tls_context argument from tls_set_sw_offload It's not really needed since we end up refetching it as tls_ctx. We can also remove the NULL check, since we have already dereferenced ctx in do_tls_setsockopt_conf. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-13 11:26:10 +01:00
Sabrina Dubroca	0137407999	tls: add a helper to allocate/initialize offload_ctx_tx Simplify tls_set_device_offload a bit. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-13 11:26:10 +01:00
Sabrina Dubroca	1a074f7618	tls: also use init_prot_info in tls_set_device_offload Most values are shared. Nonce size turns out to be equal to IV size for all offloadable ciphers. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-13 11:26:10 +01:00
Sabrina Dubroca	a9937816ed	tls: move tls_prot_info initialization out of tls_set_sw_offload Simplify tls_set_sw_offload, and allow reuse for the tls_device code. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-13 11:26:10 +01:00
Sabrina Dubroca	615580cbc9	tls: extract context alloc/initialization out of tls_set_sw_offload Simplify tls_set_sw_offload a bit. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-13 11:26:10 +01:00
Sabrina Dubroca	1c1cb3110d	tls: store iv directly within cipher_context TLS_MAX_IV_SIZE + TLS_MAX_SALT_SIZE is 20B, we don't get much benefit in cipher_context's size and can simplify the init code a bit. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-13 11:26:10 +01:00
Sabrina Dubroca	bee6b7b307	tls: rename MAX_IV_SIZE to TLS_MAX_IV_SIZE It's defined in include/net/tls.h, avoid using an overly generic name. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-13 11:26:09 +01:00
Sabrina Dubroca	6d5029e547	tls: store rec_seq directly within cipher_context TLS_MAX_REC_SEQ_SIZE is 8B, we don't get anything by using kmalloc. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-13 11:26:09 +01:00
Sabrina Dubroca	8f1d532b4a	tls: drop unnecessary cipher_type checks in tls offload We should never reach tls_device_reencrypt, tls_enc_record, or tls_enc_skb with a cipher_type that can't be offloaded. Replace those checks with a DEBUG_NET_WARN_ON_ONCE, and use cipher_desc instead of hard-coding offloadable cipher types. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-13 11:26:09 +01:00
Sabrina Dubroca	3bab3ee0f9	tls: get salt using crypto_info_salt in tls_enc_skb I skipped this conversion in my previous series. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-13 11:26:09 +01:00
Amit Cohen	38985e8c27	net: Handle bulk delete policy in bridge driver The merge commit `9271686937` ("Merge branch 'br-flush-filtering'") added support for FDB flushing in bridge driver. The following patches will extend VXLAN driver to support FDB flushing as well. The netlink message for bulk delete is shared between the drivers. With the existing implementation, there is no way to prevent user from flushing with attributes that are not supported per driver. For example, when VNI will be added, user will not get an error for flush FDB entries in bridge with VNI, although this attribute is not relevant for bridge. As preparation for support of FDB flush in VXLAN driver, move the policy to be handled in bridge driver, later a new policy for VXLAN will be added in VXLAN driver. Do not pass 'vid' as part of ndo_fdb_del_bulk(), as this field is relevant only for bridge. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-13 10:00:30 +01:00
Eric Dumazet	de5724ca38	xfrm: fix a data-race in xfrm_lookup_with_ifid() syzbot complains about a race in xfrm_lookup_with_ifid() [1] When preparing commit `0a9e5794b2` ("xfrm: annotate data-race around use_time") I thought xfrm_lookup_with_ifid() was modifying a still private structure. [1] BUG: KCSAN: data-race in xfrm_lookup_with_ifid / xfrm_lookup_with_ifid write to 0xffff88813ea41108 of 8 bytes by task 8150 on cpu 1: xfrm_lookup_with_ifid+0xce7/0x12d0 net/xfrm/xfrm_policy.c:3218 xfrm_lookup net/xfrm/xfrm_policy.c:3270 [inline] xfrm_lookup_route+0x3b/0x100 net/xfrm/xfrm_policy.c:3281 ip6_dst_lookup_flow+0x98/0xc0 net/ipv6/ip6_output.c:1246 send6+0x241/0x3c0 drivers/net/wireguard/socket.c:139 wg_socket_send_skb_to_peer+0xbd/0x130 drivers/net/wireguard/socket.c:178 wg_socket_send_buffer_to_peer+0xd6/0x100 drivers/net/wireguard/socket.c:200 wg_packet_send_handshake_initiation drivers/net/wireguard/send.c:40 [inline] wg_packet_handshake_send_worker+0x10c/0x150 drivers/net/wireguard/send.c:51 process_one_work kernel/workqueue.c:2630 [inline] process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2703 worker_thread+0x525/0x730 kernel/workqueue.c:2784 kthread+0x1d7/0x210 kernel/kthread.c:388 ret_from_fork+0x48/0x60 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304 write to 0xffff88813ea41108 of 8 bytes by task 15867 on cpu 0: xfrm_lookup_with_ifid+0xce7/0x12d0 net/xfrm/xfrm_policy.c:3218 xfrm_lookup net/xfrm/xfrm_policy.c:3270 [inline] xfrm_lookup_route+0x3b/0x100 net/xfrm/xfrm_policy.c:3281 ip6_dst_lookup_flow+0x98/0xc0 net/ipv6/ip6_output.c:1246 send6+0x241/0x3c0 drivers/net/wireguard/socket.c:139 wg_socket_send_skb_to_peer+0xbd/0x130 drivers/net/wireguard/socket.c:178 wg_socket_send_buffer_to_peer+0xd6/0x100 drivers/net/wireguard/socket.c:200 wg_packet_send_handshake_initiation drivers/net/wireguard/send.c:40 [inline] wg_packet_handshake_send_worker+0x10c/0x150 drivers/net/wireguard/send.c:51 process_one_work kernel/workqueue.c:2630 [inline] process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2703 worker_thread+0x525/0x730 kernel/workqueue.c:2784 kthread+0x1d7/0x210 kernel/kthread.c:388 ret_from_fork+0x48/0x60 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304 value changed: 0x00000000651cd9d1 -> 0x00000000651cd9d2 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 15867 Comm: kworker/u4:58 Not tainted 6.6.0-rc4-syzkaller-00016-g5e62ed3b1c8a #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/06/2023 Workqueue: wg-kex-wg2 wg_packet_handshake_send_worker Fixes: `0a9e5794b2` ("xfrm: annotate data-race around use_time") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2023-10-13 07:57:27 +02:00
Jakub Kicinski	0e6bb5b7f4	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: kernel/bpf/verifier.c `829955981c` ("bpf: Fix verifier log for async callback return values") `a923819fb2` ("bpf: Treat first argument as return value for bpf_throw") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-12 17:07:34 -07:00
Florian Westphal	2f0968a030	net: gso_test: fix build with gcc-12 and earlier gcc 12 errors out with: net/core/gso_test.c:58:48: error: initializer element is not constant 58 \| .segs = (const unsigned int[]) { gso_size }, This version isn't old (2022), so switch to preprocessor-bsaed constant instead of 'static const int'. Cc: Willem de Bruijn <willemb@google.com> Reported-by: Tasmiya Nalatwad <tasmiya@linux.vnet.ibm.com> Closes: https://lore.kernel.org/netdev/79fbe35c-4dd1-4f27-acb2-7a60794bc348@linux.vnet.ibm.com/ Fixes: `1b4fa28a8b` ("net: parametrize skb_segment unit test to expand coverage") Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20231012120901.10765-1-fw@strlen.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-10-12 15:34:08 +02:00
Florian Westphal	d351c1ea2d	netfilter: nft_payload: fix wrong mac header matching mcast packets get looped back to the local machine. Such packets have a 0-length mac header, we should treat this like "mac header not set" and abort rule evaluation. As-is, we just copy data from the network header instead. Fixes: `96518518cc` ("netfilter: add nftables") Reported-by: Blažej Krajňák <krajnak@levonet.sk> Signed-off-by: Florian Westphal <fw@strlen.de>	2023-10-12 10:28:45 +02:00
Xingyuan Mo	505ce0630a	nf_tables: fix NULL pointer dereference in nft_expr_inner_parse() We should check whether the NFTA_EXPR_NAME netlink attribute is present before accessing it, otherwise a null pointer deference error will occur. Call Trace: <TASK> dump_stack_lvl+0x4f/0x90 print_report+0x3f0/0x620 kasan_report+0xcd/0x110 __asan_load2+0x7d/0xa0 nla_strcmp+0x2f/0x90 __nft_expr_type_get+0x41/0xb0 nft_expr_inner_parse+0xe3/0x200 nft_inner_init+0x1be/0x2e0 nf_tables_newrule+0x813/0x1230 nfnetlink_rcv_batch+0xec3/0x1170 nfnetlink_rcv+0x1e4/0x220 netlink_unicast+0x34e/0x4b0 netlink_sendmsg+0x45c/0x7e0 __sys_sendto+0x355/0x370 __x64_sys_sendto+0x84/0xa0 do_syscall_64+0x3f/0x90 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 Fixes: `3a07327d10` ("netfilter: nft_inner: support for inner tunnel header matching") Signed-off-by: Xingyuan Mo <hdthky0@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de>	2023-10-12 10:28:45 +02:00
Xingyuan Mo	52177bbf19	nf_tables: fix NULL pointer dereference in nft_inner_init() We should check whether the NFTA_INNER_NUM netlink attribute is present before accessing it, otherwise a null pointer deference error will occur. Call Trace: dump_stack_lvl+0x4f/0x90 print_report+0x3f0/0x620 kasan_report+0xcd/0x110 __asan_load4+0x84/0xa0 nft_inner_init+0x128/0x2e0 nf_tables_newrule+0x813/0x1230 nfnetlink_rcv_batch+0xec3/0x1170 nfnetlink_rcv+0x1e4/0x220 netlink_unicast+0x34e/0x4b0 netlink_sendmsg+0x45c/0x7e0 __sys_sendto+0x355/0x370 __x64_sys_sendto+0x84/0xa0 do_syscall_64+0x3f/0x90 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 Fixes: `3a07327d10` ("netfilter: nft_inner: support for inner tunnel header matching") Signed-off-by: Xingyuan Mo <hdthky0@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de>	2023-10-12 10:28:45 +02:00
Pablo Neira Ayuso	4c90bba60c	netfilter: nf_tables: do not refresh timeout when resetting element The dump and reset command should not refresh the timeout, this command is intended to allow users to list existing stateful objects and reset them, element expiration should be refresh via transaction instead with a specific command to achieve this, otherwise this is entering combo semantics that will be hard to be undone later (eg. a user asking to retrieve counters but _not_ requiring to refresh expiration). Fixes: `079cd63321` ("netfilter: nf_tables: Introduce NFT_MSG_GETSETELEM_RESET") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>	2023-10-12 10:28:45 +02:00
Kees Cook	d51c42cdef	netfilter: nf_tables: Annotate struct nft_pipapo_match with __counted_by Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct nft_pipapo_match. Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Jozsef Kadlecsik <kadlec@netfilter.org> Cc: Florian Westphal <fw@strlen.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: netfilter-devel@vger.kernel.org Cc: coreteam@netfilter.org Cc: netdev@vger.kernel.org Link: https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci [1] Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Florian Westphal <fw@strlen.de>	2023-10-12 10:28:45 +02:00
Florian Westphal	2e1d175410	netfilter: nfnetlink_log: silence bogus compiler warning net/netfilter/nfnetlink_log.c:800:18: warning: variable 'ctinfo' is uninitialized The warning is bogus, the variable is only used if ct is non-NULL and always initialised in that case. Init to 0 too to silence this. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202309100514.ndBFebXN-lkp@intel.com/ Signed-off-by: Florian Westphal <fw@strlen.de>	2023-10-12 10:28:45 +02:00
Pablo Neira Ayuso	ebd032fa88	netfilter: nf_tables: do not remove elements if set backend implements .abort pipapo set backend maintains two copies of the datastructure, removing the elements from the copy that is going to be discarded slows down the abort path significantly, from several minutes to few seconds after this patch. Fixes: `212ed75dc5` ("netfilter: nf_tables: integrate pipapo into commit protocol") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>	2023-10-12 10:28:45 +02:00
Jeremy Cline	354a6e707e	nfc: nci: assert requested protocol is valid The protocol is used in a bit mask to determine if the protocol is supported. Assert the provided protocol is less than the maximum defined so it doesn't potentially perform a shift-out-of-bounds and provide a clearer error for undefined protocols vs unsupported ones. Fixes: `6a2968aaf5` ("NFC: basic NCI protocol implementation") Reported-and-tested-by: syzbot+0839b78e119aae1fec78@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=0839b78e119aae1fec78 Signed-off-by: Jeremy Cline <jeremy@jcline.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20231009200054.82557-1-jeremy@jcline.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-10-12 09:32:10 +02:00
Kuniyuki Iwashima	e2bca4870f	af_packet: Fix fortified memcpy() without flex array. Sergei Trofimovich reported a regression [0] caused by commit `a0ade8404c` ("af_packet: Fix warning of fortified memcpy() in packet_getname()."). It introduced a flex array sll_addr_flex in struct sockaddr_ll as a union-ed member with sll_addr to work around the fortified memcpy() check. However, a userspace program uses a struct that has struct sockaddr_ll in the middle, where a flex array is illegal to exist. include/linux/if_packet.h:24:17: error: flexible array member 'sockaddr_ll::<unnamed union>::<unnamed struct>::sll_addr_flex' not at end of 'struct packet_info_t' 24 \| __DECLARE_FLEX_ARRAY(unsigned char, sll_addr_flex); \| ^~~~~~~~~~~~~~~~~~~~ To fix the regression, let's go back to the first attempt [1] telling memcpy() the actual size of the array. Reported-by: Sergei Trofimovich <slyich@gmail.com> Closes: https://github.com/NixOS/nixpkgs/pull/252587#issuecomment-1741733002 [0] Link: https://lore.kernel.org/netdev/20230720004410.87588-3-kuniyu@amazon.com/ [1] Fixes: `a0ade8404c` ("af_packet: Fix warning of fortified memcpy() in packet_getname().") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20231009153151.75688-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-10-12 09:15:15 +02:00
Jakub Kicinski	21b2e2624d	netfilter net-next pull request 2023-10-10 -----BEGIN PGP SIGNATURE----- iQJBBAABCAArFiEEgKkgxbID4Gn1hq6fcJGo2a1f9gAFAmUlYWENHGZ3QHN0cmxl bi5kZQAKCRBwkajZrV/2AC68D/0WnxpQHiCMl7rUWrDJjmhYenUY1uM9VZb1fpd7 hH543aVbJmWNQj9uzWmKceuLKtIV5jCgsgzqHcxq5e0oHLc3IjIOX1j6L6Dtv7Wf 8obzfCZOYvp0rfaB/2W8d+ehet69i8UQlWM6uKVD/vEedAynfu8AaCgaNy6WwgrK ElXa3lZzflfoeqvwZkndxIkeuoXQB4mYCcuIXuseB+pYv5pz2th2OsGNxeeae0Q0 XlOPY7LNrAWMbUFfJUiRMb7bEXINOcmZMGC/9OTpYTJe3j32ybi0pZY91AV8HEQY xuQY2k4dVs4bSfxPFkrXlansPh7fbtl/EF6LNoDbCobV6RAZrc0SEM7eSybRoigm FrsWdms4bp5RRaBWxi3MRj1ir8nBHjLJ3AHlec1h6EMcXOWsF91u8mk0+0i9eF8E htw7Rxc7ZR3xRUKFxA5+53oYwMYnst9PdZ54DKrAMJOfykU6pGx2hJbsmTfodeGl zAT1GXPjEDejbXdmD84CzIs+bCmuGNrQQZ6o9gAaYVm9DJJMEXUpp1yK+/ApEnsz pcZNi7NeZRlwJhayLCYdvTT3sUlcDYqGuJoDE/krKNH6Aq0iAb/zvHxbOAn7R550 UFTVpB78W+0G0eufpgCIsxGHIgAZOGYWTe/TcrQIghWPNxfrt6p5XZVh2N4g8n9C nPiOdg== =pmbB -----END PGP SIGNATURE----- Merge tag 'nf-next-23-10-10' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Florian Westphal says: ==================== netfilter updates for next First 5 patches, from Phil Sutter, clean up nftables dumpers to use the context buffer in the netlink_callback structure rather than a kmalloc'd buffer. Patch 6, from myself, zaps dead code and replaces the helper function with a small inlined helper. Patch 7, also from myself, removes another pr_debug and replaces it with the existing nf_log-based debug helpers. Last patch, from George Guo, gets nft_table comments back in sync with the structure members. * tag 'nf-next-23-10-10' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: cleanup struct nft_table netfilter: conntrack: prefer tcp_error_log to pr_debug netfilter: conntrack: simplify nf_conntrack_alter_reply netfilter: nf_tables: Don't allocate nft_rule_dump_ctx netfilter: nf_tables: Carry s_idx in nft_rule_dump_ctx netfilter: nf_tables: Carry reset flag in nft_rule_dump_ctx netfilter: nf_tables: Drop pointless memset when dumping rules netfilter: nf_tables: Always allocate nft_rule_dump_ctx ==================== Link: https://lore.kernel.org/r/20231010145343.12551-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-11 17:37:58 -07:00
Daan De Meyer	859051dd16	bpf: Implement cgroup sockaddr hooks for unix sockets These hooks allows intercepting connect(), getsockname(), getpeername(), sendmsg() and recvmsg() for unix sockets. The unix socket hooks get write access to the address length because the address length is not fixed when dealing with unix sockets and needs to be modified when a unix socket address is modified by the hook. Because abstract socket unix addresses start with a NUL byte, we cannot recalculate the socket address in kernelspace after running the hook by calculating the length of the unix socket path using strlen(). These hooks can be used when users want to multiplex syscall to a single unix socket to multiple different processes behind the scenes by redirecting the connect() and other syscalls to process specific sockets. We do not implement support for intercepting bind() because when using bind() with unix sockets with a pathname address, this creates an inode in the filesystem which must be cleaned up. If we rewrite the address, the user might try to clean up the wrong file, leaking the socket in the filesystem where it is never cleaned up. Until we figure out a solution for this (and a use case for intercepting bind()), we opt to not allow rewriting the sockaddr in bind() calls. We also implement recvmsg() support for connected streams so that after a connect() that is modified by a sockaddr hook, any corresponding recmvsg() on the connected socket can also be modified to make the connected program think it is connected to the "intended" remote. Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com> Link: https://lore.kernel.org/r/20231011185113.140426-5-daan.j.demeyer@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2023-10-11 17:27:47 -07:00
Jakub Kicinski	71c299c711	net: tcp: fix crashes trying to free half-baked MTU probes tcp_stream_alloc_skb() initializes the skb to use tcp_tsorted_anchor which is a union with the destructor. We need to clean that TCP-iness up before freeing. Fixes: `736013292e` ("tcp: let tcp_mtu_probe() build headless packets") Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20231010173651.3990234-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-11 17:24:46 -07:00
Daan De Meyer	53e380d214	bpf: Add bpf_sock_addr_set_sun_path() to allow writing unix sockaddr from bpf As prep for adding unix socket support to the cgroup sockaddr hooks, let's add a kfunc bpf_sock_addr_set_sun_path() that allows modifying a unix sockaddr from bpf. While this is already possible for AF_INET and AF_INET6, we'll need this kfunc when we add unix socket support since modifying the address for those requires modifying both the address and the sockaddr length. Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com> Link: https://lore.kernel.org/r/20231011185113.140426-4-daan.j.demeyer@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2023-10-11 16:29:25 -07:00
Daan De Meyer	fefba7d1ae	bpf: Propagate modified uaddrlen from cgroup sockaddr programs As prep for adding unix socket support to the cgroup sockaddr hooks, let's propagate the sockaddr length back to the caller after running a bpf cgroup sockaddr hook program. While not important for AF_INET or AF_INET6, the sockaddr length is important when working with AF_UNIX sockaddrs as the size of the sockaddr cannot be determined just from the address family or the sockaddr's contents. __cgroup_bpf_run_filter_sock_addr() is modified to take the uaddrlen as an input/output argument. After running the program, the modified sockaddr length is stored in the uaddrlen pointer. Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com> Link: https://lore.kernel.org/r/20231011185113.140426-3-daan.j.demeyer@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2023-10-11 15:03:40 -07:00
Ziyang Xuan	c7f59461f5	Bluetooth: Fix a refcnt underflow problem for hci_conn Syzbot reports a warning as follows: WARNING: CPU: 1 PID: 26946 at net/bluetooth/hci_conn.c:619 hci_conn_timeout+0x122/0x210 net/bluetooth/hci_conn.c:619 ... Call Trace: <TASK> process_one_work+0x884/0x15c0 kernel/workqueue.c:2630 process_scheduled_works kernel/workqueue.c:2703 [inline] worker_thread+0x8b9/0x1290 kernel/workqueue.c:2784 kthread+0x33c/0x440 kernel/kthread.c:388 ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304 </TASK> It is because the HCI_EV_SIMPLE_PAIR_COMPLETE event handler drops hci_conn directly without check Simple Pairing whether be enabled. But the Simple Pairing process can only be used if both sides have the support enabled in the host stack. Add hci_conn_ssp_enabled() for hci_conn in HCI_EV_IO_CAPA_REQUEST and HCI_EV_SIMPLE_PAIR_COMPLETE event handlers to fix the problem. Fixes: `0493684ed2` ("[Bluetooth] Disable disconnect timer during Simple Pairing") Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2023-10-11 11:17:08 -07:00
Pauli Virtanen	a239110ee8	Bluetooth: hci_sync: always check if connection is alive before deleting In hci_abort_conn_sync it is possible that conn is deleted concurrently by something else, also e.g. when waiting for hdev->lock. This causes double deletion of the conn, so UAF or conn_hash.list corruption. Fix by having all code paths check that the connection is still in conn_hash before deleting it, while holding hdev->lock which prevents any races. Log (when powering off while BAP streaming, occurs rarely): ======================================================================= kernel BUG at lib/list_debug.c:56! ... ? __list_del_entry_valid (lib/list_debug.c:56) hci_conn_del (net/bluetooth/hci_conn.c:154) bluetooth hci_abort_conn_sync (net/bluetooth/hci_sync.c:5415) bluetooth ? __pfx_hci_abort_conn_sync+0x10/0x10 [bluetooth] ? lock_release+0x1d5/0x3c0 ? hci_disconnect_all_sync.constprop.0+0xb2/0x230 [bluetooth] ? __pfx_lock_release+0x10/0x10 ? __kmem_cache_free+0x14d/0x2e0 hci_disconnect_all_sync.constprop.0+0xda/0x230 [bluetooth] ? __pfx_hci_disconnect_all_sync.constprop.0+0x10/0x10 [bluetooth] ? hci_clear_adv_sync+0x14f/0x170 [bluetooth] ? __pfx_set_powered_sync+0x10/0x10 [bluetooth] hci_set_powered_sync+0x293/0x450 [bluetooth] ======================================================================= Fixes: `94d9ba9f98` ("Bluetooth: hci_sync: Fix UAF in hci_disconnect_all_sync") Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2023-10-11 11:16:46 -07:00
Lee, Chun-Yi	1ffc6f8cc3	Bluetooth: Reject connection with the device which has same BD_ADDR This change is used to relieve CVE-2020-26555. The description of the CVE: Bluetooth legacy BR/EDR PIN code pairing in Bluetooth Core Specification 1.0B through 5.2 may permit an unauthenticated nearby device to spoof the BD_ADDR of the peer device to complete pairing without knowledge of the PIN. [1] The detail of this attack is in IEEE paper: BlueMirror: Reflections on Bluetooth Pairing and Provisioning Protocols [2] It's a reflection attack. The paper mentioned that attacker can induce the attacked target to generate null link key (zero key) without PIN code. In BR/EDR, the key generation is actually handled in the controller which is below HCI. A condition of this attack is that attacker should change the BR_ADDR of his hacking device (Host B) to equal to the BR_ADDR with the target device being attacked (Host A). Thus, we reject the connection with device which has same BD_ADDR both on HCI_Create_Connection and HCI_Connection_Request to prevent the attack. A similar implementation also shows in btstack project. [3][4] Cc: stable@vger.kernel.org Link: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-26555 [1] Link: https://ieeexplore.ieee.org/abstract/document/9474325/authors#authors [2] Link: https://github.com/bluekitchen/btstack/blob/master/src/hci.c#L3523 [3] Link: https://github.com/bluekitchen/btstack/blob/master/src/hci.c#L7297 [4] Signed-off-by: Lee, Chun-Yi <jlee@suse.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2023-10-11 11:16:24 -07:00
Lee, Chun-Yi	33155c4aae	Bluetooth: hci_event: Ignore NULL link key This change is used to relieve CVE-2020-26555. The description of the CVE: Bluetooth legacy BR/EDR PIN code pairing in Bluetooth Core Specification 1.0B through 5.2 may permit an unauthenticated nearby device to spoof the BD_ADDR of the peer device to complete pairing without knowledge of the PIN. [1] The detail of this attack is in IEEE paper: BlueMirror: Reflections on Bluetooth Pairing and Provisioning Protocols [2] It's a reflection attack. The paper mentioned that attacker can induce the attacked target to generate null link key (zero key) without PIN code. In BR/EDR, the key generation is actually handled in the controller which is below HCI. Thus, we can ignore null link key in the handler of "Link Key Notification event" to relieve the attack. A similar implementation also shows in btstack project. [3] v3: Drop the connection when null link key be detected. v2: - Used Link: tag instead of Closes: - Used bt_dev_dbg instead of BT_DBG - Added Fixes: tag Cc: stable@vger.kernel.org Fixes: `55ed8ca10f` ("Bluetooth: Implement link key handling for the management interface") Link: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-26555 [1] Link: https://ieeexplore.ieee.org/abstract/document/9474325/authors#authors [2] Link: https://github.com/bluekitchen/btstack/blob/master/src/hci.c#L3722 [3] Signed-off-by: Lee, Chun-Yi <jlee@suse.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2023-10-11 11:16:02 -07:00
Iulia Tanasescu	acab8ff29a	Bluetooth: ISO: Fix invalid context error This moves the hci_le_terminate_big_sync call from rx_work to cmd_sync_work, to avoid calling sleeping function from an invalid context. Reported-by: syzbot+c715e1bd8dfbcb1ab176@syzkaller.appspotmail.com Fixes: `a0bfde167b` ("Bluetooth: ISO: Add support for connecting multiple BISes") Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2023-10-11 11:15:40 -07:00
Johannes Berg	f2ac54ebf8	net: rfkill: reduce data->mtx scope in rfkill_fop_open In syzbot runs, lockdep reports that there's a (potential) deadlock here of data->mtx being locked recursively. This isn't really a deadlock since they are different instances, but lockdep cannot know, and teaching it would be far more difficult than other fixes. At the same time we don't even really _need_ the mutex to be locked in rfkill_fop_open(), since we're modifying only a completely fresh instance of 'data' (struct rfkill_data) that's not yet added to the global list. However, to avoid any reordering etc. within the globally locked section, and to make the code look more symmetric, we should still lock the data->events list manipulation, but also need to lock _only_ that. So do that. Reported-by: syzbot+509238e523e032442b80@syzkaller.appspotmail.com Fixes: `2c3dfba4cf` ("rfkill: sync before userspace visibility/changes") Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2023-10-11 16:55:10 +02:00
Josua Mayer	b2f750c3a8	net: rfkill: gpio: prevent value glitch during probe When either reset- or shutdown-gpio have are initially deasserted, e.g. after a reboot - or when the hardware does not include pull-down, there will be a short toggle of both IOs to logical 0 and back to 1. It seems that the rfkill default is unblocked, so the driver should not glitch to output low during probe. It can lead e.g. to unexpected lte modem reconnect: [1] root@localhost:~# dmesg \| grep "usb 2-1" [ 2.136124] usb 2-1: new SuperSpeed USB device number 2 using xhci-hcd [ 21.215278] usb 2-1: USB disconnect, device number 2 [ 28.833977] usb 2-1: new SuperSpeed USB device number 3 using xhci-hcd The glitch has been discovered on an arm64 board, now that device-tree support for the rfkill-gpio driver has finally appeared :). Change the flags for devm_gpiod_get_optional from GPIOD_OUT_LOW to GPIOD_ASIS to avoid any glitches. The rfkill driver will set the intended value during rfkill_sync_work. Fixes: `7176ba23f8` ("net: rfkill: add generic gpio rfkill driver") Signed-off-by: Josua Mayer <josua@solid-run.com> Link: https://lore.kernel.org/r/20231004163928.14609-1-josua@solid-run.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2023-10-11 16:36:52 +02:00
Johannes Berg	02e0e426a2	wifi: mac80211: fix error path key leak In the previous key leak fix for the other error paths, I meant to unify all of them to the same place, but used the wrong label, which I noticed when doing the merge into wireless-next. Fix it. Fixes: `d097ae01eb` ("wifi: mac80211: fix potential key leak") Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2023-10-11 16:36:14 +02:00
Johannes Berg	91d20ab9d9	wifi: cfg80211: use system_unbound_wq for wiphy work Since wiphy work items can run pretty much arbitrary code in the stack/driver, it can take longer to run all of this, so we shouldn't be using system_wq via schedule_work(). Also, we lock the wiphy (which is the reason this exists), so use system_unbound_wq. Reported-and-tested-by: Kalle Valo <kvalo@kernel.org> Fixes: `a3ee4dc84c` ("wifi: cfg80211: add a work abstraction with special semantics") Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2023-10-11 16:36:12 +02:00
Willem de Bruijn	4688ecb138	net: expand skb_segment unit test with frag_list coverage Expand the test with these variants that use skb frag_list: - GSO_TEST_FRAG_LIST: frag_skb length is gso_size - GSO_TEST_FRAG_LIST_PURE: same, data exclusively in frag skbs - GSO_TEST_FRAG_LIST_NON_UNIFORM: frag_skb length may vary - GSO_TEST_GSO_BY_FRAGS: frag_skb length defines gso_size, i.e., segs may have varying sizes. Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-11 10:39:01 +01:00
Willem de Bruijn	1b4fa28a8b	net: parametrize skb_segment unit test to expand coverage Expand the test with variants - GSO_TEST_NO_GSO: payload size less than or equal to gso_size - GSO_TEST_FRAGS: payload in both linear and page frags - GSO_TEST_FRAGS_PURE: payload exclusively in page frags - GSO_TEST_GSO_PARTIAL: produce one gso segment of multiple of gso_size, plus optionally one non-gso trailer segment Define a test struct that encodes the input gso skb and output segs. Input in terms of linear and fragment lengths. Output as length of each segment. Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-11 10:39:01 +01:00
Willem de Bruijn	b3098d32ed	net: add skb_segment kunit test Add unit testing for skb segment. This function is exercised by many different code paths, such as GSO_PARTIAL or GSO_BY_FRAGS, linear (with or without head_frag), frags or frag_list skbs, etc. It is infeasible to manually run tests that cover all code paths when making changes. The long and complex function also makes it hard to establish through analysis alone that a patch has no unintended side-effects. Add code coverage through kunit regression testing. Introduce kunit infrastructure for tests under net/core, and add this first test. This first skb_segment test exercises a simple case: a linear skb. Follow-on patches will parametrize the test and add more variants. Tested: Built and ran the test with make ARCH=um mrproper ./tools/testing/kunit/kunit.py run \ --kconfig_add CONFIG_NET=y \ --kconfig_add CONFIG_DEBUG_KERNEL=y \ --kconfig_add CONFIG_DEBUG_INFO=y \ --kconfig_add=CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y \ net_core_gso Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-11 10:39:01 +01:00
Nils Hoppmann	a950a5921d	net/smc: Fix pos miscalculation in statistics SMC_STAT_PAYLOAD_SUB(_smc_stats, _tech, key, _len, _rc) will calculate wrong bucket positions for payloads of exactly 4096 bytes and (1 << (m + 12)) bytes, with m == SMC_BUF_MAX - 1. Intended bucket distribution: Assume l == size of payload, m == SMC_BUF_MAX - 1. Bucket 0 : 0 < l <= 2^13 Bucket n, 1 <= n <= m-1 : 2^(n+12) < l <= 2^(n+13) Bucket m : l > 2^(m+12) Current solution: _pos = fls64((l) >> 13) [...] _pos = (_pos < m) ? ((l == 1 << (_pos + 12)) ? _pos - 1 : _pos) : m For l == 4096, _pos == -1, but should be _pos == 0. For l == (1 << (m + 12)), _pos == m, but should be _pos == m - 1. In order to avoid special treatment of these corner cases, the calculation is adjusted. The new solution first subtracts the length by one, and then calculates the correct bucket by shifting accordingly, i.e. _pos = fls64((l - 1) >> 13), l > 0. This not only fixes the issues named above, but also makes the whole bucket assignment easier to follow. Same is done for SMC_STAT_RMB_SIZE_SUB(_smc_stats, _tech, k, _len), where the calculation of the bucket position is similar to the one named above. Fixes: `e0e4b8fa53` ("net/smc: Add SMC statistics support") Suggested-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Nils Hoppmann <niho@linux.ibm.com> Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-11 10:36:35 +01:00
Yajun Deng	5247dbf16c	net/core: Introduce netdev_core_stats_inc() Although there is a kfree_skb_reason() helper function that can be used to find the reason why this skb is dropped, but most callers didn't increase one of rx_dropped, tx_dropped, rx_nohandler and rx_otherhost_dropped. For the users, people are more concerned about why the dropped in ip is increasing. Introduce netdev_core_stats_inc() for trace the caller of dev_core_stats_*_inc(). Also, add __code to netdev_core_stats_alloc(), as it's called with small probability. And add noinline make sure netdev_core_stats_inc was never inlined. Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-11 10:07:40 +01:00
Russell King (Oracle)	63b9f7a19f	net: dsa: remove dsa_port_phylink_validate() As all drivers now provide phylink capabilities (including MAC), the if() condition in dsa_port_phylink_validate() will always be true. We will always use the generic validator, which phylink will call itself if the .validate method isn't populated. Thus, there is now no need to implement the .validate method, so this implementation can be removed. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-11 10:06:05 +01:00
Jakub Kicinski	ad98426a88	bpf-for-netdev -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZSXLzwAKCRDbK58LschI g1wuAQDTT1mrUmRqrpPob/U3HCcTg64hgdRwyF+6IU39/+neGwEAoP0FKZoy3DDf C8FOdVChBjapPsp9zTeYPv0nlZMITAE= =1Shl -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2023-10-11 We've added 14 non-merge commits during the last 5 day(s) which contain a total of 12 files changed, 398 insertions(+), 104 deletions(-). The main changes are: 1) Fix s390 JIT backchain issues in the trampoline code generation which previously clobbered the caller's backchain, from Ilya Leoshkevich. 2) Fix zero-size allocation warning in xsk sockets when the configured ring size was close to SIZE_MAX, from Andrew Kanner. 3) Fixes for bpf_mprog API that were found when implementing support in the ebpf-go library along with selftests, from Daniel Borkmann and Lorenz Bauer. 4) Fix riscv JIT to properly sign-extend the return register in programs. This fixes various test_progs selftests on riscv, from Björn Töpel. 5) Fix verifier log for async callback return values where the allowed range was displayed incorrectly, from David Vernet. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: s390/bpf: Fix unwinding past the trampoline s390/bpf: Fix clobbering the caller's backchain in the trampoline selftests/bpf: Add testcase for async callback return value failure bpf: Fix verifier log for async callback return values xdp: Fix zero-size allocation warning in xskq_create() riscv, bpf: Track both a0 (RISC-V ABI) and a5 (BPF) return values riscv, bpf: Sign-extend return values selftests/bpf: Make seen_tc* variable tests more robust selftests/bpf: Test query on empty mprog and pass revision into attach selftests/bpf: Adapt assert_mprog_count to always expect 0 count selftests/bpf: Test bpf_mprog query API via libbpf and raw syscall bpf: Refuse unused attributes in bpf_prog_{attach,detach} bpf: Handle bpf_mprog_query with NULL entry bpf: Fix BPF_PROG_QUERY last field check ==================== Link: https://lore.kernel.org/r/20231010223610.3984-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-10 19:59:49 -07:00
Kory Maincent	108a36d07c	ethtool: Fix mod state of verbose no_mask bitset A bitset without mask in a _SET request means we want exactly the bits in the bitset to be set. This works correctly for compact format but when verbose format is parsed, ethnl_update_bitset32_verbose() only sets the bits present in the request bitset but does not clear the rest. The commit `6699170376` fixes this issue by clearing the whole target bitmap before we start iterating. The solution proposed brought an issue with the behavior of the mod variable. As the bitset is always cleared the old val will always differ to the new val. Fix it by adding a new temporary variable which save the state of the old bitmap. Fixes: `6699170376` ("ethtool: fix application of verbose no_mask bitset") Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20231009133645.44503-1-kory.maincent@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-10 19:48:15 -07:00
Jakub Kicinski	b52acd02c1	linux-can-fixes-for-6.6-20231009 -----BEGIN PGP SIGNATURE----- iQFHBAABCgAxFiEEDs2BvajyNKlf9TJQvlAcSiqKBOgFAmUjqCwTHG1rbEBwZW5n dXRyb25peC5kZQAKCRC+UBxKKooE6HJUB/sGBLojDlbGAqMFwhCmZ6ZNLg3xQcrB SNgIxA87jsMfSCGX9vkhkaXfNLOgDE2zYe4i2QB4M1iMatVY4MSY2vtJbw8oL6dr X6zT9STwFPBVlH/CIqfCq9eQNhKrIQ65khmYg2DtFJCBuZniBrhfZLwVROUj3FXr FUIAMNjn9Xtj2R5JwtOtn5hvdzO8z3dCQMtzqFVm9pSm5LJVkTGaDe85t/mkLdS2 stwlbGPVz+WElHueBDEjfbxiWnPgpEVSbuThTRxS0M5+a96uVHa4F+SFGgkSdYlI 2MQUGiJ797qZTy2MvkGaqa/1/uqcmNOWNm8NqzLfg4LQMvnFW8/qAaV8 =9CD6 -----END PGP SIGNATURE----- Merge tag 'linux-can-fixes-for-6.6-20231009' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2023-10-09 Lukas Magel's patch for the CAN ISO-TP protocol fixes the TX state detection and wait behavior. John Watts contributes a patch to only show the sun4i_can Kconfig option on ARCH_SUNXI. A patch by Miquel Raynal fixes the soft-reset workaround for Renesas SoCs in the sja1000 driver. Markus Schneider-Pargmann's patch for the tcan4x5x m_can glue driver fixes the id2 register for the tcan4553. 2 patches by Haibo Chen fix the flexcan stop mode for the imx93 SoC. * tag 'linux-can-fixes-for-6.6-20231009' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: can: tcan4x5x: Fix id2_register for tcan4553 can: flexcan: remove the auto stop mode for IMX93 can: sja1000: Always restart the Tx queue after an overrun arm64: dts: imx93: add the Flex-CAN stop mode by GPR can: sun4i_can: Only show Kconfig if ARCH_SUNXI is set can: isotp: isotp_sendmsg(): fix TX state detection and wait behavior ==================== Link: https://lore.kernel.org/r/20231009085256.693378-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-10 19:46:00 -07:00
Eric Dumazet	31c07dffaf	net: nfc: fix races in nfc_llcp_sock_get() and nfc_llcp_sock_get_sn() Sili Luo reported a race in nfc_llcp_sock_get(), leading to UAF. Getting a reference on the socket found in a lookup while holding a lock should happen before releasing the lock. nfc_llcp_sock_get_sn() has a similar problem. Finally nfc_llcp_recv_snl() needs to make sure the socket found by nfc_llcp_sock_from_sn() does not disappear. Fixes: `8f50020ed9` ("NFC: LLCP late binding") Reported-by: Sili Luo <rootlab@huawei.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willy Tarreau <w@1wt.eu> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20231009123110.3735515-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-10 19:44:44 -07:00
Jeremy Kerr	5093bbfc10	mctp: perform route lookups under a RCU read-side lock Our current route lookups (mctp_route_lookup and mctp_route_lookup_null) traverse the net's route list without the RCU read lock held. This means the route lookup is subject to preemption, resulting in an potential grace period expiry, and so an eventual kfree() while we still have the route pointer. Add the proper read-side critical section locks around the route lookups, preventing premption and a possible parallel kfree. The remaining net->mctp.routes accesses are already under a rcu_read_lock, or protected by the RTNL for updates. Based on an analysis from Sili Luo <rootlab@huawei.com>, where introducing a delay in the route lookup could cause a UAF on simultaneous sendmsg() and route deletion. Reported-by: Sili Luo <rootlab@huawei.com> Fixes: `889b7da23a` ("mctp: Add initial routing framework") Cc: stable@vger.kernel.org Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/29c4b0e67dc1bf3571df3982de87df90cae9b631.1696837310.git.jk@codeconstruct.com.au Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-10 19:43:22 -07:00
Arnd Bergmann	1dab47139e	appletalk: remove ipddp driver After the cops driver is removed, ipddp is now the only CONFIG_DEV_APPLETALK but as far as I can tell, this also has no users and can be removed, making appletalk support purely based on ethertalk, using ethernet hardware. Link: https://lore.kernel.org/netdev/e490dd0c-a65d-4acf-89c6-c06cb48ec880@app.fastmail.com/ Link: https://lore.kernel.org/netdev/9cac4fbd-9557-b0b8-54fa-93f0290a6fb8@schmorgal.com/ Cc: Doug Brown <doug@schmorgal.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20231009141139.1766345-1-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-10 17:49:00 -07:00
Florian Westphal	6ac9c51eeb	netfilter: conntrack: prefer tcp_error_log to pr_debug pr_debug doesn't provide any information other than that a packet did not match existing state but also was found to not create a new connection. Replaces this with tcp_error_log, which will also dump packets' content so one can see if this is a stray FIN or RST. Signed-off-by: Florian Westphal <fw@strlen.de>	2023-10-10 16:34:28 +02:00
Florian Westphal	8a23f4ab92	netfilter: conntrack: simplify nf_conntrack_alter_reply nf_conntrack_alter_reply doesn't do helper reassignment anymore. Remove the comments that make this claim. Furthermore, remove dead code from the function and place ot in nf_conntrack.h. Signed-off-by: Florian Westphal <fw@strlen.de>	2023-10-10 16:34:28 +02:00
Phil Sutter	99ab9f84b8	netfilter: nf_tables: Don't allocate nft_rule_dump_ctx Since struct netlink_callback::args is not used by rule dumpers anymore, use it to hold nft_rule_dump_ctx. Add a build-time check to make sure it won't ever exceed the available space. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Florian Westphal <fw@strlen.de>	2023-10-10 16:34:28 +02:00
Phil Sutter	8194d599bc	netfilter: nf_tables: Carry s_idx in nft_rule_dump_ctx In order to move the context into struct netlink_callback's scratch area, the latter must be unused first. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Florian Westphal <fw@strlen.de>	2023-10-10 16:34:28 +02:00
Phil Sutter	405c8fd62d	netfilter: nf_tables: Carry reset flag in nft_rule_dump_ctx This relieves the dump callback from having to check nlmsg_type upon each call and instead performs the check once in .start callback. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Florian Westphal <fw@strlen.de>	2023-10-10 16:34:27 +02:00
Phil Sutter	30fa41a0f6	netfilter: nf_tables: Drop pointless memset when dumping rules None of the dump callbacks uses netlink_callback::args beyond the first element, no need to zero the data. Fixes: `96518518cc` ("netfilter: add nftables") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Florian Westphal <fw@strlen.de>	2023-10-10 16:34:20 +02:00
Phil Sutter	afed2b54c5	netfilter: nf_tables: Always allocate nft_rule_dump_ctx It will move into struct netlink_callback's scratch area later, just put nf_tables_dump_rules_start in shape to reduce churn later. Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Florian Westphal <fw@strlen.de>	2023-10-10 16:01:42 +02:00
Gerd Bayer	a72178cfe8	net/smc: Fix dependency of SMC on ISM When the SMC protocol is built into the kernel proper while ISM is configured to be built as module, linking the kernel fails due to unresolved dependencies out of net/smc/smc_ism.o to ism_get_smcd_ops, ism_register_client, and ism_unregister_client as reported via the linux-next test automation (see link). This however is a bug introduced a while ago. Correct the dependency list in ISM's and SMC's Kconfig to reflect the dependencies that are actually inverted. With this you cannot build a kernel with CONFIG_SMC=y and CONFIG_ISM=m. Either ISM needs to be 'y', too - or a 'n'. That way, SMC can still be configured on non-s390 architectures that do not have (nor need) an ISM driver. Fixes: `89e7d2ba61` ("net/ism: Add new API for client registration") Reported-by: Randy Dunlap <rdunlap@infradead.org> Closes: https://lore.kernel.org/linux-next/d53b5b50-d894-4df8-8969-fd39e63440ae@infradead.org/ Co-developed-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Simon Horman <horms@kernel.org> # build-tested Acked-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Link: https://lore.kernel.org/r/20231006125847.1517840-1-gbayer@linux.ibm.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-10-10 11:51:41 +02:00
David Morley	939463016b	tcp: change data receiver flowlabel after one dup This commit changes the data receiver repath behavior to occur after receiving a single duplicate. This can help recover ACK connectivity quicker if a TLP was sent along a nonworking path. For instance, consider the case where we have an initially nonworking forward path and reverse path and subsequently switch to only working forward paths. Before this patch we would have the following behavior. +---------+--------+--------+----------+----------+----------+ \| Event \| For FL \| Rev FL \| FP Works \| RP Works \| Data Del \| +---------+--------+--------+----------+----------+----------+ \| Initial \| A \| 1 \| N \| N \| 0 \| +---------+--------+--------+----------+----------+----------+ \| TLP \| A \| 1 \| N \| N \| 0 \| +---------+--------+--------+----------+----------+----------+ \| RTO 1 \| B \| 1 \| Y \| N \| 1 \| +---------+--------+--------+----------+----------+----------+ \| RTO 2 \| C \| 1 \| Y \| N \| 2 \| +---------+--------+--------+----------+----------+----------+ \| RTO 3 \| D \| 2 \| Y \| Y \| 3 \| +---------+--------+--------+----------+----------+----------+ This patch gets rid of at least RTO 3, avoiding additional unnecessary repaths of a working forward path to a (potentially) nonworking one. In addition, this commit changes the behavior to avoid repathing upon rx of duplicate data if the local endpoint is in CA_Loss (in which case the RTOs will already be changing the outgoing flowlabel). Signed-off-by: David Morley <morleyd@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Tested-by: David Morley <morleyd@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-10-10 10:02:59 +02:00
David Morley	95b9a87c6a	tcp: record last received ipv6 flowlabel In order to better estimate whether a data packet has been retransmitted or is the result of a TLP, we save the last received ipv6 flowlabel. To make space for this field we resize the "ato" field in inet_connection_sock as the current value of TCP_DELACK_MAX can be fully contained in 8 bits and add a compile_time_assert ensuring this field is the required size. v2: addressed kernel bot feedback about dccp_delack_timer() v3: addressed build error introduced by commit `bbf80d713f` ("tcp: derive delack_max from rto_min") Signed-off-by: David Morley <morleyd@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Tested-by: David Morley <morleyd@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-10-10 10:02:59 +02:00
Ma Ke	513f61e219	net: ipv4: fix return value check in esp_remove_trailer In esp_remove_trailer(), to avoid an unexpected result returned by pskb_trim, we should check the return value of pskb_trim(). Signed-off-by: Ma Ke <make_ruc2021@163.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2023-10-10 09:55:35 +02:00
Ma Ke	dad4e491e3	net: ipv6: fix return value check in esp_remove_trailer In esp_remove_trailer(), to avoid an unexpected result returned by pskb_trim, we should check the return value of pskb_trim(). Signed-off-by: Ma Ke <make_ruc2021@163.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2023-10-10 09:55:35 +02:00
Eric Dumazet	26c29961b1	net: refine debug info in skb_checksum_help() syzbot uses panic_on_warn. This means that the skb_dump() I added in the blamed commit are not even called. Rewrite this so that we get the needed skb dump before syzbot crashes. Fixes: `eeee4b77dc` ("net: add more debug info in skb_checksum_help()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20231006173355.2254983-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-09 19:46:41 -07:00
Martynas Pumputis	dab4e1f06c	bpf: Derive source IP addr via bpf__fib_lookup() Extend the bpf_fib_lookup() helper by making it to return the source IPv4/IPv6 address if the BPF_FIB_LOOKUP_SRC flag is set. For example, the following snippet can be used to derive the desired source IP address: struct bpf_fib_lookup p = { .ipv4_dst = ip4->daddr }; ret = bpf_skb_fib_lookup(skb, p, sizeof(p), BPF_FIB_LOOKUP_SRC \| BPF_FIB_LOOKUP_SKIP_NEIGH); if (ret != BPF_FIB_LKUP_RET_SUCCESS) return TC_ACT_SHOT; / the p.ipv4_src now contains the source address */ The inability to derive the proper source address may cause malfunctions in BPF-based dataplanes for hosts containing netdevs with more than one routable IP address or for multi-homed hosts. For example, Cilium implements packet masquerading in BPF. If an egressing netdev to which the Cilium's BPF prog is attached has multiple IP addresses, then only one [hardcoded] IP address can be used for masquerading. This breaks connectivity if any other IP address should have been selected instead, for example, when a public and private addresses are attached to the same egress interface. The change was tested with Cilium [1]. Nikolay Aleksandrov helped to figure out the IPv6 addr selection. [1]: https://github.com/cilium/cilium/pull/28283 Signed-off-by: Martynas Pumputis <m@lambda.lt> Link: https://lore.kernel.org/r/20231007081415.33502-2-m@lambda.lt Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2023-10-09 16:28:35 -07:00
Andrew Kanner	a12bbb3ccc	xdp: Fix zero-size allocation warning in xskq_create() Syzkaller reported the following issue: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 2807 at mm/vmalloc.c:3247 __vmalloc_node_range (mm/vmalloc.c:3361) Modules linked in: CPU: 0 PID: 2807 Comm: repro Not tainted 6.6.0-rc2+ #12 Hardware name: Generic DT based system unwind_backtrace from show_stack (arch/arm/kernel/traps.c:258) show_stack from dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1)) dump_stack_lvl from __warn (kernel/panic.c:633 kernel/panic.c:680) __warn from warn_slowpath_fmt (./include/linux/context_tracking.h:153 kernel/panic.c:700) warn_slowpath_fmt from __vmalloc_node_range (mm/vmalloc.c:3361 (discriminator 3)) __vmalloc_node_range from vmalloc_user (mm/vmalloc.c:3478) vmalloc_user from xskq_create (net/xdp/xsk_queue.c:40) xskq_create from xsk_setsockopt (net/xdp/xsk.c:953 net/xdp/xsk.c:1286) xsk_setsockopt from __sys_setsockopt (net/socket.c:2308) __sys_setsockopt from ret_fast_syscall (arch/arm/kernel/entry-common.S:68) xskq_get_ring_size() uses struct_size() macro to safely calculate the size of struct xsk_queue and q->nentries of desc members. But the syzkaller repro was able to set q->nentries with the value initially taken from copy_from_sockptr() high enough to return SIZE_MAX by struct_size(). The next PAGE_ALIGN(size) is such case will overflow the size_t value and set it to 0. This will trigger WARN_ON_ONCE in vmalloc_user() -> __vmalloc_node_range(). The issue is reproducible on 32-bit arm kernel. Fixes: `9f78bf330a` ("xsk: support use vaddr as ring") Reported-by: syzbot+fae676d3cf469331fc89@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/000000000000c84b4705fb31741e@google.com/T/ Reported-by: syzbot+b132693e925cbbd89e26@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/000000000000e20df20606ebab4f@google.com/T/ Signed-off-by: Andrew Kanner <andrew.kanner@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: syzbot+fae676d3cf469331fc89@syzkaller.appspotmail.com Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://syzkaller.appspot.com/bug?extid=fae676d3cf469331fc89 Link: https://lore.kernel.org/bpf/20231007075148.1759-1-andrew.kanner@gmail.com	2023-10-09 16:13:29 +02:00
Jordan Rife	7563cf17dc	libceph: use kernel_connect() Direct calls to ops->connect() can overwrite the address parameter when used in conjunction with BPF SOCK_ADDR hooks. Recent changes to kernel_connect() ensure that callers are insulated from such side effects. This patch wraps the direct call to ops->connect() with kernel_connect() to prevent unexpected changes to the address passed to ceph_tcp_connect(). This change was originally part of a larger patch targeting the net tree addressing all instances of unprotected calls to ops->connect() throughout the kernel, but this change was split up into several patches targeting various trees. Cc: stable@vger.kernel.org Link: https://lore.kernel.org/netdev/20230821100007.559638-1-jrife@google.com/ Link: https://lore.kernel.org/netdev/9944248dba1bce861375fcce9de663934d933ba9.camel@redhat.com/ Fixes: `d74bad4e74` ("bpf: Hooks for sys_connect") Signed-off-by: Jordan Rife <jrife@google.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-10-09 13:35:24 +02:00
Eric Dumazet	48533eca60	net: sock_dequeue_err_skb() optimization Exit early if the list is empty. Some applications using TCP zerocopy are calling recvmsg( ... MSG_ERRQUEUE) and hit this case quite often, probably because busy polling only deals with sk_receive_queue. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20231005114504.642589-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-06 16:29:08 -07:00
Jakub Kicinski	a1fb841f9d	wireless-next patches for v6.7 The first pull request for v6.7, with both stack and driver changes. We have a big change how locking is handled in cfg80211 and mac80211 which removes several locks and hopefully simplifies the locking overall. In drivers rtw89 got MCC support and smaller features to other active drivers but nothing out of ordinary. This pull request got delayed because we were waiting for the wireless tree pull requested processed first and after that we merged wireless into wireless-next to avoid several conflicts in the stack. When pulling this there's one conflict in drivers/staging/rtl8723bs/os_dep/ioctl_cfg80211.c: <<<<<<< HEAD static int cfg80211_rtw_change_beacon(struct wiphy wiphy, struct net_device ndev, struct cfg80211_beacon_data info) ======= static int cfg80211_rtw_change_beacon(struct wiphy wiphy, struct net_device ndev, struct cfg80211_ap_update info) >>>>>>> origin/merge-wireless-2023-10-05 Take the latter hunk which uses struct cfg80211_ap_update. Major changes: cfg80211 * remove wdev mutex, use the wiphy mutex instead * annotate iftype_data pointer with sparse * first kunit tests, for element defrag * remove unused scan_width support mac80211 * major locking rework, remove several locks like sta_mtx, key_mtx etc. and use the wiphy mutex instead * remove unused shifted rate support * support antenna control in frame injection (requires driver support) * convert RX_DROP_UNUSABLE to more detailed reason codes rtw89 * TDMA-based multi-channel concurrency (MCC) support iwlwifi * support set_antenna() operation * support frame injection antenna control ath12k * WCN7850: enable 320 MHz channels in 6 GHz band * WCN7850: hardware rfkill support * WCN7850: enable IEEE80211_HW_SINGLE_SCAN_ON_ALL_BANDS to make scan faster ath11k * add chip id board name while searching board-2.bin -----BEGIN PGP SIGNATURE----- iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmUgHYERHGt2YWxvQGtl cm5lbC5vcmcACgkQbhckVSbrbZtzIwf9Hn8lOUmbErtRmOjciesHoCvvjRelNuqG ymUl/zdcXpl6ZEZZvvXTaUMFpbR1h8/8rfQABA91ADp/1ax4PKuIGtgBfkEkUelG ONIbjJbYmPae8RZvAtBoy8k/QEt2NR97l1eRy+eqLovh273pFJu4eEW0AiLYJlZ8 rTAoej4RShhIvvh7O1zLt1It49tAFn+BwhDj6d3xCgdpmGJboJxwecEpqmJ0AVW9 pST5qDbANCbvfzElWswdCG48xludCtRG8WncJnkWjxeGqr+81yMrgIy+PQTwTAJ8 95Qe0sE9N5o9ZIE4hrdH+PuUYOCjD2O8IJFl6HkLeGh3Pr3+qGnxhQ== =V52q -----END PGP SIGNATURE----- Merge tag 'wireless-next-2023-10-06' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v6.7 The first pull request for v6.7, with both stack and driver changes. We have a big change how locking is handled in cfg80211 and mac80211 which removes several locks and hopefully simplifies the locking overall. In drivers rtw89 got MCC support and smaller features to other active drivers but nothing out of ordinary. Major changes: cfg80211 - remove wdev mutex, use the wiphy mutex instead - annotate iftype_data pointer with sparse - first kunit tests, for element defrag - remove unused scan_width support mac80211 - major locking rework, remove several locks like sta_mtx, key_mtx etc. and use the wiphy mutex instead - remove unused shifted rate support - support antenna control in frame injection (requires driver support) - convert RX_DROP_UNUSABLE to more detailed reason codes rtw89 - TDMA-based multi-channel concurrency (MCC) support iwlwifi - support set_antenna() operation - support frame injection antenna control ath12k - WCN7850: enable 320 MHz channels in 6 GHz band - WCN7850: hardware rfkill support - WCN7850: enable IEEE80211_HW_SINGLE_SCAN_ON_ALL_BANDS to make scan faster ath11k - add chip id board name while searching board-2.bin * tag 'wireless-next-2023-10-06' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (272 commits) wifi: rtlwifi: remove unreachable code in rtl92d_dm_check_edca_turbo() wifi: rtw89: debug: txpwr table supports Wi-Fi 7 chips wifi: rtw89: debug: show txpwr table according to chip gen wifi: rtw89: phy: set TX power RU limit according to chip gen wifi: rtw89: phy: set TX power limit according to chip gen wifi: rtw89: phy: set TX power offset according to chip gen wifi: rtw89: phy: set TX power by rate according to chip gen wifi: rtw89: mac: get TX power control register according to chip gen wifi: rtlwifi: use unsigned long for rtl_bssid_entry timestamp wifi: rtlwifi: fix EDCA limit set by BT coexistence wifi: rt2x00: fix MT7620 low RSSI issue wifi: rtw89: refine bandwidth 160MHz uplink OFDMA performance wifi: rtw89: refine uplink trigger based control mechanism wifi: rtw89: 8851b: update TX power tables to R34 wifi: rtw89: 8852b: update TX power tables to R35 wifi: rtw89: 8852c: update TX power tables to R67 wifi: rtw89: regd: configure Thailand in regulation type wifi: mac80211: add back SPDX identifier wifi: mac80211: fix ieee80211_drop_unencrypted_mgmt return type/value wifi: rtlwifi: cleanup few rtlxxxx_set_hw_reg() routines ... ==================== Link: https://lore.kernel.org/r/87jzrz6bvw.fsf@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-06 16:07:29 -07:00
Moshe Shemesh	aba0e909dc	devlink: Hold devlink lock on health reporter dump get Devlink health dump get callback should take devlink lock as any other devlink callback. Otherwise, since devlink_mutex was removed, this callback is not protected from a race of the reporter being destroyed while handling the callback. Add devlink lock to the callback and to any call for devlink_health_do_dump(). This should be safe as non of the drivers dump callback implementation takes devlink lock. As devlink lock is added to any callback of dump, the reporter dump_lock is now redundant and can be removed. Fixes: `d3efc2a6a6` ("net: devlink: remove devlink_mutex") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://lore.kernel.org/r/1696510216-189379-1-git-send-email-moshe@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-06 15:56:46 -07:00
Jakub Kicinski	e794b089cd	linux-can-next-for-6.7-20231005 -----BEGIN PGP SIGNATURE----- iQFHBAABCgAxFiEEDs2BvajyNKlf9TJQvlAcSiqKBOgFAmUfE5ETHG1rbEBwZW5n dXRyb25peC5kZQAKCRC+UBxKKooE6H+OCACG8YWLFqa4J0CeRJ6DgUL1bpIGGTOZ V5nDydDeFzl8AdMBt+Zr97Z9n9gBYG/E1yfLMH4h4bR8RgvT+O0o3zHa4vlx9Qzf ArMqZE2Us05TkLOpLurMxs2ya2iR6PBaulDKq/lV2FLZJxPRcgHKc7lUPfjwoqeV tIPZsCOLJXWtcux17pYrqZ/aY/t1f506PCklsTSR/XDFyNODKDT3FI3MrPnR0ZcP uMi76k/HQC7e/1T70Wrp9IAKijORURo+OZJsOdgU9jcTGSdujcNE8j52ReMtTB+g w1gDoXkRm2AR2o3xR+QctdPGy/q32B4FQGS6ChSZPnSMSNf1vCcJbhr9 =1OP/ -----END PGP SIGNATURE----- Merge tag 'linux-can-next-for-6.7-20231005' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2023-10-05 The first patch is by Miquel Raynal and fixes a comment in the sja1000 driver. Vincent Mailhol contributes 2 patches that fix W=1 compiler warnings in the etas_es58x driver. Jiapeng Chong's patch removes an unneeded NULL pointer check before dev_put() in the CAN raw protocol. A patch by Justin Stittreplaces a strncpy() by strscpy() in the peak_pci sja1000 driver. The next 5 patches are by me and fix the can_restart() handler and replace BUG_ON()s in the CAN dev helpers with proper error handling. The last 27 patches are also by me and target the at91_can driver. First a new helper function is introduced, the at91_can driver is cleaned up and updated to use the rx-offload helper. * tag 'linux-can-next-for-6.7-20231005' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: (37 commits) can: at91_can: switch to rx-offload implementation can: at91_can: at91_alloc_can_err_skb() introduce new function can: at91_can: at91_irq_err_line(): send error counters with state change can: at91_can: at91_irq_err_line(): make use of can_change_state() and can_bus_off() can: at91_can: at91_irq_err_line(): take reg_sr into account for bus off can: at91_can: at91_irq_err_line(): make use of can_state_get_by_berr_counter() can: at91_can: at91_irq_err(): rename to at91_irq_err_line() can: at91_can: at91_irq_err_frame(): move next to at91_irq_err() can: at91_can: at91_irq_err_frame(): call directly from IRQ handler can: at91_can: at91_poll_err(): increase stats even if no quota left or OOM can: at91_can: at91_poll_err(): fold in at91_poll_err_frame() can: at91_can: add CAN transceiver support can: at91_can: at91_open(): forward request_irq()'s return value in case or an error can: at91_can: at91_chip_start(): don't disable IRQs twice can: at91_can: at91_set_bittiming(): demote register output to debug level can: at91_can: rename struct at91_priv::{tx_next,tx_echo} to {tx_head,tx_tail} can: at91_can: at91_setup_mailboxes(): update comments can: at91_can: add more register definitions can: at91_can: MCR Register: convert to FIELD_PREP() can: at91_can: MSR Register: convert to FIELD_PREP() ... ==================== Link: https://lore.kernel.org/r/20231005195812.549776-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-06 15:42:12 -07:00
Johannes Berg	7d6904bf26	Merge wireless into wireless-next Resolve several conflicts, mostly between changes/fixes in wireless and the locking rework in wireless-next. One of the conflicts actually shows a bug in wireless that we'll want to fix separately. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Kalle Valo <kvalo@kernel.org>	2023-10-06 17:08:47 +03:00
Lukas Magel	d9c2ba65e6	can: isotp: isotp_sendmsg(): fix TX state detection and wait behavior With patch [1], isotp_poll was updated to also queue the poller in the so->wait queue, which is used for send state changes. Since the queue now also contains polling tasks that are not interested in sending, the queue fill state can no longer be used as an indication of send readiness. As a consequence, nonblocking writes can lead to a race and lock-up of the socket if there is a second task polling the socket in parallel. With this patch, isotp_sendmsg does not consult wq_has_sleepers but instead tries to atomically set so->tx.state and waits on so->wait if it is unable to do so. This behavior is in alignment with isotp_poll, which also checks so->tx.state to determine send readiness. V2: - Revert direct exit to goto err_event_drop [1] https://lore.kernel.org/all/20230331125511.372783-1-michal.sojka@cvut.cz Reported-by: Maxime Jayat <maxime.jayat@mobile-devices.fr> Closes: https://lore.kernel.org/linux-can/11328958-453f-447f-9af8-3b5824dfb041@munic.io/ Signed-off-by: Lukas Magel <lukas.magel@posteo.net> Reviewed-by: Oliver Hartkopp <socketcan@hartkopp.net> Fixes: `79e19fa79c` ("can: isotp: isotp_ops: fix poll() to not report false EPOLLOUT events") Link: https://github.com/pylessard/python-udsoncan/issues/178#issuecomment-1743786590 Link: https://lore.kernel.org/all/20230827092205.7908-1-lukas.magel@posteo.net Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2023-10-06 12:54:33 +02:00
Gustavo A. R. Silva	c4d49196ce	net: sched: cls_u32: Fix allocation size in u32_init() commit `d61491a51f` ("net/sched: cls_u32: Replace one-element array with flexible-array member") incorrecly replaced an instance of `sizeof(tp_c)` with `struct_size(tp_c, hlist->ht, 1)`. This results in a an over-allocation of 8 bytes. This change is wrong because `hlist` in `struct tc_u_common` is a pointer: net/sched/cls_u32.c: struct tc_u_common { struct tc_u_hnode __rcu hlist; void ptr; int refcnt; struct idr handle_idr; struct hlist_node hnode; long knodes; }; So, the use of `struct_size()` makes no sense: we don't need to allocate any extra space for a flexible-array member. `sizeof(tp_c)` is just fine. So, `struct_size(tp_c, hlist->ht, 1)` translates to: sizeof(tp_c) + sizeof(tp_c->hlist->ht) == sizeof(struct tc_u_common) + sizeof(struct tc_u_knode ) == 144 + 8 == 0x98 (byes) ^^^ \| unnecessary extra allocation size $ pahole -C tc_u_common net/sched/cls_u32.o struct tc_u_common { struct tc_u_hnode * hlist; /* 0 8 / void ptr; /* 8 8 / int refcnt; / 16 4 / / XXX 4 bytes hole, try to pack / struct idr handle_idr; / 24 96 / / --- cacheline 1 boundary (64 bytes) was 56 bytes ago --- / struct hlist_node hnode; / 120 16 / / --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- / long int knodes; / 136 8 / / size: 144, cachelines: 3, members: 6 / / sum members: 140, holes: 1, sum holes: 4 / / last cacheline: 16 bytes / }; And with `sizeof(tp_c)`, we have: sizeof(tp_c) == sizeof(struct tc_u_common) == 144 == 0x90 (bytes) which is the correct and original allocation size. Fix this issue by replacing `struct_size(tp_c, hlist->ht, 1)` with `sizeof(tp_c)`, and avoid allocating 8 too many bytes. The following difference in binary output is expected and reflects the desired change: \| net/sched/cls_u32.o \| @@ -6148,7 +6148,7 @@ \| include/linux/slab.h:599 \| 2cf5: mov 0x0(%rip),%rdi # 2cfc <u32_init+0xfc> \| 2cf8: R_X86_64_PC32 kmalloc_caches+0xc \|- 2cfc: mov $0x98,%edx \|+ 2cfc: mov $0x90,%edx Reported-by: Alejandro Colomar <alx@kernel.org> Closes: https://lore.kernel.org/lkml/09b4a2ce-da74-3a19-6961-67883f634d98@kernel.org/ Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-06 11:43:05 +01:00
Kees Cook	b3783e5efd	net/packet: Annotate struct packet_fanout with __counted_by Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct packet_fanout. Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Anqi Shen <amy.saq@antgroup.com> Cc: netdev@vger.kernel.org Link: https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci [1] Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-06 11:36:25 +01:00
Kees Cook	eaede99c3a	netlink: Annotate struct netlink_policy_dump_state with __counted_by Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct netlink_policy_dump_state. Additionally update the size of the usage array length before accessing it. This requires remembering the old size for the memset() and later assignments. Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Johannes Berg <johannes.berg@intel.com> Cc: netdev@vger.kernel.org Link: https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci [1] Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-06 10:48:46 +01:00
Kees Cook	0fef0907d6	netem: Annotate struct disttable with __counted_by Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct disttable. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jiri Pirko <jiri@resnulli.us> Link: https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci [1] Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Stephen Hemminger <stephen@networkplumber.org> Link: https://lore.kernel.org/r/20231003231823.work.684-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-05 18:31:49 -07:00
Jakub Kicinski	2606cf059c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR. No conflicts (or adjacent changes of note). Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-05 13:16:47 -07:00
Linus Torvalds	f291209eca	Including fixes from Bluetooth, netfilter, BPF and WiFi. I didn't collect precise data but feels like we've got a lot of 6.5 fixes here. WiFi fixes are most user-awaited. Current release - regressions: - Bluetooth: fix hci_link_tx_to RCU lock usage Current release - new code bugs: - bpf: mprog: fix maximum program check on mprog attachment - eth: ti: icssg-prueth: fix signedness bug in prueth_init_tx_chns() Previous releases - regressions: - ipv6: tcp: add a missing nf_reset_ct() in 3WHS handling - vringh: don't use vringh_kiov_advance() in vringh_iov_xfer(), it doesn't handle zero length like we expected - wifi: - cfg80211: fix cqm_config access race, fix crashes with brcmfmac - iwlwifi: mvm: handle PS changes in vif_cfg_changed - mac80211: fix mesh id corruption on 32 bit systems - mt76: mt76x02: fix MT76x0 external LNA gain handling - Bluetooth: fix handling of HCI_QUIRK_STRICT_DUPLICATE_FILTER - l2tp: fix handling of transhdrlen in __ip{,6}_append_data() - dsa: mv88e6xxx: avoid EEPROM timeout when EEPROM is absent - eth: stmmac: fix the incorrect parameter after refactoring Previous releases - always broken: - net: replace calls to sock->ops->connect() with kernel_connect(), prevent address rewrite in kernel_bind(); otherwise BPF hooks may modify arguments, unexpectedly to the caller - tcp: fix delayed ACKs when reads and writes align with MSS - bpf: - verifier: unconditionally reset backtrack_state masks on global func exit - s390: let arch_prepare_bpf_trampoline return program size, fix struct_ops offsets - sockmap: fix accounting of available bytes in presence of PEEKs - sockmap: reject sk_msg egress redirects to non-TCP sockets - ipv4/fib: send netlink notify when delete source address routes - ethtool: plca: fix width of reads when parsing netlink commands - netfilter: nft_payload: rebuild vlan header on h_proto access - Bluetooth: hci_codec: fix leaking memory of local_codecs - eth: intel: ice: always add legacy 32byte RXDID in supported_rxdids - eth: stmmac: - dwmac-stm32: fix resume on STM32 MCU - remove buggy and unneeded stmmac_poll_controller, depend on NAPI - ibmveth: always recompute TCP pseudo-header checksum, fix use of the driver with Open vSwitch - wifi: - rtw88: rtw8723d: fix MAC address offset in EEPROM - mt76: fix lock dependency problem for wed_lock - mwifiex: sanity check data reported by the device - iwlwifi: ensure ack flag is properly cleared - iwlwifi: mvm: fix a memory corruption due to bad pointer arithm - iwlwifi: mvm: fix incorrect usage of scan API Misc: - wifi: mac80211: work around Cisco AP 9115 VHT MPDU length Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmUe/hkACgkQMUZtbf5S Iru94hAAlkVsHgGy7T8Te23m5Q5v33ZRj+7hbjFBN+be2TJu8cWo9qL6jGvrxBP6 D0X32ZvoWtX95Ua023Ibs1WtEc6ebROctTuDZpoIW35MYamFz9fYecoY4i+t+m1+ 6Y/UVRu4eHWXUtZclRQUEUdMe5HAJms9uNlIkTxLivvFhERgmGKtAsca8nuU9wMo lJJUAFxf4wJKx/338AUaa0yfsg6cQcgRpbm1csAR9VSa3mU1PbrIYzf7fMbWjRET 6DiijheeClb7biF4ZKqqHgZcProwVFxoFCq1GH+PCzaw8K1eIkGwXlpVEW89lgsC Ukc1L9JgLJIBSObvNWiO2mu5w+V89+XR2rL9KM3tW6x+k5tEncNL3WOphqH69NPS gfizGlvXjP+2aCJgHovvlQEaBn/7xNYWTAbqh1jVDleUC95Ur8ap4X42YB/3QvPN X9l8hp4Htu8SclqjXKtMz9qt6Ug5Uyi88o+1U53BNE6C6ICKW9i4uApT1DsLBAK8 QM5WPTj/ChIBbQu7HWNW+Ux3NX5R6fFzZ5BfKrjbuNEHQKRauj2300gVtU6xGS7T IFWiu8i00T34aXF2Vnfykc0zNRylhw/DHqtJFUxmJQOBQgyKlkjYacf2cYru5lnR BWA8Zsg5wpapT5CWSGlSRid4sRMtcDiMsI7fnIqB5CPhJGnR6eI= =JAtc -----END PGP SIGNATURE----- Merge tag 'net-6.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from Bluetooth, netfilter, BPF and WiFi. I didn't collect precise data but feels like we've got a lot of 6.5 fixes here. WiFi fixes are most user-awaited. Current release - regressions: - Bluetooth: fix hci_link_tx_to RCU lock usage Current release - new code bugs: - bpf: mprog: fix maximum program check on mprog attachment - eth: ti: icssg-prueth: fix signedness bug in prueth_init_tx_chns() Previous releases - regressions: - ipv6: tcp: add a missing nf_reset_ct() in 3WHS handling - vringh: don't use vringh_kiov_advance() in vringh_iov_xfer(), it doesn't handle zero length like we expected - wifi: - cfg80211: fix cqm_config access race, fix crashes with brcmfmac - iwlwifi: mvm: handle PS changes in vif_cfg_changed - mac80211: fix mesh id corruption on 32 bit systems - mt76: mt76x02: fix MT76x0 external LNA gain handling - Bluetooth: fix handling of HCI_QUIRK_STRICT_DUPLICATE_FILTER - l2tp: fix handling of transhdrlen in __ip{,6}_append_data() - dsa: mv88e6xxx: avoid EEPROM timeout when EEPROM is absent - eth: stmmac: fix the incorrect parameter after refactoring Previous releases - always broken: - net: replace calls to sock->ops->connect() with kernel_connect(), prevent address rewrite in kernel_bind(); otherwise BPF hooks may modify arguments, unexpectedly to the caller - tcp: fix delayed ACKs when reads and writes align with MSS - bpf: - verifier: unconditionally reset backtrack_state masks on global func exit - s390: let arch_prepare_bpf_trampoline return program size, fix struct_ops offsets - sockmap: fix accounting of available bytes in presence of PEEKs - sockmap: reject sk_msg egress redirects to non-TCP sockets - ipv4/fib: send netlink notify when delete source address routes - ethtool: plca: fix width of reads when parsing netlink commands - netfilter: nft_payload: rebuild vlan header on h_proto access - Bluetooth: hci_codec: fix leaking memory of local_codecs - eth: intel: ice: always add legacy 32byte RXDID in supported_rxdids - eth: stmmac: - dwmac-stm32: fix resume on STM32 MCU - remove buggy and unneeded stmmac_poll_controller, depend on NAPI - ibmveth: always recompute TCP pseudo-header checksum, fix use of the driver with Open vSwitch - wifi: - rtw88: rtw8723d: fix MAC address offset in EEPROM - mt76: fix lock dependency problem for wed_lock - mwifiex: sanity check data reported by the device - iwlwifi: ensure ack flag is properly cleared - iwlwifi: mvm: fix a memory corruption due to bad pointer arithm - iwlwifi: mvm: fix incorrect usage of scan API Misc: - wifi: mac80211: work around Cisco AP 9115 VHT MPDU length" * tag 'net-6.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (99 commits) MAINTAINERS: update Matthieu's email address mptcp: userspace pm allow creating id 0 subflow mptcp: fix delegated action races net: stmmac: remove unneeded stmmac_poll_controller net: lan743x: also select PHYLIB net: ethernet: mediatek: disable irq before schedule napi net: mana: Fix oversized sge0 for GSO packets net: mana: Fix the tso_bytes calculation net: mana: Fix TX CQE error handling netlink: annotate data-races around sk->sk_err sctp: update hb timer immediately after users change hb_interval sctp: update transport state when processing a dupcook packet tcp: fix delayed ACKs for MSS boundary condition tcp: fix quick-ack counting to count actual ACKs of new data page_pool: fix documentation typos tipc: fix a potential deadlock on &tx->lock net: stmmac: dwmac-stm32: fix resume on STM32 MCU ipv4: Set offload_failed flag in fibmatch results netfilter: nf_tables: nft_set_rbtree: fix spurious insertion failure netfilter: nf_tables: Deduplicate nft_register_obj audit logs ...	2023-10-05 11:29:21 -07:00
Geliang Tang	e5ed101a60	mptcp: userspace pm allow creating id 0 subflow This patch drops id 0 limitation in mptcp_nl_cmd_sf_create() to allow creating additional subflows with the local addr ID 0. There is no reason not to allow additional subflows from this local address: we should be able to create new subflows from the initial endpoint. This limitation was breaking fullmesh support from userspace. Fixes: `702c2f646d` ("mptcp: netlink: allow userspace-driven subflow establishment") Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/391 Cc: stable@vger.kernel.org Suggested-by: Matthieu Baerts <matthieu.baerts@tessares.net> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20231004-send-net-20231004-v1-2-28de4ac663ae@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-05 09:34:32 -07:00
Paolo Abeni	a5efdbcece	mptcp: fix delegated action races The delegated action infrastructure is prone to the following race: different CPUs can try to schedule different delegated actions on the same subflow at the same time. Each of them will check different bits via mptcp_subflow_delegate(), and will try to schedule the action on the related per-cpu napi instance. Depending on the timing, both can observe an empty delegated list node, causing the same entry to be added simultaneously on two different lists. The root cause is that the delegated actions infra does not provide a single synchronization point. Address the issue reserving an additional bit to mark the subflow as scheduled for delegation. Acquiring such bit guarantee the caller to own the delegated list node, and being able to safely schedule the subflow. Clear such bit only when the subflow scheduling is completed, ensuring proper barrier in place. Additionally swap the meaning of the delegated_action bitmask, to allow the usage of the existing helper to set multiple bit at once. Fixes: `bcd9773431` ("mptcp: use delegate action to schedule 3rd ack retrans") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20231004-send-net-20231004-v1-1-28de4ac663ae@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-05 09:34:31 -07:00
Eric Dumazet	49e7265fd0	net_sched: sch_fq: add TCA_FQ_WEIGHTS attribute This attribute can be used to tune the per band weight and report them in "tc qdisc show" output: qdisc fq 802f: parent 1:9 limit 100000p flow_limit 500p buckets 1024 orphan_mask 1023 quantum 8364b initial_quantum 41820b low_rate_threshold 550Kbit refill_delay 40ms timer_slack 10us horizon 10s horizon_drop bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 weights 589824 196608 65536 Sent 236460814 bytes 792991 pkt (dropped 0, overlimits 0 requeues 0) rate 25816bit 10pps backlog 0b 0p requeues 0 flows 4 (inactive 4 throttled 0) gc 0 throttled 19 latency 17.6us fastpath 773882 Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Dave Taht <dave.taht@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-10-05 13:27:46 +02:00
Eric Dumazet	29f834aa32	net_sched: sch_fq: add 3 bands and WRR scheduling Before Google adopted FQ for its production servers, we had to ensure AF4 packets would get a higher share than BE1 ones. As discussed this week in Netconf 2023 in Paris, it is time to upstream this for public use. After this patch FQ can replace pfifo_fast, with the following differences : - FQ uses WRR instead of strict prio, to avoid starvation of low priority packets. - We make sure each band/prio tracks its own usage against sch->limit. This was done to make sure flood of low priority packets would not prevent AF4 packets to be queued. Contributed by Willem. - priomap can be changed, if needed (default value are the ones coming from pfifo_fast). In this patch, we set default band weights so that : - high prio (band=0) packets get 90% of the bandwidth if they compete with low prio (band=2) packets. - high prio packets get 75% of the bandwidth if they compete with medium prio (band=1) packets. Following patch in this series adds the possibility to tune the per-band weights. As we added many fields in 'struct fq_sched_data', we had to make sure to have the first cache line read-mostly, and avoid wasting precious cache lines. More optimizations are possible but will be sent separately. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Dave Taht <dave.taht@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-10-05 13:27:39 +02:00
Eric Dumazet	5579ee462d	net_sched: export pfifo_fast prio2band[] pfifo_fast prio2band[] is renamed to sch_default_prio2band[] and exported because we want to share it in FQ. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Dave Taht <dave.taht@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-10-05 13:27:31 +02:00
Eric Dumazet	2ae45136a9	net_sched: sch_fq: remove q->ktime_cache Now that both enqueue() and dequeue() need to use ktime_get_ns(), there is no point wasting 8 bytes in struct fq_sched_data. This makes room for future fields. ;) Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Dave Taht <dave.taht@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-10-05 13:27:20 +02:00
Eric Dumazet	d0f95894fd	netlink: annotate data-races around sk->sk_err syzbot caught another data-race in netlink when setting sk->sk_err. Annotate all of them for good measure. BUG: KCSAN: data-race in netlink_recvmsg / netlink_recvmsg write to 0xffff8881613bb220 of 4 bytes by task 28147 on cpu 0: netlink_recvmsg+0x448/0x780 net/netlink/af_netlink.c:1994 sock_recvmsg_nosec net/socket.c:1027 [inline] sock_recvmsg net/socket.c:1049 [inline] __sys_recvfrom+0x1f4/0x2e0 net/socket.c:2229 __do_sys_recvfrom net/socket.c:2247 [inline] __se_sys_recvfrom net/socket.c:2243 [inline] __x64_sys_recvfrom+0x78/0x90 net/socket.c:2243 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd write to 0xffff8881613bb220 of 4 bytes by task 28146 on cpu 1: netlink_recvmsg+0x448/0x780 net/netlink/af_netlink.c:1994 sock_recvmsg_nosec net/socket.c:1027 [inline] sock_recvmsg net/socket.c:1049 [inline] __sys_recvfrom+0x1f4/0x2e0 net/socket.c:2229 __do_sys_recvfrom net/socket.c:2247 [inline] __se_sys_recvfrom net/socket.c:2243 [inline] __x64_sys_recvfrom+0x78/0x90 net/socket.c:2243 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd value changed: 0x00000000 -> 0x00000016 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 28146 Comm: syz-executor.0 Not tainted 6.6.0-rc3-syzkaller-00055-g9ed22ae6be81 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/06/2023 Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20231003183455.3410550-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-04 17:32:54 -07:00
Eric Dumazet	d86e5fbd4c	net: skb_queue_purge_reason() optimizations 1) Exit early if the list is empty. 2) splice the list into a local list, so that we block hard irqs only once. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20231003181920.3280453-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-04 17:32:24 -07:00
Xin Long	1f4e803cd9	sctp: update hb timer immediately after users change hb_interval Currently, when hb_interval is changed by users, it won't take effect until the next expiry of hb timer. As the default value is 30s, users have to wait up to 30s to wait its hb_interval update to work. This becomes pretty bad in containers where a much smaller value is usually set on hb_interval. This patch improves it by resetting the hb timer immediately once the value of hb_interval is updated by users. Note that we don't address the already existing 'problem' when sending a heartbeat 'on demand' if one hb has just been sent(from the timer) mentioned in: https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg590224.html Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Link: https://lore.kernel.org/r/75465785f8ee5df2fb3acdca9b8fafdc18984098.1696172660.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-04 17:29:58 -07:00
Xin Long	2222a78075	sctp: update transport state when processing a dupcook packet During the 4-way handshake, the transport's state is set to ACTIVE in sctp_process_init() when processing INIT_ACK chunk on client or COOKIE_ECHO chunk on server. In the collision scenario below: 192.168.1.2 > 192.168.1.1: sctp (1) [INIT] [init tag: 3922216408] 192.168.1.1 > 192.168.1.2: sctp (1) [INIT] [init tag: 144230885] 192.168.1.2 > 192.168.1.1: sctp (1) [INIT ACK] [init tag: 3922216408] 192.168.1.1 > 192.168.1.2: sctp (1) [COOKIE ECHO] 192.168.1.2 > 192.168.1.1: sctp (1) [COOKIE ACK] 192.168.1.1 > 192.168.1.2: sctp (1) [INIT ACK] [init tag: 3914796021] when processing COOKIE_ECHO on 192.168.1.2, as it's in COOKIE_WAIT state, sctp_sf_do_dupcook_b() is called by sctp_sf_do_5_2_4_dupcook() where it creates a new association and sets its transport to ACTIVE then updates to the old association in sctp_assoc_update(). However, in sctp_assoc_update(), it will skip the transport update if it finds a transport with the same ipaddr already existing in the old asoc, and this causes the old asoc's transport state not to move to ACTIVE after the handshake. This means if DATA retransmission happens at this moment, it won't be able to enter PF state because of the check 'transport->state == SCTP_ACTIVE' in sctp_do_8_2_transport_strike(). This patch fixes it by updating the transport in sctp_assoc_update() with sctp_assoc_add_peer() where it updates the transport state if there is already a transport with the same ipaddr exists in the old asoc. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Link: https://lore.kernel.org/r/fd17356abe49713ded425250cc1ae51e9f5846c6.1696172325.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-04 17:29:44 -07:00
Neal Cardwell	4720852ed9	tcp: fix delayed ACKs for MSS boundary condition This commit fixes poor delayed ACK behavior that can cause poor TCP latency in a particular boundary condition: when an application makes a TCP socket write that is an exact multiple of the MSS size. The problem is that there is painful boundary discontinuity in the current delayed ACK behavior. With the current delayed ACK behavior, we have: (1) If an app reads data when > 1MSS is unacknowledged, then tcp_cleanup_rbuf() ACKs immediately because of: tp->rcv_nxt - tp->rcv_wup > icsk->icsk_ack.rcv_mss \|\| (2) If an app reads all received data, and the packets were < 1MSS, and either (a) the app is not ping-pong or (b) we received two packets < 1MSS, then tcp_cleanup_rbuf() ACKs immediately beecause of: ((icsk->icsk_ack.pending & ICSK_ACK_PUSHED2) \|\| ((icsk->icsk_ack.pending & ICSK_ACK_PUSHED) && !inet_csk_in_pingpong_mode(sk))) && (3) However: if an app reads exactly 1MSS of data, tcp_cleanup_rbuf() does not send an immediate ACK. This is true even if the app is not ping-pong and the 1MSS of data had the PSH bit set, suggesting the sending application completed an application write. Thus if the app is not ping-pong, we have this painful case where >1MSS gets an immediate ACK, and <1MSS gets an immediate ACK, but a write whose last skb is an exact multiple of 1MSS can get a 40ms delayed ACK. This means that any app that transfers data in one direction and takes care to align write size or packet size with MSS can suffer this problem. With receive zero copy making 4KB MSS values more common, it is becoming more common to have application writes naturally align with MSS, and more applications are likely to encounter this delayed ACK problem. The fix in this commit is to refine the delayed ACK heuristics with a simple check: immediately ACK a received 1MSS skb with PSH bit set if the app reads all data. Why? If an skb has a len of exactly 1MSS and has the PSH bit set then it is likely the end of an application write. So more data may not be arriving soon, and yet the data sender may be waiting for an ACK if cwnd-bound or using TX zero copy. Thus we set ICSK_ACK_PUSHED in this case so that tcp_cleanup_rbuf() will send an ACK immediately if the app reads all of the data and is not ping-pong. Note that this logic is also executed for the case where len > MSS, but in that case this logic does not matter (and does not hurt) because tcp_cleanup_rbuf() will always ACK immediately if the app reads data and there is more than an MSS of unACKed data. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Cc: Xin Guo <guoxin0309@gmail.com> Link: https://lore.kernel.org/r/20231001151239.1866845-2-ncardwell.sw@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-04 15:34:18 -07:00
Neal Cardwell	059217c18b	tcp: fix quick-ack counting to count actual ACKs of new data This commit fixes quick-ack counting so that it only considers that a quick-ack has been provided if we are sending an ACK that newly acknowledges data. The code was erroneously using the number of data segments in outgoing skbs when deciding how many quick-ack credits to remove. This logic does not make sense, and could cause poor performance in request-response workloads, like RPC traffic, where requests or responses can be multi-segment skbs. When a TCP connection decides to send N quick-acks, that is to accelerate the cwnd growth of the congestion control module controlling the remote endpoint of the TCP connection. That quick-ack decision is purely about the incoming data and outgoing ACKs. It has nothing to do with the outgoing data or the size of outgoing data. And in particular, an ACK only serves the intended purpose of allowing the remote congestion control to grow the congestion window quickly if the ACK is ACKing or SACKing new data. The fix is simple: only count packets as serving the goal of the quickack mechanism if they are ACKing/SACKing new data. We can tell whether this is the case by checking inet_csk_ack_scheduled(), since we schedule an ACK exactly when we are ACKing/SACKing new data. Fixes: `fc6415bcb0` ("[TCP]: Fix quick-ack decrementing with TSO.") Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20231001151239.1866845-1-ncardwell.sw@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-04 15:34:18 -07:00
Jakub Kicinski	c56e67f3ff	netfilter pull request 2023-10-04 -----BEGIN PGP SIGNATURE----- iQJBBAABCAArFiEEgKkgxbID4Gn1hq6fcJGo2a1f9gAFAmUdcDQNHGZ3QHN0cmxl bi5kZQAKCRBwkajZrV/2AOf7EACipnPx/532GUk1pECg+iWGTfhFOu1jdHjAILzy +Ft/kfTLvd8kfZg6DuKIb6KYfj7w7uQ/xcD6wfqV8HBcss0SOyilx2ZUgH8ThwDv tSIsUsx1M1gOGkXK713GrD6h/PR5BBv3vVFymvr+MliYH4C2mmsGOGWk5D+s8IqU q3XDMMMlsZpfqCA8QGKK7TkFhnvnHdeoHGhZhw9ywXik733Qa4OUbJ5tkxztDKrr DKF/FhpYxWPKHURtPXaQpWuni7xbMjg+3lHYlWTRZkQRQOoPWidBuTumqJxvwT3Z FYwlS7T7OBMiFByy4spBnBs0uGiA6rR3sZ2/Gn98o9HpYlCllxpZm53Ay0u8sZTL RBhMkacOXTWN5n1fbIqHIZc6vs7Tm1crvT2V/CseAuhe9TDiD5cHkaz7hJUQif6h dmF48QHCHuSgWGtyPmVbTDSZ02YF++R398zHuBM2TXkFz8B9vI5DRpbXw0yX4ktg LZSKnBALOPN5Ye27+W+itNfNaMC3+Elto3Cv9IvpTaXWl8WpF8hnNagLObEXxJ1Q 3dLRKpSHDKJe7BLQoqm9ESFUE80bZr+S1Xleukz0z7AamCrM/rxQGKBwbTJs2NoE 1YezWzhw68+aQ7BY8eWigDAQKmtn1Oju3v5u5IekGKQVvXd5x97VGlJQRVxQvr2Z jDHNFw== =YLDi -----END PGP SIGNATURE----- Merge tag 'nf-23-10-04' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Florian Westphal says: ==================== netfilter patches for net First patch resolves a regression with vlan header matching, this was broken since 6.5 release. From myself. Second patch fixes an ancient problem with sctp connection tracking in case INIT_ACK packets are delayed. This comes with a selftest, both patches from Xin Long. Patch 4 extends the existing nftables audit selftest, from Phil Sutter. Patch 5, also from Phil, avoids a situation where nftables would emit an audit record twice. This was broken since 5.13 days. Patch 6, from myself, avoids spurious insertion failure if we encounter an overlapping but expired range during element insertion with the 'nft_set_rbtree' backend. This problem exists since 6.2. * tag 'nf-23-10-04' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: nft_set_rbtree: fix spurious insertion failure netfilter: nf_tables: Deduplicate nft_register_obj audit logs selftests: netfilter: Extend nft_audit.sh selftests: netfilter: test for sctp collision processing in nf_conntrack netfilter: handle the connecting collision properly in nf_conntrack_proto_sctp netfilter: nft_payload: rebuild vlan header on h_proto access ==================== Link: https://lore.kernel.org/r/20231004141405.28749-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-04 14:53:17 -07:00
Jakub Kicinski	07cf7974a2	netfilter pull request 2023-09-28 -----BEGIN PGP SIGNATURE----- iQJBBAABCAArFiEEgKkgxbID4Gn1hq6fcJGo2a1f9gAFAmUVjwUNHGZ3QHN0cmxl bi5kZQAKCRBwkajZrV/2AKneEACzrKtIC0j0DyhgVW4Kb57T8Y7cD5wQCv7oz1Cx 8A3UJ1pSLYhRnz94zY453GIenK+zx/KKIetDhyWnjA9gjk95HkUN+OwuuiKnUAgu 7KPGbIYat7hERwoZpR88nrbTYXcDZfcZGTqWA++3yL2vn4Lu4lsuowqXYKBf/axk 5gEwEtwn2mVsdo0qTVJcXkHqnf5CCdqd26ixF4yB1rz/P6kISi4I9q7ul43paFJW +/ifacdG+7raQkGlUlYiDNMVd0uO01HHaAcWfYa+FOMK+GSn+89zzTs906CU0g2O GRJSWjNTgfDtM2AHN7peUnf/G9XHSK2Y7Re8FzauKzwWSl5N9w5610nbQnT+ME5O uOZE1P/lhnidOwCEV8zU4yhs6fBrCMCHz+S5Yh8C8PCUhi12IEEYRHyGCoUVMOwY 1LINjdn4HddL57QUGumy0VqVBlxQru8VXnlzm0eIyhsbZ3/mVXQWIHX4u1G36UUQ zSkm4/qP4kna/tV86mETNX1MUcJsQ1vQ842abcUbxudKei/uT9av6YHlz/aBOcQZ NDMrGVO6mjh7/HnYUr7+zbQfhLZdg424SpGEoiuS7dDcTpGlcT3pnWBJDGEHsy+4 0VnLI8/GPT1/jQCCYTVLu+tn0XmfZF18j2bvGhz1hM9J/HXaRpuqjGF6thLgYl63 CZf5Yg== =ALU2 -----END PGP SIGNATURE----- Merge tag 'nf-next-23-09-28' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Florian Westphal says: ==================== netfilter updates for net-next First patch, from myself, is a bug fix. The issue (connect timeout) is ancient, so I think its safe to give this more soak time given the esoteric conditions needed to trigger this. Also updates the existing selftest to cover this. Add netlink extacks when an update references a non-existent table/chain/set. This allows userspace to provide much better errors to the user, from Pablo Neira Ayuso. Last patch adds more policy checks to nf_tables as a better alternative to the existing runtime checks, from Phil Sutter. * tag 'nf-next-23-09-28' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: nf_tables: Utilize NLA_POLICY_NESTED_ARRAY netfilter: nf_tables: missing extended netlink error in lookup functions selftests: netfilter: test nat source port clash resolution interaction with tcp early demux netfilter: nf_nat: undo erroneous tcp edemux lookup after port clash ==================== Link: https://lore.kernel.org/r/20230928144916.18339-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-04 14:25:37 -07:00
Geert Uytterhoeven	2b464cc2fd	sctp: Spelling s/preceeding/preceding/g Fix a misspelling of "preceding". Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/663b14d07d6d716ddc34482834d6b65a2f714cfb.1695903447.git.geert+renesas@glider.be Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-04 14:04:58 -07:00
Chengfeng Ye	08e50cf071	tipc: fix a potential deadlock on &tx->lock It seems that tipc_crypto_key_revoke() could be be invoked by wokequeue tipc_crypto_work_rx() under process context and timer/rx callback under softirq context, thus the lock acquisition on &tx->lock seems better use spin_lock_bh() to prevent possible deadlock. This flaw was found by an experimental static analysis tool I am developing for irq-related deadlock. tipc_crypto_work_rx() <workqueue> --> tipc_crypto_key_distr() --> tipc_bcast_xmit() --> tipc_bcbase_xmit() --> tipc_bearer_bc_xmit() --> tipc_crypto_xmit() --> tipc_ehdr_build() --> tipc_crypto_key_revoke() --> spin_lock(&tx->lock) <timer interrupt> --> tipc_disc_timeout() --> tipc_bearer_xmit_skb() --> tipc_crypto_xmit() --> tipc_ehdr_build() --> tipc_crypto_key_revoke() --> spin_lock(&tx->lock) <deadlock here> Signed-off-by: Chengfeng Ye <dg573847474@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Fixes: `fc1b6d6de2` ("tipc: introduce TIPC encryption & authentication") Link: https://lore.kernel.org/r/20230927181414.59928-1-dg573847474@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-04 13:24:12 -07:00
Benjamin Poirier	0add5c597f	ipv4: Set offload_failed flag in fibmatch results Due to a small omission, the offload_failed flag is missing from ipv4 fibmatch results. Make sure it is set correctly. The issue can be witnessed using the following commands: echo "1 1" > /sys/bus/netdevsim/new_device ip link add dummy1 up type dummy ip route add 192.0.2.0/24 dev dummy1 echo 1 > /sys/kernel/debug/netdevsim/netdevsim1/fib/fail_route_offload ip route add 198.51.100.0/24 dev dummy1 ip route # 192.168.15.0/24 has rt_trap # 198.51.100.0/24 has rt_offload_failed ip route get 192.168.15.1 fibmatch # Result has rt_trap ip route get 198.51.100.1 fibmatch # Result differs from the route shown by `ip route`, it is missing # rt_offload_failed ip link del dev dummy1 echo 1 > /sys/bus/netdevsim/del_device Fixes: `36c5100e85` ("IPv4: Add "offload failed" indication to routes") Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230926182730.231208-1-bpoirier@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-04 11:39:36 -07:00
Jakub Kicinski	72897b2959	Quite a collection of fixes this time, really too many to list individually. Many stack fixes, even rfkill (found by simulation and the new eevdf scheduler)! Also a bigger maintainers file cleanup, to remove old and redundant information. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmUT/FoACgkQ10qiO8sP aAAeqg/+Iaim4AFPPDeWUvSARyqcMIKqmDtaFqkXE+OY8oahbvvbnGuLM/v/7V3r NG5pzoi0gqFasF+DC1lNWFUnmtWdjzjhhY9FInoWgiNO04V48c3NPI/YF/2Yvy0x biJxSsiaY+buY/p2QOXvlXHetnaftXNikFPZaD1mVGG/GIGZAwqqUO/EkIdliZf5 q/RBt5jzMF/nXTRIGc53kq7tKT97gGnDDYMych0U130PlyyqAZ+iAsR9UbiPa+ww Z1JB8qZO72Hx9iN0WMjXNBX8vxC961Dj+fkttgJqZALTk0UqQitwcsIJgM29JjTG VHsxHMdZAROFGaEHdMs1eHkqHkqyOxA4Jhr5LAci/chzgKkRw2I2LAXndOeg5TLH cWdhNjRW/ZJ64czr5hALQQxK4nj8m7SPFHY/6UuBnNHEXCjr7vUcuhcoK63Kb6Np 6Sg4jtyhakaXemqjhcmIYNF1dG5CDbUmQXFkj9Z9EEyHjAzGJ7ASmdhHwlBQnJuH 39ESEky2zQINAJbisaw9R2zj+V9Ia/mFSbi2q30kX5J4xTHGIURNo9OPkLAQWDdw 6u/d5VZLigliIK+Qj8kVtn41wmUEwB3W5Aq4CI7xB91vKRHCUZQvZ5xBJfHtk2pD 7VIzZscReWCsQP9T0hv38jeaw4m/P+mZO1iOC2qxveJvrQogZMk= =OL/S -----END PGP SIGNATURE----- Merge tag 'wireless-2023-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== Quite a collection of fixes this time, really too many to list individually. Many stack fixes, even rfkill (found by simulation and the new eevdf scheduler)! Also a bigger maintainers file cleanup, to remove old and redundant information. * tag 'wireless-2023-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: (32 commits) wifi: iwlwifi: mvm: Fix incorrect usage of scan API wifi: mac80211: Create resources for disabled links wifi: cfg80211: avoid leaking stack data into trace wifi: mac80211: allow transmitting EAPOL frames with tainted key wifi: mac80211: work around Cisco AP 9115 VHT MPDU length wifi: cfg80211: Fix 6GHz scan configuration wifi: mac80211: fix potential key leak wifi: mac80211: fix potential key use-after-free wifi: mt76: mt76x02: fix MT76x0 external LNA gain handling wifi: brcmfmac: Replace 1-element arrays with flexible arrays wifi: mwifiex: Fix oob check condition in mwifiex_process_rx_packet wifi: rtw88: rtw8723d: Fix MAC address offset in EEPROM rfkill: sync before userspace visibility/changes wifi: mac80211: fix mesh id corruption on 32 bit systems wifi: cfg80211: add missing kernel-doc for cqm_rssi_work wifi: cfg80211: fix cqm_config access race wifi: iwlwifi: mvm: Fix a memory corruption issue wifi: iwlwifi: Ensure ack flag is properly cleared. wifi: iwlwifi: dbg_ini: fix structure packing iwlwifi: mvm: handle PS changes in vif_cfg_changed ... ==================== Link: https://lore.kernel.org/r/20230927095835.25803-2-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-04 11:30:22 -07:00
Jakub Kicinski	1eb3dee16a	bpf-for-netdev -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZRqk1wAKCRDbK58LschI g8GRAQC4E0bw6BTFRl0b3MxvpZES6lU0BUtX2gKVK4tLZdXw/wEAmTlBXQqNzF3b BkCQknVbFTSw/8l8pzUW123Fb46wUAQ= =E3hd -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2023-10-02 We've added 11 non-merge commits during the last 12 day(s) which contain a total of 12 files changed, 176 insertions(+), 41 deletions(-). The main changes are: 1) Fix BPF verifier to reset backtrack_state masks on global function exit as otherwise subsequent precision tracking would reuse them, from Andrii Nakryiko. 2) Several sockmap fixes for available bytes accounting, from John Fastabend. 3) Reject sk_msg egress redirects to non-TCP sockets given this is only supported for TCP sockets today, from Jakub Sitnicki. 4) Fix a syzkaller splat in bpf_mprog when hitting maximum program limits with BPF_F_BEFORE directive, from Daniel Borkmann and Nikolay Aleksandrov. 5) Fix BPF memory allocator to use kmalloc_size_roundup() to adjust size_index for selecting a bpf_mem_cache, from Hou Tao. 6) Fix arch_prepare_bpf_trampoline return code for s390 JIT, from Song Liu. 7) Fix bpf_trampoline_get when CONFIG_BPF_JIT is turned off, from Leon Hwang. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Use kmalloc_size_roundup() to adjust size_index selftest/bpf: Add various selftests for program limits bpf, mprog: Fix maximum program check on mprog attachment bpf, sockmap: Reject sk_msg egress redirects to non-TCP sockets bpf, sockmap: Add tests for MSG_F_PEEK bpf, sockmap: Do not inc copied_seq when PEEK flag set bpf: tcp_read_skb needs to pop skb regardless of seq bpf: unconditionally reset backtrack_state masks on global func exit bpf: Fix tr dereferencing selftests/bpf: Check bpf_cubic_acked() is called via struct_ops s390/bpf: Let arch_prepare_bpf_trampoline return program size ==================== Link: https://lore.kernel.org/r/20231002113417.2309-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-04 08:28:07 -07:00
Florian Westphal	087388278e	netfilter: nf_tables: nft_set_rbtree: fix spurious insertion failure nft_rbtree_gc_elem() walks back and removes the end interval element that comes before the expired element. There is a small chance that we've cached this element as 'rbe_ge'. If this happens, we hold and test a pointer that has been queued for freeing. It also causes spurious insertion failures: $ cat test-testcases-sets-0044interval_overlap_0.1/testout.log Error: Could not process rule: File exists add element t s { 0 - 2 } ^^^^^^ Failed to insert 0 - 2 given: table ip t { set s { type inet_service flags interval,timeout timeout 2s gc-interval 2s } } The set (rbtree) is empty. The 'failure' doesn't happen on next attempt. Reason is that when we try to insert, the tree may hold an expired element that collides with the range we're adding. While we do evict/erase this element, we can trip over this check: if (rbe_ge && nft_rbtree_interval_end(rbe_ge) && nft_rbtree_interval_end(new)) return -ENOTEMPTY; rbe_ge was erased by the synchronous gc, we should not have done this check. Next attempt won't find it, so retry results in successful insertion. Restart in-kernel to avoid such spurious errors. Such restart are rare, unless userspace intentionally adds very large numbers of elements with very short timeouts while setting a huge gc interval. Even in this case, this cannot loop forever, on each retry an existing element has been removed. As the caller is holding the transaction mutex, its impossible for a second entity to add more expiring elements to the tree. After this it also becomes feasible to remove the async gc worker and perform all garbage collection from the commit path. Fixes: `c9e6978e27` ("netfilter: nft_set_rbtree: Switch to node list walk for overlap detection") Signed-off-by: Florian Westphal <fw@strlen.de>	2023-10-04 15:57:28 +02:00
Phil Sutter	0d880dc6f0	netfilter: nf_tables: Deduplicate nft_register_obj audit logs When adding/updating an object, the transaction handler emits suitable audit log entries already, the one in nft_obj_notify() is redundant. To fix that (and retain the audit logging from objects' 'update' callback), Introduce an "audit log free" variant for internal use. Fixes: `c520292f29` ("audit: log nftables configuration change events once per table") Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: Richard Guy Briggs <rgb@redhat.com> Acked-by: Paul Moore <paul@paul-moore.com> (Audit) Signed-off-by: Florian Westphal <fw@strlen.de>	2023-10-04 15:57:06 +02:00
Xin Long	8e56b063c8	netfilter: handle the connecting collision properly in nf_conntrack_proto_sctp In Scenario A and B below, as the delayed INIT_ACK always changes the peer vtag, SCTP ct with the incorrect vtag may cause packet loss. Scenario A: INIT_ACK is delayed until the peer receives its own INIT_ACK 192.168.1.2 > 192.168.1.1: [INIT] [init tag: 1328086772] 192.168.1.1 > 192.168.1.2: [INIT] [init tag: 1414468151] 192.168.1.2 > 192.168.1.1: [INIT ACK] [init tag: 1328086772] 192.168.1.1 > 192.168.1.2: [INIT ACK] [init tag: 1650211246] * 192.168.1.2 > 192.168.1.1: [COOKIE ECHO] 192.168.1.1 > 192.168.1.2: [COOKIE ECHO] 192.168.1.2 > 192.168.1.1: [COOKIE ACK] Scenario B: INIT_ACK is delayed until the peer completes its own handshake 192.168.1.2 > 192.168.1.1: sctp (1) [INIT] [init tag: 3922216408] 192.168.1.1 > 192.168.1.2: sctp (1) [INIT] [init tag: 144230885] 192.168.1.2 > 192.168.1.1: sctp (1) [INIT ACK] [init tag: 3922216408] 192.168.1.1 > 192.168.1.2: sctp (1) [COOKIE ECHO] 192.168.1.2 > 192.168.1.1: sctp (1) [COOKIE ACK] 192.168.1.1 > 192.168.1.2: sctp (1) [INIT ACK] [init tag: 3914796021] * This patch fixes it as below: In SCTP_CID_INIT processing: - clear ct->proto.sctp.init[!dir] if ct->proto.sctp.init[dir] && ct->proto.sctp.init[!dir]. (Scenario E) - set ct->proto.sctp.init[dir]. In SCTP_CID_INIT_ACK processing: - drop it if !ct->proto.sctp.init[!dir] && ct->proto.sctp.vtag[!dir] && ct->proto.sctp.vtag[!dir] != ih->init_tag. (Scenario B, Scenario C) - drop it if ct->proto.sctp.init[dir] && ct->proto.sctp.init[!dir] && ct->proto.sctp.vtag[!dir] != ih->init_tag. (Scenario A) In SCTP_CID_COOKIE_ACK processing: - clear ct->proto.sctp.init[dir] and ct->proto.sctp.init[!dir]. (Scenario D) Also, it's important to allow the ct state to move forward with cookie_echo and cookie_ack from the opposite dir for the collision scenarios. There are also other Scenarios where it should allow the packet through, addressed by the processing above: Scenario C: new CT is created by INIT_ACK. Scenario D: start INIT on the existing ESTABLISHED ct. Scenario E: start INIT after the old collision on the existing ESTABLISHED ct. 192.168.1.2 > 192.168.1.1: sctp (1) [INIT] [init tag: 3922216408] 192.168.1.1 > 192.168.1.2: sctp (1) [INIT] [init tag: 144230885] (both side are stopped, then start new connection again in hours) 192.168.1.2 > 192.168.1.1: sctp (1) [INIT] [init tag: 242308742] Fixes: `9fb9cbb108` ("[NETFILTER]: Add nf_conntrack subsystem.") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de>	2023-10-04 14:12:01 +02:00
Florian Westphal	af84f9e447	netfilter: nft_payload: rebuild vlan header on h_proto access nft can perform merging of adjacent payload requests. This means that: ether saddr 00:11 ... ether type 8021ad ... is a single payload expression, for 8 bytes, starting at the ethernet source offset. Check that offset+length is fully within the source/destination mac addersses. This bug prevents 'ether type' from matching the correct h_proto in case vlan tag got stripped. Fixes: `de6843be30` ("netfilter: nft_payload: rebuild vlan header when needed") Reported-by: David Ward <david.ward@ll.mit.edu> Signed-off-by: Florian Westphal <fw@strlen.de>	2023-10-04 14:12:01 +02:00
Jiapeng Chong	dd8bb80308	can: raw: Remove NULL check before dev_{put, hold} The call netdev_{put, hold} of dev_{put, hold} will check NULL, so there is no need to check before using dev_{put, hold}, remove it to silence the warning: ./net/can/raw.c:497:2-9: WARNING: NULL check before dev_{put, hold} functions is not needed. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=6231 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Reported-by: Simon Horman <horms@kernel.org> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://lore.kernel.org/all/20230825064656.87751-1-jiapeng.chong@linux.alibaba.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2023-10-04 12:25:12 +02:00
Patrick Rohr	473267a491	net: add sysctl to disable rfc4862 5.5.3e lifetime handling This change adds a sysctl to opt-out of RFC4862 section 5.5.3e's valid lifetime derivation mechanism. RFC4862 section 5.5.3e prescribes that the valid lifetime in a Router Advertisement PIO shall be ignored if it less than 2 hours and to reset the lifetime of the corresponding address to 2 hours. An in-progress 6man draft (see draft-ietf-6man-slaac-renum-07 section 4.2) is currently looking to remove this mechanism. While this draft has not been moving particularly quickly for other reasons, there is widespread consensus on section 4.2 which updates RFC4862 section 5.5.3e. Cc: Maciej Żenczykowski <maze@google.com> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: Jen Linkova <furry@google.com> Signed-off-by: Patrick Rohr <prohr@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230925214711.959704-1-prohr@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-03 15:51:04 -07:00
Jeremy Cline	dfc7f7a988	net: nfc: llcp: Add lock when modifying device list The device list needs its associated lock held when modifying it, or the list could become corrupted, as syzbot discovered. Reported-and-tested-by: syzbot+c1d0a03d305972dbbe14@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=c1d0a03d305972dbbe14 Signed-off-by: Jeremy Cline <jeremy@jcline.org> Reviewed-by: Simon Horman <horms@kernel.org> Fixes: `6709d4b7bc` ("net: nfc: Fix use-after-free caused by nfc_llcp_find_local") Link: https://lore.kernel.org/r/20230908235853.1319596-1-jeremy@jcline.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-03 07:39:31 -07:00
Parthiban Veerasooran	8957261cd8	ethtool: plca: fix plca enable data type while parsing the value The ETHTOOL_A_PLCA_ENABLED data type is u8. But while parsing the value from the attribute, nla_get_u32() is used in the plca_update_sint() function instead of nla_get_u8(). So plca_cfg.enabled variable is updated with some garbage value instead of 0 or 1 and always enables plca even though plca is disabled through ethtool application. This bug has been fixed by parsing the values based on the attributes type in the policy. Fixes: `8580e16c28` ("net/ethtool: add netlink interface for the PLCA RS") Signed-off-by: Parthiban Veerasooran <Parthiban.Veerasooran@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20230908044548.5878-1-Parthiban.Veerasooran@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-03 07:18:58 -07:00
Lukasz Majewski	5e5db71a92	net: dsa: tag_ksz: Extend ksz9477_xmit() for HSR frame duplication The KSZ9477 has support for HSR (High-Availability Seamless Redundancy). One of its offloading (i.e. performed in the switch IC hardware) features is to duplicate received frame to both HSR aware switch ports. To achieve this goal - the tail TAG needs to be modified. To be more specific, both ports must be marked as destination (egress) ones. The NETIF_F_HW_HSR_DUP flag indicates that the device supports HSR and assures (in HSR core code) that frame is sent only once from HOST to switch with tail tag indicating both ports. Signed-off-by: Lukasz Majewski <lukma@denx.de> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-10-03 13:51:02 +02:00
Vladimir Oltean	6715042cd1	net: dsa: notify drivers of MAC address changes on user ports In some cases, drivers may need to veto the changing of a MAC address on a user port. Such is the case with KSZ9477 when it offloads a HSR device, because it programs the MAC address of multiple ports to a shared hardware register. Those ports need to have equal MAC addresses for the lifetime of the HSR offload. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Lukasz Majewski <lukma@denx.de> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-10-03 13:51:02 +02:00
Vladimir Oltean	fefe5dc4af	net: dsa: propagate extack to ds->ops->port_hsr_join() Drivers can provide meaningful error messages which state a reason why they can't perform an offload, and dsa_slave_changeupper() already has the infrastructure to propagate these over netlink rather than printing to the kernel log. So pass the extack argument and modify the xrs700x driver's port_hsr_join() prototype. Also take the opportunity and use the extack for the 2 -EOPNOTSUPP cases from xrs700x_hsr_join(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Lukasz Majewski <lukma@denx.de> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-10-03 13:51:02 +02:00
Beniamino Galvani	f25e621f5d	ipv6: mark address parameters of udp_tunnel6_xmit_skb() as const The function doesn't modify the addresses passed as input, mark them as 'const' to make that clear. Signed-off-by: Beniamino Galvani <b.galvani@gmail.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Link: https://lore.kernel.org/r/20230924153014.786962-1-b.galvani@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-10-03 11:48:19 +02:00
Christophe JAILLET	ef35bed6fa	udp_tunnel: Use flex array to simplify code 'n_tables' is small, UDP_TUNNEL_NIC_MAX_TABLES = 4 as a maximum. So there is no real point to allocate the 'entries' pointers array with a dedicate memory allocation. Using a flexible array for struct udp_tunnel_nic->entries avoids the overhead of an additional memory allocation. This also saves an indirection when the array is accessed. Finally, __counted_by() can be used for run-time bounds checking if configured and supported by the compiler. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/4a096ba9cf981a588aa87235bb91e933ee162b3d.1695542544.git.christophe.jaillet@wanadoo.fr Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-10-03 11:39:34 +02:00
Eric Dumazet	6532e257aa	tcp_metrics: optimize tcp_metrics_flush_all() This is inspired by several syzbot reports where tcp_metrics_flush_all() was seen in the traces. We can avoid acquiring tcp_metrics_lock for empty buckets, and we should add one cond_resched() to break potential long loops. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-10-03 10:05:22 +02:00
Eric Dumazet	a135798e6e	tcp_metrics: do not create an entry from tcp_init_metrics() tcp_init_metrics() only wants to get metrics if they were previously stored in the cache. Creating an entry is adding useless costs, especially when tcp_no_metrics_save is set. Fixes: `51c5d0c4b1` ("tcp: Maintain dynamic metrics in local cache.") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-10-03 10:05:22 +02:00
Eric Dumazet	081480014a	tcp_metrics: properly set tp->snd_ssthresh in tcp_init_metrics() We need to set tp->snd_ssthresh to TCP_INFINITE_SSTHRESH in the case tcp_get_metrics() fails for some reason. Fixes: `9ad7c049f0` ("tcp: RFC2988bis + taking RTT sample from 3WHS for the passive open side") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-10-03 10:05:22 +02:00
Eric Dumazet	cbc3a15322	tcp_metrics: add missing barriers on delete When removing an item from RCU protected list, we must prevent store-tearing, using rcu_assign_pointer() or WRITE_ONCE(). Fixes: `04f721c671` ("tcp_metrics: Rewrite tcp_metrics_flush_all") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-10-03 10:05:22 +02:00
Ilya Maximets	9593c7cb6c	ipv6: tcp: add a missing nf_reset_ct() in 3WHS handling Commit `b0e214d212` ("netfilter: keep conntrack reference until IPsecv6 policy checks are done") is a direct copy of the old commit `b59c270104` ("[NETFILTER]: Keep conntrack reference until IPsec policy checks are done") but for IPv6. However, it also copies a bug that this old commit had. That is: when the third packet of 3WHS connection establishment contains payload, it is added into socket receive queue without the XFRM check and the drop of connection tracking context. That leads to nf_conntrack module being impossible to unload as it waits for all the conntrack references to be dropped while the packet release is deferred in per-cpu cache indefinitely, if not consumed by the application. The issue for IPv4 was fixed in commit `6f0012e351` ("tcp: add a missing nf_reset_ct() in 3WHS handling") by adding a missing XFRM check and correctly dropping the conntrack context. However, the issue was introduced to IPv6 code afterwards. Fixing it the same way for IPv6 now. Fixes: `b0e214d212` ("netfilter: keep conntrack reference until IPsecv6 policy checks are done") Link: https://lore.kernel.org/netdev/d589a999-d4dd-2768-b2d5-89dec64a4a42@ovn.org/ Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Florian Westphal <fw@strlen.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230922210530.2045146-1-i.maximets@ovn.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-10-03 09:49:24 +02:00
Hangbin Liu	4b2b606075	ipv4/fib: send notify when delete source address routes After deleting an interface address in fib_del_ifaddr(), the function scans the fib_info list for stray entries and calls fib_flush() and fib_table_flush(). Then the stray entries will be deleted silently and no RTM_DELROUTE notification will be sent. This lack of notification can make routing daemons, or monitor like `ip monitor route` miss the routing changes. e.g. + ip link add dummy1 type dummy + ip link add dummy2 type dummy + ip link set dummy1 up + ip link set dummy2 up + ip addr add 192.168.5.5/24 dev dummy1 + ip route add 7.7.7.0/24 dev dummy2 src 192.168.5.5 + ip -4 route 7.7.7.0/24 dev dummy2 scope link src 192.168.5.5 192.168.5.0/24 dev dummy1 proto kernel scope link src 192.168.5.5 + ip monitor route + ip addr del 192.168.5.5/24 dev dummy1 Deleted 192.168.5.0/24 dev dummy1 proto kernel scope link src 192.168.5.5 Deleted broadcast 192.168.5.255 dev dummy1 table local proto kernel scope link src 192.168.5.5 Deleted local 192.168.5.5 dev dummy1 table local proto kernel scope host src 192.168.5.5 As Ido reminded, fib_table_flush() isn't only called when an address is deleted, but also when an interface is deleted or put down. The lack of notification in these cases is deliberate. And commit `7c6bb7d2fa` ("net/ipv6: Add knob to skip DELROUTE message on device down") introduced a sysctl to make IPv6 behave like IPv4 in this regard. So we can't send the route delete notify blindly in fib_table_flush(). To fix this issue, let's add a new flag in "struct fib_info" to track the deleted prefer source address routes, and only send notify for them. After update: + ip monitor route + ip addr del 192.168.5.5/24 dev dummy1 Deleted 192.168.5.0/24 dev dummy1 proto kernel scope link src 192.168.5.5 Deleted broadcast 192.168.5.255 dev dummy1 table local proto kernel scope link src 192.168.5.5 Deleted local 192.168.5.5 dev dummy1 table local proto kernel scope host src 192.168.5.5 Deleted 7.7.7.0/24 dev dummy2 scope link src 192.168.5.5 Suggested-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230922075508.848925-1-liuhangbin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-10-03 09:00:40 +02:00
Chuck Lever	160f404495	handshake: Fix sign of key_serial_t fields key_serial_t fields are signed integers. Use nla_get/put_s32 for those to avoid implicit signed conversion in the netlink protocol. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/169530167716.8905.645746457741372879.stgit@oracle-102.nfsv4bat.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-02 12:34:21 -07:00
Chuck Lever	a6b07a51b1	handshake: Fix sign of socket file descriptor fields Socket file descriptors are signed integers. Use nla_get/put_s32 for those to avoid implicit signed conversion in the netlink protocol. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/169530165057.8905.8650469415145814828.stgit@oracle-102.nfsv4bat.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-02 12:34:21 -07:00
Kees Cook	16ae53d80c	net: openvswitch: Annotate struct dp_meter with __counted_by Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct dp_meter. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Pravin B Shelar <pshelar@ovn.org> Cc: dev@openvswitch.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/20230922172858.3822653-12-keescook@chromium.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-02 11:24:55 -07:00
Kees Cook	e7b34822fa	net: openvswitch: Annotate struct dp_meter_instance with __counted_by Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct dp_meter_instance. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Pravin B Shelar <pshelar@ovn.org> Cc: dev@openvswitch.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/20230922172858.3822653-10-keescook@chromium.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-02 11:24:54 -07:00
Eric Dumazet	0271592522	inet: implement lockless getsockopt(IP_MULTICAST_IF) Add missing annotations to inet->mc_index and inet->mc_addr to fix data-races. getsockopt(IP_MULTICAST_IF) can be lockless. setsockopt() side is left for later. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-01 19:39:19 +01:00
Eric Dumazet	c4480eb550	inet: lockless IP_PKTOPTIONS implementation Current implementation is already lockless, because the socket lock is released before reading socket fields. Add missing READ_ONCE() annotations. Note that corresponding WRITE_ONCE() are needed, the order of the patches do not really matter. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-01 19:39:19 +01:00
Eric Dumazet	959d5c1160	inet: implement lockless getsockopt(IP_UNICAST_IF) Add missing READ_ONCE() annotations when reading inet->uc_index Implementing getsockopt(IP_UNICAST_IF) locklessly seems possible, the setsockopt() part might not be possible at the moment. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-01 19:39:19 +01:00
Eric Dumazet	3523bc91e4	inet: lockless getsockopt(IP_MTU) sk_dst_get() does not require socket lock. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-01 19:39:19 +01:00
Eric Dumazet	a4725d0d89	inet: lockless getsockopt(IP_OPTIONS) inet->inet_opt being RCU protected, we can use RCU instead of locking the socket. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-01 19:39:18 +01:00
Eric Dumazet	e08d0b3d17	inet: implement lockless IP_TOS Some reads of inet->tos are racy. Add needed READ_ONCE() annotations and convert IP_TOS option lockless. v2: missing changes in include/net/route.h (David Ahern) Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-01 19:39:18 +01:00
Eric Dumazet	ceaa714138	inet: implement lockless IP_MTU_DISCOVER inet->pmtudisc can be read locklessly. Implement proper lockless reads and writes to inet->pmtudisc ip_sock_set_mtu_discover() can now be called from arbitrary contexts. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-01 19:39:18 +01:00
Eric Dumazet	c9746e6a19	inet: implement lockless IP_MULTICAST_TTL inet->mc_ttl can be read locklessly. Implement proper lockless reads and writes to inet->mc_ttl Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-01 19:39:18 +01:00
Jordan Rife	c889a99a21	net: prevent address rewrite in kernel_bind() Similar to the change in commit 0bdf399342c5("net: Avoid address overwrite in kernel_connect"), BPF hooks run on bind may rewrite the address passed to kernel_bind(). This change 1) Makes a copy of the bind address in kernel_bind() to insulate callers. 2) Replaces direct calls to sock->ops->bind() in net with kernel_bind() Link: https://lore.kernel.org/netdev/20230912013332.2048422-1-jrife@google.com/ Fixes: `4fbac77d2d` ("bpf: Hooks for sys_bind") Cc: stable@vger.kernel.org Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jordan Rife <jrife@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-01 19:31:29 +01:00
Jordan Rife	86a7e0b69b	net: prevent rewrite of msg_name in sock_sendmsg() Callers of sock_sendmsg(), and similarly kernel_sendmsg(), in kernel space may observe their value of msg_name change in cases where BPF sendmsg hooks rewrite the send address. This has been confirmed to break NFS mounts running in UDP mode and has the potential to break other systems. This patch: 1) Creates a new function called __sock_sendmsg() with same logic as the old sock_sendmsg() function. 2) Replaces calls to sock_sendmsg() made by __sys_sendto() and __sys_sendmsg() with __sock_sendmsg() to avoid an unnecessary copy, as these system calls are already protected. 3) Modifies sock_sendmsg() so that it makes a copy of msg_name if present before passing it down the stack to insulate callers from changes to the send address. Link: https://lore.kernel.org/netdev/20230912013332.2048422-1-jrife@google.com/ Fixes: `1cedee13d2` ("bpf: Hooks for sys_sendmsg") Cc: stable@vger.kernel.org Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jordan Rife <jrife@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-01 19:31:29 +01:00
Jordan Rife	26297b4ce1	net: replace calls to sock->ops->connect() with kernel_connect() commit `0bdf399342` ("net: Avoid address overwrite in kernel_connect") ensured that kernel_connect() will not overwrite the address parameter in cases where BPF connect hooks perform an address rewrite. This change replaces direct calls to sock->ops->connect() in net with kernel_connect() to make these call safe. Link: https://lore.kernel.org/netdev/20230912013332.2048422-1-jrife@google.com/ Fixes: `d74bad4e74` ("bpf: Hooks for sys_connect") Cc: stable@vger.kernel.org Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jordan Rife <jrife@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-01 19:31:29 +01:00
Eric Dumazet	eb44ad4e63	net: annotate data-races around sk->sk_dst_pending_confirm This field can be read or written without socket lock being held. Add annotations to avoid load-store tearing. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-01 19:09:54 +01:00
Eric Dumazet	5eef0b8de1	net: lockless implementation of SO_TXREHASH sk->sk_txrehash readers are already safe against concurrent change of this field. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-01 19:09:54 +01:00
Eric Dumazet	28b24f9002	net: implement lockless SO_MAX_PACING_RATE SO_MAX_PACING_RATE setsockopt() does not need to hold the socket lock, because sk->sk_pacing_rate readers can run fine if the value is changed by other threads, after adding READ_ONCE() accessors. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-01 19:09:54 +01:00
Eric Dumazet	2a4319cf3c	net: lockless implementation of SO_BUSY_POLL, SO_PREFER_BUSY_POLL, SO_BUSY_POLL_BUDGET Setting sk->sk_ll_usec, sk_prefer_busy_poll and sk_busy_poll_budget do not require the socket lock, readers are lockless anyway. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-01 19:09:54 +01:00
Eric Dumazet	b120251590	net: lockless SO_{TYPE\|PROTOCOL\|DOMAIN\|ERROR } setsockopt() This options can not be set and return -ENOPROTOOPT, no need to acqure socket lock. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-01 19:09:54 +01:00
Eric Dumazet	8ebfb6db5a	net: lockless SO_PASSCRED, SO_PASSPIDFD and SO_PASSSEC sock->flags are atomic, no need to hold the socket lock in sk_setsockopt() for SO_PASSCRED, SO_PASSPIDFD and SO_PASSSEC. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-01 19:09:54 +01:00
Eric Dumazet	10bbf1652c	net: implement lockless SO_PRIORITY This is a followup of `8bf43be799` ("net: annotate data-races around sk->sk_priority"). sk->sk_priority can be read and written without holding the socket lock. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-01 19:09:54 +01:00
Ilya Maximets	06bc3668cc	openvswitch: reduce stack usage in do_execute_actions do_execute_actions() function can be called recursively multiple times while executing actions that require pipeline forking or recirculations. It may also be re-entered multiple times if the packet leaves openvswitch module and re-enters it through a different port. Currently, there is a 256-byte array allocated on stack in this function that is supposed to hold NSH header. Compilers tend to pre-allocate that space right at the beginning of the function: a88: 48 81 ec b0 01 00 00 sub $0x1b0,%rsp NSH is not a very common protocol, but the space is allocated on every recursive call or re-entry multiplying the wasted stack space. Move the stack allocation to push_nsh() function that is only used if NSH actions are actually present. push_nsh() is also a simple function without a possibility for re-entry, so the stack is returned right away. With this change the preallocated space is reduced by 256 B per call: b18: 48 81 ec b0 00 00 00 sub $0xb0,%rsp Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Eelco Chaudron echaudro@redhat.com Reviewed-by: Aaron Conole <aconole@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-01 19:07:22 +01:00
David Howells	9d4c75800f	ipv4, ipv6: Fix handling of transhdrlen in __ip{,6}_append_data() Including the transhdrlen in length is a problem when the packet is partially filled (e.g. something like send(MSG_MORE) happened previously) when appending to an IPv4 or IPv6 packet as we don't want to repeat the transport header or account for it twice. This can happen under some circumstances, such as splicing into an L2TP socket. The symptom observed is a warning in __ip6_append_data(): WARNING: CPU: 1 PID: 5042 at net/ipv6/ip6_output.c:1800 __ip6_append_data.isra.0+0x1be8/0x47f0 net/ipv6/ip6_output.c:1800 that occurs when MSG_SPLICE_PAGES is used to append more data to an already partially occupied skbuff. The warning occurs when 'copy' is larger than the amount of data in the message iterator. This is because the requested length includes the transport header length when it shouldn't. This can be triggered by, for example: sfd = socket(AF_INET6, SOCK_DGRAM, IPPROTO_L2TP); bind(sfd, ...); // ::1 connect(sfd, ...); // ::1 port 7 send(sfd, buffer, 4100, MSG_MORE); sendfile(sfd, dfd, NULL, 1024); Fix this by only adding transhdrlen into the length if the write queue is empty in l2tp_ip6_sendmsg(), analogously to how UDP does things. l2tp_ip_sendmsg() looks like it won't suffer from this problem as it builds the UDP packet itself. Fixes: `a32e0eec70` ("l2tp: introduce L2TPv3 IP encapsulation support for IPv6") Reported-by: syzbot+62cbf263225ae13ff153@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/0000000000001c12b30605378ce8@google.com/ Suggested-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Eric Dumazet <edumazet@google.com> cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com> cc: "David S. Miller" <davem@davemloft.net> cc: David Ahern <dsahern@kernel.org> cc: Paolo Abeni <pabeni@redhat.com> cc: Jakub Kicinski <kuba@kernel.org> cc: netdev@vger.kernel.org cc: bpf@vger.kernel.org cc: syzkaller-bugs@googlegroups.com Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-01 18:19:27 +01:00
Eric Dumazet	5baa0433a1	neighbour: fix data-races around n->output n->output field can be read locklessly, while a writer might change the pointer concurrently. Add missing annotations to prevent load-store tearing. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-01 17:14:37 +01:00
Eric Dumazet	a56d9390bd	net: l2tp_eth: use generic dev->stats fields Core networking has opt-in atomic variant of dev->stats, simply use DEV_STATS_INC(), DEV_STATS_ADD() and DEV_STATS_READ(). v2: removed @priv local var in l2tp_eth_dev_recv() (Simon) Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-01 16:33:01 +01:00
Eric Dumazet	25563b581b	net: fix possible store tearing in neigh_periodic_work() While looking at a related syzbot report involving neigh_periodic_work(), I found that I forgot to add an annotation when deleting an RCU protected item from a list. Readers use rcu_deference(*np), we need to use either rcu_assign_pointer() or WRITE_ONCE() on writer side to prevent store tearing. I use rcu_assign_pointer() to have lockdep support, this was the choice made in neigh_flush_dev(). Fixes: `767e97e1e0` ("neigh: RCU conversion of struct neighbour") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-01 16:29:06 +01:00
David S. Miller	c15cd642d4	bluetooth pull request for net: - Fix handling of HCI_QUIRK_STRICT_DUPLICATE_FILTER - Fix handling of listen for ISO unicast - Fix build warnings - Fix leaking content of local_codecs - Add shutdown function for QCA6174 - Delete unused hci_req_prepare_suspend() declaration - Fix hci_link_tx_to RCU lock usage - Avoid redundant authentication -----BEGIN PGP SIGNATURE----- iQJNBAABCAA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmULNMMZHGx1aXoudm9u LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKYF+D/4lgcowz8/5ry6Idp1yDR4r H4ns/Q199jUlLRKjUQxwGcAM3mp5JlM+ZGjHooFHZmXhthRCvqv2poyheVq9GdQT 46yCwAWSK+PmHlXzLUQrPK7eNUwFCG5T6wVtvQHMybN1sf3fBfR9TMZup4rMHUtI Zdz2e2WxiFj07TaSWIDh966YTPxq0uGEB+Fl7UQXnd4rlWWbke7uD6XcNm2b5+M5 mSeJUXfr+w3RUcNMT86OQ+vDlRTkzBN7zWkHvskI9o8wnnkkNPoUvIb7nJH6BafP iKHgC28eBI+8GXLTgC4GQs/LHQpTOPI6u2WSEjVImAdtTqMum4d7x55Tf+l8sY/G J222Izqt0cAvmPbkV+GLhtVzASfaE5Dmz5ORFf3r/sbG1TYndBnrjGUvFMr++D6r Bl19piDj08+k2GeJVJpnKNRDDycGnrxOZfbAKbezQSuFCU252RiIpT+FwJvkkkg2 epX1VJk0JQiE3MO7fOEO65smRxxQp/mWgNaCbZgbEeK1o/erKAPXCjTP3AMZ/kEW 2rrxQisv/BzaEBWrQSM3+1r7omzngOVfOJtlXmN94kIDJBCdM7A2Y5VRajgMR+tC uxfS4YeClB/dl217Yh4bcvZYYse+opkHH5c+nmC+Q+Lr1ZxMcgqWpY8fr/AtdJbB yMBTfPKX0BCb0NKZinPtxQ== =L6VV -----END PGP SIGNATURE----- Merge tag 'for-net-2023-09-20' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth bluetooth pull request for net: - Fix handling of HCI_QUIRK_STRICT_DUPLICATE_FILTER - Fix handling of listen for ISO unicast - Fix build warnings - Fix leaking content of local_codecs - Add shutdown function for QCA6174 - Delete unused hci_req_prepare_suspend() declaration - Fix hci_link_tx_to RCU lock usage - Avoid redundant authentication Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-01 14:15:29 +01:00
Eric Dumazet	8f6c4ff9e0	net_sched: sch_fq: always garbage collect FQ performs garbage collection at enqueue time, and only if number of flows is above a given threshold, which is hit after the qdisc has been used a bit. Since an RB-tree traversal is needed to locate a flow, it makes sense to perform gc all the time, to keep rb-trees smaller. This reduces by 50 % average storage costs in FQ, and avoids 1 cache line miss at enqueue time when fast path added in prior patch can not be used. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-01 13:20:36 +01:00
Eric Dumazet	076433bd78	net_sched: sch_fq: add fast path for mostly idle qdisc TCQ_F_CAN_BYPASS can be used by few qdiscs. Idea is that if we queue a packet to an empty qdisc, following dequeue() would pick it immediately. FQ can not use the generic TCQ_F_CAN_BYPASS code, because some additional checks need to be performed. This patch adds a similar fast path to FQ. Most of the time, qdisc is not throttled, and many packets can avoid bringing/touching at least four cache lines, and consuming 128bytes of memory to store the state of a flow. After this patch, netperf can send UDP packets about 13 % faster, and pktgen goes 30 % faster (when FQ is in the way), on a fast NIC. TCP traffic is also improved, thanks to a reduction of cache line misses. I have measured a 5 % increase of throughput on a tcp_rr intensive workload. tc -s -d qd sh dev eth1 ... qdisc fq 8004: parent 1:2 limit 10000p flow_limit 100p buckets 1024 orphan_mask 1023 quantum 3028b initial_quantum 15140b low_rate_threshold 550Kbit refill_delay 40ms timer_slack 10us horizon 10s horizon_drop Sent 5646784384 bytes 1985161 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 flows 122 (inactive 122 throttled 0) gc 0 highprio 0 fastpath 659990 throttled 27762 latency 8.57us Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-01 13:20:36 +01:00
Eric Dumazet	ee9af4e14d	net_sched: sch_fq: change how @inactive is tracked Currently, when one fq qdisc has no more packets to send, it can still have some flows stored in its RR lists (q->new_flows & q->old_flows) This was a design choice, but what is a bit disturbing is that the inactive_flows counter does not include the count of empty flows in RR lists. As next patch needs to know better if there are active flows, this change makes inactive_flows exact. Before the patch, following command on an empty qdisc could have returned: lpaa17:~# tc -s -d qd sh dev eth1 \| grep inactive flows 1322 (inactive 1316 throttled 0) flows 1330 (inactive 1325 throttled 0) flows 1193 (inactive 1190 throttled 0) flows 1208 (inactive 1202 throttled 0) After the patch, we now have: lpaa17:~# tc -s -d qd sh dev eth1 \| grep inactive flows 1322 (inactive 1322 throttled 0) flows 1330 (inactive 1330 throttled 0) flows 1193 (inactive 1193 throttled 0) flows 1208 (inactive 1208 throttled 0) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-01 13:20:36 +01:00
Eric Dumazet	54ff8ad69c	net_sched: sch_fq: struct sched_data reorg q->flows can be often modified, and q->timer_slack is read mostly. Exchange the two fields, so that cache line countaining quantum, initial_quantum, and other critical parameters stay clean (read-mostly). Move q->watchdog next to q->stat_throttled Add comments explaining how the structure is split in three different parts. pahole output before the patch: struct fq_sched_data { struct fq_flow_head new_flows; /* 0 0x10 / struct fq_flow_head old_flows; / 0x10 0x10 / struct rb_root delayed; / 0x20 0x8 / u64 time_next_delayed_flow; / 0x28 0x8 / u64 ktime_cache; / 0x30 0x8 / unsigned long unthrottle_latency_ns; / 0x38 0x8 / / --- cacheline 1 boundary (64 bytes) --- / struct fq_flow internal __attribute__((__aligned__(64))); / 0x40 0x80 / / XXX last struct has 16 bytes of padding / / --- cacheline 3 boundary (192 bytes) --- / u32 quantum; / 0xc0 0x4 / u32 initial_quantum; / 0xc4 0x4 / u32 flow_refill_delay; / 0xc8 0x4 / u32 flow_plimit; / 0xcc 0x4 / unsigned long flow_max_rate; / 0xd0 0x8 / u64 ce_threshold; / 0xd8 0x8 / u64 horizon; / 0xe0 0x8 / u32 orphan_mask; / 0xe8 0x4 / u32 low_rate_threshold; / 0xec 0x4 / struct rb_root fq_root; /* 0xf0 0x8 / u8 rate_enable; / 0xf8 0x1 / u8 fq_trees_log; / 0xf9 0x1 / u8 horizon_drop; / 0xfa 0x1 / / XXX 1 byte hole, try to pack / <bad> u32 flows; / 0xfc 0x4 / / --- cacheline 4 boundary (256 bytes) --- / u32 inactive_flows; / 0x100 0x4 / u32 throttled_flows; / 0x104 0x4 / u64 stat_gc_flows; / 0x108 0x8 / u64 stat_internal_packets; / 0x110 0x8 / u64 stat_throttled; / 0x118 0x8 / u64 stat_ce_mark; / 0x120 0x8 / u64 stat_horizon_drops; / 0x128 0x8 / u64 stat_horizon_caps; / 0x130 0x8 / u64 stat_flows_plimit; / 0x138 0x8 / / --- cacheline 5 boundary (320 bytes) --- / u64 stat_pkts_too_long; / 0x140 0x8 / u64 stat_allocation_errors; / 0x148 0x8 / <bad> u32 timer_slack; / 0x150 0x4 / / XXX 4 bytes hole, try to pack / struct qdisc_watchdog watchdog; / 0x158 0x48 / / size: 448, cachelines: 7, members: 34 / / sum members: 411, holes: 2, sum holes: 5 / / padding: 32 / / paddings: 1, sum paddings: 16 / / forced alignments: 1 / }; pahole output after the patch: struct fq_sched_data { struct fq_flow_head new_flows; / 0 0x10 / struct fq_flow_head old_flows; / 0x10 0x10 / struct rb_root delayed; / 0x20 0x8 / u64 time_next_delayed_flow; / 0x28 0x8 / u64 ktime_cache; / 0x30 0x8 / unsigned long unthrottle_latency_ns; / 0x38 0x8 / / --- cacheline 1 boundary (64 bytes) --- / struct fq_flow internal __attribute__((__aligned__(64))); / 0x40 0x80 / / XXX last struct has 16 bytes of padding / / --- cacheline 3 boundary (192 bytes) --- / u32 quantum; / 0xc0 0x4 / u32 initial_quantum; / 0xc4 0x4 / u32 flow_refill_delay; / 0xc8 0x4 / u32 flow_plimit; / 0xcc 0x4 / unsigned long flow_max_rate; / 0xd0 0x8 / u64 ce_threshold; / 0xd8 0x8 / u64 horizon; / 0xe0 0x8 / u32 orphan_mask; / 0xe8 0x4 / u32 low_rate_threshold; / 0xec 0x4 / struct rb_root fq_root; /* 0xf0 0x8 / u8 rate_enable; / 0xf8 0x1 / u8 fq_trees_log; / 0xf9 0x1 / u8 horizon_drop; / 0xfa 0x1 / / XXX 1 byte hole, try to pack / <good> u32 timer_slack; / 0xfc 0x4 / / --- cacheline 4 boundary (256 bytes) --- / <good> u32 flows; / 0x100 0x4 / u32 inactive_flows; / 0x104 0x4 / u32 throttled_flows; / 0x108 0x4 / / XXX 4 bytes hole, try to pack / u64 stat_throttled; / 0x110 0x8 / <better> struct qdisc_watchdog watchdog; / 0x118 0x48 / / --- cacheline 5 boundary (320 bytes) was 32 bytes ago --- / u64 stat_gc_flows; / 0x160 0x8 / u64 stat_internal_packets; / 0x168 0x8 / u64 stat_ce_mark; / 0x170 0x8 / u64 stat_horizon_drops; / 0x178 0x8 / / --- cacheline 6 boundary (384 bytes) --- / u64 stat_horizon_caps; / 0x180 0x8 / u64 stat_flows_plimit; / 0x188 0x8 / u64 stat_pkts_too_long; / 0x190 0x8 / u64 stat_allocation_errors; / 0x198 0x8 / / Force padding: / u64 :64; u64 :64; u64 :64; u64 :64; / size: 448, cachelines: 7, members: 34 / / sum members: 411, holes: 2, sum holes: 5 / / padding: 32 / / paddings: 1, sum paddings: 16 / / forced alignments: 1 */ }; Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-01 13:20:36 +01:00
Eric Dumazet	bbf80d713f	tcp: derive delack_max from rto_min While BPF allows to set icsk->->icsk_delack_max and/or icsk->icsk_rto_min, we have an ip route attribute (RTAX_RTO_MIN) to be able to tune rto_min, but nothing to consequently adjust max delayed ack, which vary from 40ms to 200 ms (TCP_DELACK_{MIN\|MAX}). This makes RTAX_RTO_MIN of almost no practical use, unless customers are in big trouble. Modern days datacenter communications want to set rto_min to ~5 ms, and the max delayed ack one jiffie smaller to avoid spurious retransmits. After this patch, an "rto_min 5" route attribute will effectively lower max delayed ack timers to 4 ms. Note in the following ss output, "rto:6 ... ato:4" $ ss -temoi dst XXXXXX State Recv-Q Send-Q Local Address:Port Peer Address:Port Process ESTAB 0 0 [2002:a05:6608:295::]:52950 [2002:a05:6608:297::]:41597 ino:255134 sk:1001 <-> skmem:(r0,rb1707063,t872,tb262144,f0,w0,o0,bl0,d0) ts sack cubic wscale:8,8 rto:6 rtt:0.02/0.002 ato:4 mss:4096 pmtu:4500 rcvmss:536 advmss:4096 cwnd:10 bytes_sent:54823160 bytes_acked:54823121 bytes_received:54823120 segs_out:1370582 segs_in:1370580 data_segs_out:1370579 data_segs_in:1370578 send 16.4Gbps pacing_rate 32.6Gbps delivery_rate 1.72Gbps delivered:1370579 busy:26920ms unacked:1 rcv_rtt:34.615 rcv_space:65920 rcv_ssthresh:65535 minrtt:0.015 snd_wnd:65536 While we could argue this patch fixes a bug with RTAX_RTO_MIN, I do not add a Fixes: tag, so that we can soak it a bit before asking backports to stable branches. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-01 13:13:01 +01:00
Johannes Berg	aa75cc029e	wifi: mac80211: add back SPDX identifier Looks like I lost that by accident, add it back. Fixes: `076fc8775d` ("wifi: cfg80211: remove wdev mutex") Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2023-09-29 23:21:33 +02:00
Johannes Berg	c419d88455	wifi: mac80211: fix ieee80211_drop_unencrypted_mgmt return type/value Somehow, I managed to botch this and pretty much completely break wifi. My original patch did contain these changes, but I seem to have lost them before sending to the list. Fix it now. Reported-and-tested-by: Kalle Valo <kvalo@kernel.org> Fixes: `6c02fab724` ("wifi: mac80211: split ieee80211_drop_unencrypted_mgmt() return value") Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2023-09-29 23:21:15 +02:00
Jakub Sitnicki	b80e31baa4	bpf, sockmap: Reject sk_msg egress redirects to non-TCP sockets With a SOCKMAP/SOCKHASH map and an sk_msg program user can steer messages sent from one TCP socket (s1) to actually egress from another TCP socket (s2): tcp_bpf_sendmsg(s1) // = sk_prot->sendmsg tcp_bpf_send_verdict(s1) // __SK_REDIRECT case tcp_bpf_sendmsg_redir(s2) tcp_bpf_push_locked(s2) tcp_bpf_push(s2) tcp_rate_check_app_limited(s2) // expects tcp_sock tcp_sendmsg_locked(s2) // ditto There is a hard-coded assumption in the call-chain, that the egress socket (s2) is a TCP socket. However in commit `122e6c79ef` ("sock_map: Update sock type checks for UDP") we have enabled redirects to non-TCP sockets. This was done for the sake of BPF sk_skb programs. There was no indention to support sk_msg send-to-egress use case. As a result, attempts to send-to-egress through a non-TCP socket lead to a crash due to invalid downcast from sock to tcp_sock: BUG: kernel NULL pointer dereference, address: 000000000000002f ... Call Trace: <TASK> ? show_regs+0x60/0x70 ? __die+0x1f/0x70 ? page_fault_oops+0x80/0x160 ? do_user_addr_fault+0x2d7/0x800 ? rcu_is_watching+0x11/0x50 ? exc_page_fault+0x70/0x1c0 ? asm_exc_page_fault+0x27/0x30 ? tcp_tso_segs+0x14/0xa0 tcp_write_xmit+0x67/0xce0 __tcp_push_pending_frames+0x32/0xf0 tcp_push+0x107/0x140 tcp_sendmsg_locked+0x99f/0xbb0 tcp_bpf_push+0x19d/0x3a0 tcp_bpf_sendmsg_redir+0x55/0xd0 tcp_bpf_send_verdict+0x407/0x550 tcp_bpf_sendmsg+0x1a1/0x390 inet_sendmsg+0x6a/0x70 sock_sendmsg+0x9d/0xc0 ? sockfd_lookup_light+0x12/0x80 __sys_sendto+0x10e/0x160 ? syscall_enter_from_user_mode+0x20/0x60 ? __this_cpu_preempt_check+0x13/0x20 ? lockdep_hardirqs_on+0x82/0x110 __x64_sys_sendto+0x1f/0x30 do_syscall_64+0x38/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd Reject selecting a non-TCP sockets as redirect target from a BPF sk_msg program to prevent the crash. When attempted, user will receive an EACCES error from send/sendto/sendmsg() syscall. Fixes: `122e6c79ef` ("sock_map: Update sock type checks for UDP") Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20230920102055.42662-1-jakub@cloudflare.com	2023-09-29 17:11:07 +02:00
John Fastabend	da9e915eaf	bpf, sockmap: Do not inc copied_seq when PEEK flag set When data is peek'd off the receive queue we shouldn't considered it copied from tcp_sock side. When we increment copied_seq this will confuse tcp_data_ready() because copied_seq can be arbitrarily increased. From application side it results in poll() operations not waking up when expected. Notice tcp stack without BPF recvmsg programs also does not increment copied_seq. We broke this when we moved copied_seq into recvmsg to only update when actual copy was happening. But, it wasn't working correctly either before because the tcp_data_ready() tried to use the copied_seq value to see if data was read by user yet. See fixes tags. Fixes: `e5c6de5fa0` ("bpf, sockmap: Incorrectly handling copied_seq") Fixes: `04919bed94` ("tcp: Introduce tcp_read_skb()") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20230926035300.135096-3-john.fastabend@gmail.com	2023-09-29 17:05:00 +02:00
John Fastabend	9b7177b1df	bpf: tcp_read_skb needs to pop skb regardless of seq Before fix `e5c6de5fa0` tcp_read_skb() would increment the tp->copied-seq value. This (as described in the commit) would cause an error for apps because once that is incremented the application might believe there is no data to be read. Then some apps would stall or abort believing no data is available. However, the fix is incomplete because it introduces another issue in the skb dequeue. The loop does tcp_recv_skb() in a while loop to consume as many skbs as possible. The problem is the call is ... tcp_recv_skb(sk, seq, &offset) ... where 'seq' is: u32 seq = tp->copied_seq; Now we can hit a case where we've yet incremented copied_seq from BPF side, but then tcp_recv_skb() fails this test ... if (offset < skb->len \|\| (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)) ... so that instead of returning the skb we call tcp_eat_recv_skb() which frees the skb. This is because the routine believes the SKB has been collapsed per comment: /* This looks weird, but this can happen if TCP collapsing * splitted a fat GRO packet, while we released socket lock * in skb_splice_bits() */ This can't happen here we've unlinked the full SKB and orphaned it. Anyways it would confuse any BPF programs if the data were suddenly moved underneath it. To fix this situation do simpler operation and just skb_peek() the data of the queue followed by the unlink. It shouldn't need to check this condition and tcp_read_skb() reads entire skbs so there is no need to handle the 'offset!=0' case as we would see in tcp_read_sock(). Fixes: `e5c6de5fa0` ("bpf, sockmap: Incorrectly handling copied_seq") Fixes: `04919bed94` ("tcp: Introduce tcp_read_skb()") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20230926035300.135096-2-john.fastabend@gmail.com	2023-09-29 17:04:07 +02:00
Phil Sutter	013714bf3e	netfilter: nf_tables: Utilize NLA_POLICY_NESTED_ARRAY Mark attributes which are supposed to be arrays of nested attributes with known content as such. Originally suggested for NFTA_RULE_EXPRESSIONS only, but does apply to others as well. Suggested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Florian Westphal <fw@strlen.de>	2023-09-28 16:31:29 +02:00
Pablo Neira Ayuso	aee1f692bf	netfilter: nf_tables: missing extended netlink error in lookup functions Set netlink extended error reporting for several lookup functions which allows userspace to infer what is the error cause. Reported-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>	2023-09-28 16:25:42 +02:00
Florian Westphal	e27c329511	netfilter: nf_nat: undo erroneous tcp edemux lookup after port clash In commit `03a3ca37e4` ("netfilter: nf_nat: undo erroneous tcp edemux lookup") I fixed a problem with source port clash resolution and DNAT. A very similar issue exists with REDIRECT (DNAT to local address) and port rewrites. Consider two port redirections done at prerouting hook: -p tcp --port 1111 -j REDIRECT --to-ports 80 -p tcp --port 1112 -j REDIRECT --to-ports 80 Its possible, however unlikely, that we get two connections sharing the same source port, i.e. saddr:12345 -> daddr:1111 saddr:12345 -> daddr:1112 This works on sender side because destination address is different. After prerouting, nat will change first syn packet to saddr:12345 -> daddr:80, stack will send a syn-ack back and 3whs completes. The second syn however will result in a source port clash: after dnat rewrite, new syn has saddr:12345 -> daddr:80 This collides with the reply direction of the first connection. The NAT engine will handle this in the input nat hook by also altering the source port, so we get for example saddr:13535 -> daddr:80 This allows the stack to send back a syn-ack to that address. Reverse NAT during POSTROUTING will rewrite the packet to daddr:1112 -> saddr:12345 again. Tuple will be unique on-wire and peer can process it normally. Problem is when ACK packet comes in: After prerouting, packet payload is mangled to saddr:12345 -> daddr:80. Early demux will assign the 3whs-completing ACK skb to the first connections' established socket. This will then elicit a challenge ack from the first connections' socket rather than complete the connection of the second. The second connection can never complete. Detect this condition by checking if the associated sockets port matches the conntrack entries reply tuple. If it doesn't, then input source address translation mangled payload after early demux and the found sk is incorrect. Discard this sk and let TCP stack do another lookup. Signed-off-by: Florian Westphal <fw@strlen.de>	2023-09-28 16:25:42 +02:00
Liang Chen	7c7dd1d649	pktgen: Introducing 'SHARED' flag for testing with non-shared skb Currently, skbs generated by pktgen always have their reference count incremented before transmission, causing their reference count to be always greater than 1, leading to two issues: 1. Only the code paths for shared skbs can be tested. 2. In certain situations, skbs can only be released by pktgen. To enhance testing comprehensiveness, we are introducing the "SHARED" flag to indicate whether an SKB is shared. This flag is enabled by default, aligning with the current behavior. However, disabling this flag allows skbs with a reference count of 1 to be transmitted. So we can test non-shared skbs and code paths where skbs are released within the stack. Signed-off-by: Liang Chen <liangchen.linux@gmail.com> Reviewed-by: Benjamin Poirier <bpoirier@nvidia.com> Link: https://lore.kernel.org/r/20230920125658.46978-2-liangchen.linux@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-09-28 16:25:14 +02:00
Liang Chen	057708a9ca	pktgen: Automate flag enumeration for unknown flag handling When specifying an unknown flag, it will print all available flags. Currently, these flags are provided as fixed strings, which requires manual updates when flags change. Replacing it with automated flag enumeration. Signed-off-by: Liang Chen <liangchen.linux@gmail.com> Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com> Link: https://lore.kernel.org/r/20230920125658.46978-1-liangchen.linux@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-09-28 16:25:14 +02:00
Anna Schumaker	26e8bfa30d	SUNRPC/TLS: Lock the lower_xprt during the tls handshake Otherwise we run the risk of having the lower_xprt freed from underneath us, causing an oops that looks like this: [ 224.150698] BUG: kernel NULL pointer dereference, address: 0000000000000018 [ 224.150951] #PF: supervisor read access in kernel mode [ 224.151117] #PF: error_code(0x0000) - not-present page [ 224.151278] PGD 0 P4D 0 [ 224.151361] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 224.151499] CPU: 2 PID: 99 Comm: kworker/u10:6 Not tainted 6.6.0-rc3-g6465e260f487 #41264 a00b0960990fb7bc6d6a330ee03588b67f08a47b [ 224.151977] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 2/2/2022 [ 224.152216] Workqueue: xprtiod xs_tcp_tls_setup_socket [sunrpc] [ 224.152434] RIP: 0010:xs_tcp_tls_setup_socket+0x3cc/0x7e0 [sunrpc] [ 224.152643] Code: 00 00 48 8b 7c 24 08 e9 f3 01 00 00 48 83 7b c0 00 0f 85 d2 01 00 00 49 8d 84 24 f8 05 00 00 48 89 44 24 10 48 8b 00 48 89 c5 <4c> 8b 68 18 66 41 83 3f 0a 75 71 45 31 ff 4c 89 ef 31 f6 e8 5c 76 [ 224.153246] RSP: 0018:ffffb00ec060fd18 EFLAGS: 00010246 [ 224.153427] RAX: 0000000000000000 RBX: ffff8c06c2e53e40 RCX: 0000000000000001 [ 224.153652] RDX: ffff8c073bca2408 RSI: 0000000000000282 RDI: ffff8c06c259ee00 [ 224.153868] RBP: 0000000000000000 R08: ffffffff9da55aa0 R09: 0000000000000001 [ 224.154084] R10: 00000034306c30f1 R11: 0000000000000002 R12: ffff8c06c2e51800 [ 224.154300] R13: ffff8c06c355d400 R14: 0000000004208160 R15: ffff8c06c2e53820 [ 224.154521] FS: 0000000000000000(0000) GS:ffff8c073bd00000(0000) knlGS:0000000000000000 [ 224.154763] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 224.154940] CR2: 0000000000000018 CR3: 0000000062c1e000 CR4: 0000000000750ee0 [ 224.155157] PKRU: 55555554 [ 224.155244] Call Trace: [ 224.155325] <TASK> [ 224.155395] ? __die_body+0x68/0xb0 [ 224.155507] ? page_fault_oops+0x34c/0x3a0 [ 224.155635] ? _raw_spin_unlock_irqrestore+0xe/0x40 [ 224.155793] ? exc_page_fault+0x7a/0x1b0 [ 224.155916] ? asm_exc_page_fault+0x26/0x30 [ 224.156047] ? xs_tcp_tls_setup_socket+0x3cc/0x7e0 [sunrpc ae3a15912ae37fd51dafbdbc2dbd069117f8f5c8] [ 224.156367] ? xs_tcp_tls_setup_socket+0x2fe/0x7e0 [sunrpc ae3a15912ae37fd51dafbdbc2dbd069117f8f5c8] [ 224.156697] ? __pfx_xs_tls_handshake_done+0x10/0x10 [sunrpc ae3a15912ae37fd51dafbdbc2dbd069117f8f5c8] [ 224.157013] process_scheduled_works+0x24e/0x450 [ 224.157158] worker_thread+0x21c/0x2d0 [ 224.157275] ? __pfx_worker_thread+0x10/0x10 [ 224.157409] kthread+0xe8/0x110 [ 224.157510] ? __pfx_kthread+0x10/0x10 [ 224.157628] ret_from_fork+0x37/0x50 [ 224.157741] ? __pfx_kthread+0x10/0x10 [ 224.157859] ret_from_fork_asm+0x1b/0x30 [ 224.157983] </TASK> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2023-09-27 15:16:40 -04:00
Trond Myklebust	a275ab6260	Revert "SUNRPC dont update timeout value on connection reset" This reverts commit `88428cc4ae`. The problem this commit is intended to fix was comprehensively fixed in commit `7de62bc09f` ("SUNRPC dont update timeout value on connection reset"). Since then, this commit has been preventing the correct timeout of soft mounted requests. Cc: stable@vger.kernel.org # 5.9.x: `09252177d5`: SUNRPC: Handle major timeout in xprt_adjust_timeout() Cc: stable@vger.kernel.org # 5.9.x: `7de62bc09f`: SUNRPC dont update timeout value on connection reset Cc: stable@vger.kernel.org # 5.9.x Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2023-09-27 15:16:40 -04:00
Chuck Lever	5623ecfcbe	SUNRPC: Fail quickly when server does not recognize TLS rpcauth_checkverf() should return a distinct error code when a server recognizes the AUTH_TLS probe but does not support TLS so that the client's header decoder can respond appropriately and quickly. No retries are necessary is in this case, since the server has already affirmatively answered "TLS is unsupported". Suggested-by: Trond Myklebust <trondmy@hammerspace.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2023-09-27 15:16:40 -04:00
Johannes Berg	2a1c5c7de4	wifi: mac80211: expand __ieee80211_data_to_8023() status Make __ieee80211_data_to_8023() return more individual drop reasons instead of just doing RX_DROP_U_INVALID_8023. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2023-09-26 09:16:47 +02:00
Johannes Berg	6c02fab724	wifi: mac80211: split ieee80211_drop_unencrypted_mgmt() return value This has many different reasons, split the return value into the individual reasons for better traceability. Also, since symbolic tracing doesn't work for these, add a few comments for the numbering. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2023-09-26 09:16:45 +02:00
Johannes Berg	dccc9aa7ee	wifi: mac80211: remove RX_DROP_UNUSABLE Convert all instances of RX_DROP_UNUSABLE to indicate a better reason, and then remove RX_DROP_UNUSABLE. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2023-09-26 09:16:42 +02:00
Johannes Berg	583058542f	wifi: mac80211: fix check for unusable RX result If we just check "result & RX_DROP_UNUSABLE", this really only works by accident, because SKB_DROP_REASON_SUBSYS_MAC80211_UNUSABLE got to have the value 1, and SKB_DROP_REASON_SUBSYS_MAC80211_MONITOR is 2. Fix this to really check the entire subsys mask for the value, so it doesn't matter what the subsystem value is. Fixes: `7f4e09700b` ("wifi: mac80211: report all unusable beacon frames") Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2023-09-26 09:16:11 +02:00
Johannes Berg	e406f29150	wifi: cfg80211: add local_state_change to deauth trace Add the local_state_change request to the deauth trace for easier debugging. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2023-09-26 09:16:00 +02:00
Benjamin Berg	aaba3cd33f	wifi: mac80211: Create resources for disabled links When associating to an MLD AP, links may be disabled. Create all resources associated with a disabled link so that we can later enable it without having to create these resources on the fly. Fixes: `6d543b34db` ("wifi: mac80211: Support disabled links during association") Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://lore.kernel.org/r/20230925173028.f9afdb26f6c7.I4e6e199aaefc1bf017362d64f3869645fa6830b5@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2023-09-26 09:12:47 +02:00
Benjamin Berg	334bf33eec	wifi: cfg80211: avoid leaking stack data into trace If the structure is not initialized then boolean types might be copied into the tracing data without being initialised. This causes data from the stack to leak into the trace and also triggers a UBSAN failure which can easily be avoided here. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://lore.kernel.org/r/20230925171855.a9271ef53b05.I8180bae663984c91a3e036b87f36a640ba409817@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2023-09-26 09:12:27 +02:00
Wen Gong	61304336c6	wifi: mac80211: allow transmitting EAPOL frames with tainted key Lower layer device driver stop/wake TX by calling ieee80211_stop_queue()/ ieee80211_wake_queue() while hw scan. Sometimes hw scan and PTK rekey are running in parallel, when M4 sent from wpa_supplicant arrive while the TX queue is stopped, then the M4 will pending send, and then new key install from wpa_supplicant. After TX queue wake up by lower layer device driver, the M4 will be dropped by below call stack. When key install started, the current key flag is set KEY_FLAG_TAINTED in ieee80211_pairwise_rekey(), and then mac80211 wait key install complete by lower layer device driver. Meanwhile ieee80211_tx_h_select_key() will return TX_DROP for the M4 in step 12 below, and then ieee80211_free_txskb() called by ieee80211_tx_dequeue(), so the M4 will not send and free, then the rekey process failed becaue AP not receive M4. Please see details in steps below. There are a interval between KEY_FLAG_TAINTED set for current key flag and install key complete by lower layer device driver, the KEY_FLAG_TAINTED is set in this interval, all packet including M4 will be dropped in this interval, the interval is step 8~13 as below. issue steps: TX thread install key thread 1. stop_queue -idle- 2. sending M4 -idle- 3. M4 pending -idle- 4. -idle- starting install key from wpa_supplicant 5. -idle- =>ieee80211_key_replace() 6. -idle- =>ieee80211_pairwise_rekey() and set currently key->flags \|= KEY_FLAG_TAINTED 7. -idle- =>ieee80211_key_enable_hw_accel() 8. -idle- =>drv_set_key() and waiting key install complete from lower layer device driver 9. wake_queue -waiting state- 10. re-sending M4 -waiting state- 11. =>ieee80211_tx_h_select_key() -waiting state- 12. drop M4 by KEY_FLAG_TAINTED -waiting state- 13. -idle- install key complete with success/fail success: clear flag KEY_FLAG_TAINTED fail: start disconnect Hence add check in step 11 above to allow the EAPOL send out in the interval. If lower layer device driver use the old key/cipher to encrypt the M4, then AP received/decrypt M4 correctly, after M4 send out, lower layer device driver install the new key/cipher to hardware and return success. If lower layer device driver use new key/cipher to send the M4, then AP will/should drop the M4, then it is same result with this issue, AP will/ should kick out station as well as this issue. issue log: kworker/u16:4-5238 [000] 6456.108926: stop_queue: phy1 queue:0, reason:0 wpa_supplicant-961 [003] 6456.119737: rdev_tx_control_port: wiphy_name=phy1 name=wlan0 ifindex=6 dest=ARRAY[9e, 05, 31, 20, 9b, d0] proto=36488 unencrypted=0 wpa_supplicant-961 [003] 6456.119839: rdev_return_int_cookie: phy1, returned 0, cookie: 504 wpa_supplicant-961 [003] 6456.120287: rdev_add_key: phy1, netdev:wlan0(6), key_index: 0, mode: 0, pairwise: true, mac addr: 9e:05:31:20:9b:d0 wpa_supplicant-961 [003] 6456.120453: drv_set_key: phy1 vif:wlan0(2) sta:9e:05:31:20:9b:d0 cipher:0xfac04, flags=0x9, keyidx=0, hw_key_idx=0 kworker/u16:9-3829 [001] 6456.168240: wake_queue: phy1 queue:0, reason:0 kworker/u16:9-3829 [001] 6456.168255: drv_wake_tx_queue: phy1 vif:wlan0(2) sta:9e:05:31:20:9b:d0 ac:0 tid:7 kworker/u16:9-3829 [001] 6456.168305: cfg80211_control_port_tx_status: wdev(1), cookie: 504, ack: false wpa_supplicant-961 [003] 6459.167982: drv_return_int: phy1 - -110 issue call stack: nl80211_frame_tx_status+0x230/0x340 [cfg80211] cfg80211_control_port_tx_status+0x1c/0x28 [cfg80211] ieee80211_report_used_skb+0x374/0x3e8 [mac80211] ieee80211_free_txskb+0x24/0x40 [mac80211] ieee80211_tx_dequeue+0x644/0x954 [mac80211] ath10k_mac_tx_push_txq+0xac/0x238 [ath10k_core] ath10k_mac_op_wake_tx_queue+0xac/0xe0 [ath10k_core] drv_wake_tx_queue+0x80/0x168 [mac80211] __ieee80211_wake_txqs+0xe8/0x1c8 [mac80211] _ieee80211_wake_txqs+0xb4/0x120 [mac80211] ieee80211_wake_txqs+0x48/0x80 [mac80211] tasklet_action_common+0xa8/0x254 tasklet_action+0x2c/0x38 __do_softirq+0xdc/0x384 Signed-off-by: Wen Gong <quic_wgong@quicinc.com> Link: https://lore.kernel.org/r/20230801064751.25803-1-quic_wgong@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2023-09-25 09:32:01 +02:00
Benjamin Berg	1228c74941	wifi: mac80211: reject MLO channel configuration if not supported Reject configuring a channel for MLO if either EHT is not supported or the BSS does not have the correct ML element. This avoids trying to do a multi-link association with a misconfigured AP. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230920211508.80c3b8e5a344.Iaa2d466ee6280994537e1ae7ab9256a27934806f@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2023-09-25 09:12:34 +02:00
Benjamin Berg	4aa0644845	wifi: mac80211: report per-link error during association With this cfg80211 can report the link that caused the error to userspace which is then able to react to it by e.g. removing the link from the association and retrying. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230920211508.275fc7f5c426.I8086c0fdbbf92537d6a8b8e80b33387fcfd5553d@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2023-09-25 09:12:34 +02:00
Benjamin Berg	a7b2cc591d	wifi: cfg80211: report per-link errors during association When one of the links (other than the assoc_link) is misconfigured and cannot work the association will fail. However, userspace was not able to tell that the operation only failed because of a problem with one of the links. Fix this, by allowing the driver to set a per-link error code and reporting the (first) offending link by setting the bad_attr accordingly. This only allows us to report the first error, but that is sufficient for userspace to e.g. remove the offending link and retry. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230920211508.ebe63c0bd513.I40799998f02bf987acee1501a2522dc98bb6eb5a@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2023-09-25 09:12:34 +02:00
Johannes Berg	ef246a1480	wifi: mac80211: support antenna control in injection Support antenna control for injection by parsing the antenna radiotap field (which may be presented multiple times) and telling the driver about the resulting antenna bitmap. Of course there's no guarantee the driver will actually honour this, just like any other injection control. If misconfigured, i.e. the injected HT/VHT MCS needs more chains than antennas are configured, the bitmap is reset to zero, indicating no selection. For now this is only set up for two anntenas so we keep more free bits, but that can be trivially extended if any driver implements support for it that can deal with hardware with more antennas. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230920211508.f71001aa4da9.I00ccb762a806ea62bc3d728fa3a0d29f4f285eeb@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2023-09-25 09:12:34 +02:00
Ayala Beker	702e80470a	wifi: mac80211: support handling of advertised TID-to-link mapping Support handling of advertised TID-to-link mapping elements received in a beacon. These elements are used by AP MLD to disable specific links and force all clients to stop using these links. By default if no TID-to-link mapping is advertised, all TIDs shall be mapped to all links. Signed-off-by: Ayala Beker <ayala.beker@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230920211508.623c4b692ff9.Iab0a6f561d85b8ab6efe541590985a2b6e9e74aa@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2023-09-25 09:12:34 +02:00
Ayala Beker	62e9c64eed	wifi: mac80211: add support for parsing TID to Link mapping element Add the relevant definitions for TID to Link mapping element according to the P802.11be_D4.0. Signed-off-by: Ayala Beker <ayala.beker@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230920211508.9ea9b0b4412a.I2281ab2c70e8b43a39032dc115db6a80f1f0b3f4@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2023-09-25 09:12:34 +02:00
Ilan Peer	041a74cbe4	wifi: mac80211: Notify the low level driver on change in MLO valid links Notify the low level driver when there is change in the valid links. Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230920211508.4fc85b0a51b0.I64238e0e892709a2bd4764b3bca93cdcf021e2fd@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2023-09-25 09:12:33 +02:00
Johannes Berg	cef7104720	wifi: mac80211: describe return values in kernel-doc Add descriptions for two return values for two functions that are missing them. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230920211508.79307c341723.Ibae386f0354f2e215d4955752ac378acc2466b51@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2023-09-25 09:12:33 +02:00
Johannes Berg	87cd646f61	wifi: cfg80211: reg: describe return values in kernel-doc Describe the function return values in kernel-doc. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230920211508.8b1e45c8bab8.I6dbae4f6dfe8f5352bc44565cc5131e73dd1873f@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2023-09-25 09:12:33 +02:00
Ayala Beker	c09c4f3199	wifi: mac80211: don't connect to an AP while it's in a CSA process Connection to an AP that is running a CSA flow may end up with a failure as the AP might change its channel during the connection flow while we do not track the channel change yet. Avoid that by rejecting a connection to such an AP. Signed-off-by: Ayala Beker <ayala.beker@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230920211508.e5001a762a4a.I9745c695f3403b259ad000ce94110588a836c04a@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2023-09-25 09:12:32 +02:00
Emmanuel Grumbach	2bf57b00ab	wifi: mac80211: update the rx_chains after set_antenna() rx_chains was set only upon registration and it we rely on it for the active chains upon SMPS configuration after association. When we use the set_antenna() API to limit the rx_chains from 2 to 1, this caused issues with iwlwifi since we still had 2 active_chains requested. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230920211508.2dde4da246b2.I904223c868c77cf2ba132a3088fe6506fcbb443b@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2023-09-25 09:12:32 +02:00
Johannes Berg	b323949835	wifi: mac80211: use bandwidth indication element for CSA In CSA, parse the (EHT) bandwidth indication element and use it (in fact prefer it if present). Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230920211508.43ef01920556.If4f24a61cd634ab1e50eba43899b9e992bf25602@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2023-09-25 09:12:32 +02:00
Johannes Berg	bb55441c57	wifi: cfg80211: split struct cfg80211_ap_settings Using the full struct cfg80211_ap_settings for an update is misleading, since most settings cannot be updated. Split the update case off into a new struct cfg80211_ap_update. Change-Id: I3ba4dd9280938ab41252f145227a7005edf327e4 Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2023-09-25 09:00:39 +02:00
Johannes Berg	6b348f6e34	wifi: mac80211: ethtool: always hold wiphy mutex Drivers should really be able to rely on the wiphy mutex being held all the time, unless otherwise documented. For ethtool, that wasn't quite right. Fix and clarify this in both code and documentation. Reported-by: syzbot+c12a771b218dcbba32e1@syzkaller.appspotmail.com Fixes: `0e8185ce1d` ("wifi: mac80211: check wiphy mutex in ops") Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2023-09-25 09:00:39 +02:00
Johannes Berg	084cf2aeca	wifi: mac80211: work around Cisco AP 9115 VHT MPDU length Cisco AP module 9115 with FW 17.3 has a bug and sends a too large maximum MPDU length in the association response (indicating 12k) that it cannot actually process. Work around that by taking the minimum between what's in the association response and the BSS elements (from beacon or probe response). Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230918140607.d1966a9a532e.I090225babb7cd4d1081ee9acd40e7de7e41c15ae@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2023-09-25 08:41:27 +02:00
Ilan Peer	0914468adf	wifi: cfg80211: Fix 6GHz scan configuration When the scan request includes a non broadcast BSSID, when adding the scan parameters for 6GHz collocated scanning, do not include entries that do not match the given BSSID. Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230918140607.6d31d2a96baf.I6c4e3e3075d1d1878ee41f45190fdc6b86f18708@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2023-09-25 08:41:11 +02:00
Colin Ian King	5b43bd71f4	wifi: cfg80211: make read-only array centers_80mhz static const Don't populate the read-only array lanes on the stack, instead make it static const. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://lore.kernel.org/r/20230919095205.24949-1-colin.i.king@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2023-09-25 08:40:35 +02:00
Johannes Berg	d097ae01eb	wifi: mac80211: fix potential key leak When returning from ieee80211_key_link(), the key needs to have been freed or successfully installed. This was missed in a number of error paths, fix it. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2023-09-25 08:40:07 +02:00
Johannes Berg	31db78a492	wifi: mac80211: fix potential key use-after-free When ieee80211_key_link() is called by ieee80211_gtk_rekey_add() but returns 0 due to KRACK protection (identical key reinstall), ieee80211_gtk_rekey_add() will still return a pointer into the key, in a potential use-after-free. This normally doesn't happen since it's only called by iwlwifi in case of WoWLAN rekey offload which has its own KRACK protection, but still better to fix, do that by returning an error code and converting that to success on the cfg80211 boundary only, leaving the error for bad callers of ieee80211_gtk_rekey_add(). Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Fixes: `fdf7cb4185` ("mac80211: accept key reinstall without changing anything") Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2023-09-25 08:40:04 +02:00
Paolo Abeni	e9cbc89067	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR. No conflicts. Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-09-21 21:49:45 +02:00
Linus Torvalds	27bbf45eae	Networking fixes for 6.6-rc2, including fixes from netfilter and bpf Current release - regressions: - bpf: adjust size_index according to the value of KMALLOC_MIN_SIZE - netfilter: fix entries val in rule reset audit log - eth: stmmac: fix incorrect rxq\|txq_stats reference Previous releases - regressions: - ipv4: fix null-deref in ipv4_link_failure - netfilter: - fix several GC related issues - fix race between IPSET_CMD_CREATE and IPSET_CMD_SWAP - eth: team: fix null-ptr-deref when team device type is changed - eth: i40e: fix VF VLAN offloading when port VLAN is configured - eth: ionic: fix 16bit math issue when PAGE_SIZE >= 64KB Previous releases - always broken: - core: fix ETH_P_1588 flow dissector - mptcp: fix several connection hang-up conditions - bpf: - avoid deadlock when using queue and stack maps from NMI - add override check to kprobe multi link attach - hsr: properly parse HSRv1 supervisor frames. - eth: igc: fix infinite initialization loop with early XDP redirect - eth: octeon_ep: fix tx dma unmap len values in SG - eth: hns3: fix GRE checksum offload issue Signed-off-by: Paolo Abeni <pabeni@redhat.com> -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmUMFG8SHHBhYmVuaUBy ZWRoYXQuY29tAAoJECkkeY3MjxOksHAP+QE2eNf5yxo86dIS+3RQnOQ8kFBnNbEn 04lrheGnzG7PpNnGoCoTZna+xYQPYVLgbmmip2/CFnQvnQIsKyLQfCui85sfV2V9 KjUeE/kTgeC+jUQOWNDyz3zDP/MPC2LmiK8Gwyggvm9vFYn5tVZXC36aPZBZ7Vok /DUW6iXyl31SeVGOOEKakcwn0GIYJSABhVFNsjrDe4tV+leUwvf8obAq3ZWxOGaU D94ez28lSXgfOSWfQQ/l1rHI/yC0fr8HYyWJ60dNG2uS3fNEqT8LyqZfAUK24kVz XbAGZa+GA7CDq3cVsU7vCWNWbB5fO+kXtmGOwPtuKtJQM5LPo4X77CuSHlpzdyvq TuW0vxeVfdzAYVb3Zg+2QgWxDJjY0B8ujwdDWrnnKTPu4Ylhn6HLISXIlkMBoGwT 1/47TCnmn9t+lGagkMADppRRnJotHWObQG5wkzksqVa2CUB0HTESgbrm4rsxe6Ku JiZhHbTiiPWy7LgY6EFtj/YGPvLs0CSltvh4QUsd+QtDTM/EN7y3HcHqkv88ropG bSvJIh6WXdEJkwfSUdA0LECXSC6dizzZW2Y1glnT+7FMlhE1jVY4gruNJ37mCYMb 0gh9Zr76c2KYLA5vljGp6uo3j3A7wARJTdLfRFVcaFoz6NQmuFf9ZdBfDNDcymxs AGvO3j55JAZf =AoVg -----END PGP SIGNATURE----- Merge tag 'net-6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from netfilter and bpf. Current release - regressions: - bpf: adjust size_index according to the value of KMALLOC_MIN_SIZE - netfilter: fix entries val in rule reset audit log - eth: stmmac: fix incorrect rxq\|txq_stats reference Previous releases - regressions: - ipv4: fix null-deref in ipv4_link_failure - netfilter: - fix several GC related issues - fix race between IPSET_CMD_CREATE and IPSET_CMD_SWAP - eth: team: fix null-ptr-deref when team device type is changed - eth: i40e: fix VF VLAN offloading when port VLAN is configured - eth: ionic: fix 16bit math issue when PAGE_SIZE >= 64KB Previous releases - always broken: - core: fix ETH_P_1588 flow dissector - mptcp: fix several connection hang-up conditions - bpf: - avoid deadlock when using queue and stack maps from NMI - add override check to kprobe multi link attach - hsr: properly parse HSRv1 supervisor frames. - eth: igc: fix infinite initialization loop with early XDP redirect - eth: octeon_ep: fix tx dma unmap len values in SG - eth: hns3: fix GRE checksum offload issue" * tag 'net-6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits) sfc: handle error pointers returned by rhashtable_lookup_get_insert_fast() igc: Expose tx-usecs coalesce setting to user octeontx2-pf: Do xdp_do_flush() after redirects. bnxt_en: Flush XDP for bnxt_poll_nitroa0()'s NAPI net: ena: Flush XDP packets on error. net/handshake: Fix memory leak in __sock_create() and sock_alloc_file() net: hinic: Fix warning-hinic_set_vlan_fliter() warn: variable dereferenced before check 'hwdev' netfilter: ipset: Fix race between IPSET_CMD_CREATE and IPSET_CMD_SWAP netfilter: nf_tables: fix memleak when more than 255 elements expired netfilter: nf_tables: disable toggling dormant table state more than once vxlan: Add missing entries to vxlan_get_size() net: rds: Fix possible NULL-pointer dereference team: fix null-ptr-deref when team device type is changed net: bridge: use DEV_STATS_INC() net: hns3: add 5ms delay before clear firmware reset irq source net: hns3: fix fail to delete tc flower rules during reset issue net: hns3: only enable unicast promisc when mac table full net: hns3: fix GRE checksum offload issue net: hns3: add cmdq check for vf periodic service task net: stmmac: fix incorrect rxq\|txq_stats reference ...	2023-09-21 11:28:16 -07:00
Arseniy Krasnov	581512a6dc	vsock/virtio: MSG_ZEROCOPY flag support This adds handling of MSG_ZEROCOPY flag on transmission path: 1) If this flag is set and zerocopy transmission is possible (enabled in socket options and transport allows zerocopy), then non-linear skb will be created and filled with the pages of user's buffer. Pages of user's buffer are locked in memory by 'get_user_pages()'. 2) Replaces way of skb owning: instead of 'skb_set_owner_sk_safe()' it calls 'skb_set_owner_w()'. Reason of this change is that '__zerocopy_sg_from_iter()' increments 'sk_wmem_alloc' of socket, so to decrease this field correctly, proper skb destructor is needed: 'sock_wfree()'. This destructor is set by 'skb_set_owner_w()'. 3) Adds new callback to 'struct virtio_transport': 'can_msgzerocopy'. If this callback is set, then transport needs extra check to be able to send provided number of buffers in zerocopy mode. Currently, the only transport that needs this callback set is virtio, because this transport adds new buffers to the virtio queue and we need to check, that number of these buffers is less than size of the queue (it is required by virtio spec). vhost and loopback transports don't need this check. Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-09-21 12:34:00 +02:00
Arseniy Krasnov	4b0bf10eb0	vsock/virtio: non-linear skb handling for tap For tap device new skb is created and data from the current skb is copied to it. This adds copying data from non-linear skb to new the skb. Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-09-21 12:34:00 +02:00
Arseniy Krasnov	64c99d2d6a	vsock/virtio: support to send non-linear skb For non-linear skb use its pages from fragment array as buffers in virtio tx queue. These pages are already pinned by 'get_user_pages()' during such skb creation. Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-09-21 12:34:00 +02:00
Arseniy Krasnov	0df7cd3c13	vsock/virtio/vhost: read data from non-linear skb This is preparation patch for MSG_ZEROCOPY support. It adds handling of non-linear skbs by replacing direct calls of 'memcpy_to_msg()' with 'skb_copy_datagram_iter()'. Main advantage of the second one is that it can handle paged part of the skb by using 'kmap()' on each page, but if there are no pages in the skb, it behaves like simple copying to iov iterator. This patch also adds new field to the control block of skb - this value shows current offset in the skb to read next portion of data (it doesn't matter linear it or not). Idea behind this field is that 'skb_copy_datagram_iter()' handles both types of skb internally - it just needs an offset from which to copy data from the given skb. This offset is incremented on each read from skb. This approach allows to simplify handling of both linear and non-linear skbs, because for linear skb we need to call 'skb_pull()' after reading data from it, while in non-linear case we need to update 'data_len'. Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-09-21 12:34:00 +02:00
Paolo Abeni	ecf4392600	netfilter PR 2023-09-20 -----BEGIN PGP SIGNATURE----- iQJBBAABCAArFiEEgKkgxbID4Gn1hq6fcJGo2a1f9gAFAmUKru0NHGZ3QHN0cmxl bi5kZQAKCRBwkajZrV/2AOLvEACEDiOEB9yhlnx4ZENgNRHMG48qLXKyRs+0cVL/ PLmc+jNtq/doMLUXOkZUnoTKdr6W5JuTVHMK/BJOWu5QjHCCybeQScx7uIjG8Yl1 F6zufiZ1g+nzyQsMHh/xQazqb75qnc7XvhzeW2vCiasPIDAaFpzmscX5nDy1rx7Q HIpJCippnQ7Ami0X+qdH3SDgP4bK0+YXtLhtXLHGskb1yyP1QynjzUbDdZNHca88 sLn6OB36DCuANGDtgAFNo1dptkgT2aaE6XBGIVlpwtfo5KjJtzvGRIohxnbTcnSY n562GokPQXn+yeJD2ZKW5XpvXTEDSLimzpHm4wNTRjTyrG3kIy+JxBjBM5inZam+ c7AJjuAZlfgrxbk2J6/laiV2WjqazswHuvoqs4wND2rCgi7Cdw9cBJ7iXy6ijN0J oQS8Or0FnOU280hWEJLvp8jbdRzhksvRI5PHYaX5mvOy/iLKB9m6LIlHcjTMx0dh NyLVPLx7C8UNOsqNIBe5545FqiRi5z/VxCDdL8AbCIy8cLja8A3D4t2Otggiktx7 skQCrLSBXcxh8sdmDvUUDfjeRmthqDnIbICCPJlWAZIR8e4rxbh8HxJHXutO7eEs QH5CXJsW2psM/AcBrypqb2TQvsOAyGfL7XUt92GGOkHUjr333Qx2E4pGAsS7I+lz XSguQA== =dTwV -----END PGP SIGNATURE----- Merge tag 'nf-23-09-20' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Florian Westphal says: ==================== netfilter updates for net The following three patches fix regressions in the netfilter subsystem: 1. Reject attempts to repeatedly toggle the 'dormant' flag in a single transaction. Doing so makes nf_tables lose track of the real state vs. the desired state. This ends with an attempt to unregister hooks that were never registered in the first place, which yields a splat. 2. Fix element counting in the new nftables garbage collection infra that came with 6.5: More than 255 expired elements wraps a counter which results in memory leak. 3. Since 6.4 ipset can BUG when a set is renamed while a CREATE command is in progress, fix from Jozsef Kadlecsik. * tag 'nf-23-09-20' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: ipset: Fix race between IPSET_CMD_CREATE and IPSET_CMD_SWAP netfilter: nf_tables: fix memleak when more than 255 elements expired netfilter: nf_tables: disable toggling dormant table state more than once ==================== Link: https://lore.kernel.org/r/20230920084156.4192-1-fw@strlen.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-09-21 11:09:45 +02:00
Luiz Augusto von Dentz	b938790e70	Bluetooth: hci_codec: Fix leaking content of local_codecs The following memory leak can be observed when the controller supports codecs which are stored in local_codecs list but the elements are never freed: unreferenced object 0xffff88800221d840 (size 32): comm "kworker/u3:0", pid 36, jiffies 4294898739 (age 127.060s) hex dump (first 32 bytes): f8 d3 02 03 80 88 ff ff 80 d8 21 02 80 88 ff ff ..........!..... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffffb324f557>] __kmalloc+0x47/0x120 [<ffffffffb39ef37d>] hci_codec_list_add.isra.0+0x2d/0x160 [<ffffffffb39ef643>] hci_read_codec_capabilities+0x183/0x270 [<ffffffffb39ef9ab>] hci_read_supported_codecs+0x1bb/0x2d0 [<ffffffffb39f162e>] hci_read_local_codecs_sync+0x3e/0x60 [<ffffffffb39ff1b3>] hci_dev_open_sync+0x943/0x11e0 [<ffffffffb396d55d>] hci_power_on+0x10d/0x3f0 [<ffffffffb30c99b4>] process_one_work+0x404/0x800 [<ffffffffb30ca134>] worker_thread+0x374/0x670 [<ffffffffb30d9108>] kthread+0x188/0x1c0 [<ffffffffb304db6b>] ret_from_fork+0x2b/0x50 [<ffffffffb300206a>] ret_from_fork_asm+0x1a/0x30 Cc: stable@vger.kernel.org Fixes: `8961987f3f` ("Bluetooth: Enumerate local supported codec and cache details") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2023-09-20 11:03:11 -07:00
Luiz Augusto von Dentz	dcda165706	Bluetooth: hci_core: Fix build warnings This fixes the following warnings: net/bluetooth/hci_core.c: In function ‘hci_register_dev’: net/bluetooth/hci_core.c:2620:54: warning: ‘%d’ directive output may be truncated writing between 1 and 10 bytes into a region of size 5 [-Wformat-truncation=] 2620 \| snprintf(hdev->name, sizeof(hdev->name), "hci%d", id); \| ^~ net/bluetooth/hci_core.c:2620:50: note: directive argument in the range [0, 2147483647] 2620 \| snprintf(hdev->name, sizeof(hdev->name), "hci%d", id); \| ^~~~~~~ net/bluetooth/hci_core.c:2620:9: note: ‘snprintf’ output between 5 and 14 bytes into a destination of size 8 2620 \| snprintf(hdev->name, sizeof(hdev->name), "hci%d", id); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2023-09-20 11:02:41 -07:00
Ying Hsu	1d8e801422	Bluetooth: Avoid redundant authentication While executing the Android 13 CTS Verifier Secure Server test on a ChromeOS device, it was observed that the Bluetooth host initiates authentication for an RFCOMM connection after SSP completes. When this happens, some Intel Bluetooth controllers, like AC9560, would disconnect with "Connection Rejected due to Security Reasons (0x0e)". Historically, BlueZ did not mandate this authentication while an authenticated combination key was already in use for the connection. This behavior was changed since commit `7b5a9241b7` ("Bluetooth: Introduce requirements for security level 4"). So, this patch addresses the aforementioned disconnection issue by restoring the previous behavior. Signed-off-by: Ying Hsu <yinghsu@chromium.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2023-09-20 11:02:22 -07:00
Luiz Augusto von Dentz	e0275ea521	Bluetooth: ISO: Fix handling of listen for unicast iso_listen_cis shall only return -EADDRINUSE if the listening socket has the destination set to BDADDR_ANY otherwise if the destination is set to a specific address it is for broadcast which shall be ignored. Fixes: `f764a6c2c1` ("Bluetooth: ISO: Add broadcast support") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2023-09-20 11:02:02 -07:00
Ying Hsu	c7eaf80bfb	Bluetooth: Fix hci_link_tx_to RCU lock usage Syzbot found a bug "BUG: sleeping function called from invalid context at kernel/locking/mutex.c:580". It is because hci_link_tx_to holds an RCU read lock and calls hci_disconnect which would hold a mutex lock since the commit `a13f316e90` ("Bluetooth: hci_conn: Consolidate code for aborting connections"). Here's an example call trace: __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xfc/0x174 lib/dump_stack.c:106 ___might_sleep+0x4a9/0x4d3 kernel/sched/core.c:9663 __mutex_lock_common kernel/locking/mutex.c:576 [inline] __mutex_lock+0xc7/0x6e7 kernel/locking/mutex.c:732 hci_cmd_sync_queue+0x3a/0x287 net/bluetooth/hci_sync.c:388 hci_abort_conn+0x2cd/0x2e4 net/bluetooth/hci_conn.c:1812 hci_disconnect+0x207/0x237 net/bluetooth/hci_conn.c:244 hci_link_tx_to net/bluetooth/hci_core.c:3254 [inline] __check_timeout net/bluetooth/hci_core.c:3419 [inline] __check_timeout+0x310/0x361 net/bluetooth/hci_core.c:3399 hci_sched_le net/bluetooth/hci_core.c:3602 [inline] hci_tx_work+0xe8f/0x12d0 net/bluetooth/hci_core.c:3652 process_one_work+0x75c/0xba1 kernel/workqueue.c:2310 worker_thread+0x5b2/0x73a kernel/workqueue.c:2457 kthread+0x2f7/0x30b kernel/kthread.c:319 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:298 This patch releases RCU read lock before calling hci_disconnect and reacquires it afterward to fix the bug. Fixes: `a13f316e90` ("Bluetooth: hci_conn: Consolidate code for aborting connections") Signed-off-by: Ying Hsu <yinghsu@chromium.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2023-09-20 11:01:42 -07:00
Luiz Augusto von Dentz	941c998b42	Bluetooth: hci_sync: Fix handling of HCI_QUIRK_STRICT_DUPLICATE_FILTER When HCI_QUIRK_STRICT_DUPLICATE_FILTER is set LE scanning requires periodic restarts of the scanning procedure as the controller would consider device previously found as duplicated despite of RSSI changes, but in order to set the scan timeout properly set le_scan_restart needs to be synchronous so it shall not use hci_cmd_sync_queue which defers the command processing to cmd_sync_work. Cc: stable@vger.kernel.org Link: https://lore.kernel.org/linux-bluetooth/578e6d7afd676129decafba846a933f5@agner.ch/#t Fixes: `27d54b778a` ("Bluetooth: Rework le_scan_restart for hci_sync") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2023-09-20 11:00:49 -07:00
Yao Xiao	cbaabbcdcb	Bluetooth: Delete unused hci_req_prepare_suspend() declaration hci_req_prepare_suspend() has been deprecated in favor of hci_suspend_sync(). Fixes: `182ee45da0` ("Bluetooth: hci_sync: Rework hci_suspend_notifier") Signed-off-by: Yao Xiao <xiaoyao@rock-chips.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2023-09-20 10:55:29 -07:00
Zhang Changzhong	cc9b364bb1	xfrm6: fix inet6_dev refcount underflow problem There are race conditions that may lead to inet6_dev refcount underflow in xfrm6_dst_destroy() and rt6_uncached_list_flush_dev(). One of the refcount underflow bugs is shown below: (cpu 1) \| (cpu 2) xfrm6_dst_destroy() \| ... \| in6_dev_put() \| \| rt6_uncached_list_flush_dev() ... \| ... \| in6_dev_put() rt6_uncached_list_del() \| ... ... \| xfrm6_dst_destroy() calls rt6_uncached_list_del() after in6_dev_put(), so rt6_uncached_list_flush_dev() has a chance to call in6_dev_put() again for the same inet6_dev. Fix it by moving in6_dev_put() after rt6_uncached_list_del() in xfrm6_dst_destroy(). Fixes: `510c321b55` ("xfrm: reuse uncached_list to track xdsts") Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2023-09-20 13:03:59 +02:00
Jinjie Ruan	4a0f07d71b	net/handshake: Fix memory leak in __sock_create() and sock_alloc_file() When making CONFIG_DEBUG_KMEMLEAK=y and CONFIG_DEBUG_KMEMLEAK_AUTO_SCAN=y, modprobe handshake-test and then rmmmod handshake-test, the below memory leak is detected. The struct socket_alloc which is allocated by alloc_inode_sb() in __sock_create() is not freed. And the struct dentry which is allocated by __d_alloc() in sock_alloc_file() is not freed. Since fput() will call file->f_op->release() which is sock_close() here and it will call __sock_release(). and fput() will call dput(dentry) to free the struct dentry. So replace sock_release() with fput() to fix the below memory leak. After applying this patch, the following memory leak is never detected. unreferenced object 0xffff888109165840 (size 768): comm "kunit_try_catch", pid 1852, jiffies 4294685807 (age 976.262s) hex dump (first 32 bytes): 01 00 00 00 01 00 5a 5a 20 00 00 00 00 00 00 00 ......ZZ ....... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff8397993f>] sock_alloc_inode+0x1f/0x1b0 [<ffffffff81a2cb5b>] alloc_inode+0x5b/0x1a0 [<ffffffff81a32bed>] new_inode_pseudo+0xd/0x70 [<ffffffff8397889c>] sock_alloc+0x3c/0x260 [<ffffffff83979b46>] __sock_create+0x66/0x3d0 [<ffffffffa0209ba2>] 0xffffffffa0209ba2 [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90 [<ffffffff81236fc6>] kthread+0x2b6/0x380 [<ffffffff81096afd>] ret_from_fork+0x2d/0x70 [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20 unreferenced object 0xffff88810f472008 (size 192): comm "kunit_try_catch", pid 1852, jiffies 4294685808 (age 976.261s) hex dump (first 32 bytes): 00 00 50 40 02 00 00 00 00 00 00 00 00 00 00 00 ..P@............ 00 00 00 00 00 00 00 00 08 20 47 0f 81 88 ff ff ......... G..... backtrace: [<ffffffff81a1ff11>] __d_alloc+0x31/0x8a0 [<ffffffff81a2910e>] d_alloc_pseudo+0xe/0x50 [<ffffffff819d549e>] alloc_file_pseudo+0xce/0x210 [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0 [<ffffffffa0209bbb>] 0xffffffffa0209bbb [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90 [<ffffffff81236fc6>] kthread+0x2b6/0x380 [<ffffffff81096afd>] ret_from_fork+0x2d/0x70 [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20 unreferenced object 0xffff88810958e580 (size 224): comm "kunit_try_catch", pid 1852, jiffies 4294685808 (age 976.261s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 03 00 2e 08 01 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff819d4b90>] alloc_empty_file+0x50/0x160 [<ffffffff819d4cf9>] alloc_file+0x59/0x730 [<ffffffff819d5524>] alloc_file_pseudo+0x154/0x210 [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0 [<ffffffffa0209bbb>] 0xffffffffa0209bbb [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90 [<ffffffff81236fc6>] kthread+0x2b6/0x380 [<ffffffff81096afd>] ret_from_fork+0x2d/0x70 [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20 unreferenced object 0xffff88810926dc88 (size 192): comm "kunit_try_catch", pid 1854, jiffies 4294685809 (age 976.271s) hex dump (first 32 bytes): 00 00 50 40 02 00 00 00 00 00 00 00 00 00 00 00 ..P@............ 00 00 00 00 00 00 00 00 88 dc 26 09 81 88 ff ff ..........&..... backtrace: [<ffffffff81a1ff11>] __d_alloc+0x31/0x8a0 [<ffffffff81a2910e>] d_alloc_pseudo+0xe/0x50 [<ffffffff819d549e>] alloc_file_pseudo+0xce/0x210 [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0 [<ffffffffa0208fdc>] 0xffffffffa0208fdc [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90 [<ffffffff81236fc6>] kthread+0x2b6/0x380 [<ffffffff81096afd>] ret_from_fork+0x2d/0x70 [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20 unreferenced object 0xffff88810a241380 (size 224): comm "kunit_try_catch", pid 1854, jiffies 4294685809 (age 976.271s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 03 00 2e 08 01 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff819d4b90>] alloc_empty_file+0x50/0x160 [<ffffffff819d4cf9>] alloc_file+0x59/0x730 [<ffffffff819d5524>] alloc_file_pseudo+0x154/0x210 [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0 [<ffffffffa0208fdc>] 0xffffffffa0208fdc [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90 [<ffffffff81236fc6>] kthread+0x2b6/0x380 [<ffffffff81096afd>] ret_from_fork+0x2d/0x70 [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20 unreferenced object 0xffff888109165040 (size 768): comm "kunit_try_catch", pid 1856, jiffies 4294685811 (age 976.269s) hex dump (first 32 bytes): 01 00 00 00 01 00 5a 5a 20 00 00 00 00 00 00 00 ......ZZ ....... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff8397993f>] sock_alloc_inode+0x1f/0x1b0 [<ffffffff81a2cb5b>] alloc_inode+0x5b/0x1a0 [<ffffffff81a32bed>] new_inode_pseudo+0xd/0x70 [<ffffffff8397889c>] sock_alloc+0x3c/0x260 [<ffffffff83979b46>] __sock_create+0x66/0x3d0 [<ffffffffa0208860>] 0xffffffffa0208860 [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90 [<ffffffff81236fc6>] kthread+0x2b6/0x380 [<ffffffff81096afd>] ret_from_fork+0x2d/0x70 [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20 unreferenced object 0xffff88810926d568 (size 192): comm "kunit_try_catch", pid 1856, jiffies 4294685811 (age 976.269s) hex dump (first 32 bytes): 00 00 50 40 02 00 00 00 00 00 00 00 00 00 00 00 ..P@............ 00 00 00 00 00 00 00 00 68 d5 26 09 81 88 ff ff ........h.&..... backtrace: [<ffffffff81a1ff11>] __d_alloc+0x31/0x8a0 [<ffffffff81a2910e>] d_alloc_pseudo+0xe/0x50 [<ffffffff819d549e>] alloc_file_pseudo+0xce/0x210 [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0 [<ffffffffa0208879>] 0xffffffffa0208879 [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90 [<ffffffff81236fc6>] kthread+0x2b6/0x380 [<ffffffff81096afd>] ret_from_fork+0x2d/0x70 [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20 unreferenced object 0xffff88810a240580 (size 224): comm "kunit_try_catch", pid 1856, jiffies 4294685811 (age 976.347s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 03 00 2e 08 01 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff819d4b90>] alloc_empty_file+0x50/0x160 [<ffffffff819d4cf9>] alloc_file+0x59/0x730 [<ffffffff819d5524>] alloc_file_pseudo+0x154/0x210 [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0 [<ffffffffa0208879>] 0xffffffffa0208879 [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90 [<ffffffff81236fc6>] kthread+0x2b6/0x380 [<ffffffff81096afd>] ret_from_fork+0x2d/0x70 [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20 unreferenced object 0xffff888109164c40 (size 768): comm "kunit_try_catch", pid 1858, jiffies 4294685816 (age 976.342s) hex dump (first 32 bytes): 01 00 00 00 01 00 5a 5a 20 00 00 00 00 00 00 00 ......ZZ ....... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff8397993f>] sock_alloc_inode+0x1f/0x1b0 [<ffffffff81a2cb5b>] alloc_inode+0x5b/0x1a0 [<ffffffff81a32bed>] new_inode_pseudo+0xd/0x70 [<ffffffff8397889c>] sock_alloc+0x3c/0x260 [<ffffffff83979b46>] __sock_create+0x66/0x3d0 [<ffffffffa0208541>] 0xffffffffa0208541 [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90 [<ffffffff81236fc6>] kthread+0x2b6/0x380 [<ffffffff81096afd>] ret_from_fork+0x2d/0x70 [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20 unreferenced object 0xffff88810926cd18 (size 192): comm "kunit_try_catch", pid 1858, jiffies 4294685816 (age 976.342s) hex dump (first 32 bytes): 00 00 50 40 02 00 00 00 00 00 00 00 00 00 00 00 ..P@............ 00 00 00 00 00 00 00 00 18 cd 26 09 81 88 ff ff ..........&..... backtrace: [<ffffffff81a1ff11>] __d_alloc+0x31/0x8a0 [<ffffffff81a2910e>] d_alloc_pseudo+0xe/0x50 [<ffffffff819d549e>] alloc_file_pseudo+0xce/0x210 [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0 [<ffffffffa020855a>] 0xffffffffa020855a [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90 [<ffffffff81236fc6>] kthread+0x2b6/0x380 [<ffffffff81096afd>] ret_from_fork+0x2d/0x70 [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20 unreferenced object 0xffff88810a240200 (size 224): comm "kunit_try_catch", pid 1858, jiffies 4294685816 (age 976.342s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 03 00 2e 08 01 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff819d4b90>] alloc_empty_file+0x50/0x160 [<ffffffff819d4cf9>] alloc_file+0x59/0x730 [<ffffffff819d5524>] alloc_file_pseudo+0x154/0x210 [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0 [<ffffffffa020855a>] 0xffffffffa020855a [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90 [<ffffffff81236fc6>] kthread+0x2b6/0x380 [<ffffffff81096afd>] ret_from_fork+0x2d/0x70 [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20 unreferenced object 0xffff888109164840 (size 768): comm "kunit_try_catch", pid 1860, jiffies 4294685817 (age 976.416s) hex dump (first 32 bytes): 01 00 00 00 01 00 5a 5a 20 00 00 00 00 00 00 00 ......ZZ ....... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff8397993f>] sock_alloc_inode+0x1f/0x1b0 [<ffffffff81a2cb5b>] alloc_inode+0x5b/0x1a0 [<ffffffff81a32bed>] new_inode_pseudo+0xd/0x70 [<ffffffff8397889c>] sock_alloc+0x3c/0x260 [<ffffffff83979b46>] __sock_create+0x66/0x3d0 [<ffffffffa02093e2>] 0xffffffffa02093e2 [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90 [<ffffffff81236fc6>] kthread+0x2b6/0x380 [<ffffffff81096afd>] ret_from_fork+0x2d/0x70 [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20 unreferenced object 0xffff88810926cab8 (size 192): comm "kunit_try_catch", pid 1860, jiffies 4294685817 (age 976.416s) hex dump (first 32 bytes): 00 00 50 40 02 00 00 00 00 00 00 00 00 00 00 00 ..P@............ 00 00 00 00 00 00 00 00 b8 ca 26 09 81 88 ff ff ..........&..... backtrace: [<ffffffff81a1ff11>] __d_alloc+0x31/0x8a0 [<ffffffff81a2910e>] d_alloc_pseudo+0xe/0x50 [<ffffffff819d549e>] alloc_file_pseudo+0xce/0x210 [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0 [<ffffffffa02093fb>] 0xffffffffa02093fb [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90 [<ffffffff81236fc6>] kthread+0x2b6/0x380 [<ffffffff81096afd>] ret_from_fork+0x2d/0x70 [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20 unreferenced object 0xffff88810a240040 (size 224): comm "kunit_try_catch", pid 1860, jiffies 4294685817 (age 976.416s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 03 00 2e 08 01 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff819d4b90>] alloc_empty_file+0x50/0x160 [<ffffffff819d4cf9>] alloc_file+0x59/0x730 [<ffffffff819d5524>] alloc_file_pseudo+0x154/0x210 [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0 [<ffffffffa02093fb>] 0xffffffffa02093fb [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90 [<ffffffff81236fc6>] kthread+0x2b6/0x380 [<ffffffff81096afd>] ret_from_fork+0x2d/0x70 [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20 unreferenced object 0xffff888109166440 (size 768): comm "kunit_try_catch", pid 1862, jiffies 4294685819 (age 976.489s) hex dump (first 32 bytes): 01 00 00 00 01 00 5a 5a 20 00 00 00 00 00 00 00 ......ZZ ....... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff8397993f>] sock_alloc_inode+0x1f/0x1b0 [<ffffffff81a2cb5b>] alloc_inode+0x5b/0x1a0 [<ffffffff81a32bed>] new_inode_pseudo+0xd/0x70 [<ffffffff8397889c>] sock_alloc+0x3c/0x260 [<ffffffff83979b46>] __sock_create+0x66/0x3d0 [<ffffffffa02097c1>] 0xffffffffa02097c1 [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90 [<ffffffff81236fc6>] kthread+0x2b6/0x380 [<ffffffff81096afd>] ret_from_fork+0x2d/0x70 [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20 unreferenced object 0xffff88810926c398 (size 192): comm "kunit_try_catch", pid 1862, jiffies 4294685819 (age 976.489s) hex dump (first 32 bytes): 00 00 50 40 02 00 00 00 00 00 00 00 00 00 00 00 ..P@............ 00 00 00 00 00 00 00 00 98 c3 26 09 81 88 ff ff ..........&..... backtrace: [<ffffffff81a1ff11>] __d_alloc+0x31/0x8a0 [<ffffffff81a2910e>] d_alloc_pseudo+0xe/0x50 [<ffffffff819d549e>] alloc_file_pseudo+0xce/0x210 [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0 [<ffffffffa02097da>] 0xffffffffa02097da [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90 [<ffffffff81236fc6>] kthread+0x2b6/0x380 [<ffffffff81096afd>] ret_from_fork+0x2d/0x70 [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20 unreferenced object 0xffff888107e0b8c0 (size 224): comm "kunit_try_catch", pid 1862, jiffies 4294685819 (age 976.489s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 03 00 2e 08 01 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff819d4b90>] alloc_empty_file+0x50/0x160 [<ffffffff819d4cf9>] alloc_file+0x59/0x730 [<ffffffff819d5524>] alloc_file_pseudo+0x154/0x210 [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0 [<ffffffffa02097da>] 0xffffffffa02097da [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90 [<ffffffff81236fc6>] kthread+0x2b6/0x380 [<ffffffff81096afd>] ret_from_fork+0x2d/0x70 [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20 unreferenced object 0xffff888109164440 (size 768): comm "kunit_try_catch", pid 1864, jiffies 4294685821 (age 976.487s) hex dump (first 32 bytes): 01 00 00 00 01 00 5a 5a 20 00 00 00 00 00 00 00 ......ZZ ....... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff8397993f>] sock_alloc_inode+0x1f/0x1b0 [<ffffffff81a2cb5b>] alloc_inode+0x5b/0x1a0 [<ffffffff81a32bed>] new_inode_pseudo+0xd/0x70 [<ffffffff8397889c>] sock_alloc+0x3c/0x260 [<ffffffff83979b46>] __sock_create+0x66/0x3d0 [<ffffffffa020824e>] 0xffffffffa020824e [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90 [<ffffffff81236fc6>] kthread+0x2b6/0x380 [<ffffffff81096afd>] ret_from_fork+0x2d/0x70 [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20 unreferenced object 0xffff88810f4cf698 (size 192): comm "kunit_try_catch", pid 1864, jiffies 4294685821 (age 976.501s) hex dump (first 32 bytes): 00 00 50 40 02 00 00 00 00 00 00 00 00 00 00 00 ..P@............ 00 00 00 00 00 00 00 00 98 f6 4c 0f 81 88 ff ff ..........L..... backtrace: [<ffffffff81a1ff11>] __d_alloc+0x31/0x8a0 [<ffffffff81a2910e>] d_alloc_pseudo+0xe/0x50 [<ffffffff819d549e>] alloc_file_pseudo+0xce/0x210 [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0 [<ffffffffa0208267>] 0xffffffffa0208267 [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90 [<ffffffff81236fc6>] kthread+0x2b6/0x380 [<ffffffff81096afd>] ret_from_fork+0x2d/0x70 [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20 unreferenced object 0xffff888107e0b000 (size 224): comm "kunit_try_catch", pid 1864, jiffies 4294685821 (age 976.501s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 03 00 2e 08 01 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff819d4b90>] alloc_empty_file+0x50/0x160 [<ffffffff819d4cf9>] alloc_file+0x59/0x730 [<ffffffff819d5524>] alloc_file_pseudo+0x154/0x210 [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0 [<ffffffffa0208267>] 0xffffffffa0208267 [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90 [<ffffffff81236fc6>] kthread+0x2b6/0x380 [<ffffffff81096afd>] ret_from_fork+0x2d/0x70 [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20 Fixes: `88232ec1ec` ("net/handshake: Add Kunit tests for the handshake consumer API") Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-20 11:54:49 +01:00
Colin Ian King	6c0da84063	wifi: cfg80211: make read-only array centers_80mhz static const Don't populate the read-only array lanes on the stack, instead make it static const. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-20 11:52:13 +01:00
Jozsef Kadlecsik	7433b6d2af	netfilter: ipset: Fix race between IPSET_CMD_CREATE and IPSET_CMD_SWAP Kyle Zeng reported that there is a race between IPSET_CMD_ADD and IPSET_CMD_SWAP in netfilter/ip_set, which can lead to the invocation of `__ip_set_put` on a wrong `set`, triggering the `BUG_ON(set->ref == 0);` check in it. The race is caused by using the wrong reference counter, i.e. the ref counter instead of ref_netlink. Fixes: `24e227896b` ("netfilter: ipset: Add schedule point in call_ad().") Reported-by: Kyle Zeng <zengyhkyle@gmail.com> Closes: https://lore.kernel.org/netfilter-devel/ZPZqetxOmH+w%2Fmyc@westworld/#r Tested-by: Kyle Zeng <zengyhkyle@gmail.com> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>	2023-09-20 10:35:24 +02:00
Florian Westphal	cf5000a778	netfilter: nf_tables: fix memleak when more than 255 elements expired When more than 255 elements expired we're supposed to switch to a new gc container structure. This never happens: u8 type will wrap before reaching the boundary and nft_trans_gc_space() always returns true. This means we recycle the initial gc container structure and lose track of the elements that came before. While at it, don't deref 'gc' after we've passed it to call_rcu. Fixes: `5f68718b34` ("netfilter: nf_tables: GC transaction API to avoid race with control plane") Reported-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>	2023-09-20 10:35:23 +02:00
Florian Westphal	c9bd26513b	netfilter: nf_tables: disable toggling dormant table state more than once nft -f -<<EOF add table ip t add table ip t { flags dormant; } add chain ip t c { type filter hook input priority 0; } add table ip t EOF Triggers a splat from nf core on next table delete because we lose track of right hook register state: WARNING: CPU: 2 PID: 1597 at net/netfilter/core.c:501 __nf_unregister_net_hook RIP: 0010:__nf_unregister_net_hook+0x41b/0x570 nf_unregister_net_hook+0xb4/0xf0 __nf_tables_unregister_hook+0x160/0x1d0 [..] The above should have table in active state, but in fact no hooks were registered. Reject on/off/on games rather than attempting to fix this. Fixes: `179d9ba555` ("netfilter: nf_tables: fix table flag updates") Reported-by: "Lee, Cherie-Anne" <cherie.lee@starlabs.sg> Cc: Bing-Jhong Billy Jheng <billy@starlabs.sg> Cc: info@starlabs.sg Signed-off-by: Florian Westphal <fw@strlen.de>	2023-09-20 10:35:23 +02:00
Artem Chernyshev	f1d95df0f3	net: rds: Fix possible NULL-pointer dereference In rds_rdma_cm_event_handler_cmn() check, if conn pointer exists before dereferencing it as rdma_set_service_type() argument Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: `fd261ce6a3` ("rds: rdma: update rdma transport for tos") Signed-off-by: Artem Chernyshev <artem.chernyshev@red-soft.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-20 08:49:03 +01:00
Eric Dumazet	fa17a6d8a5	ipv6: lockless IPV6_ADDR_PREFERENCES implementation We have data-races while reading np->srcprefs Switch the field to a plain byte, add READ_ONCE() and WRITE_ONCE() annotations where needed, and IPV6_ADDR_PREFERENCES setsockopt() can now be lockless. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230918142321.1794107-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-09-19 18:21:44 +02:00
Eric Dumazet	44bdb313da	net: bridge: use DEV_STATS_INC() syzbot/KCSAN reported data-races in br_handle_frame_finish() [1] This function can run from multiple cpus without mutual exclusion. Adopt SMP safe DEV_STATS_INC() to update dev->stats fields. Handles updates to dev->stats.tx_dropped while we are at it. [1] BUG: KCSAN: data-race in br_handle_frame_finish / br_handle_frame_finish read-write to 0xffff8881374b2178 of 8 bytes by interrupt on cpu 1: br_handle_frame_finish+0xd4f/0xef0 net/bridge/br_input.c:189 br_nf_hook_thresh+0x1ed/0x220 br_nf_pre_routing_finish_ipv6+0x50f/0x540 NF_HOOK include/linux/netfilter.h:304 [inline] br_nf_pre_routing_ipv6+0x1e3/0x2a0 net/bridge/br_netfilter_ipv6.c:178 br_nf_pre_routing+0x526/0xba0 net/bridge/br_netfilter_hooks.c:508 nf_hook_entry_hookfn include/linux/netfilter.h:144 [inline] nf_hook_bridge_pre net/bridge/br_input.c:272 [inline] br_handle_frame+0x4c9/0x940 net/bridge/br_input.c:417 __netif_receive_skb_core+0xa8a/0x21e0 net/core/dev.c:5417 __netif_receive_skb_one_core net/core/dev.c:5521 [inline] __netif_receive_skb+0x57/0x1b0 net/core/dev.c:5637 process_backlog+0x21f/0x380 net/core/dev.c:5965 __napi_poll+0x60/0x3b0 net/core/dev.c:6527 napi_poll net/core/dev.c:6594 [inline] net_rx_action+0x32b/0x750 net/core/dev.c:6727 __do_softirq+0xc1/0x265 kernel/softirq.c:553 run_ksoftirqd+0x17/0x20 kernel/softirq.c:921 smpboot_thread_fn+0x30a/0x4a0 kernel/smpboot.c:164 kthread+0x1d7/0x210 kernel/kthread.c:388 ret_from_fork+0x48/0x60 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304 read-write to 0xffff8881374b2178 of 8 bytes by interrupt on cpu 0: br_handle_frame_finish+0xd4f/0xef0 net/bridge/br_input.c:189 br_nf_hook_thresh+0x1ed/0x220 br_nf_pre_routing_finish_ipv6+0x50f/0x540 NF_HOOK include/linux/netfilter.h:304 [inline] br_nf_pre_routing_ipv6+0x1e3/0x2a0 net/bridge/br_netfilter_ipv6.c:178 br_nf_pre_routing+0x526/0xba0 net/bridge/br_netfilter_hooks.c:508 nf_hook_entry_hookfn include/linux/netfilter.h:144 [inline] nf_hook_bridge_pre net/bridge/br_input.c:272 [inline] br_handle_frame+0x4c9/0x940 net/bridge/br_input.c:417 __netif_receive_skb_core+0xa8a/0x21e0 net/core/dev.c:5417 __netif_receive_skb_one_core net/core/dev.c:5521 [inline] __netif_receive_skb+0x57/0x1b0 net/core/dev.c:5637 process_backlog+0x21f/0x380 net/core/dev.c:5965 __napi_poll+0x60/0x3b0 net/core/dev.c:6527 napi_poll net/core/dev.c:6594 [inline] net_rx_action+0x32b/0x750 net/core/dev.c:6727 __do_softirq+0xc1/0x265 kernel/softirq.c:553 do_softirq+0x5e/0x90 kernel/softirq.c:454 __local_bh_enable_ip+0x64/0x70 kernel/softirq.c:381 __raw_spin_unlock_bh include/linux/spinlock_api_smp.h:167 [inline] _raw_spin_unlock_bh+0x36/0x40 kernel/locking/spinlock.c:210 spin_unlock_bh include/linux/spinlock.h:396 [inline] batadv_tt_local_purge+0x1a8/0x1f0 net/batman-adv/translation-table.c:1356 batadv_tt_purge+0x2b/0x630 net/batman-adv/translation-table.c:3560 process_one_work kernel/workqueue.c:2630 [inline] process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2703 worker_thread+0x525/0x730 kernel/workqueue.c:2784 kthread+0x1d7/0x210 kernel/kthread.c:388 ret_from_fork+0x48/0x60 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304 value changed: 0x00000000000d7190 -> 0x00000000000d7191 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 14848 Comm: kworker/u4:11 Not tainted 6.6.0-rc1-syzkaller-00236-gad8a69f361b9 #0 Fixes: `1c29fc4989` ("[BRIDGE]: keep track of received multicast packets") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Roopa Prabhu <roopa@nvidia.com> Cc: Nikolay Aleksandrov <razor@blackwall.org> Cc: bridge@lists.linux-foundation.org Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20230918091351.1356153-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-09-19 13:35:15 +02:00
Linus Torvalds	2cf0f71562	NFS Client Bugfixes for Linux 6.6 Bugfixes: * Various O_DIRECT related fixes from Trond * Error handling * Locking issues * Use the correct commit infor for joining page groups * Fixes for rescheduling IO * Sunrpc bad verifier fixes * Report EINVAL errors from connect() * Revalidate creds that the server has rejected * Revert "SUNRPC: Fail faster on bad verifier" * Fix pNFS session trunking when MDS=DS * Fix zero-value filehandles for post-open getattr operations * Fix compiler warning about tautological comparisons * Revert "SUNRPC: clean up integer overflow check" before Trond's fix -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAmUIjesACgkQ18tUv7Cl QOv/MBAAxxZrW45qlTsEMvhnFXu1cvt08IdqivWURuSxK6oV31tARMqw00MxzhsF ubNX4hynDNmsWYJLryHTTGwUJzaNQOhAGjU46AhwFgS3Lj2wefeuR3IYuEDgQcHF 8uXrrT/GWRIG4a/Uu1IxodfgMvgUPV+HGFc5BQFGkXGqPQdgbdxBDtOtyU1QlmaV rqKnxPgUibEZwVoz8+PNsN+pwNjKSaSR3pH53KKS6HUBAM69sT4jJiJDO/UGSlzb F9U1DHPeyltHXVAaQXUjHlsjh49NbIjzD5/T8xKWscvS7rbQFCcJkUl3MBHyvSEr eixx7muqOMCEm4lNFJqHIY7XrEN+50wi92tKj05Bz9wrnp4L84cxSV/KNZrKNA3T LCeIm4DGynPcLPopEUm9lqZMrabwvdV4wZSiybB6p/F/5ksVyaMnwTLzZZ1+NhkB OglNhF4L6m18/Ai/bY6XOixzgzWfcmYfXRhq7YoeO/JSvixG9Sk2crC6ryjWadgo xbitjjCYXHl3MUkH2TcaQoMGFhmvSShXg//5YXpxm3/0C3EVPa44T7/aFJqelL2p kkVsLcjcevAOwBHxehYLofL6c3GhLqRnwmTV7yqgJqfZf8uGxeQhXCMqOmv57/c0 3iJJM4Llb14B+t3xl3T8MotT4WejzlHDnJKrscipfXJwRT7UDIM= =M6aI -----END PGP SIGNATURE----- Merge tag 'nfs-for-6.6-2' of git://git.linux-nfs.org/projects/anna/linux-nfs Pull NFS client fixes from Anna Schumaker: "Various O_DIRECT related fixes from Trond: - Error handling - Locking issues - Use the correct commit info for joining page groups - Fixes for rescheduling IO Sunrpc bad verifier fixes: - Report EINVAL errors from connect() - Revalidate creds that the server has rejected - Revert "SUNRPC: Fail faster on bad verifier" Misc: - Fix pNFS session trunking when MDS=DS - Fix zero-value filehandles for post-open getattr operations - Fix compiler warning about tautological comparisons - Revert 'SUNRPC: clean up integer overflow check' before Trond's fix" * tag 'nfs-for-6.6-2' of git://git.linux-nfs.org/projects/anna/linux-nfs: SUNRPC: Silence compiler complaints about tautological comparisons Revert "SUNRPC: clean up integer overflow check" NFSv4.1: fix zero value filehandle in post open getattr NFSv4.1: fix pnfs MDS=DS session trunking Revert "SUNRPC: Fail faster on bad verifier" SUNRPC: Mark the cred for revalidation if the server rejects it NFS/pNFS: Report EINVAL errors from connect() to the server NFS: More fixes for nfs_direct_write_reschedule_io() NFS: Use the correct commit info in nfs_join_page_group() NFS: More O_DIRECT accounting fixes for error paths NFS: Fix O_DIRECT locking issues NFS: Fix error handling for O_DIRECT write scheduling	2023-09-18 12:16:34 -07:00
Peter Lafreniere	71273c46a3	ax25: Kconfig: Update link for linux-ax25.org http://linux-ax25.org has been down for nearly a year. Its official replacement is https://linux-ax25.in-berlin.de. Change all references to the old site in the ax25 Kconfig to its replacement. Link: https://marc.info/?m=166792551600315 Signed-off-by: Peter Lafreniere <peter@n8pjl.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-18 12:56:58 +01:00
Paolo Abeni	27e5ccc2d5	mptcp: fix dangling connection hang-up According to RFC 8684 section 3.3: A connection is not closed unless [...] or an implementation-specific connection-level send timeout. Currently the MPTCP protocol does not implement such timeout, and connection timing-out at the TCP-level never move to close state. Introduces a catch-up condition at subflow close time to move the MPTCP socket to close, too. That additionally allows removing similar existing inside the worker. Finally, allow some additional timeout for plain ESTABLISHED mptcp sockets, as the protocol allows creating new subflows even at that point and making the connection functional again. This issue is actually present since the beginning, but it is basically impossible to solve without a long chain of functional pre-requisites topped by commit `bbd49d114d` ("mptcp: consolidate transition to TCP_CLOSE in mptcp_do_fastclose()"). When backporting this current patch, please also backport this other commit as well. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/430 Fixes: `e16163b6e2` ("mptcp: refactor shutdown and close") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-18 12:47:56 +01:00
Paolo Abeni	f6909dc1c1	mptcp: rename timer related helper to less confusing names The msk socket uses to different timeout to track close related events and retransmissions. The existing helpers do not indicate clearly which timer they actually touch, making the related code quite confusing. Change the existing helpers name to avoid such confusion. No functional change intended. This patch is linked to the next one ("mptcp: fix dangling connection hang-up"). The two patches are supposed to be backported together. Cc: stable@vger.kernel.org # v5.11+ Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-18 12:47:56 +01:00
Paolo Abeni	9f1a98813b	mptcp: process pending subflow error on close On incoming TCP reset, subflow closing could happen before error propagation. That in turn could cause the socket error being ignored, and a missing socket state transition, as reported by Daire-Byrne. Address the issues explicitly checking for subflow socket error at close time. To avoid code duplication, factor-out of __mptcp_error_report() a new helper implementing the relevant bits. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/429 Fixes: `15cc104533` ("mptcp: deliver ssk errors to msk") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-18 12:47:56 +01:00
Paolo Abeni	d5fbeff1ab	mptcp: move __mptcp_error_report in protocol.c This will simplify the next patch ("mptcp: process pending subflow error on close"). No functional change intended. Cc: stable@vger.kernel.org # v5.12+ Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-18 12:47:56 +01:00
Paolo Abeni	6bec041147	mptcp: fix bogus receive window shrinkage with multiple subflows In case multiple subflows race to update the mptcp-level receive window, the subflow losing the race should use the window value provided by the "winning" subflow to update it's own tcp-level rcv_wnd. To such goal, the current code bogusly uses the mptcp-level rcv_wnd value as observed before the update attempt. On unlucky circumstances that may lead to TCP-level window shrinkage, and stall the other end. Address the issue feeding to the rcv wnd update the correct value. Fixes: `f3589be0c4` ("mptcp: never shrink offered window") Cc: stable@vger.kernel.org Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/427 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-18 12:47:55 +01:00
Kees Cook	1cb6422eca	ceph: Annotate struct ceph_monmap with __counted_by Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct ceph_monmap. Additionally, since the element count member must be set before accessing the annotated flexible array member, move its initialization earlier. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Xiubo Li <xiubli@redhat.com> Cc: Jeff Layton <jlayton@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: ceph-devel@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-18 10:39:29 +01:00
Gustavo A. R. Silva	2506a91734	tipc: Use size_add() in calls to struct_size() If, for any reason, the open-coded arithmetic causes a wraparound, the protection that `struct_size()` adds against potential integer overflows is defeated. Fix this by hardening call to `struct_size()` with `size_add()`. Fixes: `e034c6d23b` ("tipc: Use struct_size() helper") Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-18 09:38:32 +01:00
Gustavo A. R. Silva	a2713257ee	tls: Use size_add() in call to struct_size() If, for any reason, the open-coded arithmetic causes a wraparound, the protection that `struct_size()` adds against potential integer overflows is defeated. Fix this by hardening call to `struct_size()` with `size_add()`. Fixes: `b89fec54fd` ("tls: rx: wrap decrypt params in a struct") Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-18 09:37:23 +01:00
Wen Gong	ddd7f45c89	wifi: cfg80211: save power spectral density(psd) of regulatory rule 6 GHz regulatory domains introduces Power Spectral Density (PSD). The PSD value of the regulatory rule should be taken into effect for the ieee80211_channels falling into that particular regulatory rule. Save the values in the channel which has PSD value and add nl80211 attributes accordingly to handle it. Co-developed-by: Aditya Kumar Singh <quic_adisi@quicinc.com> Signed-off-by: Aditya Kumar Singh <quic_adisi@quicinc.com> Signed-off-by: Wen Gong <quic_wgong@quicinc.com> Link: https://lore.kernel.org/r/20230914082026.3709-1-quic_wgong@quicinc.com [use hole in chan flags, reword docs] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2023-09-18 09:44:05 +02:00
Johannes Berg	2c3dfba4cf	rfkill: sync before userspace visibility/changes If userspace quickly opens /dev/rfkill after a new instance was created, it might see the old state of the instance from before the sync work runs and may even _change_ the state, only to have the sync work change it again. Fix this by doing the sync inline where needed, not just for /dev/rfkill but also for sysfs. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2023-09-18 09:36:57 +02:00
Sebastian Andrzej Siewior	fbd825fcd7	net: hsr: Add __packed to struct hsr_sup_tlv. Struct hsr_sup_tlv describes HW layout and therefore it needs a __packed attribute to ensure the compiler does not add any padding. Due to the size and __packed attribute of the structs that use hsr_sup_tlv it has no functional impact. Add __packed to struct hsr_sup_tlv. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-18 08:26:19 +01:00
Lukasz Majewski	295de650d3	net: hsr: Properly parse HSRv1 supervisor frames. While adding support for parsing the redbox supervision frames, the author added `pull_size' and `total_pull_size' to track the amount of bytes that were pulled from the skb during while parsing the skb so it can be reverted/ pushed back at the end. In the process probably copy&paste error occurred and for the HSRv1 case the ethhdr was used instead of the hsr_tag. Later the hsr_tag was used instead of hsr_sup_tag. The later error didn't matter because both structs have the size so HSRv0 was still working. It broke however HSRv1 parsing because struct ethhdr is larger than struct hsr_tag. Reinstate the old pulling flow and pull first ethhdr, hsr_tag in v1 case followed by hsr_sup_tag. [bigeasy: commit message] Fixes: `eafaa88b3e` ("net: hsr: Add support for redbox supervision frames")' Suggested-by: Tristram.Ha@microchip.com Signed-off-by: Lukasz Majewski <lukma@denx.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-18 08:26:19 +01:00
Eric Dumazet	6af289746a	dccp: fix dccp_v4_err()/dccp_v6_err() again dh->dccph_x is the 9th byte (offset 8) in "struct dccp_hdr", not in the "byte 7" as Jann claimed. We need to make sure the ICMP messages are big enough, using more standard ways (no more assumptions). syzbot reported: BUG: KMSAN: uninit-value in pskb_may_pull_reason include/linux/skbuff.h:2667 [inline] BUG: KMSAN: uninit-value in pskb_may_pull include/linux/skbuff.h:2681 [inline] BUG: KMSAN: uninit-value in dccp_v6_err+0x426/0x1aa0 net/dccp/ipv6.c:94 pskb_may_pull_reason include/linux/skbuff.h:2667 [inline] pskb_may_pull include/linux/skbuff.h:2681 [inline] dccp_v6_err+0x426/0x1aa0 net/dccp/ipv6.c:94 icmpv6_notify+0x4c7/0x880 net/ipv6/icmp.c:867 icmpv6_rcv+0x19d5/0x30d0 ip6_protocol_deliver_rcu+0xda6/0x2a60 net/ipv6/ip6_input.c:438 ip6_input_finish net/ipv6/ip6_input.c:483 [inline] NF_HOOK include/linux/netfilter.h:304 [inline] ip6_input+0x15d/0x430 net/ipv6/ip6_input.c:492 ip6_mc_input+0xa7e/0xc80 net/ipv6/ip6_input.c:586 dst_input include/net/dst.h:468 [inline] ip6_rcv_finish+0x5db/0x870 net/ipv6/ip6_input.c:79 NF_HOOK include/linux/netfilter.h:304 [inline] ipv6_rcv+0xda/0x390 net/ipv6/ip6_input.c:310 __netif_receive_skb_one_core net/core/dev.c:5523 [inline] __netif_receive_skb+0x1a6/0x5a0 net/core/dev.c:5637 netif_receive_skb_internal net/core/dev.c:5723 [inline] netif_receive_skb+0x58/0x660 net/core/dev.c:5782 tun_rx_batched+0x83b/0x920 tun_get_user+0x564c/0x6940 drivers/net/tun.c:2002 tun_chr_write_iter+0x3af/0x5d0 drivers/net/tun.c:2048 call_write_iter include/linux/fs.h:1985 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x8ef/0x15c0 fs/read_write.c:584 ksys_write+0x20f/0x4c0 fs/read_write.c:637 __do_sys_write fs/read_write.c:649 [inline] __se_sys_write fs/read_write.c:646 [inline] __x64_sys_write+0x93/0xd0 fs/read_write.c:646 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Uninit was created at: slab_post_alloc_hook+0x12f/0xb70 mm/slab.h:767 slab_alloc_node mm/slub.c:3478 [inline] kmem_cache_alloc_node+0x577/0xa80 mm/slub.c:3523 kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:559 __alloc_skb+0x318/0x740 net/core/skbuff.c:650 alloc_skb include/linux/skbuff.h:1286 [inline] alloc_skb_with_frags+0xc8/0xbd0 net/core/skbuff.c:6313 sock_alloc_send_pskb+0xa80/0xbf0 net/core/sock.c:2795 tun_alloc_skb drivers/net/tun.c:1531 [inline] tun_get_user+0x23cf/0x6940 drivers/net/tun.c:1846 tun_chr_write_iter+0x3af/0x5d0 drivers/net/tun.c:2048 call_write_iter include/linux/fs.h:1985 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x8ef/0x15c0 fs/read_write.c:584 ksys_write+0x20f/0x4c0 fs/read_write.c:637 __do_sys_write fs/read_write.c:649 [inline] __se_sys_write fs/read_write.c:646 [inline] __x64_sys_write+0x93/0xd0 fs/read_write.c:646 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd CPU: 0 PID: 4995 Comm: syz-executor153 Not tainted 6.6.0-rc1-syzkaller-00014-ga747acc0b752 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/04/2023 Fixes: `977ad86c2a` ("dccp: Fix out of bounds access in DCCP error handler") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jann Horn <jannh@google.com> Reviewed-by: Jann Horn <jannh@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-18 07:10:31 +01:00
Johnathan Mantey	3780bb2931	ncsi: Propagate carrier gain/loss events to the NCSI controller Report the carrier/no-carrier state for the network interface shared between the BMC and the passthrough channel. Without this functionality the BMC is unable to reconfigure the NIC in the event of a re-cabling to a different subnet. Signed-off-by: Johnathan Mantey <johnathanx.mantey@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-18 07:06:05 +01:00
Kyle Zeng	0113d9c9d1	ipv4: fix null-deref in ipv4_link_failure Currently, we assume the skb is associated with a device before calling __ip_options_compile, which is not always the case if it is re-routed by ipvs. When skb->dev is NULL, dev_net(skb->dev) will become null-dereference. This patch adds a check for the edge case and switch to use the net_device from the rtable when skb->dev is NULL. Fixes: `ed0de45a10` ("ipv4: recompile ip options in ipv4_link_failure") Suggested-by: David Ahern <dsahern@kernel.org> Signed-off-by: Kyle Zeng <zengyhkyle@gmail.com> Cc: Stephen Suryaputra <ssuryaextr@gmail.com> Cc: Vadim Fedorenko <vfedorenko@novek.ru> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-17 15:14:58 +01:00
David S. Miller	685c6d5b2c	Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Alexei Starovoitov says: ==================== The following pull-request contains BPF updates for your net-next tree. We've added 73 non-merge commits during the last 9 day(s) which contain a total of 79 files changed, 5275 insertions(+), 600 deletions(-). The main changes are: 1) Basic BTF validation in libbpf, from Andrii Nakryiko. 2) bpf_assert(), bpf_throw(), exceptions in bpf progs, from Kumar Kartikeya Dwivedi. 3) next_thread cleanups, from Oleg Nesterov. 4) Add mcpu=v4 support to arm32, from Puranjay Mohan. 5) Add support for __percpu pointers in bpf progs, from Yonghong Song. 6) Fix bpf tailcall interaction with bpf trampoline, from Leon Hwang. 7) Raise irq_work in bpf_mem_alloc while irqs are disabled to improve refill probabablity, from Hou Tao. Please consider pulling these changes from: git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git Thanks a lot! Also thanks to reporters, reviewers and testers of commits in this pull-request: Alan Maguire, Andrey Konovalov, Dave Marchevsky, "Eric W. Biederman", Jiri Olsa, Maciej Fijalkowski, Quentin Monnet, Russell King (Oracle), Song Liu, Stanislav Fomichev, Yonghong Song ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-17 15:12:06 +01:00
Jiri Pirko	c5e1bf8a51	devlink: introduce possibility to expose info about nested devlinks In mlx5, there is a devlink instance created for PCI device. Also, one separate devlink instance is created for auxiliary device that represents the netdev of uplink port. This relation is currently invisible to the devlink user. Benefit from the rel infrastructure and allow for nested devlink instance to set the relationship for the nested-in devlink instance. Note that there may be many nested instances, therefore use xarray to hold the list of rel_indexes for individual nested instances. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-17 14:01:47 +01:00
Jiri Pirko	9473bc0119	devlink: convert linecard nested devlink to new rel infrastructure Benefit from the newly introduced rel infrastructure, treat the linecard nested devlink instances in the same way as port function instances. Convert the code to use the rel infrastructure. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-17 14:01:47 +01:00
Jiri Pirko	0b7a2721e3	devlink: expose peer SF devlink instance Introduce a new helper devl_port_fn_devlink_set() to be used by driver assigning a devlink instance to the peer devlink port function. Expose this to user over new netlink attribute nested under port function nest to expose devlink handle related to the port function. This is particularly helpful for user to understand the relationship between devlink instances created for SFs and the port functions they belong to. Note that caller of devlink_port_notify() needs to hold devlink instance lock, put the assertion to devl_port_fn_devlink_set() to make this requirement explicit. Also note the limitations that only allow to make this assignment for registered objects. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-17 14:01:47 +01:00
Jiri Pirko	c137743bce	devlink: introduce object and nested devlink relationship infra It is a bit tricky to maintain relationship between devlink objects and nested devlink instances due to following aspects: 1) Locking. It is necessary to lock the devlink instance that contains the object first, only after that to lock the nested instance. 2) Lifetimes. Objects (e.g devlink port) may be removed before the nested devlink instance. 3) Notifications. If nested instance changes (e.g. gets registered/unregistered) the nested-in object needs to send appropriate notifications. Resolve this by introducing an xarray that holds 1:1 relationships between devlink object and related nested devlink instance. Use that xarray index to get the object/nested devlink instance on the other side. Provide necessary helpers: devlink_rel_nested_in_add/clear() to add and clear the relationship. devlink_rel_nested_in_notify() to call the nested-in object to send notifications during nested instance register/unregister/netns change. devlink_rel_devlink_handle_put() to be used by nested-in object fill function to fill the nested handle. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-17 14:01:47 +01:00
Jiri Pirko	1c2197c47a	devlink: extend devlink_nl_put_nested_handle() with attrtype arg As the next patch is going to call this helper with need to fill another type of nested attribute, pass it over function arg. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-17 14:01:46 +01:00
Jiri Pirko	af1f1400af	devlink: move devlink_nl_put_nested_handle() into netlink.c As the next patch is going to call this helper out of the linecard.c, move to netlink.c. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-17 14:01:46 +01:00
Jiri Pirko	ad99637ac9	devlink: put netnsid to nested handle If netns of devlink instance and nested devlink instance differs, put netnsid attr to indicate that. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-17 14:01:46 +01:00
Jiri Pirko	d0b7e990f7	devlink: move linecard struct into linecard.c Instead of exposing linecard struct, expose a simple helper to get the linecard index, which is all is needed outside linecard.c. Move the linecard struct to linecard.c and keep it private similar to the rest of the devlink objects. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-17 14:01:46 +01:00
Jiri Pirko	5f18426928	netdev: expose DPLL pin handle for netdevice In case netdevice represents a SyncE port, the user needs to understand the connection between netdevice and associated DPLL pin. There might me multiple netdevices pointing to the same pin, in case of VF/SF implementation. Add a IFLA Netlink attribute to nest the DPLL pin handle, similar to how it is implemented for devlink port. Add a struct dpll_pin pointer to netdev and protect access to it by RTNL. Expose netdev_dpll_pin_set() and netdev_dpll_pin_clear() helpers to the drivers so they can set/clear the DPLL pin relationship to netdev. Note that during the lifetime of struct dpll_pin the pin handle does not change. Therefore it is save to access it lockless. It is drivers responsibility to call netdev_dpll_pin_clear() before dpll_pin_put(). Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-17 11:50:20 +01:00
Aananth V	3868ab0f19	tcp: new TCP_INFO stats for RTO events The 2023 SIGCOMM paper "Improving Network Availability with Protective ReRoute" has indicated Linux TCP's RTO-triggered txhash rehashing can effectively reduce application disruption during outages. To better measure the efficacy of this feature, this patch adds three more detailed stats during RTO recovery and exports via TCP_INFO. Applications and monitoring systems can leverage this data to measure the network path diversity and end-to-end repair latency during network outages to improve their network infrastructure. The following counters are added to tcp_sock in order to track RTO events over the lifetime of a TCP socket. 1. u16 total_rto - Counts the total number of RTO timeouts. 2. u16 total_rto_recoveries - Counts the total number of RTO recoveries. 3. u32 total_rto_time - Counts the total time spent (ms) in RTO recoveries. (time spent in CA_Loss and CA_Recovery states) To compute total_rto_time, we add a new u32 rto_stamp field to tcp_sock. rto_stamp records the start timestamp (ms) of the last RTO recovery (CA_Loss). Corresponding fields are also added to the tcp_info struct. Signed-off-by: Aananth V <aananthv@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-16 13:42:34 +01:00
Aananth V	e326578a21	tcp: call tcp_try_undo_recovery when an RTOd TFO SYNACK is ACKed For passive TCP Fast Open sockets that had SYN/ACK timeout and did not send more data in SYN_RECV, upon receiving the final ACK in 3WHS, the congestion state may awkwardly stay in CA_Loss mode unless the CA state was undone due to TCP timestamp checks. However, if tcp_rcv_synrecv_state_fastopen() decides not to undo, then we should enter CA_Open, because at that point we have received an ACK covering the retransmitted SYNACKs. Currently, the icsk_ca_state is only set to CA_Open after we receive an ACK for a data-packet. This is because tcp_ack does not call tcp_fastretrans_alert (and tcp_process_loss) if !prior_packets Note that tcp_process_loss() calls tcp_try_undo_recovery(), so having tcp_rcv_synrecv_state_fastopen() decide that if we're in CA_Loss we should call tcp_try_undo_recovery() is consistent with that, and low risk. Fixes: `dad8cea7ad` ("tcp: fix TFO SYNACK undo to avoid double-timestamp-undo") Signed-off-by: Aananth V <aananthv@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-16 13:42:34 +01:00
Andy Shevchenko	aabb4af9bb	net: core: Use the bitmap API to allocate bitmaps Use bitmap_zalloc() and bitmap_free() instead of hand-writing them. It is less verbose and it improves the type checking and semantic. While at it, add missing header inclusion (should be bitops.h, but with the above change it becomes bitmap.h). Suggested-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230911154534.4174265-1-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-16 13:32:30 +01:00
David S. Miller	1612cc4b14	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Alexei Starovoitov says: ==================== The following pull-request contains BPF updates for your net tree. We've added 21 non-merge commits during the last 8 day(s) which contain a total of 21 files changed, 450 insertions(+), 36 deletions(-). The main changes are: 1) Adjust bpf_mem_alloc buckets to match ksize(), from Hou Tao. 2) Check whether override is allowed in kprobe mult, from Jiri Olsa. 3) Fix btf_id symbol generation with ld.lld, from Jiri and Nick. 4) Fix potential deadlock when using queue and stack maps from NMI, from Toke Høiland-Jørgensen. Please consider pulling these changes from: git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git Thanks a lot! Also thanks to reporters, reviewers and testers of commits in this pull-request: Alan Maguire, Biju Das, Björn Töpel, Dan Carpenter, Daniel Borkmann, Eduard Zingerman, Hsin-Wei Hung, Marcus Seyfarth, Nathan Chancellor, Satya Durga Srinivasu Prabhala, Song Liu, Stephen Rothwell ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-16 11:16:00 +01:00
Eric Dumazet	c123e0d30b	net: add truesize debug checks in skb_{add\|coalesce}_rx_frag() It can be time consuming to track driver bugs, that might be detected too late from this confusing warning in skb_try_coalesce() WARN_ON_ONCE(delta < len); Add sanity check in skb_add_rx_frag() and skb_coalesce_rx_frag() to better track bug origin for CONFIG_DEBUG_NET=y builds. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-16 10:10:27 +01:00
Eric Dumazet	41862d12e7	net: use indirect call helpers for sk->sk_prot->release_cb() When adding sk->sk_prot->release_cb() call from __sk_flush_backlog() Paolo suggested using indirect call helpers to take care of CONFIG_RETPOLINE=y case. It turns out Google had such mitigation for years in release_sock(), it is time to make this public :) Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-16 10:09:43 +01:00
Stanislav Fomichev	a9c2a60854	bpf: expose information about supported xdp metadata kfunc Add new xdp-rx-metadata-features member to netdev netlink which exports a bitmask of supported kfuncs. Most of the patch is autogenerated (headers), the only relevant part is netdev.yaml and the changes in netdev-genl.c to marshal into netlink. Example output on veth: $ ip link add veth0 type veth peer name veth1 # ifndex == 12 $ ./tools/net/ynl/samples/netdev 12 Select ifc ($ifindex; or 0 = dump; or -2 ntf check): 12 veth1[12] xdp-features (23): basic redirect rx-sg xdp-rx-metadata-features (3): timestamp hash xdp-zc-max-segs=0 Cc: netdev@vger.kernel.org Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20230913171350.369987-3-sdf@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2023-09-15 11:26:58 -07:00
Stanislav Fomichev	fc45c5b642	bpf: make it easier to add new metadata kfunc No functional changes. Instead of having hand-crafted code in bpf_dev_bound_resolve_kfunc, move kfunc <> xmo handler relationship into XDP_METADATA_KFUNC_xxx. This way, any time new kfunc is added, we don't have to touch bpf_dev_bound_resolve_kfunc. Also document XDP_METADATA_KFUNC_xxx arguments since we now have more than two and it might be confusing what is what. Cc: netdev@vger.kernel.org Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20230913171350.369987-2-sdf@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2023-09-15 11:26:58 -07:00
Tirthendu Sarkar	d609f3d228	xsk: add multi-buffer support for sockets sharing umem Userspace applications indicate their multi-buffer capability to xsk using XSK_USE_SG socket bind flag. For sockets using shared umem the bind flag may contain XSK_USE_SG only for the first socket. For any subsequent socket the only option supported is XDP_SHARED_UMEM. Add option XDP_UMEM_SG_FLAG in umem config flags to store the multi-buffer handling capability when indicated by XSK_USE_SG option in bing flag by the first socket. Use this to derive multi-buffer capability for subsequent sockets in xsk core. Signed-off-by: Tirthendu Sarkar <tirthendu.sarkar@intel.com> Fixes: `81470b5c3c` ("xsk: introduce XSK_USE_SG bind flag for xsk socket") Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20230907035032.2627879-1-tirthendu.sarkar@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-09-15 11:00:22 -07:00
Ilya Leoshkevich	837723b22a	netfilter, bpf: Adjust timeouts of non-confirmed CTs in bpf_ct_insert_entry() bpf_nf testcase fails on s390x: bpf_skb_ct_lookup() cannot find the entry that was added by bpf_ct_insert_entry() within the same BPF function. The reason is that this entry is deleted by nf_ct_gc_expired(). The CT timeout starts ticking after the CT confirmation; therefore nf_conn.timeout is initially set to the timeout value, and __nf_conntrack_confirm() sets it to the deadline value. bpf_ct_insert_entry() sets IPS_CONFIRMED_BIT, but does not adjust the timeout, making its value meaningless and causing false positives. Fix the problem by making bpf_ct_insert_entry() adjust the timeout, like __nf_conntrack_confirm(). Fixes: `2cdaa3eefe` ("netfilter: conntrack: restore IPS_CONFIRMED out of nf_conntrack_hash_check_insert()") Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/bpf/20230830011128.1415752-3-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-09-15 10:17:55 -07:00
David S. Miller	615efed8b6	netfilter pull request 23-09-13 -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmUCKboACgkQ1V2XiooU IOQ1CxAAqKwyeROJ7+qLvIBbwRFIQr70pPCjfY/GskP9aqljhth+e5TsKurWA12X wwbVhQ9xblvxarekR4B8lwGhvenYHk3l6R/3wuTMYPHFTXkE+mluGgljffaMwV+D YywK5hOkLenBZmxdjUfdJ87DJwAadcbLOABmEiSQ3hDxj3/xTBf7gToqlSwHtjCC JDC7vhxjosQHQSLhjqetfrUauz0OZAqldZ2is/FELYg56oCGKddGAZxnC4fQBnXx DzvRroP8f8bkqGjKwkt945bKiQ4Cz1frQE+YP1+pRk0rOkv70hhzH0JXIELQ5q9L RYLFfgkemp2HfBJ+y2PK8lBDailre4MdGdsAI5eWjBXgrl3jRBybioafhhUbJVIq Q3zIzXVgLQqXwSONBF2sfVssVZzhfjAzZQzzgw3wayhWj1WgwqsCb0EChvA4FJZ7 HW4xyROeOV7GHoUAWCPcoeBiNJYKmGNWjkWwlT4q5LtYMyWWP9oYx2kOn9/JQ9QI Tth8QobntRr8Gw/f0awGULM2pcecCLyYhIoJtWctegFSN2ejrKiV9XItbxZ3G1in 3pYSVgpyve9ZAvHmTSyvh+mjZ71X2ZebLyMADrWbsHrCXgIUSUkoksQd97XsffeZ noRVlLj0MlfRlUoorDQG3A+QxdQb+ZaHkBKTOEzouKOYEj6vylY= =TgRd -----END PGP SIGNATURE----- Merge tag 'nf-23-09-13' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf netfilter pull request 23-09-13 ==================== The following patchset contains Netfilter fixes for net: 1) Do not permit to remove rules from chain binding, otherwise double rule release is possible, triggering UaF. This rule deletion support does not make sense and userspace does not use this. Problem exists since the introduction of chain binding support. 2) rbtree GC worker only collects the elements that have expired. This operation is not destructive, therefore, turn write into read spinlock to avoid datapath contention due to GC worker run. This was not fixed in the recent GC fix batch in the 6.5 cycle. 3) pipapo set backend performs sync GC, therefore, catchall elements must use sync GC queue variant. This bug was introduced in the 6.5 cycle with the recent GC fixes. 4) Stop GC run if memory allocation fails in pipapo set backend, otherwise access to NULL pointer to GC transaction object might occur. This bug was introduced in the 6.5 cycle with the recent GC fixes. 5) rhash GC run uses an iterator that might hit EAGAIN to rewind, triggering double-collection of the same element. This bug was introduced in the 6.5 cycle with the recent GC fixes. 6) Do not permit to remove elements in anonymous sets, this type of sets are populated once and then bound to rules. This fix is similar to the chain binding patch coming first in this batch. API permits since the very beginning but it has no use case from userspace. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-15 13:56:58 +01:00
Dan Carpenter	4fa5ce3e3a	tcp: indent an if statement Indent this if statement one tab. Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-15 13:55:50 +01:00
Sasha Neftin	75ad80ed88	net/core: Fix ETH_P_1588 flow dissector When a PTP ethernet raw frame with a size of more than 256 bytes followed by a 0xff pattern is sent to __skb_flow_dissect, nhoff value calculation is wrong. For example: hdr->message_length takes the wrong value (0xffff) and it does not replicate real header length. In this case, 'nhoff' value was overridden and the PTP header was badly dissected. This leads to a kernel crash. net/core: flow_dissector net/core flow dissector nhoff = 0x0000000e net/core flow dissector hdr->message_length = 0x0000ffff net/core flow dissector nhoff = 0x0001000d (u16 overflow) ... skb linear: 00000000: 00 a0 c9 00 00 00 00 a0 c9 00 00 00 88 skb frag: 00000000: f7 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff Using the size of the ptp_header struct will allow the corrected calculation of the nhoff value. net/core flow dissector nhoff = 0x0000000e net/core flow dissector nhoff = 0x00000030 (sizeof ptp_header) ... skb linear: 00000000: 00 a0 c9 00 00 00 00 a0 c9 00 00 00 88 f7 ff ff skb linear: 00000010: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff skb linear: 00000020: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff skb frag: 00000000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff Kernel trace: [ 74.984279] ------------[ cut here ]------------ [ 74.989471] kernel BUG at include/linux/skbuff.h:2440! [ 74.995237] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI [ 75.001098] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G U 5.15.85-intel-ese-standard-lts #1 [ 75.011629] Hardware name: Intel Corporation A-Island (CPU:AlderLake)/A-Island (ID:06), BIOS SB_ADLP.01.01.00.01.03.008.D-6A9D9E73-dirty Mar 30 2023 [ 75.026507] RIP: 0010:eth_type_trans+0xd0/0x130 [ 75.031594] Code: 03 88 47 78 eb c7 8b 47 68 2b 47 6c 48 8b 97 c0 00 00 00 83 f8 01 7e 1b 48 85 d2 74 06 66 83 3a ff 74 09 b8 00 04 00 00 eb ab <0f> 0b b8 00 01 00 00 eb a2 48 85 ff 74 eb 48 8d 54 24 06 31 f6 b9 [ 75.052612] RSP: 0018:ffff9948c0228de0 EFLAGS: 00010297 [ 75.058473] RAX: 00000000000003f2 RBX: ffff8e47047dc300 RCX: 0000000000001003 [ 75.066462] RDX: ffff8e4e8c9ea040 RSI: ffff8e4704e0a000 RDI: ffff8e47047dc300 [ 75.074458] RBP: ffff8e4704e2acc0 R08: 00000000000003f3 R09: 0000000000000800 [ 75.082466] R10: 000000000000000d R11: ffff9948c0228dec R12: ffff8e4715e4e010 [ 75.090461] R13: ffff9948c0545018 R14: 0000000000000001 R15: 0000000000000800 [ 75.098464] FS: 0000000000000000(0000) GS:ffff8e4e8fb00000(0000) knlGS:0000000000000000 [ 75.107530] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 75.113982] CR2: 00007f5eb35934a0 CR3: 0000000150e0a002 CR4: 0000000000770ee0 [ 75.121980] PKRU: 55555554 [ 75.125035] Call Trace: [ 75.127792] <IRQ> [ 75.130063] ? eth_get_headlen+0xa4/0xc0 [ 75.134472] igc_process_skb_fields+0xcd/0x150 [ 75.139461] igc_poll+0xc80/0x17b0 [ 75.143272] __napi_poll+0x27/0x170 [ 75.147192] net_rx_action+0x234/0x280 [ 75.151409] __do_softirq+0xef/0x2f4 [ 75.155424] irq_exit_rcu+0xc7/0x110 [ 75.159432] common_interrupt+0xb8/0xd0 [ 75.163748] </IRQ> [ 75.166112] <TASK> [ 75.168473] asm_common_interrupt+0x22/0x40 [ 75.173175] RIP: 0010:cpuidle_enter_state+0xe2/0x350 [ 75.178749] Code: 85 c0 0f 8f 04 02 00 00 31 ff e8 39 6c 67 ff 45 84 ff 74 12 9c 58 f6 c4 02 0f 85 50 02 00 00 31 ff e8 52 b0 6d ff fb 45 85 f6 <0f> 88 b1 00 00 00 49 63 ce 4c 2b 2c 24 48 89 c8 48 6b d1 68 48 c1 [ 75.199757] RSP: 0018:ffff9948c013bea8 EFLAGS: 00000202 [ 75.205614] RAX: ffff8e4e8fb00000 RBX: ffffb948bfd23900 RCX: 000000000000001f [ 75.213619] RDX: 0000000000000004 RSI: ffffffff94206161 RDI: ffffffff94212e20 [ 75.221620] RBP: 0000000000000004 R08: 000000117568973a R09: 0000000000000001 [ 75.229622] R10: 000000000000afc8 R11: ffff8e4e8fb29ce4 R12: ffffffff945ae980 [ 75.237628] R13: 000000117568973a R14: 0000000000000004 R15: 0000000000000000 [ 75.245635] ? cpuidle_enter_state+0xc7/0x350 [ 75.250518] cpuidle_enter+0x29/0x40 [ 75.254539] do_idle+0x1d9/0x260 [ 75.258166] cpu_startup_entry+0x19/0x20 [ 75.262582] secondary_startup_64_no_verify+0xc2/0xcb [ 75.268259] </TASK> [ 75.270721] Modules linked in: 8021q snd_sof_pci_intel_tgl snd_sof_intel_hda_common tpm_crb snd_soc_hdac_hda snd_sof_intel_hda snd_hda_ext_core snd_sof_pci snd_sof snd_sof_xtensa_dsp snd_soc_acpi_intel_match snd_soc_acpi snd_soc_core snd_compress iTCO_wdt ac97_bus intel_pmc_bxt mei_hdcp iTCO_vendor_support snd_hda_codec_hdmi pmt_telemetry intel_pmc_core pmt_class snd_hda_intel x86_pkg_temp_thermal snd_intel_dspcfg snd_hda_codec snd_hda_core kvm_intel snd_pcm snd_timer kvm snd mei_me soundcore tpm_tis irqbypass i2c_i801 mei tpm_tis_core pcspkr intel_rapl_msr tpm i2c_smbus intel_pmt thermal sch_fq_codel uio uhid i915 drm_buddy video drm_display_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm fuse configfs [ 75.342736] ---[ end trace 3785f9f360400e3a ]--- [ 75.347913] RIP: 0010:eth_type_trans+0xd0/0x130 [ 75.352984] Code: 03 88 47 78 eb c7 8b 47 68 2b 47 6c 48 8b 97 c0 00 00 00 83 f8 01 7e 1b 48 85 d2 74 06 66 83 3a ff 74 09 b8 00 04 00 00 eb ab <0f> 0b b8 00 01 00 00 eb a2 48 85 ff 74 eb 48 8d 54 24 06 31 f6 b9 [ 75.373994] RSP: 0018:ffff9948c0228de0 EFLAGS: 00010297 [ 75.379860] RAX: 00000000000003f2 RBX: ffff8e47047dc300 RCX: 0000000000001003 [ 75.387856] RDX: ffff8e4e8c9ea040 RSI: ffff8e4704e0a000 RDI: ffff8e47047dc300 [ 75.395864] RBP: ffff8e4704e2acc0 R08: 00000000000003f3 R09: 0000000000000800 [ 75.403857] R10: 000000000000000d R11: ffff9948c0228dec R12: ffff8e4715e4e010 [ 75.411863] R13: ffff9948c0545018 R14: 0000000000000001 R15: 0000000000000800 [ 75.419875] FS: 0000000000000000(0000) GS:ffff8e4e8fb00000(0000) knlGS:0000000000000000 [ 75.428946] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 75.435403] CR2: 00007f5eb35934a0 CR3: 0000000150e0a002 CR4: 0000000000770ee0 [ 75.443410] PKRU: 55555554 [ 75.446477] Kernel panic - not syncing: Fatal exception in interrupt [ 75.453738] Kernel Offset: 0x11c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 75.465794] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- Fixes: `4f1cc51f34` ("net: flow_dissector: Parse PTP L2 packet header") Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-15 10:40:04 +01:00
Eric Dumazet	859f8b265f	ipv6: lockless IPV6_FLOWINFO_SEND implementation np->sndflow reads are racy. Use one bit ftom atomic inet->inet_flags instead, IPV6_FLOWINFO_SEND setsockopt() can be lockless. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-15 10:33:48 +01:00
Eric Dumazet	6b724bc430	ipv6: lockless IPV6_MTU_DISCOVER implementation Most np->pmtudisc reads are racy. Move this 3bit field on a full byte, add annotations and make IPV6_MTU_DISCOVER setsockopt() lockless. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-15 10:33:48 +01:00
Eric Dumazet	83cd5eb654	ipv6: lockless IPV6_ROUTER_ALERT_ISOLATE implementation Reads from np->rtalert_isolate are racy. Move this flag to inet->inet_flags to fix data-races. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-15 10:33:48 +01:00
Eric Dumazet	3cccda8db2	ipv6: move np->repflow to atomic flags Move np->repflow to inet->inet_flags to fix data-races. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-15 10:33:48 +01:00
Eric Dumazet	3fa29971c6	ipv6: lockless IPV6_RECVERR implemetation np->recverr is moved to inet->inet_flags to fix data-races. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-15 10:33:48 +01:00
Eric Dumazet	1086ca7cce	ipv6: lockless IPV6_DONTFRAG implementation Move np->dontfrag flag to inet->inet_flags to fix data-races. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-15 10:33:47 +01:00
Eric Dumazet	5121516b0c	ipv6: lockless IPV6_AUTOFLOWLABEL implementation Move np->autoflowlabel and np->autoflowlabel_set in inet->inet_flags, to fix data-races. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-15 10:33:47 +01:00
Eric Dumazet	6559c0ff3b	ipv6: lockless IPV6_MULTICAST_ALL implementation Move np->mc_all to an atomic flags to fix data-races. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-15 10:33:47 +01:00
Eric Dumazet	dcae74622c	ipv6: lockless IPV6_RECVERR_RFC4884 implementation Move np->recverr_rfc4884 to an atomic flag to fix data-races. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-15 10:33:47 +01:00
Eric Dumazet	273784d3c5	ipv6: lockless IPV6_MINHOPCOUNT implementation Add one missing READ_ONCE() annotation in do_ipv6_getsockopt() and make IPV6_MINHOPCOUNT setsockopt() lockless. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-15 10:33:47 +01:00
Eric Dumazet	15f926c445	ipv6: lockless IPV6_MTU implementation np->frag_size can be read/written without holding socket lock. Add missing annotations and make IPV6_MTU setsockopt() lockless. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-15 10:33:46 +01:00
Eric Dumazet	2da23eb07c	ipv6: lockless IPV6_MULTICAST_HOPS implementation This fixes data-races around np->mcast_hops, and make IPV6_MULTICAST_HOPS lockless. Note that np->mcast_hops is never negative, thus can fit an u8 field instead of s16. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-15 10:33:46 +01:00
Eric Dumazet	d986f52124	ipv6: lockless IPV6_MULTICAST_LOOP implementation Add inet6_{test\|set\|clear\|assign}_bit() helpers. Note that I am using bits from inet->inet_flags, this might change in the future if we need more flags. While solving data-races accessing np->mc_loop, this patch also allows to implement lockless accesses to np->mcast_hops in the following patch. Also constify sk_mc_loop() argument. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-15 10:33:46 +01:00
Eric Dumazet	b0adfba7ee	ipv6: lockless IPV6_UNICAST_HOPS implementation Some np->hop_limit accesses are racy, when socket lock is not held. Add missing annotations and switch to full lockless implementation. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-15 10:33:46 +01:00
Paolo Abeni	f2fa1c812c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR. No conflicts. Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-09-14 19:48:23 +02:00
Gavrilov Ilia	59bb1d6980	ipv6: mcast: Remove redundant comparison in igmp6_mcf_get_next() The 'state->im' value will always be non-zero after the 'while' statement, so the check can be removed. Found by InfoTeCS on behalf of Linux Verification Center (linuxtesting.org) with SVACE. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Gavrilov Ilia <Ilia.Gavrilov@infotecs.ru> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230912084100.1502379-1-Ilia.Gavrilov@infotecs.ru Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-09-14 17:20:35 +02:00
Gavrilov Ilia	a613ed1afd	ipv4: igmp: Remove redundant comparison in igmp_mcf_get_next() The 'state->im' value will always be non-zero after the 'while' statement, so the check can be removed. Found by InfoTeCS on behalf of Linux Verification Center (linuxtesting.org) with SVACE. Signed-off-by: Gavrilov Ilia <Ilia.Gavrilov@infotecs.ru> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230912084039.1501984-1-Ilia.Gavrilov@infotecs.ru Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-09-14 17:20:17 +02:00
Eric Dumazet	882af43a0f	udplite: fix various data-races udp->pcflag, udp->pcslen and udp->pcrlen reads/writes are racy. Move udp->pcflag to udp->udp_flags for atomicity, and add READ_ONCE()/WRITE_ONCE() annotations for pcslen and pcrlen. Fixes: `ba4e58eca8` ("[NET]: Supporting UDP-Lite (RFC 3828) in Linux") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-09-14 16:16:36 +02:00
Eric Dumazet	729549aa35	udplite: remove UDPLITE_BIT This flag is set but never read, we can remove it. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-09-14 16:16:36 +02:00
Eric Dumazet	70a36f5713	udp: annotate data-races around udp->encap_type syzbot/KCSAN complained about UDP_ENCAP_L2TPINUDP setsockopt() racing. Add READ_ONCE()/WRITE_ONCE() to document races on this lockless field. syzbot report was: BUG: KCSAN: data-race in udp_lib_setsockopt / udp_lib_setsockopt read-write to 0xffff8881083603fa of 1 bytes by task 16557 on cpu 0: udp_lib_setsockopt+0x682/0x6c0 udp_setsockopt+0x73/0xa0 net/ipv4/udp.c:2779 sock_common_setsockopt+0x61/0x70 net/core/sock.c:3697 __sys_setsockopt+0x1c9/0x230 net/socket.c:2263 __do_sys_setsockopt net/socket.c:2274 [inline] __se_sys_setsockopt net/socket.c:2271 [inline] __x64_sys_setsockopt+0x66/0x80 net/socket.c:2271 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd read-write to 0xffff8881083603fa of 1 bytes by task 16554 on cpu 1: udp_lib_setsockopt+0x682/0x6c0 udp_setsockopt+0x73/0xa0 net/ipv4/udp.c:2779 sock_common_setsockopt+0x61/0x70 net/core/sock.c:3697 __sys_setsockopt+0x1c9/0x230 net/socket.c:2263 __do_sys_setsockopt net/socket.c:2274 [inline] __se_sys_setsockopt net/socket.c:2271 [inline] __x64_sys_setsockopt+0x66/0x80 net/socket.c:2271 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd value changed: 0x01 -> 0x05 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 16554 Comm: syz-executor.5 Not tainted 6.5.0-rc7-syzkaller-00004-gf7757129e3de #0 Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-09-14 16:16:36 +02:00
Eric Dumazet	ac9a7f4ce5	udp: lockless UDP_ENCAP_L2TPINUDP / UDP_GRO Move udp->encap_enabled to udp->udp_flags. Add udp_test_and_set_bit() helper to allow lockless udp_tunnel_encap_enable() implementation. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-09-14 16:16:36 +02:00
Eric Dumazet	f5f52f0884	udp: move udp->accept_udp_{l4\|fraglist} to udp->udp_flags These are read locklessly, move them to udp_flags to fix data-races. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-09-14 16:16:36 +02:00
Eric Dumazet	6d5a12eb91	udp: add missing WRITE_ONCE() around up->encap_rcv UDP_ENCAP_ESPINUDP_NON_IKE setsockopt() writes over up->encap_rcv while other cpus read it. Fixes: `067b207b28` ("[UDP]: Cleanup UDP encapsulation code") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-09-14 16:16:36 +02:00
Eric Dumazet	e1dc0615c6	udp: move udp->gro_enabled to udp->udp_flags syzbot reported that udp->gro_enabled can be read locklessly. Use one atomic bit from udp->udp_flags. Fixes: `e20cf8d3f1` ("udp: implement GRO for plain UDP sockets.") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-09-14 16:16:36 +02:00
Eric Dumazet	bcbc1b1de8	udp: move udp->no_check6_rx to udp->udp_flags syzbot reported that udp->no_check6_rx can be read locklessly. Use one atomic bit from udp->udp_flags. Fixes: `1c19448c9b` ("net: Make enabling of zero UDP6 csums more restrictive") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-09-14 16:16:36 +02:00
Eric Dumazet	a0002127cd	udp: move udp->no_check6_tx to udp->udp_flags syzbot reported that udp->no_check6_tx can be read locklessly. Use one atomic bit from udp->udp_flags Fixes: `1c19448c9b` ("net: Make enabling of zero UDP6 csums more restrictive") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-09-14 16:16:36 +02:00
Eric Dumazet	81b36803ac	udp: introduce udp->udp_flags According to syzbot, it is time to use proper atomic flags for various UDP flags. Add udp_flags field, and convert udp->corkflag to first bit in it. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-09-14 16:16:36 +02:00
Kuniyuki Iwashima	a22730b1b4	kcm: Fix error handling for SOCK_DGRAM in kcm_sendmsg(). syzkaller found a memory leak in kcm_sendmsg(), and commit `c821a88bd7` ("kcm: Fix memory leak in error path of kcm_sendmsg()") suppressed it by updating kcm_tx_msg(head)->last_skb if partial data is copied so that the following sendmsg() will resume from the skb. However, we cannot know how many bytes were copied when we get the error. Thus, we could mess up the MSG_MORE queue. When kcm_sendmsg() fails for SOCK_DGRAM, we should purge the queue as we do so for UDP by udp_flush_pending_frames(). Even without this change, when the error occurred, the following sendmsg() resumed from a wrong skb and the queue was messed up. However, we have yet to get such a report, and only syzkaller stumbled on it. So, this can be changed safely. Note this does not change SOCK_SEQPACKET behaviour. Fixes: `c821a88bd7` ("kcm: Fix memory leak in error path of kcm_sendmsg()") Fixes: `ab7ac4eb98` ("kcm: Kernel Connection Multiplexor module") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20230912022753.33327-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-09-14 10:43:51 +02:00
Arseniy Krasnov	8ecf0cedc0	vsock: send SIGPIPE on write to shutdowned socket POSIX requires to send SIGPIPE on write to SOCK_STREAM socket which was shutdowned with SHUT_WR flag or its peer was shutdowned with SHUT_RD flag. Also we must not send SIGPIPE if MSG_NOSIGNAL flag is set. Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-09-14 08:19:55 +02:00
Phil Sutter	7fb818f248	netfilter: nf_tables: Fix entries val in rule reset audit log The value in idx and the number of rules handled in that particular __nf_tables_dump_rules() call is not identical. The former is a cursor to pick up from if multiple netlink messages are needed, so its value is ever increasing. Fixing this is not just a matter of subtracting s_idx from it, though: When resetting rules in multiple chains, __nf_tables_dump_rules() is called for each and cb->args[0] is not adjusted in between. Introduce a dedicated counter to record the number of rules reset in this call in a less confusing way. While being at it, prevent the direct return upon buffer exhaustion: Any rules previously dumped into that skb would evade audit logging otherwise. Fixes: `9b5ba5c9c5` ("netfilter: nf_tables: Unbreak audit log reset") Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2023-09-13 21:57:50 +02:00
Florian Westphal	4908d5af16	netfilter: conntrack: fix extension size table The size table is incorrect due to copypaste error, this reserves more size than needed. TSTAMP reserved 32 instead of 16 bytes. TIMEOUT reserved 16 instead of 8 bytes. Fixes: `5f31edc067` ("netfilter: conntrack: move extension sizes into core") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2023-09-13 21:57:50 +02:00
Olga Kornievskaia	806a3bc421	NFSv4.1: fix pnfs MDS=DS session trunking Currently, when GETDEVICEINFO returns multiple locations where each is a different IP but the server's identity is same as MDS, then nfs4_set_ds_client() finds the existing nfs_client structure which has the MDS's max_connect value (and if it's 1), then the 1st IP on the DS's list will get dropped due to MDS trunking rules. Other IPs would be added as they fall under the pnfs trunking rules. For the list of IPs the 1st goes thru calling nfs4_set_ds_client() which will eventually call nfs4_add_trunk() and call into rpc_clnt_test_and_add_xprt() which has the check for MDS trunking. The other IPs (after the 1st one), would call rpc_clnt_add_xprt() which doesn't go thru that check. nfs4_add_trunk() is called when MDS trunking is happening and it needs to enforce the usage of max_connect mount option of the 1st mount. However, this shouldn't be applied to pnfs flow. Instead, this patch proposed to treat MDS=DS as DS trunking and make sure that MDS's max_connect limit does not apply to the 1st IP returned in the GETDEVICEINFO list. It does so by marking the newly created client with a new flag NFS_CS_PNFS which then used to pass max_connect value to use into the rpc_clnt_test_and_add_xprt() instead of the existing rpc client's max_connect value set by the MDS connection. For example, mount was done without max_connect value set so MDS's rpc client has cl_max_connect=1. Upon calling into rpc_clnt_test_and_add_xprt() and using rpc client's value, the caller passes in max_connect value which is previously been set in the pnfs path (as a part of handling GETDEVICEINFO list of IPs) in nfs4_set_ds_client(). However, when NFS_CS_PNFS flag is not set and we know we are doing MDS trunking, comparing a new IP of the same server, we then set the max_connect value to the existing MDS's value and pass that into rpc_clnt_test_and_add_xprt(). Fixes: `dc48e0abee` ("SUNRPC enforce creation of no more than max_connect xprts") Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2023-09-13 11:51:11 -04:00
Trond Myklebust	e86fcf0820	Revert "SUNRPC: Fail faster on bad verifier" This reverts commit `0701214cd6`. The premise of this commit was incorrect. There are exactly 2 cases where rpcauth_checkverf() will return an error: 1) If there was an XDR decode problem (i.e. garbage data). 2) If gss_validate() had a problem verifying the RPCSEC_GSS MIC. In the second case, there are again 2 subcases: a) The GSS context expires, in which case gss_validate() will force a new context negotiation on retry by invalidating the cred. b) The sequence number check failed because an RPC call timed out, and the client retransmitted the request using a new sequence number, as required by RFC2203. In neither subcase is this a fatal error. Reported-by: Russell Cattelan <cattelan@thebarn.com> Fixes: `0701214cd6` ("SUNRPC: Fail faster on bad verifier") Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2023-09-13 11:51:11 -04:00
Trond Myklebust	611fa42dfa	SUNRPC: Mark the cred for revalidation if the server rejects it If the server rejects the credential as being stale, or bad, then we should mark it for revalidation before retransmitting. Fixes: `7f5667a5f8` ("SUNRPC: Clean up rpc_verify_header()") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2023-09-13 11:51:11 -04:00
Ping-Ke Shih	e160ab8516	wifi: mac80211: don't return unset power in ieee80211_get_tx_power() We can get a UBSAN warning if ieee80211_get_tx_power() returns the INT_MIN value mac80211 internally uses for "unset power level". UBSAN: signed-integer-overflow in net/wireless/nl80211.c:3816:5 -2147483648 * 100 cannot be represented in type 'int' CPU: 0 PID: 20433 Comm: insmod Tainted: G WC OE Call Trace: dump_stack+0x74/0x92 ubsan_epilogue+0x9/0x50 handle_overflow+0x8d/0xd0 __ubsan_handle_mul_overflow+0xe/0x10 nl80211_send_iface+0x688/0x6b0 [cfg80211] [...] cfg80211_register_wdev+0x78/0xb0 [cfg80211] cfg80211_netdev_notifier_call+0x200/0x620 [cfg80211] [...] ieee80211_if_add+0x60e/0x8f0 [mac80211] ieee80211_register_hw+0xda5/0x1170 [mac80211] In this case, simply return an error instead, to indicate that no data is available. Cc: Zong-Zhe Yang <kevin_yang@realtek.com> Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Link: https://lore.kernel.org/r/20230203023636.4418-1-pkshih@realtek.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2023-09-13 16:29:24 +02:00
Stephen Douthit	3e99b4d282	wifi: mac80211: Sanity check tx bitrate if not provided by driver If the driver doesn't fill NL80211_STA_INFO_TX_BITRATE in sta_set_sinfo() then as a fallback sta->deflink.tx_stats.last_rate is used. Unfortunately there's no guarantee that this has actually been set before it's used. Originally found when 'iw <dev> link' would always return a tx rate of 6Mbps regardless of actual link speed for the QCA9337 running firmware WLAN.TF.2.1-00021-QCARMSWP-1 in my netbook. Use the sanity check logic from ieee80211_fill_rx_status() and refactor that to use the new inline function. Signed-off-by: Stephen Douthit <stephen.douthit@gmail.com> Link: https://lore.kernel.org/r/20230213204024.3377-1-stephen.douthit@gmail.com [change to bool ..._rate_valid() instead of int ..._rate_invalid()] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2023-09-13 16:24:05 +02:00
Pedro Tammela	ef765c2587	net/sched: cls_route: make netlink errors meaningful Use netlink extended ack and parsing policies to return more meaningful errors instead of the relying solely on errnos. Reviewed-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-09-13 12:38:52 +01:00
Aditya Kumar Singh	30ca8b0c4d	wifi: cfg80211: export DFS CAC time and usable state helper functions cfg80211 has cfg80211_chandef_dfs_usable() function to know whether at least one channel in the chandef is in usable state or not. Also, cfg80211_chandef_dfs_cac_time() function is there which tells the CAC time required for the given chandef. Make these two functions visible to drivers by exporting their symbol to global list of kernel symbols. Lower level drivers can make use of these two functions to be aware if CAC is required on the given chandef and for how long. For example drivers which maintains the CAC state internally can make use of these. Signed-off-by: Aditya Kumar Singh <quic_adisi@quicinc.com> Reviewed-by: Jeff Johnson <quic_jjohnson@quicinc.com> Link: https://lore.kernel.org/r/20230912051857.2284-2-quic_adisi@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2023-09-13 13:24:11 +02:00
Abhishek Kumar	b13b6bbfbb	wifi: cfg80211: call reg_call_notifier on beacon hints Currently the channel property updates are not propagated to driver. This causes issues in the discovery of hidden SSIDs and fails to connect to them. This change defines a new wiphy flag which when enabled by vendor driver, the reg_call_notifier callback will be trigger on beacon hints. This ensures that the channel property changes are visible to the vendor driver. The vendor changes the channels for active scans. This fixes the discovery issue of hidden SSID. Signed-off-by: Abhishek Kumar <kuabhs@chromium.org> Link: https://lore.kernel.org/r/20230629035254.1.I059fe585f9f9e896c2d51028ef804d197c8c009e@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2023-09-13 12:34:01 +02:00
Raj Kumar Bhagat	13ba6794d2	wifi: cfg80211: allow reg update by driver even if wiphy->regd is set Currently regulatory update by driver is not allowed when the wiphy->regd is already set and drivers_request->intersect is false. During wiphy registration, some drivers (ath10k does this currently) first register the world regulatory to cfg80211 using wiphy_apply_custom_regulatory(). The driver then obtain the current operating country and tries to update the correct regulatory to cfg80211 using regulatory_hint(). But at this point, wiphy->regd is already set to world regulatory. Also, since this is the first request from driver after the world regulatory is set this will result in drivers_request->intersect set to false. In this condition the driver request regulatory is not allowed to update to cfg80211 in reg_set_rd_driver(). This restricts the device operation to the world regulatory. This driver request to update the regulatory with current operating country is valid and should be updated to cfg80211. Hence allow regulatory update by driver even if the wiphy->regd is already set and driver_request->intersect is false. Signed-off-by: Raj Kumar Bhagat <quic_rajkbhag@quicinc.com> Reviewed-by: Jeff Johnson <quic_jjohnson@quicinc.com> Link: https://lore.kernel.org/r/20230421061312.13722-1-quic_rajkbhag@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2023-09-13 12:34:01 +02:00
Aloka Dixit	6bc5ddb2fd	wifi: mac80211: additions to change_beacon() Process FILS discovery and unsolicited broadcast probe response transmission configurations in ieee80211_change_beacon(). Signed-off-by: Aloka Dixit <quic_alokad@quicinc.com> Reviewed-by: Jeff Johnson <quic_jjohnson@quicinc.com> Link: https://lore.kernel.org/r/20230727174100.11721-6-quic_alokad@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2023-09-13 12:34:01 +02:00
Aloka Dixit	b2d431d43c	wifi: nl80211: additions to NL80211_CMD_SET_BEACON FILS discovery and unsolicited broadcast probe response templates need to be updated along with beacon templates in some cases such as the channel switch operation. Add the missing implementation. Signed-off-by: Aloka Dixit <quic_alokad@quicinc.com> Reviewed-by: Jeff Johnson <quic_jjohnson@quicinc.com> Link: https://lore.kernel.org/r/20230727174100.11721-5-quic_alokad@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2023-09-13 12:34:01 +02:00

... 4 5 6 7 8 ...

75114 commits