Commit Graph

76367 Commits

Author SHA1 Message Date
Yuxuan Hu 2535b848fa Bluetooth: rfcomm: Fix null-ptr-deref in rfcomm_check_security
During our fuzz testing of the connection and disconnection process at the
RFCOMM layer, we discovered this bug. By comparing the packets from a
normal connection and disconnection process with the testcase that
triggered a KASAN report. We analyzed the cause of this bug as follows:

1. In the packets captured during a normal connection, the host sends a
`Read Encryption Key Size` type of `HCI_CMD` packet
(Command Opcode: 0x1408) to the controller to inquire the length of
encryption key.After receiving this packet, the controller immediately
replies with a Command Completepacket (Event Code: 0x0e) to return the
Encryption Key Size.

2. In our fuzz test case, the timing of the controller's response to this
packet was delayed to an unexpected point: after the RFCOMM and L2CAP
layers had disconnected but before the HCI layer had disconnected.

3. After receiving the Encryption Key Size Response at the time described
in point 2, the host still called the rfcomm_check_security function.
However, by this time `struct l2cap_conn *conn = l2cap_pi(sk)->chan->conn;`
had already been released, and when the function executed
`return hci_conn_security(conn->hcon, d->sec_level, auth_type, d->out);`,
specifically when accessing `conn->hcon`, a null-ptr-deref error occurred.

To fix this bug, check if `sk->sk_state` is BT_CLOSED before calling
rfcomm_recv_frame in rfcomm_process_rx.

Signed-off-by: Yuxuan Hu <20373622@buaa.edu.cn>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-02-28 09:42:26 -05:00
Luiz Augusto von Dentz e5469adb2a Bluetooth: hci_sync: Fix accept_list when attempting to suspend
During suspend, only wakeable devices can be in acceptlist, so if the
device was previously added it needs to be removed otherwise the device
can end up waking up the system prematurely.

Fixes: 3b42055388 ("Bluetooth: hci_sync: Fix attempting to suspend with unfiltered passive scan")
Signed-off-by: Clancy Shang <clancy.shang@quectel.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
2024-02-28 09:42:02 -05:00
Ying Hsu 2449007d3f Bluetooth: Avoid potential use-after-free in hci_error_reset
While handling the HCI_EV_HARDWARE_ERROR event, if the underlying
BT controller is not responding, the GPIO reset mechanism would
free the hci_dev and lead to a use-after-free in hci_error_reset.

Here's the call trace observed on a ChromeOS device with Intel AX201:
   queue_work_on+0x3e/0x6c
   __hci_cmd_sync_sk+0x2ee/0x4c0 [bluetooth <HASH:3b4a6>]
   ? init_wait_entry+0x31/0x31
   __hci_cmd_sync+0x16/0x20 [bluetooth <HASH:3b4a 6>]
   hci_error_reset+0x4f/0xa4 [bluetooth <HASH:3b4a 6>]
   process_one_work+0x1d8/0x33f
   worker_thread+0x21b/0x373
   kthread+0x13a/0x152
   ? pr_cont_work+0x54/0x54
   ? kthread_blkcg+0x31/0x31
    ret_from_fork+0x1f/0x30

This patch holds the reference count on the hci_dev while processing
a HCI_EV_HARDWARE_ERROR event to avoid potential crash.

Fixes: c7741d16a5 ("Bluetooth: Perform a power cycle when receiving hardware error event")
Signed-off-by: Ying Hsu <yinghsu@chromium.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-02-28 09:41:35 -05:00
Jonas Dreßler 6b3899be24 Bluetooth: hci_sync: Check the correct flag before starting a scan
There's a very confusing mistake in the code starting a HCI inquiry: We're
calling hci_dev_test_flag() to test for HCI_INQUIRY, but hci_dev_test_flag()
checks hdev->dev_flags instead of hdev->flags. HCI_INQUIRY is a bit that's
set on hdev->flags, not on hdev->dev_flags though.

HCI_INQUIRY equals the integer 7, and in hdev->dev_flags, 7 means
HCI_BONDABLE, so we were actually checking for HCI_BONDABLE here.

The mistake is only present in the synchronous code for starting an inquiry,
not in the async one. Also devices are typically bondable while doing an
inquiry, so that might be the reason why nobody noticed it so far.

Fixes: abfeea476c ("Bluetooth: hci_sync: Convert MGMT_OP_START_DISCOVERY")
Signed-off-by: Jonas Dreßler <verdre@v0yd.nl>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-02-28 09:41:10 -05:00
Andrew Lunn 292fac464b net: ethtool: eee: Remove legacy _u32 from keee
All MAC drivers have been converted to use the link mode members of
keee. So remove the _u32 values, and the code in the ethtool core to
convert the legacy _u32 values to link modes.

Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-28 12:18:05 +00:00
Zhengchao Shao 2e26b6dfad ipv6: raw: remove useless input parameter in rawv6_get/seticmpfilter
The input parameter 'level' in rawv6_get/seticmpfilter is not used.
Therefore, remove it.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-28 11:33:07 +00:00
Lukasz Majewski 995161edfd net: hsr: Fix typo in the hsr_forward_do() function comment
Correct type in the hsr_forward_do() comment.

Signed-off-by: Lukasz Majewski <lukma@denx.de>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-28 11:23:03 +00:00
Justin Iurman f655c78d62 net: exthdrs: ioam6: send trace event
If we're processing an IOAM Pre-allocated Trace Option-Type (the only
one supported currently), then send the trace as an ioam6 event to the
netlink multicast group. This way, user space apps will be able to
collect IOAM data.

Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-28 11:19:41 +00:00
Justin Iurman 67c8e4bb4f net: ioam6: multicast event
Add a multicast group to the ioam6 generic netlink family and provide
ioam6_event() to send an ioam6 event to the multicast group.

Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-28 11:19:41 +00:00
Jason Xing ee01defe25 tcp: make dropreason in tcp_child_process() work
It's time to let it work right now. We've already prepared for this:)

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-28 10:39:22 +00:00
Jason Xing b982569593 tcp: make the dropreason really work when calling tcp_rcv_state_process()
Update three callers including both ipv4 and ipv6 and let the dropreason
mechanism work in reality.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-28 10:39:22 +00:00
Jason Xing 7d6ed9afde tcp: add dropreasons in tcp_rcv_state_process()
In this patch, I equipped this function with more dropreasons, but
it still doesn't work yet, which I will do later.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-28 10:39:22 +00:00
Jason Xing e615e3a24e tcp: add more specific possible drop reasons in tcp_rcv_synsent_state_process()
This patch does two things:
1) add two more new reasons
2) only change the return value(1) to various drop reason values
for the future use

For now, we still cannot trace those two reasons. We'll implement the full
function in the subsequent patch in this series.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-28 10:39:22 +00:00
Jason Xing 253541a3c1 tcp: use drop reasons in cookie check for ipv6
Like what I did to ipv4 mode, refine this part: adding more drop
reasons for better tracing.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-28 10:39:21 +00:00
Jason Xing ed43e76cdc tcp: directly drop skb in cookie check for ipv6
Like previous patch does, only moving skb drop logical code to
cookie_v6_check() for later refinement.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-28 10:39:21 +00:00
Jason Xing a4a69a3719 tcp: use drop reasons in cookie check for ipv4
Now it's time to use the prepared definitions to refine this part.
Four reasons used might enough for now, I think.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-28 10:39:21 +00:00
Jason Xing 65be4393f3 tcp: directly drop skb in cookie check for ipv4
Only move the skb drop from tcp_v4_do_rcv() to cookie_v4_check() itself,
no other changes made. It can help us refine the specific drop reasons
later.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-28 10:39:21 +00:00
Adam Li 12a686c2e7 net: make SK_MEMORY_PCPU_RESERV tunable
This patch adds /proc/sys/net/core/mem_pcpu_rsv sysctl file,
to make SK_MEMORY_PCPU_RESERV tunable.

Commit 3cd3399dd7 ("net: implement per-cpu reserves for
memory_allocated") introduced per-cpu forward alloc cache:

"Implement a per-cpu cache of +1/-1 MB, to reduce number
of changes to sk->sk_prot->memory_allocated, which
would otherwise be cause of false sharing."

sk_prot->memory_allocated points to global atomic variable:
atomic_long_t tcp_memory_allocated ____cacheline_aligned_in_smp;

If increasing the per-cpu cache size from 1MB to e.g. 16MB,
changes to sk->sk_prot->memory_allocated can be further reduced.
Performance may be improved on system with many cores.

Signed-off-by: Adam Li <adamli@os.amperecomputing.com>
Reviewed-by: Christoph Lameter (Ampere) <cl@linux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-28 09:23:08 +00:00
Jakub Kicinski ed2c0e4cb6 wireless fixes for v6.8-rc7
Few remaining fixes, hopefully the last wireless pull request to v6.8. Two
 fixes to the stack and two to iwlwifi but no high priority fixes this
 time.
 -----BEGIN PGP SIGNATURE-----
 
 iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmXd6e8RHGt2YWxvQGtl
 cm5lbC5vcmcACgkQbhckVSbrbZvZ1Qf9HVcuUzZoWabHq7AlgW1cSM1Qre0vTXIx
 kuBypruFuStvHaTQ/YR5tFA/lLdbyvS9LfpX3LbiW0W+EgL7I+1eB2h2y6To+jgZ
 IsbTCf+sHzkBT+EYPJnzTOv86YfN+sbGDKrQsOfAqIop9H9taF8VxVuUZofDgbEB
 S/uV88e6B1Ot0mpaUNWO3w5/Xv+YBNtUBgPawRUU4IIpGEhluePkRMm3RSVi2wjm
 2mxNJHjsI7OLkKivCm1PYMfC177p47/2fMAKMDkOqkhc8//svrOzcyoh0vNBea/a
 7OO5GX39lcVgOy97uj7mvNL0qvFAK8396gTi8k1heOmkGhI0mlcKow==
 =/obR
 -----END PGP SIGNATURE-----

Merge tag 'wireless-2024-02-27' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Kalle Valo says:

====================
wireless fixes for v6.8-rc7

Few remaining fixes, hopefully the last wireless pull request to v6.8.
Two fixes to the stack and two to iwlwifi but no high priority fixes
this time.

* tag 'wireless-2024-02-27' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  wifi: mac80211: only call drv_sta_rc_update for uploaded stations
  MAINTAINERS: wifi: Add N: ath1*k entries to match .yaml files
  MAINTAINERS: wifi: update Jeff Johnson e-mail address
  wifi: iwlwifi: mvm: fix the TXF mapping for BZ devices
  wifi: iwlwifi: mvm: ensure offloading TID queue exists
  wifi: nl80211: reject iftype change with mesh ID change
====================

Link: https://lore.kernel.org/r/20240227135751.C5EC6C43390@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-27 19:19:16 -08:00
Zhengchao Shao d75fe63a07 ipv6: raw: remove useless input parameter in rawv6_err
The input parameter 'opt' in rawv6_err() is not used. Therefore, remove it.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240224084121.2479603-1-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-27 18:08:09 -08:00
Eric Dumazet f8cbf6bde4 netlink: use kvmalloc() in netlink_alloc_large_skb()
This is a followup of commit 234ec0b603 ("netlink: fix potential
sleeping issue in mqueue_flush_file"), because vfree_atomic()
overhead is unfortunate for medium sized allocations.

1) If the allocation is smaller than PAGE_SIZE, do not bother
   with vmalloc() at all. Some arches have 64KB PAGE_SIZE,
   while NLMSG_GOODSIZE is smaller than 8KB.

2) Use kvmalloc(), which might allocate one high order page
   instead of vmalloc if memory is not too fragmented.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Zhengchao Shao <shaozhengchao@huawei.com>
Link: https://lore.kernel.org/r/20240224090630.605917-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-27 08:31:28 -08:00
Matthieu Baerts (NGI0) 14d29ec530 mptcp: check the protocol in mptcp_sk() with DEBUG_NET
Fuzzers and static checkers might not detect when mptcp_sk() is used
with a non mptcp_sock structure.

This is similar to the parent commit, where it is easy to use mptcp_sk()
with a TCP sock, e.g. with a subflow sk.

So a new simple check is done when CONFIG_DEBUG_NET is enabled to tell
kernel devs when a non-MPTCP socket is being used as an MPTCP one.
'mptcp_sk()' macro is then defined differently: with an extra WARN to
complain when an unexpected socket is being used.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240223-upstream-net-next-20240223-misc-improvements-v1-4-b6c8a10396bd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26 18:42:12 -08:00
Matthieu Baerts (NGI0) dcc03f270d mptcp: check the protocol in tcp_sk() with DEBUG_NET
Fuzzers and static checkers might not detect when tcp_sk() is used with
a non tcp_sock structure.

This kind of mistake already happened a few times with MPTCP: when
wrongly using TCP-specific helpers with mptcp_sock pointers. On the
other hand, there are many 'tcp_xxx()' helpers that are taking a 'struct
sock' pointer as arguments, and some of them are only looking at fields
from 'struct sock', and nothing from 'struct tcp_sock'. It is then
tempting to use them with a 'struct mptcp_sock'.

So a new simple check is done when CONFIG_DEBUG_NET is enabled to tell
kernel devs when a non-TCP socket is being used as a TCP one. 'tcp_sk()'
macro is then re-defined to add a WARN when an unexpected socket is
being used.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240223-upstream-net-next-20240223-misc-improvements-v1-3-b6c8a10396bd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26 18:42:12 -08:00
Matthieu Baerts (NGI0) 28de50eeb7 mptcp: token kunit: set protocol
As it would be done when initiating an MPTCP sock.

This is not strictly needed for this test, but it will be when a later
patch will check if the right protocol is being used when calling
mptcp_sk().

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240223-upstream-net-next-20240223-misc-improvements-v1-2-b6c8a10396bd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26 18:42:12 -08:00
Paolo Abeni d6a9608af9 mptcp: fix possible deadlock in subflow diag
Syzbot and Eric reported a lockdep splat in the subflow diag:

   WARNING: possible circular locking dependency detected
   6.8.0-rc4-syzkaller-00212-g40b9385dd8e6 #0 Not tainted

   syz-executor.2/24141 is trying to acquire lock:
   ffff888045870130 (k-sk_lock-AF_INET6){+.+.}-{0:0}, at:
   tcp_diag_put_ulp net/ipv4/tcp_diag.c:100 [inline]
   ffff888045870130 (k-sk_lock-AF_INET6){+.+.}-{0:0}, at:
   tcp_diag_get_aux+0x738/0x830 net/ipv4/tcp_diag.c:137

   but task is already holding lock:
   ffffc9000135e488 (&h->lhash2[i].lock){+.+.}-{2:2}, at: spin_lock
   include/linux/spinlock.h:351 [inline]
   ffffc9000135e488 (&h->lhash2[i].lock){+.+.}-{2:2}, at:
   inet_diag_dump_icsk+0x39f/0x1f80 net/ipv4/inet_diag.c:1038

   which lock already depends on the new lock.

   the existing dependency chain (in reverse order) is:

   -> #1 (&h->lhash2[i].lock){+.+.}-{2:2}:
   lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754
   __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
   _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
   spin_lock include/linux/spinlock.h:351 [inline]
   __inet_hash+0x335/0xbe0 net/ipv4/inet_hashtables.c:743
   inet_csk_listen_start+0x23a/0x320 net/ipv4/inet_connection_sock.c:1261
   __inet_listen_sk+0x2a2/0x770 net/ipv4/af_inet.c:217
   inet_listen+0xa3/0x110 net/ipv4/af_inet.c:239
   rds_tcp_listen_init+0x3fd/0x5a0 net/rds/tcp_listen.c:316
   rds_tcp_init_net+0x141/0x320 net/rds/tcp.c:577
   ops_init+0x352/0x610 net/core/net_namespace.c:136
   __register_pernet_operations net/core/net_namespace.c:1214 [inline]
   register_pernet_operations+0x2cb/0x660 net/core/net_namespace.c:1283
   register_pernet_device+0x33/0x80 net/core/net_namespace.c:1370
   rds_tcp_init+0x62/0xd0 net/rds/tcp.c:735
   do_one_initcall+0x238/0x830 init/main.c:1236
   do_initcall_level+0x157/0x210 init/main.c:1298
   do_initcalls+0x3f/0x80 init/main.c:1314
   kernel_init_freeable+0x42f/0x5d0 init/main.c:1551
   kernel_init+0x1d/0x2a0 init/main.c:1441
   ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
   ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:242

   -> #0 (k-sk_lock-AF_INET6){+.+.}-{0:0}:
   check_prev_add kernel/locking/lockdep.c:3134 [inline]
   check_prevs_add kernel/locking/lockdep.c:3253 [inline]
   validate_chain+0x18ca/0x58e0 kernel/locking/lockdep.c:3869
   __lock_acquire+0x1345/0x1fd0 kernel/locking/lockdep.c:5137
   lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754
   lock_sock_fast include/net/sock.h:1723 [inline]
   subflow_get_info+0x166/0xd20 net/mptcp/diag.c:28
   tcp_diag_put_ulp net/ipv4/tcp_diag.c:100 [inline]
   tcp_diag_get_aux+0x738/0x830 net/ipv4/tcp_diag.c:137
   inet_sk_diag_fill+0x10ed/0x1e00 net/ipv4/inet_diag.c:345
   inet_diag_dump_icsk+0x55b/0x1f80 net/ipv4/inet_diag.c:1061
   __inet_diag_dump+0x211/0x3a0 net/ipv4/inet_diag.c:1263
   inet_diag_dump_compat+0x1c1/0x2d0 net/ipv4/inet_diag.c:1371
   netlink_dump+0x59b/0xc80 net/netlink/af_netlink.c:2264
   __netlink_dump_start+0x5df/0x790 net/netlink/af_netlink.c:2370
   netlink_dump_start include/linux/netlink.h:338 [inline]
   inet_diag_rcv_msg_compat+0x209/0x4c0 net/ipv4/inet_diag.c:1405
   sock_diag_rcv_msg+0xe7/0x410
   netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2543
   sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:280
   netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline]
   netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1367
   netlink_sendmsg+0xa3b/0xd70 net/netlink/af_netlink.c:1908
   sock_sendmsg_nosec net/socket.c:730 [inline]
   __sock_sendmsg+0x221/0x270 net/socket.c:745
   ____sys_sendmsg+0x525/0x7d0 net/socket.c:2584
   ___sys_sendmsg net/socket.c:2638 [inline]
   __sys_sendmsg+0x2b0/0x3a0 net/socket.c:2667
   do_syscall_64+0xf9/0x240
   entry_SYSCALL_64_after_hwframe+0x6f/0x77

As noted by Eric we can break the lock dependency chain avoid
dumping any extended info for the mptcp subflow listener:
nothing actually useful is presented there.

Fixes: b8adb69a7d ("mptcp: fix lockless access in subflow ULP diag")
Cc: stable@vger.kernel.org
Reported-by: Eric Dumazet <edumazet@google.com>
Closes: https://lore.kernel.org/netdev/CANn89iJ=Oecw6OZDwmSYc9HJKQ_G32uN11L+oUcMu+TOD5Xiaw@mail.gmail.com/
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-9-162e87e48497@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26 18:41:56 -08:00
Davide Caratti 10048689de mptcp: fix double-free on socket dismantle
when MPTCP server accepts an incoming connection, it clones its listener
socket. However, the pointer to 'inet_opt' for the new socket has the same
value as the original one: as a consequence, on program exit it's possible
to observe the following splat:

  BUG: KASAN: double-free in inet_sock_destruct+0x54f/0x8b0
  Free of addr ffff888485950880 by task swapper/25/0

  CPU: 25 PID: 0 Comm: swapper/25 Kdump: loaded Not tainted 6.8.0-rc1+ #609
  Hardware name: Supermicro SYS-6027R-72RF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0  07/26/2013
  Call Trace:
   <IRQ>
   dump_stack_lvl+0x32/0x50
   print_report+0xca/0x620
   kasan_report_invalid_free+0x64/0x90
   __kasan_slab_free+0x1aa/0x1f0
   kfree+0xed/0x2e0
   inet_sock_destruct+0x54f/0x8b0
   __sk_destruct+0x48/0x5b0
   rcu_do_batch+0x34e/0xd90
   rcu_core+0x559/0xac0
   __do_softirq+0x183/0x5a4
   irq_exit_rcu+0x12d/0x170
   sysvec_apic_timer_interrupt+0x6b/0x80
   </IRQ>
   <TASK>
   asm_sysvec_apic_timer_interrupt+0x16/0x20
  RIP: 0010:cpuidle_enter_state+0x175/0x300
  Code: 30 00 0f 84 1f 01 00 00 83 e8 01 83 f8 ff 75 e5 48 83 c4 18 44 89 e8 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc fb 45 85 ed <0f> 89 60 ff ff ff 48 c1 e5 06 48 c7 43 18 00 00 00 00 48 83 44 2b
  RSP: 0018:ffff888481cf7d90 EFLAGS: 00000202
  RAX: 0000000000000000 RBX: ffff88887facddc8 RCX: 0000000000000000
  RDX: 1ffff1110ff588b1 RSI: 0000000000000019 RDI: ffff88887fac4588
  RBP: 0000000000000004 R08: 0000000000000002 R09: 0000000000043080
  R10: 0009b02ea273363f R11: ffff88887fabf42b R12: ffffffff932592e0
  R13: 0000000000000004 R14: 0000000000000000 R15: 00000022c880ec80
   cpuidle_enter+0x4a/0xa0
   do_idle+0x310/0x410
   cpu_startup_entry+0x51/0x60
   start_secondary+0x211/0x270
   secondary_startup_64_no_verify+0x184/0x18b
   </TASK>

  Allocated by task 6853:
   kasan_save_stack+0x1c/0x40
   kasan_save_track+0x10/0x30
   __kasan_kmalloc+0xa6/0xb0
   __kmalloc+0x1eb/0x450
   cipso_v4_sock_setattr+0x96/0x360
   netlbl_sock_setattr+0x132/0x1f0
   selinux_netlbl_socket_post_create+0x6c/0x110
   selinux_socket_post_create+0x37b/0x7f0
   security_socket_post_create+0x63/0xb0
   __sock_create+0x305/0x450
   __sys_socket_create.part.23+0xbd/0x130
   __sys_socket+0x37/0xb0
   __x64_sys_socket+0x6f/0xb0
   do_syscall_64+0x83/0x160
   entry_SYSCALL_64_after_hwframe+0x6e/0x76

  Freed by task 6858:
   kasan_save_stack+0x1c/0x40
   kasan_save_track+0x10/0x30
   kasan_save_free_info+0x3b/0x60
   __kasan_slab_free+0x12c/0x1f0
   kfree+0xed/0x2e0
   inet_sock_destruct+0x54f/0x8b0
   __sk_destruct+0x48/0x5b0
   subflow_ulp_release+0x1f0/0x250
   tcp_cleanup_ulp+0x6e/0x110
   tcp_v4_destroy_sock+0x5a/0x3a0
   inet_csk_destroy_sock+0x135/0x390
   tcp_fin+0x416/0x5c0
   tcp_data_queue+0x1bc8/0x4310
   tcp_rcv_state_process+0x15a3/0x47b0
   tcp_v4_do_rcv+0x2c1/0x990
   tcp_v4_rcv+0x41fb/0x5ed0
   ip_protocol_deliver_rcu+0x6d/0x9f0
   ip_local_deliver_finish+0x278/0x360
   ip_local_deliver+0x182/0x2c0
   ip_rcv+0xb5/0x1c0
   __netif_receive_skb_one_core+0x16e/0x1b0
   process_backlog+0x1e3/0x650
   __napi_poll+0xa6/0x500
   net_rx_action+0x740/0xbb0
   __do_softirq+0x183/0x5a4

  The buggy address belongs to the object at ffff888485950880
   which belongs to the cache kmalloc-64 of size 64
  The buggy address is located 0 bytes inside of
   64-byte region [ffff888485950880, ffff8884859508c0)

  The buggy address belongs to the physical page:
  page:0000000056d1e95e refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888485950700 pfn:0x485950
  flags: 0x57ffffc0000800(slab|node=1|zone=2|lastcpupid=0x1fffff)
  page_type: 0xffffffff()
  raw: 0057ffffc0000800 ffff88810004c640 ffffea00121b8ac0 dead000000000006
  raw: ffff888485950700 0000000000200019 00000001ffffffff 0000000000000000
  page dumped because: kasan: bad access detected

  Memory state around the buggy address:
   ffff888485950780: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
   ffff888485950800: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
  >ffff888485950880: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
                     ^
   ffff888485950900: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
   ffff888485950980: 00 00 00 00 00 01 fc fc fc fc fc fc fc fc fc fc

Something similar (a refcount underflow) happens with CALIPSO/IPv6. Fix
this by duplicating IP / IPv6 options after clone, so that
ip{,6}_sock_destruct() doesn't end up freeing the same memory area twice.

Fixes: cf7da0d66c ("mptcp: Create SUBFLOW socket for incoming connections")
Cc: stable@vger.kernel.org
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-8-162e87e48497@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26 18:41:56 -08:00
Paolo Abeni b111d8fbd2 mptcp: fix potential wake-up event loss
After the blamed commit below, the send buffer auto-tuning can
happen after that the mptcp_propagate_sndbuf() completes - via
the delegated action infrastructure.

We must check for write space even after such change or we risk
missing the wake-up event.

Fixes: 8005184fd1 ("mptcp: refactor sndbuf auto-tuning")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-6-162e87e48497@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26 18:41:56 -08:00
Paolo Abeni adf1bb78da mptcp: fix snd_wnd initialization for passive socket
Such value should be inherited from the first subflow, but
passive sockets always used 'rsk_rcv_wnd'.

Fixes: 6f8a612a33 ("mptcp: keep track of advertised windows right edge")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-5-162e87e48497@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26 18:41:56 -08:00
Paolo Abeni b9cd26f640 mptcp: push at DSS boundaries
when inserting not contiguous data in the subflow write queue,
the protocol creates a new skb and prevent the TCP stack from
merging it later with already queued skbs by setting the EOR marker.

Still no push flag is explicitly set at the end of previous GSO
packet, making the aggregation on the receiver side sub-optimal -
and packetdrill self-tests less predictable.

Explicitly mark the end of not contiguous DSS with the push flag.

Fixes: 6d0060f600 ("mptcp: Write MPTCP DSS headers to outgoing data packets")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-4-162e87e48497@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26 18:41:55 -08:00
Matthieu Baerts (NGI0) 5b49c41ac8 mptcp: avoid printing warning once on client side
After the 'Fixes' commit mentioned below, the client side might print
the following warning once when a subflow is fully established at the
reception of any valid additional ack:

  MPTCP: bogus mpc option on established client sk

That's a normal situation, and no warning should be printed for that. We
can then skip the check when the label is used.

Fixes: e4a0fa47e8 ("mptcp: corner case locking for rx path fields initialization")
Cc: stable@vger.kernel.org
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-3-162e87e48497@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26 18:41:55 -08:00
Geliang Tang 535d620ea5 mptcp: map v4 address to v6 when destroying subflow
Address family of server side mismatches with that of client side, like
in "userspace pm add & remove address" test:

    userspace_pm_add_addr $ns1 10.0.2.1 10
    userspace_pm_rm_sf $ns1 "::ffff:10.0.2.1" $SUB_ESTABLISHED

That's because on the server side, the family is set to AF_INET6 and the
v4 address is mapped in a v6 one.

This patch fixes this issue. In mptcp_pm_nl_subflow_destroy_doit(), before
checking local address family with remote address family, map an IPv4
address to an IPv6 address if the pair is a v4-mapped address.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/387
Fixes: 702c2f646d ("mptcp: netlink: allow userspace-driven subflow establishment")
Cc: stable@vger.kernel.org
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-1-162e87e48497@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26 18:41:55 -08:00
Eric Dumazet c3718936ec ipv6: anycast: complete RCU handling of struct ifacaddr6
struct ifacaddr6 are already freed after RCU grace period.

Add __rcu qualifier to aca_next pointer, and idev->ac_list

Add relevant rcu_assign_pointer() and dereference accessors.

ipv6_chk_acast_dev() no longer needs to acquire idev->lock.

/proc/net/anycast6 is now purely RCU protected, it no
longer acquires idev->lock.

Similarly in6_dump_addrs() can use RCU protection to iterate
through anycast addresses. It was relying on a mixture of RCU
and RTNL but next patches will get rid of RTNL there.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240223201054.220534-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26 18:40:34 -08:00
Eric Dumazet 0d60d8df6f dpll: rely on rcu for netdev_dpll_pin()
This fixes a possible UAF in if_nlmsg_size(),
which can run without RTNL.

Add rcu protection to "struct dpll_pin"

Move netdev_dpll_pin() from netdevice.h to dpll.h to
decrease name pollution.

Note: This looks possible to no longer acquire RTNL in
netdev_dpll_pin_assign() later in net-next.

v2: do not force rcu_read_lock() in rtnl_dpll_pin_size() (Jiri Pirko)

Fixes: 5f18426928 ("netdev: expose DPLL pin handle for netdevice")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Cc: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240223123208.3543319-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26 18:39:34 -08:00
Eric Dumazet 10bfd453da ipv6: fix potential "struct net" leak in inet6_rtm_getaddr()
It seems that if userspace provides a correct IFA_TARGET_NETNSID value
but no IFA_ADDRESS and IFA_LOCAL attributes, inet6_rtm_getaddr()
returns -EINVAL with an elevated "struct net" refcount.

Fixes: 6ecf4c37eb ("ipv6: enable IFA_TARGET_NETNSID for RTM_GETADDR")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Ahern <dsahern@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26 11:56:23 +00:00
Eric Dumazet 0ec4e48c3a rtnetlink: provide RCU protection to rtnl_fill_prop_list()
We want to be able to run rtnl_fill_ifinfo() under RCU protection
instead of RTNL in the future.

dev->name_node items are already rcu protected.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26 11:46:13 +00:00
Eric Dumazet 74808e72e0 rtnetlink: make rtnl_fill_link_ifmap() RCU ready
Use READ_ONCE() to read the following device fields:

	dev->mem_start
	dev->mem_end
	dev->base_addr
	dev->irq
	dev->dma
	dev->if_port

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26 11:46:13 +00:00
Eric Dumazet 4ce5dc9316 inet: switch inet_dump_fib() to RCU protection
No longer hold RTNL while calling inet_dump_fib().

Also change return value for a completed dump:

Returning 0 instead of skb->len allows NLMSG_DONE
to be appended to the skb. User space does not have
to call us again to get a standalone NLMSG_DONE marker.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26 11:46:13 +00:00
Eric Dumazet 22e36ea9f5 inet: allow ip_valid_fib_dump_req() to be called with RTNL or RCU
Add a new field into struct fib_dump_filter, to let callers
tell if they use RTNL locking or RCU.

This is used in the following patch, when inet_dump_fib()
no longer holds RTNL.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26 11:46:12 +00:00
Eric Dumazet 69fdb7e411 ipv6: switch inet6_dump_ifinfo() to RCU protection
No longer hold RTNL while calling inet6_dump_ifinfo()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26 11:46:12 +00:00
Eric Dumazet 386520e0ec rtnetlink: add RTNL_FLAG_DUMP_UNLOCKED flag
Similarly to RTNL_FLAG_DOIT_UNLOCKED, this new flag
allows dump operations registered via rtnl_register()
or rtnl_register_module() to opt-out from RTNL protection.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26 11:46:12 +00:00
Eric Dumazet e39951d965 rtnetlink: change nlk->cb_mutex role
In commit af65bdfce9 ("[NETLINK]: Switch cb_lock spinlock
to mutex and allow to override it"), Patrick McHardy used
a common mutex to protect both nlk->cb and the dump() operations.

The override is used for rtnl dumps, registered with
rntl_register() and rntl_register_module().

We want to be able to opt-out some dump() operations
to not acquire RTNL, so we need to protect nlk->cb
with a per socket mutex.

This patch renames nlk->cb_def_mutex to nlk->nl_cb_mutex

The optional pointer to the mutex used to protect dump()
call is stored in nlk->dump_cb_mutex

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26 11:46:12 +00:00
Eric Dumazet b559027006 netlink: hold nlk->cb_mutex longer in __netlink_dump_start()
__netlink_dump_start() releases nlk->cb_mutex right before
calling netlink_dump() which grabs it again.

This seems dangerous, even if KASAN did not bother yet.

Add a @lock_taken parameter to netlink_dump() to let it
grab the mutex if called from netlink_recvmsg() only.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26 11:46:12 +00:00
Eric Dumazet 6647b338fc netlink: fix netlink_diag_dump() return value
__netlink_diag_dump() returns 1 if the dump is not complete,
zero if no error occurred.

If err variable is zero, this means the dump is complete:
We should not return skb->len in this case, but 0.

This allows NLMSG_DONE to be appended to the skb.
User space does not have to call us again only to get NLMSG_DONE.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26 11:46:12 +00:00
Eric Dumazet ac14ad9755 ipv6: use xarray iterator to implement inet6_dump_ifinfo()
Prepare inet6_dump_ifinfo() to run with RCU protection
instead of RTNL and use for_each_netdev_dump() interface.

Also properly return 0 at the end of a dump, avoiding
an extra recvmsg() system call and RTNL acquisition.

Note that RTNL-less dumps need core changes, coming later
in the series.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26 11:46:12 +00:00
Eric Dumazet 8afc7a78d5 ipv6: prepare inet6_fill_ifinfo() for RCU protection
We want to use RCU protection instead of RTNL
for inet6_fill_ifinfo().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26 11:46:12 +00:00
Eric Dumazet 4ad2681364 ipv6: prepare inet6_fill_ifla6_attrs() for RCU
We want to no longer hold RTNL while calling inet6_fill_ifla6_attrs()
in the future. Add needed READ_ONCE()/WRITE_ONCE() annotations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26 11:46:12 +00:00
Eric Dumazet e353ea9ce4 rtnetlink: prepare nla_put_iflink() to run under RCU
We want to be able to run rtnl_fill_ifinfo() under RCU protection
instead of RTNL in the future.

This patch prepares dev_get_iflink() and nla_put_iflink()
to run either with RTNL or RCU held.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26 11:46:12 +00:00
Florian Westphal 025f8ad20f net: mpls: error out if inner headers are not set
mpls_gso_segment() assumes skb_inner_network_header() returns
a valid result:

  mpls_hlen = skb_inner_network_header(skb) - skb_network_header(skb);
  if (unlikely(!mpls_hlen || mpls_hlen % MPLS_HLEN))
        goto out;
  if (unlikely(!pskb_may_pull(skb, mpls_hlen)))

With syzbot reproducer, skb_inner_network_header() yields 0,
skb_network_header() returns 108, so this will
"pskb_may_pull(skb, -108)))" which triggers a newly added
DEBUG_NET_WARN_ON_ONCE() check:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 5068 at include/linux/skbuff.h:2723 pskb_may_pull_reason include/linux/skbuff.h:2723 [inline]
WARNING: CPU: 0 PID: 5068 at include/linux/skbuff.h:2723 pskb_may_pull include/linux/skbuff.h:2739 [inline]
WARNING: CPU: 0 PID: 5068 at include/linux/skbuff.h:2723 mpls_gso_segment+0x773/0xaa0 net/mpls/mpls_gso.c:34
[..]
 skb_mac_gso_segment+0x383/0x740 net/core/gso.c:53
 nsh_gso_segment+0x40a/0xad0 net/nsh/nsh.c:108
 skb_mac_gso_segment+0x383/0x740 net/core/gso.c:53
 __skb_gso_segment+0x324/0x4c0 net/core/gso.c:124
 skb_gso_segment include/net/gso.h:83 [inline]
 [..]
 sch_direct_xmit+0x11a/0x5f0 net/sched/sch_generic.c:327
 [..]
 packet_sendmsg+0x46a9/0x6130 net/packet/af_packet.c:3113
 [..]

First iteration of this patch made mpls_hlen signed and changed
test to error out to "mpls_hlen <= 0 || ..".

Eric Dumazet said:
 > I was thinking about adding a debug check in skb_inner_network_header()
 > if inner_network_header is zero (that would mean it is not 'set' yet),
 > but this would trigger even after your patch.

So add new skb_inner_network_header_was_set() helper and use that.

The syzbot reproducer injects data via packet socket. The skb that gets
allocated and passed down the stack has ->protocol set to NSH (0x894f)
and gso_type set to SKB_GSO_UDP | SKB_GSO_DODGY.

This gets passed to skb_mac_gso_segment(), which sees NSH as ptype to
find a callback for.  nsh_gso_segment() retrieves next type:

        proto = tun_p_to_eth_p(nsh_hdr(skb)->np);

... which is MPLS (TUN_P_MPLS_UC). It updates skb->protocol and then
calls mpls_gso_segment().  Inner offsets are all 0, so mpls_gso_segment()
ends up with a negative header size.

In case more callers rely on silent handling of such large may_pull values
we could also 'legalize' this behaviour, either replacing the debug check
with (len > INT_MAX) test or removing it and instead adding a comment
before existing

 if (unlikely(len > skb->len))
    return SKB_DROP_REASON_PKT_TOO_SMALL;

test in pskb_may_pull_reason(), saying that this check also implicitly
takes care of callers that miscompute header sizes.

Cc: Simon Horman <horms@kernel.org>
Fixes: 219eee9c0d ("net: skbuff: add overflow debug check to pull/push helpers")
Reported-by: syzbot+99d15fcdb0132a1e1a82@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/00000000000043b1310611e388aa@google.com/raw
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240222140321.14080-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-23 16:22:56 -08:00
Jann Horn d2efeb52c3 net: ethtool: avoid rebuilds on UTS_RELEASE change
Currently, when you switch between branches or something like that and
rebuild, net/ethtool/ioctl.c has to be built again because it depends
on UTS_RELEASE.

By instead referencing a string variable stored in another object file,
this can be avoided.

Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20240220194244.2056384-1-jannh@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-23 16:22:25 -08:00
Felix Fietkau 413dafc817 wifi: mac80211: only call drv_sta_rc_update for uploaded stations
When a station has not been uploaded yet, receiving SMPS or channel width
notification action frames can lead to rate_control_rate_update calling
drv_sta_rc_update with uninitialized driver private data.
Fix this by adding a missing check for sta->uploaded.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://msgid.link/20240221140535.16102-1-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-02-23 09:22:52 +01:00
Jeremy Kerr 3773d65ae5 net: mctp: take ownership of skb in mctp_local_output
Currently, mctp_local_output only takes ownership of skb on success, and
we may leak an skb if mctp_local_output fails in specific states; the
skb ownership isn't transferred until the actual output routing occurs.

Instead, make mctp_local_output free the skb on all error paths up to
the route action, so it always consumes the passed skb.

Fixes: 833ef3b91d ("mctp: Populate socket implementation")
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240220081053.1439104-1-jk@codeconstruct.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-22 19:21:11 -08:00
Florian Westphal 5ae1e9922b net: ip_tunnel: prevent perpetual headroom growth
syzkaller triggered following kasan splat:
BUG: KASAN: use-after-free in __skb_flow_dissect+0x19d1/0x7a50 net/core/flow_dissector.c:1170
Read of size 1 at addr ffff88812fb4000e by task syz-executor183/5191
[..]
 kasan_report+0xda/0x110 mm/kasan/report.c:588
 __skb_flow_dissect+0x19d1/0x7a50 net/core/flow_dissector.c:1170
 skb_flow_dissect_flow_keys include/linux/skbuff.h:1514 [inline]
 ___skb_get_hash net/core/flow_dissector.c:1791 [inline]
 __skb_get_hash+0xc7/0x540 net/core/flow_dissector.c:1856
 skb_get_hash include/linux/skbuff.h:1556 [inline]
 ip_tunnel_xmit+0x1855/0x33c0 net/ipv4/ip_tunnel.c:748
 ipip_tunnel_xmit+0x3cc/0x4e0 net/ipv4/ipip.c:308
 __netdev_start_xmit include/linux/netdevice.h:4940 [inline]
 netdev_start_xmit include/linux/netdevice.h:4954 [inline]
 xmit_one net/core/dev.c:3548 [inline]
 dev_hard_start_xmit+0x13d/0x6d0 net/core/dev.c:3564
 __dev_queue_xmit+0x7c1/0x3d60 net/core/dev.c:4349
 dev_queue_xmit include/linux/netdevice.h:3134 [inline]
 neigh_connected_output+0x42c/0x5d0 net/core/neighbour.c:1592
 ...
 ip_finish_output2+0x833/0x2550 net/ipv4/ip_output.c:235
 ip_finish_output+0x31/0x310 net/ipv4/ip_output.c:323
 ..
 iptunnel_xmit+0x5b4/0x9b0 net/ipv4/ip_tunnel_core.c:82
 ip_tunnel_xmit+0x1dbc/0x33c0 net/ipv4/ip_tunnel.c:831
 ipgre_xmit+0x4a1/0x980 net/ipv4/ip_gre.c:665
 __netdev_start_xmit include/linux/netdevice.h:4940 [inline]
 netdev_start_xmit include/linux/netdevice.h:4954 [inline]
 xmit_one net/core/dev.c:3548 [inline]
 dev_hard_start_xmit+0x13d/0x6d0 net/core/dev.c:3564
 ...

The splat occurs because skb->data points past skb->head allocated area.
This is because neigh layer does:
  __skb_pull(skb, skb_network_offset(skb));

... but skb_network_offset() returns a negative offset and __skb_pull()
arg is unsigned.  IOW, we skb->data gets "adjusted" by a huge value.

The negative value is returned because skb->head and skb->data distance is
more than 64k and skb->network_header (u16) has wrapped around.

The bug is in the ip_tunnel infrastructure, which can cause
dev->needed_headroom to increment ad infinitum.

The syzkaller reproducer consists of packets getting routed via a gre
tunnel, and route of gre encapsulated packets pointing at another (ipip)
tunnel.  The ipip encapsulation finds gre0 as next output device.

This results in the following pattern:

1). First packet is to be sent out via gre0.
Route lookup found an output device, ipip0.

2).
ip_tunnel_xmit for gre0 bumps gre0->needed_headroom based on the future
output device, rt.dev->needed_headroom (ipip0).

3).
ip output / start_xmit moves skb on to ipip0. which runs the same
code path again (xmit recursion).

4).
Routing step for the post-gre0-encap packet finds gre0 as output device
to use for ipip0 encapsulated packet.

tunl0->needed_headroom is then incremented based on the (already bumped)
gre0 device headroom.

This repeats for every future packet:

gre0->needed_headroom gets inflated because previous packets' ipip0 step
incremented rt->dev (gre0) headroom, and ipip0 incremented because gre0
needed_headroom was increased.

For each subsequent packet, gre/ipip0->needed_headroom grows until
post-expand-head reallocations result in a skb->head/data distance of
more than 64k.

Once that happens, skb->network_header (u16) wraps around when
pskb_expand_head tries to make sure that skb_network_offset() is unchanged
after the headroom expansion/reallocation.

After this skb_network_offset(skb) returns a different (and negative)
result post headroom expansion.

The next trip to neigh layer (or anything else that would __skb_pull the
network header) makes skb->data point to a memory location outside
skb->head area.

v2: Cap the needed_headroom update to an arbitarily chosen upperlimit to
prevent perpetual increase instead of dropping the headroom increment
completely.

Reported-and-tested-by: syzbot+bfde3bef047a81b8fde6@syzkaller.appspotmail.com
Closes: https://groups.google.com/g/syzkaller-bugs/c/fL9G6GtWskY/m/VKk_PR5FBAAJ
Fixes: 243aad830e ("ip_gre: include route header_len in max_headroom calculation")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240220135606.4939-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-22 19:18:10 -08:00
Jakub Kicinski 4679f4f123 netfilter pr 2024-21-02
-----BEGIN PGP SIGNATURE-----
 
 iQJBBAABCAArFiEEgKkgxbID4Gn1hq6fcJGo2a1f9gAFAmXV2OYNHGZ3QHN0cmxl
 bi5kZQAKCRBwkajZrV/2AHCRD/9sHoOd4QCVVgcDr3SjpaVWikM0Zdkge65At/uY
 bFENWgcDsSfsH7kAQm+nwzseT+QtTk9OOv9wqWzdEYROD7sqjVK2Zv/CUs24odGj
 7Wj35OLYLgUIEMlHF/G9kOuWqW61URXwXcHvoFWkew1WweAVDqi648osLWUP9qkL
 IFJ5729/1upq9XJc+pMxIy2Oe2zhMc4XNHsy1OCOg4fUQtDM81jgoJz0137ohCIh
 PW4aaSno8ZeRuFe1RKfya5+suv3WgMui/fOBmpnnhjWVxHRJvYZ926wsy/jC7xRJ
 E7/TdmymbzijRBEHh+IxQYZkE55XXc0E1Lj1ic653AzUWJ3tQRfD+HWg+GYj/WCu
 sWy1e7eRJIjYVbeB5m6ao3g47Zq1XIRXo7E2Rvt3E2beM6t9aMIMuuajBHAOEV2O
 pCfG4zBlEYw1SuuuoqzcXTVLKDf6WZjx1xtUAJCTks8JFTjPEwPwOQhGCv1cc/BC
 qox7MejeDH/L+ZreeTYnWlQr1GGokNgrmpdDx0G8GBBRUDPoP8D4GTxvNEz44XOO
 SfL2yl5v82GBBmsFHzC2J8BGN8KC4JyzDGupU+bcdMWCs8tSvMK0KVeankRvpdBl
 x4VLmdoNo6zvtOYlPOxdphhsd6xA0dFiLMgSr9f5WsIgepaC+Umxp59IfCEH/bfl
 1Kcg9g==
 =GYgG
 -----END PGP SIGNATURE-----

Merge tag 'nf-next-24-02-21' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next

Florian Westphal says:

====================
netfilter updates for net-next

1. Prefer KMEM_CACHE() macro to create kmem caches, from Kunwu Chan.

Patches 2 and 3 consolidate nf_log NULL checks and introduces
extra boundary checks on family and type to make it clear that no out
of bounds access will happen.  No in-tree user currently passes such
values, but thats not clear from looking at the function.
From Pablo Neira Ayuso.

Patch 4, also from Pablo, gets rid of unneeded conditional in
nft_osf init function.

Patch 5, from myself, fixes erroneous Kconfig dependencies that
came in an earlier net-next pull request. This should get rid
of the xtables related build failure reports.

Patches 6 to 10 are an update to nftables' concatenated-ranges
set type to speed up element insertions.  This series also
compacts a few data structures and cleans up a few oddities such
as reliance on ZERO_SIZE_PTR when asking to allocate a set with
no elements. From myself.

Patches 11 moves the nf_reinject function from the netfilter core
(vmlinux) into the nfnetlink_queue backend, the only location where
this is called from. Also from myself.

Patch 12, from Kees Cook, switches xtables' compat layer to use
unsafe_memcpy because xt_entry_target cannot easily get converted
to a real flexible array (its UAPI and used inside other structs).

* tag 'nf-next-24-02-21' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: x_tables: Use unsafe_memcpy() for 0-sized destination
  netfilter: move nf_reinject into nfnetlink_queue modules
  netfilter: nft_set_pipapo: use GFP_KERNEL for insertions
  netfilter: nft_set_pipapo: speed up bulk element insertions
  netfilter: nft_set_pipapo: shrink data structures
  netfilter: nft_set_pipapo: do not rely on ZERO_SIZE_PTR
  netfilter: nft_set_pipapo: constify lookup fn args where possible
  netfilter: xtables: fix up kconfig dependencies
  netfilter: nft_osf: simplify init path
  netfilter: nf_log: validate nf_logger_find_get()
  netfilter: nf_log: consolidate check for NULL logger in lookup function
  netfilter: expect: Simplify the allocation of slab caches in nf_conntrack_expect_init
====================

Link: https://lore.kernel.org/r/20240221112637.5396-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-22 19:06:20 -08:00
Breno Leitao 3e7a0dccf0 ipv6/sit: Do not allocate stats in the driver
With commit 34d21de99c ("net: Move {l,t,d}stats allocation to core and
convert veth & vrf"), stats allocation could be done on net core
instead of this driver.

With this new approach, the driver doesn't have to bother with error
handling (allocation failure checking, making sure free happens in the
right spot, etc). This is core responsibility now.

Remove the allocation in the ipv6/sit driver and leverage the network
core allocation.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240221161732.3026127-1-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-22 18:57:53 -08:00
Ryosuke Yasuoka 661779e1fc netlink: Fix kernel-infoleak-after-free in __skb_datagram_iter
syzbot reported the following uninit-value access issue [1]:

netlink_to_full_skb() creates a new `skb` and puts the `skb->data`
passed as a 1st arg of netlink_to_full_skb() onto new `skb`. The data
size is specified as `len` and passed to skb_put_data(). This `len`
is based on `skb->end` that is not data offset but buffer offset. The
`skb->end` contains data and tailroom. Since the tailroom is not
initialized when the new `skb` created, KMSAN detects uninitialized
memory area when copying the data.

This patch resolved this issue by correct the len from `skb->end` to
`skb->len`, which is the actual data offset.

BUG: KMSAN: kernel-infoleak-after-free in instrument_copy_to_user include/linux/instrumented.h:114 [inline]
BUG: KMSAN: kernel-infoleak-after-free in copy_to_user_iter lib/iov_iter.c:24 [inline]
BUG: KMSAN: kernel-infoleak-after-free in iterate_ubuf include/linux/iov_iter.h:29 [inline]
BUG: KMSAN: kernel-infoleak-after-free in iterate_and_advance2 include/linux/iov_iter.h:245 [inline]
BUG: KMSAN: kernel-infoleak-after-free in iterate_and_advance include/linux/iov_iter.h:271 [inline]
BUG: KMSAN: kernel-infoleak-after-free in _copy_to_iter+0x364/0x2520 lib/iov_iter.c:186
 instrument_copy_to_user include/linux/instrumented.h:114 [inline]
 copy_to_user_iter lib/iov_iter.c:24 [inline]
 iterate_ubuf include/linux/iov_iter.h:29 [inline]
 iterate_and_advance2 include/linux/iov_iter.h:245 [inline]
 iterate_and_advance include/linux/iov_iter.h:271 [inline]
 _copy_to_iter+0x364/0x2520 lib/iov_iter.c:186
 copy_to_iter include/linux/uio.h:197 [inline]
 simple_copy_to_iter+0x68/0xa0 net/core/datagram.c:532
 __skb_datagram_iter+0x123/0xdc0 net/core/datagram.c:420
 skb_copy_datagram_iter+0x5c/0x200 net/core/datagram.c:546
 skb_copy_datagram_msg include/linux/skbuff.h:3960 [inline]
 packet_recvmsg+0xd9c/0x2000 net/packet/af_packet.c:3482
 sock_recvmsg_nosec net/socket.c:1044 [inline]
 sock_recvmsg net/socket.c:1066 [inline]
 sock_read_iter+0x467/0x580 net/socket.c:1136
 call_read_iter include/linux/fs.h:2014 [inline]
 new_sync_read fs/read_write.c:389 [inline]
 vfs_read+0x8f6/0xe00 fs/read_write.c:470
 ksys_read+0x20f/0x4c0 fs/read_write.c:613
 __do_sys_read fs/read_write.c:623 [inline]
 __se_sys_read fs/read_write.c:621 [inline]
 __x64_sys_read+0x93/0xd0 fs/read_write.c:621
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0x44/0x110 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x63/0x6b

Uninit was stored to memory at:
 skb_put_data include/linux/skbuff.h:2622 [inline]
 netlink_to_full_skb net/netlink/af_netlink.c:181 [inline]
 __netlink_deliver_tap_skb net/netlink/af_netlink.c:298 [inline]
 __netlink_deliver_tap+0x5be/0xc90 net/netlink/af_netlink.c:325
 netlink_deliver_tap net/netlink/af_netlink.c:338 [inline]
 netlink_deliver_tap_kernel net/netlink/af_netlink.c:347 [inline]
 netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline]
 netlink_unicast+0x10f1/0x1250 net/netlink/af_netlink.c:1368
 netlink_sendmsg+0x1238/0x13d0 net/netlink/af_netlink.c:1910
 sock_sendmsg_nosec net/socket.c:730 [inline]
 __sock_sendmsg net/socket.c:745 [inline]
 ____sys_sendmsg+0x9c2/0xd60 net/socket.c:2584
 ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2638
 __sys_sendmsg net/socket.c:2667 [inline]
 __do_sys_sendmsg net/socket.c:2676 [inline]
 __se_sys_sendmsg net/socket.c:2674 [inline]
 __x64_sys_sendmsg+0x307/0x490 net/socket.c:2674
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0x44/0x110 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x63/0x6b

Uninit was created at:
 free_pages_prepare mm/page_alloc.c:1087 [inline]
 free_unref_page_prepare+0xb0/0xa40 mm/page_alloc.c:2347
 free_unref_page_list+0xeb/0x1100 mm/page_alloc.c:2533
 release_pages+0x23d3/0x2410 mm/swap.c:1042
 free_pages_and_swap_cache+0xd9/0xf0 mm/swap_state.c:316
 tlb_batch_pages_flush mm/mmu_gather.c:98 [inline]
 tlb_flush_mmu_free mm/mmu_gather.c:293 [inline]
 tlb_flush_mmu+0x6f5/0x980 mm/mmu_gather.c:300
 tlb_finish_mmu+0x101/0x260 mm/mmu_gather.c:392
 exit_mmap+0x49e/0xd30 mm/mmap.c:3321
 __mmput+0x13f/0x530 kernel/fork.c:1349
 mmput+0x8a/0xa0 kernel/fork.c:1371
 exit_mm+0x1b8/0x360 kernel/exit.c:567
 do_exit+0xd57/0x4080 kernel/exit.c:858
 do_group_exit+0x2fd/0x390 kernel/exit.c:1021
 __do_sys_exit_group kernel/exit.c:1032 [inline]
 __se_sys_exit_group kernel/exit.c:1030 [inline]
 __x64_sys_exit_group+0x3c/0x50 kernel/exit.c:1030
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0x44/0x110 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x63/0x6b

Bytes 3852-3903 of 3904 are uninitialized
Memory access of size 3904 starts at ffff88812ea1e000
Data copied to user address 0000000020003280

CPU: 1 PID: 5043 Comm: syz-executor297 Not tainted 6.7.0-rc5-syzkaller-00047-g5bd7ef53ffe5 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/10/2023

Fixes: 1853c94964 ("netlink, mmap: transform mmap skb into full skb on taps")
Reported-and-tested-by: syzbot+34ad5fab48f7bf510349@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=34ad5fab48f7bf510349 [1]
Signed-off-by: Ryosuke Yasuoka <ryasuoka@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240221074053.1794118-1-ryasuoka@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-22 18:56:09 -08:00
Alexander Gordeev 9eda38dc91 net/af_iucv: fix virtual vs physical address confusion
Fix virtual vs physical address confusion. This does not fix a bug
since virtual and physical address spaces are currently the same.

Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Link: https://lore.kernel.org/r/20240215080500.2616848-1-agordeev@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-22 18:28:13 -08:00
Jakub Kicinski fecc51559a Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

net/ipv4/udp.c
  f796feabb9 ("udp: add local "peek offset enabled" flag")
  56667da739 ("net: implement lockless setsockopt(SO_PEEK_OFF)")

Adjacent changes:

net/unix/garbage.c
  aa82ac51d6 ("af_unix: Drop oob_skb ref before purging queue in GC.")
  11498715f2 ("af_unix: Remove io_uring code for GC.")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-22 15:29:26 -08:00
Jakub Kicinski 0fb848d1a4 wireless-next patches for v6.9
The third "new features" pull request for v6.9. This is a quick
 followup to send commit 04edb5dc68 ("wifi: ath12k: Fix uninitialized
 use of ret in ath12k_mac_allocate()") to fix the ath12k clang warning
 introduced in the previous pull request.
 
 We also have support for QCA2066 in ath11k, several new features in
 ath12k and few other changes in drivers. In stack it's mostly cleanup
 and refactoring.
 
 Major changes:
 
 ath12k
 
 * firmware-2.bin support
 
 * support having multiple identical PCI devices (firmware needs to
   have ATH12K_FW_FEATURE_MULTI_QRTR_ID)
 
 * QCN9274: support split-PHY devices
 
 * WCN7850: enable Power Save Mode in station mode
 
 * WCN7850: P2P support
 
 ath11k:
 
 * QCA6390 & WCN6855: support 2 concurrent station interfaces
 
 * QCA2066 support
 
 iwlwifi
 
 * mvm: support wider-bandwidth OFDMA
 
 * bump firmware API to 90 for BZ/SC devices
 
 brcmfmac
 
 * DMI nvram filename quirk for ACEPC W5 Pro
 -----BEGIN PGP SIGNATURE-----
 
 iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmXXJpARHGt2YWxvQGtl
 cm5lbC5vcmcACgkQbhckVSbrbZu2Twf/QZ5FVecvOu/qNQyUeaclXjNuFw0+cJpz
 luVzxG54wh484L1dRAmsztwHPA78rLMcExZi2Zb9PszVv4V9mD6rHoV0ws/o86Gr
 QTq+8To9sr9wJfooVRO1ifgfoiafxX2TYJ9yGR3XwkCDf5ROq9JLBOQWW8p0bO/M
 UyWoMvSf/WBAjOHUfCQzPCVPQhyld7JL/V7LGMZGmdy3cPkWPVXmRMyHL9f3+vdq
 O5/HxZBP4Dg3zEsUiOADmD/l+8wuf/Tebqt7uRJ4/sAHsmlEPzoZPsuNxy6FljYX
 5fuZVpzmjGzdGV+YHidpbZl/9Shq/Bc7Cf7eTQZ70P9cL1kbZ7usIA==
 =mcyY
 -----END PGP SIGNATURE-----

Merge tag 'wireless-next-2024-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Kalle Valo says:

====================
wireless-next patches for v6.9

The third "new features" pull request for v6.9. This is a quick
followup to send commit 04edb5dc68 ("wifi: ath12k: Fix uninitialized
use of ret in ath12k_mac_allocate()") to fix the ath12k clang warning
introduced in the previous pull request.

We also have support for QCA2066 in ath11k, several new features in
ath12k and few other changes in drivers. In stack it's mostly cleanup
and refactoring.

Major changes:

ath12k
 * firmware-2.bin support
 * support having multiple identical PCI devices (firmware needs to
   have ATH12K_FW_FEATURE_MULTI_QRTR_ID)
 * QCN9274: support split-PHY devices
 * WCN7850: enable Power Save Mode in station mode
 * WCN7850: P2P support

ath11k:
 * QCA6390 & WCN6855: support 2 concurrent station interfaces
 * QCA2066 support

iwlwifi
 * mvm: support wider-bandwidth OFDMA
 * bump firmware API to 90 for BZ/SC devices

brcmfmac
 * DMI nvram filename quirk for ACEPC W5 Pro

* tag 'wireless-next-2024-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (75 commits)
  wifi: wilc1000: revert reset line logic flip
  wifi: brcmfmac: Add DMI nvram filename quirk for ACEPC W5 Pro
  wifi: rtlwifi: set initial values for unexpected cases of USB endpoint priority
  wifi: rtl8xxxu: check vif before using in rtl8xxxu_tx()
  wifi: rtlwifi: rtl8192cu: Fix TX aggregation
  wifi: wilc1000: remove AKM suite be32 conversion for external auth request
  wifi: nl80211: refactor parsing CSA offsets
  wifi: nl80211: force WLAN_AKM_SUITE_SAE in big endian in NL80211_CMD_EXTERNAL_AUTH
  wifi: iwlwifi: load b0 version of ucode for HR1/HR2
  wifi: iwlwifi: handle per-phy statistics from fw
  wifi: iwlwifi: iwl-fh.h: fix kernel-doc issues
  wifi: iwlwifi: api: fix kernel-doc reference
  wifi: iwlwifi: mvm: unlock mvm if there is no primary link
  wifi: iwlwifi: bump FW API to 90 for BZ/SC devices
  wifi: iwlwifi: mvm: support PHY context version 6
  wifi: iwlwifi: mvm: partially support PHY context version 6
  wifi: iwlwifi: mvm: support wider-bandwidth OFDMA
  wifi: cfg80211: use ML element parsing helpers
  wifi: mac80211: align ieee80211_mle_get_bss_param_ch_cnt()
  wifi: cfg80211: refactor RNR parsing
  ...
====================

Link: https://lore.kernel.org/r/20240222105205.CEC54C433F1@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-22 15:11:19 -08:00
Jeremy Kerr d192eaf57f net: mctp: tests: Add a test for proper tag creation on local output
Ensure we have the correct key parameters on sending a message.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22 13:32:55 +01:00
Jeremy Kerr 109a533114 net: mctp: tests: Test that outgoing skbs have flow data populated
When CONFIG_MCTP_FLOWS is enabled, outgoing skbs should have their
SKB_EXT_MCTP extension set for drivers to consume.

Add two tests for local-to-output routing that check for the flow
extensions: one for the simple single-packet case, and one for
fragmentation.

We now make MCTP_TEST select MCTP_FLOWS, so we always get coverage of
these flow tests. The tests are skippable if MCTP_FLOWS is (otherwise)
disabled, but that would need manual config tweaking.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22 13:32:55 +01:00
Jeremy Kerr 1394c1dec1 net: mctp: copy skb ext data when fragmenting
If we're fragmenting on local output, the original packet may contain
ext data for the MCTP flows. We'll want this in the resulting fragment
skbs too.

So, do a skb_ext_copy() in the fragmentation path, and implement the
MCTP-specific parts of an ext copy operation.

Fixes: 67737c4572 ("mctp: Pass flow data & flow release events to drivers")
Reported-by: Jian Zhang <zhangjian.3032@bytedance.com>
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22 13:32:55 +01:00
Jeremy Kerr 9acdc089c0 net: mctp: tests: Add MCTP net isolation tests
Add a couple of tests that excersise the new net-specific sk_key and
bind lookups

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22 13:32:55 +01:00
Jeremy Kerr 61b50531dc net: mctp: tests: Add netid argument to __mctp_route_test_init
We'll want to create net-specific test setups in an upcoming change, so
allow the caller to provide a non-default netid.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22 13:32:55 +01:00
Jeremy Kerr c16d2380e8 net: mctp: provide a more specific tag allocation ioctl
Now that we have net-specific tags, extend the tag allocation ioctls
(SIOCMCTPALLOCTAG / SIOCMCTPDROPTAG) to allow a network parameter to be
passed to the tag allocation.

We also add a local_addr member to the ioc struct, to allow for a future
finer-grained tag allocation using local EIDs too. We don't add any
specific support for that now though, so require MCTP_ADDR_ANY or
MCTP_ADDR_NULL for those at present.

The old ioctls will still work, but allocate for the default MCTP net.
These are now marked as deprecated in the header.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22 13:32:55 +01:00
Jeremy Kerr 43e6795574 net: mctp: separate key correlation across nets
Currently, we lookup sk_keys from the entire struct net_namespace, which
may contain multiple MCTP net IDs. In those cases we want to distinguish
between endpoints with the same EID but different net ID.

Add the net ID data to the struct mctp_sk_key, populate on add and
filter on this during route lookup.

For the ioctl interface, we use a default net of
MCTP_INITIAL_DEFAULT_NET (ie., what will be in use for single-net
configurations), but we'll extend the ioctl interface to provide
net-specific tag allocation in an upcoming change.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22 13:32:55 +01:00
Jeremy Kerr a1f4cf5791 net: mctp: tests: create test skbs with the correct net and device
In our test skb creation functions, we're not setting up the net and
device data. This doesn't matter at the moment, but we will want to add
support for distinct net IDs in future.

Set the ->net identifier on the test MCTP device, and ensure that test
skbs are set up with the correct device-related data on creation. Create
a helper for setting skb->dev and mctp_skb_cb->net.

We have a few cases where we're calling __mctp_cb() to initialise the cb
(which we need for the above) separately, so integrate this into the skb
creation helpers.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22 13:32:55 +01:00
Jeremy Kerr fc944ecc4f net: mctp: make key lookups match the ANY address on either local or peer
We may have an ANY address in either the local or peer address of a
sk_key, and may want to match on an incoming daddr or saddr being ANY.

Do this by altering the conflicting-tag lookup to also accept ANY as
the local/peer address.

We don't want mctp_address_matches to match on the requested EID being
ANY, as that is a specific lookup case on packet input.

Reported-by: Eric Chuang <echuang@google.com>
Reported-by: Anthony <anthonyhkf@google.com>
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22 13:32:54 +01:00
Jeremy Kerr aee6479a45 net: mctp: Add some detail on the key allocation implementation
We could do with a little more comment on where MCTP_ADDR_ANY will match
in the key allocations.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22 13:32:54 +01:00
Jeremy Kerr ee076b73e5 net: mctp: avoid confusion over local/peer dest/source addresses
We have a double-swap of local and peer addresses in
mctp_alloc_local_tag; the arguments in both call sites are swapped, but
there is also a swap in the implementation of alloc_local_tag. This is
opaque because we're using source/dest address references, which don't
match the local/peer semantics.

Avoid this confusion by naming the arguments as 'local' and 'peer', and
remove the double swap. The calling order now matches mctp_key_alloc.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22 13:32:54 +01:00
Tom Parkin 359e54a93a l2tp: pass correct message length to ip6_append_data
l2tp_ip6_sendmsg needs to avoid accounting for the transport header
twice when splicing more data into an already partially-occupied skbuff.

To manage this, we check whether the skbuff contains data using
skb_queue_empty when deciding how much data to append using
ip6_append_data.

However, the code which performed the calculation was incorrect:

     ulen = len + skb_queue_empty(&sk->sk_write_queue) ? transhdrlen : 0;

...due to C operator precedence, this ends up setting ulen to
transhdrlen for messages with a non-zero length, which results in
corrupted packets on the wire.

Add parentheses to correct the calculation in line with the original
intent.

Fixes: 9d4c75800f ("ipv4, ipv6: Fix handling of transhdrlen in __ip{,6}_append_data()")
Cc: David Howells <dhowells@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240220122156.43131-1-tparkin@katalix.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22 10:42:17 +01:00
Paolo Abeni 9ff2794306 netfilter pull request 24-02-22
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmXWjUEACgkQ1V2XiooU
 IOQEqA//c15K4sL5v3ROUgDKYo7d5W7mnb3c9T2b2I4tZIyAj+f1+6DhGz2PB/5L
 BjdAXbz2FrrkSt/x4fcu0CkcXC2d5tVcVhvt+CTpaph70xOXBL0XtN+x3NfXlZVg
 r9Q/6tV3pBE6u6LdqogQsQtehhqYMgzPVfKuUVYbvM4efqV/vvKiBbHl5DedtKk0
 GKiwGEKnXbBUxpJueSUAX/+C64Ldlhw4MVswkJfjA8r56FJEsxPet9tlAphqd+6P
 qg1bECQf3NQw8DVBtMc9U1Izb8HhiGEskG72e450Uo1X7SL6EACDFCMVTAeEz+fu
 sDNPpS/V7PEP3tImJm0Rj6N6iYGL19tWfCVMdesP+KF5yokNbQF54Xgz4ETkVyrt
 EZkR5JL6pRLBQ7FJ/2TD0IIFEn09KayMXLI0Dlugl90lOsn1T6Dnmmh8563nZ2eT
 6zio/4NqYRzCXSieCs3zHRZCH0l2tttkkKi0MhVJwGBJd8Wl0qeXHK/UxlE5Pfkn
 qhD2ryuCHbIad2JxS8mb1pIzMhw8sy3LsxQQ91CShQ2ujTLY35dhcWJPDm8u77md
 VE4lTUmj8uvExLjG/xf+5bzvTutYUGdacRmYwzyFTl/ix3tYvg75GoI4l2iGd4xl
 6COxPtLaFDkxo5GxODLrmjxe11E8nT2BKI7rPsOjee8ueAzJZdM=
 =iPIR
 -----END PGP SIGNATURE-----

Merge tag 'nf-24-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) If user requests to wake up a table and hook fails, restore the
   dormant flag from the error path, from Florian Westphal.

2) Reset dst after transferring it to the flow object, otherwise dst
   gets released twice from the error path.

3) Release dst in case the flowtable selects a direct xmit path, eg.
   transmission to bridge port. Otherwise, dst is memleaked.

4) Register basechain and flowtable hooks at the end of the command.
   Error path releases these datastructure without waiting for the
   rcu grace period.

5) Use kzalloc() to initialize struct nft_hook to fix a KMSAN report
   on access to hook type, also from Florian Westphal.

netfilter pull request 24-02-22

* tag 'nf-24-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_tables: use kzalloc for hook allocation
  netfilter: nf_tables: register hooks last when adding new chain/flowtable
  netfilter: nft_flow_offload: release dst in case direct xmit path is used
  netfilter: nft_flow_offload: reset dst in route object after setting up flow
  netfilter: nf_tables: set dormant flag on hook register failure
====================

Link: https://lore.kernel.org/r/20240222000843.146665-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22 10:20:50 +01:00
Paolo Abeni fdcd4467ba bpf-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZdaBCwAKCRDbK58LschI
 g3EhAP0d+S18mNabiEGz8efnE2yz3XcFchJgjiRS8WjOv75GvQEA6/sWncFjbc8k
 EqxPHmeJa19rWhQlFrmlyNQfLYGe4gY=
 =VkOs
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2024-02-22

The following pull-request contains BPF updates for your *net* tree.

We've added 11 non-merge commits during the last 24 day(s) which contain
a total of 15 files changed, 217 insertions(+), 17 deletions(-).

The main changes are:

1) Fix a syzkaller-triggered oops when attempting to read the vsyscall
   page through bpf_probe_read_kernel and friends, from Hou Tao.

2) Fix a kernel panic due to uninitialized iter position pointer in
   bpf_iter_task, from Yafang Shao.

3) Fix a race between bpf_timer_cancel_and_free and bpf_timer_cancel,
   from Martin KaFai Lau.

4) Fix a xsk warning in skb_add_rx_frag() (under CONFIG_DEBUG_NET)
   due to incorrect truesize accounting, from Sebastian Andrzej Siewior.

5) Fix a NULL pointer dereference in sk_psock_verdict_data_ready,
   from Shigeru Yoshida.

6) Fix a resolve_btfids warning when bpf_cpumask symbol cannot be
   resolved, from Hari Bathini.

bpf-for-netdev

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf, sockmap: Fix NULL pointer dereference in sk_psock_verdict_data_ready()
  selftests/bpf: Add negtive test cases for task iter
  bpf: Fix an issue due to uninitialized bpf_iter_task
  selftests/bpf: Test racing between bpf_timer_cancel_and_free and bpf_timer_cancel
  bpf: Fix racing between bpf_timer_cancel_and_free and bpf_timer_cancel
  selftest/bpf: Test the read of vsyscall page under x86-64
  x86/mm: Disallow vsyscall page read for copy_from_kernel_nofault()
  x86/mm: Move is_vsyscall_vaddr() into asm/vsyscall.h
  bpf, scripts: Correct GPL license name
  xsk: Add truesize to skb_add_rx_frag().
  bpf: Fix warning for bpf_cpumask in verifier
====================

Link: https://lore.kernel.org/r/20240221231826.1404-1-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22 10:04:47 +01:00
Justin Iurman f198d933c2 Fix write to cloned skb in ipv6_hop_ioam()
ioam6_fill_trace_data() writes inside the skb payload without ensuring
it's writeable (e.g., not cloned). This function is called both from the
input and output path. The output path (ioam6_iptunnel) already does the
check. This commit provides a fix for the input path, inside
ipv6_hop_ioam(). It also updates ip6_parse_tlv() to refresh the network
header pointer ("nh") when returning from ipv6_hop_ioam().

Fixes: 9ee11f0fff ("ipv6: ioam: Data plane support for Pre-allocated Trace")
Reported-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22 09:28:03 +01:00
Rémi Denis-Courmont 7d2a894d7f phonet/pep: fix racy skb_queue_empty() use
The receive queues are protected by their respective spin-lock, not
the socket lock. This could lead to skb_peek() unexpectedly
returning NULL or a pointer to an already dequeued socket buffer.

Fixes: 9641458d3e ("Phonet: Pipe End Point for Phonet Pipes protocol")
Signed-off-by: Rémi Denis-Courmont <courmisch@gmail.com>
Link: https://lore.kernel.org/r/20240218081214.4806-2-remi@remlab.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22 09:05:50 +01:00
Rémi Denis-Courmont 3b2d9bc4d4 phonet: take correct lock to peek at the RX queue
The receive queue is protected by its embedded spin-lock, not the
socket lock, so we need the former lock here (and only that one).

Fixes: 107d0d9b8d ("Phonet: Phonet datagram transport protocol")
Reported-by: Luosili <rootlab@huawei.com>
Signed-off-by: Rémi Denis-Courmont <courmisch@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240218081214.4806-1-remi@remlab.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22 09:05:50 +01:00
Jianbo Liu 1fde0ca3a0 net/sched: flower: Add lock protection when remove filter handle
As IDR can't protect itself from the concurrent modification, place
idr_remove() under the protection of tp->lock.

Fixes: 08a0063df3 ("net/sched: flower: Move filter handle initialization earlier")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20240220085928.9161-1-jianbol@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-21 17:12:03 -08:00
Jiri Pirko 61c43780e9 devlink: fix port dump cmd type
Unlike other commands, due to a c&p error, port dump fills-up cmd with
wrong value, different from port-get request cmd, port-get doit reply
and port notification.

Fix it by filling cmd with value DEVLINK_CMD_PORT_NEW.

Skimmed through devlink userspace implementations, none of them cares
about this cmd value. Only ynl, for which, this is actually a fix, as it
expects doit and dumpit ops rsp_value to be the same.

Omit the fixes tag, even thought this is fix, better to target this for
next release.

Fixes: bfcd3a4661 ("Introduce devlink infrastructure")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20240220075245.75416-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-21 17:11:04 -08:00
Paolo Abeni f796feabb9 udp: add local "peek offset enabled" flag
We want to re-organize the struct sock layout. The sk_peek_off
field location is problematic, as most protocols want it in the
RX read area, while UDP wants it on a cacheline different from
sk_receive_queue.

Create a local (inside udp_sock) copy of the 'peek offset is enabled'
flag and place it inside the same cacheline of reader_queue.

Check such flag before reading sk_peek_off. This will save potential
false sharing and cache misses in the fast-path.

Tested under UDP flood with small packets. The struct sock layout
update causes a 4% performance drop, and this patch restores completely
the original tput.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/67ab679c15fbf49fa05b3ffe05d91c47ab84f147.1708426665.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-21 17:05:01 -08:00
Jeremy Kerr 9990889be1 net: mctp: put sock on tag allocation failure
We may hold an extra reference on a socket if a tag allocation fails: we
optimistically allocate the sk_key, and take a ref there, but do not
drop if we end up not using the allocated key.

Ensure we're dropping the sock on this failure by doing a proper unref
rather than directly kfree()ing.

Fixes: de8a6b15d9 ("net: mctp: add an explicit reference from a mctp_sk_key to sock")
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/ce9b61e44d1cdae7797be0c5e3141baf582d23a0.1707983487.git.jk@codeconstruct.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-21 17:01:54 -08:00
Florian Westphal 195e5f88c2 netfilter: nf_tables: use kzalloc for hook allocation
KMSAN reports unitialized variable when registering the hook,
   reg->hook_ops_type == NF_HOOK_OP_BPF)
        ~~~~~~~~~~~ undefined

This is a small structure, just use kzalloc to make sure this
won't happen again when new fields get added to nf_hook_ops.

Fixes: 7b4b2fa375 ("netfilter: annotate nf_tables base hook ops")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-22 00:15:58 +01:00
Pablo Neira Ayuso d472e9853d netfilter: nf_tables: register hooks last when adding new chain/flowtable
Register hooks last when adding chain/flowtable to ensure that packets do
not walk over datastructure that is being released in the error path
without waiting for the rcu grace period.

Fixes: 91c7b38dc9 ("netfilter: nf_tables: use new transaction infrastructure to handle chain")
Fixes: 3b49e2e94e ("netfilter: nf_tables: add flow table netlink frontend")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-22 00:14:54 +01:00
Pablo Neira Ayuso 8762785f45 netfilter: nft_flow_offload: release dst in case direct xmit path is used
Direct xmit does not use it since it calls dev_queue_xmit() to send
packets, hence it calls dst_release().

kmemleak reports:

unreferenced object 0xffff88814f440900 (size 184):
  comm "softirq", pid 0, jiffies 4294951896
  hex dump (first 32 bytes):
    00 60 5b 04 81 88 ff ff 00 e6 e8 82 ff ff ff ff  .`[.............
    21 0b 50 82 ff ff ff ff 00 00 00 00 00 00 00 00  !.P.............
  backtrace (crc cb2bf5d6):
    [<000000003ee17107>] kmem_cache_alloc+0x286/0x340
    [<0000000021a5de2c>] dst_alloc+0x43/0xb0
    [<00000000f0671159>] rt_dst_alloc+0x2e/0x190
    [<00000000fe5092c9>] __mkroute_output+0x244/0x980
    [<000000005fb96fb0>] ip_route_output_flow+0xc0/0x160
    [<0000000045367433>] nf_ip_route+0xf/0x30
    [<0000000085da1d8e>] nf_route+0x2d/0x60
    [<00000000d1ecd1cb>] nft_flow_route+0x171/0x6a0 [nft_flow_offload]
    [<00000000d9b2fb60>] nft_flow_offload_eval+0x4e8/0x700 [nft_flow_offload]
    [<000000009f447dbb>] expr_call_ops_eval+0x53/0x330 [nf_tables]
    [<00000000072e1be6>] nft_do_chain+0x17c/0x840 [nf_tables]
    [<00000000d0551029>] nft_do_chain_inet+0xa1/0x210 [nf_tables]
    [<0000000097c9d5c6>] nf_hook_slow+0x5b/0x160
    [<0000000005eccab1>] ip_forward+0x8b6/0x9b0
    [<00000000553a269b>] ip_rcv+0x221/0x230
    [<00000000412872e5>] __netif_receive_skb_one_core+0xfe/0x110

Fixes: fa502c8656 ("netfilter: flowtable: simplify route logic")
Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-22 00:14:54 +01:00
Pablo Neira Ayuso 9e0f043038 netfilter: nft_flow_offload: reset dst in route object after setting up flow
dst is transferred to the flow object, route object does not own it
anymore.  Reset dst in route object, otherwise if flow_offload_add()
fails, error path releases dst twice, leading to a refcount underflow.

Fixes: a3c90f7a23 ("netfilter: nf_tables: flow offload expression")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-22 00:14:54 +01:00
Florian Westphal bccebf6470 netfilter: nf_tables: set dormant flag on hook register failure
We need to set the dormant flag again if we fail to register
the hooks.

During memory pressure hook registration can fail and we end up
with a table marked as active but no registered hooks.

On table/base chain deletion, nf_tables will attempt to unregister
the hook again which yields a warn splat from the nftables core.

Reported-and-tested-by: syzbot+de4025c006ec68ac56fc@syzkaller.appspotmail.com
Fixes: 179d9ba555 ("netfilter: nf_tables: fix table flag updates")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-22 00:14:54 +01:00
Sabrina Dubroca ec823bf3a4 tls: don't skip over different type records from the rx_list
If we queue 3 records:
 - record 1, type DATA
 - record 2, some other type
 - record 3, type DATA
and do a recv(PEEK), the rx_list will contain the first two records.

The next large recv will walk through the rx_list and copy data from
record 1, then stop because record 2 is a different type. Since we
haven't filled up our buffer, we will process the next available
record. It's also DATA, so we can merge it with the current read.

We shouldn't do that, since there was a record in between that we
ignored.

Add a flag to let process_rx_list inform tls_sw_recvmsg that it had
more data available.

Fixes: 692d7b5d1f ("tls: Fix recvmsg() to be able to peek across multiple records")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/f00c0c0afa080c60f016df1471158c1caf983c34.1708007371.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-21 14:25:51 -08:00
Sabrina Dubroca fdfbaec592 tls: stop recv() if initial process_rx_list gave us non-DATA
If we have a non-DATA record on the rx_list and another record of the
same type still on the queue, we will end up merging them:
 - process_rx_list copies the non-DATA record
 - we start the loop and process the first available record since it's
   of the same type
 - we break out of the loop since the record was not DATA

Just check the record type and jump to the end in case process_rx_list
did some work.

Fixes: 692d7b5d1f ("tls: Fix recvmsg() to be able to peek across multiple records")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/bd31449e43bd4b6ff546f5c51cf958c31c511deb.1708007371.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-21 14:25:51 -08:00
Sabrina Dubroca 10f41d0710 tls: break out of main loop when PEEK gets a non-data record
PEEK needs to leave decrypted records on the rx_list so that we can
receive them later on, so it jumps back into the async code that
queues the skb. Unfortunately that makes us skip the
TLS_RECORD_TYPE_DATA check at the bottom of the main loop, so if two
records of the same (non-DATA) type are queued, we end up merging
them.

Add the same record type check, and make it unlikely to not penalize
the async fastpath. Async decrypt only applies to data record, so this
check is only needed for PEEK.

process_rx_list also has similar issues.

Fixes: 692d7b5d1f ("tls: Fix recvmsg() to be able to peek across multiple records")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/3df2eef4fdae720c55e69472b5bea668772b45a2.1708007371.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-21 14:25:51 -08:00
Shigeru Yoshida 4cd12c6065 bpf, sockmap: Fix NULL pointer dereference in sk_psock_verdict_data_ready()
syzbot reported the following NULL pointer dereference issue [1]:

  BUG: kernel NULL pointer dereference, address: 0000000000000000
  [...]
  RIP: 0010:0x0
  [...]
  Call Trace:
   <TASK>
   sk_psock_verdict_data_ready+0x232/0x340 net/core/skmsg.c:1230
   unix_stream_sendmsg+0x9b4/0x1230 net/unix/af_unix.c:2293
   sock_sendmsg_nosec net/socket.c:730 [inline]
   __sock_sendmsg+0x221/0x270 net/socket.c:745
   ____sys_sendmsg+0x525/0x7d0 net/socket.c:2584
   ___sys_sendmsg net/socket.c:2638 [inline]
   __sys_sendmsg+0x2b0/0x3a0 net/socket.c:2667
   do_syscall_64+0xf9/0x240
   entry_SYSCALL_64_after_hwframe+0x6f/0x77

If sk_psock_verdict_data_ready() and sk_psock_stop_verdict() are called
concurrently, psock->saved_data_ready can be NULL, causing the above issue.

This patch fixes this issue by calling the appropriate data ready function
using the sk_psock_data_ready() helper and protecting it from concurrency
with sk->sk_callback_lock.

Fixes: 6df7f764cd ("bpf, sockmap: Wake up polling after data copy")
Reported-by: syzbot+fd7b34375c1c8ce29c93@syzkaller.appspotmail.com
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: syzbot+fd7b34375c1c8ce29c93@syzkaller.appspotmail.com
Acked-by: John Fastabend <john.fastabend@gmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=fd7b34375c1c8ce29c93 [1]
Link: https://lore.kernel.org/bpf/20240218150933.6004-1-syoshida@redhat.com
2024-02-21 17:15:23 +01:00
Johannes Berg 81830c8f80 wifi: nl80211: refactor parsing CSA offsets
The CSA offset parsing happens the same way for all of
beacon template offsets, probe response template offsets
and TX offsets (for using during probe response TX from
userspace directly).

Refactor the parsing here. There's an additional check
this introduces, which is that the number of counters in
TX offsets doesn't exceed the driver capability, but as
only two counters are used at most for anything, this is
hopefully OK.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-02-21 15:19:05 +01:00
Alexis Lothoré 4f4d8be6dc wifi: nl80211: force WLAN_AKM_SUITE_SAE in big endian in NL80211_CMD_EXTERNAL_AUTH
User-space supplicant (observed at least on wpa_supplicant) historically
parses the NL80211_ATTR_AKM_SUITES from the NL80211_CMD_EXTERNAL_AUTH
message as big endian _only_ when its value is WLAN_AKM_SUITE_SAE, while
processing anything else in host endian. This behavior makes any driver
relying on SAE external auth to switch AKM suite to big endian if it is
WLAN_AKM_SUITE_SAE. A fix bringing compatibility with both endianness
has been brought into wpa_supplicant, however we must keep compatibility
with older versions, while trying to reduce the occurences of this manual
conversion in wireless drivers.

Add the be32 conversion specifically on WLAN_AKM_SUITE_SAE in nl80211 layer
to keep compatibility with older wpa_supplicant versions.

Suggested-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
Link: https://msgid.link/20240215-nl80211_fix_akm_suites_endianness-v1-1-57e902632f9d@bootlin.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-02-21 15:19:05 +01:00
Johannes Berg 894dd84e49 wifi: cfg80211: use ML element parsing helpers
Use the existing ML element parsing helpers and add a new
one for this (ieee80211_mle_get_mld_id).

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240216135047.4da47b1f035b.I437a5570ac456449facb0b147851ef24a1e473c2@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-02-21 15:19:04 +01:00
Johannes Berg 6bd14aee0b wifi: mac80211: align ieee80211_mle_get_bss_param_ch_cnt()
Align the prototype of ieee80211_mle_get_bss_param_ch_cnt()
to also take a u8 * like the other functions, and make it
return -1 when the field isn't found, so that mac80211 can
check that instead of explicitly open-coding the check.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240216135047.583309181bc3.Ia61cb0b4fc034d5ac8fcfaf6f6fb2e115fadafe7@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-02-21 15:19:04 +01:00
Johannes Berg 6b756efcd9 wifi: cfg80211: refactor RNR parsing
We'll need more parsing of the reduced neighbor report element,
and we already have two places doing pretty much the same.
Combine by refactoring the parsing into a separate function
with a callback for each item found.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240216135047.cfff14b692fc.Ibe25be88a769eab29ebb17b9d19af666df6a2227@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-02-21 15:19:04 +01:00
Johannes Berg 7e899c1d6f wifi: cfg80211: clean up cfg80211_inform_bss_frame_data()
Make cfg80211_inform_bss_frame_data() call the existing
cfg80211_inform_bss_data() after parsing the frame in the
appropriate way, so we have less code duplication. This
required introducing a new CFG80211_BSS_FTYPE_S1G_BEACON,
but that can be used by other drivers as well.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240216135047.874aed1eff5f.Ib7d88d126eec50c64763251a78cb432bb5df14df@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-02-21 15:19:04 +01:00
Johannes Berg 317bad4c3b wifi: cfg80211: remove cfg80211_inform_single_bss_frame_data()
This function pretty much does what cfg80211_inform_single_bss_data()
already does, except on a frame data. But we can call the other one,
after populating the inform_data more completely, so we don't need to
do everything twice.

This also uncovered a few bugs:
 * the 6 GHz power type checks were only done in this function, move
   (and rename from 'uhb') those;
 * the chains/chain_signal information wasn't used in the latter,
   add that

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240216135047.f3f864f94c78.I2192adb32ab10713e71f395a9d203386264f6ed5@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-02-21 15:19:04 +01:00
Benjamin Berg f8599d6340 wifi: cfg80211: set correct param change count in ML element
The ML element generation code to create a BSS entry from a per-STA
profile was not overwriting the BSS parameter change count. This meant
that the incorrect parameter change count would be reported within the
multi-link element.

Fix this by returning the BSS parameter change count from the function
and placing it into the ML element. The returned tbtt info was never
used, so just drop that to simplify the code.

Fixes: 5f478adf1f ("wifi: cfg80211: generate an ML element for per-STA profiles")
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240216135047.f2a507634692.I06b122c7a319a38b4e970f5e0bd3d3ef9cac4cbe@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-02-21 15:19:03 +01:00
Andy Shevchenko f79ab5d2bc wifi: cfg80211: Add KHZ_PER_GHZ to units.h and reuse
The KHZ_PER_GHZ might be used by others (with the name aligned
with similar constants). Define it in units.h and convert
wireless to use it.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://msgid.link/20240215154136.630029-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-02-21 15:19:03 +01:00
Colin Ian King ba4b1fa312 wifi: mac80211: clean up assignments to pointer cache.
The assignment to pointer cache in function mesh_fast_tx_gc can
be made at the declaration time rather than a later assignment.
There are also 3 functions where pointer cache is being initialized
at declaration time and later re-assigned again with the same
value, these are redundant and can be removed.

Cleans up code and three clang scan build warnings:
warning: Value stored to 'cache' during its initialization is never
read [deadcode.DeadStores]

Signed-off-by: Colin Ian King <colin.i.king@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://msgid.link/20240215232151.2075483-1-colin.i.king@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-02-21 15:19:03 +01:00
Miri Korenblit d73fbaf24c wifi: mac80211: make associated BSS pointer visible to the driver
Some drivers need the data in it, so move it to the link conf,
which is exposed to the driver.

Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240206164849.6fe9782b87b4.Ifbffef638f07ca7f5c2b27f40d2cf2942d21de0b@changeid
[remove bss pointer from internal struct, update docs]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-02-21 15:19:03 +01:00
Aditya Kumar Singh 6030b3a469 wifi: mac80211: check beacon countdown is complete on per link basis
Currently, function to check if beacon countdown is complete uses deflink
to fetch the beacon and check the counter. However, with MLO, there is
a need to check the counter for the beacon in a particular link.

Add support to use link_id in order to fetch the beacon from a particular
link data.

Signed-off-by: Aditya Kumar Singh <quic_adisi@quicinc.com>
Link: https://msgid.link/20240216144621.514385-2-quic_adisi@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-02-21 15:19:03 +01:00