linux-stable

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2024-09-25 11:55:37 +00:00

Author	SHA1	Message	Date
David Howells	b0f571ecd7	rxrpc: Fix locking in rxrpc's sendmsg Fix three bugs in the rxrpc's sendmsg implementation: (1) rxrpc_new_client_call() should release the socket lock when returning an error from rxrpc_get_call_slot(). (2) rxrpc_wait_for_tx_window_intr() will return without the call mutex held in the event that we're interrupted by a signal whilst waiting for tx space on the socket or relocking the call mutex afterwards. Fix this by: (a) moving the unlock/lock of the call mutex up to rxrpc_send_data() such that the lock is not held around all of rxrpc_wait_for_tx_window*() and (b) indicating to higher callers whether we're return with the lock dropped. Note that this means recvmsg() will not block on this call whilst we're waiting. (3) After dropping and regaining the call mutex, rxrpc_send_data() needs to go and recheck the state of the tx_pending buffer and the tx_total_len check in case we raced with another sendmsg() on the same call. Thinking on this some more, it might make sense to have different locks for sendmsg() and recvmsg(). There's probably no need to make recvmsg() wait for sendmsg(). It does mean that recvmsg() can return MSG_EOR indicating that a call is dead before a sendmsg() to that call returns - but that can currently happen anyway. Without fix (2), something like the following can be induced: WARNING: bad unlock balance detected! 5.16.0-rc6-syzkaller #0 Not tainted ------------------------------------- syz-executor011/3597 is trying to release lock (&call->user_mutex) at: [<ffffffff885163a3>] rxrpc_do_sendmsg+0xc13/0x1350 net/rxrpc/sendmsg.c:748 but there are no more locks to release! other info that might help us debug this: no locks held by syz-executor011/3597. ... Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_unlock_imbalance_bug include/trace/events/lock.h:58 [inline] __lock_release kernel/locking/lockdep.c:5306 [inline] lock_release.cold+0x49/0x4e kernel/locking/lockdep.c:5657 __mutex_unlock_slowpath+0x99/0x5e0 kernel/locking/mutex.c:900 rxrpc_do_sendmsg+0xc13/0x1350 net/rxrpc/sendmsg.c:748 rxrpc_sendmsg+0x420/0x630 net/rxrpc/af_rxrpc.c:561 sock_sendmsg_nosec net/socket.c:704 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:724 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2409 ___sys_sendmsg+0xf3/0x170 net/socket.c:2463 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2492 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae [Thanks to Hawkins Jiawei and Khalid Masum for their attempts to fix this] Fixes: `bc5e3a546d` ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals") Reported-by: syzbot+7f0483225d0c94cb3441@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> Tested-by: syzbot+7f0483225d0c94cb3441@syzkaller.appspotmail.com cc: Hawkins Jiawei <yin31149@gmail.com> cc: Khalid Masum <khalid.masum.92@gmail.com> cc: Dan Carpenter <dan.carpenter@oracle.com> cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/166135894583.600315.7170979436768124075.stgit@warthog.procyon.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-25 12:39:40 -07:00
Lorenzo Bianconi	0cf731f9eb	net: ethernet: mtk_eth_soc: fix hw hash reporting for MTK_NETSYS_V2 Properly report hw rx hash for mt7986 chipset accroding to the new dma descriptor layout. Fixes: `197c9e9b17` ("net: ethernet: mtk_eth_soc: introduce support for mt7986 chipset") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/091394ea4e705fbb35f828011d98d0ba33808f69.1661257293.git.lorenzo@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-08-25 13:17:01 +02:00
Jakub Kicinski	24c7a64ea4	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net 1) Fix crash with malformed ebtables blob which do not provide all entry points, from Florian Westphal. 2) Fix possible TCP connection clogging up with default 5-days timeout in conntrack, from Florian. 3) Fix crash in nf_tables tproxy with unsupported chains, also from Florian. 4) Do not allow to update implicit chains. 5) Make table handle allocation per-netns to fix data race. 6) Do not truncated payload length and offset, and checksum offset. Instead report EINVAl. 7) Enable chain stats update via static key iff no error occurs. 8) Restrict osf expression to ip, ip6 and inet families. 9) Restrict tunnel expression to netdev family. 10) Fix crash when trying to bind again an already bound chain. 11) Flowtable garbage collector might leave behind pending work to delete entries. This patch comes with a previous preparation patch as dependency. 12) Allow net.netfilter.nf_conntrack_frag6_high_thresh to be lowered, from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_defrag_ipv6: allow nf_conntrack_frag6_high_thresh increases netfilter: flowtable: fix stuck flows on cleanup due to pending work netfilter: flowtable: add function to invoke garbage collection immediately netfilter: nf_tables: disallow binding to already bound chain netfilter: nft_tunnel: restrict it to netdev family netfilter: nft_osf: restrict osf to ipv4, ipv6 and inet families netfilter: nf_tables: do not leave chain stats enabled on error netfilter: nft_payload: do not truncate csum_offset and csum_type netfilter: nft_payload: report ERANGE for too long offset and length netfilter: nf_tables: make table handle allocation per-netns friendly netfilter: nf_tables: disallow updates of implicit chain netfilter: nft_tproxy: restrict to prerouting hook netfilter: conntrack: work around exceeded receive window netfilter: ebtables: reject blobs that don't provide all entry points ==================== Link: https://lore.kernel.org/r/20220824220330.64283-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-24 19:18:10 -07:00
Lukas Bulwahn	b09da0126c	MAINTAINERS: rectify file entry in BONDING DRIVER Commit `c078290a2b` ("selftests: include bonding tests into the kselftest infra") adds the bonding tests in the directory: tools/testing/selftests/drivers/net/bonding/ The file entry in MAINTAINERS for the BONDING DRIVER however refers to: tools/testing/selftests/net/bonding/ Hence, ./scripts/get_maintainer.pl --self-test=patterns complains about a broken file pattern. Repair this file entry in BONDING DRIVER. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Acked-by: Jonathan Toppins <jtoppins@redhat.com> Link: https://lore.kernel.org/r/20220824072945.28606-1-lukas.bulwahn@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-24 18:47:32 -07:00
David S. Miller	0c4a95417e	Merge branch 'sysctl-data-races' Kuniyuki Iwashima says: ==================== net: sysctl: Fix data-races around net.core.XXX This series fixes data-races around all knobs in net_core_table and netns_core_table except for bpf stuff. These knobs are skipped: - 4 bpf knobs - netdev_rss_key: Written only once by net_get_random_once() and read-only knob - rps_sock_flow_entries: Protected with sock_flow_mutex - flow_limit_cpu_bitmap: Protected with flow_limit_update_mutex - flow_limit_table_len: Protected with flow_limit_update_mutex - default_qdisc: Protected with qdisc_mod_lock - warnings: Unused - high_order_alloc_disable: Protected with static_key_mutex - skb_defer_max: Already using READ_ONCE() - sysctl_txrehash: Already using READ_ONCE() Note 5th patch fixes net.core.message_cost and net.core.message_burst, and lib/ratelimit.c does not have an explicit maintainer. Changes: v3: * Fix build failures of CONFIG_SYSCTL=n case in 13th & 14th patches v2: https://lore.kernel.org/netdev/20220818035227.81567-1-kuniyu@amazon.com/ * Remove 4 bpf knobs and added 6 knobs v1: https://lore.kernel.org/netdev/20220816052347.70042-1-kuniyu@amazon.com/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-24 13:46:59 +01:00
Kuniyuki Iwashima	3c9ba81d72	net: Fix a data-race around sysctl_somaxconn. While reading sysctl_somaxconn, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-24 13:46:58 +01:00
Kuniyuki Iwashima	05e49cfc89	net: Fix a data-race around netdev_unregister_timeout_secs. While reading netdev_unregister_timeout_secs, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: `5aa3afe107` ("net: make unregister netdev warning timeout configurable") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-24 13:46:58 +01:00
Kuniyuki Iwashima	8db24af3f0	net: Fix a data-race around gro_normal_batch. While reading gro_normal_batch, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: `323ebb61e3` ("net: use listified RX for handling GRO_NORMAL skbs") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-24 13:46:58 +01:00
Kuniyuki Iwashima	a5612ca10d	net: Fix data-races around sysctl_devconf_inherit_init_net. While reading sysctl_devconf_inherit_init_net, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: `856c395cfa` ("net: introduce a knob to control whether to inherit devconf config") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-24 13:46:58 +01:00
Kuniyuki Iwashima	af67508ea6	net: Fix data-races around sysctl_fb_tunnels_only_for_init_net. While reading sysctl_fb_tunnels_only_for_init_net, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: `79134e6ce2` ("net: do not create fallback tunnels for non-default namespaces") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-24 13:46:58 +01:00
Kuniyuki Iwashima	fa45d484c5	net: Fix a data-race around netdev_budget_usecs. While reading netdev_budget_usecs, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: `7acf8a1e8a` ("Replace 2 jiffies with sysctl netdev_budget_usecs to enable softirq tuning") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-24 13:46:58 +01:00
Kuniyuki Iwashima	657b991afb	net: Fix data-races around sysctl_max_skb_frags. While reading sysctl_max_skb_frags, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: `5f74f82ea3` ("net:Add sysctl_max_skb_frags") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-24 13:46:58 +01:00
Kuniyuki Iwashima	2e0c42374e	net: Fix a data-race around netdev_budget. While reading netdev_budget, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: `51b0bdedb8` ("[NET]: Separate two usages of netdev_max_backlog.") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-24 13:46:58 +01:00
Kuniyuki Iwashima	e59ef36f07	net: Fix a data-race around sysctl_net_busy_read. While reading sysctl_net_busy_read, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: `2d48d67fa8` ("net: poll/select low latency socket support") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-24 13:46:58 +01:00
Kuniyuki Iwashima	c42b7cddea	net: Fix a data-race around sysctl_net_busy_poll. While reading sysctl_net_busy_poll, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: `0602129286` ("net: add low latency socket poll") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-24 13:46:58 +01:00
Kuniyuki Iwashima	d2154b0afa	net: Fix a data-race around sysctl_tstamp_allow_data. While reading sysctl_tstamp_allow_data, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: `b245be1f4d` ("net-timestamp: no-payload only sysctl") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-24 13:46:58 +01:00
Kuniyuki Iwashima	7de6d09f51	net: Fix data-races around sysctl_optmem_max. While reading sysctl_optmem_max, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-24 13:46:57 +01:00
Kuniyuki Iwashima	6bae8ceb90	ratelimit: Fix data-races in ___ratelimit(). While reading rs->interval and rs->burst, they can be changed concurrently via sysctl (e.g. net_ratelimit_state). Thus, we need to add READ_ONCE() to their readers. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-24 13:46:57 +01:00
Kuniyuki Iwashima	61adf447e3	net: Fix data-races around netdev_tstamp_prequeue. While reading netdev_tstamp_prequeue, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: `3b098e2d7c` ("net: Consistent skb timestamping") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-24 13:46:57 +01:00
Kuniyuki Iwashima	5dcd08cd19	net: Fix data-races around netdev_max_backlog. While reading netdev_max_backlog, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. While at it, we remove the unnecessary spaces in the doc. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-24 13:46:57 +01:00
Kuniyuki Iwashima	bf955b5ab8	net: Fix data-races around weight_p and dev_weight_[rt]x_bias. While reading weight_p, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Also, dev_[rt]x_weight can be read/written at the same time. So, we need to use READ_ONCE() and WRITE_ONCE() for its access. Moreover, to use the same weight_p while changing dev_[rt]x_weight, we add a mutex in proc_do_dev_weight(). Fixes: `3d48b53fb2` ("net: dev_weight: TX/RX orthogonality") Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-24 13:46:57 +01:00
Kuniyuki Iwashima	1227c1771d	net: Fix data-races around sysctl_[rw]mem_(max\|default). While reading sysctl_[rw]mem_(max\|default), they can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-24 13:46:57 +01:00
lily	c624c58e08	net/core/skbuff: Check the return value of skb_copy_bits() skb_copy_bits() could fail, which requires a check on the return value. Signed-off-by: Li Zhong <floridsleeves@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-24 13:16:48 +01:00
David S. Miller	76de008340	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2022-08-24 1) Fix a refcount leak in __xfrm_policy_check. From Xin Xiong. 2) Revert "xfrm: update SA curlft.use_time". This violates RFC 2367. From Antony Antony. 3) Fix a comment on XFRMA_LASTUSED. From Antony Antony. 4) x->lastused is not cloned in xfrm_do_migrate. Fix from Antony Antony. 5) Serialize the calls to xfrm_probe_algs. From Herbert Xu. 6) Fix a null pointer dereference of dst->dev on a metadata dst in xfrm_lookup_with_ifid. From Nikolay Aleksandrov. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-24 12:51:50 +01:00
Csókás Bence	f79959220f	fec: Restart PPS after link state change On link state change, the controller gets reset, causing PPS to drop out and the PHC to lose its time and calibration. So we restart it if needed, restoring calibration and time registers. Changes since v2: * Add `fec_ptp_save_state()`/`fec_ptp_restore_state()` * Use `ktime_get_real_ns()` * Use `BIT()` macro Changes since v1: * More ECR #define's * Stop PPS in `fec_ptp_stop()` Signed-off-by: Csókás Bence <csokas.bence@prolan.hu> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-24 09:53:23 +01:00
Yang Yingliang	d5485d9dd2	net: neigh: don't call kfree_skb() under spin_lock_irqsave() It is not allowed to call kfree_skb() from hardware interrupt context or with interrupts being disabled. So add all skb to a tmp list, then free them after spin_unlock_irqrestore() at once. Fixes: `66ba215cb5` ("neigh: fix possible DoS due to net iface start/stop loop") Suggested-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-24 09:49:20 +01:00
Eric Dumazet	00cd7bf9f9	netfilter: nf_defrag_ipv6: allow nf_conntrack_frag6_high_thresh increases Currently, net.netfilter.nf_conntrack_frag6_high_thresh can only be lowered. I found this issue while investigating a probable kernel issue causing flakes in tools/testing/selftests/net/ip_defrag.sh In particular, these sysctl changes were ignored: ip netns exec "${NETNS}" sysctl -w net.netfilter.nf_conntrack_frag6_high_thresh=9000000 >/dev/null 2>&1 ip netns exec "${NETNS}" sysctl -w net.netfilter.nf_conntrack_frag6_low_thresh=7000000 >/dev/null 2>&1 This change is inline with commit `8361962392` ("net/ipfrag: let ip[6]frag_high_thresh in ns be higher than in init_net") Fixes: 8db3d41569bb ("netfilter: nf_defrag_ipv6: use net_generic infra") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-24 08:06:44 +02:00
Pablo Neira Ayuso	9afb4b2734	netfilter: flowtable: fix stuck flows on cleanup due to pending work To clear the flow table on flow table free, the following sequence normally happens in order: 1) gc_step work is stopped to disable any further stats/del requests. 2) All flow table entries are set to teardown state. 3) Run gc_step which will queue HW del work for each flow table entry. 4) Waiting for the above del work to finish (flush). 5) Run gc_step again, deleting all entries from the flow table. 6) Flow table is freed. But if a flow table entry already has pending HW stats or HW add work step 3 will not queue HW del work (it will be skipped), step 4 will wait for the pending add/stats to finish, and step 5 will queue HW del work which might execute after freeing of the flow table. To fix the above, this patch flushes the pending work, then it sets the teardown flag to all flows in the flowtable and it forces a garbage collector run to queue work to remove the flows from hardware, then it flushes this new pending work and (finally) it forces another garbage collector run to remove the entry from the software flowtable. Stack trace: [47773.882335] BUG: KASAN: use-after-free in down_read+0x99/0x460 [47773.883634] Write of size 8 at addr ffff888103b45aa8 by task kworker/u20:6/543704 [47773.885634] CPU: 3 PID: 543704 Comm: kworker/u20:6 Not tainted 5.12.0-rc7+ #2 [47773.886745] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009) [47773.888438] Workqueue: nf_ft_offload_del flow_offload_work_handler [nf_flow_table] [47773.889727] Call Trace: [47773.890214] dump_stack+0xbb/0x107 [47773.890818] print_address_description.constprop.0+0x18/0x140 [47773.892990] kasan_report.cold+0x7c/0xd8 [47773.894459] kasan_check_range+0x145/0x1a0 [47773.895174] down_read+0x99/0x460 [47773.899706] nf_flow_offload_tuple+0x24f/0x3c0 [nf_flow_table] [47773.907137] flow_offload_work_handler+0x72d/0xbe0 [nf_flow_table] [47773.913372] process_one_work+0x8ac/0x14e0 [47773.921325] [47773.921325] Allocated by task 592159: [47773.922031] kasan_save_stack+0x1b/0x40 [47773.922730] __kasan_kmalloc+0x7a/0x90 [47773.923411] tcf_ct_flow_table_get+0x3cb/0x1230 [act_ct] [47773.924363] tcf_ct_init+0x71c/0x1156 [act_ct] [47773.925207] tcf_action_init_1+0x45b/0x700 [47773.925987] tcf_action_init+0x453/0x6b0 [47773.926692] tcf_exts_validate+0x3d0/0x600 [47773.927419] fl_change+0x757/0x4a51 [cls_flower] [47773.928227] tc_new_tfilter+0x89a/0x2070 [47773.936652] [47773.936652] Freed by task 543704: [47773.937303] kasan_save_stack+0x1b/0x40 [47773.938039] kasan_set_track+0x1c/0x30 [47773.938731] kasan_set_free_info+0x20/0x30 [47773.939467] __kasan_slab_free+0xe7/0x120 [47773.940194] slab_free_freelist_hook+0x86/0x190 [47773.941038] kfree+0xce/0x3a0 [47773.941644] tcf_ct_flow_table_cleanup_work Original patch description and stack trace by Paul Blakey. Fixes: `c29f74e0df` ("netfilter: nf_flow_table: hardware offload support") Reported-by: Paul Blakey <paulb@nvidia.com> Tested-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-24 07:43:21 +02:00
Pablo Neira Ayuso	759eebbcfa	netfilter: flowtable: add function to invoke garbage collection immediately Expose nf_flow_table_gc_run() to force a garbage collector run from the offload infrastructure. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-24 07:43:21 +02:00
Pablo Neira Ayuso	e02f0d3970	netfilter: nf_tables: disallow binding to already bound chain Update nft_data_init() to report EINVAL if chain is already bound. Fixes: `d0e2c7de92` ("netfilter: nf_tables: add NFT_CHAIN_BINDING") Reported-by: Gwangun Jung <exsociety@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-24 07:43:21 +02:00
Pablo Neira Ayuso	01e4092d53	netfilter: nft_tunnel: restrict it to netdev family Only allow to use this expression from NFPROTO_NETDEV family. Fixes: `af308b94a2` ("netfilter: nf_tables: add tunnel support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-24 07:43:21 +02:00
Pablo Neira Ayuso	5f3b7aae14	netfilter: nft_osf: restrict osf to ipv4, ipv6 and inet families As it was originally intended, restrict extension to supported families. Fixes: `b96af92d6e` ("netfilter: nf_tables: implement Passive OS fingerprint module in nft_osf") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-24 07:43:21 +02:00
Pablo Neira Ayuso	43eb8949cf	netfilter: nf_tables: do not leave chain stats enabled on error Error might occur later in the nf_tables_addchain() codepath, enable static key only after transaction has been created. Fixes: `9f08ea8481` ("netfilter: nf_tables: keep chain counters away from hot path") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-24 07:43:21 +02:00
Pablo Neira Ayuso	7044ab281f	netfilter: nft_payload: do not truncate csum_offset and csum_type Instead report ERANGE if csum_offset is too long, and EOPNOTSUPP if type is not support. Fixes: `7ec3f7b47b` ("netfilter: nft_payload: add packet mangling support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-24 07:43:21 +02:00
Pablo Neira Ayuso	94254f990c	netfilter: nft_payload: report ERANGE for too long offset and length Instead of offset and length are truncation to u8, report ERANGE. Fixes: `96518518cc` ("netfilter: add nftables") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-24 07:43:20 +02:00
Pablo Neira Ayuso	ab482c6b66	netfilter: nf_tables: make table handle allocation per-netns friendly mutex is per-netns, move table_netns to the pernet area. read-write to 0xffffffff883a01e8 of 8 bytes by task 6542 on cpu 0: nf_tables_newtable+0x6dc/0xc00 net/netfilter/nf_tables_api.c:1221 nfnetlink_rcv_batch net/netfilter/nfnetlink.c:513 [inline] nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:634 [inline] nfnetlink_rcv+0xa6a/0x13a0 net/netfilter/nfnetlink.c:652 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline] netlink_unicast+0x652/0x730 net/netlink/af_netlink.c:1345 netlink_sendmsg+0x643/0x740 net/netlink/af_netlink.c:1921 Fixes: `f102d66b33` ("netfilter: nf_tables: use dedicated mutex to guard transactions") Reported-by: Abhishek Shah <abhishek.shah@columbia.edu> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-24 07:43:20 +02:00
Pablo Neira Ayuso	5dc52d83ba	netfilter: nf_tables: disallow updates of implicit chain Updates on existing implicit chain make no sense, disallow this. Fixes: `d0e2c7de92` ("netfilter: nf_tables: add NFT_CHAIN_BINDING") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-24 07:43:20 +02:00
Jakub Kicinski	550e9a4d85	mlx5-fixes-2022-08-22 -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmMD35cACgkQSD+KveBX +j5HZAf/fUg1QXlnIscB2/YaKUYyiagFNCotQz/8TIIkdwddTde4RkXQ192xYzKp EeL1dV1ZNthgeq20OX5mXzquW/w1fa7zTK6eweby1Pq2VdyVuqKKuUbJXXO5Ci1m rg3xxUeL8L+Rk9mGJo4HZynddH7tbHbuBq4dVgZAvDqq9jJOo6pHQPjQT1aUuc/N KT/Ezd9fWaLlTs0VuONMjlj2OjukfXUMjgCkoq6BZxlVLTGMfRlWVdEFfCoxTtkk eUvgiy2P9ApCBR8OtrjPfQ/L0bABiZvvDS4mMez9u6zutxBCKCtq4047I3dLfFyS Er61j/QCPfbkmOWB/5+6Bysa47MI/Q== =Z4kM -----END PGP SIGNATURE----- Merge tag 'mlx5-fixes-2022-08-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2022-08-22 This series provides bug fixes to mlx5 driver. * tag 'mlx5-fixes-2022-08-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: Unlock on error in mlx5_sriov_enable() net/mlx5e: Fix use after free in mlx5e_fs_init() net/mlx5e: kTLS, Use _safe() iterator in mlx5e_tls_priv_tx_list_cleanup() net/mlx5: unlock on error path in esw_vfs_changed_event_handler() net/mlx5e: Fix wrong tc flag used when set hw-tc-offload off net/mlx5e: TC, Add missing policer validation net/mlx5e: Fix wrong application of the LRO state net/mlx5: Avoid false positive lockdep warning by adding lock_class_key net/mlx5: Fix cmd error logging for manage pages cmd net/mlx5: Disable irq when locking lag_lock net/mlx5: Eswitch, Fix forwarding decision to uplink net/mlx5: LAG, fix logic over MLX5_LAG_FLAG_NDEVS_READY net/mlx5e: Properly disable vlan strip on non-UL reps ==================== Link: https://lore.kernel.org/r/20220822195917.216025-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-23 17:50:26 -07:00
Jakub Kicinski	3118067842	Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== ice: xsk: reduced queue count fixes Maciej Fijalkowski says: this small series is supposed to fix the issues around AF_XDP usage with reduced queue count on interface. Due to the XDP rings setup, some configurations can result in sockets not seeing traffic flowing. More about this in description of patch 2. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: ice: xsk: use Rx ring's XDP ring when picking NAPI context ice: xsk: prohibit usage of non-balanced queue id ==================== Link: https://lore.kernel.org/r/20220822163257.2382487-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-23 17:49:02 -07:00
Jakub Kicinski	1eec80946a	Merge branch 'bnxt_en-bug-fixes' Michael Chan says: ==================== bnxt_en: Bug fixes This series includes 2 fixes for regressions introduced by the XDP multi-buffer feature, 1 devlink reload bug fix, and 1 SRIOV resource accounting bug fix. ==================== Link: https://lore.kernel.org/r/1661180814-19350-1-git-send-email-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-23 15:32:25 -07:00
Vikas Gupta	366c304741	bnxt_en: fix LRO/GRO_HW features in ndo_fix_features callback LRO/GRO_HW should be disabled if there is an attached XDP program. BNXT_FLAG_TPA is the current setting of the LRO/GRO_HW. Using BNXT_FLAG_TPA to disable LRO/GRO_HW will cause these features to be permanently disabled once they are disabled. Fixes: `1dc4c557bf` ("bnxt: adding bnxt_xdp_build_skb to build skb from multibuffer xdp_buff") Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-23 15:32:20 -07:00
Vikas Gupta	09a89cc59a	bnxt_en: fix NQ resource accounting during vf creation on 57500 chips There are 2 issues: 1. We should decrement hw_resc->max_nqs instead of hw_resc->max_irqs with the number of NQs assigned to the VFs. The IRQs are fixed on each function and cannot be re-assigned. Only the NQs are being assigned to the VFs. 2. vf_msix is the total number of NQs to be assigned to the VFs. So we should decrement vf_msix from hw_resc->max_nqs. Fixes: `b16b689186` ("bnxt_en: Add SR-IOV support for 57500 chips.") Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-23 15:32:17 -07:00
Vikas Gupta	574b2bb969	bnxt_en: set missing reload flag in devlink features Add missing devlink_set_features() API for callbacks reload_down and reload_up to function. Fixes: `228ea8c187` ("bnxt_en: implement devlink dev reload driver_reinit") Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-23 15:32:13 -07:00
Pavan Chebbi	7dd3de7cb1	bnxt_en: Use PAGE_SIZE to init buffer when multi buffer XDP is not in use Using BNXT_PAGE_MODE_BUF_SIZE + offset as buffer length value is not sufficient when running single buffer XDP programs doing redirect operations. The stack will complain on missing skb tail room. Fix it by using PAGE_SIZE when calling xdp_init_buff() for single buffer programs. Fixes: `b231c3f341` ("bnxt: refactor bnxt_rx_xdp to separate xdp_init_buff/xdp_prepare_buff") Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-23 15:32:07 -07:00
Vladimir Oltean	15f7cfae91	net: dsa: microchip: make learning configurable and keep it off while standalone Address learning should initially be turned off by the driver for port operation in standalone mode, then the DSA core handles changes to it via ds->ops->port_bridge_flags(). Leaving address learning enabled while ports are standalone breaks any kind of communication which involves port B receiving what port A has sent. Notably it breaks the ksz9477 driver used with a (non offloaded, ports act as if standalone) bonding interface in active-backup mode, when the ports are connected together through external switches, for redundancy purposes. This fixes a major design flaw in the ksz9477 and ksz8795 drivers, which unconditionally leave address learning enabled even while ports operate as standalone. Fixes: `b987e98e50` ("dsa: add DSA switch driver for Microchip KSZ9477") Link: https://lore.kernel.org/netdev/CAFZh4h-JVWt80CrQWkFji7tZJahMfOToUJQgKS5s0_=9zzpvYQ@mail.gmail.com/ Reported-by: Brian Hutchinson <b.hutchman@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220818164809.3198039-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-23 14:36:56 -07:00
Florian Westphal	18bbc32133	netfilter: nft_tproxy: restrict to prerouting hook TPROXY is only allowed from prerouting, but nft_tproxy doesn't check this. This fixes a crash (null dereference) when using tproxy from e.g. output. Fixes: `4ed8eb6570` ("netfilter: nf_tables: Add native tproxy support") Reported-by: Shell Chen <xierch@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de>	2022-08-23 21:24:34 +02:00
Florian Westphal	cf97769c76	netfilter: conntrack: work around exceeded receive window When a TCP sends more bytes than allowed by the receive window, all future packets can be marked as invalid. This can clog up the conntrack table because of 5-day default timeout. Sequence of packets: 01 initiator > responder: [S], seq 171, win 5840, options [mss 1330,sackOK,TS val 63 ecr 0,nop,wscale 1] 02 responder > initiator: [S.], seq 33211, ack 172, win 65535, options [mss 1460,sackOK,TS val 010 ecr 63,nop,wscale 8] 03 initiator > responder: [.], ack 33212, win 2920, options [nop,nop,TS val 068 ecr 010], length 0 04 initiator > responder: [P.], seq 172:240, ack 33212, win 2920, options [nop,nop,TS val 279 ecr 010], length 68 Window is 5840 starting from 33212 -> 39052. 05 responder > initiator: [.], ack 240, win 256, options [nop,nop,TS val 872 ecr 279], length 0 06 responder > initiator: [.], seq 33212:34530, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 1318 This is fine, conntrack will flag the connection as having outstanding data (UNACKED), which lowers the conntrack timeout to 300s. 07 responder > initiator: [.], seq 34530:35848, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 1318 08 responder > initiator: [.], seq 35848:37166, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 1318 09 responder > initiator: [.], seq 37166:38484, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 1318 10 responder > initiator: [.], seq 38484:39802, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 1318 Packet 10 is already sending more than permitted, but conntrack doesn't validate this (only seq is tested vs. maxend, not 'seq+len'). 38484 is acceptable, but only up to 39052, so this packet should not have been sent (or only 568 bytes, not 1318). At this point, connection is still in '300s' mode. Next packet however will get flagged: 11 responder > initiator: [P.], seq 39802:40128, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 326 nf_ct_proto_6: SEQ is over the upper bound (over the window of the receiver) .. LEN=378 .. SEQ=39802 ACK=240 ACK PSH .. Now, a couple of replies/acks comes in: 12 initiator > responder: [.], ack 34530, win 4368, [.. irrelevant acks removed ] 16 initiator > responder: [.], ack 39802, win 8712, options [nop,nop,TS val 296201291 ecr 2982371892], length 0 This ack is significant -- this acks the last packet send by the responder that conntrack considered valid. This means that ack == td_end. This will withdraw the 'unacked data' flag, the connection moves back to the 5-day timeout of established conntracks. 17 initiator > responder: ack 40128, win 10030, ... This packet is also flagged as invalid. Because conntrack only updates state based on packets that are considered valid, packet 11 'did not exist' and that gets us: nf_ct_proto_6: ACK is over upper bound 39803 (ACKed data not seen yet) .. SEQ=240 ACK=40128 WINDOW=10030 RES=0x00 ACK URG Because this received and processed by the endpoints, the conntrack entry remains in a bad state, no packets will ever be considered valid again: 30 responder > initiator: [F.], seq 40432, ack 2045, win 391, .. 31 initiator > responder: [.], ack 40433, win 11348, .. 32 initiator > responder: [F.], seq 2045, ack 40433, win 11348 .. ... all trigger 'ACK is over bound' test and we end up with non-early-evictable 5-day default timeout. NB: This patch triggers a bunch of checkpatch warnings because of silly indent. I will resend the cleanup series linked below to reduce the indent level once this change has propagated to net-next. I could route the cleanup via nf but that causes extra backport work for stable maintainers. Link: https://lore.kernel.org/netfilter-devel/20220720175228.17880-1-fw@strlen.de/T/#mb1d7147d36294573cc4f81d00f9f8dadfdd06cd8 Signed-off-by: Florian Westphal <fw@strlen.de>	2022-08-23 18:26:48 +02:00
Florian Westphal	7997eff828	netfilter: ebtables: reject blobs that don't provide all entry points Harshit Mogalapalli says: In ebt_do_table() function dereferencing 'private->hook_entry[hook]' can lead to NULL pointer dereference. [..] Kernel panic: general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f] [..] RIP: 0010:ebt_do_table+0x1dc/0x1ce0 Code: 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 5c 16 00 00 48 b8 00 00 00 00 00 fc ff df 49 8b 6c df 08 48 8d 7d 2c 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 88 [..] Call Trace: nf_hook_slow+0xb1/0x170 __br_forward+0x289/0x730 maybe_deliver+0x24b/0x380 br_flood+0xc6/0x390 br_dev_xmit+0xa2e/0x12c0 For some reason ebtables rejects blobs that provide entry points that are not supported by the table, but what it should instead reject is the opposite: blobs that DO NOT provide an entry point supported by the table. t->valid_hooks is the bitmask of hooks (input, forward ...) that will see packets. Providing an entry point that is not support is harmless (never called/used), but the inverse isn't: it results in a crash because the ebtables traverser doesn't expect a NULL blob for a location its receiving packets for. Instead of fixing all the individual checks, do what iptables is doing and reject all blobs that differ from the expected hooks. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Reported-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Reported-by: syzkaller <syzkaller@googlegroups.com> Signed-off-by: Florian Westphal <fw@strlen.de>	2022-08-23 18:23:15 +02:00
Vladimir Oltean	855a28f9c9	net: dsa: don't dereference NULL extack in dsa_slave_changeupper() When a driver returns -EOPNOTSUPP in dsa_port_bridge_join() but failed to provide a reason for it, DSA attempts to set the extack to say that software fallback will kick in. The problem is, when we use brctl and the legacy bridge ioctls, the extack will be NULL, and DSA dereferences it in the process of setting it. Sergei Antonov proves this using the following stack trace: Unable to handle kernel NULL pointer dereference at virtual address 00000000 PC is at dsa_slave_changeupper+0x5c/0x158 dsa_slave_changeupper from raw_notifier_call_chain+0x38/0x6c raw_notifier_call_chain from __netdev_upper_dev_link+0x198/0x3b4 __netdev_upper_dev_link from netdev_master_upper_dev_link+0x50/0x78 netdev_master_upper_dev_link from br_add_if+0x430/0x7f4 br_add_if from br_ioctl_stub+0x170/0x530 br_ioctl_stub from br_ioctl_call+0x54/0x7c br_ioctl_call from dev_ifsioc+0x4e0/0x6bc dev_ifsioc from dev_ioctl+0x2f8/0x758 dev_ioctl from sock_ioctl+0x5f0/0x674 sock_ioctl from sys_ioctl+0x518/0xe40 sys_ioctl from ret_fast_syscall+0x0/0x1c Fix the problem by only overriding the extack if non-NULL. Fixes: `1c6e8088d9` ("net: dsa: allow port_bridge_join() to override extack message") Link: https://lore.kernel.org/netdev/CABikg9wx7vB5eRDAYtvAm7fprJ09Ta27a4ZazC=NX5K4wn6pWA@mail.gmail.com/ Reported-by: Sergei Antonov <saproj@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Sergei Antonov <saproj@gmail.com> Link: https://lore.kernel.org/r/20220819173925.3581871-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-23 07:54:16 -07:00
Maciej Żenczykowski	4b2e3a17e9	net: ipvtap - add __init/__exit annotations to module init/exit funcs Looks to have been left out in an oversight. Cc: Mahesh Bandewar <maheshb@google.com> Cc: Sainath Grandhi <sainath.grandhi@intel.com> Fixes: `235a9d89da` ('ipvtap: IP-VLAN based tap driver') Signed-off-by: Maciej Żenczykowski <maze@google.com> Link: https://lore.kernel.org/r/20220821130808.12143-1-zenczykowski@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-08-23 15:45:53 +02:00

1 2 3 4 5 ...

1121725 commits