linux-stable/net
Yan Zhai 3890e7008c net: report RCU QS on threaded NAPI repolling
[ Upstream commit d6dbbb1124 ]

NAPI threads can keep polling packets under load. Currently a NAPI thread
only calls cond_resched() before repolling, which is not sufficient to
clear out the holdout of RCU tasks, and this prevents BPF tracing programs
from detaching for a long period. This can be reproduced easily with the
following setup:

ip netns add test1
ip netns add test2

ip -n test1 link add veth1 type veth peer name veth2 netns test2

ip -n test1 link set veth1 up
ip -n test1 link set lo up
ip -n test2 link set veth2 up
ip -n test2 link set lo up

ip -n test1 addr add 192.168.1.2/31 dev veth1
ip -n test1 addr add 1.1.1.1/32 dev lo
ip -n test2 addr add 192.168.1.3/31 dev veth2
ip -n test2 addr add 2.2.2.2/31 dev lo

ip -n test1 route add default via 192.168.1.3
ip -n test2 route add default via 192.168.1.2

for i in `seq 10 210`; do
  for j in `seq 10 210`; do
    ip netns exec test2 iptables -I INPUT -s 3.3.$i.$j -p udp --dport 5201
  done
done
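
For a sense of scale, the nested loop above can be dry-run to count the rules it inserts without touching iptables. This is only an illustrative sketch: count_rules is a hypothetical helper name, not part of the original commit, and it reuses the 10..210 bounds from the loop.

```shell
#!/bin/sh
# count_rules FIRST LAST (hypothetical helper): count the iterations of the
# nested loop over FIRST..LAST, i.e. how many iptables rules the setup
# above would insert, without running iptables at all.
count_rules() {
  count=0
  for i in $(seq "$1" "$2"); do
    for j in $(seq "$1" "$2"); do
      count=$((count + 1))
    done
  done
  echo "$count"
}

count_rules 10 210   # prints 40401 (201 * 201)
```

Each incoming packet in test2 has to walk this long INPUT chain, which presumably is what keeps the NAPI thread busy enough to repoll continuously.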

ip netns exec test2 ethtool -K veth2 gro on
ip netns exec test2 bash -c 'echo 1 > /sys/class/net/veth2/threaded'
ip netns exec test1 ethtool -K veth1 tso off

Then running an iperf3 client/server pair together with a bpftrace script triggers it:

ip netns exec test2 iperf3 -s -B 2.2.2.2 >/dev/null&
ip netns exec test1 iperf3 -c 2.2.2.2 -B 1.1.1.1 -u -l 1500 -b 3g -t 100 >/dev/null&
bpftrace -e 'kfunc:__napi_poll{@=count();} interval:s:1{exit();}'
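
One way to make the delay visible is to time the bpftrace run. The script asks to exit after one second, so a wall-clock time far above that (beyond normal attach/startup cost) would suggest a slow detach. The sketch below is an assumption-laden helper, not part of the original commit: time_cmd is a hypothetical name, and it relies on `date +%s`, which is widely available but not strictly POSIX.

```shell
#!/bin/sh
# time_cmd CMD [ARGS...] (hypothetical helper): run a command with its
# output discarded and print the elapsed wall-clock time in whole seconds.
time_cmd() {
  start=$(date +%s)
  # Ignore the command's exit status; only the duration matters here.
  "$@" >/dev/null 2>&1 || true
  end=$(date +%s)
  echo $((end - start))
}

# Example with a harmless command; prints at least 1.
time_cmd sleep 1

# Intended use while the iperf3 load is running (needs root and bpftrace):
#   time_cmd bpftrace -e 'kfunc:__napi_poll{@=count();} interval:s:1{exit();}'
```

Comparing the printed value before and after applying the fix gives a rough, second-granularity indication of whether detaching still stalls.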

Reporting RCU quiescent states periodically resolves the issue.

Fixes: 29863d41bb ("net: implement threaded-able napi poll loop support")
Reviewed-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: Yan Zhai <yan@cloudflare.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/4c3b0d3f32d3b18949d75b18e5e1d9f13a24f025.1710877680.git.yan@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-26 18:21:02 -04:00
..
6lowpan
9p net: 9p: avoid freeing uninit memory in p9pdu_vreadf 2024-01-01 12:39:04 +00:00
802 mrp: introduce active flags to prevent UAF when applicant uninit 2022-12-31 13:33:02 +01:00
8021q vlan: skip nested type that is not IFLA_VLAN_QOS_MAPPING 2024-01-31 16:17:04 -08:00
appletalk appletalk: Fix Use-After-Free in atalk_ioctl 2023-12-20 17:00:19 +01:00
atm atm: Fix Use-After-Free in do_vcc_ioctl 2023-12-20 17:00:17 +01:00
ax25
batman-adv net: vlan: introduce skb_vlan_eth_hdr() 2023-12-20 17:00:16 +01:00
bluetooth Bluetooth: Fix eir name length 2024-03-26 18:20:42 -04:00
bpf Revert "bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES" 2023-03-17 08:50:32 +01:00
bpfilter
bridge netfilter: bridge: confirm multicast packets before passing them up the stack 2024-03-06 14:45:08 +00:00
caif net: caif: Fix use-after-free in cfusbl_device_notify() 2023-03-17 08:50:24 +01:00
can can: j1939: Fix UAF in j1939_sk_match_filter during setsockopt(SO_J1939_FILTER) 2024-02-23 09:12:47 +01:00
ceph libceph: use kernel_connect() 2023-10-19 23:08:56 +02:00
core net: report RCU QS on threaded NAPI repolling 2024-03-26 18:21:02 -04:00
dcb net: dcb: choose correct policy to parse DCB_ATTR_BCN 2023-08-11 12:08:17 +02:00
dccp dccp/tcp: Call security_inet_conn_request() after setting IPv6 addresses. 2023-11-20 11:52:16 +01:00
devlink devlink: remove reload failed checks in params get/set callbacks 2023-09-23 11:11:01 +02:00
dns_resolver keys, dns: Fix size check of V1 server-list header 2024-01-25 15:27:38 -08:00
dsa net: dsa: sja1105: always enable the send_meta options 2023-07-19 16:22:06 +02:00
ethernet
ethtool ethtool: netlink: Add missing ethnl_ops_begin/complete 2024-01-25 15:27:51 -08:00
hsr hsr: Handle failures in module init 2024-03-26 18:21:00 -04:00
ieee802154
ife net: sched: ife: fix potential use-after-free 2024-01-01 12:38:56 +00:00
ipv4 ipv4: raw: Fix sending packets from raw sockets via IPsec tunnels 2024-03-26 18:21:00 -04:00
ipv6 ipv6: fib6_rules: flush route cache when rule is changed 2024-03-26 18:20:41 -04:00
iucv net/iucv: fix the allocation size of iucv_path_table array 2024-03-26 18:20:25 -04:00
kcm net: kcm: fix incorrect parameter validation in the kcm_getsockopt) function 2024-03-26 18:20:42 -04:00
key net: af_key: fix sadb_x_filter validation 2023-08-23 17:52:32 +02:00
l2tp l2tp: fix incorrect parameter validation in the pppol2tp_getsockopt() function 2024-03-26 18:20:42 -04:00
l3mdev
lapb
llc llc: call sock_orphan() at release time 2024-02-05 20:13:01 +00:00
mac80211 wifi: mac80211: only call drv_sta_rc_update for uploaded stations 2024-03-26 18:20:26 -04:00
mac802154
mctp net: mctp: copy skb ext data when fragmenting 2024-03-26 18:20:37 -04:00
mpls net: mpls: fix stale pointer if allocation fails during device rename 2023-02-22 12:59:53 +01:00
mptcp mptcp: fix possible deadlock in subflow diag 2024-03-06 14:45:12 +00:00
ncsi net/ncsi: Fix netlink major/minor version numbers 2024-01-25 15:27:24 -08:00
netfilter netfilter: nf_tables: do not compare internal table flags on updates 2024-03-26 18:21:02 -04:00
netlabel calipso: fix memory leak in netlbl_calipso_add_pass() 2024-01-25 15:27:20 -08:00
netlink netlink: Fix kernel-infoleak-after-free in __skb_datagram_iter 2024-03-06 14:45:06 +00:00
netrom netrom: Fix data-races around sysctl_net_busy_read 2024-03-15 10:48:18 -04:00
nfc nfc: nci: free rx_data_reassembly skb on NCI device cleanup 2024-02-23 09:12:37 +01:00
nsh net: nsh: Use correct mac_offset to unwind gso skb in nsh_gso_segment() 2023-05-24 17:32:45 +01:00
openvswitch net: openvswitch: limit the number of recursions from action sets 2024-02-23 09:12:30 +01:00
packet packet: annotate data-races around ignore_outgoing 2024-03-26 18:20:59 -04:00
phonet phonet/pep: fix racy skb_queue_empty() use 2024-03-01 13:26:38 +01:00
psample psample: Require 'CAP_NET_ADMIN' when joining "packets" group 2023-12-13 18:39:11 +01:00
qrtr net: qrtr: ns: Return 0 if server port is not present 2024-01-20 11:50:09 +01:00
rds rds: introduce acquire/release ordering in acquire/release_in_xmit() 2024-03-26 18:21:00 -04:00
rfkill net: rfkill: gpio: set GPIO direction 2024-01-01 12:39:04 +00:00
rose net/rose: fix races in rose_kill_by_device() 2024-01-01 12:38:57 +00:00
rxrpc rxrpc: Fix response to PING RESPONSE ACKs to a dead call 2024-02-16 19:06:27 +01:00
sched net/sched: taprio: proper TCA_TAPRIO_TC_ENTRY_INDEX check 2024-03-26 18:20:59 -04:00
sctp sctp: fix busy polling 2024-01-25 15:27:30 -08:00
smc net/smc: disable SEID on non-s390 archs where virtual ISM may be used 2024-02-05 20:12:54 +00:00
strparser
sunrpc net: sunrpc: Fix an off by one in rpc_sockaddr2uaddr() 2024-03-26 18:20:55 -04:00
switchdev net: bridge: switchdev: Skip MDB replays of deferred events on offload 2024-03-01 13:26:35 +01:00
tipc tipc: Check the bearer type before calling tipc_udp_nl_bearer_add() 2024-02-16 19:06:27 +01:00
tls tls: fix peeking with sync+async decryption 2024-03-06 14:45:09 +00:00
unix af_unix: Annotate data-race of gc_in_progress in wait_for_unix_gc(). 2024-03-26 18:20:31 -04:00
vmw_vsock virtio/vsock: fix logic which reduces credit update messages 2024-01-25 15:27:28 -08:00
wireless wifi: nl80211: reject iftype change with mesh ID change 2024-03-06 14:45:10 +00:00
x25 net/x25: fix incorrect parameter validation in the x25_getsockopt() function 2024-03-26 18:20:42 -04:00
xdp xsk: Skip polling event check for unbound socket 2023-12-13 18:39:08 +01:00
xfrm xfrm: Silence warnings triggerable by bad packets 2024-02-23 09:12:47 +01:00
Kconfig
Kconfig.debug
Makefile devlink: move code to a dedicated directory 2023-08-30 16:11:00 +02:00
compat.c use less confusing names for iov_iter direction initializers 2023-02-09 11:28:04 +01:00
devres.c
socket.c splice, net: Add a splice_eof op to file-ops and socket-ops 2024-01-10 17:10:27 +01:00
sysctl_net.c