linux-stable/net/core
Yan Zhai 87e92e4c30 net: report RCU QS on threaded NAPI repolling
[ Upstream commit d6dbbb1124 ]

NAPI threads can keep polling packets under load. Currently a NAPI
thread only calls cond_resched() before repolling, which is not
sufficient to clear the RCU tasks holdout. The holdout prevents BPF
tracing programs from detaching for a long period. This can be
reproduced easily with the following setup:

ip netns add test1
ip netns add test2

ip -n test1 link add veth1 type veth peer name veth2 netns test2

ip -n test1 link set veth1 up
ip -n test1 link set lo up
ip -n test2 link set veth2 up
ip -n test2 link set lo up

ip -n test1 addr add 192.168.1.2/31 dev veth1
ip -n test1 addr add 1.1.1.1/32 dev lo
ip -n test2 addr add 192.168.1.3/31 dev veth2
ip -n test2 addr add 2.2.2.2/31 dev lo

ip -n test1 route add default via 192.168.1.3
ip -n test2 route add default via 192.168.1.2

for i in `seq 10 210`; do
 for j in `seq 10 210`; do
    ip netns exec test2 iptables -I INPUT -s 3.3.$i.$j -p udp --dport 5201
 done
done

ip netns exec test2 ethtool -K veth2 gro on
ip netns exec test2 bash -c 'echo 1 > /sys/class/net/veth2/threaded'
ip netns exec test1 ethtool -K veth1 tso off

Then running an iperf3 client/server together with a bpftrace script triggers it:

ip netns exec test2 iperf3 -s -B 2.2.2.2 >/dev/null&
ip netns exec test1 iperf3 -c 2.2.2.2 -B 1.1.1.1 -u -l 1500 -b 3g -t 100 >/dev/null&
bpftrace -e 'kfunc:__napi_poll{@=count();} interval:s:1{exit();}'

Reporting RCU quiescent states periodically resolves the issue.
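
The idea of periodic reporting can be sketched with a small user-space
model. This is not the kernel change itself: the struct, the function
names, and the tick counter standing in for jiffies are all hypothetical
and exist only for illustration. The model assumes a poll loop that
always has more work, and reports a quiescent state before repolling
whenever at least `interval` ticks have passed since the last report.

```c
/* User-space model only, not kernel code. All names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

/* Simulated state of one busy polling thread. */
struct poll_model {
    unsigned long now;        /* current simulated tick (stand-in for jiffies) */
    unsigned long last_qs;    /* tick at which a quiescent state was last reported */
    unsigned long qs_reports; /* number of quiescent states reported so far */
};

/*
 * Report a quiescent state only if at least `interval` ticks have passed
 * since the previous report. Returns true when a report was made.
 */
bool report_qs_periodic(struct poll_model *m, unsigned long interval)
{
    assert(interval > 0);
    if (m->now - m->last_qs < interval)
        return false;
    m->last_qs = m->now;
    m->qs_reports++;
    return true;
}

/*
 * Model a polling thread that never runs out of work: it polls `polls`
 * times, each round costing `ticks_per_poll` ticks, and calls the
 * periodic report before every repoll. Returns the number of reports.
 */
unsigned long run_busy_poll_loop(unsigned long polls,
                                 unsigned long ticks_per_poll,
                                 unsigned long interval)
{
    struct poll_model m = { .now = 0, .last_qs = 0, .qs_reports = 0 };

    for (unsigned long i = 0; i < polls; i++) {
        m.now += ticks_per_poll;          /* one poll round completes */
        report_qs_periodic(&m, interval); /* checked before repolling */
    }
    return m.qs_reports;
}
```

In this model the number of reports grows with the time spent polling,
so a loop that never stops repolling still reports quiescent states at a
bounded interval instead of never reporting one.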

Fixes: 29863d41bb ("net: implement threaded-able napi poll loop support")
Reviewed-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: Yan Zhai <yan@cloudflare.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/4c3b0d3f32d3b18949d75b18e5e1d9f13a24f025.1710877680.git.yan@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-26 18:17:38 -04:00
..
bpf_sk_storage.c net: Namespace-ify sysctl_optmem_max 2023-12-15 11:01:27 +00:00
datagram.c net: Fix from address in memcpy_to_iter_csum() 2024-02-02 12:21:02 +00:00
dev.c net: report RCU QS on threaded NAPI repolling 2024-03-26 18:17:38 -04:00
dev.h net: fix removing a namespace with conflicting altnames 2024-01-22 01:09:30 +00:00
dev_addr_lists.c
dev_addr_lists_test.c net: fill in MODULE_DESCRIPTION()s under net/core 2023-10-28 11:29:27 +01:00
dev_ioctl.c net: partial revert of the "Make timestamping selectable: series 2023-11-18 18:42:37 -08:00
drop_monitor.c genetlink: Use internal flags for multicast groups 2023-12-29 08:43:59 +00:00
dst.c
dst_cache.c
failover.c
fib_notifier.c
fib_rules.c fib: rules: remove repeated assignment in fib_nl2rule 2024-01-07 15:16:19 +00:00
filter.c xdp: reflect tail increase for MEM_TYPE_XSK_BUFF_POOL 2024-01-24 16:24:07 -08:00
flow_dissector.c
flow_offload.c
gen_estimator.c
gen_stats.c
gro.c
gro_cells.c
gso.c
gso_test.c net: test: Fix printf format specifier in skb_segment kunit test 2024-03-26 18:16:28 -04:00
hwbm.c
link_watch.c Revert "net: rtnetlink: remove local list in __linkwatch_run_queue()" 2023-12-11 10:57:16 +00:00
lwt_bpf.c
lwtunnel.c
Makefile net: page_pool: id the page pools 2023-11-28 15:48:39 +01:00
neighbour.c neighbour: Don't let neigh_forced_gc() disable preemption for long 2023-12-08 10:37:43 +00:00
net-procfs.c
net-sysfs.c net: sysfs: fix locking in carrier read 2023-12-08 16:10:17 -08:00
net-sysfs.h
net-traces.c
net_namespace.c net: Namespace-ify sysctl_optmem_max 2023-12-15 11:01:27 +00:00
netclassid_cgroup.c cgroup, netclassid: on modifying netclassid in cgroup, only consider the main process. 2023-10-16 16:36:53 -07:00
netdev-genl-gen.c netdev-genl: spec: Extend netdev netlink spec in YAML for NAPI 2023-12-04 18:04:05 -08:00
netdev-genl-gen.h netdev-genl: spec: Extend netdev netlink spec in YAML for NAPI 2023-12-04 18:04:05 -08:00
netdev-genl.c netdev-genl: Add PID for the NAPI thread 2023-12-04 18:04:06 -08:00
netevent.c
netpoll.c
netprio_cgroup.c
of_net.c
page_pool.c page_pool: halve BIAS_MAX for multiple user references of a fragment 2023-12-17 10:56:33 +00:00
page_pool_priv.h net: page_pool: report when page pool was destroyed 2023-11-28 15:48:39 +01:00
page_pool_user.c page_pool: fix netlink dump stop/resume 2024-03-04 10:12:59 +00:00
pktgen.c net: pktgen: Use wait_event_freezable_timeout() for freezable kthread 2023-12-27 14:34:52 +00:00
ptp_classifier.c
request_sock.c tcp: make sure init the accept_queue's spinlocks once 2024-01-19 21:13:25 -08:00
rtnetlink.c dpll: move all dpll<>netdev helpers to dpll code 2024-03-05 18:36:42 -08:00
scm.c vfs-6.8.misc 2024-01-08 10:26:08 -08:00
secure_seq.c
selftests.c net: fill in MODULE_DESCRIPTION()s under net/core 2023-10-28 11:29:27 +01:00
skbuff.c net: mctp: copy skb ext data when fragmenting 2024-03-26 18:16:49 -04:00
skmsg.c bpf, sockmap: Fix NULL pointer dereference in sk_psock_verdict_data_ready() 2024-02-21 17:15:23 +01:00
sock.c net: implement lockless setsockopt(SO_PEEK_OFF) 2024-02-21 11:24:20 +00:00
sock_destructor.h
sock_diag.c sock_diag: annotate data-races around sock_diag_handlers[family] 2024-03-26 18:16:32 -04:00
sock_map.c bpf: syzkaller found null ptr deref in unix_bpf proto add 2023-12-13 16:32:28 -08:00
sock_reuseport.c
stream.c net: Return error from sk_stream_wait_connect() if sk_wait_event() fails 2023-12-15 10:48:51 +00:00
sysctl_net_core.c net: Namespace-ify sysctl_optmem_max 2023-12-15 11:01:27 +00:00
timestamping.c net: partial revert of the "Make timestamping selectable: series 2023-11-18 18:42:37 -08:00
tso.c
utils.c
xdp.c xdp: Add VLAN tag hint 2023-12-13 16:16:40 -08:00