linux-stable/net/ipv4
Eric Dumazet 93f0133b9d tcp: properly terminate timers for kernel sockets
[ Upstream commit 151c9c724d ]

We had various syzbot reports about tcp timers firing after
the corresponding netns has been dismantled.

Fortunately Josef Bacik could trigger the issue more often,
and could test a patch I wrote two years ago.

When TCP sockets are closed, we call inet_csk_clear_xmit_timers()
to 'stop' the timers.

inet_csk_clear_xmit_timers() can be called from any context,
including when socket lock is held.
This is the reason it uses sk_stop_timer(), aka del_timer().
This means that ongoing timers might finish much later.

For user sockets, this is fine because each running timer
holds a reference on the socket, and the user socket holds
a reference on the netns.

For kernel sockets, we risk that the netns is freed before
the timer can complete, because kernel sockets do not hold
a reference on the netns.
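
The lifetime rule described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: every name ending in _model, and the helper netns_alive_while_timer_pending(), is hypothetical and is not a kernel API. It only counts references the way the text describes them: a running timer pins the socket, and only a user socket pins the netns.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct netns_model {
	int refcnt;			/* references keeping the namespace alive */
};

struct sock_model {
	bool kernel;			/* kernel socket: takes no netns reference */
	struct netns_model *net;
	int refcnt;			/* owner reference + one per running timer */
};

/* Only user sockets take a reference on the netns. */
static void sock_model_init(struct sock_model *sk, struct netns_model *net,
			    bool kernel)
{
	sk->kernel = kernel;
	sk->net = net;
	sk->refcnt = 1;
	if (!kernel)
		net->refcnt++;
}

/* A running timer holds a reference on the socket. */
static void timer_model_start(struct sock_model *sk)
{
	sk->refcnt++;
}

/* Drop one socket reference; the last put releases the netns pin, if any. */
static void sock_model_put(struct sock_model *sk)
{
	assert(sk->refcnt > 0);
	if (--sk->refcnt == 0 && !sk->kernel)
		sk->net->refcnt--;
}

/*
 * Close a socket whose timer was stopped without waiting, dismantle the
 * netns, and report whether the netns is still alive while the timer is
 * still pending.
 */
bool netns_alive_while_timer_pending(bool kernel)
{
	struct netns_model net = { .refcnt = 1 };	/* the netns' own reference */
	struct sock_model sk;

	sock_model_init(&sk, &net, kernel);
	timer_model_start(&sk);
	sock_model_put(&sk);	/* close: owner reference dropped, timer not waited for */
	net.refcnt--;		/* netns dismantle drops its own reference */
	return net.refcnt > 0;	/* the timer has not run yet at this point */
}
```

In this model, netns_alive_while_timer_pending(false) (user socket) reports that the netns survives until the timer drops its socket reference, while netns_alive_while_timer_pending(true) (kernel socket) reports that the netns is already gone when the timer would fire.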

This patch adds the inet_csk_clear_xmit_timers_sync() function,
which uses sk_stop_timer_sync() to make sure all timers
are terminated before the kernel socket is released.
Modules using kernel sockets close them in their netns exit()
handler.

Also add a sock_not_owned_by_me() helper to get LOCKDEP
support: inet_csk_clear_xmit_timers_sync() must not be called
while the socket lock is held.
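
The difference between the two ways of stopping timers, and the "must not hold the socket lock" rule, can be sketched as a single-threaded userspace model. This is not the kernel implementation: all *_model names, the three timer field names, and the -1 return value are assumptions made for illustration. "Waiting" for a handler is represented simply by marking it as finished.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; the *_model names are not kernel APIs. */
struct timer_model {
	bool pending;		/* armed, handler not started yet */
	bool handler_running;	/* handler currently executing */
};

struct sock_model {
	bool owned_by_caller;	/* stands in for "socket lock is held" */
	struct timer_model retransmit, delack, keepalive;
};

/* Non-waiting stop: cancels a pending timer, ignores a running handler. */
static void timer_model_stop(struct timer_model *t)
{
	t->pending = false;
}

/* Synchronous stop: also "waits" until a running handler has returned. */
static void timer_model_stop_sync(struct timer_model *t)
{
	t->pending = false;
	t->handler_running = false;	/* models the handler completing */
}

static int timers_still_running(const struct sock_model *sk)
{
	return sk->retransmit.handler_running +
	       sk->delack.handler_running +
	       sk->keepalive.handler_running;
}

/* Callable in any context; returns how many handlers may still be running. */
int sock_model_clear_timers(struct sock_model *sk)
{
	timer_model_stop(&sk->retransmit);
	timer_model_stop(&sk->delack);
	timer_model_stop(&sk->keepalive);
	return timers_still_running(sk);
}

/*
 * Synchronous variant. Refuses (returns -1) when the caller owns the
 * socket, standing in for the sock_not_owned_by_me() check; otherwise
 * no handler is left running when it returns 0.
 */
int sock_model_clear_timers_sync(struct sock_model *sk)
{
	if (sk->owned_by_caller)
		return -1;
	timer_model_stop_sync(&sk->retransmit);
	timer_model_stop_sync(&sk->delack);
	timer_model_stop_sync(&sk->keepalive);
	assert(timers_still_running(sk) == 0);
	return 0;
}
```

With one handler marked as running, sock_model_clear_timers() still reports it as running afterwards, whereas sock_model_clear_timers_sync() only returns once none are left. That guarantee is what makes it safe to release the kernel socket, and then the netns, afterwards.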

It is very possible that, in the future, we can revert commit
3a58f13a88 ("net: rds: acquire refcount on TCP sockets"),
which attempted to solve the issue in rds only.
(net/smc/af_smc.c and net/mptcp/subflow.c have similar code)

We probably can remove the check_net() tests from
tcp_out_of_resources() and __tcp_close() in the future.

Reported-by: Josef Bacik <josef@toxicpanda.com>
Closes: https://lore.kernel.org/netdev/20240314210740.GA2823176@perftesting/
Fixes: 26abe14379 ("net: Modify sk_alloc to not reference count the netns of kernel sockets.")
Fixes: 8a68173691 ("net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket")
Link: https://lore.kernel.org/bpf/CANn89i+484ffqb93aQm1N-tjxxvb3WDKX0EbD7318RwRgsatjw@mail.gmail.com/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Josef Bacik <josef@toxicpanda.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Link: https://lore.kernel.org/r/20240322135732.1535772-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-04-13 12:50:12 +02:00
..
bpfilter
netfilter treewide: Remove uninitialized_var() usage 2023-08-11 11:45:01 +02:00
af_inet.c inet: read sk->sk_family once in inet_recv_error() 2024-02-23 08:12:53 +01:00
ah4.c
arp.c ipv4: Invalidate neighbour for broadcast address upon address addition 2022-04-15 14:15:01 +02:00
cipso_ipv4.c cipso: Fix data-races around sysctl. 2022-07-21 21:09:28 +02:00
datagram.c
devinet.c net: return correct error code 2021-12-08 08:50:11 +01:00
esp4.c net: ipv4: fix return value check in esp_remove_trailer 2023-10-25 11:16:44 +02:00
esp4_offload.c xfrm: Linearize the skb after offloading if needed. 2023-06-28 10:15:29 +02:00
fib_frontend.c ipv4: Fix incorrect table ID in IOCTL path 2023-03-22 13:27:10 +01:00
fib_lookup.h
fib_notifier.c
fib_rules.c
fib_semantics.c
fib_trie.c
fou.c
gre_demux.c erspan: fix version 1 check in gre_parse_header() 2021-01-12 20:10:19 +01:00
gre_offload.c
icmp.c icmp: guard against too small mtu 2023-04-20 12:04:38 +02:00
igmp.c ipv4: igmp: fix refcnt uaf issue when receiving igmp query packet 2023-12-08 08:43:25 +01:00
inet_connection_sock.c tcp: properly terminate timers for kernel sockets 2024-04-13 12:50:12 +02:00
inet_diag.c inet_diag: Fix error path to cancel the meseage in inet_req_diag_fill() 2020-11-24 13:27:16 +01:00
inet_fragment.c
inet_hashtables.c Revert "tcp: avoid the lookup process failing to get sk in ehash table" 2023-08-11 11:45:26 +02:00
inet_timewait_sock.c Revert "tcp: avoid the lookup process failing to get sk in ehash table" 2023-08-11 11:45:26 +02:00
inetpeer.c inetpeer: Fix data-races around sysctl. 2022-07-21 21:09:27 +02:00
ip_forward.c
ip_fragment.c
ip_gre.c ipv4: ip_gre: Avoid skb_pull() failure in ipgre_xmit() 2023-12-13 17:42:16 +01:00
ip_input.c tcp/udp: Make early_demux back namespacified. 2022-11-10 17:46:54 +01:00
ip_options.c
ip_output.c net: ipv4: fix a memleak in ip_setup_cork 2024-02-23 08:12:52 +01:00
ip_sockglue.c ipv{4,6}/raw: fix output xfrm lookup wrt protocol 2023-06-09 10:23:54 +02:00
ip_tunnel.c net: tunnels: annotate lockless accesses to dev->needed_headroom 2023-03-22 13:27:09 +01:00
ip_tunnel_core.c
ip_vti.c ip_vti: fix potential slab-use-after-free in decode_session6 2023-08-30 16:31:48 +02:00
ipcomp.c
ipconfig.c net: ipconfig: Don't override command-line hostnames or domains 2021-06-30 08:48:13 -04:00
ipip.c
ipmr.c ipmr,ip6mr: acquire RTNL before calling ip[6]mr_free_table() on failure path 2022-02-16 12:51:45 +01:00
ipmr_base.c
Kconfig tcp: configurable source port perturb table size 2022-12-08 11:18:31 +01:00
Makefile
metrics.c ipv4: prevent potential spectre v1 gadget in ip_metrics_convert() 2023-02-06 07:49:43 +01:00
netfilter.c netfilter: use actual socket sk rather than skb sk when routing harder 2020-11-18 19:18:44 +01:00
netlink.c
ping.c ping: fix address binding wrt vrf 2022-05-18 09:42:50 +02:00
proc.c
protocol.c
raw.c ipv{4,6}/raw: fix output xfrm lookup wrt protocol 2023-06-09 10:23:54 +02:00
raw_diag.c
route.c ipv4: Correct/silence an endian warning in __ip_do_redirect 2023-12-08 08:43:23 +01:00
syncookies.c tcp: make sure treq->af_specific is initialized 2022-05-12 12:20:25 +02:00
sysctl_net_ipv4.c tcp/udp: Make early_demux back namespacified. 2022-11-10 17:46:54 +01:00
tcp.c tcp: properly terminate timers for kernel sockets 2024-04-13 12:50:12 +02:00
tcp_bbr.c tcp_bbr: fix u32 wrap bug in round logic if bbr_init() called after 2B packets 2021-08-26 08:36:39 -04:00
tcp_bic.c
tcp_cdg.c tcp: cdg: allow tcp_cdg_release() to be called multiple times 2022-11-25 17:40:28 +01:00
tcp_cong.c net: Only allow init netns to set default tcp cong to a restricted algo 2021-05-22 10:59:39 +02:00
tcp_cubic.c tcp_cubic: fix spurious Hystart ACK train detections for not-cwnd-limited flows 2021-12-01 09:27:43 +01:00
tcp_dctcp.c
tcp_diag.c tcp: annotate tp->write_seq lockless reads 2021-03-17 16:43:43 +01:00
tcp_fastopen.c tcp: annotate data-races around fastopenq.max_qlen 2023-08-11 11:45:27 +02:00
tcp_highspeed.c
tcp_htcp.c
tcp_hybla.c
tcp_illinois.c
tcp_input.c tcp: do not accept ACK of bytes we never sent 2023-12-13 17:42:17 +01:00
tcp_ipv4.c dccp/tcp: Reset saddr on failure after inet6?_hash_connect(). 2022-12-08 11:18:30 +01:00
tcp_lp.c
tcp_metrics.c tcp_metrics: do not create an entry from tcp_init_metrics() 2023-11-20 10:29:16 +01:00
tcp_minisocks.c tcp: tcp_check_req() can be called from process context 2023-03-11 16:31:59 +01:00
tcp_nv.c
tcp_offload.c
tcp_output.c net: Remove acked SYN flag from packet in the transmit queue correctly 2023-12-20 15:38:01 +01:00
tcp_rate.c
tcp_recovery.c tcp: fix excessive TLP and RACK timeouts from HZ rounding 2023-10-25 11:16:46 +02:00
tcp_scalable.c
tcp_timer.c net: fix the RTO timer retransmitting skb every 1ms if linear option is enabled 2023-08-30 16:31:50 +02:00
tcp_ulp.c
tcp_vegas.c
tcp_vegas.h
tcp_veno.c
tcp_westwood.c
tcp_yeah.c
tunnel4.c
udp.c udp: fix incorrect parameter validation in the udp_lib_getsockopt() function 2024-03-26 18:22:36 -04:00
udp_diag.c
udp_impl.h
udp_offload.c net: Fix gro aggregation for udp encaps with zero csum 2021-03-17 16:43:42 +01:00
udp_tunnel.c net/tunnel: wait until all sk_user_data reader finish before releasing the sock 2023-01-18 11:30:18 +01:00
udplite.c udplite: Fix NULL pointer dereference in __sk_mem_raise_allocated(). 2023-05-30 12:42:14 +01:00
xfrm4_input.c
xfrm4_mode_beet.c
xfrm4_mode_transport.c
xfrm4_mode_tunnel.c
xfrm4_output.c
xfrm4_policy.c xfrm: Don't accidentally set RTO_ONLINK in decode_session4() 2022-02-23 11:58:39 +01:00
xfrm4_protocol.c net: xfrm: unexport __init-annotated xfrm4_protocol_init() 2022-06-14 16:59:35 +02:00
xfrm4_state.c
xfrm4_tunnel.c