linux-stable/net/ipv4
Nikolay Aleksandrov 1005f19b93 net: nexthop: release IPv6 per-cpu dsts when replacing a nexthop group
When replacing a nexthop group, we must release the IPv6 per-cpu dsts of
the removed nexthop entries after an RCU grace period because they
contain references to the nexthop's net device and to the fib6 info.
With a specific series of events[1] we can reach a net device refcount
imbalance which is unrecoverable. IPv4 is not affected because its dsts
don't take a refcount on the route.

[1]
 $ ip nexthop list
  id 200 via 2002:db8::2 dev bridge.10 scope link onlink
  id 201 via 2002:db8::3 dev bridge scope link onlink
  id 203 group 201/200
 $ ip -6 route
  2001:db8::10 nhid 203 metric 1024 pref medium
     nexthop via 2002:db8::3 dev bridge weight 1 onlink
     nexthop via 2002:db8::2 dev bridge.10 weight 1 onlink

Create rt6_info through one of the multipath legs, e.g.:
 $ taskset -a -c 1  ./pkt_inj 24 bridge.10 2001:db8::10
 (pkt_inj is just a custom packet generator, nothing special)

Then remove that leg from the group by replace (let's assume it is id
200 in this case):
 $ ip nexthop replace id 203 group 201

Now remove the IPv6 route:
 $ ip -6 route del 2001:db8::10/128

The route won't really be deleted due to the stale rt6_info holding 1
refcnt in nexthop id 200.
At this point we have the following reference count dependency:
 (deleted) IPv6 route holds 1 reference over nhid 203
 nh 203 holds 1 ref over id 201
 nh 200 holds 1 ref over the net device and the route due to the stale
 rt6_info

Now, to create a circular dependency between nh 200 and the IPv6 route, and
also to get a reference over nh 200, restore nhid 200 in the group:
 $ ip nexthop replace id 203 group 201/200

And now we have a permanent circular dependency because nhid 203 holds a
reference over nh 200 and 201, but the route holds a ref over nh 203 and
is deleted.

To trigger the bug just delete the group (nhid 203):
 $ ip nexthop del id 203

It won't really be deleted due to the IPv6 route dependency, and now we
have 2 unlinked and deleted objects that reference each other: the group
and the IPv6 route. The group drops the references it holds over its
entries only at free time (i.e. when its own refcount drops to 0). That
will never happen, so we get a permanent ref on them. Since one of the
entries holds a reference over the IPv6 route, the route will also never
be released.

At this point the dependencies are:
 (deleted, only unlinked) IPv6 route holds reference over group nh 203
 (deleted, only unlinked) group nh 203 holds reference over nh 201 and 200
 nh 200 holds 1 ref over the net device and the route due to the stale
 rt6_info

This is the last point where it can be fixed by running traffic through
nh 200, and specifically through the same CPU so the rt6_info (dst) will
get released due to the IPv6 genid, that in turn will free the IPv6
route, which in turn will free the ref count over the group nh 203.

If nh 200 is deleted at this point, it will never be released due to the
ref from the unlinked group 203; it will only be unlinked:
 $ ip nexthop del id 200
 $ ip nexthop
 $

Now we can never release that stale rt6_info. We have an IPv6 route with a
ref over group nh 203, group nh 203 with refs over nh 200 and 201, and nh
200 with an rt6_info (dst) holding refs over the net device and the IPv6
route. All of these objects are only unlinked and cannot be released, thus
they can't release their ref counts.
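
The sequence above can be collected into one script. This is only a sketch: it assumes the nexthops (ids 200, 201, 203) and the route from [1] already exist, and that the custom pkt_inj generator is in the current directory. The `run` helper, the `RUN` variable and the `repro_steps` function are hypothetical names introduced here. By default the script only prints the commands; setting RUN=1 would execute them and needs root.

```shell
#!/bin/sh
# Dry-run wrapper: print the command unless RUN=1 is set in the environment.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        printf '+ %s\n' "$*"
    fi
}

# Steps taken from the description above, in the same order.
repro_steps() {
    # Create an rt6_info through the bridge.10 leg (nh 200), pinned to CPU 1.
    run taskset -a -c 1 ./pkt_inj 24 bridge.10 2001:db8::10
    # Remove nh 200 from the group by replace.
    run ip nexthop replace id 203 group 201
    # Delete the IPv6 route; it is only unlinked because of the stale rt6_info.
    run ip -6 route del 2001:db8::10/128
    # Restore nh 200 in the group, closing the reference cycle.
    run ip nexthop replace id 203 group 201/200
    # Delete the group, then nh 200; both end up unlinked but never freed.
    run ip nexthop del id 203
    run ip nexthop del id 200
}

repro_steps
```

Run as-is, it lists the six commands with a leading "+ " so the order can be reviewed before anything touches the system.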

 Message from syslogd@dev at Nov 19 14:04:10 ...
  kernel:[73501.828730] unregister_netdevice: waiting for bridge.10 to become free. Usage count = 3
 Message from syslogd@dev at Nov 19 14:04:20 ...
  kernel:[73512.068811] unregister_netdevice: waiting for bridge.10 to become free. Usage count = 3
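
To spot this state in a kernel log, the device name and usage count could be pulled out of messages like the ones above. A minimal sketch follows; `parse_unreg_wait` is a hypothetical helper name, and the pattern only assumes the message wording shown in this log excerpt.

```shell
#!/bin/sh
# Read kernel log lines on stdin and print "<device> <usage count>" for each
# "unregister_netdevice: waiting for ..." message; other lines are ignored.
parse_unreg_wait() {
    sed -n 's/.*unregister_netdevice: waiting for \([^ ]*\) to become free\. Usage count = \([0-9][0-9]*\).*/\1 \2/p'
}
```

For example, `dmesg | parse_unreg_wait` would print one line per matching message, so a device stuck with a non-zero count shows up repeatedly.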

Fixes: 7bf4796dd0 ("nexthops: add support for replace")
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-22 15:44:49 +00:00
bpfilter
netfilter Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next 2021-10-18 14:05:25 +01:00
af_inet.c net: introduce sk_forward_alloc_get() 2021-10-27 18:20:29 -07:00
ah4.c Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
arp.c net: arp: introduce arp_evict_nocarrier sysctl parameter 2021-11-01 19:57:14 -07:00
bpf_tcp_ca.c bpf: Forbid bpf_ktime_get_coarse_ns and bpf_timer_* in tracing progs 2021-11-15 20:35:58 -08:00
cipso_ipv4.c NET: IPV4: fix error "do not initialise globals to 0" 2021-09-19 12:43:56 +01:00
datagram.c net/ipv4/datagram.c: remove superfluous header files from datagram.c 2021-09-29 11:39:33 +01:00
devinet.c net: return correct error code 2021-11-15 14:22:12 +00:00
esp4.c ipsec: Remove unneeded extra variable in esp4 esp_ssg_unref() 2021-07-20 16:14:23 +02:00
esp4_offload.c xfrm: remove description from xfrm_type struct 2021-06-09 09:38:52 +02:00
fib_frontend.c net: Use nlmsg_unicast() instead of netlink_unicast() 2021-07-13 09:28:29 -07:00
fib_lookup.h ipv4: Fix spelling mistakes 2021-06-07 14:08:30 -07:00
fib_notifier.c net: ipv4: remove superfluous header files from fib_notifier.c 2021-09-28 17:32:56 -07:00
fib_rules.c
fib_semantics.c net: ipv4: Fix rtnexthop len when RTA_FLOW is present 2021-09-24 14:07:10 +01:00
fib_trie.c memcg: enable accounting for IP address and routing-related objects 2021-07-20 06:00:38 -07:00
fou.c fou: remove sparse errors 2021-08-31 12:03:33 +01:00
gre_demux.c net: Remove the member netns_ok 2021-05-17 15:29:35 -07:00
gre_offload.c ip_gre: add csum offload support for gre header 2021-01-29 20:39:14 -08:00
icmp.c icmp: fix icmp_ext_echo_iio parsing in icmp_build_probe 2021-10-14 07:54:47 -07:00
igmp.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-08-13 06:41:22 -07:00
inet_connection_sock.c tcp: switch orphan_count to bare per-cpu counters 2021-10-15 11:28:34 +01:00
inet_diag.c net: introduce sk_forward_alloc_get() 2021-10-27 18:20:29 -07:00
inet_fragment.c
inet_hashtables.c tcp: switch orphan_count to bare per-cpu counters 2021-10-15 11:28:34 +01:00
inet_timewait_sock.c
inetpeer.c inetpeer: use div64_ul() and clamp_val() calculate inet_peer_threshold 2021-03-01 13:32:12 -08:00
ip_forward.c
ip_fragment.c
ip_gre.c ip: use dev_addr_set() in tunnels 2021-10-13 09:41:37 -07:00
ip_input.c net: use indirect call helpers for dst_input 2021-02-03 14:51:39 -08:00
ip_options.c
ip_output.c net: ipv4: Fix the warning for dereference 2021-08-30 12:47:09 +01:00
ip_sockglue.c ipv4: guard IP_MINTTL with a static key 2021-10-25 18:02:14 -07:00
ip_tunnel.c ip: use dev_addr_set() in tunnels 2021-10-13 09:41:37 -07:00
ip_tunnel_core.c
ip_vti.c ip: use dev_addr_set() in tunnels 2021-10-13 09:41:37 -07:00
ipcomp.c Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
ipconfig.c net: ipconfig: Release the rtnl_lock while waiting for carrier 2021-10-28 14:36:41 +01:00
ipip.c ip: use dev_addr_set() in tunnels 2021-10-13 09:41:37 -07:00
ipmr.c ipmr: Fix indentation issue 2021-07-07 20:52:25 -07:00
ipmr_base.c
Kconfig
Makefile bpf: Clean up sockmap related Kconfigs 2021-02-26 12:28:03 -08:00
metrics.c
netfilter.c netfilter: Dissect flow after packet mangling 2021-04-18 22:04:16 +02:00
netlink.c
nexthop.c net: nexthop: release IPv6 per-cpu dsts when replacing a nexthop group 2021-11-22 15:44:49 +00:00
ping.c Revert "Revert "ipv4: fix memory leaks in ip_cmsg_send() callers"" 2021-09-14 14:24:31 +01:00
proc.c tcp: switch orphan_count to bare per-cpu counters 2021-10-15 11:28:34 +01:00
protocol.c net: Remove the member netns_ok 2021-05-17 15:29:35 -07:00
raw.c Revert "Revert "ipv4: fix memory leaks in ip_cmsg_send() callers"" 2021-09-14 14:24:31 +01:00
raw_diag.c net: Use nlmsg_unicast() instead of netlink_unicast() 2021-07-13 09:28:29 -07:00
route.c net/ipv4/route.c: remove superfluous header files from route.c 2021-09-20 13:09:06 +01:00
syncookies.c net/ipv4/syncookies.c: remove superfluous header files from syncookies.c 2021-09-21 10:48:47 +01:00
sysctl_net_ipv4.c tcp: remove sk_{tr}x_skb_cache 2021-09-23 12:50:26 +01:00
tcp.c tcp: Fix uninitialized access in skb frags array for Rx 0cp. 2021-11-12 20:13:28 -08:00
tcp_bbr.c bpf: Enable TCP congestion control kfunc from modules 2021-10-05 17:07:41 -07:00
tcp_bic.c
tcp_bpf.c bpf, sockmap: Fix race in ingress receive verdict with redirect to self 2021-11-09 00:58:26 +01:00
tcp_cdg.c
tcp_cong.c net: Only allow init netns to set default tcp cong to a restricted algo 2021-05-04 11:58:28 -07:00
tcp_cubic.c bpf: Enable TCP congestion control kfunc from modules 2021-10-05 17:07:41 -07:00
tcp_dctcp.c bpf: Enable TCP congestion control kfunc from modules 2021-10-05 17:07:41 -07:00
tcp_dctcp.h
tcp_diag.c
tcp_fastopen.c net/ipv4/tcp_fastopen.c: remove superfluous header files from tcp_fastopen.c 2021-09-20 13:09:06 +01:00
tcp_highspeed.c
tcp_htcp.c
tcp_hybla.c
tcp_illinois.c
tcp_input.c tcp: adjust rcv_ssthresh according to sk_reserved_mem 2021-09-30 13:36:46 +01:00
tcp_ipv4.c ipv4: guard IP_MINTTL with a static key 2021-10-25 18:02:14 -07:00
tcp_lp.c ipv4: tcp_lp.c: Couple of typo fixes 2021-03-28 17:31:13 -07:00
tcp_metrics.c
tcp_minisocks.c net/ipv4/tcp_minisocks.c: remove superfluous header files from tcp_minisocks.c 2021-09-20 13:09:06 +01:00
tcp_nv.c net/ipv4/tcp_nv.c: remove superfluous header files from tcp_nv.c 2021-09-27 12:47:39 +01:00
tcp_offload.c net, gro: Set inner transport header offset in tcp/udp GRO hook 2021-08-02 10:20:56 +01:00
tcp_output.c tcp: Use BIT() for OPTION_* constants 2021-11-04 11:26:15 +00:00
tcp_rate.c tcp: tracking packets with CE marks in BW rate sample 2021-09-24 14:16:40 +01:00
tcp_recovery.c tcp: more accurately check DSACKs to grow RACK reordering window 2021-07-27 20:07:21 +01:00
tcp_scalable.c
tcp_timer.c net: sock: introduce sk_error_report 2021-06-29 11:28:21 -07:00
tcp_ulp.c
tcp_vegas.c
tcp_vegas.h
tcp_veno.c
tcp_westwood.c
tcp_yeah.c tcp_yeah: check struct yeah size at compile time 2021-06-29 11:54:36 -07:00
tunnel4.c net: Remove the member netns_ok 2021-05-17 15:29:35 -07:00
udp.c udp: Validate checksum in udp_read_sock() 2021-11-16 13:18:23 +01:00
udp_bpf.c net: Implement ->sock_is_readable() for UDP and AF_UNIX 2021-10-26 12:29:33 -07:00
udp_diag.c net: Use nlmsg_unicast() instead of netlink_unicast() 2021-07-13 09:28:29 -07:00
udp_impl.h
udp_offload.c fou: remove sparse errors 2021-08-31 12:03:33 +01:00
udp_tunnel_core.c net/ipv4/udp_tunnel_core.c: remove superfluous header files from udp_tunnel_core.c 2021-09-21 10:17:20 +01:00
udp_tunnel_nic.c udp_tunnel: Fix udp_tunnel_nic work-queue type 2021-09-13 12:38:45 +01:00
udp_tunnel_stub.c
udplite.c net: Remove the member netns_ok 2021-05-17 15:29:35 -07:00
xfrm4_input.c
xfrm4_output.c
xfrm4_policy.c
xfrm4_protocol.c net: Remove the member netns_ok 2021-05-17 15:29:35 -07:00
xfrm4_state.c
xfrm4_tunnel.c net/ipv4/xfrm4_tunnel.c: remove superfluous header files from xfrm4_tunnel.c 2021-09-23 10:10:00 +02:00