linux-stable/net
Ido Schimmel d25665f528 nexthop: Fix infinite nexthop bucket dump when using maximum nexthop ID
commit 8743aeff5b upstream.

A netlink dump callback can return a positive number to signal that more
information needs to be dumped or zero to signal that the dump is
complete. In the second case, the core netlink code will append the
NLMSG_DONE message to the skb in order to indicate to user space that
the dump is complete.

The nexthop bucket dump callback always returns a positive number if
nexthop buckets were filled in the provided skb, even if the dump is
complete. This means that a dump will span at least two recvmsg() calls
as long as nexthop buckets are present. In the last recvmsg() call the
dump callback will not fill in any nexthop buckets because the previous
call indicated that the dump should restart from the last dumped nexthop
ID plus one.

 # ip link add name dummy1 up type dummy
 # ip nexthop add id 1 dev dummy1
 # ip nexthop add id 10 group 1 type resilient buckets 2
 # strace -e sendto,recvmsg -s 5 ip nexthop bucket
 sendto(3, [[{nlmsg_len=24, nlmsg_type=RTM_GETNEXTHOPBUCKET, nlmsg_flags=NLM_F_REQUEST|NLM_F_DUMP, nlmsg_seq=1691396980, nlmsg_pid=0}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], {nlmsg_len=0, nlmsg_type=0 /* NLMSG_??? */, nlmsg_flags=0, nlmsg_seq=0, nlmsg_pid=0}], 152, 0, NULL, 0) = 152
 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 128
 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[[{nlmsg_len=64, nlmsg_type=RTM_NEWNEXTHOPBUCKET, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396980, nlmsg_pid=347}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], [{nlmsg_len=64, nlmsg_type=RTM_NEWNEXTHOPBUCKET, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396980, nlmsg_pid=347}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}]], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 128
 id 10 index 0 idle_time 6.66 nhid 1
 id 10 index 1 idle_time 6.66 nhid 1
 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 20
 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[{nlmsg_len=20, nlmsg_type=NLMSG_DONE, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396980, nlmsg_pid=347}, 0], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 20
 +++ exited with 0 +++

This behavior is both inefficient and buggy. If the last nexthop to be
dumped had the maximum ID of 0xffffffff, then the dump will restart from
0 (0xffffffff + 1) and never end:

 # ip link add name dummy1 up type dummy
 # ip nexthop add id 1 dev dummy1
 # ip nexthop add id $((2**32-1)) group 1 type resilient buckets 2
 # ip nexthop bucket
 id 4294967295 index 0 idle_time 5.55 nhid 1
 id 4294967295 index 1 idle_time 5.55 nhid 1
 id 4294967295 index 0 idle_time 5.55 nhid 1
 id 4294967295 index 1 idle_time 5.55 nhid 1
 [...]

Fix by adjusting the dump callback to return zero when the dump is
complete. After the fix only one recvmsg() call is made and the
NLMSG_DONE message is appended to the RTM_NEWNEXTHOPBUCKET responses:

 # ip link add name dummy1 up type dummy
 # ip nexthop add id 1 dev dummy1
 # ip nexthop add id $((2**32-1)) group 1 type resilient buckets 2
 # strace -e sendto,recvmsg -s 5 ip nexthop bucket
 sendto(3, [[{nlmsg_len=24, nlmsg_type=RTM_GETNEXTHOPBUCKET, nlmsg_flags=NLM_F_REQUEST|NLM_F_DUMP, nlmsg_seq=1691396737, nlmsg_pid=0}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], {nlmsg_len=0, nlmsg_type=0 /* NLMSG_??? */, nlmsg_flags=0, nlmsg_seq=0, nlmsg_pid=0}], 152, 0, NULL, 0) = 152
 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 148
 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[[{nlmsg_len=64, nlmsg_type=RTM_NEWNEXTHOPBUCKET, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396737, nlmsg_pid=350}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], [{nlmsg_len=64, nlmsg_type=RTM_NEWNEXTHOPBUCKET, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396737, nlmsg_pid=350}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], [{nlmsg_len=20, nlmsg_type=NLMSG_DONE, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396737, nlmsg_pid=350}, 0]], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 148
 id 4294967295 index 0 idle_time 6.61 nhid 1
 id 4294967295 index 1 idle_time 6.61 nhid 1
 +++ exited with 0 +++

Note that if the NLMSG_DONE message cannot be appended because of size
limitations, then another recvmsg() will be needed, but the core netlink
code will not invoke the dump callback and simply reply with a
NLMSG_DONE message since it knows that the callback previously returned
zero.

Add a test that fails before the fix:

 # ./fib_nexthops.sh -t basic_res
 [...]
 TEST: Maximum nexthop ID dump                                       [FAIL]
 [...]

And passes after it:

 # ./fib_nexthops.sh -t basic_res
 [...]
 TEST: Maximum nexthop ID dump                                       [ OK ]
 [...]

Fixes: 8a1bbabb03 ("nexthop: Add netlink handlers for bucket dump")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230808075233.3337922-4-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-16 18:32:28 +02:00
..
6lowpan 6lowpan: Remove redundant initialisation. 2023-03-29 08:22:52 +01:00
9p Including fixes from netfilter. 2023-05-05 19:12:01 -07:00
802
8021q vlan: fix a potential uninit-value in vlan_dev_hard_start_xmit() 2023-05-17 12:55:39 +01:00
appletalk
atm atm: hide unused procfs functions 2023-05-17 21:27:30 -07:00
ax25
batman-adv batman-adv: Broken sync while rescheduling delayed work 2023-05-26 23:14:49 +02:00
bluetooth Bluetooth: L2CAP: Fix use-after-free in l2cap_sock_ready_cb 2023-08-11 12:14:24 +02:00
bpf bpf: add test_run support for netfilter program type 2023-04-21 11:34:50 -07:00
bpfilter
bridge bridge: Add extack warning when enabling STP in netns. 2023-07-27 08:56:55 +02:00
caif net: caif: Fix use-after-free in cfusbl_device_notify() 2023-03-02 22:22:07 -08:00
can net: annotate data-races around sk->sk_mark 2023-08-11 12:14:13 +02:00
ceph libceph: fix potential hang in ceph_osdc_notify() 2023-08-11 12:14:19 +02:00
core bpf, sockmap: Fix bug that strp_done cannot be called 2023-08-16 18:32:25 +02:00
dcb net: dcb: choose correct policy to parse DCB_ATTR_BCN 2023-08-11 12:14:16 +02:00
dccp dccp: fix data-race around dp->dccps_mss_cache 2023-08-16 18:32:26 +02:00
devlink devlink: report devlink_port_type_warn source device 2023-07-27 08:56:51 +02:00
dns_resolver
dsa net: dsa: sja1105: always enable the send_meta options 2023-07-19 16:36:48 +02:00
ethernet
ethtool ethtool: Fix uninitialized number of lanes 2023-05-03 09:13:20 +01:00
handshake net/handshake: remove fput() that causes use-after-free 2023-06-14 22:26:37 -07:00
hsr hsr: ratelimit only when errors are printed 2023-03-16 21:11:03 -07:00
ieee802154 ieee802154: Replace strlcpy with strscpy 2023-06-16 22:14:24 +02:00
ife
ipv4 nexthop: Fix infinite nexthop bucket dump when using maximum nexthop ID 2023-08-16 18:32:28 +02:00
ipv6 ipv6: adjust ndisc_is_useropt() to also return true for PIO 2023-08-16 18:32:17 +02:00
iucv net/iucv: Fix size of interrupt data 2023-03-16 17:34:40 -07:00
kcm
key af_key: Reject optional tunnel/BEET mode templates in outbound policies 2023-05-10 07:04:51 +02:00
l2tp net: annotate data-races around sk->sk_mark 2023-08-11 12:14:13 +02:00
l3mdev
lapb
llc llc: Don't drop packet from non-root netns. 2023-07-27 08:57:01 +02:00
mac80211 net: move gso declarations and functions to their own files 2023-08-11 12:14:12 +02:00
mac802154 Merge tag 'ieee802154-for-net-2023-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan 2023-06-20 09:32:33 +01:00
mctp mctp: remove MODULE_LICENSE in non-modules 2023-03-09 23:06:21 -08:00
mpls net: move gso declarations and functions to their own files 2023-08-11 12:14:12 +02:00
mptcp mptcp: fix the incorrect judgment for msk->cb_flags 2023-08-16 18:32:25 +02:00
ncsi net/ncsi: change from ndo_set_mac_address to dev_set_mac_address 2023-07-23 13:54:17 +02:00
netfilter netfilter: nft_set_hash: mark set element as dead when deleting from packet path 2023-08-16 18:32:22 +02:00
netlabel netlabel: fix shift wrapping bug in netlbl_catmap_setlong() 2023-06-10 19:54:06 +01:00
netlink netlink: Add __sock_i_ino() for __netlink_diag_dump(). 2023-07-19 16:35:38 +02:00
netrom netrom: fix info-leak in nr_write_internal() 2023-05-25 21:02:29 -07:00
nfc net: nfc: Fix use-after-free caused by nfc_llcp_find_local 2023-07-19 16:35:36 +02:00
nsh net: move gso declarations and functions to their own files 2023-08-11 12:14:12 +02:00
openvswitch net: move gso declarations and functions to their own files 2023-08-11 12:14:12 +02:00
packet net/packet: annotate data-races around tp->status 2023-08-16 18:32:25 +02:00
phonet
psample
qrtr net: qrtr: Fix an uninit variable access bug in qrtr_tx_resume() 2023-04-13 09:35:30 +02:00
rds rds: rds_rm_zerocopy_callback() correct order for list_add_tail() 2023-02-13 09:33:39 +00:00
rfkill net: rfkill-gpio: Add explicit include for of.h 2023-04-06 20:36:27 +02:00
rose
rxrpc rxrpc: Truncate UTS_RELEASE for rxrpc version 2023-05-30 10:01:06 +02:00
sched net/sched: cls_route: No longer copy tcf_result on update to avoid use-after-free 2023-08-11 12:14:15 +02:00
sctp net: move gso declarations and functions to their own files 2023-08-11 12:14:12 +02:00
smc net/smc: Use correct buffer sizes when switching between TCP and SMC 2023-08-16 18:32:25 +02:00
strparser
sunrpc SUNRPC: Fix UAF in svc_tcp_listen_data_ready() 2023-07-19 16:36:22 +02:00
switchdev
tipc tipc: stop tipc crypto on failure in tipc_node_create 2023-08-03 10:25:54 +02:00
tls net: tls: avoid discarding data on record close 2023-08-16 18:32:27 +02:00
unix net: add missing data-race annotations around sk->sk_peek_off 2023-08-11 12:14:13 +02:00
vmw_vsock bpf, sockmap: Pass skb ownership through read_skb 2023-05-23 16:09:47 +02:00
wireless wifi: nl80211: fix integer overflow in nl80211_parse_mbssid_elems() 2023-08-16 18:32:16 +02:00
x25
xdp xsk: fix refcount underflow in error path 2023-08-16 18:32:26 +02:00
xfrm net: annotate data-races around sk->sk_mark 2023-08-11 12:14:13 +02:00
compat.c net/compat: Update msg_control_is_user when setting a kernel pointer 2023-04-14 11:09:27 +01:00
devres.c
Kconfig net/handshake: Add Kunit tests for the handshake consumer API 2023-04-19 18:48:48 -07:00
Kconfig.debug
Makefile net/handshake: Create a NETLINK service for handling handshake requests 2023-04-19 18:48:48 -07:00
socket.c net: annotate sk->sk_err write from do_recvmmsg() 2023-05-10 09:58:29 +01:00
sysctl_net.c