linux-stable/net
Pablo Neira Ayuso 5f68718b34 netfilter: nf_tables: GC transaction API to avoid race with control plane
The set types rhashtable and rbtree use a GC worker to reclaim memory.
From system work queue, in periodic intervals, a scan of the table is
done.

The major caveat here is that the nft transaction mutex is not held.
This causes a race between control plane and GC when they attempt to
delete the same element.

We cannot grab the netlink mutex from the work queue, because the
control plane has to wait for the GC work queue in case the set is to be
removed, so we get following deadlock:

   cpu 1                                cpu2
     GC work                            transaction comes in , lock nft mutex
       `acquire nft mutex // BLOCKS
                                        transaction asks to remove the set
                                        set destruction calls cancel_work_sync()

cancel_work_sync will now block forever, because it is waiting for the
mutex the caller already owns.

This patch adds a new API that deals with garbage collection in two
steps:

1) Lockless GC of expired elements sets on the NFT_SET_ELEM_DEAD_BIT
   so they are not visible via lookup. Annotate current GC sequence in
   the GC transaction. Enqueue GC transaction work as soon as it is
   full. If ruleset is updated, then GC transaction is aborted and
   retried later.

2) GC work grabs the mutex. If GC sequence has changed then this GC
   transaction lost race with control plane, abort it as it contains
   stale references to objects and let GC try again later. If the
   ruleset is intact, then this GC transaction deactivates and removes
   the elements and it uses call_rcu() to destroy elements.

Note that no elements are removed from GC lockless path, the _DEAD bit
is set and pointers are collected. GC catchall does not remove the
elements anymore too. There is a new set->dead flag that is set on to
abort the GC transaction to deal with set->ops->destroy() path which
removes the remaining elements in the set from commit_release, where no
mutex is held.

To deal with GC when mutex is held, which allows safe deactivate and
removal, add sync GC API which releases the set element object via
call_rcu(). This is used by rbtree and pipapo backends which also
perform garbage collection from control plane path.

Since element removal from sets can happen from control plane and
element garbage collection/timeout, it is necessary to keep the set
structure alive until all elements have been deactivated and destroyed.

We cannot do a cancel_work_sync or flush_work in nft_set_destroy because
its called with the transaction mutex held, but the aforementioned async
work queue might be blocked on the very mutex that nft_set_destroy()
callchain is sitting on.

This gives us the choice of ABBA deadlock or UaF.

To avoid both, add set->refs refcount_t member. The GC API can then
increment the set refcount and release it once the elements have been
free'd.

Set backends are adapted to use the GC transaction API in a follow up
patch entitled:

  ("netfilter: nf_tables: use gc transaction API in set backends")

This is joint work with Florian Westphal.

Fixes: cfed7e1b1f ("netfilter: nf_tables: add set garbage collection helpers")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-08-10 08:25:16 +02:00
..
6lowpan
9p
802
8021q
appletalk sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
atm sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
ax25 sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
batman-adv batman-adv: Broken sync while rescheduling delayed work 2023-05-26 23:14:49 +02:00
bluetooth Bluetooth: MGMT: Use correct address for memcpy() 2023-07-20 11:27:22 -07:00
bpf
bpfilter net: Use umd_cleanup_helper() 2023-05-31 13:06:57 +02:00
bridge Revert "bridge: Add extack warning when enabling STP in netns." 2023-07-20 10:46:28 +02:00
caif sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
can net: annotate data-races around sk->sk_mark 2023-07-29 18:13:41 +01:00
ceph libceph: harden msgr2.1 frame segment length checks 2023-07-13 13:18:57 +02:00
core net: annotate data-races around sk->sk_priority 2023-07-29 18:13:41 +01:00
dcb
dccp net: annotate data-races around sk->sk_mark 2023-07-29 18:13:41 +01:00
devlink devlink: report devlink_port_type_warn source device 2023-06-17 00:31:14 -07:00
dns_resolver
dsa net: dsa: fix older DSA drivers using phylink 2023-07-27 17:19:46 -07:00
ethernet
ethtool net: create device lookup API with reference tracking 2023-06-15 08:21:11 +01:00
handshake Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-06-15 22:19:41 -07:00
hsr net: hsr: Disable promiscuous mode in offload mode 2023-06-21 16:47:05 -07:00
ieee802154 sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
ife
ipv4 net: annotate data-races around sk->sk_priority 2023-07-29 18:13:41 +01:00
ipv6 net: annotate data-races around sk->sk_priority 2023-07-29 18:13:41 +01:00
iucv
kcm sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
key sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
l2tp net: annotate data-races around sk->sk_mark 2023-07-29 18:13:41 +01:00
l3mdev
lapb
llc llc: Don't drop packet from non-root netns. 2023-07-20 10:46:28 +02:00
mac80211 - New Drivers 2023-07-03 11:26:05 -07:00
mac802154 Core WPAN changes: 2023-06-24 15:41:46 -07:00
mctp sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
mpls net: move gso declarations and functions to their own files 2023-06-10 00:11:41 -07:00
mptcp net: annotate data-races around sk->sk_mark 2023-07-29 18:13:41 +01:00
ncsi net/ncsi: change from ndo_set_mac_address to dev_set_mac_address 2023-06-09 10:32:51 +01:00
netfilter netfilter: nf_tables: GC transaction API to avoid race with control plane 2023-08-10 08:25:16 +02:00
netlabel netlabel: Reorder fields in 'struct netlbl_domaddr6_map' 2023-06-20 20:06:56 -07:00
netlink Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-06-27 09:45:22 -07:00
netrom sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
nfc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-06-27 09:45:22 -07:00
nsh net: move gso declarations and functions to their own files 2023-06-10 00:11:41 -07:00
openvswitch net: openvswitch: add support for l4 symmetric hashing 2023-06-12 09:46:30 +01:00
packet net: annotate data-races around sk->sk_priority 2023-07-29 18:13:41 +01:00
phonet sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
psample
qrtr Networking changes for 6.5. 2023-06-28 16:43:10 -07:00
rds sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
rfkill
rose sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
rxrpc Networking changes for 6.5. 2023-06-28 16:43:10 -07:00
sched net/sched: cls_route: No longer copy tcf_result on update to avoid use-after-free 2023-07-31 20:10:37 -07:00
sctp sctp: fix potential deadlock on &net->sctp.addr_wq_lock 2023-06-29 11:49:42 +02:00
smc net: annotate data-races around sk->sk_mark 2023-07-29 18:13:41 +01:00
strparser
sunrpc NFS client updates for Linux 6.5 2023-07-01 14:38:25 -07:00
switchdev
tipc tipc: stop tipc crypto on failure in tipc_node_create 2023-07-27 11:45:05 +02:00
tls net: Kill MSG_SENDPAGE_NOTLAST 2023-06-24 15:50:13 -07:00
unix net: add missing data-race annotations around sk->sk_peek_off 2023-07-29 18:13:41 +01:00
vmw_vsock sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
wireless wifi: cfg80211: fix receiving mesh packets without RFC1042 header 2023-07-12 18:03:40 -07:00
x25 sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
xdp net: annotate data-races around sk->sk_mark 2023-07-29 18:13:41 +01:00
xfrm net: annotate data-races around sk->sk_mark 2023-07-29 18:13:41 +01:00
Kconfig net/core: Enable socket busy polling on -RT 2023-05-26 08:51:26 +01:00
Kconfig.debug
Makefile
compat.c
devres.c
socket.c Networking changes for 6.5. 2023-06-28 16:43:10 -07:00
sysctl_net.c