linux-stable/net
Peilin Ye ea3f336f71 net/sched: qdisc_destroy() old ingress and clsact Qdiscs before grafting
[ Upstream commit 84ad0af0bc ]

mini_Qdisc_pair::p_miniq is a double pointer to mini_Qdisc, initialized
in ingress_init() to point to net_device::miniq_ingress.  ingress Qdiscs
access this per-net_device pointer in mini_qdisc_pair_swap().  Similar
for clsact Qdiscs and miniq_egress.

Unfortunately, after introducing RTNL-unlocked RTM_{NEW,DEL,GET}TFILTER
requests (thanks Hillf Danton for the hint), when replacing ingress or
clsact Qdiscs, for example, the old Qdisc ("@old") could access the same
miniq_{in,e}gress pointer(s) concurrently with the new Qdisc ("@new"),
causing race conditions [1] including a use-after-free bug in
mini_qdisc_pair_swap() reported by syzbot:

 BUG: KASAN: slab-use-after-free in mini_qdisc_pair_swap+0x1c2/0x1f0 net/sched/sch_generic.c:1573
 Write of size 8 at addr ffff888045b31308 by task syz-executor690/14901
...
 Call Trace:
  <TASK>
  __dump_stack lib/dump_stack.c:88 [inline]
  dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106
  print_address_description.constprop.0+0x2c/0x3c0 mm/kasan/report.c:319
  print_report mm/kasan/report.c:430 [inline]
  kasan_report+0x11c/0x130 mm/kasan/report.c:536
  mini_qdisc_pair_swap+0x1c2/0x1f0 net/sched/sch_generic.c:1573
  tcf_chain_head_change_item net/sched/cls_api.c:495 [inline]
  tcf_chain0_head_change.isra.0+0xb9/0x120 net/sched/cls_api.c:509
  tcf_chain_tp_insert net/sched/cls_api.c:1826 [inline]
  tcf_chain_tp_insert_unique net/sched/cls_api.c:1875 [inline]
  tc_new_tfilter+0x1de6/0x2290 net/sched/cls_api.c:2266
...

@old and @new should not affect each other.  In other words, @old should
never modify miniq_{in,e}gress after @new, and @new should not update
@old's RCU state.

Fixing without changing sch_api.c turned out to be difficult (please
refer to Closes: for discussions).  Instead, make sure @new's first call
always happen after @old's last call (in {ingress,clsact}_destroy()) has
finished:

In qdisc_graft(), return -EBUSY if @old has any ongoing filter requests,
and call qdisc_destroy() for @old before grafting @new.

Introduce qdisc_refcount_dec_if_one() as the counterpart of
qdisc_refcount_inc_nz() used for filter requests.  Introduce a
non-static version of qdisc_destroy() that does a TCQ_F_BUILTIN check,
just like qdisc_put() etc.

Depends on patch "net/sched: Refactor qdisc_graft() for ingress and
clsact Qdiscs".

[1] To illustrate, the syzkaller reproducer adds ingress Qdiscs under
TC_H_ROOT (no longer possible after commit c7cfbd1150 ("net/sched:
sch_ingress: Only create under TC_H_INGRESS")) on eth0 that has 8
transmission queues:

  Thread 1 creates ingress Qdisc A (containing mini Qdisc a1 and a2),
  then adds a flower filter X to A.

  Thread 2 creates another ingress Qdisc B (containing mini Qdisc b1 and
  b2) to replace A, then adds a flower filter Y to B.

 Thread 1               A's refcnt   Thread 2
  RTM_NEWQDISC (A, RTNL-locked)
   qdisc_create(A)               1
   qdisc_graft(A)                9

  RTM_NEWTFILTER (X, RTNL-unlocked)
   __tcf_qdisc_find(A)          10
   tcf_chain0_head_change(A)
   mini_qdisc_pair_swap(A) (1st)
            |
            |                         RTM_NEWQDISC (B, RTNL-locked)
         RCU sync                2     qdisc_graft(B)
            |                    1     notify_and_destroy(A)
            |
   tcf_block_release(A)          0    RTM_NEWTFILTER (Y, RTNL-unlocked)
   qdisc_destroy(A)                    tcf_chain0_head_change(B)
   tcf_chain0_head_change_cb_del(A)    mini_qdisc_pair_swap(B) (2nd)
   mini_qdisc_pair_swap(A) (3rd)                |
           ...                                 ...

Here, B calls mini_qdisc_pair_swap(), pointing eth0->miniq_ingress to
its mini Qdisc, b1.  Then, A calls mini_qdisc_pair_swap() again during
ingress_destroy(), setting eth0->miniq_ingress to NULL, so ingress
packets on eth0 will not find filter Y in sch_handle_ingress().

This is just one of the possible consequences of concurrently accessing
miniq_{in,e}gress pointers.

Fixes: 7a096d579e ("net: sched: ingress: set 'unlocked' flag for Qdisc ops")
Fixes: 87f373921c ("net: sched: ingress: set 'unlocked' flag for clsact Qdisc ops")
Reported-by: syzbot+b53a9c0d1ea4ad62da8b@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/r/0000000000006cf87705f79acf1a@google.com/
Cc: Hillf Danton <hdanton@sina.com>
Cc: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-21 16:01:01 +02:00
..
6lowpan
9p 9p/xen : Fix use after free bug in xen_9pfs_front_remove due to race condition 2023-04-20 12:35:08 +02:00
802 mrp: introduce active flags to prevent UAF when applicant uninit 2022-12-31 13:33:02 +01:00
8021q vlan: fix a potential uninit-value in vlan_dev_hard_start_xmit() 2023-05-24 17:32:47 +01:00
appletalk
atm atm: hide unused procfs functions 2023-06-09 10:34:16 +02:00
ax25 ax25: move from strlcpy with unused retval to strscpy 2022-08-22 17:55:50 -07:00
batman-adv batman-adv: Broken sync while rescheduling delayed work 2023-06-14 11:15:23 +02:00
bluetooth Bluetooth: fix debugfs registration 2023-06-14 11:15:28 +02:00
bpf Revert "bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES" 2023-03-17 08:50:32 +01:00
bpfilter
bridge bridge: always declare tunnel functions 2023-05-24 17:32:48 +01:00
caif net: caif: Fix use-after-free in cfusbl_device_notify() 2023-03-17 08:50:24 +01:00
can can: j1939: avoid possible use-after-free when j1939_can_rx_register fails 2023-06-14 11:15:26 +02:00
ceph use less confusing names for iov_iter direction initializers 2023-02-09 11:28:04 +01:00
core net: sched: add rcu annotations around qdisc->qdisc_sleeping 2023-06-14 11:15:21 +02:00
dcb
dccp netfilter: keep conntrack reference until IPsecv6 policy checks are done 2023-05-11 23:03:18 +09:00
dns_resolver
dsa net: dsa: tag_brcm: legacy: fix daisy-chained switches 2023-03-30 12:49:09 +02:00
ethernet net: gro: skb_gro_header helper function 2022-08-25 10:33:21 +02:00
ethtool ethtool: Fix uninitialized number of lanes 2023-05-17 11:53:37 +02:00
hsr hsr: ratelimit only when errors are printed 2023-04-06 12:10:58 +02:00
ieee802154 net: ieee802154: fix error return code in dgram_bind() 2022-10-07 09:29:17 +02:00
ife
ipv4 tcp: gso: really support BIG TCP 2023-06-14 11:15:20 +02:00
ipv6 ping6: Fix send to link-local addresses with VRF. 2023-06-21 16:00:58 +02:00
iucv net/iucv: Fix size of interrupt data 2023-03-22 13:33:50 +01:00
kcm kcm: close race conditions on sk_receive_queue 2022-11-15 12:42:26 +01:00
key af_key: Reject optional tunnel/BEET mode templates in outbound policies 2023-05-24 17:32:43 +01:00
l2tp inet6: Remove inet6_destroy_sock() in sk->sk_prot->destroy(). 2023-04-26 14:28:43 +02:00
l3mdev
lapb
llc net: deal with most data-races in sk_wait_event() 2023-05-24 17:32:32 +01:00
mac80211 wifi: mac80211: take lock before setting vif links 2023-06-21 16:00:59 +02:00
mac802154 mac802154: fix missing INIT_LIST_HEAD in ieee802154_if_add() 2022-12-05 09:53:08 +01:00
mctp net: mctp: purge receive queues on sk destruction 2023-02-06 08:06:34 +01:00
mpls net: mpls: fix stale pointer if allocation fails during device rename 2023-02-22 12:59:53 +01:00
mptcp mptcp: update userspace pm infos 2023-06-14 11:15:27 +02:00
ncsi net/ncsi: clear Tx enable mode when handling a Config required AEN 2023-05-17 11:53:32 +02:00
netfilter netfilter: nf_tables: incorrect error path handling with NFT_MSG_NEWRULE 2023-06-21 16:00:58 +02:00
netlabel genetlink: start to validate reserved header bytes 2022-08-29 12:47:15 +01:00
netlink net/netlink: fix NETLINK_LIST_MEMBERSHIPS length report 2023-06-09 10:34:03 +02:00
netrom netrom: fix info-leak in nr_write_internal() 2023-06-09 10:34:01 +02:00
nfc nfc: change order inside nfc_se_io error path 2023-03-17 08:50:17 +01:00
nsh net: nsh: Use correct mac_offset to unwind gso skb in nsh_gso_segment() 2023-05-24 17:32:45 +01:00
openvswitch net: openvswitch: fix race on port output 2023-04-20 12:35:09 +02:00
packet af_packet: do not use READ_ONCE() in packet_bind() 2023-06-09 10:34:02 +02:00
phonet
psample genetlink: start to validate reserved header bytes 2022-08-29 12:47:15 +01:00
qrtr net: qrtr: Fix an uninit variable access bug in qrtr_tx_resume() 2023-04-20 12:35:09 +02:00
rds rds: rds_rm_zerocopy_callback() correct order for list_add_tail() 2023-03-10 09:33:02 +01:00
rfkill
rose net/rose: Fix to not accept on connected socket 2023-02-22 12:59:42 +01:00
rxrpc rxrpc: Fix hard call timeout units 2023-05-17 11:53:35 +02:00
sched net/sched: qdisc_destroy() old ingress and clsact Qdiscs before grafting 2023-06-21 16:01:01 +02:00
sctp sctp: fix an error code in sctp_sf_eat_auth() 2023-06-21 16:01:00 +02:00
smc net/smc: Avoid to access invalid RMBs' MRs in SMCRv1 ADD LINK CONT 2023-06-14 11:15:17 +02:00
strparser
sunrpc nfsd: fix double fget() bug in __write_ports_addfd() 2023-06-09 10:34:04 +02:00
switchdev
tipc tipc: check the bearer min mtu properly when setting it by netlink 2023-05-24 17:32:46 +01:00
tls tls: rx: strp: don't use GFP_KERNEL in softirq context 2023-06-09 10:34:29 +02:00
unix bpf, sockmap: Pass skb ownership through read_skb 2023-06-05 09:26:18 +02:00
vmw_vsock vsock: avoid to close connected socket after the timeout 2023-05-24 17:32:44 +01:00
wireless wifi: cfg80211: fix link del callback to call correct handler 2023-06-21 16:00:59 +02:00
x25 net/x25: Fix to not accept on connected socket 2023-02-09 11:28:13 +01:00
xdp xsk: Fix unaligned descriptor validation 2023-05-11 23:03:21 +09:00
xfrm xfrm: Check if_id in inbound policy/secpath match 2023-06-09 10:34:10 +02:00
compat.c use less confusing names for iov_iter direction initializers 2023-02-09 11:28:04 +01:00
devres.c
Kconfig Remove DECnet support from kernel 2022-08-22 14:26:30 +01:00
Kconfig.debug net: make NET_(DEV|NS)_REFCNT_TRACKER depend on NET 2022-09-20 14:23:56 -07:00
Makefile Remove DECnet support from kernel 2022-08-22 14:26:30 +01:00
socket.c net: annotate sk->sk_err write from do_recvmmsg() 2023-05-24 17:32:32 +01:00
sysctl_net.c