linux-stable/net
Vlad Buslov c2999f7fb0 net: sched: multiq: don't call qdisc_put() while holding tree lock
Recent changes that removed rtnl dependency from rules update path of tc
also made tcf_block_put() function sleeping. This function is called from
ops->destroy() of several Qdisc implementations, which in turn is called by
qdisc_put(). Some Qdiscs call qdisc_put() while holding sch tree spinlock,
which results sleeping-while-atomic BUG.

Steps to reproduce for multiq:

tc qdisc add dev ens1f0 root handle 1: multiq
tc qdisc add dev ens1f0 parent 1:10 handle 50: sfq perturb 10
ethtool -L ens1f0 combined 2
tc qdisc change dev ens1f0 root handle 1: multiq

Resulting dmesg:

[ 5539.419344] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:909
[ 5539.420945] in_atomic(): 1, irqs_disabled(): 0, pid: 27658, name: tc
[ 5539.422435] INFO: lockdep is turned off.
[ 5539.423904] CPU: 21 PID: 27658 Comm: tc Tainted: G        W         5.3.0-rc8+ #721
[ 5539.425400] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
[ 5539.426911] Call Trace:
[ 5539.428380]  dump_stack+0x85/0xc0
[ 5539.429823]  ___might_sleep.cold+0xac/0xbc
[ 5539.431262]  __mutex_lock+0x5b/0x960
[ 5539.432682]  ? tcf_chain0_head_change_cb_del.isra.0+0x1b/0xf0
[ 5539.434103]  ? __nla_validate_parse+0x51/0x840
[ 5539.435493]  ? tcf_chain0_head_change_cb_del.isra.0+0x1b/0xf0
[ 5539.436903]  tcf_chain0_head_change_cb_del.isra.0+0x1b/0xf0
[ 5539.438327]  tcf_block_put_ext.part.0+0x21/0x50
[ 5539.439752]  tcf_block_put+0x50/0x70
[ 5539.441165]  sfq_destroy+0x15/0x50 [sch_sfq]
[ 5539.442570]  qdisc_destroy+0x5f/0x160
[ 5539.444000]  multiq_tune+0x14a/0x420 [sch_multiq]
[ 5539.445421]  tc_modify_qdisc+0x324/0x840
[ 5539.446841]  rtnetlink_rcv_msg+0x170/0x4b0
[ 5539.448269]  ? netlink_deliver_tap+0x95/0x400
[ 5539.449691]  ? rtnl_dellink+0x2d0/0x2d0
[ 5539.451116]  netlink_rcv_skb+0x49/0x110
[ 5539.452522]  netlink_unicast+0x171/0x200
[ 5539.453914]  netlink_sendmsg+0x224/0x3f0
[ 5539.455304]  sock_sendmsg+0x5e/0x60
[ 5539.456686]  ___sys_sendmsg+0x2ae/0x330
[ 5539.458071]  ? ___sys_recvmsg+0x159/0x1f0
[ 5539.459461]  ? do_wp_page+0x9c/0x790
[ 5539.460846]  ? __handle_mm_fault+0xcd3/0x19e0
[ 5539.462263]  __sys_sendmsg+0x59/0xa0
[ 5539.463661]  do_syscall_64+0x5c/0xb0
[ 5539.465044]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 5539.466454] RIP: 0033:0x7f1fe08177b8
[ 5539.467863] Code: 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 65 8f 0c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 89 5
4
[ 5539.470906] RSP: 002b:00007ffe812de5d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 5539.472483] RAX: ffffffffffffffda RBX: 000000005d8135e3 RCX: 00007f1fe08177b8
[ 5539.474069] RDX: 0000000000000000 RSI: 00007ffe812de640 RDI: 0000000000000003
[ 5539.475655] RBP: 0000000000000000 R08: 0000000000000001 R09: 000000000182e9b0
[ 5539.477203] R10: 0000000000404eda R11: 0000000000000246 R12: 0000000000000001
[ 5539.478699] R13: 000000000047f640 R14: 0000000000000000 R15: 0000000000000000

Rearrange locking in multiq_tune() in following ways:

- In loop that removes Qdiscs from disabled queues, call
  qdisc_purge_queue() instead of qdisc_tree_flush_backlog() on Qdisc that
  is being destroyed. Save the Qdisc in temporary allocated array and call
  qdisc_put() on each element of the array after sch tree lock is released.
  This is safe to do because Qdiscs have already been reset by
  qdisc_purge_queue() inside sch tree lock critical section.

- Do the same change for second loop that initializes Qdiscs for newly
  enabled queues in multiq_tune() function. Since sch tree lock is obtained
  and released on each iteration of this loop, just call qdisc_put()
  directly outside of critical section. Don't verify that old Qdisc is not
  noop_qdisc before releasing reference to it because such check is already
  performed by qdisc_put*() functions.

Fixes: c266f64dbf ("net: sched: protect block state with mutex")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-27 12:13:55 +02:00
..
6lowpan 6lowpan: no need to check return value of debugfs_create functions 2019-07-06 12:50:01 +02:00
9p 9p pull request for inclusion in 5.13 2019-07-12 17:31:19 -07:00
802 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
8021q Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-06-22 08:59:24 -04:00
appletalk appletalk: enforce CAP_NET_RAW for raw sockets 2019-09-24 16:37:18 +02:00
atm pppoatm: use %*ph to print small buffer 2019-09-05 12:33:28 +02:00
ax25 ax25: enforce CAP_NET_RAW for raw sockets 2019-09-24 16:37:18 +02:00
batman-adv net: Fix Kconfig indentation 2019-09-26 08:56:17 +02:00
bluetooth Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next 2019-09-18 12:34:53 -07:00
bpf bpf/flow_dissector: support flags in BPF_PROG_TEST_RUN 2019-07-25 18:00:41 -07:00
bpfilter Kbuild updates for v5.3 2019-07-12 16:03:16 -07:00
bridge Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-09-15 14:17:27 +02:00
caif treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 194 2019-05-30 11:29:22 -07:00
can can: add support of SAE J1939 protocol 2019-09-04 14:22:33 +02:00
ceph libceph: don't call crypto_free_sync_skcipher() on a NULL tfm 2019-08-28 12:33:46 +02:00
core net: print proper warning on dst underflow 2019-09-26 09:05:56 +02:00
dcb treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 201 2019-05-30 11:29:52 -07:00
dccp ipv6: add priority parameter to ip6_xmit() 2019-09-27 12:05:02 +02:00
decnet
dns_resolver Revert "Merge tag 'keys-acl-20190703' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs" 2019-07-10 18:43:43 -07:00
dsa Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net 2019-09-17 23:51:10 +02:00
ethernet Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-06-07 11:00:14 -07:00
hsr hsr: switch ->dellink() to ->ndo_uninit() 2019-07-11 14:37:45 -07:00
ieee802154 ieee802154: enforce CAP_NET_RAW for raw sockets 2019-09-24 16:37:18 +02:00
ife net: Fix Kconfig indentation 2019-09-26 08:56:17 +02:00
ipv4 tcp: honor SO_PRIORITY in TIME_WAIT state 2019-09-27 12:05:02 +02:00
ipv6 tcp: honor SO_PRIORITY in TIME_WAIT state 2019-09-27 12:05:02 +02:00
iucv net/af_iucv: mark expected switch fall-throughs 2019-07-29 10:26:14 -07:00
kcm kcm: disable preemption in kcm_parse_func_strparser() 2019-09-27 10:27:14 +02:00
key Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-07-08 19:48:57 -07:00
l2tp compat_ioctl: pppoe: fix PPPOEIOCSFWD handling 2019-07-30 14:42:13 -07:00
l3mdev ipv6: convert major tx path to use RT6_LOOKUP_F_DST_NOREF 2019-06-23 13:24:17 -07:00
lapb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-06-17 20:20:36 -07:00
llc treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 281 2019-06-05 17:36:36 +02:00
mac80211 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-09-15 14:17:27 +02:00
mac802154 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 174 2019-05-30 11:26:41 -07:00
mpls ipv4: mpls: fix mpls_xmit for iptunnel 2019-08-25 14:34:08 -07:00
ncsi net/ncsi: Disable global multicast filter 2019-09-19 18:04:40 -07:00
netfilter net: Fix Kconfig indentation 2019-09-26 08:56:17 +02:00
netlabel netlabel: remove redundant assignment to pointer iter 2019-09-01 11:45:02 -07:00
netlink net: remove empty netlink_tap_exit_net 2019-06-14 19:50:33 -07:00
netrom netrom: hold sock when setting skb->destructor 2019-07-24 15:49:05 -07:00
nfc nfc: enforce CAP_NET_RAW for raw sockets 2019-09-24 16:37:18 +02:00
nsh treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
openvswitch openvswitch: change type of UPCALL_PID attribute to NLA_UNSPEC 2019-09-26 09:32:33 +02:00
packet net/packet: fix race in tpacket_snd() 2019-08-15 13:59:48 -07:00
phonet treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 336 2019-06-05 17:37:07 +02:00
psample net: sched: take reference to psample group in flow_action infra 2019-09-16 09:18:03 +02:00
qrtr net: qrtr: Stop rx_worker before freeing node 2019-09-21 18:45:46 -07:00
rds net/rds: Check laddr_check before calling it 2019-09-27 12:10:55 +02:00
rfkill treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
rose treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
rxrpc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-09-15 14:17:27 +02:00
sched net: sched: multiq: don't call qdisc_put() while holding tree lock 2019-09-27 12:13:55 +02:00
sctp ipv6: add priority parameter to ip6_xmit() 2019-09-27 12:05:02 +02:00
smc net/smc: make sure EPOLLOUT is raised 2019-08-20 12:25:14 -07:00
strparser Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-06-22 08:59:24 -04:00
sunrpc Merge branch 'work.mount-base' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-09-18 13:15:58 -07:00
switchdev treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
tipc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-09-15 14:17:27 +02:00
tls net/tls: align non temporal copy to cache lines 2019-09-07 18:10:34 +02:00
unix Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-06-07 11:00:14 -07:00
vmw_vsock vsock/virtio: a better comment on credit update 2019-09-05 09:53:01 +02:00
wimax wimax: no need to check return value of debugfs_create functions 2019-08-10 15:25:47 -07:00
wireless We have a number of changes, but things are settling down: 2019-09-11 14:57:17 +01:00
x25
xdp Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2019-09-06 16:49:17 +02:00
xfrm Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-09-15 14:17:27 +02:00
Kconfig devlink: Add packet trap infrastructure 2019-08-17 12:40:08 -07:00
Makefile
compat.c uio: make import_iovec()/compat_import_iovec() return bytes on success 2019-05-31 15:30:03 -06:00
socket.c Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-07-19 10:42:02 -07:00
sysctl_net.c