linux-stable/net/wireless
Johannes Berg ea6b2098dd cfg80211: fix locking in netlink owner interface destruction
Harald Arnesen reported [1] a deadlock at reboot time, and after
he captured a stack trace a picture developed of what's going on:

The distribution he's using is using iwd (not wpa_supplicant) to
manage wireless. iwd will usually use the "socket owner" option
when it creates new interfaces, so that they're automatically
destroyed when it quits (unexpectedly or otherwise). This is also
done by wpa_supplicant, but it doesn't do it for the normal one,
only for additional ones, which is different with iwd.

Anyway, during shutdown, iwd quits while the netdev is still UP,
i.e. IFF_UP is set. This causes the stack trace that Linus so
nicely transcribed from the pictures:

cfg80211_destroy_iface_wk() takes wiphy_lock
 -> cfg80211_destroy_ifaces()
  ->ieee80211_del_iface
    ->ieeee80211_if_remove
      ->cfg80211_unregister_wdev
        ->unregister_netdevice_queue
          ->dev_close_many
            ->__dev_close_many
              ->raw_notifier_call_chain
                ->cfg80211_netdev_notifier_call
and that last call tries to take wiphy_lock again.

In commit a05829a722 ("cfg80211: avoid holding the RTNL when
calling the driver") I had taken into account the possibility of
recursing from cfg80211 into cfg80211_netdev_notifier_call() via
the network stack, but only for NETDEV_UNREGISTER, not for what
happens here, NETDEV_GOING_DOWN and NETDEV_DOWN notifications.

Additionally, while this worked still back in commit 78f22b6a3a
("cfg80211: allow userspace to take ownership of interfaces"), it
missed another corner case: unregistering a netdev will cause
dev_close() to be called, and thus stop wireless operations (e.g.
disconnecting), but there are some types of virtual interfaces in
wifi that don't have a netdev - for that we need an additional
call to cfg80211_leave().

So, to fix this mess, change cfg80211_destroy_ifaces() to not
require the wiphy_lock(), but instead make it acquire it, but
only after it has actually closed all the netdevs on the list,
and then call cfg80211_leave() as well before removing them
from the driver, to fix the second issue. The locking change in
this requires modifying the nl80211 call to not get the wiphy
lock passed in, but acquire it by itself after flushing any
potentially pending destruction requests.

[1] https://lore.kernel.org/r/09464e67-f3de-ac09-28a3-e27b7914ee7d@skogtun.org

Cc: stable@vger.kernel.org # 5.12
Reported-by: Harald Arnesen <harald@skogtun.org>
Fixes: 776a39b819 ("cfg80211: call cfg80211_destroy_ifaces() with wiphy lock held")
Fixes: 78f22b6a3a ("cfg80211: allow userspace to take ownership of interfaces")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Tested-by: Harald Arnesen <harald@skogtun.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-27 08:30:49 -07:00
..
certs
.gitignore .gitignore: add SPDX License Identifier 2020-03-25 11:50:48 +01:00
ap.c
chan.c cfg80211: avoid holding the RTNL when calling the driver 2021-01-26 11:55:50 +01:00
core.c cfg80211: fix locking in netlink owner interface destruction 2021-04-27 08:30:49 -07:00
core.h cfg80211: avoid holding the RTNL when calling the driver 2021-01-26 11:55:50 +01:00
debugfs.c cfg80211: avoid holding the RTNL when calling the driver 2021-01-26 11:55:50 +01:00
debugfs.h
ethtool.c cfg80211: check wiphy driver existence for drvinfo report 2020-02-07 12:53:26 +01:00
ibss.c cfg80211: avoid holding the RTNL when calling the driver 2021-01-26 11:55:50 +01:00
Kconfig cfg80211: select CONFIG_CRC32 2021-01-05 15:50:36 -08:00
lib80211.c lib80211: Remove unused macro DRV_NAME 2020-09-18 11:53:00 +02:00
lib80211_crypt_ccmp.c
lib80211_crypt_tkip.c mm, treewide: rename kzfree() to kfree_sensitive() 2020-08-07 11:33:22 -07:00
lib80211_crypt_wep.c mm, treewide: rename kzfree() to kfree_sensitive() 2020-08-07 11:33:22 -07:00
Makefile
mesh.c cfg80211/mac80211: add mesh_param "mesh_nolearn" to skip path discovery 2020-07-31 09:24:23 +02:00
mlme.c cfg80211: avoid holding the RTNL when calling the driver 2021-01-26 11:55:50 +01:00
nl80211.c cfg80211: fix locking in netlink owner interface destruction 2021-04-27 08:30:49 -07:00
nl80211.h cfg80211: support immediate reconnect request hint 2020-12-11 13:20:05 +01:00
ocb.c
of.c
pmsr.c nl80211: link recursive netlink nested policy 2020-04-30 17:51:41 -07:00
radiotap.c wireless: radiotap: fix some kernel-doc 2020-09-28 13:53:05 +02:00
rdev-ops.h nl80211: add common API to configure SAR power limitations 2020-12-11 13:38:54 +01:00
reg.c cfg80211: initialize reg_rule in __freq_reg_info() 2021-02-12 08:56:19 +01:00
reg.h cfg80211: avoid holding the RTNL when calling the driver 2021-01-26 11:55:50 +01:00
scan.c cfg80211: check S1G beacon compat element length 2021-04-08 14:44:54 +02:00
sme.c cfg80211: remove WARN_ON() in cfg80211_sme_connect 2021-04-08 10:14:55 +02:00
sysfs.c cfg80211: remove unused callback 2021-02-12 08:52:25 +01:00
sysfs.h
trace.c
trace.h nl80211: add common API to configure SAR power limitations 2020-12-11 13:38:54 +01:00
util.c cfg80211: avoid holding the RTNL when calling the driver 2021-01-26 11:55:50 +01:00
wext-compat.c wext: call cfg80211_set_encryption() with wiphy lock held 2021-01-28 19:10:57 +01:00
wext-compat.h
wext-core.c wext: fix NULL-ptr-dereference with cfg80211's lack of commit() 2021-01-26 11:59:42 +01:00
wext-priv.c
wext-proc.c
wext-sme.c cfg80211: avoid holding the RTNL when calling the driver 2021-01-26 11:55:50 +01:00
wext-spy.c