linux-stable/drivers/pci/hotplug/pciehp_pci.c
Lukas Wunner f5eff5591b PCI: pciehp: Fix AB-BA deadlock between reset_lock and device_lock
In 2013, commits

  2e35afaefe ("PCI: pciehp: Add reset_slot() method")
  608c388122 ("PCI: Add slot reset option to pci_dev_reset()")

amended PCIe hotplug to mask Presence Detect Changed events during a
Secondary Bus Reset.  The reset thus no longer causes gratuitous slot
bringdown and bringup.

However the commits neglected to serialize reset with code paths reading
slot registers.  For instance, a slot bringup due to an earlier hotplug
event may see the Presence Detect State bit cleared during a concurrent
Secondary Bus Reset.

In 2018, commit

  5b3f7b7d06 ("PCI: pciehp: Avoid slot access during reset")

retrofitted the missing locking.  It introduced a reset_lock which
serializes a Secondary Bus Reset with other parts of pciehp.

Unfortunately the locking turns out to be overzealous:  reset_lock is
held for the entire enumeration and de-enumeration of hotplugged devices,
including driver binding and unbinding.

Driver binding and unbinding acquires device_lock while the reset_lock
of the ancestral hotplug port is held.  A concurrent Secondary Bus Reset
acquires the ancestral reset_lock while already holding the device_lock.
The asymmetric locking order in the two code paths can lead to AB-BA
deadlocks.

Michael Haeuptle reports such deadlocks on simultaneous hot-removal and
vfio release (the latter implies a Secondary Bus Reset):

  pciehp_ist()                                    # down_read(reset_lock)
    pciehp_handle_presence_or_link_change()
      pciehp_disable_slot()
        __pciehp_disable_slot()
          remove_board()
            pciehp_unconfigure_device()
              pci_stop_and_remove_bus_device()
                pci_stop_bus_device()
                  pci_stop_dev()
                    device_release_driver()
                      device_release_driver_internal()
                        __device_driver_lock()    # device_lock()

  SYS_munmap()
    vfio_device_fops_release()
      vfio_device_group_close()
        vfio_device_close()
          vfio_device_last_close()
            vfio_pci_core_close_device()
              vfio_pci_core_disable()             # device_lock()
                __pci_reset_function_locked()
                  pci_reset_bus_function()
                    pci_dev_reset_slot_function()
                      pci_reset_hotplug_slot()
                        pciehp_reset_slot()       # down_write(reset_lock)

Ian May reports the same deadlock on simultaneous hot-removal and an
AER-induced Secondary Bus Reset:

  aer_recover_work_func()
    pcie_do_recovery()
      aer_root_reset()
        pci_bus_error_reset()
          pci_slot_reset()
            pci_slot_lock()                       # device_lock()
            pci_reset_hotplug_slot()
              pciehp_reset_slot()                 # down_write(reset_lock)

Fix by releasing the reset_lock during driver binding and unbinding,
thereby splitting and shrinking the critical section.

Driver binding and unbinding is protected by the device_lock() and thus
serialized with a Secondary Bus Reset.  There's no need to additionally
protect it with the reset_lock.  However, pciehp does not bind and
unbind devices directly, but rather invokes PCI core functions which
also perform certain enumeration and de-enumeration steps.

The reset_lock's purpose is to protect slot registers, not enumeration
and de-enumeration of hotplugged devices.  That would arguably be the
job of the PCI core, not the PCIe hotplug driver.  After all, an
AER-induced Secondary Bus Reset may as well happen during boot-time
enumeration of the PCI hierarchy and there's no locking to prevent that
either.

Exempting *de-enumeration* from the reset_lock is relatively harmless:
A concurrent Secondary Bus Reset may foil config space accesses such as
PME interrupt disablement.  But if the device is physically gone, those
accesses are pointless anyway.  If the device is physically present and
only logically removed through an Attention Button press or the sysfs
"power" attribute, PME interrupts as well as DMA cannot come through
because pciehp_unconfigure_device() disables INTx and Bus Master bits.
That's still protected by the reset_lock in the present commit.

Exempting *enumeration* from the reset_lock also has limited impact:
The exempted call to pci_bus_add_device() may perform device accesses
through pcibios_bus_add_device() and pci_fixup_device() which are now
no longer protected from a concurrent Secondary Bus Reset.  Otherwise
there should be no impact.

In essence, the present commit seeks to fix the AB-BA deadlocks while
still retaining a best-effort reset protection for enumeration and
de-enumeration of hotplugged devices -- until a general solution is
implemented in the PCI core.

Link: https://lore.kernel.org/linux-pci/CS1PR8401MB0728FC6FDAB8A35C22BD90EC95F10@CS1PR8401MB0728.NAMPRD84.PROD.OUTLOOK.COM
Link: https://lore.kernel.org/linux-pci/20200615143250.438252-1-ian.may@canonical.com
Link: https://lore.kernel.org/linux-pci/ce878dab-c0c4-5bd0-a725-9805a075682d@amd.com
Link: https://lore.kernel.org/linux-pci/ed831249-384a-6d35-0831-70af191e9bce@huawei.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215590
Fixes: 5b3f7b7d06 ("PCI: pciehp: Avoid slot access during reset")
Link: https://lore.kernel.org/r/fef2b2e9edf245c049a8c5b94743c0f74ff5008a.1681191902.git.lukas@wunner.de
Reported-by: Michael Haeuptle <michael.haeuptle@hpe.com>
Reported-by: Ian May <ian.may@canonical.com>
Reported-by: Andrey Grodzovsky <andrey2805@gmail.com>
Reported-by: Rahul Kumar <rahul.kumar1@amd.com>
Reported-by: Jialin Zhang <zhangjialin11@huawei.com>
Tested-by: Anatoli Antonovitch <Anatoli.Antonovitch@amd.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # v4.19+
Cc: Dan Stein <dstein@hpe.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Alex Michon <amichon@kalrayinc.com>
Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Sathyanarayanan Kuppuswamy <sathyanarayanan.kuppuswamy@linux.intel.com>
2023-04-11 13:18:30 -05:00

137 lines
3.7 KiB
C

// SPDX-License-Identifier: GPL-2.0+
/*
* PCI Express Hot Plug Controller Driver
*
* Copyright (C) 1995,2001 Compaq Computer Corporation
* Copyright (C) 2001 Greg Kroah-Hartman (greg@kroah.com)
* Copyright (C) 2001 IBM Corp.
* Copyright (C) 2003-2004 Intel Corporation
*
* All rights reserved.
*
* Send feedback to <greg@kroah.com>, <kristen.c.accardi@intel.com>
*
*/
#define dev_fmt(fmt) "pciehp: " fmt
#include <linux/kernel.h>
#include <linux/types.h>
#include <linux/pci.h>
#include "../pci.h"
#include "pciehp.h"
/**
* pciehp_configure_device() - enumerate PCI devices below a hotplug bridge
* @ctrl: PCIe hotplug controller
*
* Enumerate PCI devices below a hotplug bridge and add them to the system.
* Return 0 on success, %-EEXIST if the devices are already enumerated or
* %-ENODEV if enumeration failed.
*/
int pciehp_configure_device(struct controller *ctrl)
{
struct pci_dev *dev;
struct pci_dev *bridge = ctrl->pcie->port;
struct pci_bus *parent = bridge->subordinate;
int num, ret = 0;
pci_lock_rescan_remove();
dev = pci_get_slot(parent, PCI_DEVFN(0, 0));
if (dev) {
/*
* The device is already there. Either configured by the
* boot firmware or a previous hotplug event.
*/
ctrl_dbg(ctrl, "Device %s already exists at %04x:%02x:00, skipping hot-add\n",
pci_name(dev), pci_domain_nr(parent), parent->number);
pci_dev_put(dev);
ret = -EEXIST;
goto out;
}
num = pci_scan_slot(parent, PCI_DEVFN(0, 0));
if (num == 0) {
ctrl_err(ctrl, "No new device found\n");
ret = -ENODEV;
goto out;
}
for_each_pci_bridge(dev, parent)
pci_hp_add_bridge(dev);
pci_assign_unassigned_bridge_resources(bridge);
pcie_bus_configure_settings(parent);
/*
* Release reset_lock during driver binding
* to avoid AB-BA deadlock with device_lock.
*/
up_read(&ctrl->reset_lock);
pci_bus_add_devices(parent);
down_read_nested(&ctrl->reset_lock, ctrl->depth);
out:
pci_unlock_rescan_remove();
return ret;
}
/**
* pciehp_unconfigure_device() - remove PCI devices below a hotplug bridge
* @ctrl: PCIe hotplug controller
* @presence: whether the card is still present in the slot;
* true for safe removal via sysfs or an Attention Button press,
* false for surprise removal
*
* Unbind PCI devices below a hotplug bridge from their drivers and remove
* them from the system. Safely removed devices are quiesced. Surprise
* removed devices are marked as such to prevent further accesses.
*/
void pciehp_unconfigure_device(struct controller *ctrl, bool presence)
{
struct pci_dev *dev, *temp;
struct pci_bus *parent = ctrl->pcie->port->subordinate;
u16 command;
ctrl_dbg(ctrl, "%s: domain:bus:dev = %04x:%02x:00\n",
__func__, pci_domain_nr(parent), parent->number);
if (!presence)
pci_walk_bus(parent, pci_dev_set_disconnected, NULL);
pci_lock_rescan_remove();
/*
* Stopping an SR-IOV PF device removes all the associated VFs,
* which will update the bus->devices list and confuse the
* iterator. Therefore, iterate in reverse so we remove the VFs
* first, then the PF. We do the same in pci_stop_bus_device().
*/
list_for_each_entry_safe_reverse(dev, temp, &parent->devices,
bus_list) {
pci_dev_get(dev);
/*
* Release reset_lock during driver unbinding
* to avoid AB-BA deadlock with device_lock.
*/
up_read(&ctrl->reset_lock);
pci_stop_and_remove_bus_device(dev);
down_read_nested(&ctrl->reset_lock, ctrl->depth);
/*
* Ensure that no new Requests will be generated from
* the device.
*/
if (presence) {
pci_read_config_word(dev, PCI_COMMAND, &command);
command &= ~(PCI_COMMAND_MASTER | PCI_COMMAND_SERR);
command |= PCI_COMMAND_INTX_DISABLE;
pci_write_config_word(dev, PCI_COMMAND, command);
}
pci_dev_put(dev);
}
pci_unlock_rescan_remove();
}