linux-stable/drivers/pci/pcie
Kuppuswamy Sathyanarayanan eda8eba0de PCI/AER: Clear MULTI_ERR_COR/UNCOR_RCV bits
[ Upstream commit 203926da2b ]

When a Root Port or Root Complex Event Collector receives an error Message
(e.g., ERR_COR), it sets PCI_ERR_ROOT_COR_RCV in the Root Error Status
register and logs the Requester ID in the Error Source Identification
register.  If it receives a second ERR_COR Message before software clears
PCI_ERR_ROOT_COR_RCV, hardware sets PCI_ERR_ROOT_MULTI_COR_RCV and the
Requester ID is lost.

In the following scenario, PCI_ERR_ROOT_MULTI_COR_RCV was never cleared:

  - hardware receives ERR_COR message
  - hardware sets PCI_ERR_ROOT_COR_RCV
  - aer_irq() entered
  - aer_irq(): status = pci_read_config_dword(PCI_ERR_ROOT_STATUS)
  - aer_irq(): now status == PCI_ERR_ROOT_COR_RCV
  - hardware receives second ERR_COR message
  - hardware sets PCI_ERR_ROOT_MULTI_COR_RCV
  - aer_irq(): pci_write_config_dword(PCI_ERR_ROOT_STATUS, status)
  - PCI_ERR_ROOT_COR_RCV is cleared; PCI_ERR_ROOT_MULTI_COR_RCV is set
  - aer_irq() entered again
  - aer_irq(): status = pci_read_config_dword(PCI_ERR_ROOT_STATUS)
  - aer_irq(): now status == PCI_ERR_ROOT_MULTI_COR_RCV
  - aer_irq() exits because PCI_ERR_ROOT_COR_RCV not set
  - PCI_ERR_ROOT_MULTI_COR_RCV is still set
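
The sequence above can be replayed in a small userspace model. This is only a
sketch, not the driver code: root_status, hw_receive_err_cor(),
write_root_status(), old_aer_irq() and replay_scenario() are hypothetical
names, the register is modeled as a plain variable with write-1-to-clear
semantics, and the race is injected through a callback. The two bit values
match include/uapi/linux/pci_regs.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit values as defined in include/uapi/linux/pci_regs.h. */
#define PCI_ERR_ROOT_COR_RCV       0x00000001u
#define PCI_ERR_ROOT_MULTI_COR_RCV 0x00000002u

/* Hypothetical stand-in for the Root Error Status register. */
static uint32_t root_status;

/* Model of the hardware receiving one ERR_COR Message. */
static void hw_receive_err_cor(void)
{
    if (root_status & PCI_ERR_ROOT_COR_RCV)
        root_status |= PCI_ERR_ROOT_MULTI_COR_RCV;
    else
        root_status |= PCI_ERR_ROOT_COR_RCV;
}

/* The status bits are write-1-to-clear: writing 1 clears, writing 0 keeps. */
static void write_root_status(uint32_t val)
{
    root_status &= ~val;
}

/*
 * Simplified pre-fix handler (assumption: only the COR bit is modeled).
 * 'race', if non-NULL, runs between the read and the write-back to
 * simulate a second Message arriving in that window.
 * Returns true if the interrupt was handled.
 */
static bool old_aer_irq(void (*race)(void))
{
    uint32_t status = root_status;      /* read PCI_ERR_ROOT_STATUS */

    if (!(status & PCI_ERR_ROOT_COR_RCV))
        return false;                   /* exits without clearing anything */

    if (race)
        race();

    write_root_status(status);          /* clears only the bits that were read */
    return true;
}

/* Replays the scenario and returns the final register value. */
static uint32_t replay_scenario(void)
{
    bool handled;

    root_status = 0;
    hw_receive_err_cor();
    assert(root_status == PCI_ERR_ROOT_COR_RCV);

    handled = old_aer_irq(hw_receive_err_cor);  /* second ERR_COR races in */
    assert(handled);

    handled = old_aer_irq(NULL);                /* entered again */
    assert(!handled);

    return root_status;
}
```

Calling replay_scenario() from a test program returns
PCI_ERR_ROOT_MULTI_COR_RCV, i.e. the MULTI bit is left set after both handler
invocations.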

The same problem occurred with ERR_NONFATAL/ERR_FATAL Messages and
PCI_ERR_ROOT_UNCOR_RCV and PCI_ERR_ROOT_MULTI_UNCOR_RCV.

Fix the problem by queueing an AER event and clearing the Root Error Status
bits when any of these bits is set:

  PCI_ERR_ROOT_COR_RCV
  PCI_ERR_ROOT_UNCOR_RCV
  PCI_ERR_ROOT_MULTI_COR_RCV
  PCI_ERR_ROOT_MULTI_UNCOR_RCV
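
A minimal userspace sketch of that check, under stated assumptions: the
register is a plain variable with write-1-to-clear semantics, and
root_status, events_queued, hw_receive_err_cor(), fixed_aer_irq() and
replay_with_fix() are hypothetical names, as is the mask macro name used
here. The bit values match include/uapi/linux/pci_regs.h. It is not the
driver's actual aer_irq().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit values as defined in include/uapi/linux/pci_regs.h. */
#define PCI_ERR_ROOT_COR_RCV         0x00000001u
#define PCI_ERR_ROOT_MULTI_COR_RCV   0x00000002u
#define PCI_ERR_ROOT_UNCOR_RCV       0x00000004u
#define PCI_ERR_ROOT_MULTI_UNCOR_RCV 0x00000008u

/* Illustrative mask name: any of the four bits means there is work to do. */
#define AER_ERR_STATUS_MASK (PCI_ERR_ROOT_COR_RCV | \
                             PCI_ERR_ROOT_UNCOR_RCV | \
                             PCI_ERR_ROOT_MULTI_COR_RCV | \
                             PCI_ERR_ROOT_MULTI_UNCOR_RCV)

/* Hypothetical stand-ins for the register and the AER event queue. */
static uint32_t root_status;
static unsigned int events_queued;

/* Model of the hardware receiving one ERR_COR Message. */
static void hw_receive_err_cor(void)
{
    if (root_status & PCI_ERR_ROOT_COR_RCV)
        root_status |= PCI_ERR_ROOT_MULTI_COR_RCV;
    else
        root_status |= PCI_ERR_ROOT_COR_RCV;
}

/*
 * Sketch of the fixed check: queue an event and clear the status
 * whenever any of the four bits is set, not only the *_RCV bits.
 * 'race', if non-NULL, simulates a Message arriving between the
 * read and the write-back.
 */
static bool fixed_aer_irq(void (*race)(void))
{
    uint32_t status = root_status;      /* read PCI_ERR_ROOT_STATUS */

    if (!(status & AER_ERR_STATUS_MASK))
        return false;

    if (race)
        race();

    root_status &= ~status;             /* write-1-to-clear what was read */
    events_queued++;
    return true;
}

/* Replays the racing scenario and returns the final register value. */
static uint32_t replay_with_fix(void)
{
    bool handled;

    root_status = 0;
    events_queued = 0;
    hw_receive_err_cor();

    handled = fixed_aer_irq(hw_receive_err_cor); /* second ERR_COR races in */
    assert(handled);

    handled = fixed_aer_irq(NULL);               /* sees only the MULTI bit */
    assert(handled);

    return root_status;
}
```

With this check, the second invocation no longer exits early when only
PCI_ERR_ROOT_MULTI_COR_RCV is set, so replay_with_fix() returns 0 and two
events are queued in the model.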

See the bugzilla link for details from Eric about how to reproduce this
problem.

[bhelgaas: commit log, move repro details to bugzilla]
Fixes: e167bfcaa4 ("PCI: aerdrv: remove magical ROOT_ERR_STATUS_MASKS")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215992
Link: https://lore.kernel.org/r/20220418150237.1021519-1-sathyanarayanan.kuppuswamy@linux.intel.com
Reported-by: Eric Badger <ebadger@purestorage.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09 10:30:31 +02:00
File            Last commit                                                            Date
..
aer.c           PCI/AER: Clear MULTI_ERR_COR/UNCOR_RCV bits                            2022-06-09 10:30:31 +02:00
aer_inject.c    PCI/AER: Update aer-inject URL                                         2022-03-02 11:26:17 -06:00
aspm.c          Merge branch 'pci/enumeration'                                         2022-01-13 09:57:43 -06:00
dpc.c           PCI/DPC: Use PCI_POSSIBLE_ERROR() to check config reads                2021-11-18 14:13:18 -06:00
edr.c           PCI/EDR: Log only ACPI_NOTIFY_DISCONNECT_RECOVER events                2020-04-24 18:33:29 -05:00
err.c           Revert "PCI: Use to_pci_driver() instead of pci_dev->driver"           2021-11-11 13:36:22 -06:00
Kconfig         PCI/AER: Update aer-inject URL                                         2022-03-02 11:26:17 -06:00
Makefile        PCI/ERR: Reduce compile time for CONFIG_PCIEAER=n                      2021-10-16 09:16:59 -05:00
pme.c           PCI/PME: Use PCI_POSSIBLE_ERROR() to check config reads                2021-11-18 14:13:18 -06:00
portdrv.h       PCI/portdrv: Remove unused pcie_port_bus_{,un}register() declarations  2021-10-15 14:25:18 -05:00
portdrv_core.c  Revert "PCI/portdrv: Do not setup up IRQs if there are no users"       2022-02-11 14:16:11 -06:00
portdrv_pci.c   PCI: Add defines for normal and subtractive PCI bridges                2022-02-17 15:29:35 -06:00
ptm.c           pci-v5.15-changes                                                      2021-09-07 19:13:42 -07:00
rcec.c          PCI/RCEC: Fix RCiEP device to RCEC association                         2021-03-10 15:10:46 -06:00