8 commits

Author SHA1 Message Date
Keith Busch
bdb5ac8577 PCI/ERR: Handle fatal error recovery
We don't need to be paranoid about the topology changing while handling an
error.  If the device has changed in a hotplug capable slot, we can rely on
the presence detection handling to react to a changing topology.

Restore the fatal error handling behavior that existed before merging DPC
with AER with 7e9084b367 ("PCI/AER: Handle ERR_FATAL with removal and
re-enumeration of devices").

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
2018-09-26 14:23:14 -05:00
Oza Pawandeep
7e9084b367 PCI/AER: Handle ERR_FATAL with removal and re-enumeration of devices
PCIe ERR_FATAL errors mean the Link is unreliable.  Components on the Link
may need to be reset to return to reliable operation (PCIe r4.0, sec
6.2.2).  We previously handled these errors much differently depending on
whether the platform supports Downstream Port Containment (DPC) (PCIe r4.0,
sec 6.2.10) or not.

The AER driver has historically logged the error details, called
driver-supplied pci_error_handlers callbacks, and reset the Link.  This
reset downstream devices, but did not remove them from the PCI subsystem,
re-enumerate them, or call their driver .remove() or .probe() methods.

DPC is different because the hardware automatically disables the Link when
it detects ERR_FATAL, which resets downstream devices.  There's no
opportunity for pci_error_handlers callbacks before resetting the Link.
The DPC driver removes affected devices (which calls their driver .remove()
methods), brings the Link back up, and re-enumerates (which calls driver
.probe() methods).

Align AER ERR_FATAL handling with DPC by resetting the Link in software,
skipping the driver pci_error_handlers callbacks, removing the devices from
the PCI subsystem, and re-enumerating.  The idea is that drivers and
devices should see the same behavior for ERR_FATAL events, regardless of
whether they're handled by AER or DPC.

Here are the basic ERR_FATAL recovery steps, showing the previous AER
behavior, the AER behavior after this patch, and the DPC behavior:

                          AER        AER      DPC
                          previous   new      behavior
                          --------   ---      --------
  Log error               yes        yes      yes (minimal)
  drv.error_detected()    yes        no       no
  Reset Link              yes        yes      yes
  drv.mmio_enabled()      yes        no       no
  drv.slot_reset()        yes        no       no
  drv.resume()            yes        no       no
  Remove PCI devices      no         yes      yes
    (calls drv.remove())
  Re-enumerate            no         yes      yes
    (calls drv.probe())

N.B. With DPC, the Link reset happens before the driver .remove() calls,
while with AER, the reset happens *after* the .remove() calls.  The goal is
to eventually do the reset before .remove() for AER as well.
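
As an illustration only (not part of the commit), the table and the N.B. above can be modelled in a few lines of Python. This is a minimal sketch: the `FatalFlow` class, its field names, and the helper functions are hypothetical and do not correspond to kernel symbols. Only the yes/no entries and the reset-versus-remove ordering come from the changelog text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FatalFlow:
    """One column of the ERR_FATAL table (hypothetical model, not kernel code)."""
    name: str
    calls_error_handlers: bool      # error_detected/mmio_enabled/slot_reset/resume
    removes_and_reenumerates: bool  # drv.remove() followed by drv.probe()
    # True if the Link reset precedes .remove(); None if nothing is removed.
    reset_before_remove: Optional[bool]


AER_PREVIOUS = FatalFlow("AER previous", True, False, None)
AER_NEW = FatalFlow("AER new", False, True, False)
DPC = FatalFlow("DPC", False, True, True)


def driver_callbacks(flow):
    """Return the driver methods a flow invokes, per the table rows."""
    calls = []
    if flow.calls_error_handlers:
        calls += ["error_detected", "mmio_enabled", "slot_reset", "resume"]
    if flow.removes_and_reenumerates:
        calls += ["remove", "probe"]
    return calls


def same_driver_view(a, b):
    """True if a driver sees the same set of callbacks under both flows."""
    return driver_callbacks(a) == driver_callbacks(b)


if __name__ == "__main__":
    for flow in (AER_PREVIOUS, AER_NEW, DPC):
        print(flow.name, "->", ", ".join(driver_callbacks(flow)))
```

In this model the new AER flow and DPC invoke the same driver methods, which is the alignment the patch aims for; they still differ in `reset_before_remove`, matching the N.B. about reset ordering.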

Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
[bhelgaas: changelog, squash doc patch into this, remove unused
"result_data"]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
2018-05-17 16:44:13 -05:00
Cao jin
97e4e959c9 pci-error-recovery: doc cleanup
Include whitespace shooting; correction; typo fix; superfluous word
dropping.

Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2017-03-29 15:51:32 -06:00
Michael S. Tsirkin
2fd260f03b PCI/AER: Remove unused .link_reset() callback
No hardware seems to actually call .link_reset(), and no driver implements
it as more than a nop stub.

Drop mentions of the callback from everywhere.  It's dropped from the
documentation as well, but the doc really needs to be updated to reflect
reality better (e.g., on PCIe, slot reset is the link reset).  This will be
done in a later patch.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2017-02-09 16:41:58 -06:00
Masanari Iida
654d2e7cd1 doc:pci: Fix typo in Documentation/PCI
This patch fixes spelling typos in Documentation/PCI.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2015-03-20 07:41:56 -06:00
Francis Galiegue
a33f32244d Documentation/: it's -> its where appropriate
Fix obvious cases of "it's" being used when "its" was meant.

Signed-off-by: Francis Galiegue <fgaliegue@gmail.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-04-23 02:09:52 +02:00
Mike Mason
fe14acd4e7 PCI: document PCIe fundamental reset interfaces
The attached patch updates the Documentation/PCI/pci-error-recovery.txt
file with changes related to this new bit field, as well as a few unrelated
updates.

Signed-off-by: Linas Vepstas <linasvepstas@gmail.com>
Signed-off-by: Mike Mason <mmlnx@us.ibm.com>
Signed-off-by: Richard Lary <rlary@us.ibm.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-09-09 13:29:38 -07:00
Randy Dunlap
4b5ff46923 PCI: doc/pci: create Documentation/PCI/ and move files into it
Create Documentation/PCI/ and move PCI-related files to it.
Fix a few instances of trailing whitespace.
Update references to the new file locations.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-20 21:46:51 -07:00
Renamed from Documentation/pci-error-recovery.txt