linux-stable/arch
Oliver O'Halloran 271b18405e powerpc/powernv/iov: Ensure the pdn for VFs always contains a valid PE number
[ Upstream commit 3b5b9997b3 ]

On pseries there is a bug with adding hotplugged devices to an IOMMU
group. For a number of dumb reasons fixing that bug first requires
re-working how VFs are configured on PowerNV. For background, on
PowerNV we use the pcibios_sriov_enable() hook to do two things:

  1. Create a pci_dn structure for each of the VFs, and
  2. Configure the PHB's internal BARs so the MMIO range for each VF
     maps to a unique PE.

Roughly speaking, a PE is the hardware counterpart to a Linux IOMMU
group, since all the devices in a PE share the same IOMMU table. A PE
also defines the set of devices that should be isolated in response to
a PCI error (e.g. bad DMA, UR/CA, AER events). When a PE is isolated, all
MMIO and DMA traffic to and from devices in the PE is blocked by the
root complex until the PE is recovered by the OS.
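A toy model of PE isolation may help. It is a hypothetical sketch: the `toy_pe`/`toy_dev` types and the helper functions are illustrative names, not kernel APIs. It only shows that an error on one device freezes its whole PE, blocking MMIO and DMA for every device in that PE while other PEs are unaffected, until the OS recovers the PE.

```c
#include <stdbool.h>

/* Hypothetical model only: not the kernel's struct pnv_ioda_pe. */
struct toy_pe {
	int pe_number;
	bool frozen;	/* set by the root complex on a PCI error */
};

struct toy_dev {
	const char *name;
	struct toy_pe *pe;	/* every device belongs to exactly one PE */
};

/* A PCI error on any device isolates the whole PE it belongs to. */
static void toy_report_error(struct toy_dev *dev)
{
	dev->pe->frozen = true;
}

/* While the PE is frozen, MMIO to all of its devices is blocked. */
static bool toy_mmio_allowed(const struct toy_dev *dev)
{
	return !dev->pe->frozen;
}

/* DMA from all of its devices is blocked as well. */
static bool toy_dma_allowed(const struct toy_dev *dev)
{
	return !dev->pe->frozen;
}

/* Recovery by the OS unblocks traffic for the PE again. */
static void toy_recover_pe(struct toy_pe *pe)
{
	pe->frozen = false;
}
```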

The requirement to block MMIO causes a giant headache because the P8
PHB generally uses a fixed mapping between MMIO addresses and PEs. As
a result, we need to delay configuring the IOMMU groups for devices
until after MMIO resources are assigned. For physical devices (i.e.
non-VFs) the PE assignment is done in pcibios_setup_bridge() which is
called immediately after the MMIO resources for downstream
devices (and the bridge's windows) are assigned. For VFs the setup is
more complicated because:

  a) pcibios_setup_bridge() is not called again when VFs are activated, and
  b) the pci_devs for VFs are created by generic code, which runs after
     pcibios_sriov_enable() is called.
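The fixed MMIO-to-PE mapping is why a device's PE cannot be chosen before its resources are assigned. The sketch below is a simplified, hypothetical model, assuming a window divided into equally sized segments where segment N maps to PE N; the window base, segment size, segment count and function name are made up for illustration and do not come from the PHB documentation.

```c
#include <stdint.h>

/* Hypothetical values for illustration only. */
#define TOY_WINDOW_BASE  0x200000000000ULL
#define TOY_SEGMENT_SIZE 0x10000000ULL	/* 256 MiB per segment */
#define TOY_NUM_SEGMENTS 256
#define TOY_INVALID_PE   (-1)

/*
 * Fixed mapping: the segment an MMIO address falls in is the PE number,
 * so the PE is only known once the address has been assigned.
 */
static int toy_mmio_addr_to_pe(uint64_t addr)
{
	uint64_t seg;

	if (addr < TOY_WINDOW_BASE)
		return TOY_INVALID_PE;
	seg = (addr - TOY_WINDOW_BASE) / TOY_SEGMENT_SIZE;
	if (seg >= TOY_NUM_SEGMENTS)
		return TOY_INVALID_PE;
	return (int)seg;
}
```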

The workaround for this is a two-step process:

  1. A fixup in pcibios_add_device() is used to initialise the cached
     pe_number in pci_dn, then
  2. A bus notifier then adds the device to the IOMMU group for the PE
     specified in pci_dn->pe_number.

A side effect of fixing the pseries bug mentioned in the first paragraph
is that the fixup moves out of pcibios_add_device() and into
pcibios_bus_add_device(), which is called much later. This results in
step 2 failing because pci_dn->pe_number won't be initialised when
the bus notifier is run.
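The ordering problem can be reduced to a few lines. In this hypothetical sketch, `toy_vf_fixup()` stands in for the VF fixup and `toy_notifier_add_to_group()` for the bus notifier; neither is a real kernel function, and `TOY_INVALID_PE` merely plays the role of IODA_INVALID_PE. The notifier only succeeds if the fixup has already filled in the cached PE number.

```c
#include <stdbool.h>

#define TOY_INVALID_PE (-1)	/* stands in for IODA_INVALID_PE */

/* Hypothetical stand-in for struct pci_dn: only the cached PE number. */
struct toy_pdn {
	int pe_number;
};

/* Stand-in for the VF fixup: fills in the cached PE number. */
static void toy_vf_fixup(struct toy_pdn *pdn, int pe)
{
	pdn->pe_number = pe;
}

/* Stand-in for the bus notifier: needs a valid cached PE number. */
static bool toy_notifier_add_to_group(const struct toy_pdn *pdn)
{
	return pdn->pe_number != TOY_INVALID_PE;
}

/* Old order: the fixup (pcibios_add_device) runs before the notifier. */
static bool toy_old_order(int pe)
{
	struct toy_pdn pdn = { TOY_INVALID_PE };

	toy_vf_fixup(&pdn, pe);
	return toy_notifier_add_to_group(&pdn);
}

/* New order: the fixup (pcibios_bus_add_device) runs after the notifier. */
static bool toy_new_order(int pe)
{
	struct toy_pdn pdn = { TOY_INVALID_PE };
	bool added = toy_notifier_add_to_group(&pdn);

	toy_vf_fixup(&pdn, pe);	/* too late: the notifier already failed */
	return added;
}
```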

We can fix this by removing the need for the fixup. The PE for a VF is
known before the VF is even scanned, so we can initialise
pci_dn->pe_number in pcibios_sriov_enable() instead. Unfortunately,
moving the initialisation causes two problems:

  1. We trip the WARN_ON() in the current fixup code, and
  2. The EEH core clears pdn->pe_number when recovering a VF and
     relies on the fixup to correctly re-set it.

The only justification for either of these is a comment in
eeh_rmv_device() suggesting that pdn->pe_number *must* be set to
IODA_INVALID_PE in order for the VF to be scanned. However, this
comment appears to have no basis in reality. Both bugs can be fixed by
just deleting the code.
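The resulting flow can be modelled as follows. This is a hypothetical sketch, not the patch itself: `toy_sriov_enable()` stands in for pcibios_sriov_enable() and assumes VF i is given PE (pe_base + i), and `toy_eeh_recover_vf()` stands in for the EEH path once it no longer resets the cached PE number. Because the PE number is recorded before any VF is scanned, the notifier finds a valid PE regardless of when any fixup would run.

```c
#include <stdbool.h>

#define TOY_INVALID_PE (-1)	/* stands in for IODA_INVALID_PE */

/* Hypothetical stand-in for struct pci_dn: only the cached PE number. */
struct toy_pdn {
	int pe_number;
};

/*
 * Stand-in for pcibios_sriov_enable(): each VF's PE is known up front,
 * so record it before the VFs are scanned. Assumes VF i uses pe_base + i.
 */
static void toy_sriov_enable(struct toy_pdn *vfs, int num_vfs, int pe_base)
{
	for (int i = 0; i < num_vfs; i++)
		vfs[i].pe_number = pe_base + i;
}

/* Stand-in for the bus notifier: succeeds whenever the PE is valid. */
static bool toy_notifier_add_to_group(const struct toy_pdn *pdn)
{
	return pdn->pe_number != TOY_INVALID_PE;
}

/*
 * Stand-in for EEH VF recovery after the change: the cached PE number is
 * no longer reset, so the rescanned VF still finds its PE.
 */
static void toy_eeh_recover_vf(struct toy_pdn *pdn)
{
	(void)pdn;	/* nothing to clear any more */
}
```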

Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191028085424.12006-1-oohall@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-24 08:36:26 +01:00
alpha
arc ARC: [plat-axs10x]: Add missing multicast filter number to GMAC node 2020-02-14 16:34:12 -05:00
arm ARM: npcm: Bring back GPIOLIB support 2020-02-19 19:53:01 +01:00
arm64 arm64: dts: fast models: Fix FVP PCI interrupt-map property 2020-02-19 19:53:08 +01:00
c6x
csky
h8300
hexagon hexagon: work around compiler crash 2020-01-17 19:49:07 +01:00
ia64 mm/memory_hotplug: shrink zones when offlining memory 2020-01-09 10:19:56 +01:00
m68k
microblaze
mips MIPS: boot: fix typo in 'vmlinux.lzma.its' target 2020-02-11 04:35:17 -08:00
nds32 asm-generic/nds32: don't redefine cacheflush primitives 2020-01-17 19:48:43 +01:00
nios2
openrisc
parisc parisc: Use proper printk format for resource_size_t 2020-02-05 21:22:46 +00:00
powerpc powerpc/powernv/iov: Ensure the pdn for VFs always contains a valid PE number 2020-02-24 08:36:26 +01:00
riscv riscv, bpf: Fix broken BPF tail calls 2020-02-11 04:35:28 -08:00
s390 s390/time: Fix clk type in get_tod_clock 2020-02-19 19:53:07 +01:00
sh mm/memory_hotplug: shrink zones when offlining memory 2020-01-09 10:19:56 +01:00
sparc mm/mmu_gather: invalidate TLB correctly on batch allocation failure and flush 2020-02-11 04:35:42 -08:00
um Revert "um: Enable CONFIG_CONSTRUCTORS" 2020-02-01 09:34:53 +00:00
unicore32
x86 KVM: x86/mmu: Fix struct guest_walker arrays for 5-level paging 2020-02-19 19:53:09 +01:00
xtensa xtensa: Implement copy_thread_tls 2020-01-14 20:08:35 +01:00
.gitignore
Kconfig mm/mmu_gather: invalidate TLB correctly on batch allocation failure and flush 2020-02-11 04:35:42 -08:00