linux-stable/drivers/vfio/pci
Anthony DeRossi e806e22362 vfio/pci: Check the device set open count on reset
vfio_pci_dev_set_needs_reset() inspects the open_count of every device
in the set to determine whether a reset is allowed. The current device
always has open_count == 1 within vfio_pci_core_disable(), effectively
disabling the reset logic. This field is also documented as private in
vfio_device, so it should not be used to determine whether other devices
in the set are open.

Checking for vfio_device_set_open_count() > 1 on the device set fixes
both issues.
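
The difference between the two checks can be sketched in a small userspace model. This is an illustrative approximation, not the kernel code: the model_* structures and functions are hypothetical stand-ins, and model_set_open_count() assumes vfio_device_set_open_count() returns the sum of open_count over every device in the set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device in a VFIO device set. */
struct model_device {
	unsigned int open_count;	/* private per-device count in the kernel */
	bool needs_reset;
};

/* Hypothetical stand-in for the device set itself. */
struct model_dev_set {
	struct model_device *devs;
	size_t ndevs;
};

/* Assumed behaviour of vfio_device_set_open_count(): total opens in the set. */
static unsigned int model_set_open_count(const struct model_dev_set *set)
{
	unsigned int total = 0;

	for (size_t i = 0; i < set->ndevs; i++)
		total += set->devs[i].open_count;
	return total;
}

/*
 * Old-style check: refuse the reset if any device is open. The device being
 * disabled still has open_count == 1, so this always refuses.
 */
static bool model_needs_reset_old(const struct model_dev_set *set)
{
	bool needs_reset = false;

	for (size_t i = 0; i < set->ndevs; i++) {
		if (set->devs[i].open_count)
			return false;
		needs_reset |= set->devs[i].needs_reset;
	}
	return needs_reset;
}

/*
 * New-style check: refuse the reset only if the set-wide count exceeds 1,
 * i.e. something other than the device being disabled is still open.
 */
static bool model_needs_reset_new(const struct model_dev_set *set)
{
	bool needs_reset = false;

	if (model_set_open_count(set) > 1)
		return false;

	for (size_t i = 0; i < set->ndevs; i++)
		needs_reset |= set->devs[i].needs_reset;
	return needs_reset;
}

/* Walks through the case described above: one open device that needs reset. */
static void model_selfcheck(void)
{
	struct model_device dev = { .open_count = 1, .needs_reset = true };
	struct model_dev_set set = { .devs = &dev, .ndevs = 1 };

	assert(!model_needs_reset_old(&set));	/* reset is wrongly skipped */
	assert(model_needs_reset_new(&set));	/* reset is allowed */
}
```

In this model the old check can never return true while the current device is still counted as open, which matches the "effectively disabling the reset logic" behaviour described above.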

After commit 2cd8b14aaa ("vfio/pci: Move to the device set
infrastructure"), failure to create a new file for a device would cause
the reset to be skipped due to open_count being decremented after
calling close_device() in the error path.

After commit eadd86f835 ("vfio: Remove calls to
vfio_group_add_container_user()"), releasing a device would always skip
the reset due to an ordering change in vfio_device_fops_release().

Failing to reset the device leaves it in an unknown state, potentially
causing errors when it is accessed later or bound to a different driver.

This issue was observed with a Radeon RX Vega 56 [1002:687f] (rev c3)
assigned to a Windows guest. After shutting down the guest, unbinding
the device from vfio-pci, and binding the device to amdgpu:

[  548.007102] [drm:psp_hw_start [amdgpu]] *ERROR* PSP create ring failed!
[  548.027174] [drm:psp_hw_init [amdgpu]] *ERROR* PSP firmware loading failed
[  548.027242] [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* hw_init of IP block <psp> failed -22
[  548.027306] amdgpu 0000:0a:00.0: amdgpu: amdgpu_device_ip_init failed
[  548.027308] amdgpu 0000:0a:00.0: amdgpu: Fatal error during GPU init

Fixes: 2cd8b14aaa ("vfio/pci: Move to the device set infrastructure")
Fixes: eadd86f835 ("vfio: Remove calls to vfio_group_add_container_user()")
Signed-off-by: Anthony DeRossi <ajderossi@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20221110014027.28780-4-ajderossi@gmail.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-11-10 12:03:36 -07:00
hisilicon          hisi_acc_vfio_pci: Update some log and comment formats              2022-09-27 09:30:31 -06:00
mlx5               vfio/mlx5: Use the new device life cycle helpers                    2022-09-21 14:15:10 -06:00
Kconfig            vfio/pci: introduce CONFIG_VFIO_PCI_ZDEV_KVM                        2022-07-11 09:54:25 +02:00
Makefile           vfio/pci: introduce CONFIG_VFIO_PCI_ZDEV_KVM                        2022-07-11 09:54:25 +02:00
trace.h            vfio/pci: Cleanup license mess                                      2019-01-22 11:06:05 -07:00
vfio_pci.c         vfio/pci: Use the new device life cycle helpers                     2022-09-21 14:15:10 -06:00
vfio_pci_config.c  vfio/pci: Simplify the is_intx/msi/msix/etc defines                 2022-09-01 15:29:11 -06:00
vfio_pci_core.c    vfio/pci: Check the device set open count on reset                  2022-11-10 12:03:36 -07:00
vfio_pci_igd.c     vfio/pci: Rename vfio_pci_register_dev_region()                     2022-09-01 15:29:11 -06:00
vfio_pci_intrs.c   vfio/pci: Mask INTx during runtime suspend                          2022-09-01 15:29:11 -06:00
vfio_pci_priv.h    vfio/pci: Mask INTx during runtime suspend                          2022-09-01 15:29:11 -06:00
vfio_pci_rdwr.c    vfio-pci: Fix vfio_pci_ioeventfd() to return int                    2022-09-01 15:29:11 -06:00
vfio_pci_zdev.c    Merge remote-tracking branch 'mlx5/mlx5-vfio' into v6.1/vfio/next   2022-09-08 10:44:34 -06:00