Commit graph

913 commits

Author SHA1 Message Date
Christoph Hellwig
c04ac34078 vfio: remove the iommudata hack for noiommu groups
Just pass a noiommu argument to vfio_create_group and set up the
->noiommu flag directly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20210924155705.4258-7-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-09-30 12:46:44 -06:00
Christoph Hellwig
3af9177132 vfio: refactor noiommu group creation
Split the actual noiommu group creation from vfio_iommu_group_get into a
new helper, and open code the rest of vfio_iommu_group_get in its only
caller.  This creates an entirely separate and clear code path for the
noiommu group creation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20210924155705.4258-6-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-09-30 12:46:44 -06:00
Christoph Hellwig
1362591f15 vfio: factor out a vfio_group_find_or_alloc helper
Factor out a helper to find or allocate the vfio_group to reduce the
spagetthi code in vfio_register_group_dev a little.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20210924155705.4258-5-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-09-30 12:46:43 -06:00
Christoph Hellwig
c5b4ba9730 vfio: remove the iommudata check in vfio_noiommu_attach_group
vfio_noiommu_attach_group has two callers:

 1) __vfio_container_attach_groups is called by vfio_ioctl_set_iommu,
    which just called vfio_iommu_driver_allowed
 2) vfio_group_set_container requires already checks ->noiommu on the
    vfio_group, which is propagated from the iommudata in
    vfio_create_group

so this check is entirely superflous and can be removed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20210924155705.4258-4-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-09-30 12:46:43 -06:00
Christoph Hellwig
b00621603d vfio: factor out a vfio_iommu_driver_allowed helper
Factor out a little helper to make the checks for the noiommu driver less
ugly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20210924155705.4258-3-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-09-30 12:46:43 -06:00
Jason Gunthorpe
38a68934aa vfio: Move vfio_iommu_group_get() to vfio_register_group_dev()
We don't need to hold a reference to the group in the driver as well as
obtain a reference to the same group as the first thing
vfio_register_group_dev() does.

Since the drivers never use the group move this all into the core code.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20210924155705.4258-2-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-09-30 12:46:43 -06:00
Diana Craciun
8798a803dd vfio/fsl-mc: Add per device reset support
Currently when a fsl-mc device is reset, the entire DPRC container
is reset which is very inefficient because the devices within a
container will be reset multiple times.
Add support for individually resetting a device.

Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Reviewed-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Link: https://lore.kernel.org/r/20210922110530.24736-2-diana.craciun@oss.nxp.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-09-28 16:56:05 -06:00
Colin Ian King
8bd8d1dff9 vfio/pci: add missing identifier name in argument of function prototype
The function prototype is missing an identifier name. Add one.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20210902212631.54260-1-colin.king@canonical.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-09-23 14:12:36 -06:00
Linus Torvalds
89b6b8cd92 VFIO update for v5.15-rc1
- Fix dma-valid return WAITED implementation (Anthony Yznaga)
 
  - SPDX license cleanups (Cai Huoqing)
 
  - Split vfio-pci-core from vfio-pci and enhance PCI driver matching
    to support future vendor provided vfio-pci variants (Yishai Hadas,
    Max Gurtovoy, Jason Gunthorpe)
 
  - Replace duplicated reflck with core support for managing first
    open, last close, and device sets (Jason Gunthorpe, Max Gurtovoy,
    Yishai Hadas)
 
  - Fix non-modular mdev support and don't nag about request callback
    support (Christoph Hellwig)
 
  - Add semaphore to protect instruction intercept handler and replace
    open-coded locks in vfio-ap driver (Tony Krowiak)
 
  - Convert vfio-ap to vfio_register_group_dev() API (Jason Gunthorpe)
 -----BEGIN PGP SIGNATURE-----
 
 iQJPBAABCAA5FiEEQvbATlQL0amee4qQI5ubbjuwiyIFAmEvwWkbHGFsZXgud2ls
 bGlhbXNvbkByZWRoYXQuY29tAAoJECObm247sIsi+1UP/3CRizghroINVYR+cJ99
 Tjz7lB/wlzxmRfX+SL4NAVe1SSB2VeCgU4B0PF6kywELLS8OhCO3HXYXVsz244fW
 Gk5UIns86+TFTrfCOMpwYBV0P86zuaa1ZnvCnkhMK1i2pTZ+oX8hUH1Yj5clHuU+
 YgC7JfEuTIAX73q2bC/llLvNE9ke1QCoDX3+HAH87ttqutnRWcnnq56PTEqwe+EW
 eMA+glB1UG6JAqXxoJET4155arNOny1/ZMprfBr3YXZTiXDF/lSzuMyUtbp526Sf
 hsvlnqtE6TCdfKbog0Lxckl+8E9NCq8jzFBKiZhbccrQv3vVaoP6dOsPWcT35Kp1
 IjzMLiHIbl4wXOL+Xap/biz3LCM5BMdT/OhW5LUC007zggK71ndRvb9F8ptW83Bv
 0Uh9DNv7YIQ0su3JHZEsJ3qPFXQXceP199UiADOGSeV8U1Qig3YKsHUDMuALfFvN
 t+NleeJ4qCWao+W4VCfyDfKurVnMj/cThXiDEWEeq5gMOO+6YKBIFWJVKFxUYDbf
 MgGdg0nQTUECuXKXxLD4c1HAWH9xi207OnLvhW1Icywp20MsYqOWt0vhg+PRdMBT
 DK6STxP18aQxCaOuQN9Vf81LjhXNTeg+xt3mMyViOZPcKfX6/wAC9qLt4MucJDdw
 FBfOz2UL2F56dhAYT+1vHoUM
 =nzK7
 -----END PGP SIGNATURE-----

Merge tag 'vfio-v5.15-rc1' of git://github.com/awilliam/linux-vfio

Pull VFIO updates from Alex Williamson:

 - Fix dma-valid return WAITED implementation (Anthony Yznaga)

 - SPDX license cleanups (Cai Huoqing)

 - Split vfio-pci-core from vfio-pci and enhance PCI driver matching to
   support future vendor provided vfio-pci variants (Yishai Hadas, Max
   Gurtovoy, Jason Gunthorpe)

 - Replace duplicated reflck with core support for managing first open,
   last close, and device sets (Jason Gunthorpe, Max Gurtovoy, Yishai
   Hadas)

 - Fix non-modular mdev support and don't nag about request callback
   support (Christoph Hellwig)

 - Add semaphore to protect instruction intercept handler and replace
   open-coded locks in vfio-ap driver (Tony Krowiak)

 - Convert vfio-ap to vfio_register_group_dev() API (Jason Gunthorpe)

* tag 'vfio-v5.15-rc1' of git://github.com/awilliam/linux-vfio: (37 commits)
  vfio/pci: Introduce vfio_pci_core.ko
  vfio: Use kconfig if XX/endif blocks instead of repeating 'depends on'
  vfio: Use select for eventfd
  PCI / VFIO: Add 'override_only' support for VFIO PCI sub system
  PCI: Add 'override_only' field to struct pci_device_id
  vfio/pci: Move module parameters to vfio_pci.c
  vfio/pci: Move igd initialization to vfio_pci.c
  vfio/pci: Split the pci_driver code out of vfio_pci_core.c
  vfio/pci: Include vfio header in vfio_pci_core.h
  vfio/pci: Rename ops functions to fit core namings
  vfio/pci: Rename vfio_pci_device to vfio_pci_core_device
  vfio/pci: Rename vfio_pci_private.h to vfio_pci_core.h
  vfio/pci: Rename vfio_pci.c to vfio_pci_core.c
  vfio/ap_ops: Convert to use vfio_register_group_dev()
  s390/vfio-ap: replace open coded locks for VFIO_GROUP_NOTIFY_SET_KVM notification
  s390/vfio-ap: r/w lock for PQAP interception handler function pointer
  vfio/type1: Fix vfio_find_dma_valid return
  vfio-pci/zdev: Remove repeated verbose license text
  vfio: platform: reset: Convert to SPDX identifier
  vfio: Remove struct vfio_device_ops open/release
  ...
2021-09-02 13:41:33 -07:00
Alex Williamson
ea870730d8 Merge branches 'v5.15/vfio/spdx-license-cleanups', 'v5.15/vfio/dma-valid-waited-v3', 'v5.15/vfio/vfio-pci-core-v5' and 'v5.15/vfio/vfio-ap' into v5.15/vfio/next 2021-08-26 11:08:50 -06:00
Max Gurtovoy
7fa005caa3 vfio/pci: Introduce vfio_pci_core.ko
Now that vfio_pci has been split into two source modules, one focusing on
the "struct pci_driver" (vfio_pci.c) and a toolbox library of code
(vfio_pci_core.c), complete the split and move them into two different
kernel modules.

As before vfio_pci.ko continues to present the same interface under sysfs
and this change will have no functional impact.

Splitting into another module and adding exports allows creating new HW
specific VFIO PCI drivers that can implement device specific
functionality, such as VFIO migration interfaces or specialized device
requirements.

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://lore.kernel.org/r/20210826103912.128972-14-yishaih@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-26 10:36:51 -06:00
Jason Gunthorpe
85c94dcffc vfio: Use kconfig if XX/endif blocks instead of repeating 'depends on'
This results in less kconfig wordage and a simpler understanding of the
required "depends on" to create the menu structure.

The next patch increases the nesting level a lot so this is a nice
preparatory simplification.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://lore.kernel.org/r/20210826103912.128972-13-yishaih@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-26 10:36:51 -06:00
Jason Gunthorpe
ca4ddaac7f vfio: Use select for eventfd
If VFIO_VIRQFD is required then turn on eventfd automatically.
The majority of kconfig users of the EVENTFD use select not depends on.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://lore.kernel.org/r/20210826103912.128972-12-yishaih@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-26 10:36:51 -06:00
Max Gurtovoy
cc6711b0bf PCI / VFIO: Add 'override_only' support for VFIO PCI sub system
Expose an 'override_only' helper macro (i.e.
PCI_DRIVER_OVERRIDE_DEVICE_VFIO) for VFIO PCI sub system and add the
required code to prefix its matching entries with "vfio_" in
modules.alias file.

It allows VFIO device drivers to include match entries in the
modules.alias file produced by kbuild that are not used for normal
driver autoprobing and module autoloading. Drivers using these match
entries can be connected to the PCI device manually, by userspace, using
the existing driver_override sysfs.

For example the resulting modules.alias may have:

  alias pci:v000015B3d00001021sv*sd*bc*sc*i* mlx5_core
  alias vfio_pci:v000015B3d00001021sv*sd*bc*sc*i* mlx5_vfio_pci
  alias vfio_pci:v*d*sv*sd*bc*sc*i* vfio_pci

In this example mlx5_core and mlx5_vfio_pci match to the same PCI
device. The kernel will autoload and autobind to mlx5_core but the
kernel and udev mechanisms will ignore mlx5_vfio_pci.

When userspace wants to change a device to the VFIO subsystem it can
implement a generic algorithm:

   1) Identify the sysfs path to the device:
    /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0

   2) Get the modalias string from the kernel:
    $ cat /sys/bus/pci/devices/0000:01:00.0/modalias
    pci:v000015B3d00001021sv000015B3sd00000001bc02sc00i00

   3) Prefix it with vfio_:
    vfio_pci:v000015B3d00001021sv000015B3sd00000001bc02sc00i00

   4) Search modules.alias for the above string and select the entry that
      has the fewest *'s:
    alias vfio_pci:v000015B3d00001021sv*sd*bc*sc*i* mlx5_vfio_pci

   5) modprobe the matched module name:
    $ modprobe mlx5_vfio_pci

   6) cat the matched module name to driver_override:
    echo mlx5_vfio_pci > /sys/bus/pci/devices/0000:01:00.0/driver_override

   7) unbind device from original module
     echo 0000:01:00.0 > /sys/bus/pci/devices/0000:01:00.0/driver/unbind

   8) probe PCI drivers (or explicitly bind to mlx5_vfio_pci)
    echo 0000:01:00.0 > /sys/bus/pci/drivers_probe

The algorithm is independent of bus type. In future the other buses with
VFIO device drivers, like platform and ACPI, can use this algorithm as
well.

This patch is the infrastructure to provide the information in the
modules.alias to userspace. Convert the only VFIO pci_driver which results
in one new line in the modules.alias:

  alias vfio_pci:v*d*sv*sd*bc*sc*i* vfio_pci

Later series introduce additional HW specific VFIO PCI drivers, such as
mlx5_vfio_pci.

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>  # for pci.h
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://lore.kernel.org/r/20210826103912.128972-11-yishaih@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-26 10:36:51 -06:00
Yishai Hadas
c61302aa48 vfio/pci: Move module parameters to vfio_pci.c
This is a preparation before splitting vfio_pci.ko to 2 modules.

As module parameters are a kind of uAPI they need to stay on vfio_pci.ko
to avoid a user visible impact.

For now continue to keep the implementation of these options in
vfio_pci_core.c. Arguably they are vfio_pci functionality, but further
splitting of vfio_pci_core.c will be better done in another series

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20210826103912.128972-9-yishaih@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-26 10:36:51 -06:00
Max Gurtovoy
2fb89f56a6 vfio/pci: Move igd initialization to vfio_pci.c
igd is related to the vfio_pci pci_driver implementation, move it out of
vfio_pci_core.c.

This is preparation for splitting vfio_pci.ko into 2 drivers.

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://lore.kernel.org/r/20210826103912.128972-8-yishaih@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-26 10:36:51 -06:00
Max Gurtovoy
ff53edf6d6 vfio/pci: Split the pci_driver code out of vfio_pci_core.c
Split the vfio_pci driver into two logical parts, the 'struct
pci_driver' (vfio_pci.c) which implements "Generic VFIO support for any
PCI device" and a library of code (vfio_pci_core.c) that helps
implementing a struct vfio_device on top of a PCI device.

vfio_pci.ko continues to present the same interface under sysfs and this
change should have no functional impact.

Following patches will turn vfio_pci and vfio_pci_core into a separate
module.

This is a preparation for allowing another module to provide the
pci_driver and allow that module to customize how VFIO is setup, inject
its own operations, and easily extend vendor specific functionality.

At this point the vfio_pci_core still contains a lot of vfio_pci
functionality mixed into it. Following patches will move more of the
large scale items out, but another cleanup series will be needed to get
everything.

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://lore.kernel.org/r/20210826103912.128972-7-yishaih@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-26 10:36:51 -06:00
Max Gurtovoy
c39f8fa76c vfio/pci: Include vfio header in vfio_pci_core.h
The vfio_device structure is embedded into the vfio_pci_core_device
structure, so there is no reason for not including the header file in
the vfio_pci_core header as well.

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://lore.kernel.org/r/20210826103912.128972-6-yishaih@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-26 10:36:51 -06:00
Max Gurtovoy
bf9fdc9a74 vfio/pci: Rename ops functions to fit core namings
This is another preparation patch for separating the vfio_pci driver to
a subsystem driver and a generic pci driver. This patch doesn't change
any logic.

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://lore.kernel.org/r/20210826103912.128972-5-yishaih@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-26 10:36:51 -06:00
Max Gurtovoy
536475109c vfio/pci: Rename vfio_pci_device to vfio_pci_core_device
This is a preparation patch for separating the vfio_pci driver to a
subsystem driver and a generic pci driver. This patch doesn't change any
logic.

The new vfio_pci_core_device structure will be the main structure of the
core driver and later on vfio_pci_device structure will be the main
structure of the generic vfio_pci driver.

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://lore.kernel.org/r/20210826103912.128972-4-yishaih@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-26 10:36:51 -06:00
Max Gurtovoy
9a38993869 vfio/pci: Rename vfio_pci_private.h to vfio_pci_core.h
This is a preparation patch for separating the vfio_pci driver to a
subsystem driver and a generic pci driver. This patch doesn't change any
logic.

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://lore.kernel.org/r/20210826103912.128972-3-yishaih@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-26 10:36:51 -06:00
Max Gurtovoy
1cbd70fe37 vfio/pci: Rename vfio_pci.c to vfio_pci_core.c
This is a preparation patch for separating the vfio_pci driver to a
subsystem driver and a generic pci driver. This patch doesn't change any
logic.

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://lore.kernel.org/r/20210826103912.128972-2-yishaih@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-26 10:36:51 -06:00
Anthony Yznaga
ffc95d1b8e vfio/type1: Fix vfio_find_dma_valid return
vfio_find_dma_valid is defined to return WAITED on success if it was
necessary to wait.  However, the loop forgets the WAITED value returned
by vfio_wait() and returns 0 in a later iteration.  Fix it.

Signed-off-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Reviewed-by: Steve Sistare <steven.sistare@oracle.com>
Link: https://lore.kernel.org/r/1629736550-2388-1-git-send-email-anthony.yznaga@oracle.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-24 12:09:50 -06:00
Cai Huoqing
29848a034a vfio-pci/zdev: Remove repeated verbose license text
The SPDX and verbose license text are redundant, however in this case
the verbose license indicates a GPL v2 only while SPDX specifies v2+.
Remove the verbose license and correct SPDX to the more restricted
version.

Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Link: https://lore.kernel.org/r/20210824003749.1039-1-caihuoqing@baidu.com
[aw: commit log]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-24 12:04:52 -06:00
Cai Huoqing
ab78130e6e vfio: platform: reset: Convert to SPDX identifier
use SPDX-License-Identifier instead of a verbose license text

Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Link: https://lore.kernel.org/r/20210822043643.2040-1-caihuoqing@baidu.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-24 11:59:10 -06:00
Jason Gunthorpe
eb24c1007e vfio: Remove struct vfio_device_ops open/release
Nothing uses this anymore, delete it.

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Link: https://lore.kernel.org/r/14-v4-9ea22c5e6afb+1adf-vfio_reflck_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-11 09:50:11 -06:00
Jason Gunthorpe
db44c17458 vfio/pci: Reorganize VFIO_DEVICE_PCI_HOT_RESET to use the device set
Like vfio_pci_dev_set_try_reset() this code wants to reset all of the
devices in the "reset group" which is the same membership as the device
set.

Instead of trying to reconstruct the device set from the PCI list go
directly from the device set's device list to execute the reset.

The same basic structure as vfio_pci_dev_set_try_reset() is used. The
'vfio_devices' struct is replaced with the device set linked list and we
simply sweep it multiple times under the lock.

This eliminates a memory allocation and get/put traffic and another
improperly locked test of pci_dev_driver().

Reviewed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Link: https://lore.kernel.org/r/10-v4-9ea22c5e6afb+1adf-vfio_reflck_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-11 09:50:11 -06:00
Jason Gunthorpe
a882c16a2b vfio/pci: Change vfio_pci_try_bus_reset() to use the dev_set
vfio_pci_try_bus_reset() is triggering a reset of the entire_dev set if
any device within it has accumulated a needs_reset. This reset can only be
done once all of the drivers operating the PCI devices to be reset are in
a known safe state.

Make this clearer by directly operating on the dev_set instead of the
vfio_pci_device. Rename the function to vfio_pci_dev_set_try_reset().

Use the device list inside the dev_set to check that all drivers are in a
safe state instead of working backwards from the pci_device.

The dev_set->lock directly prevents devices from joining/leaving the set,
or changing their state, which further implies the pci_device cannot
change drivers or that the vfio_device be freed, eliminating the need for
get/put's.

If a pci_device to be reset is not in the dev_set then the reset cannot be
used as we can't know what the state of that driver is. Directly measure
this by checking that every pci_device is in the dev_set - which
effectively proves that VFIO drivers are attached to everything.

Remove the odd interaction around vfio_pci_set_power_state() - have the
only caller avoid its redundant vfio_pci_set_power_state() instead of
avoiding it inside vfio_pci_dev_set_try_reset().

This restructuring corrects a call to pci_dev_driver() without holding the
device_lock() and removes a hard wiring to &vfio_pci_driver.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Link: https://lore.kernel.org/r/9-v4-9ea22c5e6afb+1adf-vfio_reflck_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-11 09:50:11 -06:00
Yishai Hadas
2cd8b14aaa vfio/pci: Move to the device set infrastructure
PCI wants to have the usual open/close_device() logic with the slight
twist that the open/close_device() must be done under a singelton lock
shared by all of the vfio_devices that are in the PCI "reset group".

The reset group, and thus the device set, is determined by what devices
pci_reset_bus() touches, which is either the entire bus or only the slot.

Rely on the core code to do everything reflck was doing and delete reflck
entirely.

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Link: https://lore.kernel.org/r/8-v4-9ea22c5e6afb+1adf-vfio_reflck_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-11 09:50:11 -06:00
Jason Gunthorpe
ab7e5e34a9 vfio/platform: Use open_device() instead of open coding a refcnt scheme
Platform simply wants to run some code when the device is first
opened/last closed. Use the core framework and locking for this.  Aside
from removing a bit of code this narrows the locking scope from a global
lock.

Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://lore.kernel.org/r/7-v4-9ea22c5e6afb+1adf-vfio_reflck_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-11 09:50:11 -06:00
Jason Gunthorpe
da119f387e vfio/fsl: Move to the device set infrastructure
FSL uses the internal reflck to implement the open_device() functionality,
conversion to the core code is straightforward.

The decision on which set to be part of is trivially based on the
is_fsl_mc_bus_dprc() and we use a 'struct device *' pointer as the set_id.

The dev_set lock is protecting the interrupts setup. The FSL MC devices
are using MSIs and only the DPRC device is allocating the MSIs from the
MSI domain. The other devices just take interrupts from a pool. The lock
is protecting the access to this pool.

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Tested-by: Diana Craciun OSS <diana.craciun@oss.nxp.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/6-v4-9ea22c5e6afb+1adf-vfio_reflck_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-11 09:50:11 -06:00
Jason Gunthorpe
2fd585f4ed vfio: Provide better generic support for open/release vfio_device_ops
Currently the driver ops have an open/release pair that is called once
each time a device FD is opened or closed. Add an additional set of
open/close_device() ops which are called when the device FD is opened for
the first time and closed for the last time.

An analysis shows that all of the drivers require this semantic. Some are
open coding it as part of their reflck implementation, and some are just
buggy and miss it completely.

To retain the current semantics PCI and FSL depend on, introduce the idea
of a "device set" which is a grouping of vfio_device's that share the same
lock around opening.

The device set is established by providing a 'set_id' pointer. All
vfio_device's that provide the same pointer will be joined to the same
singleton memory and lock across the whole set. This effectively replaces
the oddly named reflck.

After conversion the set_id will be sourced from:
 - A struct device from a fsl_mc_device (fsl)
 - A struct pci_slot (pci)
 - A struct pci_bus (pci)
 - The struct vfio_device (everything)

The design ensures that the above pointers are live as long as the
vfio_device is registered, so they form reliable unique keys to group
vfio_devices into sets.

This implementation uses xarray instead of searching through the driver
core structures, which simplifies the somewhat tricky locking in this
area.

Following patches convert all the drivers.

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/4-v4-9ea22c5e6afb+1adf-vfio_reflck_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-11 09:50:10 -06:00
Max Gurtovoy
ae03c3771b vfio: Introduce a vfio_uninit_group_dev() API call
This pairs with vfio_init_group_dev() and allows undoing any state that is
stored in the vfio_device unrelated to registration. Add appropriately
placed calls to all the drivers.

The following patch will use this to add pre-registration state for the
device set.

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/3-v4-9ea22c5e6afb+1adf-vfio_reflck_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-11 09:50:10 -06:00
Dave Airlie
9efba20291 Bus: Make remove callback return void tag
Tag for other trees/branches to pull from in order to have a stable
 place to build off of if they want to add new busses for 5.15.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYPkvsg8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+yl8pwCff+mL/zOaU8M1KwIHrEvNH6LgF58AoKgBAeCb
 58Lenmw44ofP5itzU5x3
 =J5mO
 -----END PGP SIGNATURE-----

Merge tag 'bus_remove_return_void-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core into drm-next

Bus: Make remove callback return void tag

Tag for other trees/branches to pull from in order to have a stable
place to build off of if they want to add new busses for 5.15.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>

[airlied: fixed up merge conflict in drm]
From:   Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://patchwork.freedesktop.org/patch/msgid/YPkwQwf0dUKnGA7L@kroah.com
2021-08-11 08:47:08 +10:00
Christoph Hellwig
3fb1712d85 vfio/mdev: don't warn if ->request is not set
Only a single driver actually sets the ->request method, so don't print
a scary warning if it isn't.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20210726143524.155779-3-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-02 15:23:43 -06:00
Christoph Hellwig
15a5896e61 vfio/mdev: turn mdev_init into a subsys_initcall
Without this setups with buіlt-in mdev and mdev-drivers fail to
register like this:

[1.903149] Driver 'intel_vgpu_mdev' was unable to register with bus_type 'mdev' because the bus was not initialized.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20210726143524.155779-2-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-02 15:23:43 -06:00
Yishai Hadas
e7500b3ede vfio/pci: Make vfio_pci_regops->rw() return ssize_t
The only implementation of this in IGD returns a -ERRNO which is
implicitly cast through a size_t and then casted again and returned as a
ssize_t in vfio_pci_rw().

Fix the vfio_pci_regops->rw() return type to be ssize_t so all is
consistent.

Fixes: 28541d41c9 ("vfio/pci: Add infrastructure for additional device specific regions")
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Link: https://lore.kernel.org/r/0-v3-5db12d1bf576+c910-vfio_rw_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-02 15:23:42 -06:00
Jason Gunthorpe
26c22cfde5 vfio: Use config not menuconfig for VFIO_NOIOMMU
VFIO_NOIOMMU is supposed to be an element in the VFIO menu, not start
a new menu. Correct this copy-paste mistake.

Fixes: 03a76b60f8 ("vfio: Include No-IOMMU mode")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Link: https://lore.kernel.org/r/0-v1-3f0b685c3679+478-vfio_menuconfig_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-02 15:23:42 -06:00
Dave Airlie
8da49a33dd drm-misc-next for v5.15-rc1:
UAPI Changes:
 - Remove sysfs stats for dma-buf attachments, as it causes a performance regression.
   Previous merge is not in a rc kernel yet, so no userspace regression possible.
 
 Cross-subsystem Changes:
 - Sanitize user input in kyro's viewport ioctl.
 - Use refcount_t in fb_info->count
 - Assorted fixes to dma-buf.
 - Extend x86 efifb handling to all archs.
 - Fix neofb divide by 0.
 - Document corpro,gm7123 bridge dt bindings.
 
 Core Changes:
 - Slightly rework drm master handling.
 - Cleanup vgaarb handling.
 - Assorted fixes.
 
 Driver Changes:
 - Add support for ws2401 panel.
 - Assorted fixes to stm, ast, bochs.
 - Demidlayer ingenic irq.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEuXvWqAysSYEJGuVH/lWMcqZwE8MFAmD5TGAACgkQ/lWMcqZw
 E8PNgxAApjTYQSfjIBbOZnNraxW6w7/bPea35E9A47EdBQsNGnYftNsFjbrn/mCJ
 D+0eRLjCMlg4FF1SHdh9cPJ35py+ygbDeupogboLITfU99eGBth3fM2Xdg9LPcBh
 dbni/JLG9R7gIvSlqdJuweN21trfVrV/9FQEilG5DvQcl27Wx5g8VMRZke1EqGKX
 7Id09Uq50ky18vhDjQRCveYhRqJAxV+XozBatzHyxpDVzjLQvRhlAAYdvrSMHZ5R
 jreGzOfR8awc6Om+w7wx3Jn1oEGmXVZB/VqxEqGtMOr3lpARPucxrqfHsqpam3rv
 yIoEKPrkG+k6fsU7Tbg59jNqe/PbCUW3AlpyuBxf55EbnVGgjLDbq4sRRMkehPfA
 fhC31ujOXQQnAgaxyeQAaAJFKNFJzA8Cq5ZPfG+zztzuomHCiUVQBRowP65hJMzR
 +ZlEDnhUD3STLz39zuO1reZR1ZoPIvKbsokHAA+ZrIwUd6U3D3ia8V51pq+lL5aS
 TGDkyMN9jyZ+SO8Z7+2FnJAv9FAOPU/WCLU/fWW46jAvuezwMIwVcjfSqDU2XbZD
 e7KgHpHhx3BGxI8TThHKlY7mf6IL2Bm7X1Cv1pdZs/eEn3Udh2ax942uTQZu/YOO
 0AT1XchpvYCBNRw05bVI3OlJ+w3I8uV+h+11jHOKeY6cbwdHeKE=
 =BUya
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-next-2021-07-22' of git://anongit.freedesktop.org/drm/drm-misc into drm-next

drm-misc-next for v5.15-rc1:

UAPI Changes:
- Remove sysfs stats for dma-buf attachments, as it causes a performance regression.
  Previous merge is not in a rc kernel yet, so no userspace regression possible.

Cross-subsystem Changes:
- Sanitize user input in kyro's viewport ioctl.
- Use refcount_t in fb_info->count
- Assorted fixes to dma-buf.
- Extend x86 efifb handling to all archs.
- Fix neofb divide by 0.
- Document corpro,gm7123 bridge dt bindings.

Core Changes:
- Slightly rework drm master handling.
- Cleanup vgaarb handling.
- Assorted fixes.

Driver Changes:
- Add support for ws2401 panel.
- Assorted fixes to stm, ast, bochs.
- Demidlayer ingenic irq.

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/2d0d2fe8-01fc-e216-c3fd-38db9e69944e@linux.intel.com
2021-07-23 11:32:43 +10:00
Uwe Kleine-König
fc7a6209d5 bus: Make remove callback return void
The driver core ignores the return value of this callback because there
is only little it can do when a device disappears.

This is the final bit of a long lasting cleanup quest where several
buses were converted to also return void from their remove callback.
Additionally some resource leaks were fixed that were caused by drivers
returning an error code in the expectation that the driver won't go
away.

With struct bus_type::remove returning void it's prevented that newly
implemented buses return an ignored error code and so don't anticipate
wrong expectations for driver authors.

Reviewed-by: Tom Rix <trix@redhat.com> (For fpga)
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com> (For drivers/s390 and drivers/vfio)
Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> (For ARM, Amba and related parts)
Acked-by: Mark Brown <broonie@kernel.org>
Acked-by: Chen-Yu Tsai <wens@csie.org> (for sunxi-rsb)
Acked-by: Pali Rohár <pali@kernel.org>
Acked-by: Mauro Carvalho Chehab <mchehab@kernel.org> (for media)
Acked-by: Hans de Goede <hdegoede@redhat.com> (For drivers/platform)
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Acked-By: Vinod Koul <vkoul@kernel.org>
Acked-by: Juergen Gross <jgross@suse.com> (For xen)
Acked-by: Lee Jones <lee.jones@linaro.org> (For mfd)
Acked-by: Johannes Thumshirn <jth@kernel.org> (For mcb)
Acked-by: Johan Hovold <johan@kernel.org>
Acked-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> (For slimbus)
Acked-by: Kirti Wankhede <kwankhede@nvidia.com> (For vfio)
Acked-by: Maximilian Luz <luzmaximilian@gmail.com>
Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> (For ulpi and typec)
Acked-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> (For ipack)
Acked-by: Geoff Levand <geoff@infradead.org> (For ps3)
Acked-by: Yehezkel Bernat <YehezkelShB@gmail.com> (For thunderbolt)
Acked-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> (For intel_th)
Acked-by: Dominik Brodowski <linux@dominikbrodowski.net> (For pcmcia)
Acked-by: Rafael J. Wysocki <rafael@kernel.org> (For ACPI)
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org> (rpmsg and apr)
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> (For intel-ish-hid)
Acked-by: Dan Williams <dan.j.williams@intel.com> (For CXL, DAX, and NVDIMM)
Acked-by: William Breathitt Gray <vilhelm.gray@gmail.com> (For isa)
Acked-by: Stefan Richter <stefanr@s5r6.in-berlin.de> (For firewire)
Acked-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> (For hid)
Acked-by: Thorsten Scherer <t.scherer@eckelmann.de> (For siox)
Acked-by: Sven Van Asbroeck <TheSven73@gmail.com> (For anybuss)
Acked-by: Ulf Hansson <ulf.hansson@linaro.org> (For MMC)
Acked-by: Wolfram Sang <wsa@kernel.org> # for I2C
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: Finn Thain <fthain@linux-m68k.org>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20210713193522.1770306-6-u.kleine-koenig@pengutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-21 11:53:42 +02:00
Christoph Hellwig
bf44e8cecc vgaarb: don't pass a cookie to vga_client_register
The VGA arbitration is entirely based on pci_dev structures, so just pass
that back to the set_vga_decode callback.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20210716061634.2446357-8-hch@lst.de
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
2021-07-21 10:29:10 +02:00
Christoph Hellwig
f6b1772b25 vgaarb: remove the unused irq_set_state argument to vga_client_register
All callers pass NULL as the irq_set_state argument, so remove it and
the ->irq_set_state member in struct vga_device.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20210716061634.2446357-7-hch@lst.de
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
2021-07-21 10:29:05 +02:00
Christoph Hellwig
b877947586 vgaarb: provide a vga_client_unregister wrapper
Add a trivial wrapper for the unregister case that sets all fields to
NULL.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20210716061634.2446357-6-hch@lst.de
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
2021-07-21 10:29:00 +02:00
Linus Torvalds
8e8d9442d1 VFIO update for v5.14-rc1
- Module reference fixes, structure renaming (Max Gurtovoy)
 
  - Export and use common pci_dev_trylock() (Luis Chamberlain)
 
  - Enable direct mdev device creation and probing by parent
    (Christoph Hellwig & Jason Gunthorpe)
 
  - Fix mdpy error path leak (Colin Ian King)
 
  - Fix mtty list entry leak (Jason Gunthorpe)
 
  - Enforce mtty device limit (Alex Williamson)
 
  - Resolve concurrent vfio-pci mmap faults (Alex Williamson)
 -----BEGIN PGP SIGNATURE-----
 
 iQJPBAABCAA5FiEEQvbATlQL0amee4qQI5ubbjuwiyIFAmDfaAYbHGFsZXgud2ls
 bGlhbXNvbkByZWRoYXQuY29tAAoJECObm247sIsi80YQAI4g3K53zXufWuNEHkDf
 ufduNa4dRtd9kp/v9Wz62iuLm0V1bGfuEjIMXHRl20nl4FOeZPuCR+yzlFLBXKaF
 5zZWW/yNyH4Ebs1aBa2zoNbsVhRoIlQzTQOMEtWZ29D4prCldnzshgVPB7QrEjWU
 0t7kosly9kl6snUrTREiSsnVbCe7IqtGv2wz+HS6LlmiezKQH1RA7yvH5glpPktG
 KWk5owONhEl0szM0UUvR9F79pUvMIRIk5Ym6gHMjZjEVgZTVx5Jybs6rSF6s04PE
 541tOBl/WNzanJIMNo8rlarWoCMiKchLetSVk9L+WHFRFd4DlcyasdapjyjFvJH2
 qh1d3gOV+Cn8tsnhXmTCHu89mFZ/M+qScSOw5ymWOavl/CDe+WG3v0xvtyP1ZWeq
 LFFtAeWwqJVp31degk9dugOg8C8W5hzzIqwWo4fH20A3o2IwGPZ6Ftn88jqMRfmN
 5eRAushF/VyjJYp/DrhjbRbVL95c8GIYxEEqWjjEuwxmNny+saCaqiDggVp2DF2S
 U4RBK92wgFqBIjQWHxQLxB0LwtC4RdREutAu/hT5FWTBgtfmv/yy6uPEtjaUTtNP
 FsbgKDbD5RloYToM8u/URBrILyvPueA2Y4ds6uAGtxomytLwXJbXa4yCVA505rzy
 odCiThZ/pWvblNuhJocJ9CoE
 =f81k
 -----END PGP SIGNATURE-----

Merge tag 'vfio-v5.14-rc1' of git://github.com/awilliam/linux-vfio

Pull VFIO updates from Alex Williamson:

 - Module reference fixes, structure renaming (Max Gurtovoy)

 - Export and use common pci_dev_trylock() (Luis Chamberlain)

 - Enable direct mdev device creation and probing by parent (Christoph
   Hellwig & Jason Gunthorpe)

 - Fix mdpy error path leak (Colin Ian King)

 - Fix mtty list entry leak (Jason Gunthorpe)

 - Enforce mtty device limit (Alex Williamson)

 - Resolve concurrent vfio-pci mmap faults (Alex Williamson)

* tag 'vfio-v5.14-rc1' of git://github.com/awilliam/linux-vfio:
  vfio/pci: Handle concurrent vma faults
  vfio/mtty: Enforce available_instances
  vfio/mtty: Delete mdev_devices_list
  vfio: use the new pci_dev_trylock() helper to simplify try lock
  PCI: Export pci_dev_trylock() and pci_dev_unlock()
  vfio/mdpy: Fix memory leak of object mdev_state->vconfig
  vfio/iommu_type1: rename vfio_group struck to vfio_iommu_group
  vfio/mbochs: Convert to use vfio_register_group_dev()
  vfio/mdpy: Convert to use vfio_register_group_dev()
  vfio/mtty: Convert to use vfio_register_group_dev()
  vfio/mdev: Allow the mdev_parent_ops to specify the device driver to bind
  vfio/mdev: Remove CONFIG_VFIO_MDEV_DEVICE
  driver core: Export device_driver_attach()
  driver core: Don't return EPROBE_DEFER to userspace during sysfs bind
  driver core: Flow the return code from ->probe() through to sysfs bind
  driver core: Better distinguish probe errors in really_probe
  driver core: Pull required checks into driver_probe_device()
  vfio/platform: remove unneeded parent_module attribute
  vfio: centralize module refcount in subsystem layer
2021-07-03 11:49:33 -07:00
Alex Williamson
6a45ece4c9 vfio/pci: Handle concurrent vma faults
io_remap_pfn_range() will trigger a BUG_ON if it encounters a
populated pte within the mapping range.  This can occur because we map
the entire vma on fault and multiple faults can be blocked behind the
vma_lock.  This leads to traces like the one reported below.

We can use our vma_list to test whether a given vma is mapped to avoid
this issue.

[ 1591.733256] kernel BUG at mm/memory.c:2177!
[ 1591.739515] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[ 1591.747381] Modules linked in: vfio_iommu_type1 vfio_pci vfio_virqfd vfio pv680_mii(O)
[ 1591.760536] CPU: 2 PID: 227 Comm: lcore-worker-2 Tainted: G O 5.11.0-rc3+ #1
[ 1591.770735] Hardware name:  , BIOS HixxxxFPGA 1P B600 V121-1
[ 1591.778872] pstate: 40400009 (nZcv daif +PAN -UAO -TCO BTYPE=--)
[ 1591.786134] pc : remap_pfn_range+0x214/0x340
[ 1591.793564] lr : remap_pfn_range+0x1b8/0x340
[ 1591.799117] sp : ffff80001068bbd0
[ 1591.803476] x29: ffff80001068bbd0 x28: 0000042eff6f0000
[ 1591.810404] x27: 0000001100910000 x26: 0000001300910000
[ 1591.817457] x25: 0068000000000fd3 x24: ffffa92f1338e358
[ 1591.825144] x23: 0000001140000000 x22: 0000000000000041
[ 1591.832506] x21: 0000001300910000 x20: ffffa92f141a4000
[ 1591.839520] x19: 0000001100a00000 x18: 0000000000000000
[ 1591.846108] x17: 0000000000000000 x16: ffffa92f11844540
[ 1591.853570] x15: 0000000000000000 x14: 0000000000000000
[ 1591.860768] x13: fffffc0000000000 x12: 0000000000000880
[ 1591.868053] x11: ffff0821bf3d01d0 x10: ffff5ef2abd89000
[ 1591.875932] x9 : ffffa92f12ab0064 x8 : ffffa92f136471c0
[ 1591.883208] x7 : 0000001140910000 x6 : 0000000200000000
[ 1591.890177] x5 : 0000000000000001 x4 : 0000000000000001
[ 1591.896656] x3 : 0000000000000000 x2 : 0168044000000fd3
[ 1591.903215] x1 : ffff082126261880 x0 : fffffc2084989868
[ 1591.910234] Call trace:
[ 1591.914837]  remap_pfn_range+0x214/0x340
[ 1591.921765]  vfio_pci_mmap_fault+0xac/0x130 [vfio_pci]
[ 1591.931200]  __do_fault+0x44/0x12c
[ 1591.937031]  handle_mm_fault+0xcc8/0x1230
[ 1591.942475]  do_page_fault+0x16c/0x484
[ 1591.948635]  do_translation_fault+0xbc/0xd8
[ 1591.954171]  do_mem_abort+0x4c/0xc0
[ 1591.960316]  el0_da+0x40/0x80
[ 1591.965585]  el0_sync_handler+0x168/0x1b0
[ 1591.971608]  el0_sync+0x174/0x180
[ 1591.978312] Code: eb1b027f 540000c0 f9400022 b4fffe02 (d4210000)

Fixes: 11c4cd07ba ("vfio-pci: Fault mmaps to enable vma tracking")
Reported-by: Zeng Tao <prime.zeng@hisilicon.com>
Suggested-by: Zeng Tao <prime.zeng@hisilicon.com>
Link: https://lore.kernel.org/r/162497742783.3883260.3282953006487785034.stgit@omen
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-06-30 13:55:53 -06:00
Liam Howlett
85715d6809 vfio: use vma_lookup() instead of find_vma_intersection()
vma_lookup() finds the vma of a specific address with a cleaner interface
and is more readable.

Link: https://lkml.kernel.org/r/20210521174745.2219620-12-Liam.Howlett@Oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29 10:53:51 -07:00
Luis Chamberlain
742b4c0d1e vfio: use the new pci_dev_trylock() helper to simplify try lock
Use the new pci_dev_trylock() helper to simplify our locking.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20210623022824.308041-3-mcgrof@kernel.org
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-06-24 13:32:31 -06:00
Max Gurtovoy
c7396f2eac vfio/iommu_type1: rename vfio_group struck to vfio_iommu_group
The vfio_group structure is already defined in vfio module so in order
to improve code readability and for simplicity, rename the vfio_group
structure in vfio_iommu_type1 module to vfio_iommu_group.

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Link: https://lore.kernel.org/r/20210608112841.51897-1-mgurtovoy@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-06-21 15:49:10 -06:00
Alex Williamson
bc01b7617d Merge branch 'hch-mdev-direct-v4' into v5.14/vfio/next 2021-06-21 15:29:51 -06:00
Jason Gunthorpe
88a21f265c vfio/mdev: Allow the mdev_parent_ops to specify the device driver to bind
This allows a mdev driver to opt out of using vfio_mdev.c, instead the
driver will provide a 'struct mdev_driver' and register directly with the
driver core.

Much of mdev_parent_ops becomes unused in this mode:
- create()/remove() are done via the mdev_driver probe()/remove()
- mdev_attr_groups becomes mdev_driver driver.dev_groups
- Wrapper function callbacks are replaced with the same ones from
  struct vfio_device_ops

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
Link: https://lore.kernel.org/r/20210617142218.1877096-8-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-06-21 15:29:25 -06:00
Jason Gunthorpe
af3ab3f9b9 vfio/mdev: Remove CONFIG_VFIO_MDEV_DEVICE
For some reason the vfio_mdev shim mdev_driver has its own module and
kconfig. As the next patch requires access to it from mdev.ko merge the
two modules together and remove VFIO_MDEV_DEVICE.

A later patch deletes this driver entirely.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
Link: https://lore.kernel.org/r/20210617142218.1877096-7-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-06-21 15:29:25 -06:00
Max Gurtovoy
3b62a62429 vfio/platform: remove unneeded parent_module attribute
The vfio core driver is now taking refcount on the provider drivers,
remove redundant parent_module attribute from vfio_platform_device
structure.

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Acked-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20210518192133.59195-3-mgurtovoy@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-06-15 14:12:15 -06:00
Max Gurtovoy
9dcf01d957 vfio: centralize module refcount in subsystem layer
Remove code duplication and move module refcounting to the subsystem
module.

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20210518192133.59195-2-mgurtovoy@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-06-15 14:12:15 -06:00
Max Gurtovoy
dc51ff91cf vfio/platform: fix module_put call in error flow
The ->parent_module is the one that use in try_module_get. It should
also be the one the we use in module_put during vfio_platform_open().

Fixes: 32a2d71c4e ("vfio: platform: introduce vfio-platform-base module")
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Message-Id: <20210518192133.59195-1-mgurtovoy@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-05-24 13:40:13 -06:00
Gustavo A. R. Silva
78b238147e vfio/iommu_type1: Use struct_size() for kzalloc()
Make use of the struct_size() helper instead of an open-coded version,
in order to avoid any potential type mistakes or integer overflows
that, in the worst scenario, could lead to heap overflows.

This code was detected with the help of Coccinelle and, audited and
fixed manually.

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Message-Id: <20210513230155.GA217517@embeddedor>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-05-24 13:40:13 -06:00
Randy Dunlap
2a55ca3735 vfio/pci: zap_vma_ptes() needs MMU
zap_vma_ptes() is only available when CONFIG_MMU is set/enabled.
Without CONFIG_MMU, vfio_pci.o has build errors, so make
VFIO_PCI depend on MMU.

riscv64-linux-ld: drivers/vfio/pci/vfio_pci.o: in function `vfio_pci_mmap_open':
vfio_pci.c:(.text+0x1ec): undefined reference to `zap_vma_ptes'
riscv64-linux-ld: drivers/vfio/pci/vfio_pci.o: in function `.L0 ':
vfio_pci.c:(.text+0x165c): undefined reference to `zap_vma_ptes'

Fixes: 11c4cd07ba ("vfio-pci: Fault mmaps to enable vma tracking")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: kvm@vger.kernel.org
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Eric Auger <eric.auger@redhat.com>
Message-Id: <20210515190856.2130-1-rdunlap@infradead.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-05-24 13:40:13 -06:00
Zhen Lei
d1ce2c7915 vfio/pci: Fix error return code in vfio_ecap_init()
The error code returned from vfio_ext_cap_len() is stored in 'len', not
in 'ret'.

Fixes: 89e1f7d4c6 ("vfio: Add PCI device driver")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Message-Id: <20210515020458.6771-1-thunder.leizhen@huawei.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-05-24 12:14:18 -06:00
Linus Torvalds
4f9701057a IOMMU Updates for Linux v5.13
Including:
 
 	- Big cleanup of almost unsused parts of the IOMMU API by
 	  Christoph Hellwig. This mostly affects the Freescale PAMU
 	  driver.
 
 	- New IOMMU driver for Unisoc SOCs
 
 	- ARM SMMU Updates from Will:
 
 	  - SMMUv3: Drop vestigial PREFETCH_ADDR support
 	  - SMMUv3: Elide TLB sync logic for empty gather
 	  - SMMUv3: Fix "Service Failure Mode" handling
     	  - SMMUv2: New Qualcomm compatible string
 
 	- Removal of the AMD IOMMU performance counter writeable check
 	  on AMD. It caused long boot delays on some machines and is
 	  only needed to work around an errata on some older (possibly
 	  pre-production) chips. If someone is still hit by this
 	  hardware issue anyway the performance counters will just
 	  return 0.
 
 	- Support for targeted invalidations in the AMD IOMMU driver.
 	  Before that the driver only invalidated a single 4k page or the
 	  whole IO/TLB for an address space. This has been extended now
 	  and is mostly useful for emulated AMD IOMMUs.
 
 	- Several fixes for the Shared Virtual Memory support in the
 	  Intel VT-d driver
 
 	- Mediatek drivers can now be built as modules
 
 	- Re-introduction of the forcedac boot option which got lost
 	  when converting the Intel VT-d driver to the common dma-iommu
 	  implementation.
 
 	- Extension of the IOMMU device registration interface and
 	  support iommu_ops to be const again when drivers are built as
 	  modules.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmCMEIoACgkQK/BELZcB
 GuOu9xAAvg6aR0uHlxvRq6cgNnHN9Ltp5+t3qFYtRRrauY0iOPMO62k0QQli5shX
 CGeczD0e59KAZqI0zNJnQn8hMY5dg7XVkFCC5BrSzuCDCtwJZ0N5Tq3pfUlaV1rw
 BJf41t79Fd+jp7kn53tu+vRAfYZ3+sLOx/6U3c15pqKRZSkyFWbQllOtD3J5LnLu
 1PyPlfiNpMwCajiS7aQbN+fuJ/lKIFeA2MDPOsCBzhbfxiJUqJxZOKAZO3rOjFfK
 feTibqQ+3Zz6MPXt9st1cvPpy8jCosv81OY6Knqvxf/oB5q+fEdi2uNrKISonb/t
 Fw331oOIwg2A+HOpwC9MN1AumOIqiHSWWENAMk9SlP+TMIWKQ8kZreyI6IEB23dV
 +QvP3DVA+CfLwtNY/Zh0IqKh28D+IHlKbpWNU1m+9AUe468mV/MTjfwxr9Yfffhm
 LZ6C0DgFdmtqv8jPuDGUOgo3RNeN8bLnUSEHG9gHibA+RKujl5BWDjKkwILqMQTt
 Ysdsu8TiNtFIULomizqCpgqEbQfW8TLFvASXCM1VMQ/PDURxvchZPxFDJonYXy+K
 z2HGaG3eUE07YrAdRKH69aMVIbmS+sjEhvmi4xZ1Lh7wWcIE2AZVvO8qNb+Ckcp3
 4tLPPDksm/iQngnFf6gdgH3qv4rgbzE4+74GXqeANiQCjY9dSJI=
 =qF2C
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu updates from Joerg Roedel:

 - Big cleanup of almost unsused parts of the IOMMU API by Christoph
   Hellwig. This mostly affects the Freescale PAMU driver.

 - New IOMMU driver for Unisoc SOCs

 - ARM SMMU Updates from Will:
     - Drop vestigial PREFETCH_ADDR support (SMMUv3)
     - Elide TLB sync logic for empty gather (SMMUv3)
     - Fix "Service Failure Mode" handling (SMMUv3)
     - New Qualcomm compatible string (SMMUv2)

 - Removal of the AMD IOMMU performance counter writeable check on AMD.
   It caused long boot delays on some machines and is only needed to
   work around an errata on some older (possibly pre-production) chips.
   If someone is still hit by this hardware issue anyway the performance
   counters will just return 0.

 - Support for targeted invalidations in the AMD IOMMU driver. Before
   that the driver only invalidated a single 4k page or the whole IO/TLB
   for an address space. This has been extended now and is mostly useful
   for emulated AMD IOMMUs.

 - Several fixes for the Shared Virtual Memory support in the Intel VT-d
   driver

 - Mediatek drivers can now be built as modules

 - Re-introduction of the forcedac boot option which got lost when
   converting the Intel VT-d driver to the common dma-iommu
   implementation.

 - Extension of the IOMMU device registration interface and support
   iommu_ops to be const again when drivers are built as modules.

* tag 'iommu-updates-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (84 commits)
  iommu: Streamline registration interface
  iommu: Statically set module owner
  iommu/mediatek-v1: Add error handle for mtk_iommu_probe
  iommu/mediatek-v1: Avoid build fail when build as module
  iommu/mediatek: Always enable the clk on resume
  iommu/fsl-pamu: Fix uninitialized variable warning
  iommu/vt-d: Force to flush iotlb before creating superpage
  iommu/amd: Put newline after closing bracket in warning
  iommu/vt-d: Fix an error handling path in 'intel_prepare_irq_remapping()'
  iommu/vt-d: Fix build error of pasid_enable_wpe() with !X86
  iommu/amd: Remove performance counter pre-initialization test
  Revert "iommu/amd: Fix performance counter initialization"
  iommu/amd: Remove duplicate check of devid
  iommu/exynos: Remove unneeded local variable initialization
  iommu/amd: Page-specific invalidations for more than one page
  iommu/arm-smmu-v3: Remove the unused fields for PREFETCH_CONFIG command
  iommu/vt-d: Avoid unnecessary cache flush in pasid entry teardown
  iommu/vt-d: Invalidate PASID cache when root/context entry changed
  iommu/vt-d: Remove WO permissions on second-level paging entries
  iommu/vt-d: Report the right page fault address
  ...
2021-05-01 09:33:00 -07:00
Linus Torvalds
238da4d004 VFIO updates for v5.13-rc1
- Embed struct vfio_device into vfio driver structures (Jason Gunthorpe)
 
  - Make vfio_mdev type safe (Jason Gunthorpe)
 
  - Remove vfio-pci NVLink2 extensions for POWER9 (Christoph Hellwig)
 
  - Update vfio-pci IGD extensions for OpRegion 2.1+ (Fred Gao)
 
  - Various spelling/blank line fixes (Zhen Lei, Zhou Wang, Bhaskar Chowdhury)
 
  - Simplify unpin_pages error handling (Shenming Lu)
 
  - Fix i915 mdev Kconfig dependency (Arnd Bergmann)
 
  - Remove unused structure member (Keqian Zhu)
 -----BEGIN PGP SIGNATURE-----
 
 iQJPBAABCAA5FiEEQvbATlQL0amee4qQI5ubbjuwiyIFAmCJsEsbHGFsZXgud2ls
 bGlhbXNvbkByZWRoYXQuY29tAAoJECObm247sIsi4HEP/icVw+sKYvfYkN3D5+7+
 J+IUtgdXSFaSmc9S8lj9gED68t6t5BhPMtdfSe9Tfvn/btyaYZ6uNwZGNqkaRCz8
 97jLkDmTPzL2F8uiMT4OI/VY6mzcK5yKA0mYNcO+nXQyUNtpDDCCbTD9OfIu62i/
 nVAu6u/Sj89zj66opZLTz1sxlYu/IQQ6olDlmgRAZ0JfUmpZLuNrSJbdv1nJZN0i
 uSAfNthSlgfMK/6hf9oW22GFjiJmnMVPnPd2wMceEq1N2F3Co9mNEhsMwqSZH3ra
 UQVJKRLA1NMhlee33Pkbsilmwk3lyTXI4GYw61y+TDi/CuOvNkym1BIW6WCQlQRi
 iOuMnaas3UOwi45pcqDNl9DRR+mfJc9I11auyAbznuw6omeZVxYHkWi94HRbJNzf
 zDJN94fcnafomHXe0NYJ33eSUrEeN7heNd5wvMkqQJEfHRpqUQf/xOGE9xBZXVAO
 Vl02C+qQt/fM0tds09lUOvMRjAYBRSaF2xcq6SEQYMXmfx9hcOrEUJUEWegy0NGN
 lq+Oa9mMxwHllDm9Wtn7vMx4KPqDm6dq3BwB45+S7bdQiDj0RiCohRLK9GC7YMzT
 K6dlKdYmed9dOnIzCw8R9PI/DDHRyG/mXgzFERkB5DTt1YVssYLTZnGL2lePwM9J
 BSqG2mMHzyYaHdaaVAWHH/vQ
 =y3PY
 -----END PGP SIGNATURE-----

Merge tag 'vfio-v5.13-rc1' of git://github.com/awilliam/linux-vfio

Pull VFIO updates from Alex Williamson:

 - Embed struct vfio_device into vfio driver structures (Jason
   Gunthorpe)

 - Make vfio_mdev type safe (Jason Gunthorpe)

 - Remove vfio-pci NVLink2 extensions for POWER9 (Christoph Hellwig)

 - Update vfio-pci IGD extensions for OpRegion 2.1+ (Fred Gao)

 - Various spelling/blank line fixes (Zhen Lei, Zhou Wang, Bhaskar
   Chowdhury)

 - Simplify unpin_pages error handling (Shenming Lu)

 - Fix i915 mdev Kconfig dependency (Arnd Bergmann)

 - Remove unused structure member (Keqian Zhu)

* tag 'vfio-v5.13-rc1' of git://github.com/awilliam/linux-vfio: (43 commits)
  vfio/gvt: fix DRM_I915_GVT dependency on VFIO_MDEV
  vfio/iommu_type1: Remove unused pinned_page_dirty_scope in vfio_iommu
  vfio/mdev: Correct the function signatures for the mdev_type_attributes
  vfio/mdev: Remove kobj from mdev_parent_ops->create()
  vfio/gvt: Use mdev_get_type_group_id()
  vfio/gvt: Make DRM_I915_GVT depend on VFIO_MDEV
  vfio/mbochs: Use mdev_get_type_group_id()
  vfio/mdpy: Use mdev_get_type_group_id()
  vfio/mtty: Use mdev_get_type_group_id()
  vfio/mdev: Add mdev/mtype_get_type_group_id()
  vfio/mdev: Remove duplicate storage of parent in mdev_device
  vfio/mdev: Add missing error handling to dev_set_name()
  vfio/mdev: Reorganize mdev_device_create()
  vfio/mdev: Add missing reference counting to mdev_type
  vfio/mdev: Expose mdev_get/put_parent to mdev_private.h
  vfio/mdev: Use struct mdev_type in struct mdev_device
  vfio/mdev: Simplify driver registration
  vfio/mdev: Add missing typesafety around mdev_device
  vfio/mdev: Do not allow a mdev_type to have a NULL parent pointer
  vfio/mdev: Fix missing static's on MDEV_TYPE_ATTR's
  ...
2021-04-28 17:19:47 -07:00
Joerg Roedel
49d11527e5 Merge branches 'iommu/fixes', 'arm/mediatek', 'arm/smmu', 'arm/exynos', 'unisoc', 'x86/vt-d', 'x86/amd' and 'core' into next 2021-04-16 17:16:03 +02:00
Keqian Zhu
43dcf6ccf8 vfio/iommu_type1: Remove unused pinned_page_dirty_scope in vfio_iommu
pinned_page_dirty_scope is optimized out by commit 010321565a
("vfio/iommu_type1: Mantain a counter for non_pinned_groups"),
but appears again due to some issues during merging branches.
We can safely remove it here.

Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
Message-Id: <20210412024415.30676-1-zhukeqian1@huawei.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-14 12:01:11 -06:00
Christian A. Ehrhardt
909290786e vfio/pci: Add missing range check in vfio_pci_mmap
When mmaping an extra device region verify that the region index
derived from the mmap offset is valid.

Fixes: a15b1883fe ("vfio_pci: Allow mapping extra regions")
Cc: stable@vger.kernel.org
Signed-off-by: Christian A. Ehrhardt <lk@c--e.de>
Message-Id: <20210412214124.GA241759@lisa.in-ulm.de>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-13 08:29:16 -06:00
Jason Gunthorpe
9169cff168 vfio/mdev: Correct the function signatures for the mdev_type_attributes
The driver core standard is to pass in the properly typed object, the
properly typed attribute and the buffer data. It stems from the root
kobject method:

  ssize_t (*show)(struct kobject *kobj, struct kobj_attribute *attr,..)

Each subclass of kobject should provide their own function with the same
signature but more specific types, eg struct device uses:

  ssize_t (*show)(struct device *dev, struct device_attribute *attr,..)

In this case the existing signature is:

  ssize_t (*show)(struct kobject *kobj, struct device *dev,..)

Where kobj is a 'struct mdev_type *' and dev is 'mdev_type->parent->dev'.

Change the mdev_type related sysfs attribute functions to:

  ssize_t (*show)(struct mdev_type *mtype, struct mdev_type_attribute *attr,..)

In order to restore type safety and match the driver core standard

There are no current users of 'attr', but if it is ever needed it would be
hard to add in retroactively, so do it now.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Message-Id: <18-v2-d36939638fc6+d54-vfio2_jgg@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-12 10:36:00 -06:00
Jason Gunthorpe
c2ef2f50ad vfio/mdev: Remove kobj from mdev_parent_ops->create()
The kobj here is a type-erased version of mdev_type, which is already
stored in the struct mdev_device being passed in. It was only ever used to
compute the type_group_id, which is now extracted directly from the mdev.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Message-Id: <17-v2-d36939638fc6+d54-vfio2_jgg@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-12 10:36:00 -06:00
Jason Gunthorpe
15fcc44be0 vfio/mdev: Add mdev/mtype_get_type_group_id()
This returns the index in the supported_type_groups array that is
associated with the mdev_type attached to the struct mdev_device or its
containing struct kobject.

Each mdev_device can be spawned from exactly one mdev_type, which in turn
originates from exactly one supported_type_group.

Drivers are using weird string calculations to try and get back to this
index, providing a direct access to the index removes a bunch of wonky
driver code.

mdev_type->group can be deleted as the group is obtained using the
type_group_id.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Message-Id: <11-v2-d36939638fc6+d54-vfio2_jgg@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-07 15:39:18 -06:00
Jason Gunthorpe
fbea432390 vfio/mdev: Remove duplicate storage of parent in mdev_device
mdev_device->type->parent is the same thing.

The struct mdev_device was relying on the kref on the mdev_parent to also
indirectly hold a kref on the mdev_type pointer. Now that the type holds a
kref on the parent we can directly kref the mdev_type and remove this
implicit relationship.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Message-Id: <10-v2-d36939638fc6+d54-vfio2_jgg@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-07 15:39:18 -06:00
Jason Gunthorpe
18d731242d vfio/mdev: Add missing error handling to dev_set_name()
This can fail, and seems to be a popular target for syzkaller error
injection. Check the error return and unwind with put_device().

Fixes: 7b96953bc6 ("vfio: Mediated device Core driver")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Message-Id: <9-v2-d36939638fc6+d54-vfio2_jgg@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-07 15:39:18 -06:00
Jason Gunthorpe
fbd0e2b0c3 vfio/mdev: Reorganize mdev_device_create()
Once the memory for the struct mdev_device is allocated it should
immediately be device_initialize()'d and filled in so that put_device()
can always be used to undo the allocation.

Place the mdev_get/put_parent() so that they are clearly protecting the
mdev->parent pointer. Move the final put to the release function so that
the lifetime rules are trivial to understand. Update the goto labels to
follow the normal convention.

Remove mdev_device_free() as the release function via device_put() is now
usable in all cases.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Message-Id: <8-v2-d36939638fc6+d54-vfio2_jgg@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-07 15:39:18 -06:00
Jason Gunthorpe
9a302449a5 vfio/mdev: Add missing reference counting to mdev_type
struct mdev_type holds a pointer to the kref'd object struct mdev_parent,
but doesn't hold the kref. The lifetime of the parent becomes implicit
because parent_remove_sysfs_files() is supposed to remove all the access
before the parent can be freed, but this is very hard to reason about.

Make it obviously correct by adding the missing get.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Message-Id: <7-v2-d36939638fc6+d54-vfio2_jgg@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-07 15:39:17 -06:00
Jason Gunthorpe
a9f8111d0b vfio/mdev: Expose mdev_get/put_parent to mdev_private.h
The next patch will use these in mdev_sysfs.c

While here remove the now dead code checks for NULL, a mdev_type can never
have a NULL parent.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Message-Id: <6-v2-d36939638fc6+d54-vfio2_jgg@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-07 15:39:17 -06:00
Jason Gunthorpe
417fd5bf24 vfio/mdev: Use struct mdev_type in struct mdev_device
The kobj pointer in mdev_device is actually pointing at a struct
mdev_type. Use the proper type so things are understandable.

There are a number of places that are confused and passing both the mdev
and the mtype as function arguments, fix these to derive the mtype
directly from the mdev to remove the redundancy.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Message-Id: <5-v2-d36939638fc6+d54-vfio2_jgg@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-07 15:39:17 -06:00
Jason Gunthorpe
91b9969d9c vfio/mdev: Simplify driver registration
This is only done once, we don't need to generate code to initialize a
structure stored in the ELF .data segment. Fill in the three required
.driver members directly instead of copying data into them during
mdev_register_driver().

Further the to_mdev_driver() function doesn't belong in a public header,
just inline it into the two places that need it. Finally, we can now
clearly see that 'drv' derived from dev->driver cannot be NULL, firstly
because the driver core forbids it, and secondly because NULL won't pass
through the container_of(). Remove the dead code.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Message-Id: <4-v2-d36939638fc6+d54-vfio2_jgg@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-07 15:39:16 -06:00
Jason Gunthorpe
2a3d15f270 vfio/mdev: Add missing typesafety around mdev_device
The mdev API should accept and pass a 'struct mdev_device *' in all
places, not pass a 'struct device *' and cast it internally with
to_mdev_device(). Particularly in its struct mdev_driver functions, the
whole point of a bus's struct device_driver wrapper is to provide type
safety compared to the default struct device_driver.

Further, the driver core standard is for bus drivers to expose their
device structure in their public headers that can be used with
container_of() inlines and '&foo->dev' to go between the class levels, and
'&foo->dev' to be used with dev_err/etc driver core helper functions. Move
'struct mdev_device' to mdev.h

Once done this allows moving some one instruction exported functions to
static inlines, which in turns allows removing one of the two grotesque
symbol_get()'s related to mdev in the core code.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Message-Id: <3-v2-d36939638fc6+d54-vfio2_jgg@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-07 15:39:16 -06:00
Jason Gunthorpe
b5a1f8921d vfio/mdev: Do not allow a mdev_type to have a NULL parent pointer
There is a small race where the parent is NULL even though the kobj has
already been made visible in sysfs.

For instance the attribute_group is made visible in sysfs_create_files()
and the mdev_type_attr_show() does:

    ret = attr->show(kobj, type->parent->dev, buf);

Which will crash on NULL parent. Move the parent setup to before the type
pointer leaves the stack frame.

Fixes: 7b96953bc6 ("vfio: Mediated device Core driver")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Message-Id: <2-v2-d36939638fc6+d54-vfio2_jgg@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-07 15:39:16 -06:00
Christoph Hellwig
7e14754778 iommu: remove DOMAIN_ATTR_NESTING
Use an explicit enable_nesting method instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Li Yang <leoyang.li@nxp.com>
Link: https://lore.kernel.org/r/20210401155256.298656-17-hch@lst.de
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-04-07 10:56:53 +02:00
Christoph Hellwig
bc9a05eef1 iommu: remove DOMAIN_ATTR_GEOMETRY
The geometry information can be trivially queried from the iommu_domain
struture.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Li Yang <leoyang.li@nxp.com>
Link: https://lore.kernel.org/r/20210401155256.298656-16-hch@lst.de
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-04-07 10:56:53 +02:00
Alex Williamson
6a2a235aa6 Merge branches 'v5.13/vfio/embed-vfio_device', 'v5.13/vfio/misc' and 'v5.13/vfio/nvlink' into v5.13/vfio/next
Spelling fixes merged with file deletion.

Conflicts:
	drivers/vfio/pci/vfio_pci_nvlink2.c

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06 12:01:51 -06:00
Jason Gunthorpe
1e04ec1420 vfio: Remove device_data from the vfio bus driver API
There are no longer any users, so it can go away. Everything is using
container_of now.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Message-Id: <14-v3-225de1400dfc+4e074-vfio1_jgg@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06 11:55:11 -06:00
Jason Gunthorpe
07d47b4222 vfio/pci: Replace uses of vfio_device_data() with container_of
This tidies a few confused places that think they can have a refcount on
the vfio_device but the device_data could be NULL, that isn't possible by
design.

Most of the change falls out when struct vfio_devices is updated to just
store the struct vfio_pci_device itself. This wasn't possible before
because there was no easy way to get from the 'struct vfio_pci_device' to
the 'struct vfio_device' to put back the refcount.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Message-Id: <13-v3-225de1400dfc+4e074-vfio1_jgg@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06 11:55:11 -06:00
Jason Gunthorpe
6df62c5b05 vfio: Make vfio_device_ops pass a 'struct vfio_device *' instead of 'void *'
This is the standard kernel pattern, the ops associated with a struct get
the struct pointer in for typesafety. The expected design is to use
container_of to cleanly go from the subsystem level type to the driver
level type without having any type erasure in a void *.

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Message-Id: <12-v3-225de1400dfc+4e074-vfio1_jgg@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06 11:55:11 -06:00
Jason Gunthorpe
66873b5fa7 vfio/mdev: Make to_mdev_device() into a static inline
The macro wrongly uses 'dev' as both the macro argument and the member
name, which means it fails compilation if any caller uses a word other
than 'dev' as the single argument. Fix this defect by making it into
proper static inline, which is more clear and typesafe anyhow.

Fixes: 99e3123e3d ("vfio-mdev: Make mdev_device private and abstract interfaces")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Message-Id: <11-v3-225de1400dfc+4e074-vfio1_jgg@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06 11:55:11 -06:00
Jason Gunthorpe
1ae1b20f6f vfio/mdev: Use vfio_init/register/unregister_group_dev
mdev gets little benefit because it doesn't actually do anything, however
it is the last user, so move the vfio_init/register/unregister_group_dev()
code here for now.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Liu Yi L <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Message-Id: <10-v3-225de1400dfc+4e074-vfio1_jgg@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06 11:55:11 -06:00
Jason Gunthorpe
6b018e203d vfio/pci: Use vfio_init/register/unregister_group_dev
pci already allocates a struct vfio_pci_device with exactly the same
lifetime as vfio_device, switch to the new API and embed vfio_device in
vfio_pci_device.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Liu Yi L <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Message-Id: <9-v3-225de1400dfc+4e074-vfio1_jgg@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06 11:55:10 -06:00
Jason Gunthorpe
4aeec3984d vfio/pci: Re-order vfio_pci_probe()
vfio_add_group_dev() must be called only after all of the private data in
vdev is fully setup and ready, otherwise there could be races with user
space instantiating a device file descriptor and starting to call ops.

For instance vfio_pci_reflck_attach() sets vdev->reflck and
vfio_pci_open(), called by fops open, unconditionally derefs it, which
will crash if things get out of order.

Fixes: cc20d79990 ("vfio/pci: Introduce VF token")
Fixes: e309df5b0c ("vfio/pci: Parallelize device open and release")
Fixes: 6eb7018705 ("vfio-pci: Move idle devices to D3hot power state")
Fixes: ecaa1f6a01 ("vfio-pci: Add VGA arbiter client")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Message-Id: <8-v3-225de1400dfc+4e074-vfio1_jgg@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06 11:55:10 -06:00
Jason Gunthorpe
61e9081748 vfio/pci: Move VGA and VF initialization to functions
vfio_pci_probe() is quite complicated, with optional VF and VGA sub
components. Move these into clear init/uninit functions and have a linear
flow in probe/remove.

This fixes a few little buglets:
 - vfio_pci_remove() is in the wrong order, vga_client_register() removes
   a notifier and is after kfree(vdev), but the notifier refers to vdev,
   so it can use after free in a race.
 - vga_client_register() can fail but was ignored

Organize things so destruction order is the reverse of creation order.

Fixes: ecaa1f6a01 ("vfio-pci: Add VGA arbiter client")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Message-Id: <7-v3-225de1400dfc+4e074-vfio1_jgg@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06 11:55:10 -06:00
Jason Gunthorpe
0ca78666fa vfio/fsl-mc: Use vfio_init/register/unregister_group_dev
fsl-mc already allocates a struct vfio_fsl_mc_device with exactly the same
lifetime as vfio_device, switch to the new API and embed vfio_device in
vfio_fsl_mc_device. While here remove the devm usage for the vdev, this
code is clean and doesn't need devm.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Message-Id: <6-v3-225de1400dfc+4e074-vfio1_jgg@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06 11:55:10 -06:00
Jason Gunthorpe
2b1fe162e5 vfio/fsl-mc: Re-order vfio_fsl_mc_probe()
vfio_add_group_dev() must be called only after all of the private data in
vdev is fully setup and ready, otherwise there could be races with user
space instantiating a device file descriptor and starting to call ops.

For instance vfio_fsl_mc_reflck_attach() sets vdev->reflck and
vfio_fsl_mc_open(), called by fops open, unconditionally derefs it, which
will crash if things get out of order.

This driver started life with the right sequence, but two commits added
stuff after vfio_add_group_dev().

Fixes: 2e0d29561f ("vfio/fsl-mc: Add irq infrastructure for fsl-mc devices")
Fixes: f2ba7e8c94 ("vfio/fsl-mc: Added lock support in preparation for interrupt handling")
Co-developed-by: Diana Craciun OSS <diana.craciun@oss.nxp.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Message-Id: <5-v3-225de1400dfc+4e074-vfio1_jgg@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06 11:55:10 -06:00
Jason Gunthorpe
cb61645868 vfio/platform: Use vfio_init/register/unregister_group_dev
platform already allocates a struct vfio_platform_device with exactly
the same lifetime as vfio_device, switch to the new API and embed
vfio_device in vfio_platform_device.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Message-Id: <4-v3-225de1400dfc+4e074-vfio1_jgg@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06 11:55:10 -06:00
Jason Gunthorpe
0bfc6a4ea6 vfio: Split creation of a vfio_device into init and register ops
This makes the struct vfio_device part of the public interface so it
can be used with container_of and so forth, as is typical for a Linux
subystem.

This is the first step to bring some type-safety to the vfio interface by
allowing the replacement of 'void *' and 'struct device *' inputs with a
simple and clear 'struct vfio_device *'

For now the self-allocating vfio_add_group_dev() interface is kept so each
user can be updated as a separate patch.

The expected usage pattern is

  driver core probe() function:
     my_device = kzalloc(sizeof(*mydevice));
     vfio_init_group_dev(&my_device->vdev, dev, ops, mydevice);
     /* other driver specific prep */
     vfio_register_group_dev(&my_device->vdev);
     dev_set_drvdata(dev, my_device);

  driver core remove() function:
     my_device = dev_get_drvdata(dev);
     vfio_unregister_group_dev(&my_device->vdev);
     /* other driver specific tear down */
     kfree(my_device);

Allowing the driver to be able to use the drvdata and vfio_device to go
to/from its own data.

The pattern also makes it clear that vfio_register_group_dev() must be
last in the sequence, as once it is called the core code can immediately
start calling ops. The init/register gap is provided to allow for the
driver to do setup before ops can be called and thus avoid races.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Liu Yi L <yi.l.liu@intel.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Message-Id: <3-v3-225de1400dfc+4e074-vfio1_jgg@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06 11:55:10 -06:00
Jason Gunthorpe
5e42c99944 vfio: Simplify the lifetime logic for vfio_device
The vfio_device is using a 'sleep until all refs go to zero' pattern for
its lifetime, but it is indirectly coded by repeatedly scanning the group
list waiting for the device to be removed on its own.

Switch this around to be a direct representation, use a refcount to count
the number of places that are blocking destruction and sleep directly on a
completion until that counter goes to zero. kfree the device after other
accesses have been excluded in vfio_del_group_dev(). This is a fairly
common Linux idiom.

Due to this we can now remove kref_put_mutex(), which is very rarely used
in the kernel. Here it is being used to prevent a zero ref device from
being seen in the group list. Instead allow the zero ref device to
continue to exist in the device_list and use refcount_inc_not_zero() to
exclude it once refs go to zero.

This patch is organized so the next patch will be able to alter the API to
allow drivers to provide the kfree.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Message-Id: <2-v3-225de1400dfc+4e074-vfio1_jgg@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06 11:55:10 -06:00
Jason Gunthorpe
e572bfb2b6 vfio: Remove extra put/gets around vfio_device->group
The vfio_device->group value has a get obtained during
vfio_add_group_dev() which gets moved from the stack to vfio_device->group
in vfio_group_create_device().

The reference remains until we reach the end of vfio_del_group_dev() when
it is put back.

Thus anything that already has a kref on the vfio_device is guaranteed a
valid group pointer. Remove all the extra reference traffic.

It is tricky to see, but the get at the start of vfio_del_group_dev() is
actually pairing with the put hidden inside vfio_device_put() a few lines
below.

A later patch merges vfio_group_create_device() into vfio_add_group_dev()
which makes the ownership and error flow on the create side easier to
follow.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Message-Id: <1-v3-225de1400dfc+4e074-vfio1_jgg@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06 11:55:09 -06:00
Christoph Hellwig
b392a19891 vfio/pci: remove vfio_pci_nvlink2
This driver never had any open userspace (which for VFIO would include
VM kernel drivers) that use it, and thus should never have been added
by our normal userspace ABI rules.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Message-Id: <20210326061311.1497642-2-hch@lst.de>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06 11:54:13 -06:00
Shenming Lu
a536019d3e vfio/type1: Remove the almost unused check in vfio_iommu_type1_unpin_pages
The check i > npage at the end of vfio_iommu_type1_unpin_pages is unused
unless npage < 0, but if npage < 0, this function will return npage, which
should return -EINVAL instead. So let's just check the parameter npage at
the start of the function. By the way, replace unpin_exit with break.

Signed-off-by: Shenming Lu <lushenming@huawei.com>
Message-Id: <20210406135009.1707-1-lushenming@huawei.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06 11:53:50 -06:00
Zhen Lei
f5c858ec2b vfio/platform: Fix spelling mistake "registe" -> "register"
There is a spelling mistake in a comment, fix it.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Acked-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20210326083528.1329-5-thunder.leizhen@huawei.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06 11:53:50 -06:00
Zhen Lei
d0915b3291 vfio/pci: fix a couple of spelling mistakes
There are several spelling mistakes, as follows:
thru ==> through
presense ==> presence

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20210326083528.1329-4-thunder.leizhen@huawei.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06 11:53:50 -06:00
Zhen Lei
d0a7541dd9 vfio/mdev: Fix spelling mistake "interal" -> "internal"
There is a spelling mistake in a comment, fix it.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20210326083528.1329-3-thunder.leizhen@huawei.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06 11:53:50 -06:00
Zhen Lei
06d738c8ab vfio/type1: fix a couple of spelling mistakes
There are several spelling mistakes, as follows:
userpsace ==> userspace
Accouting ==> Accounting
exlude ==> exclude

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20210326083528.1329-2-thunder.leizhen@huawei.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06 11:53:50 -06:00
Fred Gao
bab2c1990b vfio/pci: Add support for opregion v2.1+
Before opregion version 2.0 VBT data is stored in opregion mailbox #4,
but when VBT data exceeds 6KB size and cannot be within mailbox #4
then from opregion v2.0+, Extended VBT region, next to opregion is
used to hold the VBT data, so the total size will be opregion size plus
extended VBT region size.

Since opregion v2.0 with physical host VBT address would not be
practically available for end user and guest can not directly access
host physical address, so it is not supported.

Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Swee Yee Fonn <swee.yee.fonn@intel.com>
Signed-off-by: Fred Gao <fred.gao@intel.com>
Message-Id: <20210325170953.24549-1-fred.gao@intel.com>
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06 11:53:50 -06:00
Zhou Wang
36f0be5a30 vfio/pci: Remove an unnecessary blank line in vfio_pci_enable
This blank line is unnecessary, so remove it.

Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com>
Message-Id: <1615808073-178604-1-git-send-email-wangzhou1@hisilicon.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06 11:53:50 -06:00
Bhaskar Chowdhury
fbc9d37161 vfio: pci: Spello fix in the file vfio_pci.c
s/permision/permission/

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Message-Id: <20210314052925.3560-1-unixbhaskar@gmail.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06 11:53:49 -06:00
Jason Gunthorpe
e0146a108c vfio/nvlink: Add missing SPAPR_TCE_IOMMU depends
Compiling the nvlink stuff relies on the SPAPR_TCE_IOMMU otherwise there
are compile errors:

 drivers/vfio/pci/vfio_pci_nvlink2.c:101:10: error: implicit declaration of function 'mm_iommu_put' [-Werror,-Wimplicit-function-declaration]
                            ret = mm_iommu_put(data->mm, data->mem);

As PPC only defines these functions when the config is set.

Previously this wasn't a problem by chance as SPAPR_TCE_IOMMU was the only
IOMMU that could have satisfied IOMMU_API on POWERNV.

Fixes: 179209fa12 ("vfio: IOMMU_API should be selected")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Message-Id: <0-v1-83dba9768fc3+419-vfio_nvlink2_kconfig_jgg@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-03-29 14:48:00 -06:00
Daniel Jordan
60c988bc15 vfio/type1: Empty batch for pfnmap pages
When vfio_pin_pages_remote() returns with a partial batch consisting of
a single VM_PFNMAP pfn, a subsequent call will unfortunately try
restoring it from batch->pages, resulting in vfio mapping the wrong page
and unbalancing the page refcount.

Prevent the function from returning with this kind of partial batch to
avoid the issue.  There's no explicit check for a VM_PFNMAP pfn because
it's awkward to do so, so infer it from characteristics of the batch
instead.  This may result in occasional false positives but keeps the
code simpler.

Fixes: 4d83de6da2 ("vfio/type1: Batch page pinning")
Link: https://lkml.kernel.org/r/20210323133254.33ed9161@omen.home.shazbot.org/
Reported-by: Alex Williamson <alex.williamson@redhat.com>
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Message-Id: <20210325010552.185481-1-daniel.m.jordan@oracle.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-03-25 12:48:38 -06:00
Daniel Jordan
4ab4fcfce5 vfio/type1: fix vaddr_get_pfns() return in vfio_pin_page_external()
vaddr_get_pfns() now returns the positive number of pfns successfully
gotten instead of zero.  vfio_pin_page_external() might return 1 to
vfio_iommu_type1_pin_pages(), which will treat it as an error, if
vaddr_get_pfns() is successful but vfio_pin_page_external() doesn't
reach vfio_lock_acct().

Fix it up in vfio_pin_page_external().  Found by inspection.

Fixes: be16c1fd99 ("vfio/type1: Change success value of vaddr_get_pfn()")
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Message-Id: <20210308172452.38864-1-daniel.m.jordan@oracle.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-03-16 10:39:29 -06:00
Jason Gunthorpe
b2b12db535 vfio: Depend on MMU
VFIO_IOMMU_TYPE1 does not compile with !MMU:

../drivers/vfio/vfio_iommu_type1.c: In function 'follow_fault_pfn':
../drivers/vfio/vfio_iommu_type1.c:536:22: error: implicit declaration of function 'pte_write'; did you mean 'vfs_write'? [-Werror=implicit-function-declaration]

So require it.

Suggested-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Message-Id: <0-v1-02cb5500df6e+78-vfio_no_mmu_jgg@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-03-16 10:39:28 -06:00
Jason Gunthorpe
3b49dfb08c ARM: amba: Allow some ARM_AMBA users to compile with COMPILE_TEST
CONFIG_VFIO_AMBA has a light use of AMBA, adding some inline fallbacks
when AMBA is disabled will allow it to be compiled under COMPILE_TEST and
make VFIO easier to maintain.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Message-Id: <3-v1-df057e0f92c3+91-vfio_arm_compile_test_jgg@nvidia.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-03-16 10:39:28 -06:00
Jason Gunthorpe
d3d72a6dff vfio-platform: Add COMPILE_TEST to VFIO_PLATFORM
x86 can build platform bus code too, so vfio-platform and all the platform
reset implementations compile successfully on x86.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Message-Id: <2-v1-df057e0f92c3+91-vfio_arm_compile_test_jgg@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-03-16 10:39:28 -06:00
Jason Gunthorpe
179209fa12 vfio: IOMMU_API should be selected
As IOMMU_API is a kconfig without a description (eg does not show in the
menu) the correct operator is select not 'depends on'. Using 'depends on'
for this kind of symbol means VFIO is not selectable unless some other
random kconfig has already enabled IOMMU_API for it.

Fixes: cba3345cc4 ("vfio: VFIO core")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Message-Id: <1-v1-df057e0f92c3+91-vfio_arm_compile_test_jgg@nvidia.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-03-16 10:39:27 -06:00
Steve Sistare
7dc4b2fdb2 vfio/type1: fix unmap all on ILP32
Some ILP32 architectures support mapping a 32-bit vaddr within a 64-bit
iova space.  The unmap-all code uses 32-bit SIZE_MAX as an upper bound on
the extent of the mappings within iova space, so mappings above 4G cannot
be found and unmapped.  Use U64_MAX instead, and use u64 for size variables.
This also fixes a static analysis bug found by the kernel test robot running
smatch for ILP32.

Fixes: 0f53afa12b ("vfio/type1: unmap cleanup")
Fixes: c196509953 ("vfio/type1: implement unmap all")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Message-Id: <1614281102-230747-1-git-send-email-steven.sistare@oracle.com>
Link: https://lore.kernel.org/linux-mm/20210222141043.GW2222@kadam
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-03-16 10:39:27 -06:00
Linus Torvalds
719bbd4a50 VFIO updates for v5.12-rc1
- Virtual address update handling (Steve Sistare)
 
  - s390/zpci fixes and cleanups (Max Gurtovoy)
 
  - Fixes for dirty bitmap handling, non-mdev page pinning,
    and improved pinned dirty scope tracking (Keqian Zhu)
 
  - Batched page pinning enhancement (Daniel Jordan)
 
  - Page access permission fix (Alex Williamson)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJgNpAEAAoJECObm247sIsiDEsP/1G0QJIum3KqG0+ABHgSS7ks
 j3oKeLxDl2BGeDBw2yIfinif1fjtafmUWg3Q0RlVRv0S71ccu7Ee4MfAHqy8k7Gp
 BM/G+2Amnrz1qWsgEV2JGw8T2wwZDG8ZJluh0sxj2KFqI99jWKftlPH4D8TTJeDj
 VrsFHzQlpcILFBh9Mj5zWFkIuqm2/70O7FJF3jhyN2b0MjYG/f390k0TLQZS+Mkr
 l+6pfIZ3pHYngzro8pX56B1z3c1mJEeRChMPt7IdTVruBcGkUCMXrZKZVN2WqoOf
 Otj6Mxvq5Wur8Rk9VfKs2fO/oz9FJjr5/sL4Vv7xUigWe9nDXBnoy+OR4XJUwxEf
 BaB4tK8f9xTJcf8MrK+eOpBvMSx7eE0qnP/7VMtykC7Cw57qdhCuzEq7ueUGKuVw
 ubj+pjHcAx6T2urjL7KdzuJUMNPkafATi8hN/Bj6oshESZuhM2lSCHiqI4ZQnh5H
 TPMWpb2dX/ohRkcnQdO9N2T2+Lcg6tmD4Kqigv+75zzDj+U15Ph2owtnmH5OFJIG
 BCtibsX2yk6UuxBPvl8eN0X7n41G6gwJcsD6spuaoateK6UTJugjTCZtKB96YMFQ
 c4eULO+hvUIiQJkWbbpFA+mXUcLwcpEoT2pWfuj3MET0FuHVtEhbGEO609gGAAWI
 GMheKjGI+GRW07JFwgCV
 =ei4J
 -----END PGP SIGNATURE-----

Merge tag 'vfio-v5.12-rc1' of git://github.com/awilliam/linux-vfio

Pull VFIO updatesfrom Alex Williamson:

 - Virtual address update handling (Steve Sistare)

 - s390/zpci fixes and cleanups (Max Gurtovoy)

 - Fixes for dirty bitmap handling, non-mdev page pinning, and improved
   pinned dirty scope tracking (Keqian Zhu)

 - Batched page pinning enhancement (Daniel Jordan)

 - Page access permission fix (Alex Williamson)

* tag 'vfio-v5.12-rc1' of git://github.com/awilliam/linux-vfio: (21 commits)
  vfio/type1: Batch page pinning
  vfio/type1: Prepare for batched pinning with struct vfio_batch
  vfio/type1: Change success value of vaddr_get_pfn()
  vfio/type1: Use follow_pte()
  vfio/pci: remove CONFIG_VFIO_PCI_ZDEV from Kconfig
  vfio/iommu_type1: Fix duplicate included kthread.h
  vfio-pci/zdev: fix possible segmentation fault issue
  vfio-pci/zdev: remove unused vdev argument
  vfio/pci: Fix handling of pci use accessor return codes
  vfio/iommu_type1: Mantain a counter for non_pinned_groups
  vfio/iommu_type1: Fix some sanity checks in detach group
  vfio/iommu_type1: Populate full dirty when detach non-pinned group
  vfio/type1: block on invalid vaddr
  vfio/type1: implement notify callback
  vfio: iommu driver notify callback
  vfio/type1: implement interfaces to update vaddr
  vfio/type1: massage unmap iteration
  vfio: interfaces to update vaddr
  vfio/type1: implement unmap all
  vfio/type1: unmap cleanup
  ...
2021-02-24 10:43:40 -08:00
Daniel Jordan
4d83de6da2 vfio/type1: Batch page pinning
Pinning one 4K page at a time is inefficient, so do it in batches of 512
instead.  This is just an optimization with no functional change
intended, and in particular the driver still calls iommu_map() with the
largest physically contiguous range possible.

Add two fields in vfio_batch to remember where to start between calls to
vfio_pin_pages_remote(), and use vfio_batch_unpin() to handle remaining
pages in the batch in case of error.

qemu pins pages for guests around 8% faster on my test system, a
two-node Broadwell server with 128G memory per node.  The qemu process
was bound to one node with its allocations constrained there as well.

                             base               test
          guest              ----------------   ----------------
       mem (GB)   speedup    avg sec    (std)   avg sec    (std)
              1      7.4%       0.61   (0.00)      0.56   (0.00)
              2      8.3%       0.93   (0.00)      0.85   (0.00)
              4      8.4%       1.46   (0.00)      1.34   (0.00)
              8      8.6%       2.54   (0.01)      2.32   (0.00)
             16      8.3%       4.66   (0.00)      4.27   (0.01)
             32      8.3%       8.94   (0.01)      8.20   (0.01)
             64      8.2%      17.47   (0.01)     16.04   (0.03)
            120      8.5%      32.45   (0.13)     29.69   (0.01)

perf diff confirms less time spent in pup.  Here are the top ten
functions:

             Baseline  Delta Abs  Symbol

               78.63%     +6.64%  clear_page_erms
                1.50%     -1.50%  __gup_longterm_locked
                1.27%     -0.78%  __get_user_pages
                          +0.76%  kvm_zap_rmapp.constprop.0
                0.54%     -0.53%  vmacache_find
                0.55%     -0.51%  get_pfnblock_flags_mask
                0.48%     -0.48%  __get_user_pages_remote
                          +0.39%  slot_rmap_walk_next
                          +0.32%  vfio_pin_map_dma
                          +0.26%  kvm_handle_hva_range
                ...

Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-02-22 16:30:47 -07:00
Daniel Jordan
4b6c33b322 vfio/type1: Prepare for batched pinning with struct vfio_batch
Get ready to pin more pages at once with struct vfio_batch, which
represents a batch of pinned pages.

The struct has a fallback page pointer to avoid two unlikely scenarios:
pointlessly allocating a page if disable_hugepages is enabled or failing
the whole pinning operation if the kernel can't allocate memory.

vaddr_get_pfn() becomes vaddr_get_pfns() to prepare for handling
multiple pages, though for now only one page is stored in the pages
array.

Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-02-22 16:30:45 -07:00
Daniel Jordan
be16c1fd99 vfio/type1: Change success value of vaddr_get_pfn()
vaddr_get_pfn() simply returns 0 on success.  Have it report the number
of pfns successfully gotten instead, whether from page pinning or
follow_fault_pfn(), which will be used later when batching pinning.

Change the last check in vfio_pin_pages_remote() for consistency with
the other two.

Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-02-22 16:30:44 -07:00
Alex Williamson
07956b6269 vfio/type1: Use follow_pte()
follow_pfn() doesn't make sure that we're using the correct page
protections, get the pte with follow_pte() so that we can test
protections and get the pfn from the pte.

Fixes: 5cbf3264bc ("vfio/type1: Fix VA->PA translation for PFNMAP VMAs in vaddr_get_pfn()")
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-02-22 10:17:13 -07:00
Max Gurtovoy
b9abef43a0 vfio/pci: remove CONFIG_VFIO_PCI_ZDEV from Kconfig
In case we're running on s390 system always expose the capabilities for
configuration of zPCI devices. In case we're running on different
platform, continue as usual.

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-02-19 10:29:56 -07:00
Tian Tao
35ac5991cd vfio/iommu_type1: Fix duplicate included kthread.h
linux/kthread.h is included more than once, remove the one that isn't
necessary.

Fixes: 898b9eaeb3 ("vfio/type1: block on invalid vaddr")
Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-02-18 12:25:37 -07:00
Alex Williamson
76adb20f92 Merge branch 'v5.12/vfio/next-vaddr' into v5.12/vfio/next 2021-02-02 09:17:48 -07:00
Max Gurtovoy
7e31d6dc2c vfio-pci/zdev: fix possible segmentation fault issue
In case allocation fails, we must behave correctly and exit with error.

Fixes: e6b817d4b8 ("vfio-pci/zdev: Add zPCI capabilities to VFIO_DEVICE_GET_INFO")
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-02-02 09:06:02 -07:00
Uwe Kleine-König
3fd269e74f amba: Make the remove callback return void
All amba drivers return 0 in their remove callback. Together with the
driver core ignoring the return value anyhow, it doesn't make sense to
return a value here.

Change the remove prototype to return void, which makes it explicit that
returning an error value doesn't work as expected. This simplifies changing
the core remove callback to return void, too.

Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: Krzysztof Kozlowski <krzk@kernel.org> # for drivers/memory
Acked-by: Mark Brown <broonie@kernel.org>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com> # for hwtracing/coresight
Acked-By: Vinod Koul <vkoul@kernel.org> # for dmaengine
Acked-by: Guenter Roeck <linux@roeck-us.net> # for watchdog
Acked-by: Wolfram Sang <wsa@kernel.org> # for I2C
Acked-by: Takashi Iwai <tiwai@suse.de> # for sound
Acked-by: Vladimir Zapolskiy <vz@mleia.com> # for memory/pl172
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20210126165835.687514-5-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
2021-02-02 14:25:50 +01:00
Uwe Kleine-König
5b495ac8fe vfio: platform: simplify device removal
vfio_platform_remove_common() cannot return non-NULL in
vfio_amba_remove() as the latter is only called if vfio_amba_probe()
returned success.

Diagnosed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20210126165835.687514-4-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
2021-02-02 14:24:23 +01:00
Max Gurtovoy
46c4746660 vfio-pci/zdev: remove unused vdev argument
Zdev static functions do not use vdev argument. Remove it.

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-02-01 13:43:06 -07:00
Heiner Kallweit
37a682ffbe vfio/pci: Fix handling of pci use accessor return codes
The pci user accessors return negative errno's on error.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
[aw: drop Fixes tag, pcibios_err_to_errno() behaves correctly for -errno]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-02-01 13:40:52 -07:00
Keqian Zhu
010321565a vfio/iommu_type1: Mantain a counter for non_pinned_groups
With this counter, we never need to traverse all groups to update
pinned_scope of vfio_iommu.

Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-02-01 13:40:52 -07:00
Keqian Zhu
4a19f37a3d vfio/iommu_type1: Fix some sanity checks in detach group
vfio_sanity_check_pfn_list() is used to check whether pfn_list and
notifier are empty when remove the external domain, so it makes a
wrong assumption that only external domain will use the pinning
interface.

Now we apply the pfn_list check when a vfio_dma is removed and apply
the notifier check when all domains are removed.

Fixes: a54eb55045 ("vfio iommu type1: Add support for mediated devices")
Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-02-01 13:40:52 -07:00
Keqian Zhu
d0a78f9176 vfio/iommu_type1: Populate full dirty when detach non-pinned group
If a group with non-pinned-page dirty scope is detached with dirty
logging enabled, we should fully populate the dirty bitmaps at the
time it's removed since we don't know the extent of its previous DMA,
nor will the group be present to trigger the full bitmap when the user
retrieves the dirty bitmap.

Fixes: d6a4c18566 ("vfio iommu: Implementation of ioctl for dirty pages tracking")
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-02-01 13:40:52 -07:00
Steve Sistare
898b9eaeb3 vfio/type1: block on invalid vaddr
Block translation of host virtual address while an iova range has an
invalid vaddr.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-02-01 13:20:07 -07:00
Steve Sistare
487ace1340 vfio/type1: implement notify callback
Implement a notify callback that remembers if the container's file
descriptor has been closed.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-02-01 13:20:07 -07:00
Steve Sistare
ec5e32940c vfio: iommu driver notify callback
Define a vfio_iommu_driver_ops notify callback, for sending events to
the driver.  Drivers are not required to provide the callback, and
may ignore any events.  The handling of events is driver specific.

Define the CONTAINER_CLOSE event, called when the container's file
descriptor is closed.  This event signifies that no further state changes
will occur via container ioctl's.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-02-01 13:20:07 -07:00
Steve Sistare
c3cbab24db vfio/type1: implement interfaces to update vaddr
Implement VFIO_DMA_UNMAP_FLAG_VADDR, VFIO_DMA_MAP_FLAG_VADDR, and
VFIO_UPDATE_VADDR.  This is a partial implementation.  Blocking is
added in a subsequent patch.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-02-01 13:20:07 -07:00
Steve Sistare
40ae9b807b vfio/type1: massage unmap iteration
Modify the iteration in vfio_dma_do_unmap so it does not depend on deletion
of each dma entry.  Add a variant of vfio_find_dma that returns the entry
with the lowest iova in the search range to initialize the iteration.  No
externally visible change, but this behavior is needed in the subsequent
update-vaddr patch.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-02-01 13:20:06 -07:00
Steve Sistare
c196509953 vfio/type1: implement unmap all
Implement VFIO_DMA_UNMAP_FLAG_ALL.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-02-01 13:20:06 -07:00
Steve Sistare
0f53afa12b vfio/type1: unmap cleanup
Minor changes in vfio_dma_do_unmap to improve readability, which also
simplify the subsequent unmap-all patch.  No functional change.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-02-01 13:20:05 -07:00
Linus Torvalds
6a447b0e31 ARM:
* PSCI relay at EL2 when "protected KVM" is enabled
 * New exception injection code
 * Simplification of AArch32 system register handling
 * Fix PMU accesses when no PMU is enabled
 * Expose CSV3 on non-Meltdown hosts
 * Cache hierarchy discovery fixes
 * PV steal-time cleanups
 * Allow function pointers at EL2
 * Various host EL2 entry cleanups
 * Simplification of the EL2 vector allocation
 
 s390:
 * memcg accouting for s390 specific parts of kvm and gmap
 * selftest for diag318
 * new kvm_stat for when async_pf falls back to sync
 
 x86:
 * Tracepoints for the new pagetable code from 5.10
 * Catch VFIO and KVM irqfd events before userspace
 * Reporting dirty pages to userspace with a ring buffer
 * SEV-ES host support
 * Nested VMX support for wait-for-SIPI activity state
 * New feature flag (AVX512 FP16)
 * New system ioctl to report Hyper-V-compatible paravirtualization features
 
 Generic:
 * Selftest improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl/bdL4UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNgQQgAnTH6rhXa++Zd5F0EM2NwXwz3iEGb
 lOq1DZSGjs6Eekjn8AnrWbmVQr+CBCuGU9MrxpSSzNDK/awryo3NwepOWAZw9eqk
 BBCVwGBbJQx5YrdgkGC0pDq2sNzcpW/VVB3vFsmOxd9eHblnuKSIxEsCCXTtyqIt
 XrLpQ1UhvI4yu102fDNhuFw2EfpzXm+K0Lc0x6idSkdM/p7SyeOxiv8hD4aMr6+G
 bGUQuMl4edKZFOWFigzr8NovQAvDHZGrwfihu2cLRYKLhV97QuWVmafv/yYfXcz2
 drr+wQCDNzDOXyANnssmviazrhOX0QmTAhbIXGGX/kTxYKcfPi83ZLoI3A==
 =ISud
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "Much x86 work was pushed out to 5.12, but ARM more than made up for it.

  ARM:
   - PSCI relay at EL2 when "protected KVM" is enabled
   - New exception injection code
   - Simplification of AArch32 system register handling
   - Fix PMU accesses when no PMU is enabled
   - Expose CSV3 on non-Meltdown hosts
   - Cache hierarchy discovery fixes
   - PV steal-time cleanups
   - Allow function pointers at EL2
   - Various host EL2 entry cleanups
   - Simplification of the EL2 vector allocation

  s390:
   - memcg accouting for s390 specific parts of kvm and gmap
   - selftest for diag318
   - new kvm_stat for when async_pf falls back to sync

  x86:
   - Tracepoints for the new pagetable code from 5.10
   - Catch VFIO and KVM irqfd events before userspace
   - Reporting dirty pages to userspace with a ring buffer
   - SEV-ES host support
   - Nested VMX support for wait-for-SIPI activity state
   - New feature flag (AVX512 FP16)
   - New system ioctl to report Hyper-V-compatible paravirtualization features

  Generic:
   - Selftest improvements"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (171 commits)
  KVM: SVM: fix 32-bit compilation
  KVM: SVM: Add AP_JUMP_TABLE support in prep for AP booting
  KVM: SVM: Provide support to launch and run an SEV-ES guest
  KVM: SVM: Provide an updated VMRUN invocation for SEV-ES guests
  KVM: SVM: Provide support for SEV-ES vCPU loading
  KVM: SVM: Provide support for SEV-ES vCPU creation/loading
  KVM: SVM: Update ASID allocation to support SEV-ES guests
  KVM: SVM: Set the encryption mask for the SVM host save area
  KVM: SVM: Add NMI support for an SEV-ES guest
  KVM: SVM: Guest FPU state save/restore not needed for SEV-ES guest
  KVM: SVM: Do not report support for SMM for an SEV-ES guest
  KVM: x86: Update __get_sregs() / __set_sregs() to support SEV-ES
  KVM: SVM: Add support for CR8 write traps for an SEV-ES guest
  KVM: SVM: Add support for CR4 write traps for an SEV-ES guest
  KVM: SVM: Add support for CR0 write traps for an SEV-ES guest
  KVM: SVM: Add support for EFER write traps for an SEV-ES guest
  KVM: SVM: Support string IO operations for an SEV-ES guest
  KVM: SVM: Support MMIO for an SEV-ES guest
  KVM: SVM: Create trace events for VMGEXIT MSR protocol processing
  KVM: SVM: Create trace events for VMGEXIT processing
  ...
2020-12-20 10:44:05 -08:00
Linus Torvalds
0c71cc04eb VFIO updates for v5.11-rc1
- Fix uninitialized list walk in error path (Eric Auger)
 
  - Use io_remap_pfn_range() (Jason Gunthorpe)
 
  - Allow fallback support for NVLink on POWER8 (Alexey Kardashevskiy)
 
  - Enable mdev request interrupt with CCW support (Eric Farman)
 
  - Enable interface to iommu_domain from vfio_group (Lu Baolu)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJf2mBlAAoJECObm247sIsi4Y0P/2+beHwcZfSn5HSSSeqFR+uj
 HfiZYN05bFcVirHPIPfY56kbe8+XNqwHAUPqch+c2iuGhLHuUIXkHwFq8VqvBDY4
 xVqu9ZB23e881dF7NPzMAcK2UMDDwtlmeaf+7oRADVmTn58hErU9qt813i7a+x1S
 5p8UpCTf+X+sh434cV882/TIV4hWgaFgn1/Gy3l8GDaJUW1lQb3MHZ5XY+o9KOxt
 4R7VLIYZCDwEcByUsz8p4RLD036YpSS2Ir+CHXqeArtKrwRjjM62cSwDCOuM6ewZ
 GJ7O3YzPp9FQ75F3oorL3D+ojPY6AU1QjZKg0+gAQS3kucewwkv+vI+RDhSc8Xx7
 PFTW2bLk4cu9LHQxUT64uH5Qoa0NtfPBUGvgsR4kXPCRClk71ZcGZgaQD3CWJBhE
 CTPI1OLHeJgp7MGXAArGRzFOtf0nux5oxOmcmT5fg4icG+x7BgcvDd7dVhyGAjn7
 Gp87OOtJ3itDhWIlO1aTJVHEt42b1eezLkkyIKHfPlDLJmyHEfCOTjDLftr7Rmma
 2IlyDJs83MCahQFcjqbPJWqh2Ttda8+ItutiklwBJRHe6EC+4WML6JaHgXe71CDi
 9Y9HwKLxYFr1pwUpQP6bxzEDlTCPasWyhBmOAUnRtZdU/daX+KGp4WqSFBYkPIgy
 ERmTCiJGJ3p41V08qOki
 =W/GT
 -----END PGP SIGNATURE-----

Merge tag 'vfio-v5.11-rc1' of git://github.com/awilliam/linux-vfio

Pull VFIO updates from Alex Williamson:

 - Fix uninitialized list walk in error path (Eric Auger)

 - Use io_remap_pfn_range() (Jason Gunthorpe)

 - Allow fallback support for NVLink on POWER8 (Alexey Kardashevskiy)

 - Enable mdev request interrupt with CCW support (Eric Farman)

 - Enable interface to iommu_domain from vfio_group (Lu Baolu)

* tag 'vfio-v5.11-rc1' of git://github.com/awilliam/linux-vfio:
  vfio/type1: Add vfio_group_iommu_domain()
  vfio-ccw: Wire in the request callback
  vfio-mdev: Wire in a request handler for mdev parent
  vfio/pci/nvlink2: Do not attempt NPU2 setup on POWER8NVL NPU
  vfio-pci: Use io_remap_pfn_range() for PCI IO memory
  vfio/pci: Move dummy_resources_list init in vfio_pci_probe()
2020-12-16 15:51:15 -08:00
Lu Baolu
bdfae1c9a9 vfio/type1: Add vfio_group_iommu_domain()
Add the API for getting the domain from a vfio group. This could be used
by the physical device drivers which rely on the vfio/mdev framework for
mediated device user level access. The typical use case like below:

	unsigned int pasid;
	struct vfio_group *vfio_group;
	struct iommu_domain *iommu_domain;
	struct device *dev = mdev_dev(mdev);
	struct device *iommu_device = mdev_get_iommu_device(dev);

	if (!iommu_device ||
	    !iommu_dev_feature_enabled(iommu_device, IOMMU_DEV_FEAT_AUX))
		return -EINVAL;

	vfio_group = vfio_group_get_external_user_from_dev(dev);
	if (IS_ERR_OR_NULL(vfio_group))
		return -EFAULT;

	iommu_domain = vfio_group_iommu_domain(vfio_group);
	if (IS_ERR_OR_NULL(iommu_domain)) {
		vfio_group_put_external_user(vfio_group);
		return -EFAULT;
	}

	pasid = iommu_aux_get_pasid(iommu_domain, iommu_device);
	if (pasid < 0) {
		vfio_group_put_external_user(vfio_group);
		return -EFAULT;
	}

	/* Program device context with pasid value. */
	...

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-12-10 14:47:56 -07:00
Andy Shevchenko
feaba5932b vfio: platform: Switch to use platform_get_mem_or_io()
Switch to use new platform_get_mem_or_io() instead of home grown analogue.

Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: kvm@vger.kernel.org
Acked-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20201209203642.27648-2-andriy.shevchenko@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-10 16:31:46 +01:00
Eric Farman
a15ac665b9 vfio-mdev: Wire in a request handler for mdev parent
While performing some destructive tests with vfio-ccw, where the
paths to a device are forcible removed and thus the device itself
is unreachable, it is rather easy to end up in an endless loop in
vfio_del_group_dev() due to the lack of a request callback for the
associated device.

In this example, one MDEV (77c) is used by a guest, while another
(77b) is not. The symptom is that the iommu is detached from the
mdev for 77b, but not 77c, until that guest is shutdown:

    [  238.794867] vfio_ccw 0.0.077b: MDEV: Unregistering
    [  238.794996] vfio_mdev 11f2d2bc-4083-431d-a023-eff72715c4f0: Removing from iommu group 2
    [  238.795001] vfio_mdev 11f2d2bc-4083-431d-a023-eff72715c4f0: MDEV: detaching iommu
    [  238.795036] vfio_ccw 0.0.077c: MDEV: Unregistering
    ...silence...

Let's wire in the request call back to the mdev device, so that a
device being physically removed from the host can be (gracefully?)
handled by the parent device at the time the device is removed.

Add a message when registering the device if a driver doesn't
provide this callback, so a clue is given that this same loop
may be encountered in a similar situation, and a message when
this occurs instead of the awkward silence noted above.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-12-03 16:21:07 -07:00
Alexey Kardashevskiy
d22f9a6c92 vfio/pci/nvlink2: Do not attempt NPU2 setup on POWER8NVL NPU
We execute certain NPU2 setup code (such as mapping an LPID to a device
in NPU2) unconditionally if an Nvlink bridge is detected. However this
cannot succeed on POWER8NVL machines as the init helpers return an error
other than ENODEV which means the device is there is and setup failed so
vfio_pci_enable() fails and pass through is not possible.

This changes the two NPU2 related init helpers to return -ENODEV if
there is no "memory-region" device tree property as this is
the distinction between NPU and NPU2.

Tested on
- POWER9 pvr=004e1201, Ubuntu 19.04 host, Ubuntu 18.04 vm,
  NVIDIA GV100 10de:1db1 driver 418.39
- POWER8 pvr=004c0100, RHEL 7.6 host, Ubuntu 16.10 vm,
  NVIDIA P100 10de:15f9 driver 396.47

Fixes: 7f92891778 ("vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] subdriver")
Cc: stable@vger.kernel.org # 5.0
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-12-02 13:04:22 -07:00
Jason Gunthorpe
7b06a56d46 vfio-pci: Use io_remap_pfn_range() for PCI IO memory
commit f8f6ae5d07 ("mm: always have io_remap_pfn_range() set
pgprot_decrypted()") allows drivers using mmap to put PCI memory mapped
BAR space into userspace to work correctly on AMD SME systems that default
to all memory encrypted.

Since vfio_pci_mmap_fault() is working with PCI memory mapped BAR space it
should be calling io_remap_pfn_range() otherwise it will not work on SME
systems.

Fixes: 11c4cd07ba ("vfio-pci: Fault mmaps to enable vma tracking")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Peter Xu <peterx@redhat.com>
Tested-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-12-02 13:03:12 -07:00
Eric Auger
16b8fe4caf vfio/pci: Move dummy_resources_list init in vfio_pci_probe()
In case an error occurs in vfio_pci_enable() before the call to
vfio_pci_probe_mmaps(), vfio_pci_disable() will  try to iterate
on an uninitialized list and cause a kernel panic.

Lets move to the initialization to vfio_pci_probe() to fix the
issue.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Fixes: 05f0c03fba ("vfio-pci: Allow to mmap sub-page MMIO BARs if the mmio page is exclusive")
CC: Stable <stable@vger.kernel.org> # v4.7+
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-12-02 13:03:12 -07:00
David Woodhouse
b1b397aeef vfio/virqfd: Drain events from eventfd in virqfd_wakeup()
Don't allow the events to accumulate in the eventfd counter, drain them
as they are handled.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20201027135523.646811-3-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-15 09:49:10 -05:00
Fred Gao
e4eccb8536 vfio/pci: Bypass IGD init in case of -ENODEV
Bypass the IGD initialization when -ENODEV returns,
that should be the case if opregion is not available for IGD
or within discrete graphics device's option ROM,
or host/lpc bridge is not found.

Then use of -ENODEV here means no special device resources found
which needs special care for VFIO, but we still allow other normal
device resource access.

Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Xiong Zhang <xiong.y.zhang@intel.com>
Cc: Hang Yuan <hang.yuan@linux.intel.com>
Cc: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Fred Gao <fred.gao@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-03 11:07:40 -07:00
Zhang Qilong
bb742ad019 vfio: platform: fix reference leak in vfio_platform_open
pm_runtime_get_sync() will increment pm usage counter even it
failed. Forgetting to call pm_runtime_put will result in
reference leak in vfio_platform_open, so we should fix it.

Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
Acked-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-03 11:07:40 -07:00
Alex Williamson
38565c93c8 vfio/pci: Implement ioeventfd thread handler for contended memory lock
The ioeventfd is called under spinlock with interrupts disabled,
therefore if the memory lock is contended defer code that might
sleep to a thread context.

Fixes: bc93b9ae01 ("vfio-pci: Avoid recursive read-lock usage")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=209253#c1
Reported-by: Ian Pilcher <arequipeno@gmail.com>
Tested-by: Ian Pilcher <arequipeno@gmail.com>
Tested-by: Justin Gatzen <justin.gatzen@gmail.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-03 11:07:40 -07:00
Diana Craciun
8e91cb3812 vfio/fsl-mc: Make vfio_fsl_mc_irqs_allocate static
Fixed compiler warning:
drivers/vfio/fsl-mc/vfio_fsl_mc_intr.c:16:5: warning: no previous
prototype for function 'vfio_fsl_mc_irqs_allocate' [-Wmissing-prototypes]
       ^
drivers/vfio/fsl-mc/vfio_fsl_mc_intr.c:16:1: note: declare 'static'
if the function is not intended to be used outside of this translation unit
int vfio_fsl_mc_irqs_allocate(struct vfio_fsl_mc_device *vdev)

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-03 11:07:40 -07:00
Dan Carpenter
69848cd6f0 vfio/fsl-mc: prevent underflow in vfio_fsl_mc_mmap()
My static analsysis tool complains that the "index" can be negative.
There are some checks in do_mmap() which try to prevent underflows but
I don't know if they are sufficient for this situation.  Either way,
making "index" unsigned is harmless so let's do it just to be safe.

Fixes: 6724728968 ("vfio/fsl-mc: Allow userspace to MMAP fsl-mc device MMIO regions")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Diana Craciun <diana.craciun@oss.nxp.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-03 11:07:19 -07:00
Dan Carpenter
09699e56de vfio/fsl-mc: return -EFAULT if copy_to_user() fails
The copy_to_user() function returns the number of bytes remaining to be
copied, but this code should return -EFAULT.

Fixes: df747bcd5b ("vfio/fsl-mc: Implement VFIO_DEVICE_GET_REGION_INFO ioctl call")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Diana Craciun <diana.craciun@oss.nxp.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-02 15:00:06 -07:00
Zenghui Yu
572f64c71e vfio/type1: Use the new helper to find vfio_group
When attaching a new group to the container, let's use the new helper
vfio_iommu_find_iommu_group() to check if it's already attached. There
is no functional change.

Also take this chance to add a missing blank line.

Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-02 14:58:29 -07:00
Linus Torvalds
fc996db970 VFIO updates for v5.10-rc1
- New fsl-mc vfio bus driver supporting userspace drivers of objects
    within NXP's DPAA2 architecture (Diana Craciun)
 
  - Support for exposing zPCI information on s390 (Matthew Rosato)
 
  - Fixes for "detached" VFs on s390 (Matthew Rosato)
 
  - Fixes for pin-pages and dma-rw accesses (Yan Zhao)
 
  - Cleanups and optimize vconfig regen (Zenghui Yu)
 
  - Fix duplicate irq-bypass token registration (Alex Williamson)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJfkcCjAAoJECObm247sIsi2XIP/j7NL4glPrWU37mesz9dd5nx
 SmZhcmxnOqZSQkOCnu+hNFZ9e+tdQjuX+jATOZaYz5l55bLAFmBlBj1Dv8HWaCVI
 mTbJ6xXUwdOvNSxbFH6BIUkJg8otR0iEkefVyJLNlF84FsaDknH4yZxx0vdeczjF
 wTkkk3+4VmH+4klvPIa9v0eL7yeKeFmgls9nQViVE5kDWUF4us/z/oHlVm9wR+mL
 2r3DEjHyz4L2hwVEkhZk7ytR6szdhuhF2l7NoMmaSEXRXjBzJoO6I3P9Y2W4i+su
 MFgTfiQ+OpIfVuiR8GzGev+/SrjWGX0Hvb2sYriKOELjhyedkE2kmxacbqMZ/UE+
 SRAhFf64C1rzJ4g1IW//Gg+9ObIPqlkqU52VDbOZdCED0AquwSyVmdwIUAK6qF+I
 HLOyZXhMI8EZ+w063cS+aKLJIvQTBbfIdMmPZkopVZhwWB3N3BjdvBKA+rPpPoTx
 0DpeUo891+zyeEE4aunUmCB8HFnBPgUa+XZqg2juq9MxjScsqgTzA0WEZg7jV4oj
 tORQrqoAKJgSk9oVL3EvAnr+IJix3ScRTqYymESORkz/lRCk2hFX48qdeW+qiSP8
 W1DHOnivFb1+JzhuZyaRKFWy1mK0EQQWTsE2b2ymPMKJbFhi+pVxaksmeG5x+4Q9
 SAp+Qma8Aj3UtBKcj/S+
 =LDPo
 -----END PGP SIGNATURE-----

Merge tag 'vfio-v5.10-rc1' of git://github.com/awilliam/linux-vfio

Pull VFIO updates from Alex Williamson:

 - New fsl-mc vfio bus driver supporting userspace drivers of objects
   within NXP's DPAA2 architecture (Diana Craciun)

 - Support for exposing zPCI information on s390 (Matthew Rosato)

 - Fixes for "detached" VFs on s390 (Matthew Rosato)

 - Fixes for pin-pages and dma-rw accesses (Yan Zhao)

 - Cleanups and optimize vconfig regen (Zenghui Yu)

 - Fix duplicate irq-bypass token registration (Alex Williamson)

* tag 'vfio-v5.10-rc1' of git://github.com/awilliam/linux-vfio: (30 commits)
  vfio iommu type1: Fix memory leak in vfio_iommu_type1_pin_pages
  vfio/pci: Clear token on bypass registration failure
  vfio/fsl-mc: fix the return of the uninitialized variable ret
  vfio/fsl-mc: Fix the dead code in vfio_fsl_mc_set_irq_trigger
  vfio/fsl-mc: Fixed vfio-fsl-mc driver compilation on 32 bit
  MAINTAINERS: Add entry for s390 vfio-pci
  vfio-pci/zdev: Add zPCI capabilities to VFIO_DEVICE_GET_INFO
  vfio/fsl-mc: Add support for device reset
  vfio/fsl-mc: Add read/write support for fsl-mc devices
  vfio/fsl-mc: trigger an interrupt via eventfd
  vfio/fsl-mc: Add irq infrastructure for fsl-mc devices
  vfio/fsl-mc: Added lock support in preparation for interrupt handling
  vfio/fsl-mc: Allow userspace to MMAP fsl-mc device MMIO regions
  vfio/fsl-mc: Implement VFIO_DEVICE_GET_REGION_INFO ioctl call
  vfio/fsl-mc: Implement VFIO_DEVICE_GET_INFO ioctl
  vfio/fsl-mc: Scan DPRC objects on vfio-fsl-mc driver bind
  vfio: Introduce capability definitions for VFIO_DEVICE_GET_INFO
  s390/pci: track whether util_str is valid in the zpci_dev
  s390/pci: stash version in the zpci_dev
  vfio/fsl-mc: Add VFIO framework skeleton for fsl-mc devices
  ...
2020-10-22 13:00:44 -07:00
Xiaoyang Xu
2e6cfd496f vfio iommu type1: Fix memory leak in vfio_iommu_type1_pin_pages
pfn is not added to pfn_list when vfio_add_to_pfn_list fails.
vfio_unpin_page_external will exit directly without calling
vfio_iova_put_vfio_pfn.  This will lead to a memory leak.

Fixes: a54eb55045 ("vfio iommu type1: Add support for mediated devices")
Signed-off-by: Xiaoyang Xu <xuxiaoyang2@huawei.com>
[aw: simplified logic, add Fixes]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-10-20 10:12:17 -06:00
Alex Williamson
852b1beecb vfio/pci: Clear token on bypass registration failure
The eventfd context is used as our irqbypass token, therefore if an
eventfd is re-used, our token is the same.  The irqbypass code will
return an -EBUSY in this case, but we'll still attempt to unregister
the producer, where if that duplicate token still exists, results in
removing the wrong object.  Clear the token of failed producers so
that they harmlessly fall out when unregistered.

Fixes: 6d7425f109 ("vfio: Register/unregister irq_bypass_producer")
Reported-by: guomin chen <guomin_chen@sina.com>
Tested-by: guomin chen <guomin_chen@sina.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-10-19 07:13:55 -06:00
Diana Craciun
822e1a90af vfio/fsl-mc: fix the return of the uninitialized variable ret
The vfio_fsl_mc_reflck_attach function may return, on success path,
an uninitialized variable. Fix the problem by initializing the return
variable to 0.

Addresses-Coverity: ("Uninitialized scalar variable")
Fixes: f2ba7e8c94 ("vfio/fsl-mc: Added lock support in preparation for interrupt handling")
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-10-19 07:09:41 -06:00
Jann Horn
4d45e75a99 mm: remove the now-unnecessary mmget_still_valid() hack
The preceding patches have ensured that core dumping properly takes the
mmap_lock.  Thanks to that, we can now remove mmget_still_valid() and all
its users.

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Link: http://lkml.kernel.org/r/20200827114932.3572699-8-jannh@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16 11:11:22 -07:00
Diana Craciun
159246378d vfio/fsl-mc: Fix the dead code in vfio_fsl_mc_set_irq_trigger
Static analysis discovered that some code in vfio_fsl_mc_set_irq_trigger
is dead code. Fixed the code by changing the conditions order.

Fixes: cc0ee20bd9 ("vfio/fsl-mc: trigger an interrupt via eventfd")
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-10-15 12:46:08 -06:00
Diana Craciun
83e491799e vfio/fsl-mc: Fixed vfio-fsl-mc driver compilation on 32 bit
The FSL_MC_BUS on which the VFIO-FSL-MC driver is dependent on
can be compiled on other architectures as well (not only ARM64)
including 32 bit architectures.
Include linux/io-64-nonatomic-hi-lo.h to make writeq/readq used
in the driver available on 32bit platforms.

Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-10-13 11:12:29 -06:00
Alex Williamson
2099363255 Merge branches 'v5.10/vfio/fsl-mc-v6' and 'v5.10/vfio/zpci-info-v3' into v5.10/vfio/next 2020-10-12 11:41:02 -06:00
Matthew Rosato
e6b817d4b8 vfio-pci/zdev: Add zPCI capabilities to VFIO_DEVICE_GET_INFO
Define a new configuration entry VFIO_PCI_ZDEV for VFIO/PCI.

When this s390-only feature is configured we add capabilities to the
VFIO_DEVICE_GET_INFO ioctl that describe features of the associated
zPCI device and its underlying hardware.

This patch is based on work previously done by Pierre Morel.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-10-12 11:37:59 -06:00
Diana Craciun
ac93ab2bf6 vfio/fsl-mc: Add support for device reset
Currently only resetting the DPRC container is supported which
will reset all the objects inside it. Resetting individual
objects is possible from the userspace by issueing commands
towards MC firmware.

Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-10-12 11:33:48 -06:00
Diana Craciun
1bb141ed5e vfio/fsl-mc: Add read/write support for fsl-mc devices
The software uses a memory-mapped I/O command interface (MC portals) to
communicate with the MC hardware. This command interface is used to
discover, enumerate, configure and remove DPAA2 objects. The DPAA2
objects use MSIs, so the command interface needs to be emulated
such that the correct MSI is configured in the hardware (the guest
has the virtual MSIs).

This patch is adding read/write support for fsl-mc devices. The mc
commands are emulated by the userspace. The host is just passing
the correct command to the hardware.

Also the current patch limits userspace to write complete
64byte command once and read 64byte response by one ioctl.

Signed-off-by: Bharat Bhushan <Bharat.Bhushan@nxp.com>
Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-10-12 11:33:27 -06:00
Diana Craciun
cc0ee20bd9 vfio/fsl-mc: trigger an interrupt via eventfd
This patch allows to set an eventfd for fsl-mc device interrupts
and also to trigger the interrupt eventfd from userspace for testing.

All fsl-mc device interrupts are MSIs. The MSIs are allocated from
the MSI domain only once per DPRC and used by all the DPAA2 objects.
The interrupts are managed by the DPRC in a pool of interrupts. Each
device requests interrupts from this pool. The pool is allocated
when the first virtual device is setting the interrupts.
The pool of interrupts is protected by a lock.

The DPRC has an interrupt of its own which indicates if the DPRC
contents have changed. However, currently, the contents of a DPRC
assigned to the guest cannot be changed at runtime, so this interrupt
is not configured.

Signed-off-by: Bharat Bhushan <Bharat.Bhushan@nxp.com>
Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-10-12 11:33:15 -06:00
Diana Craciun
2e0d29561f vfio/fsl-mc: Add irq infrastructure for fsl-mc devices
This patch adds the skeleton for interrupt support
for fsl-mc devices. The interrupts are not yet functional,
the functionality will be added by subsequent patches.

Signed-off-by: Bharat Bhushan <Bharat.Bhushan@nxp.com>
Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-10-12 11:33:06 -06:00
Diana Craciun
f2ba7e8c94 vfio/fsl-mc: Added lock support in preparation for interrupt handling
Only the DPRC object allocates interrupts from the MSI
interrupt domain. The interrupts are managed by the DPRC in
a pool of interrupts. The access to this pool of interrupts
has to be protected with a lock.
This patch extends the current lock implementation to have a
lock per DPRC.

Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-10-12 11:32:49 -06:00
Diana Craciun
6724728968 vfio/fsl-mc: Allow userspace to MMAP fsl-mc device MMIO regions
Allow userspace to mmap device regions for direct access of
fsl-mc devices.

Signed-off-by: Bharat Bhushan <Bharat.Bhushan@nxp.com>
Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-10-12 11:32:37 -06:00
Diana Craciun
df747bcd5b vfio/fsl-mc: Implement VFIO_DEVICE_GET_REGION_INFO ioctl call
Expose to userspace information about the memory regions.

Signed-off-by: Bharat Bhushan <Bharat.Bhushan@nxp.com>
Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-10-12 11:32:24 -06:00
Diana Craciun
f97f4c04e5 vfio/fsl-mc: Implement VFIO_DEVICE_GET_INFO ioctl
Allow userspace to get fsl-mc device info (number of regions
and irqs).

Signed-off-by: Bharat Bhushan <Bharat.Bhushan@nxp.com>
Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-10-12 11:32:19 -06:00
Diana Craciun
704f5082d8 vfio/fsl-mc: Scan DPRC objects on vfio-fsl-mc driver bind
The DPRC (Data Path Resource Container) device is a bus device and has
child devices attached to it. When the vfio-fsl-mc driver is probed
the DPRC is scanned and the child devices discovered and initialized.

Signed-off-by: Bharat Bhushan <Bharat.Bhushan@nxp.com>
Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-10-12 11:32:03 -06:00
Bharat Bhushan
fb1ff4c194 vfio/fsl-mc: Add VFIO framework skeleton for fsl-mc devices
DPAA2 (Data Path Acceleration Architecture) consists in
mechanisms for processing Ethernet packets, queue management,
accelerators, etc.

The Management Complex (mc) is a hardware entity that manages the DPAA2
hardware resources. It provides an object-based abstraction for software
drivers to use the DPAA2 hardware. The MC mediates operations such as
create, discover, destroy of DPAA2 objects.
The MC provides memory-mapped I/O command interfaces (MC portals) which
DPAA2 software drivers use to operate on DPAA2 objects.

A DPRC is a container object that holds other types of DPAA2 objects.
Each object in the DPRC is a Linux device and bound to a driver.
The MC-bus driver is a platform driver (different from PCI or platform
bus). The DPRC driver does runtime management of a bus instance. It
performs the initial scan of the DPRC and handles changes in the DPRC
configuration (adding/removing objects).

All objects inside a container share the same hardware isolation
context, meaning that only an entire DPRC can be assigned to
a virtual machine.
When a container is assigned to a virtual machine, all the objects
within that container are assigned to that virtual machine.
The DPRC container assigned to the virtual machine is not allowed
to change contents (add/remove objects) by the guest. The restriction
is set by the host and enforced by the mc hardware.

The DPAA2 objects can be directly assigned to the guest. However
the MC portals (the memory mapped command interface to the MC) need
to be emulated because there are commands that configure the
interrupts and the isolation IDs which are virtual in the guest.

Example:
echo vfio-fsl-mc > /sys/bus/fsl-mc/devices/dprc.2/driver_override
echo dprc.2 > /sys/bus/fsl-mc/drivers/vfio-fsl-mc/bind

The dprc.2 is bound to the VFIO driver and all the objects within
dprc.2 are going to be bound to the VFIO driver.

This patch adds the infrastructure for VFIO support for fsl-mc
devices. Subsequent patches will add support for binding and secure
assigning these devices using VFIO.

More details about the DPAA2 objects can be found here:
Documentation/networking/device_drivers/freescale/dpaa2/overview.rst

Signed-off-by: Bharat Bhushan <Bharat.Bhushan@nxp.com>
Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-10-07 14:17:33 -06:00
Alex Williamson
3de066f8f8 Merge branches 'v5.10/vfio/bardirty', 'v5.10/vfio/dma_avail', 'v5.10/vfio/misc', 'v5.10/vfio/no-cmd-mem' and 'v5.10/vfio/yan_zhao_fixes' into v5.10/vfio/next 2020-09-22 10:56:51 -06:00
Yan Zhao
2c5af98592 vfio/type1: fix dirty bitmap calculation in vfio_dma_rw
The count of dirtied pages is not only determined by count of copied
pages, but also by the start offset.

e.g. if offset = PAGE_SIZE - 1, and *copied=2, the dirty pages count
is 2, instead of 1 or 0.

Fixes: d6a4c18566 ("vfio iommu: Implementation of ioctl for dirty pages tracking")
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-09-22 10:56:41 -06:00
Yan Zhao
28b1302440 vfio: fix a missed vfio group put in vfio_pin_pages
When error occurs, need to put vfio group after a successful get.

Fixes: 95fc87b441 ("vfio: Selective dirty page tracking if IOMMU backed device pins pages")
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-09-22 10:56:40 -06:00
Matthew Rosato
515ecd5368 vfio/pci: Decouple PCI_COMMAND_MEMORY bit checks from is_virtfn
While it is true that devices with is_virtfn=1 will have a Memory Space
Enable bit that is hard-wired to 0, this is not the only case where we
see this behavior -- For example some bare-metal hypervisors lack
Memory Space Enable bit emulation for devices not setting is_virtfn
(s390). Fix this by instead checking for the newly-added
no_command_memory bit which directly denotes the need for
PCI_COMMAND_MEMORY emulation in vfio.

Fixes: abafbc551f ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory")
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-09-22 10:52:24 -06:00
Matthew Rosato
7d6e132965 vfio iommu: Add dma available capability
Commit 492855939b ("vfio/type1: Limit DMA mappings per container")
added the ability to limit the number of memory backed DMA mappings.
However on s390x, when lazy mapping is in use, we use a very large
number of concurrent mappings.  Let's provide the current allowable
number of DMA mappings to userspace via the IOMMU info chain so that
userspace can take appropriate mitigation.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-09-21 14:58:34 -06:00
Yan Zhao
7ef32e5236 vfio: add a singleton check for vfio_group_pin_pages
Page pinning is used both to translate and pin device mappings for DMA
purpose, as well as to indicate to the IOMMU backend to limit the dirty
page scope to those pages that have been pinned, in the case of an IOMMU
backed device.
To support this, the vfio_pin_pages() interface limits itself to only
singleton groups such that the IOMMU backend can consider dirty page
scope only at the group level.  Implement the same requirement for the
vfio_group_pin_pages() interface.

Fixes: 95fc87b441 ("vfio: Selective dirty page tracking if IOMMU backed device pins pages")
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-09-21 14:50:51 -06:00
Zenghui Yu
1c0f68252a vfio/pci: Don't regenerate vconfig for all BARs if !bardirty
Now we regenerate vconfig for all the BARs via vfio_bar_fixup(), every
time any offset of any of them are read.  Though BARs aren't re-read
regularly, the regeneration can be avoided if no BARs had been written
since they were last read, in which case vdev->bardirty is false.

Let's return immediately in vfio_bar_fixup() if bardirty is false.

Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-09-21 14:08:12 -06:00
Zenghui Yu
eac7cc21c4 vfio/pci: Remove redundant declaration of vfio_pci_driver
It was added by commit 137e553135 ("vfio/pci: Add sriov_configure
support") but duplicates a forward declaration earlier in the file.

Remove it.

Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-09-21 14:07:05 -06:00
Tom Murphy
aae4c8e27b iommu: Rename iommu_tlb_* functions to iommu_iotlb_*
To keep naming consistent we should stick with *iotlb*. This patch
renames a few remaining functions.

Signed-off-by: Tom Murphy <murphyt7@tcd.ie>
Link: https://lore.kernel.org/r/20200817210051.13546-1-murphyt7@tcd.ie
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-09-04 11:16:09 +02:00
Gustavo A. R. Silva
df561f6688 treewide: Use fallthrough pseudo-keyword
Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-08-23 17:36:59 -05:00
Alex Williamson
aae7a75a82 vfio/type1: Add proper error unwind for vfio_iommu_replay()
The vfio_iommu_replay() function does not currently unwind on error,
yet it does pin pages, perform IOMMU mapping, and modify the vfio_dma
structure to indicate IOMMU mapping.  The IOMMU mappings are torn down
when the domain is destroyed, but the other actions go on to cause
trouble later.  For example, the iommu->domain_list can be empty if we
only have a non-IOMMU backed mdev attached.  We don't currently check
if the list is empty before getting the first entry in the list, which
leads to a bogus domain pointer.  If a vfio_dma entry is erroneously
marked as iommu_mapped, we'll attempt to use that bogus pointer to
retrieve the existing physical page addresses.

This is the scenario that uncovered this issue, attempting to hot-add
a vfio-pci device to a container with an existing mdev device and DMA
mappings, one of which could not be pinned, causing a failure adding
the new group to the existing container and setting the conditions
for a subsequent attempt to explode.

To resolve this, we can first check if the domain_list is empty so
that we can reject replay of a bogus domain, should we ever encounter
this inconsistent state again in the future.  The real fix though is
to add the necessary unwind support, which means cleaning up the
current pinning if an IOMMU mapping fails, then walking back through
the r-b tree of DMA entries, reading from the IOMMU which ranges are
mapped, and unmapping and unpinning those ranges.  To be able to do
this, we also defer marking the DMA entry as IOMMU mapped until all
entries are processed, in order to allow the unwind to know the
disposition of each entry.

Fixes: a54eb55045 ("vfio iommu type1: Add support for mediated devices")
Reported-by: Zhiyi Guo <zhguo@redhat.com>
Tested-by: Zhiyi Guo <zhguo@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-08-17 11:09:13 -06:00
Alex Williamson
bc93b9ae01 vfio-pci: Avoid recursive read-lock usage
A down_read on memory_lock is held when performing read/write accesses
to MMIO BAR space, including across the copy_to/from_user() callouts
which may fault.  If the user buffer for these copies resides in an
mmap of device MMIO space, the mmap fault handler will acquire a
recursive read-lock on memory_lock.  Avoid this by reducing the lock
granularity.  Sequential accesses requiring multiple ioread/iowrite
cycles are expected to be rare, therefore typical accesses should not
see additional overhead.

VGA MMIO accesses are expected to be non-fatal regardless of the PCI
memory enable bit to allow legacy probing, this behavior remains with
a comment added.  ioeventfds are now included in memory access testing,
with writes dropped while memory space is disabled.

Fixes: abafbc551f ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory")
Reported-by: Zhiyi Guo <zhguo@redhat.com>
Tested-by: Zhiyi Guo <zhguo@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-08-17 11:08:18 -06:00
Linus Torvalds
407bc8d818 VFIO updates for v5.9-rc1
- Inclusive naming updates (Alex Williamson)
 
  - Intel X550 INTx quirk (Alex Williamson)
 
  - Error path resched between unmaps (Xiang Zheng)
 
  - SPAPR IOMMU pin_user_pages() conversion (John Hubbard)
 
  - Trivial mutex simplification (Alex Williamson)
 
  - QAT device denylist (Giovanni Cabiddu)
 
  - type1 IOMMU ioctl refactor (Liu Yi L)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJfM1LBAAoJECObm247sIsiF1oP/RJOlmwKf0mmq8iP0j5yjCis
 gjvAAvmD4h3y15yx0kOoY8BNJjyHT26iuqnCHbEhaO+u+lNt3plWNJG9FRpuZJJY
 To6fsj6SApC3BnI3lxhduvQa29dcflbyohzjF2/+QJLbt4PJoRA5GlAEcI4rIUE0
 JfxquYE7n++h4of/532392TmJpMZeW+9IScF2TzPCIYMCw4DwUYZRcfm0UePkv0D
 nQOIyyeyzFhBhoG45TqQr4yp8KBXQvrBViwkuIX+4P0JnRBI/h8LLcDu9TCGAo87
 mkF/AZfuq7q5X6hJwbNavk9YeksNWeIM5JaYyJeBd42a3j3uz6iUe9ENUMQpm7qe
 6SGU3V7xVAPokE5q9VmjlSJz0yUgopqsTrD1EBP5DrQGIBVBAkFPHf/bOPE8DrAN
 htUNX9Zl/TuNPBmEBursgot9qMvgdwA1f2uOPtiB2zvf44qPgoGbT16k48xNK+x7
 kWWVlgUYX3yIi9NI0EnlEaTMA2DGL3xY5/ZLjySojWrtu5AspTuBlJZaT1jPKAAF
 OmKqJFCIIRMqrWPdg8r9P7k6nLcW0nI6hIMhxqRVgqp18PYLojPsaeDk/ndQv8Lx
 D0IafrruF932XGRFjuo2d5vUar4pJcHXqwbeQbf6mWRIeprrBYWm7nVsU/ZIWIMU
 lCtj3NQtShAIGARZUZK9
 =qsYH
 -----END PGP SIGNATURE-----

Merge tag 'vfio-v5.9-rc1' of git://github.com/awilliam/linux-vfio

Pull VFIO updates from Alex Williamson:

 - Inclusive naming updates (Alex Williamson)

 - Intel X550 INTx quirk (Alex Williamson)

 - Error path resched between unmaps (Xiang Zheng)

 - SPAPR IOMMU pin_user_pages() conversion (John Hubbard)

 - Trivial mutex simplification (Alex Williamson)

 - QAT device denylist (Giovanni Cabiddu)

 - type1 IOMMU ioctl refactor (Liu Yi L)

* tag 'vfio-v5.9-rc1' of git://github.com/awilliam/linux-vfio:
  vfio/type1: Refactor vfio_iommu_type1_ioctl()
  vfio/pci: Add QAT devices to denylist
  vfio/pci: Add device denylist
  PCI: Add Intel QuickAssist device IDs
  vfio/pci: Hold igate across releasing eventfd contexts
  vfio/spapr_tce: convert get_user_pages() --> pin_user_pages()
  vfio/type1: Add conditional rescheduling after iommu map failed
  vfio/pci: Add Intel X550 to hidden INTx devices
  vfio: Cleanup allowed driver naming
2020-08-12 12:09:36 -07:00
Peter Xu
64019a2e46 mm/gup: remove task_struct pointer for all gup code
After the cleanup of page fault accounting, gup does not need to pass
task_struct around any more.  Remove that parameter in the whole gup
stack.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Link: http://lkml.kernel.org/r/20200707225021.200906-26-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-12 10:58:04 -07:00
Liu Yi L
ccd59dce1a vfio/type1: Refactor vfio_iommu_type1_ioctl()
This patch refactors the vfio_iommu_type1_ioctl() to use switch instead of
if-else, and each command got a helper function.

Cc: Kevin Tian <kevin.tian@intel.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-07-27 13:46:13 -06:00
Giovanni Cabiddu
50173329c8 vfio/pci: Add QAT devices to denylist
The current generation of Intel® QuickAssist Technology devices
are not designed to run in an untrusted environment because of the
following issues reported in the document "Intel® QuickAssist Technology
(Intel® QAT) Software for Linux" (document number 336211-014):

QATE-39220 - GEN - Intel® QAT API submissions with bad addresses that
             trigger DMA to invalid or unmapped addresses can cause a
             platform hang
QATE-7495  - GEN - An incorrectly formatted request to Intel® QAT can
             hang the entire Intel® QAT Endpoint

The document is downloadable from https://01.org/intel-quickassist-technology
at the following link:
https://01.org/sites/default/files/downloads/336211-014-qatforlinux-releasenotes-hwv1.7_0.pdf

This patch adds the following QAT devices to the denylist: DH895XCC,
C3XXX and C62X.

Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Fiona Trahe <fiona.trahe@intel.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-07-27 13:43:40 -06:00
Giovanni Cabiddu
1f97970e6c vfio/pci: Add device denylist
Add denylist of devices that by default are not probed by vfio-pci.
Devices in this list may be susceptible to untrusted application, even
if the IOMMU is enabled. To be accessed via vfio-pci, the user has to
explicitly disable the denylist.

The denylist can be disabled via the module parameter disable_denylist.

Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Fiona Trahe <fiona.trahe@intel.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-07-27 13:43:40 -06:00
Alex Williamson
924b51abf9 vfio/pci: Hold igate across releasing eventfd contexts
No need to release and immediately re-acquire igate while clearing
out the eventfd ctxs.

Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-07-27 13:43:38 -06:00
John Hubbard
9d532f2869 vfio/spapr_tce: convert get_user_pages() --> pin_user_pages()
This code was using get_user_pages*(), in a "Case 2" scenario
(DMA/RDMA), using the categorization from [1]. That means that it's
time to convert the get_user_pages*() + put_page() calls to
pin_user_pages*() + unpin_user_pages() calls.

There is some helpful background in [2]: basically, this is a small
part of fixing a long-standing disconnect between pinning pages, and
file systems' use of those pages.

[1] Documentation/core-api/pin_user_pages.rst

[2] "Explicit pinning of user-space pages":
    https://lwn.net/Articles/807108/

Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: kvm@vger.kernel.org
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-07-27 13:43:38 -06:00
Xiang Zheng
e1907d6752 vfio/type1: Add conditional rescheduling after iommu map failed
Commit c5e6688752 ("vfio/type1: Add conditional rescheduling")
missed a "cond_resched()" in vfio_iommu_map if iommu map failed.

This is a very tiny optimization and the case can hardly happen.

Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-07-27 13:43:37 -06:00
Alex Williamson
bf3551e150 vfio/pci: Add Intel X550 to hidden INTx devices
Intel document 333717-008, "Intel® Ethernet Controller X550
Specification Update", version 2.7, dated June 2020, includes errata
#22, added in version 2.1, May 2016, indicating X550 NICs suffer from
the same implementation deficiency as the 700-series NICs:

"The Interrupt Status bit in the Status register of the PCIe
 configuration space is not implemented and is not set as described
 in the PCIe specification."

Without the interrupt status bit, vfio-pci cannot determine when
these devices signal INTx.  They are therefore added to the nointx
quirk.

Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-07-27 13:43:37 -06:00
Alex Williamson
26afdd9882 vfio: Cleanup allowed driver naming
No functional change, avoid non-inclusive naming schemes.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-07-27 13:43:36 -06:00
Zeng Tao
b872d06408 vfio/pci: fix racy on error and request eventfd ctx
The vfio_pci_release call will free and clear the error and request
eventfd ctx while these ctx could be in use at the same time in the
function like vfio_pci_request, and it's expected to protect them under
the vdev->igate mutex, which is missing in vfio_pci_release.

This issue is introduced since commit 1518ac272e ("vfio/pci: fix memory
leaks of eventfd ctx"),and since commit 5c5866c593 ("vfio/pci: Clear
error and request eventfd ctx after releasing"), it's very easily to
trigger the kernel panic like this:

[ 9513.904346] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[ 9513.913091] Mem abort info:
[ 9513.915871]   ESR = 0x96000006
[ 9513.918912]   EC = 0x25: DABT (current EL), IL = 32 bits
[ 9513.924198]   SET = 0, FnV = 0
[ 9513.927238]   EA = 0, S1PTW = 0
[ 9513.930364] Data abort info:
[ 9513.933231]   ISV = 0, ISS = 0x00000006
[ 9513.937048]   CM = 0, WnR = 0
[ 9513.940003] user pgtable: 4k pages, 48-bit VAs, pgdp=0000007ec7d12000
[ 9513.946414] [0000000000000008] pgd=0000007ec7d13003, p4d=0000007ec7d13003, pud=0000007ec728c003, pmd=0000000000000000
[ 9513.956975] Internal error: Oops: 96000006 [#1] PREEMPT SMP
[ 9513.962521] Modules linked in: vfio_pci vfio_virqfd vfio_iommu_type1 vfio hclge hns3 hnae3 [last unloaded: vfio_pci]
[ 9513.972998] CPU: 4 PID: 1327 Comm: bash Tainted: G        W         5.8.0-rc4+ #3
[ 9513.980443] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V3.B270.01 05/08/2020
[ 9513.989274] pstate: 80400089 (Nzcv daIf +PAN -UAO BTYPE=--)
[ 9513.994827] pc : _raw_spin_lock_irqsave+0x48/0x88
[ 9513.999515] lr : eventfd_signal+0x6c/0x1b0
[ 9514.003591] sp : ffff800038a0b960
[ 9514.006889] x29: ffff800038a0b960 x28: ffff007ef7f4da10
[ 9514.012175] x27: ffff207eefbbfc80 x26: ffffbb7903457000
[ 9514.017462] x25: ffffbb7912191000 x24: ffff007ef7f4d400
[ 9514.022747] x23: ffff20be6e0e4c00 x22: 0000000000000008
[ 9514.028033] x21: 0000000000000000 x20: 0000000000000000
[ 9514.033321] x19: 0000000000000008 x18: 0000000000000000
[ 9514.038606] x17: 0000000000000000 x16: ffffbb7910029328
[ 9514.043893] x15: 0000000000000000 x14: 0000000000000001
[ 9514.049179] x13: 0000000000000000 x12: 0000000000000002
[ 9514.054466] x11: 0000000000000000 x10: 0000000000000a00
[ 9514.059752] x9 : ffff800038a0b840 x8 : ffff007ef7f4de60
[ 9514.065038] x7 : ffff007fffc96690 x6 : fffffe01faffb748
[ 9514.070324] x5 : 0000000000000000 x4 : 0000000000000000
[ 9514.075609] x3 : 0000000000000000 x2 : 0000000000000001
[ 9514.080895] x1 : ffff007ef7f4d400 x0 : 0000000000000000
[ 9514.086181] Call trace:
[ 9514.088618]  _raw_spin_lock_irqsave+0x48/0x88
[ 9514.092954]  eventfd_signal+0x6c/0x1b0
[ 9514.096691]  vfio_pci_request+0x84/0xd0 [vfio_pci]
[ 9514.101464]  vfio_del_group_dev+0x150/0x290 [vfio]
[ 9514.106234]  vfio_pci_remove+0x30/0x128 [vfio_pci]
[ 9514.111007]  pci_device_remove+0x48/0x108
[ 9514.115001]  device_release_driver_internal+0x100/0x1b8
[ 9514.120200]  device_release_driver+0x28/0x38
[ 9514.124452]  pci_stop_bus_device+0x68/0xa8
[ 9514.128528]  pci_stop_and_remove_bus_device+0x20/0x38
[ 9514.133557]  pci_iov_remove_virtfn+0xb4/0x128
[ 9514.137893]  sriov_disable+0x3c/0x108
[ 9514.141538]  pci_disable_sriov+0x28/0x38
[ 9514.145445]  hns3_pci_sriov_configure+0x48/0xb8 [hns3]
[ 9514.150558]  sriov_numvfs_store+0x110/0x198
[ 9514.154724]  dev_attr_store+0x44/0x60
[ 9514.158373]  sysfs_kf_write+0x5c/0x78
[ 9514.162018]  kernfs_fop_write+0x104/0x210
[ 9514.166010]  __vfs_write+0x48/0x90
[ 9514.169395]  vfs_write+0xbc/0x1c0
[ 9514.172694]  ksys_write+0x74/0x100
[ 9514.176079]  __arm64_sys_write+0x24/0x30
[ 9514.179987]  el0_svc_common.constprop.4+0x110/0x200
[ 9514.184842]  do_el0_svc+0x34/0x98
[ 9514.188144]  el0_svc+0x14/0x40
[ 9514.191185]  el0_sync_handler+0xb0/0x2d0
[ 9514.195088]  el0_sync+0x140/0x180
[ 9514.198389] Code: b9001020 d2800000 52800022 f9800271 (885ffe61)
[ 9514.204455] ---[ end trace 648de00c8406465f ]---
[ 9514.212308] note: bash[1327] exited with preempt_count 1

Cc: Qian Cai <cai@lca.pw>
Cc: Alex Williamson <alex.williamson@redhat.com>
Fixes: 1518ac272e ("vfio/pci: fix memory leaks of eventfd ctx")
Signed-off-by: Zeng Tao <prime.zeng@hisilicon.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-07-17 08:28:40 -06:00
Alex Williamson
ebfa440ce3 vfio/pci: Fix SR-IOV VF handling with MMIO blocking
SR-IOV VFs do not implement the memory enable bit of the command
register, therefore this bit is not set in config space after
pci_enable_device().  This leads to an unintended difference
between PF and VF in hand-off state to the user.  We can correct
this by setting the initial value of the memory enable bit in our
virtualized config space.  There's really no need however to
ever fault a user on a VF though as this would only indicate an
error in the user's management of the enable bit, versus a PF
where the same access could trigger hardware faults.

Fixes: abafbc551f ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory")
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-06-25 11:04:23 -06:00
Alex Williamson
5c5866c593 vfio/pci: Clear error and request eventfd ctx after releasing
The next use of the device will generate an underflow from the
stale reference.

Cc: Qian Cai <cai@lca.pw>
Fixes: 1518ac272e ("vfio/pci: fix memory leaks of eventfd ctx")
Reported-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Tested-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-06-17 15:18:42 -06:00
Christoph Hellwig
f5678e7f2a kernel: better document the use_mm/unuse_mm API contract
Switch the function documentation to kerneldoc comments, and add
WARN_ON_ONCE asserts that the calling thread is a kernel thread and does
not have ->mm set (or has ->mm set in the case of unuse_mm).

Also give the functions a kthread_ prefix to better document the use case.

[hch@lst.de: fix a comment typo, cover the newly merged use_mm/unuse_mm caller in vfio]
  Link: http://lkml.kernel.org/r/20200416053158.586887-3-hch@lst.de
[sfr@canb.auug.org.au: powerpc/vas: fix up for {un}use_mm() rename]
  Link: http://lkml.kernel.org/r/20200422163935.5aa93ba5@canb.auug.org.au

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [usb]
Acked-by: Haren Myneni <haren@linux.ibm.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: Jason Wang <jasowang@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Link: http://lkml.kernel.org/r/20200404094101.672954-6-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-10 19:14:18 -07:00
Christoph Hellwig
4dbe59a6ae kernel: move use_mm/unuse_mm to kthread.c
cover the newly merged use_mm/unuse_mm caller in vfio

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Link: http://lkml.kernel.org/r/20200416053158.586887-2-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-10 19:14:18 -07:00
Michel Lespinasse
c1e8d7c6a7 mmap locking API: convert mmap_sem comments
Convert comments that reference mmap_sem to reference mmap_lock instead.

[akpm@linux-foundation.org: fix up linux-next leftovers]
[akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil]
[akpm@linux-foundation.org: more linux-next fixups, per Michel]

Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ying Han <yinghan@google.com>
Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:14 -07:00
Michel Lespinasse
89154dd531 mmap locking API: convert mmap_sem call sites missed by coccinelle
Convert the last few remaining mmap_sem rwsem calls to use the new mmap
locking API.  These were missed by coccinelle for some reason (I think
coccinelle does not support some of the preprocessor constructs in these
files ?)

[akpm@linux-foundation.org: convert linux-next leftovers]
[akpm@linux-foundation.org: more linux-next leftovers]
[akpm@linux-foundation.org: more linux-next leftovers]

Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ying Han <yinghan@google.com>
Link: http://lkml.kernel.org/r/20200520052908.204642-6-walken@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:14 -07:00
Michel Lespinasse
d8ed45c5dc mmap locking API: use coccinelle to convert mmap_sem rwsem call sites
This change converts the existing mmap_sem rwsem calls to use the new mmap
locking API instead.

The change is generated using coccinelle with the following rule:

// spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir .

@@
expression mm;
@@
(
-init_rwsem
+mmap_init_lock
|
-down_write
+mmap_write_lock
|
-down_write_killable
+mmap_write_lock_killable
|
-down_write_trylock
+mmap_write_trylock
|
-up_write
+mmap_write_unlock
|
-downgrade_write
+mmap_write_downgrade
|
-down_read
+mmap_read_lock
|
-down_read_killable
+mmap_read_lock_killable
|
-down_read_trylock
+mmap_read_trylock
|
-up_read
+mmap_read_unlock
)
-(&mm->mmap_sem)
+(mm)

Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ying Han <yinghan@google.com>
Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:14 -07:00
Linus Torvalds
5a36f0f3f5 VFIO updates for v5.8-rc1
- Block accesses to disabled MMIO space (Alex Williamson)
 
  - VFIO device migration API (Kirti Wankhede)
 
  - type1 IOMMU dirty bitmap API and implementation (Kirti Wankhede)
 
  - PCI NULL capability masking (Alex Williamson)
 
  - Memory leak fixes (Qian Cai)
 
  - Reference leak fix (Qiushi Wu)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJe19RZAAoJECObm247sIsiBTEP/1DsA/mL/eDR94Hkmz3y1qoX
 tkrIJuM5xo95uk6DsbXDikbjM2jpUmvDUiRilara25IceuVX/tKgTjgxNOWJd+h8
 KHBoPM0bveILp+Hp6r6ZAEfhZLQ3P6A5XBTB6ql0ebveadyt13TP9Cr0jCf4Jxr2
 2A7wZAexWN8P4YBXBqrW8X3ZF3V9rfWYOJ4eILRvEplJ/TkqsGJyKW8LKP45waUF
 wutN6yh7irnWQbbUO6hb/eTIO7usrNB2L4A5CJMSZIm8JZYYmFQOrn73Ry3bA3Pg
 U+v3xIRU5U1ed/1DeLs8lRDDxImmkSXbVga0N+e3qnKtDIBh7/i9yNT9hQjhG9Ba
 C3IQoOM0Fj03hK/UahNerN7FKYoj+JvtdSKeTSNFXwdp/JUSXjj1tLouxUBSFsgD
 P15CMPXF/U6rT3nQ7/3z3JFKft/iSf6ShjXeNJFqhwH+lKsj13kF2+py3+vgayl8
 zLOqeLdRkQwXz73UStX+hrc7MmNYZVRxJ0uf3XXq3/CZ1gwiinIQqxDF3ry+pQ6w
 vXB8j5XJqsOguoAU7trTdvkKpfoJd4yY0D6aNN9F3JN8xkEBbfm6xZ7Hzur2JI6r
 iCxUHcS4XrXA/PRlTFupg8451CefuPJa2UkAWSdk182hO2R41l458Q6b7lIMqhOQ
 oP04S8VYvvDIw5JVnPWU
 =KSAE
 -----END PGP SIGNATURE-----

Merge tag 'vfio-v5.8-rc1' of git://github.com/awilliam/linux-vfio

Pull VFIO updates from Alex Williamson:

 - Block accesses to disabled MMIO space (Alex Williamson)

 - VFIO device migration API (Kirti Wankhede)

 - type1 IOMMU dirty bitmap API and implementation (Kirti Wankhede)

 - PCI NULL capability masking (Alex Williamson)

 - Memory leak fixes (Qian Cai)

 - Reference leak fix (Qiushi Wu)

* tag 'vfio-v5.8-rc1' of git://github.com/awilliam/linux-vfio:
  vfio iommu: typecast corrections
  vfio iommu: Use shift operation for 64-bit integer division
  vfio/mdev: Fix reference count leak in add_mdev_supported_type
  vfio: Selective dirty page tracking if IOMMU backed device pins pages
  vfio iommu: Add migration capability to report supported features
  vfio iommu: Update UNMAP_DMA ioctl to get dirty bitmap before unmap
  vfio iommu: Implementation of ioctl for dirty pages tracking
  vfio iommu: Add ioctl definition for dirty pages tracking
  vfio iommu: Cache pgsize_bitmap in struct vfio_iommu
  vfio iommu: Remove atomicity of ref_count of pinned pages
  vfio: UAPI for migration interface for device state
  vfio/pci: fix memory leaks of eventfd ctx
  vfio/pci: fix memory leaks in alloc_perm_bits()
  vfio-pci: Mask cap zero
  vfio-pci: Invalidate mmaps and block MMIO access on disabled memory
  vfio-pci: Fault mmaps to enable vma tracking
  vfio/type1: Support faulting PFNMAP vmas
2020-06-05 13:51:49 -07:00
Linus Torvalds
7ae77150d9 powerpc updates for 5.8
- Support for userspace to send requests directly to the on-chip GZIP
    accelerator on Power9.
 
  - Rework of our lockless page table walking (__find_linux_pte()) to make it
    safe against parallel page table manipulations without relying on an IPI for
    serialisation.
 
  - A series of fixes & enhancements to make our machine check handling more
    robust.
 
  - Lots of plumbing to add support for "prefixed" (64-bit) instructions on
    Power10.
 
  - Support for using huge pages for the linear mapping on 8xx (32-bit).
 
  - Remove obsolete Xilinx PPC405/PPC440 support, and an associated sound driver.
 
  - Removal of some obsolete 40x platforms and associated cruft.
 
  - Initial support for booting on Power10.
 
  - Lots of other small features, cleanups & fixes.
 
 Thanks to:
   Alexey Kardashevskiy, Alistair Popple, Andrew Donnellan, Andrey Abramov,
   Aneesh Kumar K.V, Balamuruhan S, Bharata B Rao, Bulent Abali, Cédric Le
   Goater, Chen Zhou, Christian Zigotzky, Christophe JAILLET, Christophe Leroy,
   Dmitry Torokhov, Emmanuel Nicolet, Erhard F., Gautham R. Shenoy, Geoff Levand,
   George Spelvin, Greg Kurz, Gustavo A. R. Silva, Gustavo Walbon, Haren Myneni,
   Hari Bathini, Joel Stanley, Jordan Niethe, Kajol Jain, Kees Cook, Leonardo
   Bras, Madhavan Srinivasan., Mahesh Salgaonkar, Markus Elfring, Michael
   Neuling, Michal Simek, Nathan Chancellor, Nathan Lynch, Naveen N. Rao,
   Nicholas Piggin, Oliver O'Halloran, Paul Mackerras, Pingfan Liu, Qian Cai, Ram
   Pai, Raphael Moreira Zinsly, Ravi Bangoria, Sam Bobroff, Sandipan Das, Segher
   Boessenkool, Stephen Rothwell, Sukadev Bhattiprolu, Tyrel Datwyler, Wolfram
   Sang, Xiongfeng Wang.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl7aYZ8THG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgPiKD/9zNCuZLFMAFrIdbm0HlYA2RGYZFT75
 GUHsqYyei1pxA7PgM3KwJiXELVODsBv0eQbgNh1tbecKrxPRegN/cywd1KLjPZ7I
 v5/qweQP8MvR0RhzjbhvUcO0jq/f8u2LbJr5mUfVzjU6tAvrvcWo3oZqDElsekCS
 kgyOH3r1vZ2PLTMiGFhb0gWi2iqc+6BHU1AFCGPCMjB1Vu5d5+54VvZ/6lllGsOF
 yg9CBXmmVvQ+Bn6tH4zdEB78FYxnAIwBqlbmL79i5ca+HQJ0Sw6HuPRy9XYq35p6
 2EiXS4Wrgp7i7+1TN3HO362u5Onb8TSyQU7NS6yCFPoJ6JQxcJMBIw6mHhnXOPuZ
 CrjgcdwUMjx8uDoKmX1Epbfuex2w+AysW+4yBHPFiSgl3klKC3D0wi95mR485w2F
 rN8uzJtrDeFKcYZJG7IoB/cgFCCPKGf9HaXr8q0S/jBKMffx91ul3cfzlfdIXOCw
 FDNw/+ZX7UD6ddFEG12ZTO+vdL8yf1uCRT/DIZwUiDMIA0+M6F4nc7j3lfyZfoO1
 65f9UlhoLxScq7VH2fKH4UtZatO9cPID2z1CmiY4UbUIPtFDepSuYClgLF+Duf4b
 rkfxhKU0+Ja1zNH5XNc+L+Bc5/W4lFiJXz02dYIjtHoUpWkc1aToOETVwzggYFNM
 G3PXIBOI0jRgRw==
 =o0WU
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - Support for userspace to send requests directly to the on-chip GZIP
   accelerator on Power9.

 - Rework of our lockless page table walking (__find_linux_pte()) to
   make it safe against parallel page table manipulations without
   relying on an IPI for serialisation.

 - A series of fixes & enhancements to make our machine check handling
   more robust.

 - Lots of plumbing to add support for "prefixed" (64-bit) instructions
   on Power10.

 - Support for using huge pages for the linear mapping on 8xx (32-bit).

 - Remove obsolete Xilinx PPC405/PPC440 support, and an associated sound
   driver.

 - Removal of some obsolete 40x platforms and associated cruft.

 - Initial support for booting on Power10.

 - Lots of other small features, cleanups & fixes.

Thanks to: Alexey Kardashevskiy, Alistair Popple, Andrew Donnellan,
Andrey Abramov, Aneesh Kumar K.V, Balamuruhan S, Bharata B Rao, Bulent
Abali, Cédric Le Goater, Chen Zhou, Christian Zigotzky, Christophe
JAILLET, Christophe Leroy, Dmitry Torokhov, Emmanuel Nicolet, Erhard F.,
Gautham R. Shenoy, Geoff Levand, George Spelvin, Greg Kurz, Gustavo A.
R. Silva, Gustavo Walbon, Haren Myneni, Hari Bathini, Joel Stanley,
Jordan Niethe, Kajol Jain, Kees Cook, Leonardo Bras, Madhavan
Srinivasan., Mahesh Salgaonkar, Markus Elfring, Michael Neuling, Michal
Simek, Nathan Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas Piggin,
Oliver O'Halloran, Paul Mackerras, Pingfan Liu, Qian Cai, Ram Pai,
Raphael Moreira Zinsly, Ravi Bangoria, Sam Bobroff, Sandipan Das, Segher
Boessenkool, Stephen Rothwell, Sukadev Bhattiprolu, Tyrel Datwyler,
Wolfram Sang, Xiongfeng Wang.

* tag 'powerpc-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (299 commits)
  powerpc/pseries: Make vio and ibmebus initcalls pseries specific
  cxl: Remove dead Kconfig options
  powerpc: Add POWER10 architected mode
  powerpc/dt_cpu_ftrs: Add MMA feature
  powerpc/dt_cpu_ftrs: Enable Prefixed Instructions
  powerpc/dt_cpu_ftrs: Advertise support for ISA v3.1 if selected
  powerpc: Add support for ISA v3.1
  powerpc: Add new HWCAP bits
  powerpc/64s: Don't set FSCR bits in INIT_THREAD
  powerpc/64s: Save FSCR to init_task.thread.fscr after feature init
  powerpc/64s: Don't let DT CPU features set FSCR_DSCR
  powerpc/64s: Don't init FSCR_DSCR in __init_FSCR()
  powerpc/32s: Fix another build failure with CONFIG_PPC_KUAP_DEBUG
  powerpc/module_64: Use special stub for _mcount() with -mprofile-kernel
  powerpc/module_64: Simplify check for -mprofile-kernel ftrace relocations
  powerpc/module_64: Consolidate ftrace code
  powerpc/32: Disable KASAN with pages bigger than 16k
  powerpc/uaccess: Don't set KUEP by default on book3s/32
  powerpc/uaccess: Don't set KUAP by default on book3s/32
  powerpc/8xx: Reduce time spent in allow_user_access() and friends
  ...
2020-06-05 12:39:30 -07:00
Alex Williamson
4f085ca2f5 Merge branch 'v5.8/vfio/kirti-migration-fixes' into v5.8/vfio/next 2020-06-02 13:53:00 -06:00
Kirti Wankhede
c8e9df4744 vfio iommu: typecast corrections
Fixes sparse warnings by adding '__user' in typecast for
copy_[from,to]_user()

Fixes: d6a4c18566 ("vfio iommu: Implementation of ioctl for dirty pages tracking")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-06-02 13:44:28 -06:00
Kirti Wankhede
cd0bb41ea8 vfio iommu: Use shift operation for 64-bit integer division
Fixes compilation error with ARCH=i386.

Error fixed by this commit:
ld: drivers/vfio/vfio_iommu_type1.o: in function `vfio_dma_populate_bitmap':
>> vfio_iommu_type1.c:(.text+0x666): undefined reference to `__udivdi3'

Fixes: d6a4c18566 ("vfio iommu: Implementation of ioctl for dirty pages tracking")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-06-02 13:38:21 -06:00
Alex Williamson
ea20868c71 Merge branch 'qiushi-wu-mdev-ref-v1' into v5.8/vfio/next 2020-05-29 16:17:33 -06:00
Qiushi Wu
aa8ba13cae vfio/mdev: Fix reference count leak in add_mdev_supported_type
kobject_init_and_add() takes reference even when it fails.
If this function returns an error, kobject_put() must be called to
properly clean up the memory associated with the object. Thus,
replace kfree() by kobject_put() to fix this issue. Previous
commit "b8eb718348b8" fixed a similar problem.

Fixes: 7b96953bc6 ("vfio: Mediated device Core driver")
Signed-off-by: Qiushi Wu <wu000273@umn.edu>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-05-29 16:07:18 -06:00
Kirti Wankhede
95fc87b441 vfio: Selective dirty page tracking if IOMMU backed device pins pages
Added a check such that only singleton IOMMU groups can pin pages.
>From the point when vendor driver pins any pages, consider IOMMU group
dirty page scope to be limited to pinned pages.

To optimize to avoid walking list often, added flag
pinned_page_dirty_scope to indicate if all of the vfio_groups for each
vfio_domain in the domain_list dirty page scope is limited to pinned
pages. This flag is updated on first pinned pages request for that IOMMU
group and on attaching/detaching group.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Neo Jia <cjia@nvidia.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-05-28 15:53:29 -06:00
Kirti Wankhede
ad721705d0 vfio iommu: Add migration capability to report supported features
Added migration capability in IOMMU info chain.
User application should check IOMMU info chain for migration capability
to use dirty page tracking feature provided by kernel module.
User application must check page sizes supported and maximum dirty
bitmap size returned by this capability structure for ioctls used to get
dirty bitmap.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-05-28 15:53:27 -06:00
Kirti Wankhede
331e33d296 vfio iommu: Update UNMAP_DMA ioctl to get dirty bitmap before unmap
DMA mapped pages, including those pinned by mdev vendor drivers, might
get unpinned and unmapped while migration is active and device is still
running. For example, in pre-copy phase while guest driver could access
those pages, host device or vendor driver can dirty these mapped pages.
Such pages should be marked dirty so as to maintain memory consistency
for a user making use of dirty page tracking.

To get bitmap during unmap, user should allocate memory for bitmap, set
it all zeros, set size of allocated memory, set page size to be
considered for bitmap and set flag VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Neo Jia <cjia@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-05-28 15:53:26 -06:00
Kirti Wankhede
d6a4c18566 vfio iommu: Implementation of ioctl for dirty pages tracking
VFIO_IOMMU_DIRTY_PAGES ioctl performs three operations:
- Start dirty pages tracking while migration is active
- Stop dirty pages tracking.
- Get dirty pages bitmap. Its user space application's responsibility to
  copy content of dirty pages from source to destination during migration.

To prevent DoS attack, memory for bitmap is allocated per vfio_dma
structure. Bitmap size is calculated considering smallest supported page
size. Bitmap is allocated for all vfio_dmas when dirty logging is enabled

Bitmap is populated for already pinned pages when bitmap is allocated for
a vfio_dma with the smallest supported page size. Update bitmap from
pinning functions when tracking is enabled. When user application queries
bitmap, check if requested page size is same as page size used to
populated bitmap. If it is equal, copy bitmap, but if not equal, return
error.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Neo Jia <cjia@nvidia.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>

Fixed error reported by build bot by changing pgsize type from uint64_t
to size_t.
Reported-by: kbuild test robot <lkp@intel.com>

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-05-28 15:53:25 -06:00
Kirti Wankhede
cade075f26 vfio iommu: Cache pgsize_bitmap in struct vfio_iommu
Calculate and cache pgsize_bitmap when iommu->domain_list is updated
and iommu->external_domain is set for mdev device.
Add iommu->lock protection when cached pgsize_bitmap is accessed.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Neo Jia <cjia@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-05-28 15:53:22 -06:00
Kirti Wankhede
6581708586 vfio iommu: Remove atomicity of ref_count of pinned pages
vfio_pfn.ref_count is always updated while holding iommu->lock, using
atomic variable is overkill.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Neo Jia <cjia@nvidia.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-05-28 15:53:21 -06:00
Alex Williamson
cd34b82e6e Merge branches 'v5.8/vfio/alex-block-mmio-v3', 'v5.8/vfio/alex-zero-cap-v2' and 'v5.8/vfio/qian-leak-fixes' into v5.8/vfio/next 2020-05-26 10:27:15 -06:00
Qian Cai
1518ac272e vfio/pci: fix memory leaks of eventfd ctx
Finished a qemu-kvm (-device vfio-pci,host=0001:01:00.0) triggers a few
memory leaks after a while because vfio_pci_set_ctx_trigger_single()
calls eventfd_ctx_fdget() without the matching eventfd_ctx_put() later.
Fix it by calling eventfd_ctx_put() for those memory in
vfio_pci_release() before vfio_device_release().

unreferenced object 0xebff008981cc2b00 (size 128):
  comm "qemu-kvm", pid 4043, jiffies 4294994816 (age 9796.310s)
  hex dump (first 32 bytes):
    01 00 00 00 6b 6b 6b 6b 00 00 00 00 ad 4e ad de  ....kkkk.....N..
    ff ff ff ff 6b 6b 6b 6b ff ff ff ff ff ff ff ff  ....kkkk........
  backtrace:
    [<00000000917e8f8d>] slab_post_alloc_hook+0x74/0x9c
    [<00000000df0f2aa2>] kmem_cache_alloc_trace+0x2b4/0x3d4
    [<000000005fcec025>] do_eventfd+0x54/0x1ac
    [<0000000082791a69>] __arm64_sys_eventfd2+0x34/0x44
    [<00000000b819758c>] do_el0_svc+0x128/0x1dc
    [<00000000b244e810>] el0_sync_handler+0xd0/0x268
    [<00000000d495ef94>] el0_sync+0x164/0x180
unreferenced object 0x29ff008981cc4180 (size 128):
  comm "qemu-kvm", pid 4043, jiffies 4294994818 (age 9796.290s)
  hex dump (first 32 bytes):
    01 00 00 00 6b 6b 6b 6b 00 00 00 00 ad 4e ad de  ....kkkk.....N..
    ff ff ff ff 6b 6b 6b 6b ff ff ff ff ff ff ff ff  ....kkkk........
  backtrace:
    [<00000000917e8f8d>] slab_post_alloc_hook+0x74/0x9c
    [<00000000df0f2aa2>] kmem_cache_alloc_trace+0x2b4/0x3d4
    [<000000005fcec025>] do_eventfd+0x54/0x1ac
    [<0000000082791a69>] __arm64_sys_eventfd2+0x34/0x44
    [<00000000b819758c>] do_el0_svc+0x128/0x1dc
    [<00000000b244e810>] el0_sync_handler+0xd0/0x268
    [<00000000d495ef94>] el0_sync+0x164/0x180

Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-05-26 10:23:22 -06:00
Qian Cai
3e63b94b62 vfio/pci: fix memory leaks in alloc_perm_bits()
vfio_pci_disable() calls vfio_config_free() but forgets to call
free_perm_bits() resulting in memory leaks,

unreferenced object 0xc000000c4db2dee0 (size 16):
  comm "qemu-kvm", pid 4305, jiffies 4295020272 (age 3463.780s)
  hex dump (first 16 bytes):
    00 00 ff 00 ff ff ff ff ff ff ff ff ff ff 00 00  ................
  backtrace:
    [<00000000a6a4552d>] alloc_perm_bits+0x58/0xe0 [vfio_pci]
    [<00000000ac990549>] vfio_config_init+0xdf0/0x11b0 [vfio_pci]
    init_pci_cap_msi_perm at drivers/vfio/pci/vfio_pci_config.c:1125
    (inlined by) vfio_msi_cap_len at drivers/vfio/pci/vfio_pci_config.c:1180
    (inlined by) vfio_cap_len at drivers/vfio/pci/vfio_pci_config.c:1241
    (inlined by) vfio_cap_init at drivers/vfio/pci/vfio_pci_config.c:1468
    (inlined by) vfio_config_init at drivers/vfio/pci/vfio_pci_config.c:1707
    [<000000006db873a1>] vfio_pci_open+0x234/0x700 [vfio_pci]
    [<00000000630e1906>] vfio_group_fops_unl_ioctl+0x8e0/0xb84 [vfio]
    [<000000009e34c54f>] ksys_ioctl+0xd8/0x130
    [<000000006577923d>] sys_ioctl+0x28/0x40
    [<000000006d7b1cf2>] system_call_exception+0x114/0x1e0
    [<0000000008ea7dd5>] system_call_common+0xf0/0x278
unreferenced object 0xc000000c4db2e330 (size 16):
  comm "qemu-kvm", pid 4305, jiffies 4295020272 (age 3463.780s)
  hex dump (first 16 bytes):
    00 ff ff 00 ff ff ff ff ff ff ff ff ff ff 00 00  ................
  backtrace:
    [<000000004c71914f>] alloc_perm_bits+0x44/0xe0 [vfio_pci]
    [<00000000ac990549>] vfio_config_init+0xdf0/0x11b0 [vfio_pci]
    [<000000006db873a1>] vfio_pci_open+0x234/0x700 [vfio_pci]
    [<00000000630e1906>] vfio_group_fops_unl_ioctl+0x8e0/0xb84 [vfio]
    [<000000009e34c54f>] ksys_ioctl+0xd8/0x130
    [<000000006577923d>] sys_ioctl+0x28/0x40
    [<000000006d7b1cf2>] system_call_exception+0x114/0x1e0
    [<0000000008ea7dd5>] system_call_common+0xf0/0x278

Fixes: 89e1f7d4c6 ("vfio: Add PCI device driver")
Signed-off-by: Qian Cai <cai@lca.pw>
[aw: rolled in follow-up patch]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-05-18 09:58:28 -06:00
Alex Williamson
bc138db1b9 vfio-pci: Mask cap zero
The PCI Code and ID Assignment Specification changed capability ID 0
from reserved to a NULL capability in the v1.1 revision.  The NULL
capability is defined to include only the 16-bit capability header,
ie. only the ID and next pointer.  Unfortunately vfio-pci creates a
map of config space, where ID 0 is used to reserve the standard type
0 header.  Finding an actual capability with this ID therefore results
in a bogus range marked in that map and conflicts with subsequent
capabilities.  As this seems to be a dummy capability anyway and we
already support dropping capabilities, let's hide this one rather than
delving into the potentially subtle dependencies within our map.

Seen on an NVIDIA Tesla T4.

Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-05-18 09:55:35 -06:00
Alex Williamson
abafbc551f vfio-pci: Invalidate mmaps and block MMIO access on disabled memory
Accessing the disabled memory space of a PCI device would typically
result in a master abort response on conventional PCI, or an
unsupported request on PCI express.  The user would generally see
these as a -1 response for the read return data and the write would be
silently discarded, possibly with an uncorrected, non-fatal AER error
triggered on the host.  Some systems however take it upon themselves
to bring down the entire system when they see something that might
indicate a loss of data, such as this discarded write to a disabled
memory space.

To avoid this, we want to try to block the user from accessing memory
spaces while they're disabled.  We start with a semaphore around the
memory enable bit, where writers modify the memory enable state and
must be serialized, while readers make use of the memory region and
can access in parallel.  Writers include both direct manipulation via
the command register, as well as any reset path where the internal
mechanics of the reset may both explicitly and implicitly disable
memory access, and manipulation of the MSI-X configuration, where the
MSI-X vector table resides in MMIO space of the device.  Readers
include the read and write file ops to access the vfio device fd
offsets as well as memory mapped access.  In the latter case, we make
use of our new vma list support to zap, or invalidate, those memory
mappings in order to force them to be faulted back in on access.

Our semaphore usage will stall user access to MMIO spaces across
internal operations like reset, but the user might experience new
behavior when trying to access the MMIO space while disabled via the
PCI command register.  Access via read or write while disabled will
return -EIO and access via memory maps will result in a SIGBUS.  This
is expected to be compatible with known use cases and potentially
provides better error handling capabilities than present in the
hardware, while avoiding the more readily accessible and severe
platform error responses that might otherwise occur.

Fixes: CVE-2020-12888
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-05-18 09:53:38 -06:00
Alex Williamson
11c4cd07ba vfio-pci: Fault mmaps to enable vma tracking
Rather than calling remap_pfn_range() when a region is mmap'd, setup
a vm_ops handler to support dynamic faulting of the range on access.
This allows us to manage a list of vmas actively mapping the area that
we can later use to invalidate those mappings.  The open callback
invalidates the vma range so that all tracking is inserted in the
fault handler and removed in the close handler.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-05-18 09:53:29 -06:00
Alex Williamson
4131124222 vfio/type1: Support faulting PFNMAP vmas
With conversion to follow_pfn(), DMA mapping a PFNMAP range depends on
the range being faulted into the vma.  Add support to manually provide
that, in the same way as done on KVM with hva_to_pfn_remapped().

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-05-18 09:53:13 -06:00
Christophe Leroy
7bfc3c84cb drivers/powerpc: Replace _ALIGN_UP() by ALIGN()
_ALIGN_UP() is specific to powerpc
ALIGN() is generic and does the same

Replace _ALIGN_UP() by ALIGN()

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Link: https://lore.kernel.org/r/a5945463f86c984151962a475a3ee56a2893e85d.1587407777.git.christophe.leroy@c-s.fr
2020-05-11 23:15:15 +10:00
Sean Christopherson
5cbf3264bc vfio/type1: Fix VA->PA translation for PFNMAP VMAs in vaddr_get_pfn()
Use follow_pfn() to get the PFN of a PFNMAP VMA instead of assuming that
vma->vm_pgoff holds the base PFN of the VMA.  This fixes a bug where
attempting to do VFIO_IOMMU_MAP_DMA on an arbitrary PFNMAP'd region of
memory calculates garbage for the PFN.

Hilariously, this only got detected because the first "PFN" calculated
by vaddr_get_pfn() is PFN 0 (vma->vm_pgoff==0), and iommu_iova_to_phys()
uses PA==0 as an error, which triggers a WARN in vfio_unmap_unpin()
because the translation "failed".  PFN 0 is now unconditionally reserved
on x86 in order to mitigate L1TF, which causes is_invalid_reserved_pfn()
to return true and in turns results in vaddr_get_pfn() returning success
for PFN 0.  Eventually the bogus calculation runs into PFNs that aren't
reserved and leads to failure in vfio_pin_map_dma().  The subsequent
call to vfio_remove_dma() attempts to unmap PFN 0 and WARNs.

  WARNING: CPU: 8 PID: 5130 at drivers/vfio/vfio_iommu_type1.c:750 vfio_unmap_unpin+0x2e1/0x310 [vfio_iommu_type1]
  Modules linked in: vfio_pci vfio_virqfd vfio_iommu_type1 vfio ...
  CPU: 8 PID: 5130 Comm: sgx Tainted: G        W         5.6.0-rc5-705d787c7fee-vfio+ #3
  Hardware name: Intel Corporation Mehlow UP Server Platform/Moss Beach Server, BIOS CNLSE2R1.D00.X119.B49.1803010910 03/01/2018
  RIP: 0010:vfio_unmap_unpin+0x2e1/0x310 [vfio_iommu_type1]
  Code: <0f> 0b 49 81 c5 00 10 00 00 e9 c5 fe ff ff bb 00 10 00 00 e9 3d fe
  RSP: 0018:ffffbeb5039ebda8 EFLAGS: 00010246
  RAX: 0000000000000000 RBX: ffff9a55cbf8d480 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff9a52b771c200
  RBP: 0000000000000000 R08: 0000000000000040 R09: 00000000fffffff2
  R10: 0000000000000001 R11: ffff9a51fa896000 R12: 0000000184010000
  R13: 0000000184000000 R14: 0000000000010000 R15: ffff9a55cb66ea08
  FS:  00007f15d3830b40(0000) GS:ffff9a55d5600000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000561cf39429e0 CR3: 000000084f75f005 CR4: 00000000003626e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   vfio_remove_dma+0x17/0x70 [vfio_iommu_type1]
   vfio_iommu_type1_ioctl+0x9e3/0xa7b [vfio_iommu_type1]
   ksys_ioctl+0x92/0xb0
   __x64_sys_ioctl+0x16/0x20
   do_syscall_64+0x4c/0x180
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7f15d04c75d7
  Code: <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 48 2d 00 f7 d8 64 89 01 48

Fixes: 73fa0d10d0 ("vfio: Type1 IOMMU implementation")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-04-23 12:10:01 -06:00
Yan Zhao
0ea971f8dc vfio: avoid possible overflow in vfio_iommu_type1_pin_pages
add parentheses to avoid possible vaddr overflow.

Fixes: a54eb55045 ("vfio iommu type1: Add support for mediated devices")
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-04-20 12:40:16 -06:00
Yan Zhao
205323b8ce vfio: checking of validity of user vaddr in vfio_dma_rw
instead of calling __copy_to/from_user(), use copy_to_from_user() to
ensure vaddr range is a valid user address range before accessing them.

Fixes: 8d46c0cca5 ("vfio: introduce vfio_dma_rw to read/write a range of IOVAs")
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Reported-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-04-20 12:38:22 -06:00
Andre Przywara
f44efca049 vfio: Ignore -ENODEV when getting MSI cookie
When we try to get an MSI cookie for a VFIO device, that can fail if
CONFIG_IOMMU_DMA is not set. In this case iommu_get_msi_cookie() returns
-ENODEV, and that should not be fatal.

Ignore that case and proceed with the initialisation.

This fixes VFIO with a platform device on the Calxeda Midway (no MSIs).

Fixes: f6810c15cf ("iommu/arm-smmu: Clean up early-probing workarounds")
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-04-01 13:51:51 -06:00
Sam Bobroff
00bc509554 vfio-pci/nvlink2: Allow fallback to ibm,mmio-atsd[0]
Older versions of skiboot only provide a single value in the device
tree property "ibm,mmio-atsd", even when multiple Address Translation
Shoot Down (ATSD) registers are present. This prevents NVLink2 devices
(other than the first) from being used with vfio-pci because vfio-pci
expects to be able to assign a dedicated ATSD register to each NVLink2
device.

However, ATSD registers can be shared among devices. This change
allows vfio-pci to fall back to sharing the register at index 0 if
necessary.

Fixes: 7f92891778 ("vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] subdriver")
Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-04-01 13:50:46 -06:00
Alex Williamson
48219795e7 Merge branches 'v5.7/vfio/alex-sriov-v3' and 'v5.7/vfio/yan-dma-rw-v4' into v5.7/vfio/next 2020-03-24 09:32:41 -06:00
Alex Williamson
b66574a3fb vfio/pci: Cleanup .probe() exit paths
The cleanup is getting a tad long.

Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-03-24 09:28:29 -06:00
Alex Williamson
959e1b75cc vfio/pci: Remove dev_fmt definition
It currently results in messages like:

 "vfio-pci 0000:03:00.0: vfio_pci: ..."

Which is quite a bit redundant.

Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-03-24 09:28:29 -06:00
Alex Williamson
137e553135 vfio/pci: Add sriov_configure support
With the VF Token interface we can now expect that a vfio userspace
driver must be in collaboration with the PF driver, an unwitting
userspace driver will not be able to get past the GET_DEVICE_FD step
in accessing the device.  We can now move on to actually allowing
SR-IOV to be enabled by vfio-pci on the PF.  Support for this is not
enabled by default in this commit, but it does provide a module option
for this to be enabled (enable_sriov=1).  Enabling VFs is rather
straightforward, except we don't want to risk that a VF might get
autoprobed and bound to other drivers, so a bus notifier is used to
"capture" VFs to vfio-pci using the driver_override support.  We
assume any later action to bind the device to other drivers is
condoned by the system admin and allow it with a log warning.

vfio-pci will disable SR-IOV on a PF before releasing the device,
allowing a VF driver to be assured other drivers cannot take over the
PF and that any other userspace driver must know the shared VF token.
This support also does not provide a mechanism for the PF userspace
driver itself to manipulate SR-IOV through the vfio API.  With this
patch SR-IOV can only be enabled via the host sysfs interface and the
PF driver user cannot create or remove VFs.

Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-03-24 09:28:28 -06:00
Alex Williamson
43eeeecc8e vfio: Introduce VFIO_DEVICE_FEATURE ioctl and first user
The VFIO_DEVICE_FEATURE ioctl is meant to be a general purpose, device
agnostic ioctl for setting, retrieving, and probing device features.
This implementation provides a 16-bit field for specifying a feature
index, where the data porition of the ioctl is determined by the
semantics for the given feature.  Additional flag bits indicate the
direction and nature of the operation; SET indicates user data is
provided into the device feature, GET indicates the device feature is
written out into user data.  The PROBE flag augments determining
whether the given feature is supported, and if provided, whether the
given operation on the feature is supported.

The first user of this ioctl is for setting the vfio-pci VF token,
where the user provides a shared secret key (UUID) on a SR-IOV PF
device, which users must provide when opening associated VF devices.

Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-03-24 09:28:27 -06:00
Alex Williamson
cc20d79990 vfio/pci: Introduce VF token
If we enable SR-IOV on a vfio-pci owned PF, the resulting VFs are not
fully isolated from the PF.  The PF can always cause a denial of service
to the VF, even if by simply resetting itself.  The degree to which a PF
can access the data passed through a VF or interfere with its operation
is dependent on a given SR-IOV implementation.  Therefore we want to
avoid a scenario where an existing vfio-pci based userspace driver might
assume the PF driver is trusted, for example assigning a PF to one VM
and VF to another with some expectation of isolation.  IOMMU grouping
could be a solution to this, but imposes an unnecessarily strong
relationship between PF and VF drivers if they need to operate with the
same IOMMU context.  Instead we introduce a "VF token", which is
essentially just a shared secret between PF and VF drivers, implemented
as a UUID.

The VF token can be set by a vfio-pci based PF driver and must be known
by the vfio-pci based VF driver in order to gain access to the device.
This allows the degree to which this VF token is considered secret to be
determined by the applications and environment.  For example a VM might
generate a random UUID known only internally to the hypervisor while a
userspace networking appliance might use a shared, or even well know,
UUID among the application drivers.

To incorporate this VF token, the VFIO_GROUP_GET_DEVICE_FD interface is
extended to accept key=value pairs in addition to the device name.  This
allows us to most easily deny user access to the device without risk
that existing userspace drivers assume region offsets, IRQs, and other
device features, leading to more elaborate error paths.  The format of
these options are expected to take the form:

"$DEVICE_NAME $OPTION1=$VALUE1 $OPTION2=$VALUE2"

Where the device name is always provided first for compatibility and
additional options are specified in a space separated list.  The
relation between and requirements for the additional options will be
vfio bus driver dependent, however unknown or unused option within this
schema should return error.  This allow for future use of unknown
options as well as a positive indication to the user that an option is
used.

An example VF token option would take this form:

"0000:03:00.0 vf_token=2ab74924-c335-45f4-9b16-8569e5b08258"

When accessing a VF where the PF is making use of vfio-pci, the user
MUST provide the current vf_token.  When accessing a PF, the user MUST
provide the current vf_token IF there are active VF users or MAY provide
a vf_token in order to set the current VF token when no VF users are
active.  The former requirement assures VF users that an unassociated
driver cannot usurp the PF device.  These semantics also imply that a
VF token MUST be set by a PF driver before VF drivers can access their
device, the default token is random and mechanisms to read the token are
not provided in order to protect the VF token of previous users.  Use of
the vf_token option outside of these cases will return an error, as
discussed above.

Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-03-24 09:28:27 -06:00
Alex Williamson
467c084f9a vfio/pci: Implement match ops
This currently serves the same purpose as the default implementation
but will be expanded for additional functionality.

Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-03-24 09:28:26 -06:00
Alex Williamson
5f3874c2a2 vfio: Include optional device match in vfio_device_ops callbacks
Allow bus drivers to provide their own callback to match a device to
the user provided string.

Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-03-24 09:28:25 -06:00
Yan Zhao
40280cf7e8 vfio: avoid inefficient operations on VFIO group in vfio_pin/unpin_pages
vfio_group_pin_pages() and vfio_group_unpin_pages() are introduced to
avoid inefficient search/check/ref/deref opertions associated with VFIO
group as those in each calling into vfio_pin_pages() and
vfio_unpin_pages().

VFIO group is taken as arg directly. The callers combine
search/check/ref/deref operations associated with VFIO group by calling
vfio_group_get_external_user()/vfio_group_get_external_user_from_dev()
beforehand, and vfio_group_put_external_user() afterwards.

Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-03-24 09:27:57 -06:00
Yan Zhao
8d46c0cca5 vfio: introduce vfio_dma_rw to read/write a range of IOVAs
vfio_dma_rw will read/write a range of user space memory pointed to by
IOVA into/from a kernel buffer without enforcing pinning the user space
memory.

TODO: mark the IOVAs to user space memory dirty if they are written in
vfio_dma_rw().

Cc: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-03-24 09:27:57 -06:00
Yan Zhao
c0560f51cf vfio: allow external user to get vfio group from device
external user calls vfio_group_get_external_user_from_dev() with a device
pointer to get the VFIO group associated with this device.
The VFIO group is checked to be vialbe and have IOMMU set. Then
container user counter is increased and VFIO group reference is hold
to prevent the VFIO group from disposal before external user exits.

when the external user finishes using of the VFIO group, it calls
vfio_group_put_external_user() to dereference the VFIO group and the
container user counter.

Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-03-24 09:27:56 -06:00
Eric Auger
723fe298ad vfio: platform: Switch to platform_get_irq_optional()
Since commit 7723f4c5ec ("driver core: platform: Add an error
message to platform_get_irq*()"), platform_get_irq() calls dev_err()
on an error. As we enumerate all interrupts until platform_get_irq()
fails, we now systematically get a message such as:
"vfio-platform fff51000.ethernet: IRQ index 3 not found" which is
a false positive.

Let's use platform_get_irq_optional() instead.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Cc: stable@vger.kernel.org # v5.3+
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Tested-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-03-24 09:26:30 -06:00
Linus Torvalds
a6d5f9dca4 VFIO updates for v5.6-rc1
- Fix nvlink error path (Alexey Kardashevskiy)
 
  - Update nvlink and spapr to use mmgrab() (Julia Lawall)
 
  - Update static declaration (Ben Dooks)
 
  - Annotate __iomem to fix sparse warnings (Ben Dooks)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJeOFZPAAoJECObm247sIsidxIQAIxDITCEjsquBuT9AfBHMcC6
 HIqJ9/JYORGxc+nbtc0858rXsvAGJTJR7yu/WE9N9UqDFLHX0yoRHrWEQdK5OFtA
 pj3PpweMlzdXqMZl5kgQOaLoM4I+gYNrPIdcUyxWpfVJxCCQsgz0iFBrvNqoFgEM
 YK1UPZSCpHnD5EsG3Lz5BfSM7lyrSfH6R+eWempZST9eBHXC7kvU9pRYVmVDP/l7
 fyBsWpoN+sg3rAf+8R7fQyBbPyggvpyhpaWTKVVdARZStISxiprWXEfQl3p2vOK/
 aDbyMLhdMraQkkipcu+HAG0AJXX3gdxGBcX9BXLmB6Z8EohBj9hEu8RNUrNnFCm4
 rOOT32IuwxSH3oZr+vmOUrG6AQhu10vPLHf5scOYdn3h/6R0gW13aTmsDPTzdOCN
 oZ2iU5CDJ6w9BMLELRL4WrKTC1lG9xYXS8RmS//GAq24Re7X8kUdHszyoSW8bUFu
 5RSqgAYz8Wk/3qrQsMU03yYiEgMS1AZwKHJWRgTCJLdT/p3a8/NXgLWO9Y6EdMua
 LETW63Bg/YITFTqCr4jTOHYg8XUqrGHFQOWmT+boAkRAHDn56FGcrQ1Bl0BAC8cP
 4C1iGwaxsa3DwQkVqsLMsHlBqpyLbz4a2SUQVmaIlwEs58LUv7EBwTymFir4w6aJ
 Esxh+sm0NYZE0nJ7aAqY
 =bamw
 -----END PGP SIGNATURE-----

Merge tag 'vfio-v5.6-rc1' of git://github.com/awilliam/linux-vfio

Pull VFIO updates from Alex Williamson:

 - Fix nvlink error path (Alexey Kardashevskiy)

 - Update nvlink and spapr to use mmgrab() (Julia Lawall)

 - Update static declaration (Ben Dooks)

 - Annotate __iomem to fix sparse warnings (Ben Dooks)

* tag 'vfio-v5.6-rc1' of git://github.com/awilliam/linux-vfio:
  vfio: platform: fix __iomem in vfio_platform_amdxgbe.c
  vfio/mdev: make create attribute static
  vfio/spapr_tce: use mmgrab
  vfio: vfio_pci_nvlink2: use mmgrab
  vfio/spapr/nvlink2: Skip unpinning pages on error exit
2020-02-03 22:22:05 +00:00
John Hubbard
f1f6a7dd9b mm, tree-wide: rename put_user_page*() to unpin_user_page*()
In order to provide a clearer, more symmetric API for pinning and
unpinning DMA pages.  This way, pin_user_pages*() calls match up with
unpin_user_pages*() calls, and the API is a lot closer to being
self-explanatory.

Link: http://lkml.kernel.org/r/20200107224558.2362728-23-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31 10:30:38 -08:00
John Hubbard
19fed0dae9 vfio, mm: pin_user_pages (FOLL_PIN) and put_user_page() conversion
1. Change vfio from get_user_pages_remote(), to
   pin_user_pages_remote().

2. Because all FOLL_PIN-acquired pages must be released via
   put_user_page(), also convert the put_page() call over to
   put_user_pages_dirty_lock().

Note that this effectively changes the code's behavior in
vfio_iommu_type1.c: put_pfn(): it now ultimately calls
set_page_dirty_lock(), instead of set_page_dirty().  This is probably
more accurate.

As Christoph Hellwig put it, "set_page_dirty() is only safe if we are
dealing with a file backed page where we have reference on the inode it
hangs off." [1]

[1] https://lore.kernel.org/r/20190723153640.GB720@lst.de

Link: http://lkml.kernel.org/r/20200107224558.2362728-20-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31 10:30:38 -08:00
John Hubbard
3567813eae vfio: fix FOLL_LONGTERM use, simplify get_user_pages_remote() call
Update VFIO to take advantage of the recently loosened restriction on
FOLL_LONGTERM with get_user_pages_remote().  Also, now it is possible to
fix a bug: the VFIO caller is logically a FOLL_LONGTERM user, but it
wasn't setting FOLL_LONGTERM.

Also, remove an unnessary pair of calls that were releasing and
reacquiring the mmap_sem.  There is no need to avoid holding mmap_sem
just in order to call page_to_pfn().

Also, now that the the DAX check ("if a VMA is DAX, don't allow long
term pinning") is in the internals of get_user_pages_remote() and
__gup_longterm_locked(), there's no need for it at the VFIO call site.  So
remove it.

Link: http://lkml.kernel.org/r/20200107224558.2362728-8-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31 10:30:37 -08:00
Ben Dooks (Codethink)
7b5372ba04 vfio: platform: fix __iomem in vfio_platform_amdxgbe.c
The ioaddr should have __iomem marker on it, so add that to fix
the following sparse warnings:

drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:33:44: warning: incorrect type in argument 2 (different address spaces)
drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:33:44:    expected void volatile [noderef] <asn:2> *addr
drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:33:44:    got void *
drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:34:33: warning: incorrect type in argument 1 (different address spaces)
drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:34:33:    expected void const volatile [noderef] <asn:2> *addr
drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:34:33:    got void *
drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:44:44: warning: incorrect type in argument 2 (different address spaces)
drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:44:44:    expected void volatile [noderef] <asn:2> *addr
drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:44:44:    got void *
drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:45:33: warning: incorrect type in argument 2 (different address spaces)
drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:45:33:    expected void volatile [noderef] <asn:2> *addr
drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:45:33:    got void *
drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:69:41: warning: incorrect type in argument 1 (different address spaces)
drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:69:41:    expected void *ioaddr
drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:69:41:    got void [noderef] <asn:2> *ioaddr
drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:71:30: warning: incorrect type in argument 1 (different address spaces)
drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:71:30:    expected void *ioaddr
drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:71:30:    got void [noderef] <asn:2> *ioaddr
drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:76:49: warning: incorrect type in argument 1 (different address spaces)
drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:76:49:    expected void *ioaddr
drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:76:49:    got void [noderef] <asn:2> *ioaddr
drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:85:37: warning: incorrect type in argument 1 (different address spaces)
drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:85:37:    expected void *ioaddr
drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:85:37:    got void [noderef] <asn:2> *ioaddr
drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:87:30: warning: incorrect type in argument 1 (different address spaces)
drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:87:30:    expected void *ioaddr
drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:87:30:    got void [noderef] <asn:2> *ioaddr
drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:90:30: warning: incorrect type in argument 1 (different address spaces)
drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:90:30:    expected void *ioaddr
drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:90:30:    got void [noderef] <asn:2> *ioaddr
drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:93:30: warning: incorrect type in argument 1 (different address spaces)
drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:93:30:    expected void *ioaddr
drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:93:30:    got void [noderef] <asn:2> *ioaddr

Signed-off-by: Ben Dooks (Codethink) <ben.dooks@codethink.co.uk>
Acked-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-01-09 11:32:14 -07:00
Ben Dooks (Codethink)
e10b4f6cd8 vfio/mdev: make create attribute static
The create attribute is not exported, so make it
static to avoid the following sparse warning:

drivers/vfio/mdev/mdev_sysfs.c:77:1: warning: symbol 'mdev_type_attr_create' was not declared. Should it be static?

Signed-off-by: Ben Dooks (Codethink) <ben.dooks@codethink.co.uk>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-01-09 11:30:57 -07:00
Julia Lawall
7a49de995e vfio/spapr_tce: use mmgrab
Mmgrab was introduced in commit f1f1007644 ("mm: add new mmgrab()
helper") and most of the kernel was updated to use it. Update a
remaining file.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

<smpl>
@@ expression e; @@
- atomic_inc(&e->mm_count);
+ mmgrab(e);
</smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-01-07 13:03:12 -07:00
Julia Lawall
bb3d3cf928 vfio: vfio_pci_nvlink2: use mmgrab
Mmgrab was introduced in commit f1f1007644 ("mm: add new mmgrab()
helper") and most of the kernel was updated to use it. Update a
remaining file.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

<smpl>
@@ expression e; @@
- atomic_inc(&e->mm_count);
+ mmgrab(e);
</smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-01-07 13:02:43 -07:00
Alexey Kardashevskiy
338b4e10f9 vfio/spapr/nvlink2: Skip unpinning pages on error exit
The nvlink2 subdriver for IBM Witherspoon machines preregisters
GPU memory in the IOMMI API so KVM TCE code can map this memory
for DMA as well. This is done by mm_iommu_newdev() called from
vfio_pci_nvgpu_regops::mmap.

In an unlikely event of failure the data->mem remains NULL and
since mm_iommu_put() (which unregisters the region and unpins memory
if that was regular memory) does not expect mem=NULL, it should not be
called.

This adds a check to only call mm_iommu_put() for a valid data->mem.

Fixes: 7f92891778 ("vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] subdriver")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-01-06 10:01:05 -07:00
Christoph Hellwig
4bdc0d676a remove ioremap_nocache and devm_ioremap_nocache
ioremap has provided non-cached semantics by default since the Linux 2.6
days, so remove the additional ioremap_nocache interface.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
2020-01-06 09:45:59 +01:00
Linus Torvalds
94e89b4023 VFIO updates for v5.5-rc1
- Remove hugepage checks for reserved pfns (Ben Luo)
 
  - Fix irq-bypass unregister ordering (Jiang Yi)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJd6oWBAAoJECObm247sIsi7Q8P/37Gs4jKx+ZUfd0K8MwyAJjV
 1oiq0vsXVKQPRlBF0G9YdZQubX3X7l87PbeV0P4e6GKH5/A30FUA4OUgurbQaybK
 siajJVmdIuEeS/xNZxkuEyyjsOXXPBo+oBIhu3yDoLp0wOCYuAMOQea25H34dhG9
 A141c/mZg/29jfz8whQMthZiACzIFXMcD/pO6VzspEwVX4aGaX4js1wgOp2firRf
 QqbQon7BfozwM5WAwwlyEyBYmewILQgciijsifGpKhTTfONl7j34GH8+/JM7WrD7
 BNayX+DZdcr+NFrNyuvdU+/HnlCt/6lZADAtnyMRv53QTiRFFmdQkYU4bTpKLqHH
 XoN4OoPAdxYHOPRq/MOon7QsAvXodCTZ9GTIiXtC+hjIangm5f1jW7tHPjpRrbrT
 luXvRv3dC94lVOxJ8wfp36H/GPJbJkAASmOasTT54AP1PBmOihAYwYCVBMF9Qihp
 Fbk9DCg7mTb8vSbOQPxUi732sovOVhwQS4lkPq1BgoNibElzV6UbwYxQhn/0Ixpm
 ZcDE8lipk/sjns5HhFWYJYFJQAVtFVM3Ets4oJK/tB8ELzCch/URoc+zwNKnCVHQ
 jjCBcrHU/jdnNm85BrN12vFn1gjHqHZLvTGA7HCAABgqzeJ9Ln4IDuUELg9JsO/T
 xq/8RRxCREs5XL6aWr4n
 =veGz
 -----END PGP SIGNATURE-----

Merge tag 'vfio-v5.5-rc1' of git://github.com/awilliam/linux-vfio

Pull VFIO updates from Alex Williamson:

 - Remove hugepage checks for reserved pfns (Ben Luo)

 - Fix irq-bypass unregister ordering (Jiang Yi)

* tag 'vfio-v5.5-rc1' of git://github.com/awilliam/linux-vfio:
  vfio/pci: call irq_bypass_unregister_producer() before freeing irq
  vfio/type1: remove hugepage checks in is_invalid_reserved_pfn()
2019-12-07 14:51:04 -08:00
Alex Williamson
9917b54ade Merge branch 'v5.5/vfio/jiang-yi-irq-bypass-unregister-v1' into v5.5/vfio/next 2019-12-04 10:15:56 -07:00
Linus Torvalds
c3bed3b20e pci-v5.5-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAl3leXUUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vyY3g/9FAVVdPEaadNtAhQ/zIxcjozDovKq
 0q7yOA3aTBTUoNEinm88an6p0dcC4gNKtGukXmzVH2Hhxm9kLRdtpZGYY00tpLUB
 9rI7XsgwwHa+hLwsHbIs507sKGFGy5FLr0ChTTGLDEMppnEvjA2hZooYmcB/OgrC
 LlFcwbNKGOk/Si9u2bF2nLO0JDoVHnwzpF99saew/nqc7Lfj9e9IPZFom+VjPBUh
 AOvRp2H7uBN+WQlpLeFeMDDoeXh34lX0kYqIV/cVkXVnknDGYKV2CBTg2aeX7jd0
 QiPHZh6zlW8zNQgaCZRiBAbatVEOnRMRJ++yiqB8hBYp1LMXm6kJ01YSQpXkugoY
 Vp9dtzzTARWV/XkKwD4brw9ZEmIDnO+Ed2x2VbUkPJVcXAvzSQWAx82IU0Iuqmcb
 9qr6U2Zf/Xk5aFlGPYVH8QOG+QqzIbZNRQ7NlhDlITyW4P6QPu0mw374yYP2wDGL
 sP5YSS3YGa0sQcEgDtVnd4z+WTZI4AwXLPaeaLkDhdfHp2FsERUY4TrPs33J99xw
 og4EyokVFzjYzlnBPU6WWn7LL+jj5ccXkL3MA4DR4FJOnNGHh7NXfQUH56rrgsq7
 F9/8shL5DuTbQkde1uSyUG9Iq/RigVLlV5DQavFm3dSXvZi0E16t5alC5URNTzk7
 at8Bogn53QhlmYc=
 =uUXw
 -----END PGP SIGNATURE-----

Merge tag 'pci-v5.5-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:
 "Enumeration:

   - Warn if a host bridge has no NUMA info (Yunsheng Lin)

   - Add PCI_STD_NUM_BARS for the number of standard BARs (Denis
     Efremov)

  Resource management:

   - Fix boot-time Embedded Controller GPE storm caused by incorrect
     resource assignment after ACPI Bus Check Notification (Mika
     Westerberg)

   - Protect pci_reassign_bridge_resources() against concurrent
     addition/removal (Benjamin Herrenschmidt)

   - Fix bridge dma_ranges resource list cleanup (Rob Herring)

   - Add "pci=hpmmiosize" and "pci=hpmmioprefsize" parameters to control
     the MMIO and prefetchable MMIO window sizes of hotplug bridges
     independently (Nicholas Johnson)

   - Fix MMIO/MMIO_PREF window assignment that assigned more space than
     desired (Nicholas Johnson)

   - Only enforce bus numbers from bridge EA if the bridge has EA
     devices downstream (Subbaraya Sundeep)

   - Consolidate DT "dma-ranges" parsing and convert all host drivers to
     use shared parsing (Rob Herring)

  Error reporting:

   - Restore AER capability after resume (Mayurkumar Patel)

   - Add PoisonTLPBlocked AER counter (Rajat Jain)

   - Use for_each_set_bit() to simplify AER code (Andy Shevchenko)

   - Fix AER kernel-doc (Andy Shevchenko)

   - Add "pcie_ports=dpc-native" parameter to allow native use of DPC
     even if platform didn't grant control over AER (Olof Johansson)

  Hotplug:

   - Avoid returning prematurely from sysfs requests to enable or
     disable a PCIe hotplug slot (Lukas Wunner)

   - Don't disable interrupts twice when suspending hotplug ports (Mika
     Westerberg)

   - Fix deadlocks when PCIe ports are hot-removed while suspended (Mika
     Westerberg)

  Power management:

   - Remove unnecessary ASPM locking (Bjorn Helgaas)

   - Add support for disabling L1 PM Substates (Heiner Kallweit)

   - Allow re-enabling Clock PM after it has been disabled (Heiner
     Kallweit)

   - Add sysfs attributes for controlling ASPM link states (Heiner
     Kallweit)

   - Remove CONFIG_PCIEASPM_DEBUG, including "link_state" and "clk_ctl"
     sysfs files (Heiner Kallweit)

   - Avoid AMD FCH XHCI USB PME# from D0 defect that prevents wakeup on
     USB 2.0 or 1.1 connect events (Kai-Heng Feng)

   - Move power state check out of pci_msi_supported() (Bjorn Helgaas)

   - Fix incorrect MSI-X masking on resume and revert related nvme quirk
     for Kingston NVME SSD running FW E8FK11.T (Jian-Hong Pan)

   - Always return devices to D0 when thawing to fix hibernation with
     drivers like mlx4 that used legacy power management (previously we
     only did it for drivers with new power management ops) (Dexuan Cui)

   - Clear PCIe PME Status even for legacy power management (Bjorn
     Helgaas)

   - Fix PCI PM documentation errors (Bjorn Helgaas)

   - Use dev_printk() for more power management messages (Bjorn Helgaas)

   - Apply D2 delay as milliseconds, not microseconds (Bjorn Helgaas)

   - Convert xen-platform from legacy to generic power management (Bjorn
     Helgaas)

   - Removed unused .resume_early() and .suspend_late() legacy power
     management hooks (Bjorn Helgaas)

   - Rearrange power management code for clarity (Rafael J. Wysocki)

   - Decode power states more clearly ("4" or "D4" really refers to
     "D3cold") (Bjorn Helgaas)

   - Notice when reading PM Control register returns an error (~0)
     instead of interpreting it as being in D3hot (Bjorn Helgaas)

   - Add missing link delays required by the PCIe spec (Mika Westerberg)

  Virtualization:

   - Move pci_prg_resp_pasid_required() to CONFIG_PCI_PRI (Bjorn
     Helgaas)

   - Allow VFs to use PRI (the PF PRI is shared by the VFs, but the code
     previously didn't recognize that) (Kuppuswamy Sathyanarayanan)

   - Allow VFs to use PASID (the PF PASID capability is shared by the
     VFs, but the code previously didn't recognize that) (Kuppuswamy
     Sathyanarayanan)

   - Disconnect PF and VF ATS enablement, since ATS in PFs and
     associated VFs can be enabled independently (Kuppuswamy
     Sathyanarayanan)

   - Cache PRI and PASID capability offsets (Kuppuswamy Sathyanarayanan)

   - Cache the PRI PRG Response PASID Required bit (Bjorn Helgaas)

   - Consolidate ATS declarations in linux/pci-ats.h (Krzysztof
     Wilczynski)

   - Remove unused PRI and PASID stubs (Bjorn Helgaas)

   - Removed unnecessary EXPORT_SYMBOL_GPL() from ATS, PRI, and PASID
     interfaces that are only used by built-in IOMMU drivers (Bjorn
     Helgaas)

   - Hide PRI and PASID state restoration functions used only inside the
     PCI core (Bjorn Helgaas)

   - Add a DMA alias quirk for the Intel VCA NTB (Slawomir Pawlowski)

   - Serialize sysfs sriov_numvfs reads vs writes (Pierre Crégut)

   - Update Cavium ACS quirk for ThunderX2 and ThunderX3 (George
     Cherian)

   - Fix the UPDCR register address in the Intel ACS quirk (Steffen
     Liebergeld)

   - Unify ACS quirk implementations (Bjorn Helgaas)

  Amlogic Meson host bridge driver:

   - Fix meson PERST# GPIO polarity problem (Remi Pommarel)

   - Add DT bindings for Amlogic Meson G12A (Neil Armstrong)

   - Fix meson clock names to match DT bindings (Neil Armstrong)

   - Add meson support for Amlogic G12A SoC with separate shared PHY
     (Neil Armstrong)

   - Add meson extended PCIe PHY functions for Amlogic G12A USB3+PCIe
     combo PHY (Neil Armstrong)

   - Add arm64 DT for Amlogic G12A PCIe controller node (Neil Armstrong)

   - Add commented-out description of VIM3 USB3/PCIe mux in arm64 DT
     (Neil Armstrong)

  Broadcom iProc host bridge driver:

   - Invalidate iProc PAXB address mapping before programming it
     (Abhishek Shah)

   - Fix iproc-msi and mvebu __iomem annotations (Ben Dooks)

  Cadence host bridge driver:

   - Refactor Cadence PCIe host controller to use as a library for both
     host and endpoint (Tom Joseph)

  Freescale Layerscape host bridge driver:

   - Add layerscape LS1028a support (Xiaowei Bao)

  Intel VMD host bridge driver:

   - Add VMD bus 224-255 restriction decode (Jon Derrick)

   - Add VMD 8086:9A0B device ID (Jon Derrick)

   - Remove Keith from VMD maintainer list (Keith Busch)

  Marvell ARMADA 3700 / Aardvark host bridge driver:

   - Use LTSSM state to build link training flag since Aardvark doesn't
     implement the Link Training bit (Remi Pommarel)

   - Delay before training Aardvark link in case PERST# was asserted
     before the driver probe (Remi Pommarel)

   - Fix Aardvark issues with Root Control reads and writes (Remi
     Pommarel)

   - Don't rely on jiffies in Aardvark config access path since
     interrupts may be disabled (Remi Pommarel)

   - Fix Aardvark big-endian support (Grzegorz Jaszczyk)

  Marvell ARMADA 370 / XP host bridge driver:

   - Make mvebu_pci_bridge_emul_ops static (Ben Dooks)

  Microsoft Hyper-V host bridge driver:

   - Add hibernation support for Hyper-V virtual PCI devices (Dexuan
     Cui)

   - Track Hyper-V pci_protocol_version per-hbus, not globally (Dexuan
     Cui)

   - Avoid kmemleak false positive on hv hbus buffer (Dexuan Cui)

  Mobiveil host bridge driver:

   - Change mobiveil csr_read()/write() function names that conflict
     with riscv arch functions (Kefeng Wang)

  NVIDIA Tegra host bridge driver:

   - Fix Tegra CLKREQ dependency programming (Vidya Sagar)

  Renesas R-Car host bridge driver:

   - Remove unnecessary header include from rcar (Andrew Murray)

   - Tighten register index checking for rcar inbound range programming
     (Marek Vasut)

   - Fix rcar inbound range alignment calculation to improve packing of
     multiple entries (Marek Vasut)

   - Update rcar MACCTLR setting to match documentation (Yoshihiro
     Shimoda)

   - Clear bit 0 of MACCTLR before PCIETCTLR.CFINIT per manual
     (Yoshihiro Shimoda)

   - Add Marek Vasut and Yoshihiro Shimoda as R-Car maintainers (Simon
     Horman)

  Rockchip host bridge driver:

   - Make rockchip 0V9 and 1V8 power regulators non-optional (Robin
     Murphy)

  Socionext UniPhier host bridge driver:

   - Set uniphier to host (RC) mode always (Kunihiko Hayashi)

  Endpoint drivers:

   - Fix endpoint driver sign extension problem when shifting page
     number to phys_addr_t (Alan Mikhak)

  Misc:

   - Add NumaChip SPDX header (Krzysztof Wilczynski)

   - Replace EXTRA_CFLAGS with ccflags-y (Krzysztof Wilczynski)

   - Remove unused includes (Krzysztof Wilczynski)

   - Removed unused sysfs attribute groups (Ben Dooks)

   - Remove PTM and ASPM dependencies on PCIEPORTBUS (Bjorn Helgaas)

   - Add PCIe Link Control 2 register field definitions to replace magic
     numbers in AMDGPU and Radeon CIK/SI (Bjorn Helgaas)

   - Fix incorrect Link Control 2 Transmit Margin usage in AMDGPU and
     Radeon CIK/SI PCIe Gen3 link training (Bjorn Helgaas)

   - Use pcie_capability_read_word() instead of pci_read_config_word()
     in AMDGPU and Radeon CIK/SI (Frederick Lawler)

   - Remove unused pci_irq_get_node() Greg Kroah-Hartman)

   - Make asm/msi.h mandatory and simplify PCI_MSI_IRQ_DOMAIN Kconfig
     (Palmer Dabbelt, Michal Simek)

   - Read all 64 bits of Switchtec part_event_bitmap (Logan Gunthorpe)

   - Fix erroneous intel-iommu dependency on CONFIG_AMD_IOMMU (Bjorn
     Helgaas)

   - Fix bridge emulation big-endian support (Grzegorz Jaszczyk)

   - Fix dwc find_next_bit() usage (Niklas Cassel)

   - Fix pcitest.c fd leak (Hewenliang)

   - Fix typos and comments (Bjorn Helgaas)

   - Fix Kconfig whitespace errors (Krzysztof Kozlowski)"

* tag 'pci-v5.5-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (160 commits)
  PCI: Remove PCI_MSI_IRQ_DOMAIN architecture whitelist
  asm-generic: Make msi.h a mandatory include/asm header
  Revert "nvme: Add quirk for Kingston NVME SSD running FW E8FK11.T"
  PCI/MSI: Fix incorrect MSI-X masking on resume
  PCI/MSI: Move power state check out of pci_msi_supported()
  PCI/MSI: Remove unused pci_irq_get_node()
  PCI: hv: Avoid a kmemleak false positive caused by the hbus buffer
  PCI: hv: Change pci_protocol_version to per-hbus
  PCI: hv: Add hibernation support
  PCI: hv: Reorganize the code in preparation of hibernation
  MAINTAINERS: Remove Keith from VMD maintainer
  PCI/ASPM: Remove PCIEASPM_DEBUG Kconfig option and related code
  PCI/ASPM: Add sysfs attributes for controlling ASPM link states
  PCI: Fix indentation
  drm/radeon: Prefer pcie_capability_read_word()
  drm/radeon: Replace numbers with PCI_EXP_LNKCTL2 definitions
  drm/radeon: Correct Transmit Margin masks
  drm/amdgpu: Prefer pcie_capability_read_word()
  PCI: uniphier: Set mode register to host mode
  drm/amdgpu: Replace numbers with PCI_EXP_LNKCTL2 definitions
  ...
2019-12-03 13:58:22 -08:00
Jiang Yi
d567fb8819 vfio/pci: call irq_bypass_unregister_producer() before freeing irq
Since irq_bypass_register_producer() is called after request_irq(), we
should do tear-down in reverse order: irq_bypass_unregister_producer()
then free_irq().

Specifically free_irq() may release resources required by the
irqbypass del_producer() callback.  Notably an example provided by
Marc Zyngier on arm64 with GICv4 that he indicates has the potential
to wedge the hardware:

 free_irq(irq)
   __free_irq(irq)
     irq_domain_deactivate_irq(irq)
       its_irq_domain_deactivate()
         [unmap the VLPI from the ITS]

 kvm_arch_irq_bypass_del_producer(cons, prod)
   kvm_vgic_v4_unset_forwarding(kvm, irq, ...)
     its_unmap_vlpi(irq)
       [Unmap the VLPI from the ITS (again), remap the original LPI]

Signed-off-by: Jiang Yi <giangyi@amazon.com>
Cc: stable@vger.kernel.org # v4.4+
Fixes: 6d7425f109 ("vfio: Register/unregister irq_bypass_producer")
Link: https://lore.kernel.org/kvm/20191127164910.15888-1-giangyi@amazon.com
Reviewed-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
[aw: commit log]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-12-02 14:34:41 -07:00
Linus Torvalds
0da522107e compat_ioctl: remove most of fs/compat_ioctl.c
As part of the cleanup of some remaining y2038 issues, I came to
 fs/compat_ioctl.c, which still has a couple of commands that need support
 for time64_t.
 
 In completely unrelated work, I spent time on cleaning up parts of this
 file in the past, moving things out into drivers instead.
 
 After Al Viro reviewed an earlier version of this series and did a lot
 more of that cleanup, I decided to try to completely eliminate the rest
 of it and move it all into drivers.
 
 This series incorporates some of Al's work and many patches of my own,
 but in the end stops short of actually removing the last part, which is
 the scsi ioctl handlers. I have patches for those as well, but they need
 more testing or possibly a rewrite.
 
 Signed-off-by: Arnd Bergmann <arnd@arndb.de>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJdsHCdAAoJEJpsee/mABjZtYkP/1JGl3jFv3Iq/5BCdPkaePP1
 RtMJRNfURgK3GeuHUui330PvVjI/pLWXU/VXMK2MPTASpJLzYz3uCaZrpVWEMpDZ
 +ImzGmgJkITlW1uWU3zOcQhOxTyb1hCZ0Ci+2xn9QAmyOL7prXoXCXDWv3h6iyiF
 lwG+nW+HNtyx41YG+9bRfKNoG0ZJ+nkJ70BV6u0acQHXWn7Xuupa9YUmBL87hxAL
 6dlJfLTJg6q8QSv/Q6LxslfWk2Ti8OOJZOwtFM5R8Bgl0iUcvshiRCKfv/3t9jXD
 dJNvF1uq8z+gracWK49Qsfq5dnZ2ZxHFUo9u0NjbCrxNvWH/sdvhbaUBuJI75seH
 VIznCkdxFhrqitJJ8KmxANxG08u+9zSKjSlxG2SmlA4qFx/AoStoHwQXcogJscNb
 YIXYKmWBvwPzYu09QFAXdHFPmZvp/3HhMWU6o92lvDhsDwzkSGt3XKhCJea4DCaT
 m+oCcoACqSWhMwdbJOEFofSub4bY43s5iaYuKes+c8O261/Dwg6v/pgIVez9mxXm
 TBnvCsotq5m8wbwzv99eFqGeJH8zpDHrXxEtRR5KQqMqjLq/OQVaEzmpHZTEuK7n
 e/V/PAKo2/V63g4k6GApQXDxnjwT+m0aWToWoeEzPYXS6KmtWC91r4bWtslu3rdl
 bN65armTm7bFFR32Avnu
 =lgCl
 -----END PGP SIGNATURE-----

Merge tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground

Pull removal of most of fs/compat_ioctl.c from Arnd Bergmann:
 "As part of the cleanup of some remaining y2038 issues, I came to
  fs/compat_ioctl.c, which still has a couple of commands that need
  support for time64_t.

  In completely unrelated work, I spent time on cleaning up parts of
  this file in the past, moving things out into drivers instead.

  After Al Viro reviewed an earlier version of this series and did a lot
  more of that cleanup, I decided to try to completely eliminate the
  rest of it and move it all into drivers.

  This series incorporates some of Al's work and many patches of my own,
  but in the end stops short of actually removing the last part, which
  is the scsi ioctl handlers. I have patches for those as well, but they
  need more testing or possibly a rewrite"

* tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (42 commits)
  scsi: sd: enable compat ioctls for sed-opal
  pktcdvd: add compat_ioctl handler
  compat_ioctl: move SG_GET_REQUEST_TABLE handling
  compat_ioctl: ppp: move simple commands into ppp_generic.c
  compat_ioctl: handle PPPIOCGIDLE for 64-bit time_t
  compat_ioctl: move PPPIOCSCOMPRESS to ppp_generic
  compat_ioctl: unify copy-in of ppp filters
  tty: handle compat PPP ioctls
  compat_ioctl: move SIOCOUTQ out of compat_ioctl.c
  compat_ioctl: handle SIOCOUTQNSD
  af_unix: add compat_ioctl support
  compat_ioctl: reimplement SG_IO handling
  compat_ioctl: move WDIOC handling into wdt drivers
  fs: compat_ioctl: move FITRIM emulation into file systems
  gfs2: add compat_ioctl support
  compat_ioctl: remove unused convert_in_user macro
  compat_ioctl: remove last RAID handling code
  compat_ioctl: remove /dev/raw ioctl translation
  compat_ioctl: remove PCI ioctl translation
  compat_ioctl: remove joystick ioctl translation
  ...
2019-12-01 13:46:15 -08:00
Arnd Bergmann
407e9ef724 compat_ioctl: move drivers to compat_ptr_ioctl
Each of these drivers has a copy of the same trivial helper function to
convert the pointer argument and then call the native ioctl handler.

We now have a generic implementation of that, so use it.

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-10-23 17:23:43 +02:00