Commit graph

144 commits

Author SHA1 Message Date
Linus Torvalds
176882156a VFIO updates for v5.19-rc1
- Improvements to mlx5 vfio-pci variant driver, including support
    for parallel migration per PF (Yishai Hadas)
 
  - Remove redundant iommu_present() check (Robin Murphy)
 
  - Ongoing refactoring to consolidate the VFIO driver facing API
    to use vfio_device (Jason Gunthorpe)
 
  - Use drvdata to store vfio_device among all vfio-pci and variant
    drivers (Jason Gunthorpe)
 
  - Remove redundant code now that IOMMU core manages group DMA
    ownership (Jason Gunthorpe)
 
  - Remove vfio_group from external API handling struct file ownership
    (Jason Gunthorpe)
 
  - Correct typo in uapi comments (Thomas Huth)
 
  - Fix coccicheck detected deadlock (Wan Jiabing)
 
  - Use rwsem to remove races and simplify code around container and
    kvm association to groups (Jason Gunthorpe)
 
  - Harden access to devices in low power states and use runtime PM to
    enable d3cold support for unused devices (Abhishek Sahu)
 
  - Fix dma_owner handling of fake IOMMU groups (Jason Gunthorpe)
 
  - Set driver_managed_dma on vfio-pci variant drivers (Jason Gunthorpe)
 
  - Pass KVM pointer directly rather than via notifier (Matthew Rosato)
 -----BEGIN PGP SIGNATURE-----
 
 iQJPBAABCAA5FiEEQvbATlQL0amee4qQI5ubbjuwiyIFAmKPvyMbHGFsZXgud2ls
 bGlhbXNvbkByZWRoYXQuY29tAAoJECObm247sIsihegP/3XamiYsS0GuA7awAq/X
 h9Jahb6kJ+sh0RXL1Gqzc9nxH5X9H/hBcL88VOV3GLwyOhNVNpVjQXGguL3aLaCE
 zUrs0+AFEJb990y9H+VgwIDom5BIpgdZ2naG42bz9wUeVGg4daJnkMwOgXwIBzfx
 IOddktN6UwuE+DyA57yqL93f+0cTrhYZx9R14sDoLR5lE4uGnbQwIknawEKVtoeR
 rEPaCFptxPxCUbqoOSR0Y3bu6rUYSH4iiMZpMviqm2ak3aNn76gru3q4QAnI4gTd
 l/w+2OJNFC0U7H5Cz7cdIn2StdJvfSkX0e753+qsFccFsViRCGdnW0Lht/xrYrFC
 i8AJxkrq2/bs00LXs7kzcruaD8pJ2UPe2x2+nupHSEsj99K4NraeHRB2CC1uwj0d
 gYliOSW5T3//wOpztK48s475VppgXeKWkXGoNY3JJlGjAPyd0vFrH8hRLhVZJ9uI
 /eLh6hQnOJuCDz1rQrVNRk6cZi9R1Wpl5dvCBRLqjK519nm569aTlVBra+iNyUCQ
 lU5/kN0ym8+X8CweE5ILPGiX2iEXBYMqv+Dm5yOimRUHRJZHYv900FX0GVEnCUCq
 23sMDaeHS1hyDCQk//bd2Ig7xjh7mbh7CrKcdJ7pL5Gc/A1zkCXd54hvxViiGwQq
 U5KIPTyJy+erpcpxjUApaoP2
 =etEI
 -----END PGP SIGNATURE-----

Merge tag 'vfio-v5.19-rc1' of https://github.com/awilliam/linux-vfio

Pull vfio updates from Alex Williamson:

 - Improvements to mlx5 vfio-pci variant driver, including support for
   parallel migration per PF (Yishai Hadas)

 - Remove redundant iommu_present() check (Robin Murphy)

 - Ongoing refactoring to consolidate the VFIO driver facing API to use
   vfio_device (Jason Gunthorpe)

 - Use drvdata to store vfio_device among all vfio-pci and variant
   drivers (Jason Gunthorpe)

 - Remove redundant code now that IOMMU core manages group DMA ownership
   (Jason Gunthorpe)

 - Remove vfio_group from external API handling struct file ownership
   (Jason Gunthorpe)

 - Correct typo in uapi comments (Thomas Huth)

 - Fix coccicheck detected deadlock (Wan Jiabing)

 - Use rwsem to remove races and simplify code around container and kvm
   association to groups (Jason Gunthorpe)

 - Harden access to devices in low power states and use runtime PM to
   enable d3cold support for unused devices (Abhishek Sahu)

 - Fix dma_owner handling of fake IOMMU groups (Jason Gunthorpe)

 - Set driver_managed_dma on vfio-pci variant drivers (Jason Gunthorpe)

 - Pass KVM pointer directly rather than via notifier (Matthew Rosato)

* tag 'vfio-v5.19-rc1' of https://github.com/awilliam/linux-vfio: (38 commits)
  vfio: remove VFIO_GROUP_NOTIFY_SET_KVM
  vfio/pci: Add driver_managed_dma to the new vfio_pci drivers
  vfio: Do not manipulate iommu dma_owner for fake iommu groups
  vfio/pci: Move the unused device into low power state with runtime PM
  vfio/pci: Virtualize PME related registers bits and initialize to zero
  vfio/pci: Change the PF power state to D0 before enabling VFs
  vfio/pci: Invalidate mmaps and block the access in D3hot power state
  vfio: Change struct vfio_group::container_users to a non-atomic int
  vfio: Simplify the life cycle of the group FD
  vfio: Fully lock struct vfio_group::container
  vfio: Split up vfio_group_get_device_fd()
  vfio: Change struct vfio_group::opened from an atomic to bool
  vfio: Add missing locking for struct vfio_group::kvm
  kvm/vfio: Fix potential deadlock problem in vfio
  include/uapi/linux/vfio.h: Fix trivial typo - _IORW should be _IOWR instead
  vfio/pci: Use the struct file as the handle not the vfio_group
  kvm/vfio: Remove vfio_group from kvm
  vfio: Change vfio_group_set_kvm() to vfio_file_set_kvm()
  vfio: Change vfio_external_check_extension() to vfio_file_enforced_coherent()
  vfio: Remove vfio_external_group_match_file()
  ...
2022-06-01 13:49:15 -07:00
Matthew Rosato
421cfe6596 vfio: remove VFIO_GROUP_NOTIFY_SET_KVM
Rather than relying on a notifier for associating the KVM with
the group, let's assume that the association has already been
made prior to device_open.  The first time a device is opened
associate the group KVM with the device.

This fixes a user-triggerable oops in GVT.

Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Zhi Wang <zhi.a.wang@intel.com>
Link: https://lore.kernel.org/r/20220519183311.582380-2-mjrosato@linux.ibm.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-24 08:41:18 -06:00
Jason Gunthorpe
a3da1ab6fb vfio: Do not manipulate iommu dma_owner for fake iommu groups
Since asserting dma ownership now causes the group to have its DMA blocked,
the iommu layer requires a working iommu. This means the dma_owner APIs
cannot be used on the fake groups that VFIO creates. Test for this and
avoid calling them.

Otherwise asserting dma ownership will fail for VFIO mdev devices as a
BLOCKING iommu_domain cannot be allocated due to the NULL iommu ops.
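
A rough sketch of the shape of the fix (the group-type enum comes from the
"simplify iommu group allocation for mediated devices" patch later in this
log; names and placement are illustrative, not the verbatim diff):

  /* only real IOMMU-backed groups participate in DMA ownership;
   * f.file is the group FD used as the owner cookie */
  if (group->type == VFIO_IOMMU) {
          ret = iommu_group_claim_dma_owner(group->iommu_group, f.file);
          if (ret)
                  goto out_unlock;
  }
  ...
  if (group->type == VFIO_IOMMU)
          iommu_group_release_dma_owner(group->iommu_group);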

Fixes: 0286300e60 ("iommu: iommu_group_claim_dma_owner() must always assign a domain")
Reported-by: Eric Farman <farman@linux.ibm.com>
Tested-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/0-v1-9cfc47edbcd4+13546-vfio_dma_owner_fix_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-23 10:27:43 -06:00
Joerg Roedel
b0dacee202 Merge branches 'apple/dart', 'arm/mediatek', 'arm/msm', 'arm/smmu', 'ppc/pamu', 'x86/vt-d', 'x86/amd' and 'vfio-notifier-fix' into next 2022-05-20 12:27:17 +02:00
Jason Gunthorpe
3ca5470878 vfio: Change struct vfio_group::container_users to a non-atomic int
Now that everything is fully locked there is no need for container_users
to remain as an atomic, change it to an unsigned int.

Use 'if (group->container)' as the test to determine if the container is
present or not instead of using container_users.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Link: https://lore.kernel.org/r/6-v2-d035a1842d81+1bf-vfio_group_locking_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-17 13:07:09 -06:00
Jason Gunthorpe
b76c0eed74 vfio: Simplify the life cycle of the group FD
Once userspace opens a group FD it is prevented from opening another
instance of that same group FD until all the prior group FDs and users of
the container are done.

The first is done trivially by checking the group->opened during group FD
open.

However, things get a little weird if userspace creates a device FD and
then closes the group FD. The group FD still cannot be re-opened, but this
time it is because the group->container is still set and container_users
is elevated by the device FD.

Due to this mismatched lifecycle we have the
vfio_group_try_dissolve_container() which tries to auto-free a container
after the group FD is closed but the device FD remains open.

Instead have the device FD hold onto a reference to the single group
FD. This directly prevents vfio_group_fops_release() from being called
when any device FD exists and makes the lifecycle model more
understandable.

vfio_group_try_dissolve_container() is removed as the only place a
container is auto-deleted is during vfio_group_fops_release(). At this
point the container_users is either 1 or 0 since all device FDs must be
closed.

Change group->opened to group->opened_file which points to the single
struct file * that is open for the group. If group->opened_file is
NULL then group->container == NULL.

If all device FDs have closed then the group's notifier list must be
empty.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Link: https://lore.kernel.org/r/5-v2-d035a1842d81+1bf-vfio_group_locking_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-17 13:07:09 -06:00
Jason Gunthorpe
e0e29bdb59 vfio: Fully lock struct vfio_group::container
This is necessary to avoid various user triggerable races, for instance
racing SET_CONTAINER/UNSET_CONTAINER:

                                  ioctl(VFIO_GROUP_SET_CONTAINER)
ioctl(VFIO_GROUP_UNSET_CONTAINER)
 vfio_group_unset_container
    int users = atomic_cmpxchg(&group->container_users, 1, 0);
    // users == 1 container_users == 0
    __vfio_group_unset_container(group);
      container = group->container;
                                    vfio_group_set_container()
	                              if (!atomic_read(&group->container_users))
				        down_write(&container->group_lock);
				        group->container = container;
				        up_write(&container->group_lock);

      down_write(&container->group_lock);
      group->container = NULL;
      up_write(&container->group_lock);
      vfio_container_put(container);
      /* woops we lost/leaked the new container  */

This can then go on to NULL pointer deref since container == 0 and
container_users == 1.

Wrap all touches of container, except those on a performance path with a
known open device, with the group_rwsem.

The only user of vfio_group_add_container_user() holds the user count for
a simple operation, change it to just hold the group_lock over the
operation and delete vfio_group_add_container_user(). Containers now only
gain a user when a device FD is opened.
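
In outline the resulting pattern is (a sketch; the lock is the group_rwsem
added earlier in this series and the function names are abbreviated):

  /* ioctl paths that change the association take the write side */
  down_write(&group->group_rwsem);
  ret = vfio_group_set_container(group, container_fd);
  up_write(&group->group_rwsem);

  /* slow paths that only inspect the container take the read side */
  down_read(&group->group_rwsem);
  if (group->container)
          /* ... use group->container ... */
  up_read(&group->group_rwsem);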

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Link: https://lore.kernel.org/r/4-v2-d035a1842d81+1bf-vfio_group_locking_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-17 13:07:09 -06:00
Jason Gunthorpe
805bb6c1bd vfio: Split up vfio_group_get_device_fd()
The split follows the pairing with the destroy functions:

 - vfio_group_get_device_fd() destroyed by close()

 - vfio_device_open() destroyed by vfio_device_fops_release()

 - vfio_device_assign_container() destroyed by
   vfio_group_try_dissolve_container()

The next patch will put a lock around vfio_device_assign_container().

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Link: https://lore.kernel.org/r/3-v2-d035a1842d81+1bf-vfio_group_locking_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-17 13:07:09 -06:00
Jason Gunthorpe
c6f4860ef9 vfio: Change struct vfio_group::opened from an atomic to bool
This is not a performance path, just use the group_rwsem to protect the
value.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Link: https://lore.kernel.org/r/2-v2-d035a1842d81+1bf-vfio_group_locking_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-17 13:07:09 -06:00
Jason Gunthorpe
be8d3adae6 vfio: Add missing locking for struct vfio_group::kvm
Without locking userspace can trigger a UAF by racing
KVM_DEV_VFIO_GROUP_DEL with VFIO_GROUP_GET_DEVICE_FD:

              CPU1                               CPU2
					    ioctl(KVM_DEV_VFIO_GROUP_DEL)
 ioctl(VFIO_GROUP_GET_DEVICE_FD)
    vfio_group_get_device_fd
     open_device()
      intel_vgpu_open_device()
        vfio_register_notifier()
	 vfio_register_group_notifier()
	   blocking_notifier_call_chain(&group->notifier,
               VFIO_GROUP_NOTIFY_SET_KVM, group->kvm);

					      set_kvm()
						group->kvm = NULL
					    close()
					     kfree(kvm)

             intel_vgpu_group_notifier()
                vdev->kvm = data
    [..]
        kvm_get_kvm(vgpu->kvm);
	    // UAF!

Add a simple rwsem in the group to protect the kvm while the notifier is
using it.

Note this doesn't fix the race internal to i915 where userspace can
trigger two VFIO_GROUP_NOTIFY_SET_KVM's before we reach a consumer of
vgpu->kvm and trigger this same UAF; it just makes the notifier
self-consistent.
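
Sketched, the writer and reader sides become (field and lock names as used
by this series; illustrative):

  /* writer: KVM (dis)association via the group/file API */
  down_write(&group->group_rwsem);
  group->kvm = kvm;
  up_write(&group->group_rwsem);

  /* reader: notifier delivery while the first device FD is opened */
  down_read(&group->group_rwsem);
  if (group->kvm)
          blocking_notifier_call_chain(&group->notifier,
                                       VFIO_GROUP_NOTIFY_SET_KVM, group->kvm);
  up_read(&group->group_rwsem);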

Fixes: ccd46dbae7 ("vfio: support notifier chain in vfio_group")
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Link: https://lore.kernel.org/r/1-v2-d035a1842d81+1bf-vfio_group_locking_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-17 13:07:09 -06:00
Jason Gunthorpe
6a985ae80b vfio/pci: Use the struct file as the handle not the vfio_group
VFIO PCI does a security check as part of hot reset to prove that the user
has permission to manipulate all the devices that will be impacted by the
reset.

Use a new API vfio_file_has_dev() to perform this security check against
the struct file directly and remove the vfio_group from VFIO PCI.

Since VFIO PCI was the last user of vfio_group_get_external_user() and
vfio_group_put_external_user() remove it as well.
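
An illustrative use of the new helper in the hot reset ownership check
(a sketch; the real code walks every vfio_pci device affected by the
bus/slot reset and the array of files userspace passed in):

  /* for each vfio_pci_core_device 'cur' affected by the reset ... */
  bool owned = false;
  int i;

  for (i = 0; i < count; i++) {
          if (vfio_file_has_dev(files[i], &cur->vdev)) {
                  owned = true;
                  break;
          }
  }
  if (!owned)
          return -EINVAL;   /* user cannot prove ownership of 'cur' */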

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/8-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-13 10:14:20 -06:00
Jason Gunthorpe
ba70a89f3c vfio: Change vfio_group_set_kvm() to vfio_file_set_kvm()
Just change the argument from struct vfio_group to struct file *.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/6-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-13 10:14:20 -06:00
Jason Gunthorpe
a905ad043f vfio: Change vfio_external_check_extension() to vfio_file_enforced_coherent()
Instead of a general extension check change the function into a limited
test if the iommu_domain has enforced coherency, which is the only thing
kvm needs to query.

Make the new op self contained by properly refcounting the container
before touching it.
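
A sketch of how the KVM side can consume the new query (assumed usage,
mirroring the non-coherent DMA accounting in virt/kvm/vfio.c):

  /* if the container cannot enforce coherency, arch code must account
   * for non-coherent DMA (e.g. honour guest wbinvd on x86) */
  if (!vfio_file_enforced_coherent(file))
          kvm_arch_register_noncoherent_dma(kvm);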

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/5-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-13 10:14:20 -06:00
Jason Gunthorpe
c38ff5b0c3 vfio: Remove vfio_external_group_match_file()
vfio_group_fops_open() ensures there is only ever one struct file open for
any struct vfio_group at any time:

	/* Do we need multiple instances of the group open?  Seems not. */
	opened = atomic_cmpxchg(&group->opened, 0, 1);
	if (opened) {
		vfio_group_put(group);
		return -EBUSY;

Therefore the struct file * can be used directly to search the list of VFIO
groups that KVM keeps instead of using the
vfio_external_group_match_file() callback to try to figure out if the
passed in FD matches the list or not.

Delete vfio_external_group_match_file().

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/4-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-13 10:14:19 -06:00
Jason Gunthorpe
50d63b5bbf vfio: Change vfio_external_user_iommu_id() to vfio_file_iommu_group()
The only caller wants to get a pointer to the struct iommu_group
associated with the VFIO group file. Instead of returning the group ID
then searching sysfs for that string to get the struct iommu_group just
directly return the iommu_group pointer already held by the vfio_group
struct.

It already has a safe lifetime due to the struct file kref, the vfio_group
and thus the iommu_group cannot be destroyed while the group file is open.
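
Sketch of the simplified caller (illustrative; the lone user is the KVM
SPAPR TCE path):

  struct iommu_group *grp = vfio_file_iommu_group(file);

  if (!grp)
          return -ENODEV;            /* not a VFIO group file */
  /* use grp directly (e.g. iommu_group_id(grp) or its iommu data)
   * instead of re-finding the group through sysfs by ID */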

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/3-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-13 10:14:19 -06:00
Jason Gunthorpe
dc15f82f53 vfio: Delete container_q
Now that the iommu core takes care of isolation there is no race between
driver attach and container unset. Once iommu_group_release_dma_owner()
returns the device can immediately be re-used.

Remove this mechanism.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/0-v1-a1e8791d795b+6b-vfio_container_q_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-13 10:08:02 -06:00
Alex Williamson
c5e8c39282 Merge remote-tracking branch 'iommu/vfio-notifier-fix' into v5.19/vfio/next
Merge IOMMU dependencies for vfio.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-13 10:04:22 -06:00
Jason Gunthorpe
ff806cbd90 vfio/pci: Remove vfio_device_get_from_dev()
The last user of this function is in PCI callbacks that want to convert
their struct pci_dev to a vfio_device. Instead of searching use the
vfio_device available trivially through the drvdata.

When a callback in the device_driver is called, the caller must hold the
device_lock() on dev. The purpose of the device_lock is to prevent
remove() from being called (see __device_release_driver), and allow the
driver to safely interact with its drvdata without races.

The PCI core correctly follows this and holds the device_lock() when
calling error_detected (see report_error_detected) and
sriov_configure (see sriov_numvfs_store).

Further, since the drvdata holds a positive refcount on the vfio_device
any access of the drvdata, under the device_lock(), from a driver callback
needs no further protection or refcounting.

Thus the remark in the vfio_device_get_from_dev() comment does not apply
here, VFIO PCI drivers all call vfio_unregister_group_dev() from their
remove callbacks under the device_lock() and cannot race with the
remaining callers.
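
A hedged sketch of the resulting callback pattern (struct and type names
as used by vfio-pci-core in this era; the function name is made up):

  static pci_ers_result_t my_err_detected(struct pci_dev *pdev,
                                          pci_channel_state_t state)
  {
          /* the PCI core holds device_lock(), so drvdata is stable */
          struct vfio_pci_core_device *vdev = dev_get_drvdata(&pdev->dev);

          /* ... act on vdev ... */
          return PCI_ERS_RESULT_CAN_RECOVER;
  }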

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/2-v4-c841817a0349+8f-vfio_get_from_dev_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-11 13:32:56 -06:00
Jason Gunthorpe
eadd86f835 vfio: Remove calls to vfio_group_add_container_user()
When the open_device() op is called the container_users is incremented and
held incremented until close_device(). Thus, so long as drivers call
functions within their open_device()/close_device() region they do not
need to worry about the container_users.

These functions can all only be called between open_device() and
close_device():

  vfio_pin_pages()
  vfio_unpin_pages()
  vfio_dma_rw()
  vfio_register_notifier()
  vfio_unregister_notifier()

Eliminate the calls to vfio_group_add_container_user() and add
vfio_assert_device_open() to detect driver mis-use. This causes the
close_device() op to check device->open_count so always leave it elevated
while calling the op.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/7-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-11 13:13:00 -06:00
Jason Gunthorpe
231657b345 vfio: Remove dead code
Now that callers have been updated to use the vfio_device APIs the driver
facing group interface is no longer used, delete it:

- vfio_group_get_external_user_from_dev()
- vfio_group_pin_pages()
- vfio_group_unpin_pages()
- vfio_group_iommu_domain()

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/6-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-11 13:13:00 -06:00
Jason Gunthorpe
c6250ffbac vfio/mdev: Pass in a struct vfio_device * to vfio_dma_rw()
Every caller has a readily available vfio_device pointer, use that instead
of passing in a generic struct device. Change vfio_dma_rw() to take in the
struct vfio_device and move the container users that would have been held
by vfio_group_get_external_user_from_dev() to vfio_dma_rw() directly, like
vfio_pin/unpin_pages().

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/4-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-11 13:12:59 -06:00
Jason Gunthorpe
8e432bb015 vfio/mdev: Pass in a struct vfio_device * to vfio_pin/unpin_pages()
Every caller has a readily available vfio_device pointer, use that instead
of passing in a generic struct device. The struct vfio_device already
contains the group we need so this avoids complexity, extra refcountings,
and a confusing lifecycle model.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/3-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-11 13:12:59 -06:00
Jason Gunthorpe
09ea48efff vfio: Make vfio_(un)register_notifier accept a vfio_device
All callers have a struct vfio_device trivially available, pass it in
directly and avoid calling the expensive vfio_group_get_from_dev().
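
Sketch of a driver-side call after the change (assuming the 5.19-era
prototype that takes a vfio_device; 'my_vdev' and 'iommu_nb' are
illustrative names):

  unsigned long events = VFIO_IOMMU_NOTIFY_DMA_UNMAP;

  ret = vfio_register_notifier(&my_vdev->vdev, VFIO_IOMMU_NOTIFY,
                               &events, &my_vdev->iommu_nb);
  ...
  vfio_unregister_notifier(&my_vdev->vdev, VFIO_IOMMU_NOTIFY,
                           &my_vdev->iommu_nb);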

Acked-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/1-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-11 13:12:58 -06:00
Robin Murphy
a77109ffca vfio: Stop using iommu_present()
IOMMU groups have been mandatory for some time now, so a device without
one is necessarily a device without any usable IOMMU, therefore the
iommu_present() check is redundant (or at best unhelpful).

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/537103bbd7246574f37f2c88704d7824a3a889f2.1649160714.git.robin.murphy@arm.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-11 13:12:58 -06:00
Jason Gunthorpe
e8ae0e140c vfio: Require that devices support DMA cache coherence
IOMMU_CACHE means that normal DMAs do not require any additional coherency
mechanism and is the basic uAPI that VFIO exposes to userspace. For
instance VFIO applications like DPDK will not work if additional coherency
operations are required.

Therefore check IOMMU_CAP_CACHE_COHERENCY like vdpa & usnic do before
allowing an IOMMU backed VFIO device to be created.
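
The check itself is roughly a one-liner at device registration time
(the 5.19-era iommu_capable() still takes a bus pointer):

  if (!iommu_capable(device->dev->bus, IOMMU_CAP_CACHE_COHERENCY))
          return -EINVAL;   /* refuse to create an IOMMU-backed vfio_device */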

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/4-v3-2cf356649677+a32-intel_no_snoop_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-04-28 17:24:57 +02:00
Lu Baolu
3b86f317c9 vfio: Remove iommu group notifier
The iommu core and driver core have been enhanced to avoid unsafe driver
binding to a live group after iommu_group_set_dma_owner(PRIVATE_USER)
has been called. There's no need to register an iommu group notifier. This
removes the iommu group notifier, which contains BUG_ON() and WARN().

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/r/20220418005000.897664-11-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-04-28 15:32:20 +02:00
Jason Gunthorpe
93219ea943 vfio: Delete the unbound_list
commit 60720a0fc6 ("vfio: Add device tracking during unbind") added the
unbound list to plug a problem with KVM where KVM_DEV_VFIO_GROUP_DEL
relied on vfio_group_get_external_user() succeeding to return the
vfio_group from a group file descriptor. The unbound list allowed
vfio_group_get_external_user() to continue to succeed in edge cases.

However commit 5d6dee80a1 ("vfio: New external user group/file match")
deleted the call to vfio_group_get_external_user() during
KVM_DEV_VFIO_GROUP_DEL. Instead vfio_external_group_match_file() is used
to directly match the file descriptor to the group pointer.

This in turn avoids the call down to vfio_dev_viable() during
KVM_DEV_VFIO_GROUP_DEL and also avoids the trouble the first commit was
trying to fix.

There are no other users of vfio_dev_viable() that care about the time
after vfio_unregister_group_dev() returns, so simply delete the
unbound_list entirely.

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/r/20220418005000.897664-10-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-04-28 15:32:20 +02:00
Lu Baolu
31076af0cb vfio: Remove use of vfio_group_viable()
As DMA ownership is claimed for the iommu group when a VFIO group is
added to a VFIO container, the VFIO group viability is guaranteed as long
as group->container_users > 0. Remove those unnecessary group viability
checks which are only hit when group->container_users is not zero.

The only remaining reference is in GROUP_GET_STATUS, which could be called
at any time when the group fd is valid. Here we just replace
vfio_group_viable() with a direct call to the IOMMU core to get the
viability status.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/r/20220418005000.897664-9-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-04-28 15:32:20 +02:00
Lu Baolu
70693f4708 vfio: Set DMA ownership for VFIO devices
Claim group dma ownership when an IOMMU group is set to a container,
and release the dma ownership once the iommu group is unset from the
container.

This change disallows some unsafe bridge drivers from binding to non-ACS
bridges while devices under them are assigned to user space. This is an
intentional enhancement and possibly breaks some existing
configurations. The recommendation to such an affected user is that the
previously allowed host bridge driver was unsafe for this use case; to
continue to enable assignment of devices within that group, the driver
should be unbound from the bridge device or replaced with the pci-stub
driver.

For any bridge driver, we consider it unsafe if it satisfies any of the
following conditions:

  1) The bridge driver uses DMA. Calling pci_set_master() or calling any
     kernel DMA API (dma_map_*(), etc.) is an indication that the
     driver is doing DMA.

  2) If the bridge driver uses MMIO, it should be tolerant to hostile
     userspace also touching the same MMIO registers via P2P DMA
     attacks.

If the bridge driver turns out to be a safe one, it could be used as
before by setting the driver's .driver_managed_dma field, just like what
we have done in the pcieport driver.
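
In outline, the container attach/detach paths now bracket the group with
the iommu core ownership API (a sketch; VFIO uses the group file as the
owner cookie):

  /* VFIO_GROUP_SET_CONTAINER */
  ret = iommu_group_claim_dma_owner(group->iommu_group, f.file);
  if (ret)
          goto out;
  ...
  /* VFIO_GROUP_UNSET_CONTAINER / group release */
  iommu_group_release_dma_owner(group->iommu_group);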

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/r/20220418005000.897664-8-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-04-28 15:32:20 +02:00
Jason Gunthorpe
8cb3d83b95 vfio: Extend the device migration protocol with RUNNING_P2P
The RUNNING_P2P state is designed to support multiple devices in the same
VM that are doing P2P transactions between themselves. When in RUNNING_P2P
the device must be able to accept incoming P2P transactions but should not
generate outgoing P2P transactions.

As an optional extension to the mandatory states it is defined as
in between STOP and RUNNING:
   STOP -> RUNNING_P2P -> RUNNING -> RUNNING_P2P -> STOP

For drivers that are unable to support RUNNING_P2P the core code
silently merges RUNNING_P2P and RUNNING together. Unless driver support
is present, the new state cannot be used in SET_STATE.
Drivers that support this will be required to implement 4 FSM arcs
beyond the basic FSM. 2 of the basic FSM arcs become combination
transitions.

Compared to the v1 clarification, NDMA is redefined into FSM states and is
described in terms of the desired P2P quiescent behavior, noting that
halting all DMA is an acceptable implementation.
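
A sketch of the two extra arcs a driver implements for RUNNING_P2P (state
names are the uapi enum vfio_device_mig_state values; the bodies are
illustrative, and neither arc opens a data transfer FD):

  if (cur == VFIO_DEVICE_STATE_RUNNING &&
      next == VFIO_DEVICE_STATE_RUNNING_P2P) {
          /* quiesce outgoing P2P transactions, keep accepting incoming */
          return NULL;
  }
  if (cur == VFIO_DEVICE_STATE_RUNNING_P2P &&
      next == VFIO_DEVICE_STATE_RUNNING) {
          /* resume full operation */
          return NULL;
  }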

Link: https://lore.kernel.org/all/20220224142024.147653-11-yishaih@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-03-03 13:00:16 +02:00
Jason Gunthorpe
115dcec65f vfio: Define device migration protocol v2
Replace the existing region based migration protocol with an ioctl based
protocol. The two protocols have the same general semantic behaviors, but
the way the data is transported is changed.

This is the STOP_COPY portion of the new protocol, it defines the 5 states
for basic stop and copy migration and the protocol to move the migration
data in/out of the kernel.

Compared to the clarification of the v1 protocol Alex proposed:

https://lore.kernel.org/r/163909282574.728533.7460416142511440919.stgit@omen

This has a few deliberate functional differences:

 - ERROR arcs allow the device function to remain unchanged.

 - The protocol is not required to return to the original state on
   transition failure. Instead userspace can execute an unwind back to
   the original state, reset, or do something else without needing kernel
   support. This simplifies the kernel design and should userspace choose
   a policy like always reset, avoids doing useless work in the kernel
   on error handling paths.

 - PRE_COPY is made optional, userspace must discover it before using it.
   This reflects the fact that the majority of drivers we are aware of
   right now will not implement PRE_COPY.

 - segmentation is not part of the data stream protocol; the receiver
   does not have to reproduce the framing boundaries.

The hybrid FSM for the device_state is described as a Mealy machine by
documenting each of the arcs the driver is required to implement. Defining
the remaining set of old/new device_state transitions as 'combination
transitions' which are naturally defined as taking multiple FSM arcs along
the shortest path within the FSM's digraph allows a complete matrix of
transitions.

A new VFIO_DEVICE_FEATURE of VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE is
defined to replace writing to the device_state field in the region. This
allows returning a brand new FD whenever the requested transition opens
a data transfer session.

The VFIO core code implements the new feature and provides a helper
function to the driver. Using the helper the driver only has to
implement 6 of the FSM arcs and the other combination transitions are
elaborated consistently from those arcs.

A new VFIO_DEVICE_FEATURE of VFIO_DEVICE_FEATURE_MIGRATION is defined to
report the capability for migration and indicate which set of states and
arcs are supported by the device. The FSM provides a lot of flexibility to
make backwards compatible extensions but the VFIO_DEVICE_FEATURE also
allows for future breaking extensions for scenarios that cannot support
even the basic STOP_COPY requirements.

The VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE with the GET option (i.e.
VFIO_DEVICE_FEATURE_GET) can be used to read the current migration state
of the VFIO device.

Data transfer sessions are now carried over a file descriptor, instead of
the region. The FD functions for the lifetime of the data transfer
session. read() and write() transfer the data with normal Linux stream FD
semantics. This design allows future expansion to support poll(),
io_uring, and other performance optimizations.

The complicated mmap mode for data transfer is discarded as current qemu
doesn't take meaningful advantage of it, and the new qemu implementation
avoids substantially all the performance penalty of using a read() on the
region.
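
A self-contained userspace sketch of driving one transition through the
new feature ioctl (uapi names from <linux/vfio.h>; error handling trimmed):

  #include <stdlib.h>
  #include <sys/ioctl.h>
  #include <linux/vfio.h>

  /* returns the data-transfer FD, or -1 if the arc has none / on error */
  static int mig_set_state(int device_fd, __u32 target_state)
  {
          size_t sz = sizeof(struct vfio_device_feature) +
                      sizeof(struct vfio_device_feature_mig_state);
          struct vfio_device_feature *feat = calloc(1, sz);
          struct vfio_device_feature_mig_state *mig = (void *)feat->data;
          int data_fd = -1;

          feat->argsz = sz;
          feat->flags = VFIO_DEVICE_FEATURE_SET |
                        VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE;
          mig->device_state = target_state;  /* e.g. VFIO_DEVICE_STATE_STOP_COPY */

          if (!ioctl(device_fd, VFIO_DEVICE_FEATURE, feat))
                  data_fd = mig->data_fd;    /* valid for STOP_COPY/RESUMING arcs */
          free(feat);
          return data_fd;
  }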

Link: https://lore.kernel.org/all/20220224142024.147653-10-yishaih@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-03-03 12:57:39 +02:00
Jason Gunthorpe
445ad495f0 vfio: Have the core code decode the VFIO_DEVICE_FEATURE ioctl
Invoke a new device op 'device_feature' to handle just the data array
portion of the command. This lifts the ioctl validation to the core code
and makes it simpler for either the core code, or layered drivers, to
implement their own feature values.

Provide vfio_check_feature() to consolidate checking the flags/etc against
what the driver supports.
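
A sketch of a variant driver's device_feature op using the helper
(assuming the helper's convention that a positive return means the
caller should go ahead and service the GET/SET; names illustrative):

  static int my_device_feature(struct vfio_device *device, u32 flags,
                               void __user *arg, size_t argsz)
  {
          if ((flags & VFIO_DEVICE_FEATURE_MASK) ==
              VFIO_DEVICE_FEATURE_MIGRATION) {
                  struct vfio_device_feature_migration mig = {
                          .flags = VFIO_MIGRATION_STOP_COPY,
                  };
                  int ret = vfio_check_feature(flags, argsz,
                                               VFIO_DEVICE_FEATURE_GET,
                                               sizeof(mig));
                  if (ret != 1)
                          return ret;
                  return copy_to_user(arg, &mig, sizeof(mig)) ? -EFAULT : 0;
          }
          return -ENOTTY;
  }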

Link: https://lore.kernel.org/all/20220224142024.147653-9-yishaih@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-03-03 12:54:43 +02:00
Randy Dunlap
3b9a2d5793 vfio: remove all kernel-doc notation
vfio.c abuses (misuses) "/**", which indicates the beginning of
kernel-doc notation in the kernel tree. This causes a bunch of
kernel-doc complaints about this source file, so quieten all of
them by changing all "/**" to "/*".

vfio.c:236: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
  * IOMMU driver registration
vfio.c:236: warning: missing initial short description on line:
  * IOMMU driver registration
vfio.c:295: warning: expecting prototype for Container objects(). Prototype was for vfio_container_get() instead
vfio.c:317: warning: expecting prototype for Group objects(). Prototype was for __vfio_group_get_from_iommu() instead
vfio.c:496: warning: Function parameter or member 'device' not described in 'vfio_device_put'
vfio.c:496: warning: expecting prototype for Device objects(). Prototype was for vfio_device_put() instead
vfio.c:599: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
  * Async device support
vfio.c:599: warning: missing initial short description on line:
  * Async device support
vfio.c:693: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
  * VFIO driver API
vfio.c:693: warning: missing initial short description on line:
  * VFIO driver API
vfio.c:835: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
  * Get a reference to the vfio_device for a device.  Even if the
vfio.c:835: warning: missing initial short description on line:
  * Get a reference to the vfio_device for a device.  Even if the
vfio.c:969: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
  * VFIO base fd, /dev/vfio/vfio
vfio.c:969: warning: missing initial short description on line:
  * VFIO base fd, /dev/vfio/vfio
vfio.c:1187: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
  * VFIO Group fd, /dev/vfio/$GROUP
vfio.c:1187: warning: missing initial short description on line:
  * VFIO Group fd, /dev/vfio/$GROUP
vfio.c:1540: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
  * VFIO Device fd
vfio.c:1540: warning: missing initial short description on line:
  * VFIO Device fd
vfio.c:1615: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
  * External user API, exported by symbols to be linked dynamically.
vfio.c:1615: warning: missing initial short description on line:
  * External user API, exported by symbols to be linked dynamically.
vfio.c:1663: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
  * External user API, exported by symbols to be linked dynamically.
vfio.c:1663: warning: missing initial short description on line:
  * External user API, exported by symbols to be linked dynamically.
vfio.c:1742: warning: Function parameter or member 'caps' not described in 'vfio_info_cap_add'
vfio.c:1742: warning: Function parameter or member 'size' not described in 'vfio_info_cap_add'
vfio.c:1742: warning: Function parameter or member 'id' not described in 'vfio_info_cap_add'
vfio.c:1742: warning: Function parameter or member 'version' not described in 'vfio_info_cap_add'
vfio.c:1742: warning: expecting prototype for Sub(). Prototype was for vfio_info_cap_add() instead
vfio.c:2276: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
  * Module/class support
vfio.c:2276: warning: missing initial short description on line:
  * Module/class support

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: kvm@vger.kernel.org
Link: https://lore.kernel.org/r/38a9cb92-a473-40bf-b8f9-85cc5cfc2da4@infradead.org
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-11-30 11:41:43 -07:00
Jason Gunthorpe
9cef73918e vfio: Use cdev_device_add() instead of device_create()
Modernize how vfio is creating the group char dev and sysfs presence.

These days drivers with state should use cdev_device_add() and
cdev_device_del() to manage the cdev and sysfs lifetime.

This API requires the driver to put the struct device and struct cdev
inside its state struct (vfio_group), and then use the usual
device_initialize()/cdev_device_add()/cdev_device_del() sequence.

Split the code to make this possible:

 - vfio_group_alloc()/vfio_group_release() are pair'd functions to
   alloc/free the vfio_group. release is done under the struct device
   kref.

 - vfio_create_group()/vfio_group_put() are pairs that manage the
   sysfs/cdev lifetime. Once the use count is zero the vfio group's
   userspace presence is destroyed.

 - The IDR is replaced with an IDA. container_of(inode->i_cdev)
   is used to get back to the vfio_group during fops open. The IDA
   assigns unique minor numbers.
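
The canonical sequence being adopted, in outline (field names abridged):

  /* allocation side */
  device_initialize(&group->dev);
  group->dev.devt = MKDEV(MAJOR(vfio.group_devt), minor);
  cdev_init(&group->cdev, &vfio_group_fops);
  group->cdev.owner = THIS_MODULE;

  /* publish: creates the char dev and its sysfs/devnode together */
  ret = cdev_device_add(&group->cdev, &group->dev);

  /* teardown: removes both; the final put_device() frees the vfio_group */
  cdev_device_del(&group->cdev, &group->dev);
  put_device(&group->dev);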

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/5-v3-2fdfe4ca2cc6+18c-vfio_group_cdev_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-10-15 13:58:20 -06:00
Jason Gunthorpe
2b678aa2f0 vfio: Use a refcount_t instead of a kref in the vfio_group
The next patch adds a struct device to the struct vfio_group, and it is
confusing/bad practice to have two krefs in the same struct. This kref is
controlling the period when the vfio_group is registered in sysfs, and
visible in the internal lookup. Switch it to a refcount_t instead.

The refcount_dec_and_mutex_lock() is still required because we need
atomicity of the list searches and sysfs presence.
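
The resulting pattern, in brief (a sketch; member names illustrative):

  refcount_set(&group->users, 1);              /* at creation */
  refcount_inc(&group->users);                 /* vfio_group_get() */

  /* vfio_group_put(): keep the list search and teardown atomic */
  if (refcount_dec_and_mutex_lock(&group->users, &vfio.group_lock)) {
          list_del(&group->vfio_next);
          mutex_unlock(&vfio.group_lock);
          /* tear down sysfs/cdev presence, then drop the device kref */
  }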

Reviewed-by: Liu Yi L <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/4-v3-2fdfe4ca2cc6+18c-vfio_group_cdev_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-10-15 13:58:20 -06:00
Jason Gunthorpe
325a31c920 vfio: Don't leak a group reference if the group already exists
If vfio_create_group() searches the group list and returns an already
existing group it does not put back the iommu_group reference that the
caller passed in.

Change the semantic of vfio_create_group() to not move the reference in
from the caller, but instead obtain a new reference inside and leave the
caller's reference alone. The two callers must now call iommu_group_put().

This is an unlikely race as the only caller that could hit it has already
searched the group list before attempting to create the group.

Fixes: cba3345cc4 ("vfio: VFIO core")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/3-v3-2fdfe4ca2cc6+18c-vfio_group_cdev_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-10-15 13:58:20 -06:00
Jason Gunthorpe
1ceabade1d vfio: Do not open code the group list search in vfio_create_group()
Split vfio_group_get_from_iommu() into __vfio_group_get_from_iommu() so
that vfio_create_group() can call it to consolidate this duplicated code.

Reviewed-by: Liu Yi L <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/2-v3-2fdfe4ca2cc6+18c-vfio_group_cdev_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-10-15 13:58:19 -06:00
Jason Gunthorpe
63b150fde7 vfio: Delete vfio_get/put_group from vfio_iommu_group_notifier()
iommu_group_register_notifier()/iommu_group_unregister_notifier() are
built using a blocking_notifier_chain which integrates a rwsem. The
notifier function cannot be running outside its registration.

When considering how the notifier function interacts with create/destroy
of the group there are two fringe cases, the notifier starts before
list_add(&vfio.group_list) and the notifier runs after the kref
becomes 0.

Prior to vfio_create_group() unlocking and returning we have
   container_users == 0
   device_list == empty
And this cannot change until the mutex is unlocked.

After the kref goes to zero we must also have
   container_users == 0
   device_list == empty

Both are required because they are balanced operations and a 0 kref means
some caller became unbalanced. Add the missing assertion that
container_users must be zero as well.

These two facts are important because when checking each operation we see:

- IOMMU_GROUP_NOTIFY_ADD_DEVICE
   Empty device_list avoids the WARN_ON in vfio_group_nb_add_dev()
   0 container_users ends the call
- IOMMU_GROUP_NOTIFY_BOUND_DRIVER
   0 container_users ends the call

Finally, we have IOMMU_GROUP_NOTIFY_UNBOUND_DRIVER, which only deletes
items from the unbound list. During creation this list is empty, during
kref == 0 nothing can read this list, and it will be freed soon.

Since the vfio_group_release() doesn't hold the appropriate lock to
manipulate the unbound_list and could race with the notifier, move the
cleanup to directly before the kfree.

This allows deleting all of the deferred group put code.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Liu Yi L <yi.l.liu@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/1-v3-2fdfe4ca2cc6+18c-vfio_group_cdev_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-10-15 13:58:19 -06:00
Christoph Hellwig
c3c0fa9d94 vfio: clean up the check for mediated device in vfio_iommu_type1
Pass the group flags to ->attach_group and remove the messy check for
the bus type.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20210924155705.4258-12-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-09-30 12:46:44 -06:00
Christoph Hellwig
8cc02d22d7 vfio: move the vfio_iommu_driver_ops interface out of <linux/vfio.h>
Create a new private drivers/vfio/vfio.h header for the interface between
the VFIO core and the iommu drivers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20210924155705.4258-10-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-09-30 12:46:44 -06:00
Christoph Hellwig
6746203787 vfio: remove unused method from vfio_iommu_driver_ops
The read, write and mmap methods are never implemented, so remove them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20210924155705.4258-9-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-09-30 12:46:44 -06:00
Christoph Hellwig
c68ea0d00a vfio: simplify iommu group allocation for mediated devices
Reuse the logic in vfio_noiommu_group_alloc to allocate a fake
single-device iommu group for mediated devices by factoring out a common
function, and replacing the noiommu boolean field in struct vfio_group
with an enum to distinguish the three different kinds of groups.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20210924155705.4258-8-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-09-30 12:46:44 -06:00
Christoph Hellwig
c04ac34078 vfio: remove the iommudata hack for noiommu groups
Just pass a noiommu argument to vfio_create_group and set up the
->noiommu flag directly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20210924155705.4258-7-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-09-30 12:46:44 -06:00
Christoph Hellwig
3af9177132 vfio: refactor noiommu group creation
Split the actual noiommu group creation from vfio_iommu_group_get into a
new helper, and open code the rest of vfio_iommu_group_get in its only
caller.  This creates an entirely separate and clear code path for the
noiommu group creation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20210924155705.4258-6-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-09-30 12:46:44 -06:00
Christoph Hellwig
1362591f15 vfio: factor out a vfio_group_find_or_alloc helper
Factor out a helper to find or allocate the vfio_group to reduce the
spaghetti code in vfio_register_group_dev a little.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20210924155705.4258-5-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-09-30 12:46:43 -06:00
Christoph Hellwig
c5b4ba9730 vfio: remove the iommudata check in vfio_noiommu_attach_group
vfio_noiommu_attach_group has two callers:

 1) __vfio_container_attach_groups is called by vfio_ioctl_set_iommu,
    which just called vfio_iommu_driver_allowed
 2) vfio_group_set_container already checks ->noiommu on the
    vfio_group, which is propagated from the iommudata in
    vfio_create_group

so this check is entirely superfluous and can be removed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20210924155705.4258-4-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-09-30 12:46:43 -06:00
Christoph Hellwig
b00621603d vfio: factor out a vfio_iommu_driver_allowed helper
Factor out a little helper to make the checks for the noiommu driver less
ugly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20210924155705.4258-3-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-09-30 12:46:43 -06:00
Jason Gunthorpe
38a68934aa vfio: Move vfio_iommu_group_get() to vfio_register_group_dev()
We don't need to hold a reference to the group in the driver as well,
since obtaining a reference to the same group is the first thing
vfio_register_group_dev() does.

Since the drivers never use the group move this all into the core code.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20210924155705.4258-2-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-09-30 12:46:43 -06:00
Jason Gunthorpe
eb24c1007e vfio: Remove struct vfio_device_ops open/release
Nothing uses this anymore, delete it.

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Link: https://lore.kernel.org/r/14-v4-9ea22c5e6afb+1adf-vfio_reflck_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-11 09:50:11 -06:00
Jason Gunthorpe
2fd585f4ed vfio: Provide better generic support for open/release vfio_device_ops
Currently the driver ops have an open/release pair that is called once
each time a device FD is opened or closed. Add an additional set of
open/close_device() ops which are called when the device FD is opened for
the first time and closed for the last time.

An analysis shows that all of the drivers require this semantic. Some are
open coding it as part of their reflck implementation, and some are just
buggy and miss it completely.

To retain the current semantics PCI and FSL depend on, introduce the idea
of a "device set" which is a grouping of vfio_device's that share the same
lock around opening.

The device set is established by providing a 'set_id' pointer. All
vfio_device's that provide the same pointer will be joined to the same
singleton memory and lock across the whole set. This effectively replaces
the oddly named reflck.

After conversion the set_id will be sourced from:
 - A struct device from a fsl_mc_device (fsl)
 - A struct pci_slot (pci)
 - A struct pci_bus (pci)
 - The struct vfio_device (everything)

The design ensures that the above pointers are live as long as the
vfio_device is registered, so they form reliable unique keys to group
vfio_devices into sets.

This implementation uses xarray instead of searching through the driver
core structures, which simplifies the somewhat tricky locking in this
area.

Following patches convert all the drivers.
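
Driver usage amounts to one call before registering the device; a sketch
for a PCI driver, where 'vdev' is the driver's per-device structure
embedding a vfio_device (if no set_id is assigned, the vfio_device itself
becomes a singleton set):

  /* group all functions behind the same slot/bus so they share the
   * open/reset serialization */
  if (pdev->slot)
          ret = vfio_assign_device_set(&vdev->vdev, pdev->slot);
  else
          ret = vfio_assign_device_set(&vdev->vdev, pdev->bus);
  if (ret)
          return ret;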

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/4-v4-9ea22c5e6afb+1adf-vfio_reflck_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-11 09:50:10 -06:00