linux-stable/drivers/iommu
Jason Gunthorpe 8d160cd4d5 iommufd: Algorithms for PFN storage
The iopt_pages represents a logical linear list of full PFNs held in
different storage tiers. Each area points to a slice of exactly one
iopt_pages, and each iopt_pages can have multiple areas and accesses.

The three storage tiers are managed to meet these objectives:

 - If no iommu_domain or in-kernel access exists then minimal memory
   should be consumed by iommufd
 - If a page has been pinned then an iopt_pages will not pin it again
 - If an in-kernel access exists then the xarray must provide the backing
   storage to avoid allocations on domain removals
 - Otherwise any iommu_domain will be used for storage

In a common configuration with only an iommu_domain the iopt_pages does
not allocate significant memory itself.
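The objectives above imply a priority order when deciding which tier
supplies a PFN. A minimal sketch of that order follows. The enum, struct,
and function names are hypothetical and are not the kernel's identifiers,
and the sketch assumes the third tier is a fresh pin of user memory, which
the text does not state explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tier names, for illustration only. */
enum pfn_tier {
	TIER_PINNED_USER,	/* assumed: nothing holds the PFN, pin user memory */
	TIER_DOMAIN,		/* PFN can be read back from an iommu_domain */
	TIER_XARRAY,		/* PFN is held in the xarray */
};

/* Hypothetical summary of what currently covers one page index. */
struct index_state {
	bool has_access;	/* an in-kernel access covers this index */
	bool has_domain;	/* at least one iommu_domain maps this index */
};

/* Pick the tier that should back this index, following the listed rules. */
static enum pfn_tier pick_tier(const struct index_state *st)
{
	assert(st != NULL);

	/* An in-kernel access forces the xarray to be the backing storage. */
	if (st->has_access)
		return TIER_XARRAY;
	/* Otherwise any iommu_domain doubles as the storage. */
	if (st->has_domain)
		return TIER_DOMAIN;
	/* Neither exists: nothing is stored and the page must be pinned. */
	return TIER_PINNED_USER;
}
```

With only an iommu_domain present, pick_tier() never selects the xarray,
which matches the claim that the common configuration allocates little
memory of its own.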

The external interface for pages has several logical operations:

  iopt_area_fill_domain() will load the PFNs from storage into a single
  domain. This is used when attaching a new domain to an existing IOAS.

  iopt_area_fill_domains() will load the PFNs from storage into multiple
  domains. This is used when creating a new IOVA map in an existing IOAS.

  iopt_pages_add_access() creates an iopt_pages_access that tracks an
  in-kernel access of PFNs. The accessor is some external driver that
  might be accessing the IOVA using the CPU, or programming PFNs with the
  DMA API, e.g. a VFIO mdev.

  iopt_pages_rw_access() directly performs a memcpy on the PFNs, without
  the overhead of iopt_pages_add_access().

  iopt_pages_fill_xarray() will load PFNs into the xarray and return a
  'struct page *' array. It is used by an iopt_pages_access to extract
  PFNs for in-kernel use. iopt_pages_fill_from_xarray() is a fast path
  for when it is known the xarray is already filled.

As an iopt_pages can be referred to in slices by many areas and accesses,
it uses interval trees to keep track of which storage tiers currently hold
the PFNs. On a page-by-page basis any request for a PFN will be satisfied
from one of the storage tiers and the PFN copied to the target
domain/array.

Unfill actions are similar: on a page-by-page basis domains are unmapped,
xarray entries are freed, or struct pages are fully put back.

Significant complexity is required to fully optimize all of these data
motions. The implementation calculates the largest consecutive range of
same-storage indexes and operates in blocks. The accumulation of PFNs
always generates the largest possible contiguous PFN range, and this
gathering can cross storage tier boundaries. For cases like 'fill
domains', care is taken to avoid duplicated work: PFNs are read once and
pushed into all domains.
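To illustrate "the largest consecutive range of same-storage indexes", the
sketch below scans a flat per-index array of tier tags and returns the end
of the run that starts at a given index. This is an assumption-laden
stand-in: the commit says interval trees track the tiers, and the function
name and the flat array here are hypothetical simplifications.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the last index (inclusive) of the run of identical tier tags
 * that begins at 'start'. 'tier' holds one tag per page index and has
 * 'n' entries. A flat array stands in for the interval trees.
 */
static size_t same_tier_run_end(const unsigned char *tier, size_t n,
				size_t start)
{
	size_t end = start;

	assert(tier != NULL && start < n);

	/* Extend the run while the next index lives in the same tier. */
	while (end + 1 < n && tier[end + 1] == tier[start])
		end++;
	return end;
}
```

A caller would process indexes start..end as one block from a single tier,
then continue from end + 1, so the per-tier setup cost is paid once per
run rather than once per page.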

The map/unmap interaction with the iommu_domain always works in contiguous
PFN blocks. The implementation does not require or benefit from any
split/merge optimization in the iommu_domain driver.
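Working in contiguous PFN blocks amounts to coalescing a PFN list into
runs before issuing one map or unmap call per run. A minimal sketch of
that coalescing is shown here. struct pfn_run and coalesce_pfns() are
hypothetical names and this is not the kernel's batching code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor for one physically contiguous block. */
struct pfn_run {
	unsigned long first_pfn;	/* first page frame number of the run */
	size_t npages;			/* number of consecutive PFNs in it */
};

/*
 * Coalesce pfns[0..n) into runs of consecutive PFNs. At most 'max_runs'
 * runs are written to 'runs'; the number written is returned. If the
 * output fills up, the remaining input is left unprocessed.
 */
static size_t coalesce_pfns(const unsigned long *pfns, size_t n,
			    struct pfn_run *runs, size_t max_runs)
{
	size_t nruns = 0;

	assert(pfns != NULL && runs != NULL);

	for (size_t i = 0; i < n; i++) {
		struct pfn_run *last = nruns ? &runs[nruns - 1] : NULL;

		/* Extend the current run when this PFN directly follows it. */
		if (last && last->first_pfn + last->npages == pfns[i]) {
			last->npages++;
			continue;
		}
		if (nruns == max_runs)
			break;
		/* Otherwise start a new run at this PFN. */
		runs[nruns].first_pfn = pfns[i];
		runs[nruns].npages = 1;
		nruns++;
	}
	return nruns;
}
```

Each resulting run corresponds to one map call of npages pages. Because
the caller already presents maximal runs, the domain driver has nothing
left to merge or split.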

This design suggests several possible improvements in the IOMMU API that
would greatly help performance, particularly a way for the driver to map
and read PFN lists instead of working with one driver call per page to
read, and one driver call per contiguous range to store.

Link: https://lore.kernel.org/r/9-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Lixiao Yang <lixiao.yang@intel.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-30 20:16:49 -04:00
amd iommu: Add IOMMU_CAP_ENFORCE_CACHE_COHERENCY 2022-11-29 16:34:15 -04:00
arm iommu: Define EINVAL as device/domain incompatibility 2022-11-03 15:51:48 +01:00
intel iommu: Add IOMMU_CAP_ENFORCE_CACHE_COHERENCY 2022-11-29 16:34:15 -04:00
iommufd iommufd: Algorithms for PFN storage 2022-11-30 20:16:49 -04:00
apple-dart.c iommu: Add gfp parameter to iommu_alloc_resv_region 2022-10-21 10:49:32 +02:00
dma-iommu.c iommu/dma: Make header private 2022-09-09 09:26:22 +02:00
dma-iommu.h iommu/dma: Make header private 2022-09-09 09:26:22 +02:00
exynos-iommu.c iommu/exynos: Clean up bus_set_iommu() 2022-09-07 14:26:14 +02:00
fsl_pamu.c iommu: Regulate EINVAL in ->attach_dev callback functions 2022-11-01 14:39:59 -03:00
fsl_pamu.h iommu/fsl_pamu: hardcode the window address and size in pamu_config_ppaace 2021-04-07 10:56:52 +02:00
fsl_pamu_domain.c iommu: Regulate EINVAL in ->attach_dev callback functions 2022-11-01 14:39:59 -03:00
fsl_pamu_domain.h iommu/fsl_pamu: remove the snoop_id field 2021-04-07 10:56:52 +02:00
hyperv-iommu.c iommu/hyper-v: Use helper instead of directly accessing affinity 2022-08-04 10:02:09 +01:00
io-pgfault.c iommu: Rename iommu-sva-lib.{c,h} 2022-11-03 15:47:54 +01:00
io-pgtable-arm-v7s.c iommu/io-pgtable-arm-v7s: Add a quirk to allow pgtable PA up to 35bit 2022-07-07 09:42:59 +02:00
io-pgtable-arm.c Merge branches 'apple/dart', 'arm/mediatek', 'arm/omap', 'arm/smmu', 'virtio', 'x86/vt-d', 'x86/amd' and 'core' into next 2022-09-26 15:52:31 +02:00
io-pgtable-arm.h iommu/io-pgtable-arm: Move some definitions to a header 2020-09-28 23:48:06 +01:00
io-pgtable-dart.c iommu/io-pgtable-dart: Add DART PTE support for t6000 2022-09-26 13:49:40 +02:00
io-pgtable.c Merge branches 'apple/dart', 'arm/mediatek', 'arm/omap', 'arm/smmu', 'virtio', 'x86/vt-d', 'x86/amd' and 'core' into next 2022-09-26 15:52:31 +02:00
ioasid.c iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit 2022-02-15 11:31:35 +01:00
iommu-debugfs.c
iommu-sva.c iommu: Rename iommu-sva-lib.{c,h} 2022-11-03 15:47:54 +01:00
iommu-sva.h iommu: Rename iommu-sva-lib.{c,h} 2022-11-03 15:47:54 +01:00
iommu-sysfs.c drivers/iommu: Export core IOMMU API symbols to permit modular drivers 2019-12-23 14:06:05 +01:00
iommu-traces.c
iommu.c iommu: Add device-centric DMA ownership interfaces 2022-11-29 16:34:15 -04:00
iova.c iova: Remove iovad->rcaches check in iova_rcache_get() 2022-09-09 09:27:03 +02:00
ipmmu-vmsa.c iommu: Use EINVAL for incompatible device/domain in ->attach_dev 2022-11-01 14:39:59 -03:00
irq_remapping.c x86: Kill all traces of irq_remapping_get_irq_domain() 2020-10-28 20:26:28 +01:00
irq_remapping.h x86: Kill all traces of irq_remapping_get_irq_domain() 2020-10-28 20:26:28 +01:00
Kconfig iommufd: File descriptor, context, kconfig and makefiles 2022-11-30 20:16:49 -04:00
Makefile iommufd: File descriptor, context, kconfig and makefiles 2022-11-30 20:16:49 -04:00
msm_iommu.c iommu: Clean up bus_set_iommu() 2022-09-07 14:26:17 +02:00
msm_iommu.h
msm_iommu_hw-8xxx.h
mtk_iommu.c iommu: Propagate return value in ->attach_dev callback functions 2022-11-01 14:39:59 -03:00
mtk_iommu_v1.c iommu/mtk: Clean up bus_set_iommu() 2022-09-07 14:26:15 +02:00
of_iommu.c Revert "iommu/of: Delete usage of driver_deferred_probe_check_state()" 2022-08-23 13:14:02 +02:00
omap-iommu-debug.c iommu/omap: Fix buffer overflow in debugfs 2022-09-07 10:42:28 +02:00
omap-iommu.c iommu: Use EINVAL for incompatible device/domain in ->attach_dev 2022-11-01 14:39:59 -03:00
omap-iommu.h
omap-iopgtable.h iommu/omap: Fix -Woverflow warnings when compiling on 64-bit architectures 2020-03-04 16:24:46 +01:00
rockchip-iommu.c iommu: Clean up bus_set_iommu() 2022-09-07 14:26:17 +02:00
s390-iommu.c iommu: Clean up bus_set_iommu() 2022-09-07 14:26:17 +02:00
sprd-iommu.c iommu: Use EINVAL for incompatible device/domain in ->attach_dev 2022-11-01 14:39:59 -03:00
sun50i-iommu.c iommu: Clean up bus_set_iommu() 2022-09-07 14:26:17 +02:00
tegra-gart.c iommu: Use EINVAL for incompatible device/domain in ->attach_dev 2022-11-01 14:39:59 -03:00
tegra-smmu.c iommu/tegra-smmu: Clean up bus_set_iommu() 2022-09-07 14:26:16 +02:00
virtio-iommu.c iommu: Propagate return value in ->attach_dev callback functions 2022-11-01 14:39:59 -03:00