linux-stable/drivers
Jason Gunthorpe 8d160cd4d5 iommufd: Algorithms for PFN storage
The iopt_pages which represents a logical linear list of full PFNs held in
different storage tiers. Each area points to a slice of exactly one
iopt_pages, and each iopt_pages can have multiple areas and accesses.

The three storage tiers are managed to meet these objectives:

 - If no iommu_domain or in-kerenel access exists then minimal memory
   should be consumed by iomufd
 - If a page has been pinned then an iopt_pages will not pin it again
 - If an in-kernel access exists then the xarray must provide the backing
   storage to avoid allocations on domain removals
 - Otherwise any iommu_domain will be used for storage

In a common configuration with only an iommu_domain the iopt_pages does
not allocate significant memory itself.

The external interface for pages has several logical operations:

  iopt_area_fill_domain() will load the PFNs from storage into a single
  domain. This is used when attaching a new domain to an existing IOAS.

  iopt_area_fill_domains() will load the PFNs from storage into multiple
  domains. This is used when creating a new IOVA map in an existing IOAS

  iopt_pages_add_access() creates an iopt_pages_access that tracks an
  in-kernel access of PFNs. This is some external driver that might be
  accessing the IOVA using the CPU, or programming PFNs with the DMA
  API. ie a VFIO mdev.

  iopt_pages_rw_access() directly perform a memcpy on the PFNs, without
  the overhead of iopt_pages_add_access()

  iopt_pages_fill_xarray() will load PFNs into the xarray and return a
  'struct page *' array. It is used by iopt_pages_access's to extract PFNs
  for in-kernel use. iopt_pages_fill_from_xarray() is a fast path when it
  is known the xarray is already filled.

As an iopt_pages can be referred to in slices by many areas and accesses
it uses interval trees to keep track of which storage tiers currently hold
the PFNs. On a page-by-page basis any request for a PFN will be satisfied
from one of the storage tiers and the PFN copied to target domain/array.

Unfill actions are similar, on a page by page basis domains are unmapped,
xarray entries freed or struct pages fully put back.

Significant complexity is required to fully optimize all of these data
motions. The implementation calculates the largest consecutive range of
same-storage indexes and operates in blocks. The accumulation of PFNs
always generates the largest contiguous PFN range possible to optimize and
this gathering can cross storage tier boundaries. For cases like 'fill
domains' care is taken to avoid duplicated work and PFNs are read once and
pushed into all domains.

The map/unmap interaction with the iommu_domain always works in contiguous
PFN blocks. The implementation does not require or benefit from any
split/merge optimization in the iommu_domain driver.

This design suggests several possible improvements in the IOMMU API that
would greatly help performance, particularly a way for the driver to map
and read the pfns lists instead of working with one driver call per page
to read, and one driver call per contiguous range to store.

Link: https://lore.kernel.org/r/9-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Lixiao Yang <lixiao.yang@intel.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-30 20:16:49 -04:00
..
accessibility
acpi ACPI and device properties fixes for 6.1-rc3 2022-10-28 16:48:29 -07:00
amba
android Scheduler changes for v6.1: 2022-10-10 09:10:28 -07:00
ata ata: ahci_qoriq: Fix compilation warning 2022-10-18 08:02:14 +09:00
atm
auxdisplay
base ACPI and device properties fixes for 6.1-rc3 2022-10-28 16:48:29 -07:00
bcma Interrupt subsystem updates: 2022-10-12 10:23:24 -07:00
block block-6.1-2022-10-28 2022-10-29 18:06:52 -07:00
bluetooth
bus Char/Misc and other driver changes for 6.1-rc1 2022-10-08 08:56:37 -07:00
cdrom
char random: use arch_get_random*_early() in random_init() 2022-10-29 00:24:03 +02:00
clk This is the final part of the clk patches for this merge window. 2022-10-16 11:08:19 -07:00
clocksource A boring time, timekeeping, timers update: 2022-10-10 10:16:00 -07:00
comedi
connector
counter counter: 104-quad-8: Fix race getting function mode and direction 2022-10-23 20:39:26 -04:00
cpufreq cpufreq: intel_pstate: hybrid: Use known scaling factor for P-cores 2022-10-25 15:09:23 +02:00
cpuidle RISC-V Patches for the 6.1 Merge Window, Part 1 2022-10-09 13:24:01 -07:00
crypto This update includes the following changes: 2022-10-10 13:04:25 -07:00
cxl
dax libnvdimm for 6.1 2022-10-14 18:41:41 -07:00
dca
devfreq PM / devfreq: rockchip-dfi: Fix an error message 2022-09-26 03:59:43 +09:00
dio
dma iommu: Remove SVM_FLAG_SUPERVISOR_MODE support 2022-11-03 15:47:45 +01:00
dma-buf whack-a-mole: cropped up open-coded file_inode() uses... 2022-10-06 17:22:11 -07:00
edac Merge patch series "Use composable cache instead of L2 cache" 2022-10-13 11:07:13 -07:00
eisa
extcon Char/Misc and other driver changes for 6.1-rc1 2022-10-08 08:56:37 -07:00
firewire
firmware efi: runtime: Don't assume virtual mappings are missing if VA == PA == 0 2022-10-21 11:09:41 +02:00
fpga Char/Misc and other driver changes for 6.1-rc1 2022-10-08 08:56:37 -07:00
fsi fsi: core: Check error number after calling ida_simple_get 2022-09-28 21:10:57 +09:30
gnss
gpio gpio: tegra: Convert to immutable irq chip 2022-10-20 13:47:54 +02:00
gpu drm-misc-fixes for v6.1-rc3: 2022-10-28 13:00:15 +10:00
greybus
hid for-linus-2022102101 2022-10-21 17:41:57 -07:00
hsi
hte
hv hyperv-next for 6.1 2022-10-10 13:59:01 -07:00
hwmon - Use the correct CPU capability clearing function on the error path in 2022-10-23 10:01:34 -07:00
hwspinlock
hwtracing coresight: cti: Fix hang in cti_disable_hw() 2022-10-25 19:08:07 +02:00
i2c i2c: mlxbf: depend on ACPI; clean away ifdeffage 2022-10-21 07:59:35 +02:00
i3c i3c: master: Remove the wrong place of reattach. 2022-10-12 23:45:29 +02:00
idle
iio iio: bmc150-accel-core: Fix unsafe buffer attributes 2022-10-17 08:51:26 +01:00
infiniband treewide: use get_random_u32() when possible 2022-10-11 17:42:58 -06:00
input Input updates for 6.1 merge window: 2022-10-11 10:53:25 -07:00
interconnect
iommu iommufd: Algorithms for PFN storage 2022-11-30 20:16:49 -04:00
ipack Char/Misc and other driver changes for 6.1-rc1 2022-10-08 08:56:37 -07:00
irqchip Interrupt subsystem updates: 2022-10-12 10:23:24 -07:00
isdn mISDN: hfcpci: Fix use-after-free bug in hfcpci_softirq 2022-10-09 19:11:54 +01:00
leds leds: simatic-ipc-leds-gpio: fix incorrect LED to GPIO mapping 2022-10-24 11:32:10 +02:00
macintosh powerpc updates for 6.1 2022-10-09 14:05:15 -07:00
mailbox mailbox: qcom-ipcc: flag IRQ NO_THREAD 2022-10-05 21:51:58 -05:00
mcb
md dm clone: Fix typo in block_device format specifier 2022-10-18 17:17:48 -04:00
media media: vivid: set num_in/outputs to 0 if not supported 2022-10-25 16:43:34 +01:00
memory
memstick
message
mfd Revert "mfd: syscon: Remove repetition of the regmap_get_val_endian()" 2022-10-23 12:04:56 -07:00
misc iommu: Remove SVM_FLAG_SUPERVISOR_MODE support 2022-11-03 15:47:45 +01:00
mmc mmc: sdhci_am654: 'select', not 'depends' REGMAP_MMIO 2022-10-26 11:48:03 +02:00
most
mtd mtd: parsers: bcm47xxpart: Fix halfblock reads 2022-10-18 11:20:12 +02:00
mux
net net: enetc: survive memory pressure without crashing 2022-10-27 11:32:25 -07:00
nfc nfc: virtual_ncidev: Fix memory leak in virtual_nci_send() 2022-10-20 21:13:04 -07:00
ntb
nubus
nvdimm libnvdimm for 6.1 2022-10-14 18:41:41 -07:00
nvme block-6.1-2022-10-28 2022-10-29 18:06:52 -07:00
nvmem nvmem: u-boot-env: fix crc32 casting type 2022-09-24 14:56:37 +02:00
of Devicetree updates for v6.1: 2022-10-10 13:13:51 -07:00
opp
parisc parisc architecture fixes and updates for kernel v6.1-rc1: 2022-10-14 12:10:01 -07:00
parport
pci PCI: Enable PASID only when ACS RR & UF enabled on upstream path 2022-11-03 15:47:47 +01:00
pcmcia pcmcia: remove AT91RM9200 Compact Flash driver 2022-09-27 08:12:16 +02:00
peci
perf arm64 fixes: 2022-10-14 12:38:03 -07:00
phy pci-v6.1-changes 2022-10-11 11:08:18 -07:00
pinctrl pinctrl: ocelot: Fix incorrect trigger of the interrupt. 2022-10-18 10:42:10 +02:00
platform LoongArch fixes for v6.1-rc3 2022-10-30 09:44:06 -07:00
pnp Merge branches 'acpi-apei', 'acpi-wakeup', 'acpi-reboot' and 'acpi-thermal' 2022-10-10 18:11:11 +02:00
power power supply and reset changes for the v6.1 series 2022-10-07 11:48:30 -07:00
powercap Scheduler changes for v6.1: 2022-10-10 09:10:28 -07:00
pps
ps3
ptp ] ptp: ocp: remove symlink for second GNSS 2022-10-10 08:37:24 +01:00
pwm pwm: Changes for v6.1-rc1 2022-10-07 11:32:10 -07:00
rapidio
ras
regulator - Core Frameworks 2022-10-07 11:24:20 -07:00
remoteproc remoteproc: virtio: Fix warning on bindings by removing the of_match_table 2022-10-05 09:20:44 -06:00
reset Here's the main clk pull request for this merge window. We have some 2022-10-08 10:06:48 -07:00
rpmsg
rtc rtc: cmos: fix build on non-ACPI platforms 2022-10-18 22:36:54 +02:00
s390 s390/vfio-ap: Fix memory allocation for mdev_types array 2022-10-26 14:47:31 +02:00
sbus
scsi SCSI fixes on 20221028 2022-10-29 18:12:45 -07:00
sh
siox
slimbus slimbus: qcom-ngd: Add error handling in of_qcom_slim_ngd_register 2022-09-24 14:53:06 +02:00
soc Merge patch series "Use composable cache instead of L2 cache" 2022-10-13 11:07:13 -07:00
soundwire soundwire updates for 6.1-rc1 2022-10-07 16:13:55 -07:00
spi spi: Fixes for v6.1 2022-10-26 17:38:46 -07:00
spmi spmi: pmic-arb: increase SPMI transaction timeout delay 2022-09-30 14:33:23 +02:00
ssb
staging media fixes for v6.1-rc2 2022-10-22 15:30:15 -07:00
target Merge branch '6.1/scsi-queue' into 6.1/scsi-fixes 2022-10-21 01:10:34 +00:00
tc
tee - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in 2022-10-10 17:53:04 -07:00
thermal thermal: intel_powerclamp: Use first online CPU as control_cpu 2022-10-15 19:33:57 +02:00
thunderbolt treewide: use get_random_u32() when possible 2022-10-11 17:42:58 -06:00
tty parisc architecture fixes and updates for kernel v6.1-rc1: 2022-10-14 12:10:01 -07:00
ufs scsi: ufs: core: Fix typo in comment 2022-10-22 03:29:32 +00:00
uio
usb fbdev fixes for kernel 6.1-rc3: 2022-10-30 11:31:14 -07:00
vdpa virtio: fixes, features 2022-10-10 14:02:53 -07:00
vfio VFIO updates for v6.1-rc1 2022-10-12 14:46:48 -07:00
vhost virtio: fixes, features 2022-10-10 14:02:53 -07:00
video fbdev fixes for kernel 6.1-rc3: 2022-10-30 11:31:14 -07:00
virt Char/Misc and other driver changes for 6.1-rc1 2022-10-08 08:56:37 -07:00
virtio virtio_pci: use irq to detect interrupt support 2022-10-13 09:33:03 -04:00
vlynq
w1 Char/Misc and other driver changes for 6.1-rc1 2022-10-08 08:56:37 -07:00
watchdog linux-watchdog 6.1-rc2 tag 2022-10-21 12:25:39 -07:00
xen xen: branch for v6.1-rc2 2022-10-21 14:43:09 -07:00
zorro
Kconfig
Makefile