Commit Graph

297 Commits

Author SHA1 Message Date
Lu Baolu fba8ca3e6f iommu/vt-d: Fix WARN_ON in iommu probe path
[ Upstream commit 89436f4f54 ]

Commit 1a75cc710b ("iommu/vt-d: Use rbtree to track iommu probed
devices") adds all devices probed by the iommu driver in a rbtree
indexed by the source ID of each device. It assumes that each device
has a unique source ID. This assumption is incorrect and the VT-d
spec doesn't state this requirement either.

The reason for using a rbtree to track devices is to look up the device
with PCI bus and devfunc in the paths of handling ATS invalidation time
out error and the PRI I/O page faults. Both are PCI ATS feature related.

Only track the devices that have PCI ATS capabilities in the rbtree to
avoid unnecessary WARN_ON in the iommu probe path. Otherwise, on some
platforms below kernel splat will be displayed and the iommu probe results
in failure.

 WARNING: CPU: 3 PID: 166 at drivers/iommu/intel/iommu.c:158 intel_iommu_probe_device+0x319/0xd90
 Call Trace:
  <TASK>
  ? __warn+0x7e/0x180
  ? intel_iommu_probe_device+0x319/0xd90
  ? report_bug+0x1f8/0x200
  ? handle_bug+0x3c/0x70
  ? exc_invalid_op+0x18/0x70
  ? asm_exc_invalid_op+0x1a/0x20
  ? intel_iommu_probe_device+0x319/0xd90
  ? debug_mutex_init+0x37/0x50
  __iommu_probe_device+0xf2/0x4f0
  iommu_probe_device+0x22/0x70
  iommu_bus_notifier+0x1e/0x40
  notifier_call_chain+0x46/0x150
  blocking_notifier_call_chain+0x42/0x60
  bus_notify+0x2f/0x50
  device_add+0x5ed/0x7e0
  platform_device_add+0xf5/0x240
  mfd_add_devices+0x3f9/0x500
  ? preempt_count_add+0x4c/0xa0
  ? up_write+0xa2/0x1b0
  ? __debugfs_create_file+0xe3/0x150
  intel_lpss_probe+0x49f/0x5b0
  ? pci_conf1_write+0xa3/0xf0
  intel_lpss_pci_probe+0xcf/0x110 [intel_lpss_pci]
  pci_device_probe+0x95/0x120
  really_probe+0xd9/0x370
  ? __pfx___driver_attach+0x10/0x10
  __driver_probe_device+0x73/0x150
  driver_probe_device+0x19/0xa0
  __driver_attach+0xb6/0x180
  ? __pfx___driver_attach+0x10/0x10
  bus_for_each_dev+0x77/0xd0
  bus_add_driver+0x114/0x210
  driver_register+0x5b/0x110
  ? __pfx_intel_lpss_pci_driver_init+0x10/0x10 [intel_lpss_pci]
  do_one_initcall+0x57/0x2b0
  ? kmalloc_trace+0x21e/0x280
  ? do_init_module+0x1e/0x210
  do_init_module+0x5f/0x210
  load_module+0x1d37/0x1fc0
  ? init_module_from_file+0x86/0xd0
  init_module_from_file+0x86/0xd0
  idempotent_init_module+0x17c/0x230
  __x64_sys_finit_module+0x56/0xb0
  do_syscall_64+0x6e/0x140
  entry_SYSCALL_64_after_hwframe+0x71/0x79

Fixes: 1a75cc710b ("iommu/vt-d: Use rbtree to track iommu probed devices")
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/10689
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20240407011429.136282-1-baolu.lu@linux.intel.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-04-17 11:23:35 +02:00
Lu Baolu 333fe86968 iommu/vt-d: Fix NULL domain on device release
[ Upstream commit 81e921fd32 ]

In the kdump kernel, the IOMMU operates in deferred_attach mode. In this
mode, info->domain may not yet be assigned by the time the release_device
function is called. It leads to the following crash in the crash kernel:

    BUG: kernel NULL pointer dereference, address: 000000000000003c
    ...
    RIP: 0010:do_raw_spin_lock+0xa/0xa0
    ...
    _raw_spin_lock_irqsave+0x1b/0x30
    intel_iommu_release_device+0x96/0x170
    iommu_deinit_device+0x39/0xf0
    __iommu_group_remove_device+0xa0/0xd0
    iommu_bus_notifier+0x55/0xb0
    notifier_call_chain+0x5a/0xd0
    blocking_notifier_call_chain+0x41/0x60
    bus_notify+0x34/0x50
    device_del+0x269/0x3d0
    pci_remove_bus_device+0x77/0x100
    p2sb_bar+0xae/0x1d0
    ...
    i801_probe+0x423/0x740

Use the release_domain mechanism to fix it. The scalable mode context
entry which is not part of release domain should be cleared in
release_device().

Fixes: 586081d3f6 ("iommu/vt-d: Remove DEFER_DEVICE_DOMAIN_INFO")
Reported-by: Eric Badger <ebadger@purestorage.com>
Closes: https://lore.kernel.org/r/20240113181713.1817855-1-ebadger@purestorage.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20240305013305.204605-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-26 18:16:53 -04:00
Lu Baolu 3d39238991 iommu/vt-d: Use device rbtree in iopf reporting path
[ Upstream commit def054b01a ]

The existing I/O page fault handler currently locates the PCI device by
calling pci_get_domain_bus_and_slot(). This function searches the list
of all PCI devices until the desired device is found. To improve lookup
efficiency, replace it with device_rbtree_find() to search the device
within the probed device rbtree.

The I/O page fault is initiated by the device, which does not have any
synchronization mechanism with the software to ensure that the device
stays in the probed device tree. Theoretically, a device could be released
by the IOMMU subsystem after device_rbtree_find() and before
iopf_get_dev_fault_param(), which would cause a use-after-free problem.

Add a mutex to synchronize the I/O page fault reporting path and the IOMMU
release device path. This lock doesn't introduce any performance overhead,
as the conflict between I/O page fault reporting and device releasing is
very rare.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20240220065939.121116-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Stable-dep-of: 81e921fd32 ("iommu/vt-d: Fix NULL domain on device release")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-26 18:16:52 -04:00
Lu Baolu c618d446f1 iommu/vt-d: Use rbtree to track iommu probed devices
[ Upstream commit 1a75cc710b ]

Use a red-black tree(rbtree) to track devices probed by the driver's
probe_device callback. These devices need to be looked up quickly by
a source ID when the hardware reports a fault, either recoverable or
unrecoverable.

Fault reporting paths are critical. Searching a list in this scenario
is inefficient, with an algorithm complexity of O(n). An rbtree is a
self-balancing binary search tree, offering an average search time
complexity of O(log(n)). This significant performance improvement
makes rbtrees a better choice.

Furthermore, rbtrees are implemented on a per-iommu basis, eliminating
the need for global searches and further enhancing efficiency in
critical fault paths. The rbtree is protected by a spin lock with
interrupts disabled to ensure thread-safe access even within interrupt
contexts.

Co-developed-by: Huang Jiaqing <jiaqing.huang@intel.com>
Signed-off-by: Huang Jiaqing <jiaqing.huang@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20240220065939.121116-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Stable-dep-of: 80a9b50c0b ("iommu/vt-d: Improve ITE fault handling if target device isn't present")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-26 18:16:52 -04:00
Yi Liu f1e1610950 iommu/vt-d: Add missing dirty tracking set for parent domain
Setting dirty tracking for a s2 domain requires to loop all the related
devices and set the dirty tracking enable bit in the PASID table entry.
This includes the devices that are attached to the nested domains of a
s2 domain if this s2 domain is used as parent. However, the existing dirty
tracking set only loops s2 domain's own devices. It will miss dirty page
logs in the parent domain.

Now, the parent domain tracks the nested domains, so it can loop the
nested domains and the devices attached to the nested domains to ensure
dirty tracking on the parent is set completely.

Fixes: b41e38e225 ("iommu/vt-d: Add nested domain allocation")
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20240208082307.15759-9-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-02-21 10:28:47 +01:00
Yi Liu 0c7f2497b3 iommu/vt-d: Wrap the dirty tracking loop to be a helper
Add device_set_dirty_tracking() to loop all the devices and set the dirty
tracking per the @enable parameter.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Link: https://lore.kernel.org/r/20240208082307.15759-8-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-02-21 10:28:47 +01:00
Yi Liu 56ecaf6c58 iommu/vt-d: Remove domain parameter for intel_pasid_setup_dirty_tracking()
The only usage of input @domain is to get the domain id (DID) to flush
cache after setting dirty tracking. However, DID can be obtained from
the pasid entry. So no need to pass in domain. This can make this helper
cleaner when adding the missing dirty tracking for the parent domain,
which needs to use the DID of nested domain.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20240208082307.15759-7-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-02-21 10:28:46 +01:00
Yi Liu 5e54e861f1 iommu/vt-d: Add missing device iotlb flush for parent domain
ATS-capable devices cache the result of nested translation. This result
relies on the mappings in s2 domain (a.k.a. parent). When there are
modifications in the s2 domain, the related nested translation caches on
the device should be flushed. This includes the devices that are attached
to the s1 domain. However, the existing code ignores this fact to only
loops its own devices.

As there is no easy way to identify the exact set of nested translations
affected by the change of s2 domain. So, this just flushes the entire
device iotlb on the device.

As above, driver loops the s2 domain's s1_domains list and loops the
devices list of each s1_domain to flush the entire device iotlb on the
devices.

Fixes: b41e38e225 ("iommu/vt-d: Add nested domain allocation")
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20240208082307.15759-6-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-02-21 10:28:45 +01:00
Yi Liu 29e10487d6 iommu/vt-d: Update iotlb in nested domain attach
Should call domain_update_iotlb() to update the has_iotlb_device flag
of the domain after attaching device to nested domain. Without it, this
flag is not set properly and would result in missing device TLB flush.

Fixes: 9838f2bb6b ("iommu/vt-d: Set the nested domain to a device")
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20240208082307.15759-5-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-02-21 10:28:44 +01:00
Yi Liu 8219853011 iommu/vt-d: Add missing iotlb flush for parent domain
If a domain is used as the parent in nested translation its mappings might
be cached using DID of the nested domain. But the existing code ignores
this fact to only invalidate the iotlb entries tagged by the domain's own
DID.

Loop the s1_domains list, if any, to invalidate all iotlb entries related
to the target s2 address range. According to VT-d spec there is no need for
software to explicitly flush the affected s1 cache. It's implicitly done by
HW when s2 cache is invalidated.

Fixes: b41e38e225 ("iommu/vt-d: Add nested domain allocation")
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20240208082307.15759-4-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-02-21 10:28:44 +01:00
Yi Liu 0455d317f5 iommu/vt-d: Add __iommu_flush_iotlb_psi()
Add __iommu_flush_iotlb_psi() to do the psi iotlb flush with a DID input
rather than calculating it within the helper.

This is useful when flushing cache for parent domain which reuses DIDs of
its nested domains.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20240208082307.15759-3-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-02-21 10:28:43 +01:00
Yi Liu 85ce8e1d6d iommu/vt-d: Track nested domains in parent
Today the parent domain (s2_domain) is unaware of which DID's are
used by and which devices are attached to nested domains (s1_domain)
nested on it. This leads to a problem that some operations (flush
iotlb/devtlb and enable dirty tracking) on parent domain only apply to
DID's and devices directly tracked in the parent domain hence are
incomplete.

This tracks the nested domains in list in parent domain. With this,
operations on parent domain can loop the nested domains and refer to
the devices and iommu_array to ensure the operations on parent domain
take effect on all the affected devices and iommus.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20240208082307.15759-2-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-02-21 10:28:42 +01:00
Joerg Roedel 75f74f85a4 Merge branches 'apple/dart', 'arm/rockchip', 'arm/smmu', 'virtio', 'x86/vt-d', 'x86/amd' and 'core' into next 2024-01-03 09:59:32 +01:00
Lu Baolu 80b79e141d iommu/vt-d: Move inline helpers to header files
Move inline helpers to header files so that other files can use them
without duplicating the code.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20231116015048.29675-5-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-12-19 14:33:24 +01:00
Lu Baolu 47642bdd5a iommu/vt-d: Remove unused parameter of intel_pasid_setup_pass_through()
The domain parameter of this helper is unused and can be deleted to avoid
dead code.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20231116015048.29675-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-12-19 14:32:27 +01:00
Lu Baolu 1903ef8f0d iommu/vt-d: Refactor device_to_iommu() to retrieve iommu directly
The device_to_iommu() helper was originally designed to look up the DMAR
ACPI table to retrieve the iommu device and the request ID for a given
device. However, it was also being used in other places where there was
no need to lookup the ACPI table at all.

Retrieve the iommu device directly from the per-device iommu private data
in functions called after device is probed.

Rename the original device_to_iommu() function to a more meaningful name,
device_lookup_iommu(), to avoid mis-using it.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20231116015048.29675-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-12-19 14:32:26 +01:00
Jason Gunthorpe eda1a94caf iommu: Mark dev_iommu_priv_set() with a lockdep
A perfect driver would only call dev_iommu_priv_set() from its probe
callback. We've made it functionally correct to call it from the of_xlate
by adding a lock around that call.

lockdep assert that iommu_probe_device_lock is held to discourage misuse.

Exclude PPC kernels with CONFIG_FSL_PAMU turned on because FSL_PAMU uses a
global static for its priv and abuses priv for its domain.

Remove the pointless stores of NULL, all these are on paths where the core
code will free dev->iommu after the op returns.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Tested-by: Hector Martin <marcan@marcan.st>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/5-v2-16e4def25ebb+820-iommu_fwspec_p1_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-12-12 10:18:49 +01:00
Kunwu Chan e378c7de74 iommu/vt-d: Set variable intel_dirty_ops to static
Fix the following warning:
drivers/iommu/intel/iommu.c:302:30: warning: symbol
 'intel_dirty_ops' was not declared. Should it be static?

This variable is only used in its defining file, so it should be static.

Fixes: f35f22cc76 ("iommu/vt-d: Access/Dirty bit support for SS domains")
Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Link: https://lore.kernel.org/r/20231120101025.1103404-1-chentao@kylinos.cn
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-11-27 11:07:54 +01:00
Abdul Halim, Mohd Syazwan 85b80fdffa iommu/vt-d: Add MTL to quirk list to skip TE disabling
The VT-d spec requires (10.4.4 Global Command Register, TE field) that:

Hardware implementations supporting DMA draining must drain any in-flight
DMA read/write requests queued within the Root-Complex before switching
address translation on or off and reflecting the status of the command
through the TES field in the Global Status register.

Unfortunately, some integrated graphic devices fail to do so after some
kind of power state transition. As the result, the system might stuck in
iommu_disable_translation(), waiting for the completion of TE transition.

Add MTL to the quirk list for those devices and skips TE disabling if the
qurik hits.

Fixes: b1012ca8dc ("iommu/vt-d: Skip TE disabling on quirky gfx dedicated iommu")
Cc: stable@vger.kernel.org
Signed-off-by: Abdul Halim, Mohd Syazwan <mohd.syazwan.abdul.halim@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20231116022324.30120-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-11-27 11:07:53 +01:00
Lu Baolu 9a16ab9d64 iommu/vt-d: Make context clearing consistent with context mapping
In the iommu probe_device path, domain_context_mapping() allows setting
up the context entry for a non-PCI device. However, in the iommu
release_device path, domain_context_clear() only clears context entries
for PCI devices.

Make domain_context_clear() behave consistently with
domain_context_mapping() by clearing context entries for both PCI and
non-PCI devices.

Fixes: 579305f75d ("iommu/vt-d: Update to use PCI DMA aliases")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20231114011036.70142-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-11-27 11:07:52 +01:00
Lu Baolu da37dddcf4 iommu/vt-d: Disable PCI ATS in legacy passthrough mode
When IOMMU hardware operates in legacy mode, the TT field of the context
entry determines the translation type, with three supported types (Section
9.3 Context Entry):

- DMA translation without device TLB support
- DMA translation with device TLB support
- Passthrough mode with translated and translation requests blocked

Device TLB support is absent when hardware is configured in passthrough
mode.

Disable the PCI ATS feature when IOMMU is configured for passthrough
translation type in legacy (non-scalable) mode.

Fixes: 0faa19a151 ("iommu/vt-d: Decouple PASID & PRI enabling from SVA")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20231114011036.70142-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-11-27 11:07:52 +01:00
Lu Baolu e645c20e8e iommu/vt-d: Support enforce_cache_coherency only for empty domains
The enforce_cache_coherency callback ensures DMA cache coherency for
devices attached to the domain.

Intel IOMMU supports enforced DMA cache coherency when the Snoop
Control bit in the IOMMU's extended capability register is set.
Supporting it differs between legacy and scalable modes.

In legacy mode, it's supported page-level by setting the SNP field
in second-stage page-table entries. In scalable mode, it's supported
in PASID-table granularity by setting the PGSNP field in PASID-table
entries.

In legacy mode, mappings before attaching to a device have SNP
fields cleared, while mappings after the callback have them set.
This means partial DMAs are cache coherent while others are not.

One possible fix is replaying mappings and flipping SNP bits when
attaching a domain to a device. But this seems to be over-engineered,
given that all real use cases just attach an empty domain to a device.

To meet practical needs while reducing mode differences, only support
enforce_cache_coherency on a domain without mappings if SNP field is
used.

Fixes: fc0051cb95 ("iommu/vt-d: Check domain force_snooping against attached devices")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20231114011036.70142-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-11-27 11:07:51 +01:00
Linus Torvalds 4bbdb725a3 IOMMU Updates for Linux v6.7
Including:
 
 	- Core changes:
 	  - Make default-domains mandatory for all IOMMU drivers
 	  - Remove group refcounting
 	  - Add generic_single_device_group() helper and consolidate
 	    drivers
 	  - Cleanup map/unmap ops
 	  - Scaling improvements for the IOVA rcache depot
 	  - Convert dart & iommufd to the new domain_alloc_paging()
 
 	- ARM-SMMU:
 	  - Device-tree binding update:
 	    - Add qcom,sm7150-smmu-v2 for Adreno on SM7150 SoC
 	  - SMMUv2:
 	    - Support for Qualcomm SDM670 (MDSS) and SM7150 SoCs
 	  - SMMUv3:
 	    - Large refactoring of the context descriptor code to
 	      move the CD table into the master, paving the way
 	      for '->set_dev_pasid()' support on non-SVA domains
 	  - Minor cleanups to the SVA code
 
 	- Intel VT-d:
 	  - Enable debugfs to dump domain attached to a pasid
 	  - Remove an unnecessary inline function.
 
 	- AMD IOMMU:
 	  - Initial patches for SVA support (not complete yet)
 
 	- S390 IOMMU:
 	  - DMA-API conversion and optimized IOTLB flushing
 
 	- Some smaller fixes and improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmVJFcEACgkQK/BELZcB
 GuMgDxAAsnYVQjQ7wRkwR0rHARuEaJ+Lz2vkLNH+uYXjBzhFe2bT+ykMcZysAkdK
 A5PMLOFT5Etf+PAqOM0CoIGQFOefAId6uGl7S61Fp9ZWDKhMrOBFWhxGOaufA1Du
 tNvt3i66hwPSDZa82kY3wRCluYtj0aBBzmM6ZTwBwFZdQ7LABMtE8OxisqncVvq0
 H6vhV213fqvhCFSQJ6PnTAEiv70WvWBWygA+Z/gwYf9hypZQae91PNXdK9313a9z
 OvCzGBkL/R5/3KkJd88UhFwyYzyNGxq/DmH1etawYR5gYZ8UT/Z/sYpcx9hlO7qr
 eENPqeQc+YHZXpKqkaq66HBA1FSnXUqRZLl4cVaZahRRMe/yArsBM6R0W1AfkMAR
 rZxwHKoHUWeuHQLMVvmSDNL57h/GJJpTXjRc8HMxLZkVp+ScvnT5XCYHWWzRdCdx
 TcC/pJ1tet0FQ8rw09ovlwpGVA6eojWvcpVbLVLfGN8ZWViSVfvNFoPNb7HsGK6M
 iRi+L41Y7s63cyogC/Gsae2RAvYv29ZpvE91lmon2u+VBlTpMdOFX9EhWS6RqOBF
 cV30bhsw0dyCB7v5jDPtABYEOaR6l1mPLhn1gX3u0Ue/tmPhLX69k4bVWBY6wP3p
 gmmJD9ub8FuPQtFCGPE7/8ZINjGGrfiKO24DNI2Ty3XEeq21hU4=
 =UyWC
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu updates from Joerg Roedel:
 "Core changes:
   - Make default-domains mandatory for all IOMMU drivers
   - Remove group refcounting
   - Add generic_single_device_group() helper and consolidate drivers
   - Cleanup map/unmap ops
   - Scaling improvements for the IOVA rcache depot
   - Convert dart & iommufd to the new domain_alloc_paging()

  ARM-SMMU:
   - Device-tree binding update:
       - Add qcom,sm7150-smmu-v2 for Adreno on SM7150 SoC
   - SMMUv2:
       - Support for Qualcomm SDM670 (MDSS) and SM7150 SoCs
   - SMMUv3:
       - Large refactoring of the context descriptor code to move the CD
         table into the master, paving the way for '->set_dev_pasid()'
         support on non-SVA domains
   - Minor cleanups to the SVA code

  Intel VT-d:
   - Enable debugfs to dump domain attached to a pasid
   - Remove an unnecessary inline function

  AMD IOMMU:
   - Initial patches for SVA support (not complete yet)

  S390 IOMMU:
   - DMA-API conversion and optimized IOTLB flushing

  And some smaller fixes and improvements"

* tag 'iommu-updates-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (102 commits)
  iommu/dart: Remove the force_bypass variable
  iommu/dart: Call apple_dart_finalize_domain() as part of alloc_paging()
  iommu/dart: Convert to domain_alloc_paging()
  iommu/dart: Move the blocked domain support to a global static
  iommu/dart: Use static global identity domains
  iommufd: Convert to alloc_domain_paging()
  iommu/vt-d: Use ops->blocked_domain
  iommu/vt-d: Update the definition of the blocking domain
  iommu: Move IOMMU_DOMAIN_BLOCKED global statics to ops->blocked_domain
  Revert "iommu/vt-d: Remove unused function"
  iommu/amd: Remove DMA_FQ type from domain allocation path
  iommu: change iommu_map_sgtable to return signed values
  iommu/virtio: Add __counted_by for struct viommu_request and use struct_size()
  iommu/vt-d: debugfs: Support dumping a specified page table
  iommu/vt-d: debugfs: Create/remove debugfs file per {device, pasid}
  iommu/vt-d: debugfs: Dump entry pointing to huge page
  iommu/vt-d: Remove unused function
  iommu/arm-smmu-v3-sva: Remove bond refcount
  iommu/arm-smmu-v3-sva: Remove unused iommu_sva handle
  iommu/arm-smmu-v3: Rename cdcfg to cd_table
  ...
2023-11-09 13:37:28 -08:00
Linus Torvalds 463f46e114 iommufd for 6.7
This branch has three new iommufd capabilities:
 
  - Dirty tracking for DMA. AMD/ARM/Intel CPUs can now record if a DMA
    writes to a page in the IOPTEs within the IO page table. This can be used
    to generate a record of what memory is being dirtied by DMA activities
    during a VM migration process. A VMM like qemu will combine the IOMMU
    dirty bits with the CPU's dirty log to determine what memory to
    transfer.
 
    VFIO already has a DMA dirty tracking framework that requires PCI
    devices to implement tracking HW internally. The iommufd version
    provides an alternative that the VMM can select, if available. The two
    are designed to have very similar APIs.
 
  - Userspace controlled attributes for hardware page
    tables (HWPT/iommu_domain). There are currently a few generic attributes
    for HWPTs (support dirty tracking, and parent of a nest). This is an
    entry point for the userspace iommu driver to control the HW in detail.
 
  - Nested translation support for HWPTs. This is a 2D translation scheme
    similar to the CPU where a DMA goes through a first stage to determine
    an intermediate address which is then translated trough a second stage
    to a physical address.
 
    Like for CPU translation the first stage table would exist in VM
    controlled memory and the second stage is in the kernel and matches the
    VM's guest to physical map.
 
    As every IOMMU has a unique set of parameter to describe the S1 IO page
    table and its associated parameters the userspace IOMMU driver has to
    marshal the information into the correct format.
 
    This is 1/3 of the feature, it allows creating the nested translation
    and binding it to VFIO devices, however the API to support IOTLB and
    ATC invalidation of the stage 1 io page table, and forwarding of IO
    faults are still in progress.
 
 The series includes AMD and Intel support for dirty tracking. Intel
 support for nested translation.
 
 Along the way are a number of internal items:
 
  - New iommu core items: ops->domain_alloc_user(), ops->set_dirty_tracking,
    ops->read_and_clear_dirty(), IOMMU_DOMAIN_NESTED, and iommu_copy_struct_from_user
 
  - UAF fix in iopt_area_split()
 
  - Spelling fixes and some test suite improvement
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZUDu2wAKCRCFwuHvBreF
 YcdeAQDaBmjyGLrRIlzPyohF6FrombyWo2512n51Hs8IHR4IvQEA3oRNgQ2tsJRr
 1UPuOqnOD5T/oVX6AkUPRBwanCUQwwM=
 =nyJ3
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd

Pull iommufd updates from Jason Gunthorpe:
 "This brings three new iommufd capabilities:

   - Dirty tracking for DMA.

     AMD/ARM/Intel CPUs can now record if a DMA writes to a page in the
     IOPTEs within the IO page table. This can be used to generate a
     record of what memory is being dirtied by DMA activities during a
     VM migration process. A VMM like qemu will combine the IOMMU dirty
     bits with the CPU's dirty log to determine what memory to transfer.

     VFIO already has a DMA dirty tracking framework that requires PCI
     devices to implement tracking HW internally. The iommufd version
     provides an alternative that the VMM can select, if available. The
     two are designed to have very similar APIs.

   - Userspace controlled attributes for hardware page tables
     (HWPT/iommu_domain). There are currently a few generic attributes
     for HWPTs (support dirty tracking, and parent of a nest). This is
     an entry point for the userspace iommu driver to control the HW in
     detail.

   - Nested translation support for HWPTs. This is a 2D translation
     scheme similar to the CPU where a DMA goes through a first stage to
     determine an intermediate address which is then translated trough a
     second stage to a physical address.

     Like for CPU translation the first stage table would exist in VM
     controlled memory and the second stage is in the kernel and matches
     the VM's guest to physical map.

     As every IOMMU has a unique set of parameter to describe the S1 IO
     page table and its associated parameters the userspace IOMMU driver
     has to marshal the information into the correct format.

     This is 1/3 of the feature, it allows creating the nested
     translation and binding it to VFIO devices, however the API to
     support IOTLB and ATC invalidation of the stage 1 io page table,
     and forwarding of IO faults are still in progress.

  The series includes AMD and Intel support for dirty tracking. Intel
  support for nested translation.

  Along the way are a number of internal items:

   - New iommu core items: ops->domain_alloc_user(),
     ops->set_dirty_tracking, ops->read_and_clear_dirty(),
     IOMMU_DOMAIN_NESTED, and iommu_copy_struct_from_user

   - UAF fix in iopt_area_split()

   - Spelling fixes and some test suite improvement"

* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (52 commits)
  iommufd: Organize the mock domain alloc functions closer to Joerg's tree
  iommufd/selftest: Fix page-size check in iommufd_test_dirty()
  iommufd: Add iopt_area_alloc()
  iommufd: Fix missing update of domains_itree after splitting iopt_area
  iommu/vt-d: Disallow read-only mappings to nest parent domain
  iommu/vt-d: Add nested domain allocation
  iommu/vt-d: Set the nested domain to a device
  iommu/vt-d: Make domain attach helpers to be extern
  iommu/vt-d: Add helper to setup pasid nested translation
  iommu/vt-d: Add helper for nested domain allocation
  iommu/vt-d: Extend dmar_domain to support nested domain
  iommufd: Add data structure for Intel VT-d stage-1 domain allocation
  iommu/vt-d: Enhance capability check for nested parent domain allocation
  iommufd/selftest: Add coverage for IOMMU_HWPT_ALLOC with nested HWPTs
  iommufd/selftest: Add nested domain allocation for mock domain
  iommu: Add iommu_copy_struct_from_user helper
  iommufd: Add a nested HW pagetable object
  iommu: Pass in parent domain with user_data to domain_alloc_user op
  iommufd: Share iommufd_hwpt_alloc with IOMMUFD_OBJ_HWPT_NESTED
  iommufd: Derive iommufd_hwpt_paging from iommufd_hw_pagetable
  ...
2023-11-01 16:44:56 -10:00
Joerg Roedel e8cca466a8 Merge branches 'iommu/fixes', 'arm/tegra', 'arm/smmu', 'virtio', 'x86/vt-d', 'x86/amd', 'core' and 's390' into next 2023-10-27 09:13:40 +02:00
Joerg Roedel 3613047280 Linux 6.6-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmU1ngkeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGrsIH/0k/+gdBBYFFdEym
 foRhKir9WV3ZX4oIozJjA1f7T+qVYclKs6kaYm3gNepRBb6AoG8pdgv4MMAqhYsf
 QMe2XHi0MrO/qKBgfNfivxEa9jq+0QK5uvTbqCRqCAB8LfwVyDqapCmg3EuiZcPW
 UbMITmnwLIfXgPxvp9rabmCsTqO6FLbf0GDOVIkNSAIDBXMpcO1iffjrWUbhRa7n
 oIoiJmWJLcXLxPWDsRKbpJwzw2cIG08YhfQYAiQnC3YaeRm1FKLDIICRBsmfYzja
 rWv9r4dn4TDfV4/AnjggQnsZvz2yPCxNaFSQIT88nIeiLvyuUTJ9j8aidsSfMZQf
 xZAbzbA=
 =NoQv
 -----END PGP SIGNATURE-----

Merge tag 'v6.6-rc7' into core

Linux 6.6-rc7
2023-10-26 17:05:58 +02:00
Jason Gunthorpe 7d12eb2d2f iommu/vt-d: Use ops->blocked_domain
Trivially migrate to the ops->blocked_domain for the existing global
static.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Sven Peter <sven@svenpeter.dev>
Link: https://lore.kernel.org/r/3-v2-bff223cf6409+282-dart_paging_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-10-26 16:53:50 +02:00
Jason Gunthorpe 7b6dd84e70 iommu/vt-d: Update the definition of the blocking domain
The global static should pre-define the type and the NOP free function can
be now left as NULL.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Sven Peter <sven@svenpeter.dev>
Link: https://lore.kernel.org/r/2-v2-bff223cf6409+282-dart_paging_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-10-26 16:53:50 +02:00
Lu Baolu 03476e687e iommu/vt-d: Disallow read-only mappings to nest parent domain
When remapping hardware is configured by system software in scalable mode
as Nested (PGTT=011b) and with PWSNP field Set in the PASID-table-entry,
it may Set Accessed bit and Dirty bit (and Extended Access bit if enabled)
in first-stage page-table entries even when second-stage mappings indicate
that corresponding first-stage page-table is Read-Only.

As the result, contents of pages designated by VMM as Read-Only can be
modified by IOMMU via PML5E (PML4E for 4-level tables) access as part of
address translation process due to DMAs issued by Guest.

This disallows read-only mappings in the domain that is supposed to be used
as nested parent. Reference from Sapphire Rapids Specification Update [1],
errata details, SPR17. Userspace should know this limitation by checking
the IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17 flag reported in the IOMMU_GET_HW_INFO
ioctl.

[1] https://www.intel.com/content/www/us/en/content-details/772415/content-details.html

Link: https://lore.kernel.org/r/20231026044216.64964-9-yi.l.liu@intel.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-26 11:16:34 -03:00
Lu Baolu b41e38e225 iommu/vt-d: Add nested domain allocation
This adds the support for IOMMU_HWPT_DATA_VTD_S1 type. And 'nested_parent'
is added to mark the nested parent domain to sanitize the input parent domain.

Link: https://lore.kernel.org/r/20231026044216.64964-8-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-26 11:16:34 -03:00
Yi Liu d86724d4dc iommu/vt-d: Make domain attach helpers to be extern
This makes the helpers visible to nested.c.

Link: https://lore.kernel.org/r/20231026044216.64964-6-yi.l.liu@intel.com
Suggested-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-26 11:16:34 -03:00
Yi Liu a2cdecdf9d iommu/vt-d: Enhance capability check for nested parent domain allocation
This adds the scalable mode check before allocating the nested parent domain
as checking nested capability is not enough. User may turn off scalable mode
which also means no nested support even if the hardware supports it.

Fixes: c97d1b20d3 ("iommu/vt-d: Add domain_alloc_user op")
Link: https://lore.kernel.org/r/20231024150011.44642-1-yi.l.liu@intel.com
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-26 11:16:11 -03:00
Yi Liu 2bdabb8e82 iommu: Pass in parent domain with user_data to domain_alloc_user op
domain_alloc_user op already accepts user flags for domain allocation, add
a parent domain pointer and a driver specific user data support as well.
The user data would be tagged with a type for iommu drivers to add their
own driver specific user data per hw_pagetable.

Add a struct iommu_user_data as a bundle of data_ptr/data_len/type from an
iommufd core uAPI structure. Make the user data opaque to the core, since
a userspace driver must match the kernel driver. In the future, if drivers
share some common parameter, there would be a generic parameter as well.

Link: https://lore.kernel.org/r/20231026043938.63898-7-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-26 11:15:57 -03:00
Joao Martins f35f22cc76 iommu/vt-d: Access/Dirty bit support for SS domains
IOMMU advertises Access/Dirty bits for second-stage page table if the
extended capability DMAR register reports it (ECAP, mnemonic ECAP.SSADS).
The first stage table is compatible with CPU page table thus A/D bits are
implicitly supported. Relevant Intel IOMMU SDM ref for first stage table
"3.6.2 Accessed, Extended Accessed, and Dirty Flags" and second stage table
"3.7.2 Accessed and Dirty Flags".

First stage page table is enabled by default so it's allowed to set dirty
tracking and no control bits needed, it just returns 0. To use SSADS, set
bit 9 (SSADE) in the scalable-mode PASID table entry and flush the IOTLB
via pasid_flush_caches() following the manual. Relevant SDM refs:

"3.7.2 Accessed and Dirty Flags"
"6.5.3.3 Guidance to Software for Invalidations,
 Table 23. Guidance to Software for Invalidations"

PTE dirty bit is located in bit 9 and it's cached in the IOTLB so flush
IOTLB to make sure IOMMU attempts to set the dirty bit again. Note that
iommu_dirty_bitmap_record() will add the IOVA to iotlb_gather and thus the
caller of the iommu op will flush the IOTLB. Relevant manuals over the
hardware translation is chapter 6 with some special mention to:

"6.2.3.1 Scalable-Mode PASID-Table Entry Programming Considerations"
"6.2.4 IOTLB"

Select IOMMUFD_DRIVER only if IOMMUFD is enabled, given that IOMMU dirty
tracking requires IOMMUFD.

Link: https://lore.kernel.org/r/20231024135109.73787-13-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-24 11:58:43 -03:00
Jingqi Liu d87731f609 iommu/vt-d: debugfs: Create/remove debugfs file per {device, pasid}
Add a debugfs directory per pair of {device, pasid} if the mappings of
its page table are created and destroyed by the iommu_map/unmap()
interfaces. i.e. /sys/kernel/debug/iommu/intel/<device source id>/<pasid>.
Create a debugfs file in the directory for users to dump the page
table corresponding to {device, pasid}. e.g.
/sys/kernel/debug/iommu/intel/0000:00:02.0/1/domain_translation_struct.
For the default domain without pasid, it creates a debugfs file in the
debugfs device directory for users to dump its page table. e.g.
/sys/kernel/debug/iommu/intel/0000:00:02.0/domain_translation_struct.

When setting a domain to a PASID of device, create a debugfs file in
the pasid debugfs directory for users to dump the page table of the
specified pasid. Remove the debugfs device directory of the device
when releasing a device. e.g.
/sys/kernel/debug/iommu/intel/0000:00:01.0

Signed-off-by: Jingqi Liu <Jingqi.liu@intel.com>
Link: https://lore.kernel.org/r/20231013135811.73953-3-Jingqi.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-10-16 09:34:51 +02:00
Yi Liu c97d1b20d3 iommu/vt-d: Add domain_alloc_user op
Add the domain_alloc_user() op implementation. It supports allocating
domains to be used as parent under nested translation.

Unlike other drivers VT-D uses only a single page table format so it only
needs to check if the HW can support nesting.

Link: https://lore.kernel.org/r/20230928071528.26258-7-yi.l.liu@intel.com
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-10 13:31:24 -03:00
Niklas Schnelle fa4c450709 iommu: Allow .iotlb_sync_map to fail and handle s390's -ENOMEM return
On s390 when using a paging hypervisor, .iotlb_sync_map is used to sync
mappings by letting the hypervisor inspect the synced IOVA range and
updating a shadow table. This however means that .iotlb_sync_map can
fail as the hypervisor may run out of resources while doing the sync.
This can be due to the hypervisor being unable to pin guest pages, due
to a limit on mapped addresses such as vfio_iommu_type1.dma_entry_limit
or lack of other resources. Either way such a failure to sync a mapping
should result in a DMA_MAPPING_ERROR.

Now especially when running with batched IOTLB flushes for unmap it may
be that some IOVAs have already been invalidated but not yet synced via
.iotlb_sync_map. Thus if the hypervisor indicates running out of
resources, first do a global flush allowing the hypervisor to free
resources associated with these mappings as well a retry creating the
new mappings and only if that also fails report this error to callers.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com> # sun50i
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Link: https://lore.kernel.org/r/20230928-dma_iommu-v13-1-9e5fc4dacc36@linux.ibm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-10-02 08:42:57 +02:00
Zhang Rui 59df44bfb0 iommu/vt-d: Avoid memory allocation in iommu_suspend()
The iommu_suspend() syscore suspend callback is invoked with IRQ disabled.
Allocating memory with the GFP_KERNEL flag may re-enable IRQs during
the suspend callback, which can cause intermittent suspend/hibernation
problems with the following kernel traces:

Calling iommu_suspend+0x0/0x1d0
------------[ cut here ]------------
WARNING: CPU: 0 PID: 15 at kernel/time/timekeeping.c:868 ktime_get+0x9b/0xb0
...
CPU: 0 PID: 15 Comm: rcu_preempt Tainted: G     U      E      6.3-intel #r1
RIP: 0010:ktime_get+0x9b/0xb0
...
Call Trace:
 <IRQ>
 tick_sched_timer+0x22/0x90
 ? __pfx_tick_sched_timer+0x10/0x10
 __hrtimer_run_queues+0x111/0x2b0
 hrtimer_interrupt+0xfa/0x230
 __sysvec_apic_timer_interrupt+0x63/0x140
 sysvec_apic_timer_interrupt+0x7b/0xa0
 </IRQ>
 <TASK>
 asm_sysvec_apic_timer_interrupt+0x1f/0x30
...
------------[ cut here ]------------
Interrupts enabled after iommu_suspend+0x0/0x1d0
WARNING: CPU: 0 PID: 27420 at drivers/base/syscore.c:68 syscore_suspend+0x147/0x270
CPU: 0 PID: 27420 Comm: rtcwake Tainted: G     U  W   E      6.3-intel #r1
RIP: 0010:syscore_suspend+0x147/0x270
...
Call Trace:
 <TASK>
 hibernation_snapshot+0x25b/0x670
 hibernate+0xcd/0x390
 state_store+0xcf/0xe0
 kobj_attr_store+0x13/0x30
 sysfs_kf_write+0x3f/0x50
 kernfs_fop_write_iter+0x128/0x200
 vfs_write+0x1fd/0x3c0
 ksys_write+0x6f/0xf0
 __x64_sys_write+0x1d/0x30
 do_syscall_64+0x3b/0x90
 entry_SYSCALL_64_after_hwframe+0x72/0xdc

Given that only 4 words memory is needed, avoid the memory allocation in
iommu_suspend().

CC: stable@kernel.org
Fixes: 33e0715710 ("iommu/vt-d: Avoid GFP_ATOMIC where it is not needed")
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Ooi, Chin Hao <chin.hao.ooi@intel.com>
Link: https://lore.kernel.org/r/20230921093956.234692-1-rui.zhang@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230925120417.55977-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-09-25 16:10:36 +02:00
Linus Torvalds 0468be89b3 IOMMU Updates for Linux v6.6
Including:
 
 	- Core changes:
 	  - Consolidate probe_device path
 	  - Make the PCI-SAC IOVA allocation trick PCI-only
 
 	- AMD IOMMU:
 	  - Consolidate PPR log handling
 	  - Interrupt handling improvements
 	  - Refcount fixes for amd_iommu_v2 driver
 
 	- Intel VT-d driver:
 	  - Enable idxd device DMA with pasid through iommu dma ops.
 	  - Lift RESV_DIRECT check from VT-d driver to core.
 	  - Miscellaneous cleanups and fixes.
 
 	- ARM-SMMU drivers:
 	  - Device-tree binding updates:
 	    - Add additional compatible strings for Qualcomm SoCs
 	    - Allow ASIDs to be configured in the DT to work around Qualcomm's
 	      broken hypervisor
 	    - Fix clocks for Qualcomm's MSM8998 SoC
 	  - SMMUv2:
 	    - Support for Qualcomm's legacy firmware implementation featured on
 	      at least MSM8956 and MSM8976.
 	    - Match compatible strings for Qualcomm SM6350 and SM6375 SoC variants
 	  - SMMUv3:
 	    - Use 'ida' instead of a bitmap for VMID allocation
 
 	  - Rockchip IOMMU:
 	    - Lift page-table allocation restrictions on newer hardware
 
 	  - Mediatek IOMMU:
 	    - Add MT8188 IOMMU Support
 
 	  - Renesas IOMMU:
 	    - Allow PCIe devices
 
 	- Usual set of cleanups an smaller fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmTx7IMACgkQK/BELZcB
 GuMxUA/+P/wYvAKCbDpXyszIpyCTx37BkeRTBaVqG0vEKLG6439i+PIm3oudQK+6
 0y+1clJi0Ddu0uv1ck90cIEP1YDuKaKdrOVeE7TtlK+6LKYxTyeN+mz4csMIbahI
 6JMrWzrIEPIyMBHzAepQiGDCsmDkrCngPj0WmA7+EQZSSHVYp+TLe6OLzNs74vDF
 zCITkYNq6aKyg/dNJpMRy6VOHvw9PUiwRvm7ko7WONP4VCtpW4g3Jpkerf19zoV2
 s0nwZuGn3o7F0aFOpRJPPKQNfQnNjOjHdxjcsGBafD9qqAk4TLvnZH24njKtPidJ
 P8CiAu//HxhDyUPTgTIrDroVOGVG7s85XO+WesjPkEI3vnNjXy+qEIinQBJ3oIaI
 ppDLSnArEhfSRgt6dXvPCJ/g4+WGS9jNV85GCa7XBtal2Msu8G89NKC97mpmjCkb
 lnGmCF9t7Tkt/fLWxw4GADBN3m2tOib1GQMvPYAF2WM3jH5aRq2UliIRuCHZkzwv
 EF3SiFQQqab6oogU9tF/A1QLUKQ8QfYOdabqL9z2COgF5tS00VC6b/6VTNkKeBHe
 qIiOpI7IWo76tFJule5gRaUth9nVkjpEo6kL9I6rEldOlFJrX6uaHTta6/isY3gx
 vkN98V/OThRUbDwMD122YVKNNjZE2MNsTeptXqB3jHvl3UWiLsQ=
 =RV+G
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu updates from Joerg Roedel:
 "Core changes:

   - Consolidate probe_device path

   - Make the PCI-SAC IOVA allocation trick PCI-only

  AMD IOMMU:

   - Consolidate PPR log handling

   - Interrupt handling improvements

   - Refcount fixes for amd_iommu_v2 driver

  Intel VT-d driver:

   - Enable idxd device DMA with pasid through iommu dma ops

   - Lift RESV_DIRECT check from VT-d driver to core

   - Miscellaneous cleanups and fixes

  ARM-SMMU drivers:

   - Device-tree binding updates:
      - Add additional compatible strings for Qualcomm SoCs
      - Allow ASIDs to be configured in the DT to work around Qualcomm's
        broken hypervisor
      - Fix clocks for Qualcomm's MSM8998 SoC

   - SMMUv2:
      - Support for Qualcomm's legacy firmware implementation featured
        on at least MSM8956 and MSM8976
      - Match compatible strings for Qualcomm SM6350 and SM6375 SoC
        variants

   - SMMUv3:
      - Use 'ida' instead of a bitmap for VMID allocation

   - Rockchip IOMMU:
      - Lift page-table allocation restrictions on newer hardware

   - Mediatek IOMMU:
      - Add MT8188 IOMMU Support

   - Renesas IOMMU:
      - Allow PCIe devices

  .. and the usual set of cleanups an smaller fixes"

* tag 'iommu-updates-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (64 commits)
  iommu: Explicitly include correct DT includes
  iommu/amd: Remove unused declarations
  iommu/arm-smmu-qcom: Add SM6375 SMMUv2
  iommu/arm-smmu-qcom: Add SM6350 DPU compatible
  iommu/arm-smmu-qcom: Add SM6375 DPU compatible
  iommu/arm-smmu-qcom: Sort the compatible list alphabetically
  dt-bindings: arm-smmu: Fix MSM8998 clocks description
  iommu/vt-d: Remove unused extern declaration dmar_parse_dev_scope()
  iommu/vt-d: Fix to convert mm pfn to dma pfn
  iommu/vt-d: Fix to flush cache of PASID directory table
  iommu/vt-d: Remove rmrr check in domain attaching device path
  iommu: Prevent RESV_DIRECT devices from blocking domains
  dmaengine/idxd: Re-enable kernel workqueue under DMA API
  iommu/vt-d: Add set_dev_pasid callback for dma domain
  iommu/vt-d: Prepare for set_dev_pasid callback
  iommu/vt-d: Make prq draining code generic
  iommu/vt-d: Remove pasid_mutex
  iommu/vt-d: Add domain_flush_pasid_iotlb()
  iommu: Move global PASID allocation from SVA to core
  iommu: Generalize PASID 0 for normal DMA w/o PASID
  ...
2023-09-01 16:54:25 -07:00
Joerg Roedel d8fe59f110 Merge branches 'apple/dart', 'arm/mediatek', 'arm/renesas', 'arm/rockchip', 'arm/smmu', 'unisoc', 'x86/vt-d', 'x86/amd' and 'core' into next 2023-08-21 14:18:43 +02:00
Yi Liu 55243393b0 iommu/vt-d: Implement hw_info for iommu capability query
Add intel_iommu_hw_info() to report cap_reg and ecap_reg information.

Link: https://lore.kernel.org/r/20230818101033.4100-6-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-08-18 12:52:15 -03:00
Yanfei Xu fb5f50a43d iommu/vt-d: Fix to convert mm pfn to dma pfn
For the case that VT-d page is smaller than mm page, converting dma pfn
should be handled in two cases which are for start pfn and for end pfn.
Currently the calculation of end dma pfn is incorrect and the result is
less than real page frame number which is causing the mapping of iova
always misses some page frames.

Rename the mm_to_dma_pfn() to mm_to_dma_pfn_start() and add a new helper
for converting end dma pfn named mm_to_dma_pfn_end().

Signed-off-by: Yanfei Xu <yanfei.xu@intel.com>
Link: https://lore.kernel.org/r/20230625082046.979742-1-yanfei.xu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-08-09 17:46:19 +02:00
Lu Baolu d3aedf94f4 iommu/vt-d: Remove rmrr check in domain attaching device path
The core code now prevents devices with RMRR regions from being assigned
to user space. There is no need to check for this condition in individual
drivers. Remove it to avoid duplicate code.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20230724060352.113458-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-08-09 17:46:18 +02:00
Lu Baolu 7d0c9da6c1 iommu/vt-d: Add set_dev_pasid callback for dma domain
This allows the upper layers to set a domain to a PASID of a device
if the PASID feature is supported by the IOMMU hardware. The typical
use cases are, for example, kernel DMA with PASID and hardware
assisted mediated device drivers.

The attaching device and pasid information is tracked in a per-domain
list and is used for IOTLB and devTLB invalidation.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20230802212427.1497170-8-jacob.jun.pan@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-08-09 17:44:39 +02:00
Lu Baolu 37f900e718 iommu/vt-d: Prepare for set_dev_pasid callback
The domain_flush_pasid_iotlb() helper function is used to flush the IOTLB
entries for a given PASID. Previously, this function assumed that
RID2PASID was only used for the first-level DMA translation. However, with
the introduction of the set_dev_pasid callback, this assumption is no
longer valid.

Add a check before using the RID2PASID for PASID invalidation. This check
ensures that the domain has been attached to a physical device before
using RID2PASID.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20230802212427.1497170-7-jacob.jun.pan@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-08-09 17:44:38 +02:00
Lu Baolu 154786235d iommu/vt-d: Make prq draining code generic
Currently draining page requests and responses for a pasid is part of SVA
implementation. This is because the driver only supports attaching an SVA
domain to a device pasid. As we are about to support attaching other types
of domains to a device pasid, the prq draining code becomes generic.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20230802212427.1497170-6-jacob.jun.pan@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-08-09 17:44:37 +02:00
Lu Baolu ac1a3483fe iommu/vt-d: Add domain_flush_pasid_iotlb()
The VT-d spec requires to use PASID-based-IOTLB invalidation descriptor
to invalidate IOTLB and the paging-structure caches for a first-stage
page table. Add a generic helper to do this.

RID2PASID is used if the domain has been attached to a physical device,
otherwise real PASIDs that the domain has been attached to will be used.
The 'real' PASID attachment is handled in the subsequent change.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20230802212427.1497170-4-jacob.jun.pan@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-08-09 17:44:37 +02:00
Jacob Pan 4298780126 iommu: Generalize PASID 0 for normal DMA w/o PASID
PCIe Process address space ID (PASID) is used to tag DMA traffic, it
provides finer grained isolation than requester ID (RID).

For each device/RID, 0 is a special PASID for the normal DMA (no
PASID). This is universal across all architectures that supports PASID,
therefore warranted to be reserved globally and declared in the common
header. Consequently, we can avoid the conflict between different PASID
use cases in the generic code. e.g. SVA and DMA API with PASIDs.

This paved away for device drivers to choose global PASID policy while
continue doing normal DMA.

Noting that VT-d could support none-zero RID/NO_PASID, but currently not
used.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Link: https://lore.kernel.org/r/20230802212427.1497170-2-jacob.jun.pan@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-08-09 17:44:34 +02:00
Jason Gunthorpe 6eb4da8cf5 iommu: Have __iommu_probe_device() check for already probed devices
This is a step toward making __iommu_probe_device() self contained.

It should, under proper locking, check if the device is already associated
with an iommu driver and resolve parallel probes. All but one of the
callers open code this test using two different means, but they all
rely on dev->iommu_group.

Currently the bus_iommu_probe()/probe_iommu_group() and
probe_acpi_namespace_devices() rejects already probed devices with an
unlocked read of dev->iommu_group. The OF and ACPI "replay" functions use
device_iommu_mapped() which is the same read without the pointless
refcount.

Move this test into __iommu_probe_device() and put it under the
iommu_probe_device_lock. The store to dev->iommu_group is in
iommu_group_add_device() which is also called under this lock for iommu
driver devices, making it properly locked.

The only path that didn't have this check is the hotplug path triggered by
BUS_NOTIFY_ADD_DEVICE. The only way to get dev->iommu_group assigned
outside the probe path is via iommu_group_add_device(). Today the only
caller is VFIO no-iommu which never associates with an iommu driver. Thus
adding this additional check is safe.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/1-v3-328044aa278c+45e49-iommu_probe_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-07-14 16:14:12 +02:00
Joerg Roedel a7a334076d Merge branches 'iommu/fixes', 'arm/smmu', 'ppc/pamu', 'virtio', 'x86/vt-d', 'core' and 'x86/amd' into next 2023-06-19 10:12:42 +02:00