linux-stable

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2024-08-28 03:40:04 +00:00

Author	SHA1	Message	Date
Linus Torvalds	d0cb5f71c5	VFIO updates for v3.15 include: - Allow the vfio-type1 IOMMU to support multiple domains within a container - Plumb path to query whether all domains are cache-coherent - Wire query into kvm-vfio device to avoid KVM x86 WBINVD emulation - Always select CONFIG_ANON_INODES, vfio depends on it (Arnd) The first patch also makes the vfio-type1 IOMMU driver completely independent of the bus_type of the devices it's handling, which enables it to be used for both vfio-pci and a future vfio-platform (and hopefully combinations involving both simultaneously). -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJTPbikAAoJECObm247sIsiG9cP/jnurW84YuAHmybzy3R4nMaa 8BWcQST+TyQ78GWvubFDcRt+vHmJUI4iFWPBIm1twBSVrMy6F1GcT/spmSne1Dfb OvcfGEy59tGO+BeklHg5Qq2Hj1UzfeV9uQoy+PrpTF/sYzsyL6g5+O3Zv39SNvr0 zCwnO2JAKXIpQlkK3wVha6V13X07Z/+d2b4JSsvmON2cnlPzOUYNrgBvoSeV2IQe 3kMAU9qfAfQlpNDhRRZL/HshgWazCg4XZUp9UdNjUkOxrUp4vXHrVTUJnUGBDytk 8V9UGRKF3mik9PqpZJk4jLV5urgUVpnUR5747uqs0KF+9GWxClXvh3gp1XX8Zn7f NCfDSMn/wrDr6fQBglKUadITaDhF49KV+J0cX083q46BbYxhAfgxv1I9UOvLreG6 SGAezlTB0mefa1BmSwJdfKwDBsuDOhbF0TsHC6zT7yFc8pqe3hl/vQzUyu3HtwuM yPycQiQQmvOaymaiiBRwyLocwJbDNKPigDomv8NZ8Mcd97Fu9YsF4ydaxMJmmeKf TffYS4z4tUElYiU2vv1ipB8T+ReCmdtj2cIvyfwTL9jycY9ouO4bJ56dOGWd3WNl m7AVokbrrJQ03itUpFMuYjHAzTNUlXVvXlGr9n6L4VTR0RTL1zwJVrMVDsXdhxm4 XgDbVB5PrxinDAC1vJvI =mnzO -----END PGP SIGNATURE----- Merge tag 'vfio-v3.15-rc1' of git://github.com/awilliam/linux-vfio Pull VFIO updates from Alex Williamson: "VFIO updates for v3.15 include: - Allow the vfio-type1 IOMMU to support multiple domains within a container - Plumb path to query whether all domains are cache-coherent - Wire query into kvm-vfio device to avoid KVM x86 WBINVD emulation - Always select CONFIG_ANON_INODES, vfio depends on it (Arnd) The first patch also makes the vfio-type1 IOMMU driver completely independent of the bus_type of the devices it's handling, which enables it to be used for both vfio-pci and a future vfio-platform (and hopefully combinations involving both simultaneously)" * tag 'vfio-v3.15-rc1' of git://github.com/awilliam/linux-vfio: vfio: always select ANON_INODES kvm/vfio: Support for DMA coherent IOMMUs vfio: Add external user check extension interface vfio/type1: Add extension to test DMA cache coherence of IOMMU vfio/iommu_type1: Multi-IOMMU domain support	2014-04-03 14:05:02 -07:00
David Rientjes	668f9abbd4	mm: close PageTail race Commit `bf6bddf192` ("mm: introduce compaction and migration for ballooned pages") introduces page_count(page) into memory compaction which dereferences page->first_page if PageTail(page). This results in a very rare NULL pointer dereference on the aforementioned page_count(page). Indeed, anything that does compound_head(), including page_count() is susceptible to racing with prep_compound_page() and seeing a NULL or dangling page->first_page pointer. This patch uses Andrea's implementation of compound_trans_head() that deals with such a race and makes it the default compound_head() implementation. This includes a read memory barrier that ensures that if PageTail(head) is true that we return a head page that is neither NULL nor dangling. The patch then adds a store memory barrier to prep_compound_page() to ensure page->first_page is set. This is the safest way to ensure we see the head page that we are expecting, PageTail(page) is already in the unlikely() path and the memory barriers are unfortunately required. Hugetlbfs is the exception, we don't enforce a store memory barrier during init since no race is possible. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Holger Kiehl <Holger.Kiehl@dwd.de> Cc: Christoph Lameter <cl@linux.com> Cc: Rafael Aquini <aquini@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-03-04 07:55:47 -08:00
Alex Williamson	aa42931827	vfio/type1: Add extension to test DMA cache coherence of IOMMU Now that the type1 IOMMU backend can support IOMMU_CACHE, we need to be able to test whether coherency is currently enforced. Add an extension for this. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2014-02-26 11:38:37 -07:00
Alex Williamson	1ef3e2bc04	vfio/iommu_type1: Multi-IOMMU domain support We currently have a problem that we cannot support advanced features of an IOMMU domain (ex. IOMMU_CACHE), because we have no guarantee that those features will be supported by all of the hardware units involved with the domain over its lifetime. For instance, the Intel VT-d architecture does not require that all DRHDs support snoop control. If we create a domain based on a device behind a DRHD that does support snoop control and enable SNP support via the IOMMU_CACHE mapping option, we cannot then add a device behind a DRHD which does not support snoop control or we'll get reserved bit faults from the SNP bit in the pagetables. To add to the complexity, we can't know the properties of a domain until a device is attached. We could pass this problem off to userspace and require that a separate vfio container be used, but we don't know how to handle page accounting in that case. How do we know that a page pinned in one container is the same page as a different container and avoid double billing the user for the page. The solution is therefore to support multiple IOMMU domains per container. In the majority of cases, only one domain will be required since hardware is typically consistent within a system. However, this provides us the ability to validate compatibility of domains and support mixed environments where page table flags can be different between domains. To do this, our DMA tracking needs to change. We currently try to coalesce user mappings into as few tracking entries as possible. The problem then becomes that we lose granularity of user mappings. We've never guaranteed that a user is able to unmap at a finer granularity than the original mapping, but we must honor the granularity of the original mapping. This coalescing code is therefore removed, allowing only unmaps covering complete maps. The change in accounting is fairly small here, a typical QEMU VM will start out with roughly a dozen entries, so it's arguable if this coalescing was ever needed. We also move IOMMU domain creation to the point where a group is attached to the container. An interesting side-effect of this is that we now have access to the device at the time of domain creation and can probe the devices within the group to determine the bus_type. This finally makes vfio_iommu_type1 completely device/bus agnostic. In fact, each IOMMU domain can host devices on different buses managed by different physical IOMMUs, and present a single DMA mapping interface to the user. When a new domain is created, mappings are replayed to bring the IOMMU pagetables up to the state of the current container. And of course, DMA mapping and unmapping automatically traverse all of the configured IOMMU domains. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Cc: Varun Sethi <Varun.Sethi@freescale.com>	2014-02-26 11:38:36 -07:00
Antonios Motakis	d93b3ac0ed	VFIO: vfio_iommu_type1: fix bug caused by break in nested loop In vfio_iommu_type1.c there is a bug in vfio_dma_do_map, when checking that pages are not already mapped. Since the check is being done in a for loop nested within the main loop, breaking out of it does not create the intended behavior. If the underlying IOMMU driver returns a non-NULL value, this will be ignored and mapping the DMA range will be attempted anyway, leading to unpredictable behavior. This interracts badly with the ARM SMMU driver issue fixed in the patch that was submitted with the title: "[PATCH 2/2] ARM: SMMU: return NULL on error in arm_smmu_iova_to_phys" Both fixes are required in order to use the vfio_iommu_type1 driver with an ARM SMMU. This patch refactors the function slightly, in order to also make this kind of bug less likely. Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-10-11 10:40:46 -06:00
Alex Williamson	8d38ef1948	vfio/type1: Fix leak on error path We also don't handle unpinning zero pages as an error on other exits so we can fix that inconsistency by rolling in the next conditional return. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-07-01 08:28:58 -06:00
Alex Williamson	f5bfdbf252	vfio/type1: Fix missed frees and zero sized removes With hugepage support we can only properly aligned and sized ranges. We only guarantee that we can unmap the same ranges mapped and not arbitrary sub-ranges. This means we might not free anything or might free more than requested. The vfio unmap interface started storing the unmapped size to return to userspace to handle this. This patch fixes a few places where we don't properly handle those cases, moves a memory allocation to a place where failure is an option and checks our loops to make sure we don't get into an infinite loop trying to remove an overlap. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-06-25 16:01:44 -06:00
Alex Williamson	5c6c2b21ec	vfio: Provide module option to disable vfio_iommu_type1 hugepage support Add a module option to vfio_iommu_type1 to disable IOMMU hugepage support. This causes iommu_map to only be called with single page mappings, disabling the IOMMU driver's ability to use hugepages. This option can be enabled by loading vfio_iommu_type1 with disable_hugepages=1 or dynamically through sysfs. If enabled dynamically, only new mappings are restricted. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-06-21 09:38:11 -06:00
Alex Williamson	166fd7d94a	vfio: hugepage support for vfio_iommu_type1 We currently send all mappings to the iommu in PAGE_SIZE chunks, which prevents the iommu from enabling support for larger page sizes. We still need to pin pages, which means we step through them in PAGE_SIZE chunks, but we can batch up contiguous physical memory chunks to allow the iommu the opportunity to use larger pages. The approach here is a bit different that the one currently used for legacy KVM device assignment. Rather than looking at the vma page size and using that as the maximum size to pass to the iommu, we instead simply look at whether the next page is physically contiguous. This means we might ask the iommu to map a 4MB region, while legacy KVM might limit itself to a maximum of 2MB. Splitting our mapping path also allows us to be smarter about locked memory because we can more easily unwind if the user attempts to exceed the limit. Therefore, rather than assuming that a mapping will result in locked memory, we test each page as it is pinned to determine whether it locks RAM vs an mmap'd MMIO region. This should result in better locking granularity and less locked page fudge factors in userspace. The unmap path uses the same algorithm as legacy KVM. We don't want to track the pfn for each mapping ourselves, but we need the pfn in order to unpin pages. We therefore ask the iommu for the iova to physical address translation, ask it to unpin a page, and see how many pages were actually unpinned. iommus supporting large pages will often return something bigger than a page here, which we know will be physically contiguous and we can unpin a batch of pfns. iommus not supporting large mappings won't see an improvement in batching here as they only unmap a page at a time. With this change, we also make a clarification to the API for mapping and unmapping DMA. We can only guarantee unmaps at the same granularity as used for the original mapping. In other words, unmapping a subregion of a previous mapping is not guaranteed and may result in a larger or smaller unmapping than requested. The size field in the unmapping structure is updated to reflect this. Previously this was unmodified on mapping, always returning the the requested unmap size. This is now updated to return the actual unmap size on success, allowing userspace to appropriately track mappings. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-06-21 09:38:02 -06:00
Alex Williamson	cd9b22685e	vfio: Convert type1 iommu to use rbtree We need to keep track of all the DMA mappings of an iommu container so that it can be automatically unmapped when the user releases the file descriptor. We currently do this using a simple list, where we merge entries with contiguous iovas and virtual addresses. Using a tree for this is a bit more efficient and allows us to use common code instead of inventing our own. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-06-21 09:37:50 -06:00
Alex Williamson	73fa0d10d0	vfio: Type1 IOMMU implementation This VFIO IOMMU backend is designed primarily for AMD-Vi and Intel VT-d hardware, but is potentially usable by anything supporting similar mapping functionality. We arbitrarily call this a Type1 backend for lack of a better name. This backend has no IOVA or host memory mapping restrictions for the user and is optimized for relatively static mappings. Mapped areas are pinned into system memory. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2012-07-31 08:16:23 -06:00

11 commits