Commit Graph

74 Commits

Author SHA1 Message Date
John Garry aa4cc17b34 dma-iommu: add iommu_dma_opt_mapping_size()
[ Upstream commit 6d9870b7e5 ]

Add the IOMMU callback for DMA mapping API dma_opt_mapping_size(), which
allows the drivers to know the optimal mapping limit and thus limit the
requested IOVA lengths.

This value is based on the IOVA rcache range limit, as IOVAs allocated
above this limit must always be newly allocated, which may be quite slow.
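
A minimal sketch of how a storage driver might consume the new helper to cap
its per-request DMA size (the shost/sector names here are illustrative; only
dma_opt_mapping_size() itself comes from this patch):

    /* Illustrative only: clamp transfer size to the optimal DMA mapping size. */
    static void example_cap_transfer_size(struct Scsi_Host *shost, struct device *dma_dev)
    {
            size_t opt = dma_opt_mapping_size(dma_dev);     /* new DMA API helper */

            if (opt)
                    shost->max_sectors = min_t(unsigned int, shost->max_sectors,
                                               opt >> SECTOR_SHIFT);
    }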

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Stable-dep-of: afc5aa46ed ("iommu/dma: Force swiotlb_max_mapping_size on an untrusted device")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-04-10 16:18:47 +02:00
Yunfei Wang c929a230c8 iommu/iova: Fix alloc iova overflows issue
commit dcdb3ba7e2 upstream.

In __alloc_and_insert_iova_range, there is an issue that retry_pfn
overflows. The value of iovad->anchor.pfn_hi is ~0UL, then when
iovad->cached_node is iovad->anchor, curr_iova->pfn_hi + 1 will
overflow. As a result, if the retry logic is executed, low_pfn is
updated to 0, and then new_pfn < low_pfn returns false to make the
allocation successful.

This issue occurs in the following two situations:
1. The first iova size exceeds the domain size. When initializing
iova domain, iovad->cached_node is assigned as iovad->anchor. For
example, the iova domain size is 10M, start_pfn is 0x1_F000_0000,
and the iova size allocated for the first time is 11M. The
log below shows that new->pfn_lo ends up smaller than the pfn of
iovad->cached_node.

Example log as follows:
[  223.798112][T1705487] sh: [name:iova&]__alloc_and_insert_iova_range
start_pfn:0x1f0000,retry_pfn:0x0,size:0xb00,limit_pfn:0x1f0a00
[  223.799590][T1705487] sh: [name:iova&]__alloc_and_insert_iova_range
success start_pfn:0x1f0000,new->pfn_lo:0x1efe00,new->pfn_hi:0x1f08ff

2. The node with the largest iova->pfn_lo value in the iova domain
is deleted, so iovad->cached_node is updated to iovad->anchor, and
the requested iova size then exceeds the maximum size that can be
allocated in the domain.

Fix the overflow by only adding 1 to retry_pfn after checking that it
is less than limit_pfn.
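
A hedged sketch of the shape of the fix in __alloc_and_insert_iova_range()
(the exact upstream diff may differ in detail): retry_pfn is left
un-incremented while it can still be ~0UL, and 1 is added only after the
bounds check:

            /* anchor's pfn_hi is ~0UL; don't pre-compute pfn_hi + 1 here */
            retry_pfn = curr_iova->pfn_hi;
            ...
            if (low_pfn == iovad->start_pfn && retry_pfn < limit_pfn) {
                    high_pfn = limit_pfn;
                    low_pfn = retry_pfn + 1;   /* +1 only after the bounds check */
                    curr = &iovad->anchor.node;
                    curr_iova = to_iova(curr);
                    goto retry;
            }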

Signed-off-by: jianjiao zeng <jianjiao.zeng@mediatek.com>
Signed-off-by: Yunfei Wang <yf.wang@mediatek.com>
Cc: <stable@vger.kernel.org> # 5.15.*
Fixes: 4e89dce725 ("iommu/iova: Retry from last rb tree node if iova search fails")
Acked-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20230111063801.25107-1-yf.wang@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-18 11:48:52 +01:00
Robin Murphy 8a4521456b iommu/iova: Improve 32-bit free space estimate
commit 5b61343b50 upstream.

For various reasons based on the allocator behaviour and typical
use-cases at the time, when the max32_alloc_size optimisation was
introduced it seemed reasonable to couple the reset of the tracked
size to the update of cached32_node upon freeing a relevant IOVA.
However, since subsequent optimisations focused on helping genuine
32-bit devices make best use of even more limited address spaces, it
is now a lot more likely for cached32_node to be anywhere in a "full"
32-bit address space, and as such more likely for space to become
available from IOVAs below that node being freed.

At this point, the short-cut in __cached_rbnode_delete_update() really
doesn't hold up any more, and we need to fix the logic to reliably
provide the expected behaviour. We still want cached32_node to only move
upwards, but we should reset the allocation size if *any* 32-bit space
has become available.
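
A hedged sketch of the resulting logic in __cached_rbnode_delete_update():
cached32_node still only moves upwards, but max32_alloc_size is now reset
whenever any IOVA below the 32-bit boundary is freed:

    static void
    __cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free)
    {
            struct iova *cached_iova;

            cached_iova = to_iova(iovad->cached32_node);
            if (free == cached_iova ||
                (free->pfn_hi < iovad->dma_32bit_pfn &&
                 free->pfn_lo >= cached_iova->pfn_lo))
                    iovad->cached32_node = rb_next(&free->node);

            /* Reset the estimate if *any* 32-bit space has become available */
            if (free->pfn_lo < iovad->dma_32bit_pfn)
                    iovad->max32_alloc_size = iovad->dma_32bit_pfn;

            cached_iova = to_iova(iovad->cached_node);
            if (free->pfn_lo >= cached_iova->pfn_lo)
                    iovad->cached_node = rb_next(&free->node);
    }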

Reported-by: Yunfei Wang <yf.wang@mediatek.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Miles Chen <miles.chen@mediatek.com>
Link: https://lore.kernel.org/r/033815732d83ca73b13c11485ac39336f15c3b40.1646318408.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Cc: Miles Chen <miles.chen@mediatek.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-08 14:22:48 +02:00
Xiongfeng Wang ef0af09d41 iommu/iova: Fix race between FQ timeout and teardown
[ Upstream commit d7061627d7 ]

It turns out to be possible for hotplugging out a device to reach the
stage of tearing down the device's group and default domain before the
domain's flush queue has drained naturally. At this point, it is then
possible for the timeout to expire just before the del_timer() call
in free_iova_flush_queue(), such that we then proceed to free the FQ
resources while fq_flush_timeout() is still accessing them on another
CPU. Crashes due to this have been observed in the wild while removing
NVMe devices.

Close the race window by using del_timer_sync() to safely wait for any
active timeout handler to finish before we start to free things. We
already avoid any locking in free_iova_flush_queue() since the FQ is
supposed to be inactive anyway, so the potential deadlock scenario does
not apply.
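
A hedged sketch of the fix in free_iova_flush_queue(): the bare del_timer()
becomes del_timer_sync(), so any fq_flush_timeout() still running on another
CPU is waited for before the FQ memory is freed (surrounding field names are
from that era of the code and may differ in detail):

    void free_iova_flush_queue(struct iova_domain *iovad)
    {
            if (!has_iova_flush_queue(iovad))
                    return;

            /* del_timer() alone could race with a running fq_flush_timeout() */
            del_timer_sync(&iovad->fq_timer);

            fq_destroy_all_entries(iovad);
            free_percpu(iovad->fq);
            iovad->fq        = NULL;
            iovad->flush_cb  = NULL;
            iovad->entry_dtor = NULL;
    }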

Fixes: 9a005a800a ("iommu/iova: Add flush timer")
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
[ rm: rewrite commit message ]
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/0a365e5b07f14b7344677ad6a9a734966a8422ce.1639753638.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27 11:04:15 +01:00
Linus Torvalds 78e709522d virtio,vdpa,vhost: features, fixes
vduse driver supporting blk
 virtio-vsock support for end of record with SEQPACKET
 vdpa: mac and mq support for ifcvf and mlx5
 vdpa: management netlink for ifcvf
 virtio-i2c, gpio dt bindings
 
 misc fixes, cleanups
 
 NB: when merging this with
 b542e383d8 ("eventfd: Make signal recursion protection a task bit")
 from Linus' tree, replace eventfd_signal_count with
 eventfd_signal_allowed, and drop the export of eventfd_wake_count from
 ("eventfd: Export eventfd_wake_count to modules").
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmE1+awPHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRpt6EIAJy0qrc62lktNA0IiIVJSLbUbTMmFj8MzkGR
 8UxZdhpjWqBPJPyaOuNeksAqTGm/UAPEYx3C2c95Jhej7anFpy7dbCtIXcPHLJME
 DjcJg+EDrlNCj8m0FcsHpHWsFzPMERJpyEZNxgB5WazQbv+yWhGrg2FN5DCnF0Ro
 ZFYeKSVty148pQ0nHl8X0JM2XMtqit+O+LvKN2HQZ+fubh7BCzMxzkHY0QLHIzUS
 UeZqd3Qm8YcbqnlX38P5D6k+NPiTEgknmxaBLkPxg6H3XxDAmaIRFb8Ldd1rsgy1
 zTLGDiSGpVDIpawRnuEAzqJThV3Y5/MVJ1WD+mDYQ96tmhfp+KY=
 =DBH/
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio updates from Michael Tsirkin:

 - vduse driver ("vDPA Device in Userspace") supporting emulated virtio
   block devices

 - virtio-vsock support for end of record with SEQPACKET

 - vdpa: mac and mq support for ifcvf and mlx5

 - vdpa: management netlink for ifcvf

 - virtio-i2c, gpio dt bindings

 - misc fixes and cleanups

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (39 commits)
  Documentation: Add documentation for VDUSE
  vduse: Introduce VDUSE - vDPA Device in Userspace
  vduse: Implement an MMU-based software IOTLB
  vdpa: Support transferring virtual addressing during DMA mapping
  vdpa: factor out vhost_vdpa_pa_map() and vhost_vdpa_pa_unmap()
  vdpa: Add an opaque pointer for vdpa_config_ops.dma_map()
  vhost-iotlb: Add an opaque pointer for vhost IOTLB
  vhost-vdpa: Handle the failure of vdpa_reset()
  vdpa: Add reset callback in vdpa_config_ops
  vdpa: Fix some coding style issues
  file: Export receive_fd() to modules
  eventfd: Export eventfd_wake_count to modules
  iova: Export alloc_iova_fast() and free_iova_fast()
  virtio-blk: remove unneeded "likely" statements
  virtio-balloon: Use virtio_find_vqs() helper
  vdpa: Make use of PFN_PHYS/PFN_UP/PFN_DOWN helper macro
  vsock_test: update message bounds test for MSG_EOR
  af_vsock: rename variables in receive loop
  virtio/vsock: support MSG_EOR bit processing
  vhost/vsock: support MSG_EOR bit processing
  ...
2021-09-11 14:48:42 -07:00
Xie Yongji a93a962669 iova: Export alloc_iova_fast() and free_iova_fast()
Export alloc_iova_fast() and free_iova_fast() so that
modules can make use of the per-CPU cache and avoid the
rbtree spinlock taken in alloc_iova() and free_iova()
during IOVA allocation.
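
The exported signatures, plus a hedged sketch of how a module might use them
(the domain setup is elided and the wrapper name is illustrative; only the
two exported helpers come from this patch):

    unsigned long alloc_iova_fast(struct iova_domain *iovad, unsigned long size,
                                  unsigned long limit_pfn, bool flush_rcache);
    void free_iova_fast(struct iova_domain *iovad, unsigned long pfn,
                        unsigned long size);

    /* Illustrative module-side usage, going through the per-CPU rcache */
    static dma_addr_t example_alloc_range(struct iova_domain *iovad,
                                          unsigned long npages,
                                          unsigned long limit_pfn)
    {
            unsigned long pfn = alloc_iova_fast(iovad, npages, limit_pfn, true);

            return pfn ? (dma_addr_t)pfn << iova_shift(iovad) : DMA_MAPPING_ERROR;
    }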

Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210831103634.33-2-xieyongji@bytedance.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-09-06 07:20:56 -04:00
Robin Murphy 452e69b58c iommu: Allow enabling non-strict mode dynamically
Allocating and enabling a flush queue is in fact something we can
reasonably do while a DMA domain is active, without having to rebuild it
from scratch. Thus we can allow a strict -> non-strict transition from
sysfs without requiring to unbind the device's driver, which is of
particular interest to users who want to make selective relaxations to
critical devices like the one serving their root filesystem.

Disabling and draining a queue also seems technically possible to
achieve without rebuilding the whole domain, but would certainly be more
involved. Furthermore there's not such a clear use-case for tightening
up security *after* the device may already have done whatever it is that
you don't trust it not to do, so we only consider the relaxation case.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/d652966348c78457c38bf18daf369272a4ebc2c9.1628682049.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-08-18 13:27:49 +02:00
Robin Murphy 7a7c5badf8 iommu: Indicate queued flushes via gather data
Since iommu_iotlb_gather exists to help drivers optimise flushing for a
given unmap request, it is also the logical place to indicate whether
the unmap is strict or not, and thus help them further optimise for
whether to expect a sync or a flush_all subsequently. As part of that,
it also seems fair to make the flush queue code take responsibility for
enforcing the really subtle ordering requirement it brings, so that we
don't need to worry about forgetting that if new drivers want to add
flush queue support, and can consolidate the existing versions.

While we're adding to the kerneldoc, also fill in some info for
@freelist which was overlooked previously.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/bf5f8e2ad84e48c712ccbf80fa8c610594c7595f.1628682049.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-08-18 13:25:32 +02:00
Xiang Chen 7978724f39 iommu/iova: Put free_iova_mem() outside of spinlock iova_rbtree_lock
It is not necessary to call free_iova_mem() inside the iova_rbtree_lock
critical section; doing so only lengthens the time the spinlock is held.
Moving it outside gives a small performance improvement. Also rename
private_free_iova() to remove_iova(), since after this change the function
no longer frees the iova.
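
A hedged sketch of the resulting free_iova() shape: the rbtree manipulation
stays under iova_rbtree_lock, while the kmem_cache free happens after the
unlock:

    void free_iova(struct iova_domain *iovad, unsigned long pfn)
    {
            unsigned long flags;
            struct iova *iova;

            spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
            iova = private_find_iova(iovad, pfn);
            if (!iova) {
                    spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
                    return;
            }
            remove_iova(iovad, iova);               /* was private_free_iova() */
            spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
            free_iova_mem(iova);                    /* no longer under the lock */
    }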

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Reviewed-by: John Garry <john.garry@huawei.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/1620647582-194621-1-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-06-08 14:15:46 +02:00
John Garry 6e1ea50a06 iommu: Stop exporting free_iova_fast()
Function free_iova_fast() is only referenced by dma-iommu.c, which can
only be in-built, so stop exporting it.

This was missed in an earlier tidy-up patch.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/1616675401-151997-5-git-send-email-john.garry@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-04-07 10:30:48 +02:00
John Garry 149448b353 iommu: Delete iommu_dma_free_cpu_cached_iovas()
Function iommu_dma_free_cpu_cached_iovas() no longer has any caller, so
delete it.

With that, function free_cpu_cached_iovas() may be made static.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/1616675401-151997-4-git-send-email-john.garry@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-04-07 10:30:47 +02:00
John Garry f598a497bc iova: Add CPU hotplug handler to flush rcaches
Like the Intel IOMMU driver already does, flush the per-IOVA domain
CPU rcache when a CPU goes offline - there's no point in keeping it.
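
A hedged sketch of the hotplug hook: a CPUHP "dead" callback that flushes the
offlined CPU's rcache for each registered iova domain (the state name matches
what was merged upstream, but treat the details as illustrative):

    static int iova_cpuhp_dead(unsigned int cpu, struct hlist_node *node)
    {
            struct iova_domain *iovad;

            iovad = hlist_entry_safe(node, struct iova_domain, cpuhp_dead);

            free_cpu_cached_iovas(cpu, iovad);
            return 0;
    }

    /* registration at subsystem init */
    cpuhp_setup_state_multi(CPUHP_IOMMU_IOVA_DEAD, "iommu/iova:dead",
                            NULL, iova_cpuhp_dead);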

Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/1616675401-151997-2-git-send-email-john.garry@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-04-07 10:27:27 +02:00
Robin Murphy 371d7955e3 iommu/iova: Improve restart logic
When restarting after searching below the cached node fails, resetting
the start point to the anchor node is often overly pessimistic. If
allocations are made with mixed limits - particularly in the case of the
opportunistic 32-bit allocation for PCI devices - this could mean
significant time wasted walking through the whole populated upper range
just to reach the initial limit. We can improve on that by implementing
a proper tree traversal to find the first node above the relevant limit,
and set the exact start point.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/076b3484d1e5057b95d8c387c894bd6ad2514043.1614962123.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-03-18 11:01:09 +01:00
Robin Murphy 7ae31cec5b iommu/iova: Add rbtree entry helper
Repeating the rb_entry() boilerplate all over the place gets old fast.
Before adding yet more instances, add a little helper to tidy it up.
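
The helper is essentially a thin wrapper around rb_entry() (sketched from the
description; the exact form may differ):

    static struct iova *to_iova(struct rb_node *node)
    {
            return rb_entry(node, struct iova, node);
    }

    /* so call sites become to_iova(iovad->cached_node) instead of
     * rb_entry(iovad->cached_node, struct iova, node) */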

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/03931d86c0ad71f44b29394e3a8d38bfc32349cd.1614962123.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-03-18 11:01:09 +01:00
John Garry 2cf7dbff0a iova: Stop exporting some more functions
The following functions are not referenced outside dma-iommu.c (and
iova.c), which can only be built-in:
- init_iova_flush_queue()
- free_iova_fast()
- queue_iova()
- alloc_iova_fast()

So stop exporting them.

Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/1609940111-28563-4-git-send-email-john.garry@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-01-27 12:28:58 +01:00
John Garry 6221061901 iova: Delete copy_reserved_iova()
Since commit c588072bba ("iommu/vt-d: Convert intel iommu driver to the
iommu ops"), function copy_reserved_iova() is not referenced, so delete
it.

Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/1609940111-28563-3-git-send-email-john.garry@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-01-27 12:27:36 +01:00
John Garry 9cc0aaeb96 iova: Make has_iova_flush_queue() private
Function has_iova_flush_queue() has no users outside iova.c, so make it
private.

Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/1609940111-28563-2-git-send-email-john.garry@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-01-27 11:51:32 +01:00
Stefano Garzarella 6775ae901f iommu/iova: fix 'domain' typos
Replace misspelled 'doamin' with 'domain' in several comments.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20201222164232.88795-1-sgarzare@redhat.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-01-05 18:48:57 +00:00
John Garry 176cfc187c iommu: Stop exporting free_iova_mem()
It has no user outside iova.c

Signed-off-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/1607020492-189471-4-git-send-email-john.garry@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
2020-12-08 14:14:48 +00:00
John Garry 51b70b817b iommu: Stop exporting alloc_iova_mem()
It is not used outside iova.c

Signed-off-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/1607020492-189471-3-git-send-email-john.garry@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
2020-12-08 14:14:48 +00:00
John Garry 2f24dfb712 iommu: Delete split_and_remove_iova()
Function split_and_remove_iova() has not been referenced since commit
e70b081c6f ("iommu/vt-d: Remove IOVA handling code from the non-dma_ops
path"), so delete it.

Signed-off-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/1607020492-189471-2-git-send-email-john.garry@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
2020-12-08 14:14:48 +00:00
Cong Wang 3a651b3a27 iommu: avoid taking iova_rbtree_lock twice
Both find_iova() and __free_iova() take iova_rbtree_lock,
there is no reason to take and release it twice inside
free_iova().

Fold them into one critical section by calling the unlock
versions instead.

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/1605608734-84416-5-git-send-email-john.garry@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
2020-12-01 21:04:56 +00:00
Vijayanand Jitta 6fa3525b45 iommu/iova: Free global iova rcache on iova alloc failure
Whenever an iova alloc request fails, we free the iova
ranges present in the per-CPU iova rcaches and then retry,
but the global iova rcache is not freed. As a result we can
still see iova alloc failures even after the retry, because the
global rcache is holding onto iovas, which can cause fragmentation.
So free the global iova rcache as well and then go for the
retry.
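
A hedged sketch of the resulting retry path in alloc_iova_fast(): on failure,
flush the per-CPU rcaches and now also the global depot, then retry once:

    retry:
            new_iova = alloc_iova(iovad, size, limit_pfn, true);
            if (!new_iova) {
                    unsigned int cpu;

                    if (!flush_rcache)
                            return 0;

                    /* Try replenishing IOVAs by flushing rcache. */
                    flush_rcache = false;
                    for_each_online_cpu(cpu)
                            free_cpu_cached_iovas(cpu, iovad);
                    free_global_cached_iovas(iovad);    /* new: drop the depot too */
                    goto retry;
            }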

Signed-off-by: Vijayanand Jitta <vjitta@codeaurora.org>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/1601451864-5956-2-git-send-email-vjitta@codeaurora.org
Signed-off-by: Will Deacon <will@kernel.org>
2020-11-17 23:03:13 +00:00
Vijayanand Jitta 4e89dce725 iommu/iova: Retry from last rb tree node if iova search fails
Whenever a new iova alloc request comes in, the iova is always searched
from the cached node and the nodes previous to the cached node. So even
if free iova space is available in the nodes that come after the cached
node, iova allocation can still fail because of this approach.

Consider the following sequence of iova alloc and frees on
1GB of iova space

1) alloc - 500MB
2) alloc - 12MB
3) alloc - 499MB
4) free -  12MB which was allocated in step 2
5) alloc - 13MB

After the above sequence we will have 12MB of free iova space, and the
cached node will be pointing to the iova pfn of the last 13MB allocation,
which is the lowest iova pfn of that iova space. Now if we get an alloc
request of 2MB, we only search from the cached node downwards looking for
free iova at lower pfns, and as there aren't any, the iova alloc fails even
though there is 12MB of free iova space.

To avoid such iova search failures, retry from the last rb tree node when
the iova search fails; this will search the entire tree and get an iova if
one is available.

Signed-off-by: Vijayanand Jitta <vjitta@codeaurora.org>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/1601451864-5956-1-git-send-email-vjitta@codeaurora.org
Signed-off-by: Will Deacon <will@kernel.org>
2020-11-17 23:03:13 +00:00
Yuqi Jin ba328f8261 iommu/iova: Replace cmpxchg with xchg in queue_iova
The performance of atomic_xchg() is better than atomic_cmpxchg() because
no comparison is required, and the value of @fq_timer_on can only be 0
or 1. Let's use atomic_xchg() instead of atomic_cmpxchg() here, because we
only need to know whether the value changed from 0 to 1 or stayed at 1.
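
A hedged sketch of the resulting timer-arming check in queue_iova() (the
atomic_read() guard comes from the earlier false-sharing fix further down
this list):

            /* Avoid false sharing as much as possible. */
            if (!atomic_read(&iovad->fq_timer_on) &&
                !atomic_xchg(&iovad->fq_timer_on, 1))  /* was atomic_cmpxchg(.., 0, 1) */
                    mod_timer(&iovad->fq_timer,
                              jiffies + msecs_to_jiffies(IOVA_FQ_TIMEOUT));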

Signed-off-by: Yuqi Jin <jinyuqi@huawei.com>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Cc: Joerg Roedel <joro@8bytes.org>
Link: https://lore.kernel.org/r/1598517834-30275-1-git-send-email-zhangshaokun@hisilicon.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-09-04 12:11:06 +02:00
Robin Murphy d3e3d2be68 iommu/iova: Don't BUG on invalid PFNs
Unlike the other instances which represent a complete loss of
consistency within the rcache mechanism itself, or a fundamental
and obvious misconfiguration by an IOMMU driver, the BUG_ON() in
iova_magazine_free_pfns() can be provoked at more or less any time
in a "spooky action-at-a-distance" manner by any old device driver
passing nonsense to dma_unmap_*() which then propagates through to
queue_iova().

Not only is this well outside the IOVA layer's control, it's also
nowhere near fatal enough to justify panicking anyway - all that
really achieves is to make debugging the offending driver more
difficult. Let's simply WARN and otherwise ignore bogus PFNs.
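
A hedged sketch of the change in iova_magazine_free_pfns(): a bogus PFN now
just warns and is skipped rather than panicking the machine:

            for (i = 0; i < mag->size; ++i) {
                    struct iova *iova = private_find_iova(iovad, mag->pfns[i]);

                    if (WARN_ON(!iova))     /* was BUG_ON(!iova) */
                            continue;

                    private_free_iova(iovad, iova);
            }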

Reported-by: Prakash Gupta <guptap@codeaurora.org>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Prakash Gupta <guptap@codeaurora.org>
Link: https://lore.kernel.org/r/acbd2d092b42738a03a21b417ce64e27f8c91c86.1591103298.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-06-30 10:42:27 +02:00
Andy Shevchenko 3a0ce12e3b iommu/iova: Unify format of the printed messages
Unify format of the printed messages, i.e. replace printk(LEVEL ... )
with pr_level(...).

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20200507161804.13275-2-andriy.shevchenko@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-05-13 10:54:51 +02:00
Qian Cai 944c917539 iommu/iova: Silence warnings under memory pressure
When running heavy memory pressure workloads, this 5+ year old system
throws the endless warnings below because disk IO is too slow to recover
from swapping. Since the volume of calls from alloc_iova_fast() could be
large, once it calls printk() it will trigger disk IO (writing to the log
files) and pending softirqs, which could cause an infinite loop and make
no progress for days under the ongoing memory reclaim. This is the
counterpart for Intel; the AMD part has already been merged, see commit
3d70889532 ("iommu/amd: Silence warnings under memory pressure"). The
allocation failure will be reported in intel_alloc_iova(), so just call
dev_err_once() there, because even "ratelimited" is too much, and silence
the one in alloc_iova_mem() to avoid the expensive warn_alloc().

 hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed
 hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed
 hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed
 hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed
 hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed
 hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed
 hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed
 hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed
 slab_out_of_memory: 66 callbacks suppressed
 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
   cache: iommu_iova, object size: 40, buffer size: 448, default order:
0, min order: 0
   node 0: slabs: 1822, objs: 16398, free: 0
   node 1: slabs: 2051, objs: 18459, free: 31
 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
   cache: iommu_iova, object size: 40, buffer size: 448, default order:
0, min order: 0
   node 0: slabs: 1822, objs: 16398, free: 0
   node 1: slabs: 2051, objs: 18459, free: 31
 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
   cache: iommu_iova, object size: 40, buffer size: 448, default order:
0, min order: 0
 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
   cache: skbuff_head_cache, object size: 208, buffer size: 640, default
order: 0, min order: 0
   cache: skbuff_head_cache, object size: 208, buffer size: 640, default
order: 0, min order: 0
   cache: skbuff_head_cache, object size: 208, buffer size: 640, default
order: 0, min order: 0
   cache: skbuff_head_cache, object size: 208, buffer size: 640, default
order: 0, min order: 0
   node 0: slabs: 697, objs: 4182, free: 0
   node 0: slabs: 697, objs: 4182, free: 0
   node 0: slabs: 697, objs: 4182, free: 0
   node 0: slabs: 697, objs: 4182, free: 0
   node 1: slabs: 381, objs: 2286, free: 27
   node 1: slabs: 381, objs: 2286, free: 27
   node 1: slabs: 381, objs: 2286, free: 27
   node 1: slabs: 381, objs: 2286, free: 27
   node 0: slabs: 1822, objs: 16398, free: 0
   cache: skbuff_head_cache, object size: 208, buffer size: 640, default
order: 0, min order: 0
   node 1: slabs: 2051, objs: 18459, free: 31
   node 0: slabs: 697, objs: 4182, free: 0
 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
   node 1: slabs: 381, objs: 2286, free: 27
   cache: skbuff_head_cache, object size: 208, buffer size: 640, default
order: 0, min order: 0
   node 0: slabs: 697, objs: 4182, free: 0
   node 1: slabs: 381, objs: 2286, free: 27
 hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed
 warn_alloc: 96 callbacks suppressed
 kworker/11:1H: page allocation failure: order:0,
mode:0xa20(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0-1
 CPU: 11 PID: 1642 Comm: kworker/11:1H Tainted: G    B
 Hardware name: HP ProLiant XL420 Gen9/ProLiant XL420 Gen9, BIOS U19
12/27/2015
 Workqueue: kblockd blk_mq_run_work_fn
 Call Trace:
  dump_stack+0xa0/0xea
  warn_alloc.cold.94+0x8a/0x12d
  __alloc_pages_slowpath+0x1750/0x1870
  __alloc_pages_nodemask+0x58a/0x710
  alloc_pages_current+0x9c/0x110
  alloc_slab_page+0xc9/0x760
  allocate_slab+0x48f/0x5d0
  new_slab+0x46/0x70
  ___slab_alloc+0x4ab/0x7b0
  __slab_alloc+0x43/0x70
  kmem_cache_alloc+0x2dd/0x450
 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
  alloc_iova+0x33/0x210
   cache: skbuff_head_cache, object size: 208, buffer size: 640, default
order: 0, min order: 0
   node 0: slabs: 697, objs: 4182, free: 0
  alloc_iova_fast+0x62/0x3d1
   node 1: slabs: 381, objs: 2286, free: 27
  intel_alloc_iova+0xce/0xe0
  intel_map_sg+0xed/0x410
  scsi_dma_map+0xd7/0x160
  scsi_queue_rq+0xbf7/0x1310
  blk_mq_dispatch_rq_list+0x4d9/0xbc0
  blk_mq_sched_dispatch_requests+0x24a/0x300
  __blk_mq_run_hw_queue+0x156/0x230
  blk_mq_run_work_fn+0x3b/0x40
  process_one_work+0x579/0xb90
  worker_thread+0x63/0x5b0
  kthread+0x1e6/0x210
  ret_from_fork+0x3a/0x50
 Mem-Info:
 active_anon:2422723 inactive_anon:361971 isolated_anon:34403
  active_file:2285 inactive_file:1838 isolated_file:0
  unevictable:0 dirty:1 writeback:5 unstable:0
  slab_reclaimable:13972 slab_unreclaimable:453879
  mapped:2380 shmem:154 pagetables:6948 bounce:0
  free:19133 free_pcp:7363 free_cma:0
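
A hedged sketch of the two changes described above (exact call sites and
message text may differ slightly):

    /* iova.c: don't let a failed atomic slab allocation invoke warn_alloc() */
    struct iova *alloc_iova_mem(void)
    {
            return kmem_cache_alloc(iova_cache, GFP_ATOMIC | __GFP_NOWARN);
    }

    /* intel-iommu.c, intel_alloc_iova(): report the failure once, quietly */
    if (unlikely(!iova_pfn)) {
            dev_err_once(dev, "Allocating %ld-page iova failed\n", nrpages);
            return 0;
    }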

Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-12-23 14:07:03 +01:00
Xiaotao Yin 472d26df5e iommu/iova: Init the struct iova to fix the possible memleak
During ethernet(Marvell octeontx2) set ring buffer test:
ethtool -G eth1 rx <rx ring size> tx <tx ring size>
following kmemleak will happen sometimes:

unreferenced object 0xffff000b85421340 (size 64):
  comm "ethtool", pid 867, jiffies 4295323539 (age 550.500s)
  hex dump (first 64 bytes):
    80 13 42 85 0b 00 ff ff ff ff ff ff ff ff ff ff  ..B.............
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    ff ff ff ff ff ff ff ff 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<000000001b204ddf>] kmem_cache_alloc+0x1b0/0x350
    [<00000000d9ef2e50>] alloc_iova+0x3c/0x168
    [<00000000ea30f99d>] alloc_iova_fast+0x7c/0x2d8
    [<00000000b8bb2f1f>] iommu_dma_alloc_iova.isra.0+0x12c/0x138
    [<000000002f1a43b5>] __iommu_dma_map+0x8c/0xf8
    [<00000000ecde7899>] iommu_dma_map_page+0x98/0xf8
    [<0000000082004e59>] otx2_alloc_rbuf+0xf4/0x158
    [<000000002b107f6b>] otx2_rq_aura_pool_init+0x110/0x270
    [<00000000c3d563c7>] otx2_open+0x15c/0x734
    [<00000000a2f5f3a8>] otx2_dev_open+0x3c/0x68
    [<00000000456a98b5>] otx2_set_ringparam+0x1ac/0x1d4
    [<00000000f2fbb819>] dev_ethtool+0xb84/0x2028
    [<0000000069b67c5a>] dev_ioctl+0x248/0x3a0
    [<00000000af38663a>] sock_ioctl+0x280/0x638
    [<000000002582384c>] do_vfs_ioctl+0x8b0/0xa80
    [<000000004e1a2c02>] ksys_ioctl+0x84/0xb8

The reason:
When alloc_iova_mem() does not zero-initialise the allocation, pfn_lo can
by chance equal IOVA_ANCHOR, so when __alloc_and_insert_iova_range()
returns -ENOMEM (iova32_full), the new_iova will not be freed in
free_iova_mem().
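
A hedged sketch of the fix: zero the allocation so a stale pfn_lo can never
masquerade as the anchor:

    struct iova *alloc_iova_mem(void)
    {
            /* was kmem_cache_alloc(); a stale pfn_lo == IOVA_ANCHOR leaked the iova */
            return kmem_cache_zalloc(iova_cache, GFP_ATOMIC | __GFP_NOWARN);
    }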

Fixes: bb68b2fbfb ("iommu/iova: Add rbtree anchor node")
Signed-off-by: Xiaotao Yin <xiaotao.yin@windriver.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-12-17 11:13:20 +01:00
Eric Dumazet 0d87308cca iommu/iova: Avoid false sharing on fq_timer_on
In commit 14bd9a607f ("iommu/iova: Separate atomic variables
to improve performance") Jinyu Qi identified that the atomic_cmpxchg()
in queue_iova() was causing a performance loss and moved critical fields
so that the false sharing would not impact them.

However, avoiding the false sharing in the first place seems easy.
We should attempt the atomic_cmpxchg() no more than 100 times
per second. Adding an atomic_read() will keep the cache
line mostly shared.

This false sharing came with commit 9a005a800a
("iommu/iova: Add flush timer").

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 9a005a800a ('iommu/iova: Add flush timer')
Cc: Jinyu Qi <jinyuqi@huawei.com>
Cc: Joerg Roedel <jroedel@suse.de>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-08-30 15:21:53 +02:00
Chris Wilson 9eed17d37c iommu/iova: Remove stale cached32_node
Since the cached32_node is allowed to be advanced above dma_32bit_pfn
(to provide a shortcut into the limited range), we need to be careful to
remove the to be freed node if it is the cached32_node.

[   48.477773] BUG: KASAN: use-after-free in __cached_rbnode_delete_update+0x68/0x110
[   48.477812] Read of size 8 at addr ffff88870fc19020 by task kworker/u8:1/37
[   48.477843]
[   48.477879] CPU: 1 PID: 37 Comm: kworker/u8:1 Tainted: G     U            5.2.0+ #735
[   48.477915] Hardware name: Intel Corporation NUC7i5BNK/NUC7i5BNB, BIOS BNKBL357.86A.0052.2017.0918.1346 09/18/2017
[   48.478047] Workqueue: i915 __i915_gem_free_work [i915]
[   48.478075] Call Trace:
[   48.478111]  dump_stack+0x5b/0x90
[   48.478137]  print_address_description+0x67/0x237
[   48.478178]  ? __cached_rbnode_delete_update+0x68/0x110
[   48.478212]  __kasan_report.cold.3+0x1c/0x38
[   48.478240]  ? __cached_rbnode_delete_update+0x68/0x110
[   48.478280]  ? __cached_rbnode_delete_update+0x68/0x110
[   48.478308]  __cached_rbnode_delete_update+0x68/0x110
[   48.478344]  private_free_iova+0x2b/0x60
[   48.478378]  iova_magazine_free_pfns+0x46/0xa0
[   48.478403]  free_iova_fast+0x277/0x340
[   48.478443]  fq_ring_free+0x15a/0x1a0
[   48.478473]  queue_iova+0x19c/0x1f0
[   48.478597]  cleanup_page_dma.isra.64+0x62/0xb0 [i915]
[   48.478712]  __gen8_ppgtt_cleanup+0x63/0x80 [i915]
[   48.478826]  __gen8_ppgtt_cleanup+0x42/0x80 [i915]
[   48.478940]  __gen8_ppgtt_clear+0x433/0x4b0 [i915]
[   48.479053]  __gen8_ppgtt_clear+0x462/0x4b0 [i915]
[   48.479081]  ? __sg_free_table+0x9e/0xf0
[   48.479116]  ? kfree+0x7f/0x150
[   48.479234]  i915_vma_unbind+0x1e2/0x240 [i915]
[   48.479352]  i915_vma_destroy+0x3a/0x280 [i915]
[   48.479465]  __i915_gem_free_objects+0xf0/0x2d0 [i915]
[   48.479579]  __i915_gem_free_work+0x41/0xa0 [i915]
[   48.479607]  process_one_work+0x495/0x710
[   48.479642]  worker_thread+0x4c7/0x6f0
[   48.479687]  ? process_one_work+0x710/0x710
[   48.479724]  kthread+0x1b2/0x1d0
[   48.479774]  ? kthread_create_worker_on_cpu+0xa0/0xa0
[   48.479820]  ret_from_fork+0x1f/0x30
[   48.479864]
[   48.479907] Allocated by task 631:
[   48.479944]  save_stack+0x19/0x80
[   48.479994]  __kasan_kmalloc.constprop.6+0xc1/0xd0
[   48.480038]  kmem_cache_alloc+0x91/0xf0
[   48.480082]  alloc_iova+0x2b/0x1e0
[   48.480125]  alloc_iova_fast+0x58/0x376
[   48.480166]  intel_alloc_iova+0x90/0xc0
[   48.480214]  intel_map_sg+0xde/0x1f0
[   48.480343]  i915_gem_gtt_prepare_pages+0xb8/0x170 [i915]
[   48.480465]  huge_get_pages+0x232/0x2b0 [i915]
[   48.480590]  ____i915_gem_object_get_pages+0x40/0xb0 [i915]
[   48.480712]  __i915_gem_object_get_pages+0x90/0xa0 [i915]
[   48.480834]  i915_gem_object_prepare_write+0x2d6/0x330 [i915]
[   48.480955]  create_test_object.isra.54+0x1a9/0x3e0 [i915]
[   48.481075]  igt_shared_ctx_exec+0x365/0x3c0 [i915]
[   48.481210]  __i915_subtests.cold.4+0x30/0x92 [i915]
[   48.481341]  __run_selftests.cold.3+0xa9/0x119 [i915]
[   48.481466]  i915_live_selftests+0x3c/0x70 [i915]
[   48.481583]  i915_pci_probe+0xe7/0x220 [i915]
[   48.481620]  pci_device_probe+0xe0/0x180
[   48.481665]  really_probe+0x163/0x4e0
[   48.481710]  device_driver_attach+0x85/0x90
[   48.481750]  __driver_attach+0xa5/0x180
[   48.481796]  bus_for_each_dev+0xda/0x130
[   48.481831]  bus_add_driver+0x205/0x2e0
[   48.481882]  driver_register+0xca/0x140
[   48.481927]  do_one_initcall+0x6c/0x1af
[   48.481970]  do_init_module+0x106/0x350
[   48.482010]  load_module+0x3d2c/0x3ea0
[   48.482058]  __do_sys_finit_module+0x110/0x180
[   48.482102]  do_syscall_64+0x62/0x1f0
[   48.482147]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   48.482190]
[   48.482224] Freed by task 37:
[   48.482273]  save_stack+0x19/0x80
[   48.482318]  __kasan_slab_free+0x12e/0x180
[   48.482363]  kmem_cache_free+0x70/0x140
[   48.482406]  __free_iova+0x1d/0x30
[   48.482445]  fq_ring_free+0x15a/0x1a0
[   48.482490]  queue_iova+0x19c/0x1f0
[   48.482624]  cleanup_page_dma.isra.64+0x62/0xb0 [i915]
[   48.482749]  __gen8_ppgtt_cleanup+0x63/0x80 [i915]
[   48.482873]  __gen8_ppgtt_cleanup+0x42/0x80 [i915]
[   48.482999]  __gen8_ppgtt_clear+0x433/0x4b0 [i915]
[   48.483123]  __gen8_ppgtt_clear+0x462/0x4b0 [i915]
[   48.483250]  i915_vma_unbind+0x1e2/0x240 [i915]
[   48.483378]  i915_vma_destroy+0x3a/0x280 [i915]
[   48.483500]  __i915_gem_free_objects+0xf0/0x2d0 [i915]
[   48.483622]  __i915_gem_free_work+0x41/0xa0 [i915]
[   48.483659]  process_one_work+0x495/0x710
[   48.483704]  worker_thread+0x4c7/0x6f0
[   48.483748]  kthread+0x1b2/0x1d0
[   48.483787]  ret_from_fork+0x1f/0x30
[   48.483831]
[   48.483868] The buggy address belongs to the object at ffff88870fc19000
[   48.483868]  which belongs to the cache iommu_iova of size 40
[   48.483920] The buggy address is located 32 bytes inside of
[   48.483920]  40-byte region [ffff88870fc19000, ffff88870fc19028)
[   48.483964] The buggy address belongs to the page:
[   48.484006] page:ffffea001c3f0600 refcount:1 mapcount:0 mapping:ffff8888181a91c0 index:0x0 compound_mapcount: 0
[   48.484045] flags: 0x8000000000010200(slab|head)
[   48.484096] raw: 8000000000010200 ffffea001c421a08 ffffea001c447e88 ffff8888181a91c0
[   48.484141] raw: 0000000000000000 0000000000120012 00000001ffffffff 0000000000000000
[   48.484188] page dumped because: kasan: bad access detected
[   48.484230]
[   48.484265] Memory state around the buggy address:
[   48.484314]  ffff88870fc18f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   48.484361]  ffff88870fc18f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   48.484406] >ffff88870fc19000: fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc
[   48.484451]                                ^
[   48.484494]  ffff88870fc19080: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   48.484530]  ffff88870fc19100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108602
Fixes: e60aa7b538 ("iommu/iova: Extend rbtree node caching")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: <stable@vger.kernel.org> # v4.15+
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-07-22 17:50:49 +02:00
Dmitry Safonov effa467870 iommu/vt-d: Don't queue_iova() if there is no flush queue
Intel VT-d driver was reworked to use common deferred flushing
implementation. Previously there was one global per-cpu flush queue,
afterwards - one per domain.

Before deferring a flush, the queue should be allocated and initialized.

Currently only domains of type IOMMU_DOMAIN_DMA initialize their flush
queue. It's probably worth initializing it for static or unmanaged domains
too, but that may be arguable - I'm leaving it to the iommu folks.

Prevent queueing an iova flush if the domain doesn't have a queue.
The defensive check seems worth keeping even if the queue were initialized
for all kinds of domains, and it is easily backportable.

On the 4.19.43 stable kernel this has a user-visible effect: previously,
for devices in the si domain there were crashes on sata devices:

 BUG: spinlock bad magic on CPU#6, swapper/0/1
  lock: 0xffff88844f582008, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
 CPU: 6 PID: 1 Comm: swapper/0 Not tainted 4.19.43 #1
 Call Trace:
  <IRQ>
  dump_stack+0x61/0x7e
  spin_bug+0x9d/0xa3
  do_raw_spin_lock+0x22/0x8e
  _raw_spin_lock_irqsave+0x32/0x3a
  queue_iova+0x45/0x115
  intel_unmap+0x107/0x113
  intel_unmap_sg+0x6b/0x76
  __ata_qc_complete+0x7f/0x103
  ata_qc_complete+0x9b/0x26a
  ata_qc_complete_multiple+0xd0/0xe3
  ahci_handle_port_interrupt+0x3ee/0x48a
  ahci_handle_port_intr+0x73/0xa9
  ahci_single_level_irq_intr+0x40/0x60
  __handle_irq_event_percpu+0x7f/0x19a
  handle_irq_event_percpu+0x32/0x72
  handle_irq_event+0x38/0x56
  handle_edge_irq+0x102/0x121
  handle_irq+0x147/0x15c
  do_IRQ+0x66/0xf2
  common_interrupt+0xf/0xf
 RIP: 0010:__do_softirq+0x8c/0x2df

The same for usb devices that use ehci-pci:
 BUG: spinlock bad magic on CPU#0, swapper/0/1
  lock: 0xffff88844f402008, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.43 #4
 Call Trace:
  <IRQ>
  dump_stack+0x61/0x7e
  spin_bug+0x9d/0xa3
  do_raw_spin_lock+0x22/0x8e
  _raw_spin_lock_irqsave+0x32/0x3a
  queue_iova+0x77/0x145
  intel_unmap+0x107/0x113
  intel_unmap_page+0xe/0x10
  usb_hcd_unmap_urb_setup_for_dma+0x53/0x9d
  usb_hcd_unmap_urb_for_dma+0x17/0x100
  unmap_urb_for_dma+0x22/0x24
  __usb_hcd_giveback_urb+0x51/0xc3
  usb_giveback_urb_bh+0x97/0xde
  tasklet_action_common.isra.4+0x5f/0xa1
  tasklet_action+0x2d/0x30
  __do_softirq+0x138/0x2df
  irq_exit+0x7d/0x8b
  smp_apic_timer_interrupt+0x10f/0x151
  apic_timer_interrupt+0xf/0x20
  </IRQ>
 RIP: 0010:_raw_spin_unlock_irqrestore+0x17/0x39
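
A hedged sketch of the shape of the guard in intel_unmap() (illustrative
only, not the exact upstream diff): fall back to a synchronous flush and
free when the domain never initialised a flush queue:

            if (intel_iommu_strict || !has_iova_flush_queue(&domain->iovad)) {
                    /* no flush queue for this domain: flush and free now */
                    iommu_flush_iotlb_psi(iommu, domain, start_pfn,
                                          nrpages, !freelist, 0);
                    free_iova_fast(&domain->iovad, iova_pfn, nrpages);
                    dma_free_pagelist(freelist);
            } else {
                    queue_iova(&domain->iovad, iova_pfn, nrpages,
                               (unsigned long)freelist);
            }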

Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: iommu@lists.linux-foundation.org
Cc: <stable@vger.kernel.org> # 4.14+
Fixes: 13cf017446 ("iommu/vt-d: Make use of iova deferred flushing")
Signed-off-by: Dmitry Safonov <dima@arista.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-07-22 17:43:06 +02:00
Thomas Gleixner 3b20eb2372 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 320
Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms and conditions of the gnu general public license
  version 2 as published by the free software foundation this program
  is distributed in the hope it will be useful but without any
  warranty without even the implied warranty of merchantability or
  fitness for a particular purpose see the gnu general public license
  for more details you should have received a copy of the gnu general
  public license along with this program if not write to the free
  software foundation inc 59 temple place suite 330 boston ma 02111
  1307 usa

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 33 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190530000435.254582722@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-05 17:37:05 +02:00
Robert Richter 80ef4464d5 iommu/iova: Fix tracking of recently failed iova address
If a 32 bit allocation request is too big to possibly succeed, it
exits early with a failure and should never update max32_alloc_size.
This patch fixes the current code: the size is now only updated if the
slow path failed while walking the tree. Without the fix, the
allocation may enter the slow path again even if there was a previous
failure for a request of the same or a smaller size.
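
A hedged sketch of the corrected flow in __alloc_and_insert_iova_range():
the early 32-bit exit no longer touches max32_alloc_size; it is only updated
when the tree walk itself fails:

            if (limit_pfn <= iovad->dma_32bit_pfn &&
                size >= iovad->max32_alloc_size)
                    goto iova32_full;               /* early exit, no size update */

            /* ... rbtree walk ... */
            if (high_pfn < size || new_pfn < low_pfn) {
                    iovad->max32_alloc_size = size; /* only the slow path updates it */
                    goto iova32_full;
            }
            ...
    iova32_full:
            spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
            return -ENOMEM;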

Cc: <stable@vger.kernel.org> # 4.20+
Fixes: bee60e94a1 ("iommu/iova: Optimise attempts to allocate iova from 32bit address range")
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-03-22 12:01:58 +01:00
Ganapatrao Kulkarni bee60e94a1 iommu/iova: Optimise attempts to allocate iova from 32bit address range
As an optimisation for PCI devices, a first attempt is always made to
allocate the iova from the SAC (32-bit) address range. This leads to
unnecessary attempts when there are no free ranges available there.
Fix this by tracking the size of recently failed iova allocations and
allowing further attempts only if the requested size is less than the
failed size. The tracked size is updated whenever a replenish happens.

Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2018-09-25 10:18:27 +02:00
Kees Cook e99e88a9d2 treewide: setup_timer() -> timer_setup()
This converts all remaining cases of the old setup_timer() API into using
timer_setup(), where the callback argument is the structure already
holding the struct timer_list. These should have no behavioral changes,
since they just change which pointer is passed into the callback with
the same available pointers after conversion. It handles the following
examples, in addition to some other variations.

Casting from unsigned long:

    void my_callback(unsigned long data)
    {
        struct something *ptr = (struct something *)data;
    ...
    }
    ...
    setup_timer(&ptr->my_timer, my_callback, ptr);

and forced object casts:

    void my_callback(struct something *ptr)
    {
    ...
    }
    ...
    setup_timer(&ptr->my_timer, my_callback, (unsigned long)ptr);

become:

    void my_callback(struct timer_list *t)
    {
        struct something *ptr = from_timer(ptr, t, my_timer);
    ...
    }
    ...
    timer_setup(&ptr->my_timer, my_callback, 0);

Direct function assignments:

    void my_callback(unsigned long data)
    {
        struct something *ptr = (struct something *)data;
    ...
    }
    ...
    ptr->my_timer.function = my_callback;

have a temporary cast added, along with converting the args:

    void my_callback(struct timer_list *t)
    {
        struct something *ptr = from_timer(ptr, t, my_timer);
    ...
    }
    ...
    ptr->my_timer.function = (TIMER_FUNC_TYPE)my_callback;

And finally, callbacks without a data assignment:

    void my_callback(unsigned long data)
    {
    ...
    }
    ...
    setup_timer(&ptr->my_timer, my_callback, 0);

have their argument renamed to verify they're unused during conversion:

    void my_callback(struct timer_list *unused)
    {
    ...
    }
    ...
    timer_setup(&ptr->my_timer, my_callback, 0);

The conversion is done with the following Coccinelle script:

spatch --very-quiet --all-includes --include-headers \
	-I ./arch/x86/include -I ./arch/x86/include/generated \
	-I ./include -I ./arch/x86/include/uapi \
	-I ./arch/x86/include/generated/uapi -I ./include/uapi \
	-I ./include/generated/uapi --include ./include/linux/kconfig.h \
	--dir . \
	--cocci-file ~/src/data/timer_setup.cocci

@fix_address_of@
expression e;
@@

 setup_timer(
-&(e)
+&e
 , ...)

// Update any raw setup_timer() usages that have a NULL callback, but
// would otherwise match change_timer_function_usage, since the latter
// will update all function assignments done in the face of a NULL
// function initialization in setup_timer().
@change_timer_function_usage_NULL@
expression _E;
identifier _timer;
type _cast_data;
@@

(
-setup_timer(&_E->_timer, NULL, _E);
+timer_setup(&_E->_timer, NULL, 0);
|
-setup_timer(&_E->_timer, NULL, (_cast_data)_E);
+timer_setup(&_E->_timer, NULL, 0);
|
-setup_timer(&_E._timer, NULL, &_E);
+timer_setup(&_E._timer, NULL, 0);
|
-setup_timer(&_E._timer, NULL, (_cast_data)&_E);
+timer_setup(&_E._timer, NULL, 0);
)

@change_timer_function_usage@
expression _E;
identifier _timer;
struct timer_list _stl;
identifier _callback;
type _cast_func, _cast_data;
@@

(
-setup_timer(&_E->_timer, _callback, _E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, &_callback, _E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, _callback, (_cast_data)_E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, &_callback, (_cast_data)_E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, (_cast_func)_callback, _E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, (_cast_func)&_callback, _E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, (_cast_func)_callback, (_cast_data)_E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, (_cast_func)&_callback, (_cast_data)_E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E._timer, _callback, (_cast_data)_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, _callback, (_cast_data)&_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, &_callback, (_cast_data)_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, &_callback, (_cast_data)&_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)&_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)&_E);
+timer_setup(&_E._timer, _callback, 0);
|
 _E->_timer@_stl.function = _callback;
|
 _E->_timer@_stl.function = &_callback;
|
 _E->_timer@_stl.function = (_cast_func)_callback;
|
 _E->_timer@_stl.function = (_cast_func)&_callback;
|
 _E._timer@_stl.function = _callback;
|
 _E._timer@_stl.function = &_callback;
|
 _E._timer@_stl.function = (_cast_func)_callback;
|
 _E._timer@_stl.function = (_cast_func)&_callback;
)

// callback(unsigned long arg)
@change_callback_handle_cast
 depends on change_timer_function_usage@
identifier change_timer_function_usage._callback;
identifier change_timer_function_usage._timer;
type _origtype;
identifier _origarg;
type _handletype;
identifier _handle;
@@

 void _callback(
-_origtype _origarg
+struct timer_list *t
 )
 {
(
	... when != _origarg
	_handletype *_handle =
-(_handletype *)_origarg;
+from_timer(_handle, t, _timer);
	... when != _origarg
|
	... when != _origarg
	_handletype *_handle =
-(void *)_origarg;
+from_timer(_handle, t, _timer);
	... when != _origarg
|
	... when != _origarg
	_handletype *_handle;
	... when != _handle
	_handle =
-(_handletype *)_origarg;
+from_timer(_handle, t, _timer);
	... when != _origarg
|
	... when != _origarg
	_handletype *_handle;
	... when != _handle
	_handle =
-(void *)_origarg;
+from_timer(_handle, t, _timer);
	... when != _origarg
)
 }

// callback(unsigned long arg) without existing variable
@change_callback_handle_cast_no_arg
 depends on change_timer_function_usage &&
                     !change_callback_handle_cast@
identifier change_timer_function_usage._callback;
identifier change_timer_function_usage._timer;
type _origtype;
identifier _origarg;
type _handletype;
@@

 void _callback(
-_origtype _origarg
+struct timer_list *t
 )
 {
+	_handletype *_origarg = from_timer(_origarg, t, _timer);
+
	... when != _origarg
-	(_handletype *)_origarg
+	_origarg
	... when != _origarg
 }

// Avoid already converted callbacks.
@match_callback_converted
 depends on change_timer_function_usage &&
            !change_callback_handle_cast &&
	    !change_callback_handle_cast_no_arg@
identifier change_timer_function_usage._callback;
identifier t;
@@

 void _callback(struct timer_list *t)
 { ... }

// callback(struct something *handle)
@change_callback_handle_arg
 depends on change_timer_function_usage &&
	    !match_callback_converted &&
            !change_callback_handle_cast &&
            !change_callback_handle_cast_no_arg@
identifier change_timer_function_usage._callback;
identifier change_timer_function_usage._timer;
type _handletype;
identifier _handle;
@@

 void _callback(
-_handletype *_handle
+struct timer_list *t
 )
 {
+	_handletype *_handle = from_timer(_handle, t, _timer);
	...
 }

// If change_callback_handle_arg ran on an empty function, remove
// the added handler.
@unchange_callback_handle_arg
 depends on change_timer_function_usage &&
	    change_callback_handle_arg@
identifier change_timer_function_usage._callback;
identifier change_timer_function_usage._timer;
type _handletype;
identifier _handle;
identifier t;
@@

 void _callback(struct timer_list *t)
 {
-	_handletype *_handle = from_timer(_handle, t, _timer);
 }

// We only want to refactor the setup_timer() data argument if we've found
// the matching callback. This undoes changes in change_timer_function_usage.
@unchange_timer_function_usage
 depends on change_timer_function_usage &&
            !change_callback_handle_cast &&
            !change_callback_handle_cast_no_arg &&
	    !change_callback_handle_arg@
expression change_timer_function_usage._E;
identifier change_timer_function_usage._timer;
identifier change_timer_function_usage._callback;
type change_timer_function_usage._cast_data;
@@

(
-timer_setup(&_E->_timer, _callback, 0);
+setup_timer(&_E->_timer, _callback, (_cast_data)_E);
|
-timer_setup(&_E._timer, _callback, 0);
+setup_timer(&_E._timer, _callback, (_cast_data)&_E);
)

// If we fixed a callback from a .function assignment, fix the
// assignment cast now.
@change_timer_function_assignment
 depends on change_timer_function_usage &&
            (change_callback_handle_cast ||
             change_callback_handle_cast_no_arg ||
             change_callback_handle_arg)@
expression change_timer_function_usage._E;
identifier change_timer_function_usage._timer;
identifier change_timer_function_usage._callback;
type _cast_func;
typedef TIMER_FUNC_TYPE;
@@

(
 _E->_timer.function =
-_callback
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E->_timer.function =
-&_callback
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E->_timer.function =
-(_cast_func)_callback;
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E->_timer.function =
-(_cast_func)&_callback
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E._timer.function =
-_callback
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E._timer.function =
-&_callback;
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E._timer.function =
-(_cast_func)_callback
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E._timer.function =
-(_cast_func)&_callback
+(TIMER_FUNC_TYPE)_callback
 ;
)

// Sometimes timer functions are called directly. Replace matched args.
@change_timer_function_calls
 depends on change_timer_function_usage &&
            (change_callback_handle_cast ||
             change_callback_handle_cast_no_arg ||
             change_callback_handle_arg)@
expression _E;
identifier change_timer_function_usage._timer;
identifier change_timer_function_usage._callback;
type _cast_data;
@@

 _callback(
(
-(_cast_data)_E
+&_E->_timer
|
-(_cast_data)&_E
+&_E._timer
|
-_E
+&_E->_timer
)
 )

// If a timer has been configured without a data argument, it can be
// converted without regard to the callback argument, since it is unused.
@match_timer_function_unused_data@
expression _E;
identifier _timer;
identifier _callback;
@@

(
-setup_timer(&_E->_timer, _callback, 0);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, _callback, 0L);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, _callback, 0UL);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E._timer, _callback, 0);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, _callback, 0L);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, _callback, 0UL);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_timer, _callback, 0);
+timer_setup(&_timer, _callback, 0);
|
-setup_timer(&_timer, _callback, 0L);
+timer_setup(&_timer, _callback, 0);
|
-setup_timer(&_timer, _callback, 0UL);
+timer_setup(&_timer, _callback, 0);
|
-setup_timer(_timer, _callback, 0);
+timer_setup(_timer, _callback, 0);
|
-setup_timer(_timer, _callback, 0L);
+timer_setup(_timer, _callback, 0);
|
-setup_timer(_timer, _callback, 0UL);
+timer_setup(_timer, _callback, 0);
)

@change_callback_unused_data
 depends on match_timer_function_unused_data@
identifier match_timer_function_unused_data._callback;
type _origtype;
identifier _origarg;
@@

 void _callback(
-_origtype _origarg
+struct timer_list *unused
 )
 {
	... when != _origarg
 }

Signed-off-by: Kees Cook <keescook@chromium.org>
2017-11-21 15:57:07 -08:00
Sebastian Andrzej Siewior 94e2cc4dba iommu/iova: Use raw_cpu_ptr() instead of get_cpu_ptr() for ->fq
get_cpu_ptr() disables preemption and returns the ->fq object of the
current CPU. raw_cpu_ptr() does the same except that it does not disable
preemption, which means the scheduler can move the task to another CPU
after it has obtained the per-CPU object.
In this case that is not a problem because the data structure itself is
protected with a spin_lock. This change shouldn't matter on mainline;
on RT, however, it does, because the sleeping lock can't be taken with
preemption disabled.

Cc: Joerg Roedel <joro@8bytes.org>
Cc: iommu@lists.linux-foundation.org
Reported-by: vinadhy@gmail.com
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2017-11-06 11:24:35 -07:00
Tomasz Nowicki 538d5b3332 iommu/iova: Make rcache flush optional on IOVA allocation failure
Since IOVA allocation failure is not an unusual case, we need to flush
the CPUs' rcaches in the hope that we will succeed in the next round.

However, it is useful to decide whether the rcache flush step is needed,
for two reasons:
- Scalability. On a large system with ~100 CPUs, iterating over and flushing
  the rcache for each CPU becomes a serious bottleneck, so we may want to defer it.
- free_cpu_cached_iovas() does not care about the max PFN we are interested in.
  Thus we may flush our rcaches and still get no new IOVA, as in the
  commonly used scenario:

    if (dma_limit > DMA_BIT_MASK(32) && dev_is_pci(dev))
        iova = alloc_iova_fast(iovad, iova_len, DMA_BIT_MASK(32) >> shift);

    if (!iova)
        iova = alloc_iova_fast(iovad, iova_len, dma_limit >> shift);

   1. First alloc_iova_fast() call is limited to DMA_BIT_MASK(32) to get
      PCI devices a SAC address
   2. alloc_iova() fails due to full 32-bit space
   3. rcaches contain PFNs out of 32-bit space so free_cpu_cached_iovas()
      throws entries away for nothing and alloc_iova() fails again
   4. Next alloc_iova_fast() call cannot take advantage of rcache since we
      have just defeated caches. In this case we pick the slowest option
      to proceed.

This patch reworks the flushed_rcache local flag into an additional
function argument that controls the rcache flush step. It also updates all
users to do the flush only as a last resort.
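
A hedged sketch of the reworked signature and of how a caller stages the
flush as a last resort (simplified version of the dma_limit example above):

    unsigned long alloc_iova_fast(struct iova_domain *iovad, unsigned long size,
                                  unsigned long limit_pfn, bool flush_rcache);

    /* caller side: first try without flushing, then once more with the flush */
    iova = alloc_iova_fast(iovad, iova_len, DMA_BIT_MASK(32) >> shift, false);
    if (!iova)
            iova = alloc_iova_fast(iovad, iova_len, dma_limit >> shift, true);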

Signed-off-by: Tomasz Nowicki <Tomasz.Nowicki@caviumnetworks.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Nate Watterson <nwatters@codeaurora.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-10-12 14:18:02 +02:00
Robin Murphy abbb8a0938 iommu/iova: Don't try to copy anchor nodes
Anchor nodes are not reserved IOVAs in the way that copy_reserved_iova()
cares about - while the failure from reserve_iova() is benign since the
target domain will already have its own anchor, we still don't want to
be triggering spurious warnings.

Reported-by: kernel test robot <fengguang.wu@intel.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Fixes: bb68b2fbfb ('iommu/iova: Add rbtree anchor node')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-10-02 15:45:24 +02:00
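A sketch of the shape of the fix, not the literal patch: while copying a
domain's reserved ranges, skip the anchor entry, which is recognisable by
its PFN (IOVA_ANCHOR is the marker used by the iova code).

    /* Inside a copy_reserved_iova()-style walk of the source rbtree. */
    struct rb_node *node;

    for (node = rb_first(&from->rbroot); node; node = rb_next(node)) {
            struct iova *iova = rb_entry(node, struct iova, node);

            /* The anchor is bookkeeping, not a real reservation. */
            if (iova->pfn_lo == IOVA_ANCHOR)
                    continue;

            if (!reserve_iova(to, iova->pfn_lo, iova->pfn_hi))
                    pr_err("Reserve iova range %lx@%lx failed\n",
                           iova->pfn_hi, iova->pfn_lo);
    }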
Robin Murphy e8b1984027 iommu/iova: Try harder to allocate from rcache magazine
When devices with different DMA masks are using the same domain, or for
PCI devices where we usually try a speculative 32-bit allocation first,
there is a fair possibility that the top PFN of the rcache stack at any
given time may be unsuitable for the lower limit, prompting a fallback
to allocating anew from the rbtree. Consequently, we may end up
artificially increasing pressure on the 32-bit IOVA space as unused IOVAs
accumulate lower down in the rcache stacks, while callers with 32-bit
masks also impose unnecessary rbtree overhead.

In such cases, let's try a bit harder to satisfy the allocation locally
first - scanning the whole stack should still be relatively inexpensive.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-09-28 14:57:16 +02:00
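A sketch of the idea, assuming the usual magazine layout (a size counter
plus an array of cached PFNs) and a caller that has already checked the
magazine is not empty: scan the whole magazine for a suitable entry and
swap it out, instead of looking only at the top.

    static unsigned long iova_magazine_pop(struct iova_magazine *mag,
                                           unsigned long limit_pfn)
    {
            int i;
            unsigned long pfn;

            /* Fall back to the rbtree only if no cached PFN is suitable. */
            for (i = mag->size - 1; mag->pfns[i] > limit_pfn; i--)
                    if (i == 0)
                            return 0;

            /* Swap the match with the last entry to pop it. */
            pfn = mag->pfns[i];
            mag->pfns[i] = mag->pfns[--mag->size];

            return pfn;
    }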
Robin Murphy b826ee9a4f iommu/iova: Make rcache limit_pfn handling more robust
When popping a pfn from an rcache, we are currently checking it directly
against limit_pfn for viability. Since this represents iova->pfn_lo, it
is technically possible for the corresponding iova->pfn_hi to be greater
than limit_pfn. Although we generally get away with it in practice since
limit_pfn is typically a power-of-two boundary and the IOVAs are
size-aligned, it's pretty trivial to make the iova_rcache_get() path
take the allocation size into account for complete safety.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-09-28 14:57:15 +02:00
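A sketch of the size-aware lookup: by subtracting the allocation size from
the limit before consulting the cache, a returned IOVA's pfn_hi can no
longer overshoot the caller's limit. The rcaches[] array and
__iova_rcache_get() are the existing rcache internals this builds on.

    static unsigned long iova_rcache_get(struct iova_domain *iovad,
                                         unsigned long size,
                                         unsigned long limit_pfn)
    {
            unsigned int log_size = order_base_2(size);

            if (log_size >= IOVA_RANGE_CACHE_MAX_SIZE)
                    return 0;

            /* Bound the base PFN so that base + size stays within limit. */
            return __iova_rcache_get(&iovad->rcaches[log_size],
                                     limit_pfn - size);
    }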
Robin Murphy 7595dc588a iommu/iova: Simplify domain destruction
All put_iova_domain() should have to worry about is freeing memory - by
that point the domain must no longer be live, so the act of cleaning up
doesn't need to be concurrency-safe or maintain the rbtree in a
self-consistent state. There's no need to waste time with locking or
emptying the rcache magazines, and we can just use the postorder
traversal helper to clear out the remaining rbtree entries in-place.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-09-28 14:57:15 +02:00
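A sketch of the simplified teardown, using the kernel's postorder traversal
helper to free the remaining entries without taking locks or rebalancing
the tree; free_iova_flush_queue(), free_iova_rcaches() and free_iova_mem()
are the existing helpers in this file.

    void put_iova_domain(struct iova_domain *iovad)
    {
            struct iova *iova, *tmp;

            free_iova_flush_queue(iovad);
            free_iova_rcaches(iovad);

            /* The domain is dead; just reclaim whatever is left. */
            rbtree_postorder_for_each_entry_safe(iova, tmp, &iovad->rbroot, node)
                    free_iova_mem(iova);
    }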
Robin Murphy 973f5fbedb iommu/iova: Simplify cached node logic
The logic of __get_cached_rbnode() is a little obtuse, but then
__get_prev_node_of_cached_rbnode_or_last_node_and_update_limit_pfn()
wouldn't exactly roll off the tongue...

Now that we have the invariant that there is always a valid node to
start searching downwards from, everything gets a bit easier to follow
if we simplify that function to do what it says on the tin and return
the cached node (or anchor node as appropriate) directly. In turn, we
can then deduplicate the rb_prev() and limit_pfn logic into the main
loop itself, further reduce the amount of code under the lock, and
generally make the inner workings a bit less subtle.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-09-27 17:09:58 +02:00
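A sketch of what the helper reduces to once a valid starting node is
guaranteed (field names as in the iova code of this era):

    static struct rb_node *__get_cached_rbnode(struct iova_domain *iovad,
                                               unsigned long limit_pfn)
    {
            /* The anchor guarantees there is always a node to start from. */
            if (limit_pfn <= iovad->dma_32bit_pfn)
                    return iovad->cached32_node;

            return iovad->cached_node;
    }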
Robin Murphy bb68b2fbfb iommu/iova: Add rbtree anchor node
Add a permanent dummy IOVA reservation to the rbtree, such that we can
always access the top of the address space instantly. The immediate
benefit is that we remove the overhead of the rb_last() traversal when
not using the cached node, but it also paves the way for further
simplifications.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-09-27 17:09:57 +02:00
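A sketch of the anchor setup at init time: a zero-sized entry pinned at the
very top PFN, so the topmost node is always at hand without an rb_last()
walk (IOVA_ANCHOR is assumed here to be ~0UL).

    #define IOVA_ANCHOR     ~0UL

    /* In init_iova_domain(), on the still-empty rbtree: */
    iovad->anchor.pfn_lo = iovad->anchor.pfn_hi = IOVA_ANCHOR;
    rb_link_node(&iovad->anchor.node, NULL, &iovad->rbroot.rb_node);
    rb_insert_color(&iovad->anchor.node, &iovad->rbroot);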
Zhen Lei aa3ac9469c iommu/iova: Make dma_32bit_pfn implicit
Now that the cached node optimisation can apply to all allocations, the
couple of users which were playing tricks with dma_32bit_pfn in order to
benefit from it can stop doing so. Conversely, there is also no need for
all the other users to explicitly calculate a 'real' 32-bit PFN, when
init_iova_domain() can happily do that itself from the page granularity.

CC: Thierry Reding <thierry.reding@gmail.com>
CC: Jonathan Hunter <jonathanh@nvidia.com>
CC: David Airlie <airlied@linux.ie>
CC: Sudeep Dutt <sudeep.dutt@intel.com>
CC: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Zhen Lei <thunder.leizhen@huawei.com>
Tested-by: Nate Watterson <nwatters@codeaurora.org>
[rm: use iova_shift(), rewrote commit message]
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-09-27 17:09:57 +02:00
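The effect on callers, roughly (the granule and start_pfn values here are
illustrative):

    /* Before: every caller computed its own 32-bit PFN.
     *
     *   init_iova_domain(iovad, SZ_4K, base_pfn, DMA_BIT_MASK(32) >> shift);
     */

    /* After: the argument is gone... */
    init_iova_domain(iovad, SZ_4K, base_pfn);

    /* ...and the domain derives the boundary itself, roughly: */
    iovad->dma_32bit_pfn = 1UL << (32 - iova_shift(iovad));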
Robin Murphy e60aa7b538 iommu/iova: Extend rbtree node caching
The cached node mechanism provides a significant performance benefit for
allocations using a 32-bit DMA mask, but in the case of non-PCI devices
or where the 32-bit space is full, the loss of this benefit can be
significant - on large systems there can be many thousands of entries in
the tree, such that walking all the way down to find free space every
time becomes increasingly awful.

Maintain a similar cached node for the whole IOVA space as a superset of
the 32-bit space so that performance can remain much more consistent.

Inspired by work by Zhen Lei <thunder.leizhen@huawei.com>.

Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Zhen Lei <thunder.leizhen@huawei.com>
Tested-by: Nate Watterson <nwatters@codeaurora.org>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-09-27 17:09:57 +02:00
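A sketch of the freeing-side update with both cached nodes maintained; the
exact conditions in the real code may differ slightly, this only shows the
intent.

    static void __cached_rbnode_delete_update(struct iova_domain *iovad,
                                              struct iova *free)
    {
            struct iova *cached_iova;

            /* Point the 32-bit cache just above the freed range so the
             * next downward search can reuse the freed space. */
            cached_iova = rb_entry(iovad->cached32_node, struct iova, node);
            if (free->pfn_hi < iovad->dma_32bit_pfn &&
                free->pfn_lo >= cached_iova->pfn_lo)
                    iovad->cached32_node = rb_next(&free->node);

            /* Likewise for the cache covering the whole address space. */
            cached_iova = rb_entry(iovad->cached_node, struct iova, node);
            if (free->pfn_lo >= cached_iova->pfn_lo)
                    iovad->cached_node = rb_next(&free->node);
    }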
Zhen Lei 086c83acb7 iommu/iova: Optimise the padding calculation
The mask for calculating the padding size doesn't change, so there's no
need to recalculate it on every loop iteration. Furthermore, once we've
done that, it becomes clear that we don't actually need to calculate a
padding size at all - by flipping the arithmetic around, we can just
combine the upper limit, size, and mask directly to check against the
lower limit.

For an arm64 build, this alone knocks 20% off the object code size of
the entire alloc_iova() function!

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Zhen Lei <thunder.leizhen@huawei.com>
Tested-by: Nate Watterson <nwatters@codeaurora.org>
[rm: simplified more of the arithmetic, rewrote commit message]
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-09-27 17:09:56 +02:00
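A sketch of the reworked inner loop: the alignment mask is computed once up
front, the candidate PFN is derived directly from the running upper limit,
and a single bound check against the bottom of the space suffices at the
end (variable names follow __alloc_and_insert_iova_range()).

    /* align_mask computed once before the walk, e.g.:
     *   align_mask = ~0UL;
     *   if (size_aligned)
     *           align_mask <<= fls_long(size - 1);
     */
    do {
            limit_pfn = min(limit_pfn, curr_iova->pfn_lo);
            new_pfn = (limit_pfn - size) & align_mask;
            prev = curr;
            curr = rb_prev(curr);
            curr_iova = rb_entry(curr, struct iova, node);
    } while (curr && new_pfn <= curr_iova->pfn_hi);

    if (limit_pfn < size || new_pfn < iovad->start_pfn)
            return -ENOMEM;     /* fell below the bottom of the space */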
Zhen Lei 2070f940a6 iommu/iova: Optimise rbtree searching
Checking the IOVA bounds separately before deciding which direction to
continue the search (if necessary) results in redundantly comparing both
pfns twice each. GCC can already determine that the final comparison op
is redundant and optimise it down to 3 in total, but we can go one
further with a little tweak of the ordering (which makes the intent of
the code that much cleaner as a bonus).

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Zhen Lei <thunder.leizhen@huawei.com>
Tested-by: Nate Watterson <nwatters@codeaurora.org>
[rm: rewrote commit message to clarify]
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-09-27 17:09:56 +02:00
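A sketch of the reordered descent: check the bounds that steer the search
first, so a node within range is recognised with at most two comparisons.

    /* private_find_iova()-style lookup, reordered: */
    while (node) {
            struct iova *iova = rb_entry(node, struct iova, node);

            if (pfn < iova->pfn_lo)
                    node = node->rb_left;
            else if (pfn > iova->pfn_hi)
                    node = node->rb_right;
            else
                    return iova;    /* pfn falls within this range */
    }

    return NULL;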
Joerg Roedel 9a005a800a iommu/iova: Add flush timer
Add a timer to flush entries from the Flush-Queues every
10ms. This makes sure that no stale TLB entries remain for
too long after an IOVA has been unmapped.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-08-15 18:23:52 +02:00
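A sketch of how such a timer gets armed from the queueing path;
IOVA_FQ_TIMEOUT and the fq_timer field are assumed names, not quoted from
the patch.

    #define IOVA_FQ_TIMEOUT 10      /* flush at most ~10ms after queueing */

    /* When an IOVA is added to a Flush-Queue: */
    if (!timer_pending(&iovad->fq_timer))
            mod_timer(&iovad->fq_timer,
                      jiffies + msecs_to_jiffies(IOVA_FQ_TIMEOUT));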
Joerg Roedel 8109c2a2f8 iommu/iova: Add locking to Flush-Queues
The lock is taken from the same CPU most of the time, but
having it also allows the queue to be flushed from another
CPU if necessary.

This will be used by a timer to regularly flush any pending
IOVAs from the Flush-Queues.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-08-15 18:23:52 +02:00
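A sketch of the locking pattern this enables (field names assumed): the
owning CPU and any remote flusher, such as the timer from the patch above,
take the same per-queue spinlock.

    /* Producer, normally on the owning CPU: */
    spin_lock_irqsave(&fq->lock, flags);
    /* ... append the freed IOVA and its flush counter to the queue ... */
    spin_unlock_irqrestore(&fq->lock, flags);

    /* Timer or another CPU flushing all queues: */
    for_each_possible_cpu(cpu) {
            struct iova_fq *fq = per_cpu_ptr(iovad->fq, cpu);

            spin_lock_irqsave(&fq->lock, flags);
            /* ... free entries whose TLB flush has completed ... */
            spin_unlock_irqrestore(&fq->lock, flags);
    }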