linux-stable/include/linux/dma-buf.h

635 lines
21 KiB
C
Raw Normal View History

/* SPDX-License-Identifier: GPL-2.0-only */
dma-buf: Introduce dma buffer sharing mechanism This is the first step in defining a dma buffer sharing mechanism. A new buffer object dma_buf is added, with operations and API to allow easy sharing of this buffer object across devices. The framework allows: - creation of a buffer object, its association with a file pointer, and associated allocator-defined operations on that buffer. This operation is called the 'export' operation. - different devices to 'attach' themselves to this exported buffer object, to facilitate backing storage negotiation, using dma_buf_attach() API. - the exported buffer object to be shared with the other entity by asking for its 'file-descriptor (fd)', and sharing the fd across. - a received fd to get the buffer object back, where it can be accessed using the associated exporter-defined operations. - the exporter and user to share the scatterlist associated with this buffer object using map_dma_buf and unmap_dma_buf operations. Atleast one 'attach()' call is required to be made prior to calling the map_dma_buf() operation. Couple of building blocks in map_dma_buf() are added to ease introduction of sync'ing across exporter and users, and late allocation by the exporter. For this first version, this framework will work with certain conditions: - *ONLY* exporter will be allowed to mmap to userspace (outside of this framework - mmap is not a buffer object operation), - currently, *ONLY* users that do not need CPU access to the buffer are allowed. More details are there in the documentation patch. This is based on design suggestions from many people at the mini-summits[1], most notably from Arnd Bergmann <arnd@arndb.de>, Rob Clark <rob@ti.com> and Daniel Vetter <daniel@ffwll.ch>. The implementation is inspired from proof-of-concept patch-set from Tomasz Stanislawski <t.stanislaws@samsung.com>, who demonstrated buffer sharing between two v4l2 devices. [2] [1]: https://wiki.linaro.org/OfficeofCTO/MemoryManagement [2]: http://lwn.net/Articles/454389 Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Sumit Semwal <sumit.semwal@ti.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-and-Tested-by: Rob Clark <rob.clark@linaro.org> Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-12-26 09:23:15 +00:00
/*
* Header file for dma buffer sharing framework.
*
* Copyright(C) 2011 Linaro Limited. All rights reserved.
* Author: Sumit Semwal <sumit.semwal@ti.com>
*
* Many thanks to linaro-mm-sig list, and specially
* Arnd Bergmann <arnd@arndb.de>, Rob Clark <rob@ti.com> and
* Daniel Vetter <daniel@ffwll.ch> for their support in creation and
* refining of this idea.
*/
#ifndef __DMA_BUF_H__
#define __DMA_BUF_H__
dma-buf-map: Rename to iosys-map Rename struct dma_buf_map to struct iosys_map and corresponding APIs. Over time dma-buf-map grew up to more functionality than the one used by dma-buf: in fact it's just a shim layer to abstract system memory, that can be accessed via regular load and store, from IO memory that needs to be acessed via arch helpers. The idea is to extend this API so it can fulfill other needs, internal to a single driver. Example: in the i915 driver it's desired to share the implementation for integrated graphics, which uses mostly system memory, with discrete graphics, which may need to access IO memory. The conversion was mostly done with the following semantic patch: @r1@ @@ - struct dma_buf_map + struct iosys_map @r2@ @@ ( - DMA_BUF_MAP_INIT_VADDR + IOSYS_MAP_INIT_VADDR | - dma_buf_map_set_vaddr + iosys_map_set_vaddr | - dma_buf_map_set_vaddr_iomem + iosys_map_set_vaddr_iomem | - dma_buf_map_is_equal + iosys_map_is_equal | - dma_buf_map_is_null + iosys_map_is_null | - dma_buf_map_is_set + iosys_map_is_set | - dma_buf_map_clear + iosys_map_clear | - dma_buf_map_memcpy_to + iosys_map_memcpy_to | - dma_buf_map_incr + iosys_map_incr ) @@ @@ - #include <linux/dma-buf-map.h> + #include <linux/iosys-map.h> Then some files had their includes adjusted and some comments were update to remove mentions to dma-buf-map. Since this is not specific to dma-buf anymore, move the documentation to the "Bus-Independent Device Accesses" section. v2: - Squash patches v3: - Fix wrong removal of dma-buf.h from MAINTAINERS - Move documentation from dma-buf.rst to device-io.rst v4: - Change documentation title and level Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Sumit Semwal <sumit.semwal@linaro.org> Acked-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20220204170541.829227-1-lucas.demarchi@intel.com
2022-02-04 17:05:41 +00:00
#include <linux/iosys-map.h>
dma-buf: Introduce dma buffer sharing mechanism This is the first step in defining a dma buffer sharing mechanism. A new buffer object dma_buf is added, with operations and API to allow easy sharing of this buffer object across devices. The framework allows: - creation of a buffer object, its association with a file pointer, and associated allocator-defined operations on that buffer. This operation is called the 'export' operation. - different devices to 'attach' themselves to this exported buffer object, to facilitate backing storage negotiation, using dma_buf_attach() API. - the exported buffer object to be shared with the other entity by asking for its 'file-descriptor (fd)', and sharing the fd across. - a received fd to get the buffer object back, where it can be accessed using the associated exporter-defined operations. - the exporter and user to share the scatterlist associated with this buffer object using map_dma_buf and unmap_dma_buf operations. Atleast one 'attach()' call is required to be made prior to calling the map_dma_buf() operation. Couple of building blocks in map_dma_buf() are added to ease introduction of sync'ing across exporter and users, and late allocation by the exporter. For this first version, this framework will work with certain conditions: - *ONLY* exporter will be allowed to mmap to userspace (outside of this framework - mmap is not a buffer object operation), - currently, *ONLY* users that do not need CPU access to the buffer are allowed. More details are there in the documentation patch. This is based on design suggestions from many people at the mini-summits[1], most notably from Arnd Bergmann <arnd@arndb.de>, Rob Clark <rob@ti.com> and Daniel Vetter <daniel@ffwll.ch>. The implementation is inspired from proof-of-concept patch-set from Tomasz Stanislawski <t.stanislaws@samsung.com>, who demonstrated buffer sharing between two v4l2 devices. [2] [1]: https://wiki.linaro.org/OfficeofCTO/MemoryManagement [2]: http://lwn.net/Articles/454389 Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Sumit Semwal <sumit.semwal@ti.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-and-Tested-by: Rob Clark <rob.clark@linaro.org> Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-12-26 09:23:15 +00:00
#include <linux/file.h>
#include <linux/err.h>
#include <linux/scatterlist.h>
#include <linux/list.h>
#include <linux/dma-mapping.h>
#include <linux/fs.h>
dma-buf: Rename struct fence to dma_fence I plan to usurp the short name of struct fence for a core kernel struct, and so I need to rename the specialised fence/timeline for DMA operations to make room. A consensus was reached in https://lists.freedesktop.org/archives/dri-devel/2016-July/113083.html that making clear this fence applies to DMA operations was a good thing. Since then the patch has grown a bit as usage increases, so hopefully it remains a good thing! (v2...: rebase, rerun spatch) v3: Compile on msm, spotted a manual fixup that I broke. v4: Try again for msm, sorry Daniel coccinelle script: @@ @@ - struct fence + struct dma_fence @@ @@ - struct fence_ops + struct dma_fence_ops @@ @@ - struct fence_cb + struct dma_fence_cb @@ @@ - struct fence_array + struct dma_fence_array @@ @@ - enum fence_flag_bits + enum dma_fence_flag_bits @@ @@ ( - fence_init + dma_fence_init | - fence_release + dma_fence_release | - fence_free + dma_fence_free | - fence_get + dma_fence_get | - fence_get_rcu + dma_fence_get_rcu | - fence_put + dma_fence_put | - fence_signal + dma_fence_signal | - fence_signal_locked + dma_fence_signal_locked | - fence_default_wait + dma_fence_default_wait | - fence_add_callback + dma_fence_add_callback | - fence_remove_callback + dma_fence_remove_callback | - fence_enable_sw_signaling + dma_fence_enable_sw_signaling | - fence_is_signaled_locked + dma_fence_is_signaled_locked | - fence_is_signaled + dma_fence_is_signaled | - fence_is_later + dma_fence_is_later | - fence_later + dma_fence_later | - fence_wait_timeout + dma_fence_wait_timeout | - fence_wait_any_timeout + dma_fence_wait_any_timeout | - fence_wait + dma_fence_wait | - fence_context_alloc + dma_fence_context_alloc | - fence_array_create + dma_fence_array_create | - to_fence_array + to_dma_fence_array | - fence_is_array + dma_fence_is_array | - trace_fence_emit + trace_dma_fence_emit | - FENCE_TRACE + DMA_FENCE_TRACE | - FENCE_WARN + DMA_FENCE_WARN | - FENCE_ERR + DMA_FENCE_ERR ) ( ... ) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk> Acked-by: Sumit Semwal <sumit.semwal@linaro.org> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20161025120045.28839-1-chris@chris-wilson.co.uk
2016-10-25 12:00:45 +00:00
#include <linux/dma-fence.h>
#include <linux/wait.h>
dma-buf: Introduce dma buffer sharing mechanism This is the first step in defining a dma buffer sharing mechanism. A new buffer object dma_buf is added, with operations and API to allow easy sharing of this buffer object across devices. The framework allows: - creation of a buffer object, its association with a file pointer, and associated allocator-defined operations on that buffer. This operation is called the 'export' operation. - different devices to 'attach' themselves to this exported buffer object, to facilitate backing storage negotiation, using dma_buf_attach() API. - the exported buffer object to be shared with the other entity by asking for its 'file-descriptor (fd)', and sharing the fd across. - a received fd to get the buffer object back, where it can be accessed using the associated exporter-defined operations. - the exporter and user to share the scatterlist associated with this buffer object using map_dma_buf and unmap_dma_buf operations. Atleast one 'attach()' call is required to be made prior to calling the map_dma_buf() operation. Couple of building blocks in map_dma_buf() are added to ease introduction of sync'ing across exporter and users, and late allocation by the exporter. For this first version, this framework will work with certain conditions: - *ONLY* exporter will be allowed to mmap to userspace (outside of this framework - mmap is not a buffer object operation), - currently, *ONLY* users that do not need CPU access to the buffer are allowed. More details are there in the documentation patch. This is based on design suggestions from many people at the mini-summits[1], most notably from Arnd Bergmann <arnd@arndb.de>, Rob Clark <rob@ti.com> and Daniel Vetter <daniel@ffwll.ch>. The implementation is inspired from proof-of-concept patch-set from Tomasz Stanislawski <t.stanislaws@samsung.com>, who demonstrated buffer sharing between two v4l2 devices. [2] [1]: https://wiki.linaro.org/OfficeofCTO/MemoryManagement [2]: http://lwn.net/Articles/454389 Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Sumit Semwal <sumit.semwal@ti.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-and-Tested-by: Rob Clark <rob.clark@linaro.org> Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-12-26 09:23:15 +00:00
struct device;
dma-buf: Introduce dma buffer sharing mechanism This is the first step in defining a dma buffer sharing mechanism. A new buffer object dma_buf is added, with operations and API to allow easy sharing of this buffer object across devices. The framework allows: - creation of a buffer object, its association with a file pointer, and associated allocator-defined operations on that buffer. This operation is called the 'export' operation. - different devices to 'attach' themselves to this exported buffer object, to facilitate backing storage negotiation, using dma_buf_attach() API. - the exported buffer object to be shared with the other entity by asking for its 'file-descriptor (fd)', and sharing the fd across. - a received fd to get the buffer object back, where it can be accessed using the associated exporter-defined operations. - the exporter and user to share the scatterlist associated with this buffer object using map_dma_buf and unmap_dma_buf operations. Atleast one 'attach()' call is required to be made prior to calling the map_dma_buf() operation. Couple of building blocks in map_dma_buf() are added to ease introduction of sync'ing across exporter and users, and late allocation by the exporter. For this first version, this framework will work with certain conditions: - *ONLY* exporter will be allowed to mmap to userspace (outside of this framework - mmap is not a buffer object operation), - currently, *ONLY* users that do not need CPU access to the buffer are allowed. More details are there in the documentation patch. This is based on design suggestions from many people at the mini-summits[1], most notably from Arnd Bergmann <arnd@arndb.de>, Rob Clark <rob@ti.com> and Daniel Vetter <daniel@ffwll.ch>. The implementation is inspired from proof-of-concept patch-set from Tomasz Stanislawski <t.stanislaws@samsung.com>, who demonstrated buffer sharing between two v4l2 devices. [2] [1]: https://wiki.linaro.org/OfficeofCTO/MemoryManagement [2]: http://lwn.net/Articles/454389 Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Sumit Semwal <sumit.semwal@ti.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-and-Tested-by: Rob Clark <rob.clark@linaro.org> Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-12-26 09:23:15 +00:00
struct dma_buf;
struct dma_buf_attachment;
/**
* struct dma_buf_ops - operations possible on struct dma_buf
* @vmap: [optional] creates a virtual mapping for the buffer into kernel
* address space. Same restrictions as for vmap and friends apply.
* @vunmap: [optional] unmaps a vmap from the buffer
dma-buf: Introduce dma buffer sharing mechanism This is the first step in defining a dma buffer sharing mechanism. A new buffer object dma_buf is added, with operations and API to allow easy sharing of this buffer object across devices. The framework allows: - creation of a buffer object, its association with a file pointer, and associated allocator-defined operations on that buffer. This operation is called the 'export' operation. - different devices to 'attach' themselves to this exported buffer object, to facilitate backing storage negotiation, using dma_buf_attach() API. - the exported buffer object to be shared with the other entity by asking for its 'file-descriptor (fd)', and sharing the fd across. - a received fd to get the buffer object back, where it can be accessed using the associated exporter-defined operations. - the exporter and user to share the scatterlist associated with this buffer object using map_dma_buf and unmap_dma_buf operations. Atleast one 'attach()' call is required to be made prior to calling the map_dma_buf() operation. Couple of building blocks in map_dma_buf() are added to ease introduction of sync'ing across exporter and users, and late allocation by the exporter. For this first version, this framework will work with certain conditions: - *ONLY* exporter will be allowed to mmap to userspace (outside of this framework - mmap is not a buffer object operation), - currently, *ONLY* users that do not need CPU access to the buffer are allowed. More details are there in the documentation patch. This is based on design suggestions from many people at the mini-summits[1], most notably from Arnd Bergmann <arnd@arndb.de>, Rob Clark <rob@ti.com> and Daniel Vetter <daniel@ffwll.ch>. The implementation is inspired from proof-of-concept patch-set from Tomasz Stanislawski <t.stanislaws@samsung.com>, who demonstrated buffer sharing between two v4l2 devices. [2] [1]: https://wiki.linaro.org/OfficeofCTO/MemoryManagement [2]: http://lwn.net/Articles/454389 Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Sumit Semwal <sumit.semwal@ti.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-and-Tested-by: Rob Clark <rob.clark@linaro.org> Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-12-26 09:23:15 +00:00
*/
struct dma_buf_ops {
/**
* @cache_sgt_mapping:
*
* If true the framework will cache the first mapping made for each
* attachment. This avoids creating mappings for attachments multiple
* times.
*/
bool cache_sgt_mapping;
/**
* @attach:
*
* This is called from dma_buf_attach() to make sure that a given
* &dma_buf_attachment.dev can access the provided &dma_buf. Exporters
* which support buffer objects in special locations like VRAM or
* device-specific carveout areas should check whether the buffer could
* be move to system memory (or directly accessed by the provided
* device), and otherwise need to fail the attach operation.
*
* The exporter should also in general check whether the current
* allocation fulfills the DMA constraints of the new device. If this
* is not the case, and the allocation cannot be moved, it should also
* fail the attach operation.
*
* Any exporter-private housekeeping data can be stored in the
* &dma_buf_attachment.priv pointer.
*
* This callback is optional.
*
* Returns:
*
* 0 on success, negative error code on failure. It might return -EBUSY
* to signal that backing storage is already allocated and incompatible
* with the requirements of requesting device.
*/
int (*attach)(struct dma_buf *, struct dma_buf_attachment *);
dma-buf: Introduce dma buffer sharing mechanism This is the first step in defining a dma buffer sharing mechanism. A new buffer object dma_buf is added, with operations and API to allow easy sharing of this buffer object across devices. The framework allows: - creation of a buffer object, its association with a file pointer, and associated allocator-defined operations on that buffer. This operation is called the 'export' operation. - different devices to 'attach' themselves to this exported buffer object, to facilitate backing storage negotiation, using dma_buf_attach() API. - the exported buffer object to be shared with the other entity by asking for its 'file-descriptor (fd)', and sharing the fd across. - a received fd to get the buffer object back, where it can be accessed using the associated exporter-defined operations. - the exporter and user to share the scatterlist associated with this buffer object using map_dma_buf and unmap_dma_buf operations. Atleast one 'attach()' call is required to be made prior to calling the map_dma_buf() operation. Couple of building blocks in map_dma_buf() are added to ease introduction of sync'ing across exporter and users, and late allocation by the exporter. For this first version, this framework will work with certain conditions: - *ONLY* exporter will be allowed to mmap to userspace (outside of this framework - mmap is not a buffer object operation), - currently, *ONLY* users that do not need CPU access to the buffer are allowed. More details are there in the documentation patch. This is based on design suggestions from many people at the mini-summits[1], most notably from Arnd Bergmann <arnd@arndb.de>, Rob Clark <rob@ti.com> and Daniel Vetter <daniel@ffwll.ch>. The implementation is inspired from proof-of-concept patch-set from Tomasz Stanislawski <t.stanislaws@samsung.com>, who demonstrated buffer sharing between two v4l2 devices. [2] [1]: https://wiki.linaro.org/OfficeofCTO/MemoryManagement [2]: http://lwn.net/Articles/454389 Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Sumit Semwal <sumit.semwal@ti.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-and-Tested-by: Rob Clark <rob.clark@linaro.org> Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-12-26 09:23:15 +00:00
/**
* @detach:
*
* This is called by dma_buf_detach() to release a &dma_buf_attachment.
* Provided so that exporters can clean up any housekeeping for an
* &dma_buf_attachment.
*
* This callback is optional.
*/
dma-buf: Introduce dma buffer sharing mechanism This is the first step in defining a dma buffer sharing mechanism. A new buffer object dma_buf is added, with operations and API to allow easy sharing of this buffer object across devices. The framework allows: - creation of a buffer object, its association with a file pointer, and associated allocator-defined operations on that buffer. This operation is called the 'export' operation. - different devices to 'attach' themselves to this exported buffer object, to facilitate backing storage negotiation, using dma_buf_attach() API. - the exported buffer object to be shared with the other entity by asking for its 'file-descriptor (fd)', and sharing the fd across. - a received fd to get the buffer object back, where it can be accessed using the associated exporter-defined operations. - the exporter and user to share the scatterlist associated with this buffer object using map_dma_buf and unmap_dma_buf operations. Atleast one 'attach()' call is required to be made prior to calling the map_dma_buf() operation. Couple of building blocks in map_dma_buf() are added to ease introduction of sync'ing across exporter and users, and late allocation by the exporter. For this first version, this framework will work with certain conditions: - *ONLY* exporter will be allowed to mmap to userspace (outside of this framework - mmap is not a buffer object operation), - currently, *ONLY* users that do not need CPU access to the buffer are allowed. More details are there in the documentation patch. This is based on design suggestions from many people at the mini-summits[1], most notably from Arnd Bergmann <arnd@arndb.de>, Rob Clark <rob@ti.com> and Daniel Vetter <daniel@ffwll.ch>. The implementation is inspired from proof-of-concept patch-set from Tomasz Stanislawski <t.stanislaws@samsung.com>, who demonstrated buffer sharing between two v4l2 devices. [2] [1]: https://wiki.linaro.org/OfficeofCTO/MemoryManagement [2]: http://lwn.net/Articles/454389 Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Sumit Semwal <sumit.semwal@ti.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-and-Tested-by: Rob Clark <rob.clark@linaro.org> Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-12-26 09:23:15 +00:00
void (*detach)(struct dma_buf *, struct dma_buf_attachment *);
dma-buf: add dynamic DMA-buf handling v15 On the exporter side we add optional explicit pinning callbacks. Which are called when the importer doesn't implement dynamic handling, move notification or need the DMA-buf locked in place for its use case. On the importer side we add an optional move_notify callback. This callback is used by the exporter to inform the importers that their mappings should be destroyed as soon as possible. This allows the exporter to provide the mappings without the need to pin the backing store. v2: don't try to invalidate mappings when the callback is NULL, lock the reservation obj while using the attachments, add helper to set the callback v3: move flag for invalidation support into the DMA-buf, use new attach_info structure to set the callback v4: use importer_priv field instead of mangling exporter priv. v5: drop invalidation_supported flag v6: squash together with pin/unpin changes v7: pin/unpin takes an attachment now v8: nuke dma_buf_attachment_(map|unmap)_locked, everything is now handled backward compatible v9: always cache when export/importer don't agree on dynamic handling v10: minimal style cleanup v11: drop automatically re-entry avoidance v12: rename callback to move_notify v13: add might_lock in appropriate places v14: rebase on separated locking change v15: add EXPERIMENTAL flag, some more code comments Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/353993/?series=73646&rev=1
2018-07-03 14:42:26 +00:00
/**
* @pin:
*
* This is called by dma_buf_pin() and lets the exporter know that the
* DMA-buf can't be moved any more. Ideally, the exporter should
* pin the buffer so that it is generally accessible by all
* devices.
dma-buf: add dynamic DMA-buf handling v15 On the exporter side we add optional explicit pinning callbacks. Which are called when the importer doesn't implement dynamic handling, move notification or need the DMA-buf locked in place for its use case. On the importer side we add an optional move_notify callback. This callback is used by the exporter to inform the importers that their mappings should be destroyed as soon as possible. This allows the exporter to provide the mappings without the need to pin the backing store. v2: don't try to invalidate mappings when the callback is NULL, lock the reservation obj while using the attachments, add helper to set the callback v3: move flag for invalidation support into the DMA-buf, use new attach_info structure to set the callback v4: use importer_priv field instead of mangling exporter priv. v5: drop invalidation_supported flag v6: squash together with pin/unpin changes v7: pin/unpin takes an attachment now v8: nuke dma_buf_attachment_(map|unmap)_locked, everything is now handled backward compatible v9: always cache when export/importer don't agree on dynamic handling v10: minimal style cleanup v11: drop automatically re-entry avoidance v12: rename callback to move_notify v13: add might_lock in appropriate places v14: rebase on separated locking change v15: add EXPERIMENTAL flag, some more code comments Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/353993/?series=73646&rev=1
2018-07-03 14:42:26 +00:00
*
* This is called with the &dmabuf.resv object locked and is mutual
* exclusive with @cache_sgt_mapping.
dma-buf: add dynamic DMA-buf handling v15 On the exporter side we add optional explicit pinning callbacks. Which are called when the importer doesn't implement dynamic handling, move notification or need the DMA-buf locked in place for its use case. On the importer side we add an optional move_notify callback. This callback is used by the exporter to inform the importers that their mappings should be destroyed as soon as possible. This allows the exporter to provide the mappings without the need to pin the backing store. v2: don't try to invalidate mappings when the callback is NULL, lock the reservation obj while using the attachments, add helper to set the callback v3: move flag for invalidation support into the DMA-buf, use new attach_info structure to set the callback v4: use importer_priv field instead of mangling exporter priv. v5: drop invalidation_supported flag v6: squash together with pin/unpin changes v7: pin/unpin takes an attachment now v8: nuke dma_buf_attachment_(map|unmap)_locked, everything is now handled backward compatible v9: always cache when export/importer don't agree on dynamic handling v10: minimal style cleanup v11: drop automatically re-entry avoidance v12: rename callback to move_notify v13: add might_lock in appropriate places v14: rebase on separated locking change v15: add EXPERIMENTAL flag, some more code comments Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/353993/?series=73646&rev=1
2018-07-03 14:42:26 +00:00
*
* This is called automatically for non-dynamic importers from
* dma_buf_attach().
dma-buf: add dynamic DMA-buf handling v15 On the exporter side we add optional explicit pinning callbacks. Which are called when the importer doesn't implement dynamic handling, move notification or need the DMA-buf locked in place for its use case. On the importer side we add an optional move_notify callback. This callback is used by the exporter to inform the importers that their mappings should be destroyed as soon as possible. This allows the exporter to provide the mappings without the need to pin the backing store. v2: don't try to invalidate mappings when the callback is NULL, lock the reservation obj while using the attachments, add helper to set the callback v3: move flag for invalidation support into the DMA-buf, use new attach_info structure to set the callback v4: use importer_priv field instead of mangling exporter priv. v5: drop invalidation_supported flag v6: squash together with pin/unpin changes v7: pin/unpin takes an attachment now v8: nuke dma_buf_attachment_(map|unmap)_locked, everything is now handled backward compatible v9: always cache when export/importer don't agree on dynamic handling v10: minimal style cleanup v11: drop automatically re-entry avoidance v12: rename callback to move_notify v13: add might_lock in appropriate places v14: rebase on separated locking change v15: add EXPERIMENTAL flag, some more code comments Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/353993/?series=73646&rev=1
2018-07-03 14:42:26 +00:00
*
* Note that similar to non-dynamic exporters in their @map_dma_buf
* callback the driver must guarantee that the memory is available for
* use and cleared of any old data by the time this function returns.
* Drivers which pipeline their buffer moves internally must wait for
* all moves and clears to complete.
*
dma-buf: add dynamic DMA-buf handling v15 On the exporter side we add optional explicit pinning callbacks. Which are called when the importer doesn't implement dynamic handling, move notification or need the DMA-buf locked in place for its use case. On the importer side we add an optional move_notify callback. This callback is used by the exporter to inform the importers that their mappings should be destroyed as soon as possible. This allows the exporter to provide the mappings without the need to pin the backing store. v2: don't try to invalidate mappings when the callback is NULL, lock the reservation obj while using the attachments, add helper to set the callback v3: move flag for invalidation support into the DMA-buf, use new attach_info structure to set the callback v4: use importer_priv field instead of mangling exporter priv. v5: drop invalidation_supported flag v6: squash together with pin/unpin changes v7: pin/unpin takes an attachment now v8: nuke dma_buf_attachment_(map|unmap)_locked, everything is now handled backward compatible v9: always cache when export/importer don't agree on dynamic handling v10: minimal style cleanup v11: drop automatically re-entry avoidance v12: rename callback to move_notify v13: add might_lock in appropriate places v14: rebase on separated locking change v15: add EXPERIMENTAL flag, some more code comments Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/353993/?series=73646&rev=1
2018-07-03 14:42:26 +00:00
* Returns:
*
* 0 on success, negative error code on failure.
*/
int (*pin)(struct dma_buf_attachment *attach);
/**
* @unpin:
*
* This is called by dma_buf_unpin() and lets the exporter know that the
dma-buf: add dynamic DMA-buf handling v15 On the exporter side we add optional explicit pinning callbacks. Which are called when the importer doesn't implement dynamic handling, move notification or need the DMA-buf locked in place for its use case. On the importer side we add an optional move_notify callback. This callback is used by the exporter to inform the importers that their mappings should be destroyed as soon as possible. This allows the exporter to provide the mappings without the need to pin the backing store. v2: don't try to invalidate mappings when the callback is NULL, lock the reservation obj while using the attachments, add helper to set the callback v3: move flag for invalidation support into the DMA-buf, use new attach_info structure to set the callback v4: use importer_priv field instead of mangling exporter priv. v5: drop invalidation_supported flag v6: squash together with pin/unpin changes v7: pin/unpin takes an attachment now v8: nuke dma_buf_attachment_(map|unmap)_locked, everything is now handled backward compatible v9: always cache when export/importer don't agree on dynamic handling v10: minimal style cleanup v11: drop automatically re-entry avoidance v12: rename callback to move_notify v13: add might_lock in appropriate places v14: rebase on separated locking change v15: add EXPERIMENTAL flag, some more code comments Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/353993/?series=73646&rev=1
2018-07-03 14:42:26 +00:00
* DMA-buf can be moved again.
*
* This is called with the dmabuf->resv object locked and is mutual
* exclusive with @cache_sgt_mapping.
dma-buf: add dynamic DMA-buf handling v15 On the exporter side we add optional explicit pinning callbacks. Which are called when the importer doesn't implement dynamic handling, move notification or need the DMA-buf locked in place for its use case. On the importer side we add an optional move_notify callback. This callback is used by the exporter to inform the importers that their mappings should be destroyed as soon as possible. This allows the exporter to provide the mappings without the need to pin the backing store. v2: don't try to invalidate mappings when the callback is NULL, lock the reservation obj while using the attachments, add helper to set the callback v3: move flag for invalidation support into the DMA-buf, use new attach_info structure to set the callback v4: use importer_priv field instead of mangling exporter priv. v5: drop invalidation_supported flag v6: squash together with pin/unpin changes v7: pin/unpin takes an attachment now v8: nuke dma_buf_attachment_(map|unmap)_locked, everything is now handled backward compatible v9: always cache when export/importer don't agree on dynamic handling v10: minimal style cleanup v11: drop automatically re-entry avoidance v12: rename callback to move_notify v13: add might_lock in appropriate places v14: rebase on separated locking change v15: add EXPERIMENTAL flag, some more code comments Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/353993/?series=73646&rev=1
2018-07-03 14:42:26 +00:00
*
* This callback is optional.
*/
void (*unpin)(struct dma_buf_attachment *attach);
/**
* @map_dma_buf:
*
* This is called by dma_buf_map_attachment() and is used to map a
* shared &dma_buf into device address space, and it is mandatory. It
dma-buf: add dynamic DMA-buf handling v15 On the exporter side we add optional explicit pinning callbacks. Which are called when the importer doesn't implement dynamic handling, move notification or need the DMA-buf locked in place for its use case. On the importer side we add an optional move_notify callback. This callback is used by the exporter to inform the importers that their mappings should be destroyed as soon as possible. This allows the exporter to provide the mappings without the need to pin the backing store. v2: don't try to invalidate mappings when the callback is NULL, lock the reservation obj while using the attachments, add helper to set the callback v3: move flag for invalidation support into the DMA-buf, use new attach_info structure to set the callback v4: use importer_priv field instead of mangling exporter priv. v5: drop invalidation_supported flag v6: squash together with pin/unpin changes v7: pin/unpin takes an attachment now v8: nuke dma_buf_attachment_(map|unmap)_locked, everything is now handled backward compatible v9: always cache when export/importer don't agree on dynamic handling v10: minimal style cleanup v11: drop automatically re-entry avoidance v12: rename callback to move_notify v13: add might_lock in appropriate places v14: rebase on separated locking change v15: add EXPERIMENTAL flag, some more code comments Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/353993/?series=73646&rev=1
2018-07-03 14:42:26 +00:00
* can only be called if @attach has been called successfully.
*
* This call may sleep, e.g. when the backing storage first needs to be
* allocated, or moved to a location suitable for all currently attached
* devices.
*
* Note that any specific buffer attributes required for this function
* should get added to device_dma_parameters accessible via
* &device.dma_params from the &dma_buf_attachment. The @attach callback
* should also check these constraints.
*
* If this is being called for the first time, the exporter can now
* choose to scan through the list of attachments for this buffer,
* collate the requirements of the attached devices, and choose an
* appropriate backing storage for the buffer.
*
* Based on enum dma_data_direction, it might be possible to have
* multiple users accessing at the same time (for reading, maybe), or
* any other kind of sharing that the exporter might wish to make
* available to buffer-users.
*
dma-buf: change DMA-buf locking convention v3 This patch is a stripped down version of the locking changes necessary to support dynamic DMA-buf handling. It adds a dynamic flag for both importers as well as exporters so that drivers can choose if they want the reservation object locked or unlocked during mapping of attachments. For compatibility between drivers we cache the DMA-buf mapping during attaching an importer as soon as exporter/importer disagree on the dynamic handling. Issues and solutions we considered: - We can't change all existing drivers, and existing improters have strong opinions about which locks they're holding while calling dma_buf_attachment_map/unmap. Exporters also have strong opinions about which locks they can acquire in their ->map/unmap callbacks, levaing no room for change. The solution to avoid this was to move the actual map/unmap out from this call, into the attach/detach callbacks, and cache the mapping. This works because drivers don't call attach/detach from deep within their code callchains (like deep in memory management code called from cs/execbuf ioctl), but directly from the fd2handle implementation. - The caching has some troubles on some soc drivers, which set other modes than DMA_BIDIRECTIONAL. We can't have 2 incompatible mappings, and we can't re-create the mapping at _map time due to the above locking fun. We very carefuly step around that by only caching at attach time if the dynamic mode between importer/expoert mismatches. - There's been quite some discussion on dma-buf mappings which need active cache management, which would all break down when caching, plus we don't have explicit flush operations on the attachment side. The solution to this was to shrug and keep the current discrepancy between what the dma-buf docs claim and what implementations do, with the hope that the begin/end_cpu_access hooks are good enough and that all necessary flushing to keep device mappings consistent will be done there. v2: cleanup set_name merge, improve kerneldoc v3: update commit message, kerneldoc and cleanup _debug_show() Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/336788/
2018-07-03 14:42:26 +00:00
* This is always called with the dmabuf->resv object locked when
* the dynamic_mapping flag is true.
*
* Note that for non-dynamic exporters the driver must guarantee that
* that the memory is available for use and cleared of any old data by
* the time this function returns. Drivers which pipeline their buffer
* moves internally must wait for all moves and clears to complete.
* Dynamic exporters do not need to follow this rule: For non-dynamic
* importers the buffer is already pinned through @pin, which has the
* same requirements. Dynamic importers otoh are required to obey the
* dma_resv fences.
*
* Returns:
*
* A &sg_table scatter list of the backing storage of the DMA buffer,
* already mapped into the device address space of the &device attached
* with the provided &dma_buf_attachment. The addresses and lengths in
* the scatter list are PAGE_SIZE aligned.
*
* On failure, returns a negative error value wrapped into a pointer.
* May also return -EINTR when a signal was received while being
* blocked.
dma-buf: Add debug option We have too many people abusing the struct page they can get at but really shouldn't in importers. Aside from that the backing page might simply not exist (for dynamic p2p mappings) looking at it and using it e.g. for mmap can also wreak the page handling of the exporter completely. Importers really must go through the proper interface like dma_buf_mmap for everything. I'm semi-tempted to enforce this for dynamic importers since those really have no excuse at all to break the rules. Unfortuantely we can't store the right pointers somewhere safe to make sure we oops on something recognizable, so best is to just wrangle them a bit by flipping all the bits. At least on x86 kernel addresses have all their high bits sets and the struct page array is fairly low in the kernel mapping, so flipping all the bits gives us a very high pointer in userspace and hence excellent chances for an invalid dereference. v2: Add a note to the @map_dma_buf hook that exporters shouldn't do fancy caching tricks, which would blow up with this address scrambling trick here (Chris) Enable by default when CONFIG_DMA_API_DEBUG is enabled. v3: Only one copy of the mangle/unmangle code (Christian) v4: #ifdef, not #if (0day) v5: sg_table can also be an ERR_PTR (Chris, Christian) Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: "Christian König" <christian.koenig@amd.com> Cc: David Stevens <stevensd@chromium.org> Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Link: https://patchwork.freedesktop.org/patch/msgid/20210115164739.3958206-1-daniel.vetter@ffwll.ch
2021-01-15 16:47:39 +00:00
*
* Note that exporters should not try to cache the scatter list, or
* return the same one for multiple calls. Caching is done either by the
* DMA-BUF code (for non-dynamic importers) or the importer. Ownership
* of the scatter list is transferred to the caller, and returned by
* @unmap_dma_buf.
dma-buf: Introduce dma buffer sharing mechanism This is the first step in defining a dma buffer sharing mechanism. A new buffer object dma_buf is added, with operations and API to allow easy sharing of this buffer object across devices. The framework allows: - creation of a buffer object, its association with a file pointer, and associated allocator-defined operations on that buffer. This operation is called the 'export' operation. - different devices to 'attach' themselves to this exported buffer object, to facilitate backing storage negotiation, using dma_buf_attach() API. - the exported buffer object to be shared with the other entity by asking for its 'file-descriptor (fd)', and sharing the fd across. - a received fd to get the buffer object back, where it can be accessed using the associated exporter-defined operations. - the exporter and user to share the scatterlist associated with this buffer object using map_dma_buf and unmap_dma_buf operations. Atleast one 'attach()' call is required to be made prior to calling the map_dma_buf() operation. Couple of building blocks in map_dma_buf() are added to ease introduction of sync'ing across exporter and users, and late allocation by the exporter. For this first version, this framework will work with certain conditions: - *ONLY* exporter will be allowed to mmap to userspace (outside of this framework - mmap is not a buffer object operation), - currently, *ONLY* users that do not need CPU access to the buffer are allowed. More details are there in the documentation patch. This is based on design suggestions from many people at the mini-summits[1], most notably from Arnd Bergmann <arnd@arndb.de>, Rob Clark <rob@ti.com> and Daniel Vetter <daniel@ffwll.ch>. The implementation is inspired from proof-of-concept patch-set from Tomasz Stanislawski <t.stanislaws@samsung.com>, who demonstrated buffer sharing between two v4l2 devices. [2] [1]: https://wiki.linaro.org/OfficeofCTO/MemoryManagement [2]: http://lwn.net/Articles/454389 Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Sumit Semwal <sumit.semwal@ti.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-and-Tested-by: Rob Clark <rob.clark@linaro.org> Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-12-26 09:23:15 +00:00
*/
struct sg_table * (*map_dma_buf)(struct dma_buf_attachment *,
enum dma_data_direction);
/**
* @unmap_dma_buf:
*
* This is called by dma_buf_unmap_attachment() and should unmap and
* release the &sg_table allocated in @map_dma_buf, and it is mandatory.
* For static dma_buf handling this might also unpin the backing
dma-buf: add dynamic DMA-buf handling v15 On the exporter side we add optional explicit pinning callbacks. Which are called when the importer doesn't implement dynamic handling, move notification or need the DMA-buf locked in place for its use case. On the importer side we add an optional move_notify callback. This callback is used by the exporter to inform the importers that their mappings should be destroyed as soon as possible. This allows the exporter to provide the mappings without the need to pin the backing store. v2: don't try to invalidate mappings when the callback is NULL, lock the reservation obj while using the attachments, add helper to set the callback v3: move flag for invalidation support into the DMA-buf, use new attach_info structure to set the callback v4: use importer_priv field instead of mangling exporter priv. v5: drop invalidation_supported flag v6: squash together with pin/unpin changes v7: pin/unpin takes an attachment now v8: nuke dma_buf_attachment_(map|unmap)_locked, everything is now handled backward compatible v9: always cache when export/importer don't agree on dynamic handling v10: minimal style cleanup v11: drop automatically re-entry avoidance v12: rename callback to move_notify v13: add might_lock in appropriate places v14: rebase on separated locking change v15: add EXPERIMENTAL flag, some more code comments Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/353993/?series=73646&rev=1
2018-07-03 14:42:26 +00:00
* storage if this is the last mapping of the DMA buffer.
*/
dma-buf: Introduce dma buffer sharing mechanism This is the first step in defining a dma buffer sharing mechanism. A new buffer object dma_buf is added, with operations and API to allow easy sharing of this buffer object across devices. The framework allows: - creation of a buffer object, its association with a file pointer, and associated allocator-defined operations on that buffer. This operation is called the 'export' operation. - different devices to 'attach' themselves to this exported buffer object, to facilitate backing storage negotiation, using dma_buf_attach() API. - the exported buffer object to be shared with the other entity by asking for its 'file-descriptor (fd)', and sharing the fd across. - a received fd to get the buffer object back, where it can be accessed using the associated exporter-defined operations. - the exporter and user to share the scatterlist associated with this buffer object using map_dma_buf and unmap_dma_buf operations. Atleast one 'attach()' call is required to be made prior to calling the map_dma_buf() operation. Couple of building blocks in map_dma_buf() are added to ease introduction of sync'ing across exporter and users, and late allocation by the exporter. For this first version, this framework will work with certain conditions: - *ONLY* exporter will be allowed to mmap to userspace (outside of this framework - mmap is not a buffer object operation), - currently, *ONLY* users that do not need CPU access to the buffer are allowed. More details are there in the documentation patch. This is based on design suggestions from many people at the mini-summits[1], most notably from Arnd Bergmann <arnd@arndb.de>, Rob Clark <rob@ti.com> and Daniel Vetter <daniel@ffwll.ch>. The implementation is inspired from proof-of-concept patch-set from Tomasz Stanislawski <t.stanislaws@samsung.com>, who demonstrated buffer sharing between two v4l2 devices. [2] [1]: https://wiki.linaro.org/OfficeofCTO/MemoryManagement [2]: http://lwn.net/Articles/454389 Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Sumit Semwal <sumit.semwal@ti.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-and-Tested-by: Rob Clark <rob.clark@linaro.org> Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-12-26 09:23:15 +00:00
void (*unmap_dma_buf)(struct dma_buf_attachment *,
struct sg_table *,
enum dma_data_direction);
dma-buf: Introduce dma buffer sharing mechanism This is the first step in defining a dma buffer sharing mechanism. A new buffer object dma_buf is added, with operations and API to allow easy sharing of this buffer object across devices. The framework allows: - creation of a buffer object, its association with a file pointer, and associated allocator-defined operations on that buffer. This operation is called the 'export' operation. - different devices to 'attach' themselves to this exported buffer object, to facilitate backing storage negotiation, using dma_buf_attach() API. - the exported buffer object to be shared with the other entity by asking for its 'file-descriptor (fd)', and sharing the fd across. - a received fd to get the buffer object back, where it can be accessed using the associated exporter-defined operations. - the exporter and user to share the scatterlist associated with this buffer object using map_dma_buf and unmap_dma_buf operations. Atleast one 'attach()' call is required to be made prior to calling the map_dma_buf() operation. Couple of building blocks in map_dma_buf() are added to ease introduction of sync'ing across exporter and users, and late allocation by the exporter. For this first version, this framework will work with certain conditions: - *ONLY* exporter will be allowed to mmap to userspace (outside of this framework - mmap is not a buffer object operation), - currently, *ONLY* users that do not need CPU access to the buffer are allowed. More details are there in the documentation patch. This is based on design suggestions from many people at the mini-summits[1], most notably from Arnd Bergmann <arnd@arndb.de>, Rob Clark <rob@ti.com> and Daniel Vetter <daniel@ffwll.ch>. The implementation is inspired from proof-of-concept patch-set from Tomasz Stanislawski <t.stanislaws@samsung.com>, who demonstrated buffer sharing between two v4l2 devices. [2] [1]: https://wiki.linaro.org/OfficeofCTO/MemoryManagement [2]: http://lwn.net/Articles/454389 Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Sumit Semwal <sumit.semwal@ti.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-and-Tested-by: Rob Clark <rob.clark@linaro.org> Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-12-26 09:23:15 +00:00
/* TODO: Add try_map_dma_buf version, to return immed with -EBUSY
* if the call would block.
*/
/**
* @release:
*
* Called after the last dma_buf_put to release the &dma_buf, and
* mandatory.
*/
dma-buf: Introduce dma buffer sharing mechanism This is the first step in defining a dma buffer sharing mechanism. A new buffer object dma_buf is added, with operations and API to allow easy sharing of this buffer object across devices. The framework allows: - creation of a buffer object, its association with a file pointer, and associated allocator-defined operations on that buffer. This operation is called the 'export' operation. - different devices to 'attach' themselves to this exported buffer object, to facilitate backing storage negotiation, using dma_buf_attach() API. - the exported buffer object to be shared with the other entity by asking for its 'file-descriptor (fd)', and sharing the fd across. - a received fd to get the buffer object back, where it can be accessed using the associated exporter-defined operations. - the exporter and user to share the scatterlist associated with this buffer object using map_dma_buf and unmap_dma_buf operations. Atleast one 'attach()' call is required to be made prior to calling the map_dma_buf() operation. Couple of building blocks in map_dma_buf() are added to ease introduction of sync'ing across exporter and users, and late allocation by the exporter. For this first version, this framework will work with certain conditions: - *ONLY* exporter will be allowed to mmap to userspace (outside of this framework - mmap is not a buffer object operation), - currently, *ONLY* users that do not need CPU access to the buffer are allowed. More details are there in the documentation patch. This is based on design suggestions from many people at the mini-summits[1], most notably from Arnd Bergmann <arnd@arndb.de>, Rob Clark <rob@ti.com> and Daniel Vetter <daniel@ffwll.ch>. The implementation is inspired from proof-of-concept patch-set from Tomasz Stanislawski <t.stanislaws@samsung.com>, who demonstrated buffer sharing between two v4l2 devices. [2] [1]: https://wiki.linaro.org/OfficeofCTO/MemoryManagement [2]: http://lwn.net/Articles/454389 Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Sumit Semwal <sumit.semwal@ti.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-and-Tested-by: Rob Clark <rob.clark@linaro.org> Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-12-26 09:23:15 +00:00
void (*release)(struct dma_buf *);
/**
* @begin_cpu_access:
*
* This is called from dma_buf_begin_cpu_access() and allows the
* exporter to ensure that the memory is actually coherent for cpu
* access. The exporter also needs to ensure that cpu access is coherent
* for the access direction. The direction can be used by the exporter
* to optimize the cache flushing, i.e. access with a different
* direction (read instead of write) might return stale or even bogus
* data (e.g. when the exporter needs to copy the data to temporary
* storage).
*
* Note that this is both called through the DMA_BUF_IOCTL_SYNC IOCTL
* command for userspace mappings established through @mmap, and also
* for kernel mappings established with @vmap.
*
* This callback is optional.
*
* Returns:
*
* 0 on success or a negative error code on failure. This can for
* example fail when the backing storage can't be allocated. Can also
* return -ERESTARTSYS or -EINTR when the call has been interrupted and
* needs to be restarted.
*/
int (*begin_cpu_access)(struct dma_buf *, enum dma_data_direction);
/**
* @end_cpu_access:
*
* This is called from dma_buf_end_cpu_access() when the importer is
* done accessing the CPU. The exporter can use this to flush caches and
* undo anything else done in @begin_cpu_access.
*
* This callback is optional.
*
* Returns:
*
* 0 on success or a negative error code on failure. Can return
* -ERESTARTSYS or -EINTR when the call has been interrupted and needs
* to be restarted.
*/
dma-buf, drm, ion: Propagate error code from dma_buf_start_cpu_access() Drivers, especially i915.ko, can fail during the initial migration of a dma-buf for CPU access. However, the error code from the driver was not being propagated back to ioctl and so userspace was blissfully ignorant of the failure. Rendering corruption ensues. Whilst fixing the ioctl to return the error code from dma_buf_start_cpu_access(), also do the same for dma_buf_end_cpu_access(). For most drivers, dma_buf_end_cpu_access() cannot fail. i915.ko however, as most drivers would, wants to avoid being uninterruptible (as would be required to guarrantee no failure when flushing the buffer to the device). As userspace already has to handle errors from the SYNC_IOCTL, take advantage of this to be able to restart the syscall across signals. This fixes a coherency issue for i915.ko as well as reducing the uninterruptible hold upon its BKL, the struct_mutex. Fixes commit c11e391da2a8fe973c3c2398452000bed505851e Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Thu Feb 11 20:04:51 2016 -0200 dma-buf: Add ioctls to allow userspace to flush Testcase: igt/gem_concurrent_blit/*dmabuf*interruptible Testcase: igt/prime_mmap_coherency/ioctl-errors Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tiago Vignatti <tiago.vignatti@intel.com> Cc: Stéphane Marchesin <marcheu@chromium.org> Cc: David Herrmann <dh.herrmann@gmail.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Daniel Vetter <daniel.vetter@intel.com> CC: linux-media@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Cc: linaro-mm-sig@lists.linaro.org Cc: intel-gfx@lists.freedesktop.org Cc: devel@driverdev.osuosl.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1458331359-2634-1-git-send-email-chris@chris-wilson.co.uk
2016-03-18 20:02:39 +00:00
int (*end_cpu_access)(struct dma_buf *, enum dma_data_direction);
dma-buf: mmap support Compared to Rob Clark's RFC I've ditched the prepare/finish hooks and corresponding ioctls on the dma_buf file. The major reason for that is that many people seem to be under the impression that this is also for synchronization with outstanding asynchronous processsing. I'm pretty massively opposed to this because: - It boils down reinventing a new rather general-purpose userspace synchronization interface. If we look at things like futexes, this is hard to get right. - Furthermore a lot of kernel code has to interact with this synchronization primitive. This smells a look like the dri1 hw_lock, a horror show I prefer not to reinvent. - Even more fun is that multiple different subsystems would interact here, so we have plenty of opportunities to create funny deadlock scenarios. I think synchronization is a wholesale different problem from data sharing and should be tackled as an orthogonal problem. Now we could demand that prepare/finish may only ensure cache coherency (as Rob intended), but that runs up into the next problem: We not only need mmap support to facilitate sw-only processing nodes in a pipeline (without jumping through hoops by importing the dma_buf into some sw-access only importer), which allows for a nicer ION->dma-buf upgrade path for existing Android userspace. We also need mmap support for existing importing subsystems to support existing userspace libraries. And a loot of these subsystems are expected to export coherent userspace mappings. So prepare/finish can only ever be optional and the exporter /needs/ to support coherent mappings. Given that mmap access is always somewhat fallback-y in nature I've decided to drop this optimization, instead of just making it optional. If we demonstrate a clear need for this, supported by benchmark results, we can always add it in again later as an optional extension. Other differences compared to Rob's RFC is the above mentioned support for mapping a dma-buf through facilities provided by the importer. Which results in mmap support no longer being optional. Note that this dma-buf mmap patch does _not_ support every possible insanity an existing subsystem could pull of with mmap: Because it does not allow to intercept pagefaults and shoot down ptes importing subsystems can't add some magic of their own at these points (e.g. to automatically synchronize with outstanding rendering or set up some special resources). I've done a cursory read through a few mmap implementions of various subsytems and I'm hopeful that we can avoid this (and the complexity it'd bring with it). Additonally I've extended the documentation a bit to explain the hows and whys of this mmap extension. In case we ever want to add support for explicitly cache maneged userspace mmap with a prepare/finish ioctl pair, we could specify that userspace needs to mmap a different part of the dma_buf, e.g. the range starting at dma_buf->size up to dma_buf->size*2. This works because the size of a dma_buf is invariant over it's lifetime. The exporter would obviously need to fall back to coherent mappings for both ranges if a legacy clients maps the coherent range and the architecture cannot suppor conflicting caching policies. Also, this would obviously be optional and userspace needs to be able to fall back to coherent mappings. v2: - Spelling fixes from Rob Clark. - Compile fix for !DMA_BUF from Rob Clark. - Extend commit message to explain how explicitly cache managed mmap support could be added later. - Extend the documentation with implementations notes for exporters that need to manually fake coherency. v3: - dma_buf pointer initialization goof-up noticed by Rebecca Schultz Zavin. Cc: Rob Clark <rob.clark@linaro.org> Cc: Rebecca Schultz Zavin <rebecca@android.com> Acked-by: Rob Clark <rob.clark@linaro.org> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
2012-04-24 09:08:52 +00:00
/**
* @mmap:
*
* This callback is used by the dma_buf_mmap() function
*
* Note that the mapping needs to be incoherent, userspace is expected
* to bracket CPU access using the DMA_BUF_IOCTL_SYNC interface.
*
* Because dma-buf buffers have invariant size over their lifetime, the
* dma-buf core checks whether a vma is too large and rejects such
* mappings. The exporter hence does not need to duplicate this check.
* Drivers do not need to check this themselves.
*
* If an exporter needs to manually flush caches and hence needs to fake
* coherency for mmap support, it needs to be able to zap all the ptes
* pointing at the backing storage. Now linux mm needs a struct
* address_space associated with the struct file stored in vma->vm_file
* to do that with the function unmap_mapping_range. But the dma_buf
* framework only backs every dma_buf fd with the anon_file struct file,
* i.e. all dma_bufs share the same file.
*
* Hence exporters need to setup their own file (and address_space)
* association by setting vma->vm_file and adjusting vma->vm_pgoff in
* the dma_buf mmap callback. In the specific case of a gem driver the
* exporter could use the shmem file already provided by gem (and set
* vm_pgoff = 0). Exporters can then zap ptes by unmapping the
* corresponding range of the struct address_space associated with their
* own file.
*
* This callback is optional.
*
* Returns:
*
* 0 on success or a negative error code on failure.
*/
dma-buf: mmap support Compared to Rob Clark's RFC I've ditched the prepare/finish hooks and corresponding ioctls on the dma_buf file. The major reason for that is that many people seem to be under the impression that this is also for synchronization with outstanding asynchronous processsing. I'm pretty massively opposed to this because: - It boils down reinventing a new rather general-purpose userspace synchronization interface. If we look at things like futexes, this is hard to get right. - Furthermore a lot of kernel code has to interact with this synchronization primitive. This smells a look like the dri1 hw_lock, a horror show I prefer not to reinvent. - Even more fun is that multiple different subsystems would interact here, so we have plenty of opportunities to create funny deadlock scenarios. I think synchronization is a wholesale different problem from data sharing and should be tackled as an orthogonal problem. Now we could demand that prepare/finish may only ensure cache coherency (as Rob intended), but that runs up into the next problem: We not only need mmap support to facilitate sw-only processing nodes in a pipeline (without jumping through hoops by importing the dma_buf into some sw-access only importer), which allows for a nicer ION->dma-buf upgrade path for existing Android userspace. We also need mmap support for existing importing subsystems to support existing userspace libraries. And a loot of these subsystems are expected to export coherent userspace mappings. So prepare/finish can only ever be optional and the exporter /needs/ to support coherent mappings. Given that mmap access is always somewhat fallback-y in nature I've decided to drop this optimization, instead of just making it optional. If we demonstrate a clear need for this, supported by benchmark results, we can always add it in again later as an optional extension. Other differences compared to Rob's RFC is the above mentioned support for mapping a dma-buf through facilities provided by the importer. Which results in mmap support no longer being optional. Note that this dma-buf mmap patch does _not_ support every possible insanity an existing subsystem could pull of with mmap: Because it does not allow to intercept pagefaults and shoot down ptes importing subsystems can't add some magic of their own at these points (e.g. to automatically synchronize with outstanding rendering or set up some special resources). I've done a cursory read through a few mmap implementions of various subsytems and I'm hopeful that we can avoid this (and the complexity it'd bring with it). Additonally I've extended the documentation a bit to explain the hows and whys of this mmap extension. In case we ever want to add support for explicitly cache maneged userspace mmap with a prepare/finish ioctl pair, we could specify that userspace needs to mmap a different part of the dma_buf, e.g. the range starting at dma_buf->size up to dma_buf->size*2. This works because the size of a dma_buf is invariant over it's lifetime. The exporter would obviously need to fall back to coherent mappings for both ranges if a legacy clients maps the coherent range and the architecture cannot suppor conflicting caching policies. Also, this would obviously be optional and userspace needs to be able to fall back to coherent mappings. v2: - Spelling fixes from Rob Clark. - Compile fix for !DMA_BUF from Rob Clark. - Extend commit message to explain how explicitly cache managed mmap support could be added later. - Extend the documentation with implementations notes for exporters that need to manually fake coherency. v3: - dma_buf pointer initialization goof-up noticed by Rebecca Schultz Zavin. Cc: Rob Clark <rob.clark@linaro.org> Cc: Rebecca Schultz Zavin <rebecca@android.com> Acked-by: Rob Clark <rob.clark@linaro.org> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
2012-04-24 09:08:52 +00:00
int (*mmap)(struct dma_buf *, struct vm_area_struct *vma);
dma-buf-map: Rename to iosys-map Rename struct dma_buf_map to struct iosys_map and corresponding APIs. Over time dma-buf-map grew up to more functionality than the one used by dma-buf: in fact it's just a shim layer to abstract system memory, that can be accessed via regular load and store, from IO memory that needs to be acessed via arch helpers. The idea is to extend this API so it can fulfill other needs, internal to a single driver. Example: in the i915 driver it's desired to share the implementation for integrated graphics, which uses mostly system memory, with discrete graphics, which may need to access IO memory. The conversion was mostly done with the following semantic patch: @r1@ @@ - struct dma_buf_map + struct iosys_map @r2@ @@ ( - DMA_BUF_MAP_INIT_VADDR + IOSYS_MAP_INIT_VADDR | - dma_buf_map_set_vaddr + iosys_map_set_vaddr | - dma_buf_map_set_vaddr_iomem + iosys_map_set_vaddr_iomem | - dma_buf_map_is_equal + iosys_map_is_equal | - dma_buf_map_is_null + iosys_map_is_null | - dma_buf_map_is_set + iosys_map_is_set | - dma_buf_map_clear + iosys_map_clear | - dma_buf_map_memcpy_to + iosys_map_memcpy_to | - dma_buf_map_incr + iosys_map_incr ) @@ @@ - #include <linux/dma-buf-map.h> + #include <linux/iosys-map.h> Then some files had their includes adjusted and some comments were update to remove mentions to dma-buf-map. Since this is not specific to dma-buf anymore, move the documentation to the "Bus-Independent Device Accesses" section. v2: - Squash patches v3: - Fix wrong removal of dma-buf.h from MAINTAINERS - Move documentation from dma-buf.rst to device-io.rst v4: - Change documentation title and level Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Sumit Semwal <sumit.semwal@linaro.org> Acked-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20220204170541.829227-1-lucas.demarchi@intel.com
2022-02-04 17:05:41 +00:00
int (*vmap)(struct dma_buf *dmabuf, struct iosys_map *map);
void (*vunmap)(struct dma_buf *dmabuf, struct iosys_map *map);
dma-buf: Introduce dma buffer sharing mechanism This is the first step in defining a dma buffer sharing mechanism. A new buffer object dma_buf is added, with operations and API to allow easy sharing of this buffer object across devices. The framework allows: - creation of a buffer object, its association with a file pointer, and associated allocator-defined operations on that buffer. This operation is called the 'export' operation. - different devices to 'attach' themselves to this exported buffer object, to facilitate backing storage negotiation, using dma_buf_attach() API. - the exported buffer object to be shared with the other entity by asking for its 'file-descriptor (fd)', and sharing the fd across. - a received fd to get the buffer object back, where it can be accessed using the associated exporter-defined operations. - the exporter and user to share the scatterlist associated with this buffer object using map_dma_buf and unmap_dma_buf operations. Atleast one 'attach()' call is required to be made prior to calling the map_dma_buf() operation. Couple of building blocks in map_dma_buf() are added to ease introduction of sync'ing across exporter and users, and late allocation by the exporter. For this first version, this framework will work with certain conditions: - *ONLY* exporter will be allowed to mmap to userspace (outside of this framework - mmap is not a buffer object operation), - currently, *ONLY* users that do not need CPU access to the buffer are allowed. More details are there in the documentation patch. This is based on design suggestions from many people at the mini-summits[1], most notably from Arnd Bergmann <arnd@arndb.de>, Rob Clark <rob@ti.com> and Daniel Vetter <daniel@ffwll.ch>. The implementation is inspired from proof-of-concept patch-set from Tomasz Stanislawski <t.stanislaws@samsung.com>, who demonstrated buffer sharing between two v4l2 devices. [2] [1]: https://wiki.linaro.org/OfficeofCTO/MemoryManagement [2]: http://lwn.net/Articles/454389 Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Sumit Semwal <sumit.semwal@ti.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-and-Tested-by: Rob Clark <rob.clark@linaro.org> Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-12-26 09:23:15 +00:00
};
/**
* struct dma_buf - shared buffer object
*
* This represents a shared buffer, created by calling dma_buf_export(). The
* userspace representation is a normal file descriptor, which can be created by
* calling dma_buf_fd().
*
* Shared dma buffers are reference counted using dma_buf_put() and
* get_dma_buf().
*
* Device DMA access is handled by the separate &struct dma_buf_attachment.
dma-buf: Introduce dma buffer sharing mechanism This is the first step in defining a dma buffer sharing mechanism. A new buffer object dma_buf is added, with operations and API to allow easy sharing of this buffer object across devices. The framework allows: - creation of a buffer object, its association with a file pointer, and associated allocator-defined operations on that buffer. This operation is called the 'export' operation. - different devices to 'attach' themselves to this exported buffer object, to facilitate backing storage negotiation, using dma_buf_attach() API. - the exported buffer object to be shared with the other entity by asking for its 'file-descriptor (fd)', and sharing the fd across. - a received fd to get the buffer object back, where it can be accessed using the associated exporter-defined operations. - the exporter and user to share the scatterlist associated with this buffer object using map_dma_buf and unmap_dma_buf operations. Atleast one 'attach()' call is required to be made prior to calling the map_dma_buf() operation. Couple of building blocks in map_dma_buf() are added to ease introduction of sync'ing across exporter and users, and late allocation by the exporter. For this first version, this framework will work with certain conditions: - *ONLY* exporter will be allowed to mmap to userspace (outside of this framework - mmap is not a buffer object operation), - currently, *ONLY* users that do not need CPU access to the buffer are allowed. More details are there in the documentation patch. This is based on design suggestions from many people at the mini-summits[1], most notably from Arnd Bergmann <arnd@arndb.de>, Rob Clark <rob@ti.com> and Daniel Vetter <daniel@ffwll.ch>. The implementation is inspired from proof-of-concept patch-set from Tomasz Stanislawski <t.stanislaws@samsung.com>, who demonstrated buffer sharing between two v4l2 devices. [2] [1]: https://wiki.linaro.org/OfficeofCTO/MemoryManagement [2]: http://lwn.net/Articles/454389 Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Sumit Semwal <sumit.semwal@ti.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-and-Tested-by: Rob Clark <rob.clark@linaro.org> Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-12-26 09:23:15 +00:00
*/
struct dma_buf {
/**
* @size:
*
* Size of the buffer; invariant over the lifetime of the buffer.
*/
dma-buf: Introduce dma buffer sharing mechanism This is the first step in defining a dma buffer sharing mechanism. A new buffer object dma_buf is added, with operations and API to allow easy sharing of this buffer object across devices. The framework allows: - creation of a buffer object, its association with a file pointer, and associated allocator-defined operations on that buffer. This operation is called the 'export' operation. - different devices to 'attach' themselves to this exported buffer object, to facilitate backing storage negotiation, using dma_buf_attach() API. - the exported buffer object to be shared with the other entity by asking for its 'file-descriptor (fd)', and sharing the fd across. - a received fd to get the buffer object back, where it can be accessed using the associated exporter-defined operations. - the exporter and user to share the scatterlist associated with this buffer object using map_dma_buf and unmap_dma_buf operations. Atleast one 'attach()' call is required to be made prior to calling the map_dma_buf() operation. Couple of building blocks in map_dma_buf() are added to ease introduction of sync'ing across exporter and users, and late allocation by the exporter. For this first version, this framework will work with certain conditions: - *ONLY* exporter will be allowed to mmap to userspace (outside of this framework - mmap is not a buffer object operation), - currently, *ONLY* users that do not need CPU access to the buffer are allowed. More details are there in the documentation patch. This is based on design suggestions from many people at the mini-summits[1], most notably from Arnd Bergmann <arnd@arndb.de>, Rob Clark <rob@ti.com> and Daniel Vetter <daniel@ffwll.ch>. The implementation is inspired from proof-of-concept patch-set from Tomasz Stanislawski <t.stanislaws@samsung.com>, who demonstrated buffer sharing between two v4l2 devices. [2] [1]: https://wiki.linaro.org/OfficeofCTO/MemoryManagement [2]: http://lwn.net/Articles/454389 Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Sumit Semwal <sumit.semwal@ti.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-and-Tested-by: Rob Clark <rob.clark@linaro.org> Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-12-26 09:23:15 +00:00
size_t size;
/**
* @file:
*
* File pointer used for sharing buffers across, and for refcounting.
* See dma_buf_get() and dma_buf_put().
*/
dma-buf: Introduce dma buffer sharing mechanism This is the first step in defining a dma buffer sharing mechanism. A new buffer object dma_buf is added, with operations and API to allow easy sharing of this buffer object across devices. The framework allows: - creation of a buffer object, its association with a file pointer, and associated allocator-defined operations on that buffer. This operation is called the 'export' operation. - different devices to 'attach' themselves to this exported buffer object, to facilitate backing storage negotiation, using dma_buf_attach() API. - the exported buffer object to be shared with the other entity by asking for its 'file-descriptor (fd)', and sharing the fd across. - a received fd to get the buffer object back, where it can be accessed using the associated exporter-defined operations. - the exporter and user to share the scatterlist associated with this buffer object using map_dma_buf and unmap_dma_buf operations. Atleast one 'attach()' call is required to be made prior to calling the map_dma_buf() operation. Couple of building blocks in map_dma_buf() are added to ease introduction of sync'ing across exporter and users, and late allocation by the exporter. For this first version, this framework will work with certain conditions: - *ONLY* exporter will be allowed to mmap to userspace (outside of this framework - mmap is not a buffer object operation), - currently, *ONLY* users that do not need CPU access to the buffer are allowed. More details are there in the documentation patch. This is based on design suggestions from many people at the mini-summits[1], most notably from Arnd Bergmann <arnd@arndb.de>, Rob Clark <rob@ti.com> and Daniel Vetter <daniel@ffwll.ch>. The implementation is inspired from proof-of-concept patch-set from Tomasz Stanislawski <t.stanislaws@samsung.com>, who demonstrated buffer sharing between two v4l2 devices. [2] [1]: https://wiki.linaro.org/OfficeofCTO/MemoryManagement [2]: http://lwn.net/Articles/454389 Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Sumit Semwal <sumit.semwal@ti.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-and-Tested-by: Rob Clark <rob.clark@linaro.org> Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-12-26 09:23:15 +00:00
struct file *file;
/**
* @attachments:
*
* List of dma_buf_attachment that denotes all devices attached,
* protected by &dma_resv lock @resv.
*/
dma-buf: Introduce dma buffer sharing mechanism This is the first step in defining a dma buffer sharing mechanism. A new buffer object dma_buf is added, with operations and API to allow easy sharing of this buffer object across devices. The framework allows: - creation of a buffer object, its association with a file pointer, and associated allocator-defined operations on that buffer. This operation is called the 'export' operation. - different devices to 'attach' themselves to this exported buffer object, to facilitate backing storage negotiation, using dma_buf_attach() API. - the exported buffer object to be shared with the other entity by asking for its 'file-descriptor (fd)', and sharing the fd across. - a received fd to get the buffer object back, where it can be accessed using the associated exporter-defined operations. - the exporter and user to share the scatterlist associated with this buffer object using map_dma_buf and unmap_dma_buf operations. Atleast one 'attach()' call is required to be made prior to calling the map_dma_buf() operation. Couple of building blocks in map_dma_buf() are added to ease introduction of sync'ing across exporter and users, and late allocation by the exporter. For this first version, this framework will work with certain conditions: - *ONLY* exporter will be allowed to mmap to userspace (outside of this framework - mmap is not a buffer object operation), - currently, *ONLY* users that do not need CPU access to the buffer are allowed. More details are there in the documentation patch. This is based on design suggestions from many people at the mini-summits[1], most notably from Arnd Bergmann <arnd@arndb.de>, Rob Clark <rob@ti.com> and Daniel Vetter <daniel@ffwll.ch>. The implementation is inspired from proof-of-concept patch-set from Tomasz Stanislawski <t.stanislaws@samsung.com>, who demonstrated buffer sharing between two v4l2 devices. [2] [1]: https://wiki.linaro.org/OfficeofCTO/MemoryManagement [2]: http://lwn.net/Articles/454389 Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Sumit Semwal <sumit.semwal@ti.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-and-Tested-by: Rob Clark <rob.clark@linaro.org> Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-12-26 09:23:15 +00:00
struct list_head attachments;
/** @ops: dma_buf_ops associated with this buffer object. */
dma-buf: Introduce dma buffer sharing mechanism This is the first step in defining a dma buffer sharing mechanism. A new buffer object dma_buf is added, with operations and API to allow easy sharing of this buffer object across devices. The framework allows: - creation of a buffer object, its association with a file pointer, and associated allocator-defined operations on that buffer. This operation is called the 'export' operation. - different devices to 'attach' themselves to this exported buffer object, to facilitate backing storage negotiation, using dma_buf_attach() API. - the exported buffer object to be shared with the other entity by asking for its 'file-descriptor (fd)', and sharing the fd across. - a received fd to get the buffer object back, where it can be accessed using the associated exporter-defined operations. - the exporter and user to share the scatterlist associated with this buffer object using map_dma_buf and unmap_dma_buf operations. Atleast one 'attach()' call is required to be made prior to calling the map_dma_buf() operation. Couple of building blocks in map_dma_buf() are added to ease introduction of sync'ing across exporter and users, and late allocation by the exporter. For this first version, this framework will work with certain conditions: - *ONLY* exporter will be allowed to mmap to userspace (outside of this framework - mmap is not a buffer object operation), - currently, *ONLY* users that do not need CPU access to the buffer are allowed. More details are there in the documentation patch. This is based on design suggestions from many people at the mini-summits[1], most notably from Arnd Bergmann <arnd@arndb.de>, Rob Clark <rob@ti.com> and Daniel Vetter <daniel@ffwll.ch>. The implementation is inspired from proof-of-concept patch-set from Tomasz Stanislawski <t.stanislaws@samsung.com>, who demonstrated buffer sharing between two v4l2 devices. [2] [1]: https://wiki.linaro.org/OfficeofCTO/MemoryManagement [2]: http://lwn.net/Articles/454389 Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Sumit Semwal <sumit.semwal@ti.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-and-Tested-by: Rob Clark <rob.clark@linaro.org> Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-12-26 09:23:15 +00:00
const struct dma_buf_ops *ops;
/**
* @vmapping_counter:
*
* Used internally to refcnt the vmaps returned by dma_buf_vmap().
* Protected by @lock.
*/
unsigned vmapping_counter;
/**
* @vmap_ptr:
* The current vmap ptr if @vmapping_counter > 0. Protected by @lock.
*/
dma-buf-map: Rename to iosys-map Rename struct dma_buf_map to struct iosys_map and corresponding APIs. Over time dma-buf-map grew up to more functionality than the one used by dma-buf: in fact it's just a shim layer to abstract system memory, that can be accessed via regular load and store, from IO memory that needs to be acessed via arch helpers. The idea is to extend this API so it can fulfill other needs, internal to a single driver. Example: in the i915 driver it's desired to share the implementation for integrated graphics, which uses mostly system memory, with discrete graphics, which may need to access IO memory. The conversion was mostly done with the following semantic patch: @r1@ @@ - struct dma_buf_map + struct iosys_map @r2@ @@ ( - DMA_BUF_MAP_INIT_VADDR + IOSYS_MAP_INIT_VADDR | - dma_buf_map_set_vaddr + iosys_map_set_vaddr | - dma_buf_map_set_vaddr_iomem + iosys_map_set_vaddr_iomem | - dma_buf_map_is_equal + iosys_map_is_equal | - dma_buf_map_is_null + iosys_map_is_null | - dma_buf_map_is_set + iosys_map_is_set | - dma_buf_map_clear + iosys_map_clear | - dma_buf_map_memcpy_to + iosys_map_memcpy_to | - dma_buf_map_incr + iosys_map_incr ) @@ @@ - #include <linux/dma-buf-map.h> + #include <linux/iosys-map.h> Then some files had their includes adjusted and some comments were update to remove mentions to dma-buf-map. Since this is not specific to dma-buf anymore, move the documentation to the "Bus-Independent Device Accesses" section. v2: - Squash patches v3: - Fix wrong removal of dma-buf.h from MAINTAINERS - Move documentation from dma-buf.rst to device-io.rst v4: - Change documentation title and level Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Sumit Semwal <sumit.semwal@linaro.org> Acked-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20220204170541.829227-1-lucas.demarchi@intel.com
2022-02-04 17:05:41 +00:00
struct iosys_map vmap_ptr;
/**
* @exp_name:
*
* Name of the exporter; useful for debugging. See the
* DMA_BUF_SET_NAME IOCTL.
*/
const char *exp_name;
/**
* @name:
*
* Userspace-provided name; useful for accounting and debugging,
* protected by dma_resv_lock() on @resv and @name_lock for read access.
*/
const char *name;
/** @name_lock: Spinlock to protect name access for read access. */
spinlock_t name_lock;
/**
* @owner:
*
* Pointer to exporter module; used for refcounting when exporter is a
* kernel module.
*/
struct module *owner;
/** @list_node: node for dma_buf accounting and debugging. */
struct list_head list_node;
/** @priv: exporter specific private data for this buffer object. */
dma-buf: Introduce dma buffer sharing mechanism This is the first step in defining a dma buffer sharing mechanism. A new buffer object dma_buf is added, with operations and API to allow easy sharing of this buffer object across devices. The framework allows: - creation of a buffer object, its association with a file pointer, and associated allocator-defined operations on that buffer. This operation is called the 'export' operation. - different devices to 'attach' themselves to this exported buffer object, to facilitate backing storage negotiation, using dma_buf_attach() API. - the exported buffer object to be shared with the other entity by asking for its 'file-descriptor (fd)', and sharing the fd across. - a received fd to get the buffer object back, where it can be accessed using the associated exporter-defined operations. - the exporter and user to share the scatterlist associated with this buffer object using map_dma_buf and unmap_dma_buf operations. Atleast one 'attach()' call is required to be made prior to calling the map_dma_buf() operation. Couple of building blocks in map_dma_buf() are added to ease introduction of sync'ing across exporter and users, and late allocation by the exporter. For this first version, this framework will work with certain conditions: - *ONLY* exporter will be allowed to mmap to userspace (outside of this framework - mmap is not a buffer object operation), - currently, *ONLY* users that do not need CPU access to the buffer are allowed. More details are there in the documentation patch. This is based on design suggestions from many people at the mini-summits[1], most notably from Arnd Bergmann <arnd@arndb.de>, Rob Clark <rob@ti.com> and Daniel Vetter <daniel@ffwll.ch>. The implementation is inspired from proof-of-concept patch-set from Tomasz Stanislawski <t.stanislaws@samsung.com>, who demonstrated buffer sharing between two v4l2 devices. [2] [1]: https://wiki.linaro.org/OfficeofCTO/MemoryManagement [2]: http://lwn.net/Articles/454389 Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Sumit Semwal <sumit.semwal@ti.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-and-Tested-by: Rob Clark <rob.clark@linaro.org> Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-12-26 09:23:15 +00:00
void *priv;
/**
* @resv:
*
* Reservation object linked to this dma-buf.
dma-buf: Document dma-buf implicit fencing/resv fencing rules Docs for struct dma_resv are fairly clear: "A reservation object can have attached one exclusive fence (normally associated with write operations) or N shared fences (read operations)." https://dri.freedesktop.org/docs/drm/driver-api/dma-buf.html#reservation-objects Furthermore a review across all of upstream. First of render drivers and how they set implicit fences: - nouveau follows this contract, see in validate_fini_no_ticket() nouveau_bo_fence(nvbo, fence, !!b->write_domains); and that last boolean controls whether the exclusive or shared fence slot is used. - radeon follows this contract by setting p->relocs[i].tv.num_shared = !r->write_domain; in radeon_cs_parser_relocs(), which ensures that the call to ttm_eu_fence_buffer_objects() in radeon_cs_parser_fini() will do the right thing. - vmwgfx seems to follow this contract with the shotgun approach of always setting ttm_val_buf->num_shared = 0, which means ttm_eu_fence_buffer_objects() will only use the exclusive slot. - etnaviv follows this contract, as can be trivially seen by looking at submit_attach_object_fences() - i915 is a bit a convoluted maze with multiple paths leading to i915_vma_move_to_active(). Which sets the exclusive flag if EXEC_OBJECT_WRITE is set. This can either come as a buffer flag for softpin mode, or through the write_domain when using relocations. It follows this contract. - lima follows this contract, see lima_gem_submit() which sets the exclusive fence when the LIMA_SUBMIT_BO_WRITE flag is set for that bo - msm follows this contract, see msm_gpu_submit() which sets the exclusive flag when the MSM_SUBMIT_BO_WRITE is set for that buffer - panfrost follows this contract with the shotgun approach of just always setting the exclusive fence, see panfrost_attach_object_fences(). Benefits of a single engine I guess - v3d follows this contract with the same shotgun approach in v3d_attach_fences_and_unlock_reservation(), but it has at least an XXX comment that maybe this should be improved - v4c uses the same shotgun approach of always setting an exclusive fence, see vc4_update_bo_seqnos() - vgem also follows this contract, see vgem_fence_attach_ioctl() and the VGEM_FENCE_WRITE. This is used in some igts to validate prime sharing with i915.ko without the need of a 2nd gpu - vritio follows this contract again with the shotgun approach of always setting an exclusive fence, see virtio_gpu_array_add_fence() This covers the setting of the exclusive fences when writing. Synchronizing against the exclusive fence is a lot more tricky, and I only spot checked a few: - i915 does it, with the optional EXEC_OBJECT_ASYNC to skip all implicit dependencies (which is used by vulkan) - etnaviv does this. Implicit dependencies are collected in submit_fence_sync(), again with an opt-out flag ETNA_SUBMIT_NO_IMPLICIT. These are then picked up in etnaviv_sched_dependency which is the drm_sched_backend_ops->dependency callback. - v4c seems to not do much here, maybe gets away with it by not having a scheduler and only a single engine. Since all newer broadcom chips than the OG vc4 use v3d for rendering, which follows this contract, the impact of this issue is fairly small. - v3d does this using the drm_gem_fence_array_add_implicit() helper, which then it's drm_sched_backend_ops->dependency callback v3d_job_dependency() picks up. - panfrost is nice here and tracks the implicit fences in panfrost_job->implicit_fences, which again the drm_sched_backend_ops->dependency callback panfrost_job_dependency() picks up. It is mildly questionable though since it only picks up exclusive fences in panfrost_acquire_object_fences(), but not buggy in practice because it also always sets the exclusive fence. It should pick up both sets of fences, just in case there's ever going to be a 2nd gpu in a SoC with a mali gpu. Or maybe a mali SoC with a pcie port and a real gpu, which might actually happen eventually. A bug, but easy to fix. Should probably use the drm_gem_fence_array_add_implicit() helper. - lima is nice an easy, uses drm_gem_fence_array_add_implicit() and the same schema as v3d. - msm is mildly entertaining. It also supports MSM_SUBMIT_NO_IMPLICIT, but because it doesn't use the drm/scheduler it handles fences from the wrong context with a synchronous dma_fence_wait. See submit_fence_sync() leading to msm_gem_sync_object(). Investing into a scheduler might be a good idea. - all the remaining drivers are ttm based, where I hope they do appropriately obey implicit fences already. I didn't do the full audit there because a) not follow the contract would confuse ttm quite well and b) reading non-standard scheduler and submit code which isn't based on drm/scheduler is a pain. Onwards to the display side. - Any driver using the drm_gem_plane_helper_prepare_fb() helper will correctly. Overwhelmingly most drivers get this right, except a few totally dont. I'll follow up with a patch to make this the default and avoid a bunch of bugs. - I didn't audit the ttm drivers, but given that dma_resv started there I hope they get this right. In conclusion this IS the contract, both as documented and overwhelmingly implemented, specically as implemented by all render drivers except amdgpu. Amdgpu tried to fix this already in commit 049aca4363d8af87cab8d53de5401602db3b9999 Author: Christian König <christian.koenig@amd.com> Date: Wed Sep 19 16:54:35 2018 +0200 drm/amdgpu: fix using shared fence for exported BOs v2 but this fix falls short on a number of areas: - It's racy, by the time the buffer is shared it might be too late. To make sure there's definitely never a problem we need to set the fences correctly for any buffer that's potentially exportable. - It's breaking uapi, dma-buf fds support poll() and differentitiate between, which was introduced in commit 9b495a5887994a6d74d5c261d012083a92b94738 Author: Maarten Lankhorst <maarten.lankhorst@canonical.com> Date: Tue Jul 1 12:57:43 2014 +0200 dma-buf: add poll support, v3 - Christian König wants to nack new uapi building further on this dma_resv contract because it breaks amdgpu, quoting "Yeah, and that is exactly the reason why I will NAK this uAPI change. "This doesn't works for amdgpu at all for the reasons outlined above." https://lore.kernel.org/dri-devel/f2eb6751-2f82-9b23-f57e-548de5b729de@gmail.com/ Rejecting new development because your own driver is broken and violates established cross driver contracts and uapi is really not how upstream works. Now this patch will have a severe performance impact on anything that runs on multiple engines. So we can't just merge it outright, but need a bit a plan: - amdgpu needs a proper uapi for handling implicit fencing. The funny thing is that to do it correctly, implicit fencing must be treated as a very strange IPC mechanism for transporting fences, where both setting the fence and dependency intercepts must be handled explicitly. Current best practices is a per-bo flag to indicate writes, and a per-bo flag to to skip implicit fencing in the CS ioctl as a new chunk. - Since amdgpu has been shipping with broken behaviour we need an opt-out flag from the butchered implicit fencing model to enable the proper explicit implicit fencing model. - for kernel memory fences due to bo moves at least the i915 idea is to use ttm_bo->moving. amdgpu probably needs the same. - since the current p2p dma-buf interface assumes the kernel memory fence is in the exclusive dma_resv fence slot we need to add a new fence slot for kernel fences, which must never be ignored. Since currently only amdgpu supports this there's no real problem here yet, until amdgpu gains a NO_IMPLICIT CS flag. - New userspace needs to ship in enough desktop distros so that users wont notice the perf impact. I think we can ignore LTS distros who upgrade their kernels but not their mesa3d snapshot. - Then when this is all in place we can merge this patch here. What is not a solution to this problem here is trying to make the dma_resv rules in the kernel more clever. The fundamental issue here is that the amdgpu CS uapi is the least expressive one across all drivers (only equalled by panfrost, which has an actual excuse) by not allowing any userspace control over how implicit sync is conducted. Until this is fixed it's completely pointless to make the kernel more clever to improve amdgpu, because all we're doing is papering over this uapi design issue. amdgpu needs to attain the status quo established by other drivers first, once that's achieved we can tackle the remaining issues in a consistent way across drivers. v2: Bas pointed me at AMDGPU_GEM_CREATE_EXPLICIT_SYNC, which I entirely missed. This is great because it means the amdgpu specific piece for proper implicit fence handling exists already, and that since a while. The only thing that's now missing is - fishing the implicit fences out of a shared object at the right time - setting the exclusive implicit fence slot at the right time. Jason has a patch series to fill that gap with a bunch of generic ioctl on the dma-buf fd: https://lore.kernel.org/dri-devel/20210520190007.534046-1-jason@jlekstrand.net/ v3: Since Christian has fixed amdgpu now in commit 8c505bdc9c8b955223b054e34a0be9c3d841cd20 (drm-misc/drm-misc-next) Author: Christian König <christian.koenig@amd.com> Date: Wed Jun 9 13:51:36 2021 +0200 drm/amdgpu: rework dma_resv handling v3 Use the audit covered in this commit message as the excuse to update the dma-buf docs around dma_buf.resv usage across drivers. Since dynamic importers have different rules also hammer these in again while we're at it. v4: - Add the missing "through the device" in the dynamic section that I overlooked. - Fix a kerneldoc markup mistake, the link didn't connect v5: - A few s/should/must/ to make clear what must be done (if the driver does implicit sync) and what's more a maybe (Daniel Stone) - drop all the example api discussion, that needs to be expanded, clarified and put into a new chapter in drm-uapi.rst (Daniel Stone) Cc: Daniel Stone <daniel@fooishbar.org> Acked-by: Daniel Stone <daniel@fooishbar.org> Reviewed-by: Dave Airlie <airlied@redhat.com> (v4) Reviewed-by: Christian König <christian.koenig@amd.com> (v3) Cc: mesa-dev@lists.freedesktop.org Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Cc: Dave Airlie <airlied@gmail.com> Cc: Rob Clark <robdclark@chromium.org> Cc: Kristian H. Kristensen <hoegsberg@google.com> Cc: Michel Dänzer <michel@daenzer.net> Cc: Daniel Stone <daniels@collabora.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: "Christian König" <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Deepak R Varma <mh12gx2825@gmail.com> Cc: Chen Li <chenli@uniontech.com> Cc: Kevin Wang <kevin1.wang@amd.com> Cc: Dennis Li <Dennis.Li@amd.com> Cc: Luben Tuikov <luben.tuikov@amd.com> Cc: linaro-mm-sig@lists.linaro.org Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210624125246.166721-1-daniel.vetter@ffwll.ch
2021-06-24 12:52:46 +00:00
*
* IMPLICIT SYNCHRONIZATION RULES:
*
* Drivers which support implicit synchronization of buffer access as
* e.g. exposed in `Implicit Fence Poll Support`_ must follow the
* below rules.
*
* - Drivers must add a read fence through dma_resv_add_fence() with the
* DMA_RESV_USAGE_READ flag for anything the userspace API considers a
* read access. This highly depends upon the API and window system.
dma-buf: Document dma-buf implicit fencing/resv fencing rules Docs for struct dma_resv are fairly clear: "A reservation object can have attached one exclusive fence (normally associated with write operations) or N shared fences (read operations)." https://dri.freedesktop.org/docs/drm/driver-api/dma-buf.html#reservation-objects Furthermore a review across all of upstream. First of render drivers and how they set implicit fences: - nouveau follows this contract, see in validate_fini_no_ticket() nouveau_bo_fence(nvbo, fence, !!b->write_domains); and that last boolean controls whether the exclusive or shared fence slot is used. - radeon follows this contract by setting p->relocs[i].tv.num_shared = !r->write_domain; in radeon_cs_parser_relocs(), which ensures that the call to ttm_eu_fence_buffer_objects() in radeon_cs_parser_fini() will do the right thing. - vmwgfx seems to follow this contract with the shotgun approach of always setting ttm_val_buf->num_shared = 0, which means ttm_eu_fence_buffer_objects() will only use the exclusive slot. - etnaviv follows this contract, as can be trivially seen by looking at submit_attach_object_fences() - i915 is a bit a convoluted maze with multiple paths leading to i915_vma_move_to_active(). Which sets the exclusive flag if EXEC_OBJECT_WRITE is set. This can either come as a buffer flag for softpin mode, or through the write_domain when using relocations. It follows this contract. - lima follows this contract, see lima_gem_submit() which sets the exclusive fence when the LIMA_SUBMIT_BO_WRITE flag is set for that bo - msm follows this contract, see msm_gpu_submit() which sets the exclusive flag when the MSM_SUBMIT_BO_WRITE is set for that buffer - panfrost follows this contract with the shotgun approach of just always setting the exclusive fence, see panfrost_attach_object_fences(). Benefits of a single engine I guess - v3d follows this contract with the same shotgun approach in v3d_attach_fences_and_unlock_reservation(), but it has at least an XXX comment that maybe this should be improved - v4c uses the same shotgun approach of always setting an exclusive fence, see vc4_update_bo_seqnos() - vgem also follows this contract, see vgem_fence_attach_ioctl() and the VGEM_FENCE_WRITE. This is used in some igts to validate prime sharing with i915.ko without the need of a 2nd gpu - vritio follows this contract again with the shotgun approach of always setting an exclusive fence, see virtio_gpu_array_add_fence() This covers the setting of the exclusive fences when writing. Synchronizing against the exclusive fence is a lot more tricky, and I only spot checked a few: - i915 does it, with the optional EXEC_OBJECT_ASYNC to skip all implicit dependencies (which is used by vulkan) - etnaviv does this. Implicit dependencies are collected in submit_fence_sync(), again with an opt-out flag ETNA_SUBMIT_NO_IMPLICIT. These are then picked up in etnaviv_sched_dependency which is the drm_sched_backend_ops->dependency callback. - v4c seems to not do much here, maybe gets away with it by not having a scheduler and only a single engine. Since all newer broadcom chips than the OG vc4 use v3d for rendering, which follows this contract, the impact of this issue is fairly small. - v3d does this using the drm_gem_fence_array_add_implicit() helper, which then it's drm_sched_backend_ops->dependency callback v3d_job_dependency() picks up. - panfrost is nice here and tracks the implicit fences in panfrost_job->implicit_fences, which again the drm_sched_backend_ops->dependency callback panfrost_job_dependency() picks up. It is mildly questionable though since it only picks up exclusive fences in panfrost_acquire_object_fences(), but not buggy in practice because it also always sets the exclusive fence. It should pick up both sets of fences, just in case there's ever going to be a 2nd gpu in a SoC with a mali gpu. Or maybe a mali SoC with a pcie port and a real gpu, which might actually happen eventually. A bug, but easy to fix. Should probably use the drm_gem_fence_array_add_implicit() helper. - lima is nice an easy, uses drm_gem_fence_array_add_implicit() and the same schema as v3d. - msm is mildly entertaining. It also supports MSM_SUBMIT_NO_IMPLICIT, but because it doesn't use the drm/scheduler it handles fences from the wrong context with a synchronous dma_fence_wait. See submit_fence_sync() leading to msm_gem_sync_object(). Investing into a scheduler might be a good idea. - all the remaining drivers are ttm based, where I hope they do appropriately obey implicit fences already. I didn't do the full audit there because a) not follow the contract would confuse ttm quite well and b) reading non-standard scheduler and submit code which isn't based on drm/scheduler is a pain. Onwards to the display side. - Any driver using the drm_gem_plane_helper_prepare_fb() helper will correctly. Overwhelmingly most drivers get this right, except a few totally dont. I'll follow up with a patch to make this the default and avoid a bunch of bugs. - I didn't audit the ttm drivers, but given that dma_resv started there I hope they get this right. In conclusion this IS the contract, both as documented and overwhelmingly implemented, specically as implemented by all render drivers except amdgpu. Amdgpu tried to fix this already in commit 049aca4363d8af87cab8d53de5401602db3b9999 Author: Christian König <christian.koenig@amd.com> Date: Wed Sep 19 16:54:35 2018 +0200 drm/amdgpu: fix using shared fence for exported BOs v2 but this fix falls short on a number of areas: - It's racy, by the time the buffer is shared it might be too late. To make sure there's definitely never a problem we need to set the fences correctly for any buffer that's potentially exportable. - It's breaking uapi, dma-buf fds support poll() and differentitiate between, which was introduced in commit 9b495a5887994a6d74d5c261d012083a92b94738 Author: Maarten Lankhorst <maarten.lankhorst@canonical.com> Date: Tue Jul 1 12:57:43 2014 +0200 dma-buf: add poll support, v3 - Christian König wants to nack new uapi building further on this dma_resv contract because it breaks amdgpu, quoting "Yeah, and that is exactly the reason why I will NAK this uAPI change. "This doesn't works for amdgpu at all for the reasons outlined above." https://lore.kernel.org/dri-devel/f2eb6751-2f82-9b23-f57e-548de5b729de@gmail.com/ Rejecting new development because your own driver is broken and violates established cross driver contracts and uapi is really not how upstream works. Now this patch will have a severe performance impact on anything that runs on multiple engines. So we can't just merge it outright, but need a bit a plan: - amdgpu needs a proper uapi for handling implicit fencing. The funny thing is that to do it correctly, implicit fencing must be treated as a very strange IPC mechanism for transporting fences, where both setting the fence and dependency intercepts must be handled explicitly. Current best practices is a per-bo flag to indicate writes, and a per-bo flag to to skip implicit fencing in the CS ioctl as a new chunk. - Since amdgpu has been shipping with broken behaviour we need an opt-out flag from the butchered implicit fencing model to enable the proper explicit implicit fencing model. - for kernel memory fences due to bo moves at least the i915 idea is to use ttm_bo->moving. amdgpu probably needs the same. - since the current p2p dma-buf interface assumes the kernel memory fence is in the exclusive dma_resv fence slot we need to add a new fence slot for kernel fences, which must never be ignored. Since currently only amdgpu supports this there's no real problem here yet, until amdgpu gains a NO_IMPLICIT CS flag. - New userspace needs to ship in enough desktop distros so that users wont notice the perf impact. I think we can ignore LTS distros who upgrade their kernels but not their mesa3d snapshot. - Then when this is all in place we can merge this patch here. What is not a solution to this problem here is trying to make the dma_resv rules in the kernel more clever. The fundamental issue here is that the amdgpu CS uapi is the least expressive one across all drivers (only equalled by panfrost, which has an actual excuse) by not allowing any userspace control over how implicit sync is conducted. Until this is fixed it's completely pointless to make the kernel more clever to improve amdgpu, because all we're doing is papering over this uapi design issue. amdgpu needs to attain the status quo established by other drivers first, once that's achieved we can tackle the remaining issues in a consistent way across drivers. v2: Bas pointed me at AMDGPU_GEM_CREATE_EXPLICIT_SYNC, which I entirely missed. This is great because it means the amdgpu specific piece for proper implicit fence handling exists already, and that since a while. The only thing that's now missing is - fishing the implicit fences out of a shared object at the right time - setting the exclusive implicit fence slot at the right time. Jason has a patch series to fill that gap with a bunch of generic ioctl on the dma-buf fd: https://lore.kernel.org/dri-devel/20210520190007.534046-1-jason@jlekstrand.net/ v3: Since Christian has fixed amdgpu now in commit 8c505bdc9c8b955223b054e34a0be9c3d841cd20 (drm-misc/drm-misc-next) Author: Christian König <christian.koenig@amd.com> Date: Wed Jun 9 13:51:36 2021 +0200 drm/amdgpu: rework dma_resv handling v3 Use the audit covered in this commit message as the excuse to update the dma-buf docs around dma_buf.resv usage across drivers. Since dynamic importers have different rules also hammer these in again while we're at it. v4: - Add the missing "through the device" in the dynamic section that I overlooked. - Fix a kerneldoc markup mistake, the link didn't connect v5: - A few s/should/must/ to make clear what must be done (if the driver does implicit sync) and what's more a maybe (Daniel Stone) - drop all the example api discussion, that needs to be expanded, clarified and put into a new chapter in drm-uapi.rst (Daniel Stone) Cc: Daniel Stone <daniel@fooishbar.org> Acked-by: Daniel Stone <daniel@fooishbar.org> Reviewed-by: Dave Airlie <airlied@redhat.com> (v4) Reviewed-by: Christian König <christian.koenig@amd.com> (v3) Cc: mesa-dev@lists.freedesktop.org Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Cc: Dave Airlie <airlied@gmail.com> Cc: Rob Clark <robdclark@chromium.org> Cc: Kristian H. Kristensen <hoegsberg@google.com> Cc: Michel Dänzer <michel@daenzer.net> Cc: Daniel Stone <daniels@collabora.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: "Christian König" <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Deepak R Varma <mh12gx2825@gmail.com> Cc: Chen Li <chenli@uniontech.com> Cc: Kevin Wang <kevin1.wang@amd.com> Cc: Dennis Li <Dennis.Li@amd.com> Cc: Luben Tuikov <luben.tuikov@amd.com> Cc: linaro-mm-sig@lists.linaro.org Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210624125246.166721-1-daniel.vetter@ffwll.ch
2021-06-24 12:52:46 +00:00
*
* - Similarly drivers must add a write fence through
* dma_resv_add_fence() with the DMA_RESV_USAGE_WRITE flag for
* anything the userspace API considers write access.
dma-buf: Document dma-buf implicit fencing/resv fencing rules Docs for struct dma_resv are fairly clear: "A reservation object can have attached one exclusive fence (normally associated with write operations) or N shared fences (read operations)." https://dri.freedesktop.org/docs/drm/driver-api/dma-buf.html#reservation-objects Furthermore a review across all of upstream. First of render drivers and how they set implicit fences: - nouveau follows this contract, see in validate_fini_no_ticket() nouveau_bo_fence(nvbo, fence, !!b->write_domains); and that last boolean controls whether the exclusive or shared fence slot is used. - radeon follows this contract by setting p->relocs[i].tv.num_shared = !r->write_domain; in radeon_cs_parser_relocs(), which ensures that the call to ttm_eu_fence_buffer_objects() in radeon_cs_parser_fini() will do the right thing. - vmwgfx seems to follow this contract with the shotgun approach of always setting ttm_val_buf->num_shared = 0, which means ttm_eu_fence_buffer_objects() will only use the exclusive slot. - etnaviv follows this contract, as can be trivially seen by looking at submit_attach_object_fences() - i915 is a bit a convoluted maze with multiple paths leading to i915_vma_move_to_active(). Which sets the exclusive flag if EXEC_OBJECT_WRITE is set. This can either come as a buffer flag for softpin mode, or through the write_domain when using relocations. It follows this contract. - lima follows this contract, see lima_gem_submit() which sets the exclusive fence when the LIMA_SUBMIT_BO_WRITE flag is set for that bo - msm follows this contract, see msm_gpu_submit() which sets the exclusive flag when the MSM_SUBMIT_BO_WRITE is set for that buffer - panfrost follows this contract with the shotgun approach of just always setting the exclusive fence, see panfrost_attach_object_fences(). Benefits of a single engine I guess - v3d follows this contract with the same shotgun approach in v3d_attach_fences_and_unlock_reservation(), but it has at least an XXX comment that maybe this should be improved - v4c uses the same shotgun approach of always setting an exclusive fence, see vc4_update_bo_seqnos() - vgem also follows this contract, see vgem_fence_attach_ioctl() and the VGEM_FENCE_WRITE. This is used in some igts to validate prime sharing with i915.ko without the need of a 2nd gpu - vritio follows this contract again with the shotgun approach of always setting an exclusive fence, see virtio_gpu_array_add_fence() This covers the setting of the exclusive fences when writing. Synchronizing against the exclusive fence is a lot more tricky, and I only spot checked a few: - i915 does it, with the optional EXEC_OBJECT_ASYNC to skip all implicit dependencies (which is used by vulkan) - etnaviv does this. Implicit dependencies are collected in submit_fence_sync(), again with an opt-out flag ETNA_SUBMIT_NO_IMPLICIT. These are then picked up in etnaviv_sched_dependency which is the drm_sched_backend_ops->dependency callback. - v4c seems to not do much here, maybe gets away with it by not having a scheduler and only a single engine. Since all newer broadcom chips than the OG vc4 use v3d for rendering, which follows this contract, the impact of this issue is fairly small. - v3d does this using the drm_gem_fence_array_add_implicit() helper, which then it's drm_sched_backend_ops->dependency callback v3d_job_dependency() picks up. - panfrost is nice here and tracks the implicit fences in panfrost_job->implicit_fences, which again the drm_sched_backend_ops->dependency callback panfrost_job_dependency() picks up. It is mildly questionable though since it only picks up exclusive fences in panfrost_acquire_object_fences(), but not buggy in practice because it also always sets the exclusive fence. It should pick up both sets of fences, just in case there's ever going to be a 2nd gpu in a SoC with a mali gpu. Or maybe a mali SoC with a pcie port and a real gpu, which might actually happen eventually. A bug, but easy to fix. Should probably use the drm_gem_fence_array_add_implicit() helper. - lima is nice an easy, uses drm_gem_fence_array_add_implicit() and the same schema as v3d. - msm is mildly entertaining. It also supports MSM_SUBMIT_NO_IMPLICIT, but because it doesn't use the drm/scheduler it handles fences from the wrong context with a synchronous dma_fence_wait. See submit_fence_sync() leading to msm_gem_sync_object(). Investing into a scheduler might be a good idea. - all the remaining drivers are ttm based, where I hope they do appropriately obey implicit fences already. I didn't do the full audit there because a) not follow the contract would confuse ttm quite well and b) reading non-standard scheduler and submit code which isn't based on drm/scheduler is a pain. Onwards to the display side. - Any driver using the drm_gem_plane_helper_prepare_fb() helper will correctly. Overwhelmingly most drivers get this right, except a few totally dont. I'll follow up with a patch to make this the default and avoid a bunch of bugs. - I didn't audit the ttm drivers, but given that dma_resv started there I hope they get this right. In conclusion this IS the contract, both as documented and overwhelmingly implemented, specically as implemented by all render drivers except amdgpu. Amdgpu tried to fix this already in commit 049aca4363d8af87cab8d53de5401602db3b9999 Author: Christian König <christian.koenig@amd.com> Date: Wed Sep 19 16:54:35 2018 +0200 drm/amdgpu: fix using shared fence for exported BOs v2 but this fix falls short on a number of areas: - It's racy, by the time the buffer is shared it might be too late. To make sure there's definitely never a problem we need to set the fences correctly for any buffer that's potentially exportable. - It's breaking uapi, dma-buf fds support poll() and differentitiate between, which was introduced in commit 9b495a5887994a6d74d5c261d012083a92b94738 Author: Maarten Lankhorst <maarten.lankhorst@canonical.com> Date: Tue Jul 1 12:57:43 2014 +0200 dma-buf: add poll support, v3 - Christian König wants to nack new uapi building further on this dma_resv contract because it breaks amdgpu, quoting "Yeah, and that is exactly the reason why I will NAK this uAPI change. "This doesn't works for amdgpu at all for the reasons outlined above." https://lore.kernel.org/dri-devel/f2eb6751-2f82-9b23-f57e-548de5b729de@gmail.com/ Rejecting new development because your own driver is broken and violates established cross driver contracts and uapi is really not how upstream works. Now this patch will have a severe performance impact on anything that runs on multiple engines. So we can't just merge it outright, but need a bit a plan: - amdgpu needs a proper uapi for handling implicit fencing. The funny thing is that to do it correctly, implicit fencing must be treated as a very strange IPC mechanism for transporting fences, where both setting the fence and dependency intercepts must be handled explicitly. Current best practices is a per-bo flag to indicate writes, and a per-bo flag to to skip implicit fencing in the CS ioctl as a new chunk. - Since amdgpu has been shipping with broken behaviour we need an opt-out flag from the butchered implicit fencing model to enable the proper explicit implicit fencing model. - for kernel memory fences due to bo moves at least the i915 idea is to use ttm_bo->moving. amdgpu probably needs the same. - since the current p2p dma-buf interface assumes the kernel memory fence is in the exclusive dma_resv fence slot we need to add a new fence slot for kernel fences, which must never be ignored. Since currently only amdgpu supports this there's no real problem here yet, until amdgpu gains a NO_IMPLICIT CS flag. - New userspace needs to ship in enough desktop distros so that users wont notice the perf impact. I think we can ignore LTS distros who upgrade their kernels but not their mesa3d snapshot. - Then when this is all in place we can merge this patch here. What is not a solution to this problem here is trying to make the dma_resv rules in the kernel more clever. The fundamental issue here is that the amdgpu CS uapi is the least expressive one across all drivers (only equalled by panfrost, which has an actual excuse) by not allowing any userspace control over how implicit sync is conducted. Until this is fixed it's completely pointless to make the kernel more clever to improve amdgpu, because all we're doing is papering over this uapi design issue. amdgpu needs to attain the status quo established by other drivers first, once that's achieved we can tackle the remaining issues in a consistent way across drivers. v2: Bas pointed me at AMDGPU_GEM_CREATE_EXPLICIT_SYNC, which I entirely missed. This is great because it means the amdgpu specific piece for proper implicit fence handling exists already, and that since a while. The only thing that's now missing is - fishing the implicit fences out of a shared object at the right time - setting the exclusive implicit fence slot at the right time. Jason has a patch series to fill that gap with a bunch of generic ioctl on the dma-buf fd: https://lore.kernel.org/dri-devel/20210520190007.534046-1-jason@jlekstrand.net/ v3: Since Christian has fixed amdgpu now in commit 8c505bdc9c8b955223b054e34a0be9c3d841cd20 (drm-misc/drm-misc-next) Author: Christian König <christian.koenig@amd.com> Date: Wed Jun 9 13:51:36 2021 +0200 drm/amdgpu: rework dma_resv handling v3 Use the audit covered in this commit message as the excuse to update the dma-buf docs around dma_buf.resv usage across drivers. Since dynamic importers have different rules also hammer these in again while we're at it. v4: - Add the missing "through the device" in the dynamic section that I overlooked. - Fix a kerneldoc markup mistake, the link didn't connect v5: - A few s/should/must/ to make clear what must be done (if the driver does implicit sync) and what's more a maybe (Daniel Stone) - drop all the example api discussion, that needs to be expanded, clarified and put into a new chapter in drm-uapi.rst (Daniel Stone) Cc: Daniel Stone <daniel@fooishbar.org> Acked-by: Daniel Stone <daniel@fooishbar.org> Reviewed-by: Dave Airlie <airlied@redhat.com> (v4) Reviewed-by: Christian König <christian.koenig@amd.com> (v3) Cc: mesa-dev@lists.freedesktop.org Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Cc: Dave Airlie <airlied@gmail.com> Cc: Rob Clark <robdclark@chromium.org> Cc: Kristian H. Kristensen <hoegsberg@google.com> Cc: Michel Dänzer <michel@daenzer.net> Cc: Daniel Stone <daniels@collabora.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: "Christian König" <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Deepak R Varma <mh12gx2825@gmail.com> Cc: Chen Li <chenli@uniontech.com> Cc: Kevin Wang <kevin1.wang@amd.com> Cc: Dennis Li <Dennis.Li@amd.com> Cc: Luben Tuikov <luben.tuikov@amd.com> Cc: linaro-mm-sig@lists.linaro.org Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210624125246.166721-1-daniel.vetter@ffwll.ch
2021-06-24 12:52:46 +00:00
*
* - Drivers may just always add a write fence, since that only
* causes unnecessary synchronization, but no correctness issues.
dma-buf: Document dma-buf implicit fencing/resv fencing rules Docs for struct dma_resv are fairly clear: "A reservation object can have attached one exclusive fence (normally associated with write operations) or N shared fences (read operations)." https://dri.freedesktop.org/docs/drm/driver-api/dma-buf.html#reservation-objects Furthermore a review across all of upstream. First of render drivers and how they set implicit fences: - nouveau follows this contract, see in validate_fini_no_ticket() nouveau_bo_fence(nvbo, fence, !!b->write_domains); and that last boolean controls whether the exclusive or shared fence slot is used. - radeon follows this contract by setting p->relocs[i].tv.num_shared = !r->write_domain; in radeon_cs_parser_relocs(), which ensures that the call to ttm_eu_fence_buffer_objects() in radeon_cs_parser_fini() will do the right thing. - vmwgfx seems to follow this contract with the shotgun approach of always setting ttm_val_buf->num_shared = 0, which means ttm_eu_fence_buffer_objects() will only use the exclusive slot. - etnaviv follows this contract, as can be trivially seen by looking at submit_attach_object_fences() - i915 is a bit a convoluted maze with multiple paths leading to i915_vma_move_to_active(). Which sets the exclusive flag if EXEC_OBJECT_WRITE is set. This can either come as a buffer flag for softpin mode, or through the write_domain when using relocations. It follows this contract. - lima follows this contract, see lima_gem_submit() which sets the exclusive fence when the LIMA_SUBMIT_BO_WRITE flag is set for that bo - msm follows this contract, see msm_gpu_submit() which sets the exclusive flag when the MSM_SUBMIT_BO_WRITE is set for that buffer - panfrost follows this contract with the shotgun approach of just always setting the exclusive fence, see panfrost_attach_object_fences(). Benefits of a single engine I guess - v3d follows this contract with the same shotgun approach in v3d_attach_fences_and_unlock_reservation(), but it has at least an XXX comment that maybe this should be improved - v4c uses the same shotgun approach of always setting an exclusive fence, see vc4_update_bo_seqnos() - vgem also follows this contract, see vgem_fence_attach_ioctl() and the VGEM_FENCE_WRITE. This is used in some igts to validate prime sharing with i915.ko without the need of a 2nd gpu - vritio follows this contract again with the shotgun approach of always setting an exclusive fence, see virtio_gpu_array_add_fence() This covers the setting of the exclusive fences when writing. Synchronizing against the exclusive fence is a lot more tricky, and I only spot checked a few: - i915 does it, with the optional EXEC_OBJECT_ASYNC to skip all implicit dependencies (which is used by vulkan) - etnaviv does this. Implicit dependencies are collected in submit_fence_sync(), again with an opt-out flag ETNA_SUBMIT_NO_IMPLICIT. These are then picked up in etnaviv_sched_dependency which is the drm_sched_backend_ops->dependency callback. - v4c seems to not do much here, maybe gets away with it by not having a scheduler and only a single engine. Since all newer broadcom chips than the OG vc4 use v3d for rendering, which follows this contract, the impact of this issue is fairly small. - v3d does this using the drm_gem_fence_array_add_implicit() helper, which then it's drm_sched_backend_ops->dependency callback v3d_job_dependency() picks up. - panfrost is nice here and tracks the implicit fences in panfrost_job->implicit_fences, which again the drm_sched_backend_ops->dependency callback panfrost_job_dependency() picks up. It is mildly questionable though since it only picks up exclusive fences in panfrost_acquire_object_fences(), but not buggy in practice because it also always sets the exclusive fence. It should pick up both sets of fences, just in case there's ever going to be a 2nd gpu in a SoC with a mali gpu. Or maybe a mali SoC with a pcie port and a real gpu, which might actually happen eventually. A bug, but easy to fix. Should probably use the drm_gem_fence_array_add_implicit() helper. - lima is nice an easy, uses drm_gem_fence_array_add_implicit() and the same schema as v3d. - msm is mildly entertaining. It also supports MSM_SUBMIT_NO_IMPLICIT, but because it doesn't use the drm/scheduler it handles fences from the wrong context with a synchronous dma_fence_wait. See submit_fence_sync() leading to msm_gem_sync_object(). Investing into a scheduler might be a good idea. - all the remaining drivers are ttm based, where I hope they do appropriately obey implicit fences already. I didn't do the full audit there because a) not follow the contract would confuse ttm quite well and b) reading non-standard scheduler and submit code which isn't based on drm/scheduler is a pain. Onwards to the display side. - Any driver using the drm_gem_plane_helper_prepare_fb() helper will correctly. Overwhelmingly most drivers get this right, except a few totally dont. I'll follow up with a patch to make this the default and avoid a bunch of bugs. - I didn't audit the ttm drivers, but given that dma_resv started there I hope they get this right. In conclusion this IS the contract, both as documented and overwhelmingly implemented, specically as implemented by all render drivers except amdgpu. Amdgpu tried to fix this already in commit 049aca4363d8af87cab8d53de5401602db3b9999 Author: Christian König <christian.koenig@amd.com> Date: Wed Sep 19 16:54:35 2018 +0200 drm/amdgpu: fix using shared fence for exported BOs v2 but this fix falls short on a number of areas: - It's racy, by the time the buffer is shared it might be too late. To make sure there's definitely never a problem we need to set the fences correctly for any buffer that's potentially exportable. - It's breaking uapi, dma-buf fds support poll() and differentitiate between, which was introduced in commit 9b495a5887994a6d74d5c261d012083a92b94738 Author: Maarten Lankhorst <maarten.lankhorst@canonical.com> Date: Tue Jul 1 12:57:43 2014 +0200 dma-buf: add poll support, v3 - Christian König wants to nack new uapi building further on this dma_resv contract because it breaks amdgpu, quoting "Yeah, and that is exactly the reason why I will NAK this uAPI change. "This doesn't works for amdgpu at all for the reasons outlined above." https://lore.kernel.org/dri-devel/f2eb6751-2f82-9b23-f57e-548de5b729de@gmail.com/ Rejecting new development because your own driver is broken and violates established cross driver contracts and uapi is really not how upstream works. Now this patch will have a severe performance impact on anything that runs on multiple engines. So we can't just merge it outright, but need a bit a plan: - amdgpu needs a proper uapi for handling implicit fencing. The funny thing is that to do it correctly, implicit fencing must be treated as a very strange IPC mechanism for transporting fences, where both setting the fence and dependency intercepts must be handled explicitly. Current best practices is a per-bo flag to indicate writes, and a per-bo flag to to skip implicit fencing in the CS ioctl as a new chunk. - Since amdgpu has been shipping with broken behaviour we need an opt-out flag from the butchered implicit fencing model to enable the proper explicit implicit fencing model. - for kernel memory fences due to bo moves at least the i915 idea is to use ttm_bo->moving. amdgpu probably needs the same. - since the current p2p dma-buf interface assumes the kernel memory fence is in the exclusive dma_resv fence slot we need to add a new fence slot for kernel fences, which must never be ignored. Since currently only amdgpu supports this there's no real problem here yet, until amdgpu gains a NO_IMPLICIT CS flag. - New userspace needs to ship in enough desktop distros so that users wont notice the perf impact. I think we can ignore LTS distros who upgrade their kernels but not their mesa3d snapshot. - Then when this is all in place we can merge this patch here. What is not a solution to this problem here is trying to make the dma_resv rules in the kernel more clever. The fundamental issue here is that the amdgpu CS uapi is the least expressive one across all drivers (only equalled by panfrost, which has an actual excuse) by not allowing any userspace control over how implicit sync is conducted. Until this is fixed it's completely pointless to make the kernel more clever to improve amdgpu, because all we're doing is papering over this uapi design issue. amdgpu needs to attain the status quo established by other drivers first, once that's achieved we can tackle the remaining issues in a consistent way across drivers. v2: Bas pointed me at AMDGPU_GEM_CREATE_EXPLICIT_SYNC, which I entirely missed. This is great because it means the amdgpu specific piece for proper implicit fence handling exists already, and that since a while. The only thing that's now missing is - fishing the implicit fences out of a shared object at the right time - setting the exclusive implicit fence slot at the right time. Jason has a patch series to fill that gap with a bunch of generic ioctl on the dma-buf fd: https://lore.kernel.org/dri-devel/20210520190007.534046-1-jason@jlekstrand.net/ v3: Since Christian has fixed amdgpu now in commit 8c505bdc9c8b955223b054e34a0be9c3d841cd20 (drm-misc/drm-misc-next) Author: Christian König <christian.koenig@amd.com> Date: Wed Jun 9 13:51:36 2021 +0200 drm/amdgpu: rework dma_resv handling v3 Use the audit covered in this commit message as the excuse to update the dma-buf docs around dma_buf.resv usage across drivers. Since dynamic importers have different rules also hammer these in again while we're at it. v4: - Add the missing "through the device" in the dynamic section that I overlooked. - Fix a kerneldoc markup mistake, the link didn't connect v5: - A few s/should/must/ to make clear what must be done (if the driver does implicit sync) and what's more a maybe (Daniel Stone) - drop all the example api discussion, that needs to be expanded, clarified and put into a new chapter in drm-uapi.rst (Daniel Stone) Cc: Daniel Stone <daniel@fooishbar.org> Acked-by: Daniel Stone <daniel@fooishbar.org> Reviewed-by: Dave Airlie <airlied@redhat.com> (v4) Reviewed-by: Christian König <christian.koenig@amd.com> (v3) Cc: mesa-dev@lists.freedesktop.org Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Cc: Dave Airlie <airlied@gmail.com> Cc: Rob Clark <robdclark@chromium.org> Cc: Kristian H. Kristensen <hoegsberg@google.com> Cc: Michel Dänzer <michel@daenzer.net> Cc: Daniel Stone <daniels@collabora.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: "Christian König" <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Deepak R Varma <mh12gx2825@gmail.com> Cc: Chen Li <chenli@uniontech.com> Cc: Kevin Wang <kevin1.wang@amd.com> Cc: Dennis Li <Dennis.Li@amd.com> Cc: Luben Tuikov <luben.tuikov@amd.com> Cc: linaro-mm-sig@lists.linaro.org Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210624125246.166721-1-daniel.vetter@ffwll.ch
2021-06-24 12:52:46 +00:00
*
* - Some drivers only expose a synchronous userspace API with no
* pipelining across drivers. These do not set any fences for their
* access. An example here is v4l.
*
* - Driver should use dma_resv_usage_rw() when retrieving fences as
* dependency for implicit synchronization.
*
dma-buf: Document dma-buf implicit fencing/resv fencing rules Docs for struct dma_resv are fairly clear: "A reservation object can have attached one exclusive fence (normally associated with write operations) or N shared fences (read operations)." https://dri.freedesktop.org/docs/drm/driver-api/dma-buf.html#reservation-objects Furthermore a review across all of upstream. First of render drivers and how they set implicit fences: - nouveau follows this contract, see in validate_fini_no_ticket() nouveau_bo_fence(nvbo, fence, !!b->write_domains); and that last boolean controls whether the exclusive or shared fence slot is used. - radeon follows this contract by setting p->relocs[i].tv.num_shared = !r->write_domain; in radeon_cs_parser_relocs(), which ensures that the call to ttm_eu_fence_buffer_objects() in radeon_cs_parser_fini() will do the right thing. - vmwgfx seems to follow this contract with the shotgun approach of always setting ttm_val_buf->num_shared = 0, which means ttm_eu_fence_buffer_objects() will only use the exclusive slot. - etnaviv follows this contract, as can be trivially seen by looking at submit_attach_object_fences() - i915 is a bit a convoluted maze with multiple paths leading to i915_vma_move_to_active(). Which sets the exclusive flag if EXEC_OBJECT_WRITE is set. This can either come as a buffer flag for softpin mode, or through the write_domain when using relocations. It follows this contract. - lima follows this contract, see lima_gem_submit() which sets the exclusive fence when the LIMA_SUBMIT_BO_WRITE flag is set for that bo - msm follows this contract, see msm_gpu_submit() which sets the exclusive flag when the MSM_SUBMIT_BO_WRITE is set for that buffer - panfrost follows this contract with the shotgun approach of just always setting the exclusive fence, see panfrost_attach_object_fences(). Benefits of a single engine I guess - v3d follows this contract with the same shotgun approach in v3d_attach_fences_and_unlock_reservation(), but it has at least an XXX comment that maybe this should be improved - v4c uses the same shotgun approach of always setting an exclusive fence, see vc4_update_bo_seqnos() - vgem also follows this contract, see vgem_fence_attach_ioctl() and the VGEM_FENCE_WRITE. This is used in some igts to validate prime sharing with i915.ko without the need of a 2nd gpu - vritio follows this contract again with the shotgun approach of always setting an exclusive fence, see virtio_gpu_array_add_fence() This covers the setting of the exclusive fences when writing. Synchronizing against the exclusive fence is a lot more tricky, and I only spot checked a few: - i915 does it, with the optional EXEC_OBJECT_ASYNC to skip all implicit dependencies (which is used by vulkan) - etnaviv does this. Implicit dependencies are collected in submit_fence_sync(), again with an opt-out flag ETNA_SUBMIT_NO_IMPLICIT. These are then picked up in etnaviv_sched_dependency which is the drm_sched_backend_ops->dependency callback. - v4c seems to not do much here, maybe gets away with it by not having a scheduler and only a single engine. Since all newer broadcom chips than the OG vc4 use v3d for rendering, which follows this contract, the impact of this issue is fairly small. - v3d does this using the drm_gem_fence_array_add_implicit() helper, which then it's drm_sched_backend_ops->dependency callback v3d_job_dependency() picks up. - panfrost is nice here and tracks the implicit fences in panfrost_job->implicit_fences, which again the drm_sched_backend_ops->dependency callback panfrost_job_dependency() picks up. It is mildly questionable though since it only picks up exclusive fences in panfrost_acquire_object_fences(), but not buggy in practice because it also always sets the exclusive fence. It should pick up both sets of fences, just in case there's ever going to be a 2nd gpu in a SoC with a mali gpu. Or maybe a mali SoC with a pcie port and a real gpu, which might actually happen eventually. A bug, but easy to fix. Should probably use the drm_gem_fence_array_add_implicit() helper. - lima is nice an easy, uses drm_gem_fence_array_add_implicit() and the same schema as v3d. - msm is mildly entertaining. It also supports MSM_SUBMIT_NO_IMPLICIT, but because it doesn't use the drm/scheduler it handles fences from the wrong context with a synchronous dma_fence_wait. See submit_fence_sync() leading to msm_gem_sync_object(). Investing into a scheduler might be a good idea. - all the remaining drivers are ttm based, where I hope they do appropriately obey implicit fences already. I didn't do the full audit there because a) not follow the contract would confuse ttm quite well and b) reading non-standard scheduler and submit code which isn't based on drm/scheduler is a pain. Onwards to the display side. - Any driver using the drm_gem_plane_helper_prepare_fb() helper will correctly. Overwhelmingly most drivers get this right, except a few totally dont. I'll follow up with a patch to make this the default and avoid a bunch of bugs. - I didn't audit the ttm drivers, but given that dma_resv started there I hope they get this right. In conclusion this IS the contract, both as documented and overwhelmingly implemented, specically as implemented by all render drivers except amdgpu. Amdgpu tried to fix this already in commit 049aca4363d8af87cab8d53de5401602db3b9999 Author: Christian König <christian.koenig@amd.com> Date: Wed Sep 19 16:54:35 2018 +0200 drm/amdgpu: fix using shared fence for exported BOs v2 but this fix falls short on a number of areas: - It's racy, by the time the buffer is shared it might be too late. To make sure there's definitely never a problem we need to set the fences correctly for any buffer that's potentially exportable. - It's breaking uapi, dma-buf fds support poll() and differentitiate between, which was introduced in commit 9b495a5887994a6d74d5c261d012083a92b94738 Author: Maarten Lankhorst <maarten.lankhorst@canonical.com> Date: Tue Jul 1 12:57:43 2014 +0200 dma-buf: add poll support, v3 - Christian König wants to nack new uapi building further on this dma_resv contract because it breaks amdgpu, quoting "Yeah, and that is exactly the reason why I will NAK this uAPI change. "This doesn't works for amdgpu at all for the reasons outlined above." https://lore.kernel.org/dri-devel/f2eb6751-2f82-9b23-f57e-548de5b729de@gmail.com/ Rejecting new development because your own driver is broken and violates established cross driver contracts and uapi is really not how upstream works. Now this patch will have a severe performance impact on anything that runs on multiple engines. So we can't just merge it outright, but need a bit a plan: - amdgpu needs a proper uapi for handling implicit fencing. The funny thing is that to do it correctly, implicit fencing must be treated as a very strange IPC mechanism for transporting fences, where both setting the fence and dependency intercepts must be handled explicitly. Current best practices is a per-bo flag to indicate writes, and a per-bo flag to to skip implicit fencing in the CS ioctl as a new chunk. - Since amdgpu has been shipping with broken behaviour we need an opt-out flag from the butchered implicit fencing model to enable the proper explicit implicit fencing model. - for kernel memory fences due to bo moves at least the i915 idea is to use ttm_bo->moving. amdgpu probably needs the same. - since the current p2p dma-buf interface assumes the kernel memory fence is in the exclusive dma_resv fence slot we need to add a new fence slot for kernel fences, which must never be ignored. Since currently only amdgpu supports this there's no real problem here yet, until amdgpu gains a NO_IMPLICIT CS flag. - New userspace needs to ship in enough desktop distros so that users wont notice the perf impact. I think we can ignore LTS distros who upgrade their kernels but not their mesa3d snapshot. - Then when this is all in place we can merge this patch here. What is not a solution to this problem here is trying to make the dma_resv rules in the kernel more clever. The fundamental issue here is that the amdgpu CS uapi is the least expressive one across all drivers (only equalled by panfrost, which has an actual excuse) by not allowing any userspace control over how implicit sync is conducted. Until this is fixed it's completely pointless to make the kernel more clever to improve amdgpu, because all we're doing is papering over this uapi design issue. amdgpu needs to attain the status quo established by other drivers first, once that's achieved we can tackle the remaining issues in a consistent way across drivers. v2: Bas pointed me at AMDGPU_GEM_CREATE_EXPLICIT_SYNC, which I entirely missed. This is great because it means the amdgpu specific piece for proper implicit fence handling exists already, and that since a while. The only thing that's now missing is - fishing the implicit fences out of a shared object at the right time - setting the exclusive implicit fence slot at the right time. Jason has a patch series to fill that gap with a bunch of generic ioctl on the dma-buf fd: https://lore.kernel.org/dri-devel/20210520190007.534046-1-jason@jlekstrand.net/ v3: Since Christian has fixed amdgpu now in commit 8c505bdc9c8b955223b054e34a0be9c3d841cd20 (drm-misc/drm-misc-next) Author: Christian König <christian.koenig@amd.com> Date: Wed Jun 9 13:51:36 2021 +0200 drm/amdgpu: rework dma_resv handling v3 Use the audit covered in this commit message as the excuse to update the dma-buf docs around dma_buf.resv usage across drivers. Since dynamic importers have different rules also hammer these in again while we're at it. v4: - Add the missing "through the device" in the dynamic section that I overlooked. - Fix a kerneldoc markup mistake, the link didn't connect v5: - A few s/should/must/ to make clear what must be done (if the driver does implicit sync) and what's more a maybe (Daniel Stone) - drop all the example api discussion, that needs to be expanded, clarified and put into a new chapter in drm-uapi.rst (Daniel Stone) Cc: Daniel Stone <daniel@fooishbar.org> Acked-by: Daniel Stone <daniel@fooishbar.org> Reviewed-by: Dave Airlie <airlied@redhat.com> (v4) Reviewed-by: Christian König <christian.koenig@amd.com> (v3) Cc: mesa-dev@lists.freedesktop.org Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Cc: Dave Airlie <airlied@gmail.com> Cc: Rob Clark <robdclark@chromium.org> Cc: Kristian H. Kristensen <hoegsberg@google.com> Cc: Michel Dänzer <michel@daenzer.net> Cc: Daniel Stone <daniels@collabora.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: "Christian König" <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Deepak R Varma <mh12gx2825@gmail.com> Cc: Chen Li <chenli@uniontech.com> Cc: Kevin Wang <kevin1.wang@amd.com> Cc: Dennis Li <Dennis.Li@amd.com> Cc: Luben Tuikov <luben.tuikov@amd.com> Cc: linaro-mm-sig@lists.linaro.org Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210624125246.166721-1-daniel.vetter@ffwll.ch
2021-06-24 12:52:46 +00:00
* DYNAMIC IMPORTER RULES:
*
* Dynamic importers, see dma_buf_attachment_is_dynamic(), have
* additional constraints on how they set up fences:
*
* - Dynamic importers must obey the write fences and wait for them to
dma-buf: Document dma-buf implicit fencing/resv fencing rules Docs for struct dma_resv are fairly clear: "A reservation object can have attached one exclusive fence (normally associated with write operations) or N shared fences (read operations)." https://dri.freedesktop.org/docs/drm/driver-api/dma-buf.html#reservation-objects Furthermore a review across all of upstream. First of render drivers and how they set implicit fences: - nouveau follows this contract, see in validate_fini_no_ticket() nouveau_bo_fence(nvbo, fence, !!b->write_domains); and that last boolean controls whether the exclusive or shared fence slot is used. - radeon follows this contract by setting p->relocs[i].tv.num_shared = !r->write_domain; in radeon_cs_parser_relocs(), which ensures that the call to ttm_eu_fence_buffer_objects() in radeon_cs_parser_fini() will do the right thing. - vmwgfx seems to follow this contract with the shotgun approach of always setting ttm_val_buf->num_shared = 0, which means ttm_eu_fence_buffer_objects() will only use the exclusive slot. - etnaviv follows this contract, as can be trivially seen by looking at submit_attach_object_fences() - i915 is a bit a convoluted maze with multiple paths leading to i915_vma_move_to_active(). Which sets the exclusive flag if EXEC_OBJECT_WRITE is set. This can either come as a buffer flag for softpin mode, or through the write_domain when using relocations. It follows this contract. - lima follows this contract, see lima_gem_submit() which sets the exclusive fence when the LIMA_SUBMIT_BO_WRITE flag is set for that bo - msm follows this contract, see msm_gpu_submit() which sets the exclusive flag when the MSM_SUBMIT_BO_WRITE is set for that buffer - panfrost follows this contract with the shotgun approach of just always setting the exclusive fence, see panfrost_attach_object_fences(). Benefits of a single engine I guess - v3d follows this contract with the same shotgun approach in v3d_attach_fences_and_unlock_reservation(), but it has at least an XXX comment that maybe this should be improved - v4c uses the same shotgun approach of always setting an exclusive fence, see vc4_update_bo_seqnos() - vgem also follows this contract, see vgem_fence_attach_ioctl() and the VGEM_FENCE_WRITE. This is used in some igts to validate prime sharing with i915.ko without the need of a 2nd gpu - vritio follows this contract again with the shotgun approach of always setting an exclusive fence, see virtio_gpu_array_add_fence() This covers the setting of the exclusive fences when writing. Synchronizing against the exclusive fence is a lot more tricky, and I only spot checked a few: - i915 does it, with the optional EXEC_OBJECT_ASYNC to skip all implicit dependencies (which is used by vulkan) - etnaviv does this. Implicit dependencies are collected in submit_fence_sync(), again with an opt-out flag ETNA_SUBMIT_NO_IMPLICIT. These are then picked up in etnaviv_sched_dependency which is the drm_sched_backend_ops->dependency callback. - v4c seems to not do much here, maybe gets away with it by not having a scheduler and only a single engine. Since all newer broadcom chips than the OG vc4 use v3d for rendering, which follows this contract, the impact of this issue is fairly small. - v3d does this using the drm_gem_fence_array_add_implicit() helper, which then it's drm_sched_backend_ops->dependency callback v3d_job_dependency() picks up. - panfrost is nice here and tracks the implicit fences in panfrost_job->implicit_fences, which again the drm_sched_backend_ops->dependency callback panfrost_job_dependency() picks up. It is mildly questionable though since it only picks up exclusive fences in panfrost_acquire_object_fences(), but not buggy in practice because it also always sets the exclusive fence. It should pick up both sets of fences, just in case there's ever going to be a 2nd gpu in a SoC with a mali gpu. Or maybe a mali SoC with a pcie port and a real gpu, which might actually happen eventually. A bug, but easy to fix. Should probably use the drm_gem_fence_array_add_implicit() helper. - lima is nice an easy, uses drm_gem_fence_array_add_implicit() and the same schema as v3d. - msm is mildly entertaining. It also supports MSM_SUBMIT_NO_IMPLICIT, but because it doesn't use the drm/scheduler it handles fences from the wrong context with a synchronous dma_fence_wait. See submit_fence_sync() leading to msm_gem_sync_object(). Investing into a scheduler might be a good idea. - all the remaining drivers are ttm based, where I hope they do appropriately obey implicit fences already. I didn't do the full audit there because a) not follow the contract would confuse ttm quite well and b) reading non-standard scheduler and submit code which isn't based on drm/scheduler is a pain. Onwards to the display side. - Any driver using the drm_gem_plane_helper_prepare_fb() helper will correctly. Overwhelmingly most drivers get this right, except a few totally dont. I'll follow up with a patch to make this the default and avoid a bunch of bugs. - I didn't audit the ttm drivers, but given that dma_resv started there I hope they get this right. In conclusion this IS the contract, both as documented and overwhelmingly implemented, specically as implemented by all render drivers except amdgpu. Amdgpu tried to fix this already in commit 049aca4363d8af87cab8d53de5401602db3b9999 Author: Christian König <christian.koenig@amd.com> Date: Wed Sep 19 16:54:35 2018 +0200 drm/amdgpu: fix using shared fence for exported BOs v2 but this fix falls short on a number of areas: - It's racy, by the time the buffer is shared it might be too late. To make sure there's definitely never a problem we need to set the fences correctly for any buffer that's potentially exportable. - It's breaking uapi, dma-buf fds support poll() and differentitiate between, which was introduced in commit 9b495a5887994a6d74d5c261d012083a92b94738 Author: Maarten Lankhorst <maarten.lankhorst@canonical.com> Date: Tue Jul 1 12:57:43 2014 +0200 dma-buf: add poll support, v3 - Christian König wants to nack new uapi building further on this dma_resv contract because it breaks amdgpu, quoting "Yeah, and that is exactly the reason why I will NAK this uAPI change. "This doesn't works for amdgpu at all for the reasons outlined above." https://lore.kernel.org/dri-devel/f2eb6751-2f82-9b23-f57e-548de5b729de@gmail.com/ Rejecting new development because your own driver is broken and violates established cross driver contracts and uapi is really not how upstream works. Now this patch will have a severe performance impact on anything that runs on multiple engines. So we can't just merge it outright, but need a bit a plan: - amdgpu needs a proper uapi for handling implicit fencing. The funny thing is that to do it correctly, implicit fencing must be treated as a very strange IPC mechanism for transporting fences, where both setting the fence and dependency intercepts must be handled explicitly. Current best practices is a per-bo flag to indicate writes, and a per-bo flag to to skip implicit fencing in the CS ioctl as a new chunk. - Since amdgpu has been shipping with broken behaviour we need an opt-out flag from the butchered implicit fencing model to enable the proper explicit implicit fencing model. - for kernel memory fences due to bo moves at least the i915 idea is to use ttm_bo->moving. amdgpu probably needs the same. - since the current p2p dma-buf interface assumes the kernel memory fence is in the exclusive dma_resv fence slot we need to add a new fence slot for kernel fences, which must never be ignored. Since currently only amdgpu supports this there's no real problem here yet, until amdgpu gains a NO_IMPLICIT CS flag. - New userspace needs to ship in enough desktop distros so that users wont notice the perf impact. I think we can ignore LTS distros who upgrade their kernels but not their mesa3d snapshot. - Then when this is all in place we can merge this patch here. What is not a solution to this problem here is trying to make the dma_resv rules in the kernel more clever. The fundamental issue here is that the amdgpu CS uapi is the least expressive one across all drivers (only equalled by panfrost, which has an actual excuse) by not allowing any userspace control over how implicit sync is conducted. Until this is fixed it's completely pointless to make the kernel more clever to improve amdgpu, because all we're doing is papering over this uapi design issue. amdgpu needs to attain the status quo established by other drivers first, once that's achieved we can tackle the remaining issues in a consistent way across drivers. v2: Bas pointed me at AMDGPU_GEM_CREATE_EXPLICIT_SYNC, which I entirely missed. This is great because it means the amdgpu specific piece for proper implicit fence handling exists already, and that since a while. The only thing that's now missing is - fishing the implicit fences out of a shared object at the right time - setting the exclusive implicit fence slot at the right time. Jason has a patch series to fill that gap with a bunch of generic ioctl on the dma-buf fd: https://lore.kernel.org/dri-devel/20210520190007.534046-1-jason@jlekstrand.net/ v3: Since Christian has fixed amdgpu now in commit 8c505bdc9c8b955223b054e34a0be9c3d841cd20 (drm-misc/drm-misc-next) Author: Christian König <christian.koenig@amd.com> Date: Wed Jun 9 13:51:36 2021 +0200 drm/amdgpu: rework dma_resv handling v3 Use the audit covered in this commit message as the excuse to update the dma-buf docs around dma_buf.resv usage across drivers. Since dynamic importers have different rules also hammer these in again while we're at it. v4: - Add the missing "through the device" in the dynamic section that I overlooked. - Fix a kerneldoc markup mistake, the link didn't connect v5: - A few s/should/must/ to make clear what must be done (if the driver does implicit sync) and what's more a maybe (Daniel Stone) - drop all the example api discussion, that needs to be expanded, clarified and put into a new chapter in drm-uapi.rst (Daniel Stone) Cc: Daniel Stone <daniel@fooishbar.org> Acked-by: Daniel Stone <daniel@fooishbar.org> Reviewed-by: Dave Airlie <airlied@redhat.com> (v4) Reviewed-by: Christian König <christian.koenig@amd.com> (v3) Cc: mesa-dev@lists.freedesktop.org Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Cc: Dave Airlie <airlied@gmail.com> Cc: Rob Clark <robdclark@chromium.org> Cc: Kristian H. Kristensen <hoegsberg@google.com> Cc: Michel Dänzer <michel@daenzer.net> Cc: Daniel Stone <daniels@collabora.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: "Christian König" <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Deepak R Varma <mh12gx2825@gmail.com> Cc: Chen Li <chenli@uniontech.com> Cc: Kevin Wang <kevin1.wang@amd.com> Cc: Dennis Li <Dennis.Li@amd.com> Cc: Luben Tuikov <luben.tuikov@amd.com> Cc: linaro-mm-sig@lists.linaro.org Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210624125246.166721-1-daniel.vetter@ffwll.ch
2021-06-24 12:52:46 +00:00
* signal before allowing access to the buffer's underlying storage
* through the device.
*
* - Dynamic importers should set fences for any access that they can't
* disable immediately from their &dma_buf_attach_ops.move_notify
* callback.
dma-resv: Give the docs a do-over Specifically document the new/clarified rules around how the shared fences do not have any ordering requirements against the exclusive fence. But also document all the things a bit better, given how central struct dma_resv to dynamic buffer management the docs have been very inadequat. - Lots more links to other pieces of the puzzle. Unfortunately ttm_buffer_object has no docs, so no links :-( - Explain/complain a bit about dma_resv_locking_ctx(). I still don't like that one, but fixing the ttm call chains is going to be horrible. Plus we want to plug in real slowpath locking when we do that anyway. - Main part of the patch is some actual docs for struct dma_resv. Overall I think we still have a lot of bad naming in this area (e.g. dma_resv.fence is singular, but contains the multiple shared fences), but I think that's more indicative of how the semantics and rules are just not great. Another thing that's real awkard is how chaining exclusive fences right now means direct dma_resv.exclusive_fence pointer access with an rcu_assign_pointer. Not so great either. v2: - Fix a pile of typos (Matt, Jason) - Hammer it in that breaking the rules leads to use-after-free issues around dma-buf sharing (Christian) Reviewed-by: Christian König <christian.koenig@amd.com> Cc: Jason Ekstrand <jason@jlekstrand.net> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: "Christian König" <christian.koenig@amd.com> Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Link: https://patchwork.freedesktop.org/patch/msgid/20210805104705.862416-21-daniel.vetter@ffwll.ch
2021-08-05 10:47:05 +00:00
*
* IMPORTANT:
*
* All drivers and memory management related functions must obey the
* struct dma_resv rules, specifically the rules for updating and
* obeying fences. See enum dma_resv_usage for further descriptions.
*/
struct dma_resv *resv;
/** @poll: for userspace poll support */
wait_queue_head_t poll;
/** @cb_in: for userspace poll support */
/** @cb_out: for userspace poll support */
struct dma_buf_poll_cb_t {
dma-buf: Rename struct fence to dma_fence I plan to usurp the short name of struct fence for a core kernel struct, and so I need to rename the specialised fence/timeline for DMA operations to make room. A consensus was reached in https://lists.freedesktop.org/archives/dri-devel/2016-July/113083.html that making clear this fence applies to DMA operations was a good thing. Since then the patch has grown a bit as usage increases, so hopefully it remains a good thing! (v2...: rebase, rerun spatch) v3: Compile on msm, spotted a manual fixup that I broke. v4: Try again for msm, sorry Daniel coccinelle script: @@ @@ - struct fence + struct dma_fence @@ @@ - struct fence_ops + struct dma_fence_ops @@ @@ - struct fence_cb + struct dma_fence_cb @@ @@ - struct fence_array + struct dma_fence_array @@ @@ - enum fence_flag_bits + enum dma_fence_flag_bits @@ @@ ( - fence_init + dma_fence_init | - fence_release + dma_fence_release | - fence_free + dma_fence_free | - fence_get + dma_fence_get | - fence_get_rcu + dma_fence_get_rcu | - fence_put + dma_fence_put | - fence_signal + dma_fence_signal | - fence_signal_locked + dma_fence_signal_locked | - fence_default_wait + dma_fence_default_wait | - fence_add_callback + dma_fence_add_callback | - fence_remove_callback + dma_fence_remove_callback | - fence_enable_sw_signaling + dma_fence_enable_sw_signaling | - fence_is_signaled_locked + dma_fence_is_signaled_locked | - fence_is_signaled + dma_fence_is_signaled | - fence_is_later + dma_fence_is_later | - fence_later + dma_fence_later | - fence_wait_timeout + dma_fence_wait_timeout | - fence_wait_any_timeout + dma_fence_wait_any_timeout | - fence_wait + dma_fence_wait | - fence_context_alloc + dma_fence_context_alloc | - fence_array_create + dma_fence_array_create | - to_fence_array + to_dma_fence_array | - fence_is_array + dma_fence_is_array | - trace_fence_emit + trace_dma_fence_emit | - FENCE_TRACE + DMA_FENCE_TRACE | - FENCE_WARN + DMA_FENCE_WARN | - FENCE_ERR + DMA_FENCE_ERR ) ( ... ) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk> Acked-by: Sumit Semwal <sumit.semwal@linaro.org> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20161025120045.28839-1-chris@chris-wilson.co.uk
2016-10-25 12:00:45 +00:00
struct dma_fence_cb cb;
wait_queue_head_t *poll;
__poll_t active;
} cb_in, cb_out;
dmabuf: Add the capability to expose DMA-BUF stats in sysfs Overview ======== The patch adds DMA-BUF statistics to /sys/kernel/dmabuf/buffers. It allows statistics to be enabled for each DMA-BUF in sysfs by enabling the config CONFIG_DMABUF_SYSFS_STATS. The following stats will be exposed by the interface: /sys/kernel/dmabuf/buffers/<inode_number>/exporter_name /sys/kernel/dmabuf/buffers/<inode_number>/size /sys/kernel/dmabuf/buffers/<inode_number>/attachments/<attach_uid>/device /sys/kernel/dmabuf/buffers/<inode_number>/attachments/<attach_uid>/map_counter The inode_number is unique for each DMA-BUF and was added earlier [1] in order to allow userspace to track DMA-BUF usage across different processes. Use Cases ========= The interface provides a way to gather DMA-BUF per-buffer statistics from production devices. These statistics will be used to derive DMA-BUF per-exporter stats and per-device usage stats for Android Bug reports. The corresponding userspace changes can be found at [2]. Telemetry tools will also capture this information(along with other memory metrics) periodically as well as on important events like a foreground app kill (which might have been triggered by Low Memory Killer). It will also contribute to provide a snapshot of the system memory usage on other events such as OOM kills and Application Not Responding events. Background ========== Currently, there are two existing interfaces that provide information about DMA-BUFs. 1) /sys/kernel/debug/dma_buf/bufinfo debugfs is however unsuitable to be mounted in production systems and cannot be considered as an alternative to the sysfs interface being proposed. 2) proc/<pid>/fdinfo/<fd> The proc/<pid>/fdinfo/<fd> files expose information about DMA-BUF fds. However, the existing procfs interfaces can only provide information about the buffers for which processes hold fds or have the buffers mmapped into their address space. Since the procfs interfaces alone cannot provide a full picture of all DMA-BUFs in the system, there is the need for an alternate interface to provide this information on production systems. The patch contains the following major improvements over v1: 1) Each attachment is represented by its own directory to allow creating a symlink to the importing device and to also provide room for future expansion. 2) The number of distinct mappings of each attachment is exposed in a separate file. 3) The per-buffer statistics are now in /sys/kernel/dmabuf/buffers inorder to make the interface expandable in future. All of the improvements above are based on suggestions/feedback from Daniel Vetter and Christian König. A shell script that can be run on a classic Linux environment to read out the DMA-BUF statistics can be found at [3](suggested by John Stultz). [1]: https://lore.kernel.org/patchwork/patch/1088791/ [2]: https://android-review.googlesource.com/q/topic:%22dmabuf-sysfs%22+(status:open%20OR%20status:merged) [3]: https://android-review.googlesource.com/c/platform/system/memory/libmeminfo/+/1549734 Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Hridya Valsaraju <hridya@google.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210603214758.2955251-1-hridya@google.com
2021-06-03 21:47:51 +00:00
#ifdef CONFIG_DMABUF_SYSFS_STATS
/**
* @sysfs_entry:
*
* For exposing information about this buffer in sysfs. See also
* `DMA-BUF statistics`_ for the uapi this enables.
*/
dmabuf: Add the capability to expose DMA-BUF stats in sysfs Overview ======== The patch adds DMA-BUF statistics to /sys/kernel/dmabuf/buffers. It allows statistics to be enabled for each DMA-BUF in sysfs by enabling the config CONFIG_DMABUF_SYSFS_STATS. The following stats will be exposed by the interface: /sys/kernel/dmabuf/buffers/<inode_number>/exporter_name /sys/kernel/dmabuf/buffers/<inode_number>/size /sys/kernel/dmabuf/buffers/<inode_number>/attachments/<attach_uid>/device /sys/kernel/dmabuf/buffers/<inode_number>/attachments/<attach_uid>/map_counter The inode_number is unique for each DMA-BUF and was added earlier [1] in order to allow userspace to track DMA-BUF usage across different processes. Use Cases ========= The interface provides a way to gather DMA-BUF per-buffer statistics from production devices. These statistics will be used to derive DMA-BUF per-exporter stats and per-device usage stats for Android Bug reports. The corresponding userspace changes can be found at [2]. Telemetry tools will also capture this information(along with other memory metrics) periodically as well as on important events like a foreground app kill (which might have been triggered by Low Memory Killer). It will also contribute to provide a snapshot of the system memory usage on other events such as OOM kills and Application Not Responding events. Background ========== Currently, there are two existing interfaces that provide information about DMA-BUFs. 1) /sys/kernel/debug/dma_buf/bufinfo debugfs is however unsuitable to be mounted in production systems and cannot be considered as an alternative to the sysfs interface being proposed. 2) proc/<pid>/fdinfo/<fd> The proc/<pid>/fdinfo/<fd> files expose information about DMA-BUF fds. However, the existing procfs interfaces can only provide information about the buffers for which processes hold fds or have the buffers mmapped into their address space. Since the procfs interfaces alone cannot provide a full picture of all DMA-BUFs in the system, there is the need for an alternate interface to provide this information on production systems. The patch contains the following major improvements over v1: 1) Each attachment is represented by its own directory to allow creating a symlink to the importing device and to also provide room for future expansion. 2) The number of distinct mappings of each attachment is exposed in a separate file. 3) The per-buffer statistics are now in /sys/kernel/dmabuf/buffers inorder to make the interface expandable in future. All of the improvements above are based on suggestions/feedback from Daniel Vetter and Christian König. A shell script that can be run on a classic Linux environment to read out the DMA-BUF statistics can be found at [3](suggested by John Stultz). [1]: https://lore.kernel.org/patchwork/patch/1088791/ [2]: https://android-review.googlesource.com/q/topic:%22dmabuf-sysfs%22+(status:open%20OR%20status:merged) [3]: https://android-review.googlesource.com/c/platform/system/memory/libmeminfo/+/1549734 Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Hridya Valsaraju <hridya@google.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210603214758.2955251-1-hridya@google.com
2021-06-03 21:47:51 +00:00
struct dma_buf_sysfs_entry {
struct kobject kobj;
struct dma_buf *dmabuf;
} *sysfs_entry;
#endif
dma-buf: Introduce dma buffer sharing mechanism This is the first step in defining a dma buffer sharing mechanism. A new buffer object dma_buf is added, with operations and API to allow easy sharing of this buffer object across devices. The framework allows: - creation of a buffer object, its association with a file pointer, and associated allocator-defined operations on that buffer. This operation is called the 'export' operation. - different devices to 'attach' themselves to this exported buffer object, to facilitate backing storage negotiation, using dma_buf_attach() API. - the exported buffer object to be shared with the other entity by asking for its 'file-descriptor (fd)', and sharing the fd across. - a received fd to get the buffer object back, where it can be accessed using the associated exporter-defined operations. - the exporter and user to share the scatterlist associated with this buffer object using map_dma_buf and unmap_dma_buf operations. Atleast one 'attach()' call is required to be made prior to calling the map_dma_buf() operation. Couple of building blocks in map_dma_buf() are added to ease introduction of sync'ing across exporter and users, and late allocation by the exporter. For this first version, this framework will work with certain conditions: - *ONLY* exporter will be allowed to mmap to userspace (outside of this framework - mmap is not a buffer object operation), - currently, *ONLY* users that do not need CPU access to the buffer are allowed. More details are there in the documentation patch. This is based on design suggestions from many people at the mini-summits[1], most notably from Arnd Bergmann <arnd@arndb.de>, Rob Clark <rob@ti.com> and Daniel Vetter <daniel@ffwll.ch>. The implementation is inspired from proof-of-concept patch-set from Tomasz Stanislawski <t.stanislaws@samsung.com>, who demonstrated buffer sharing between two v4l2 devices. [2] [1]: https://wiki.linaro.org/OfficeofCTO/MemoryManagement [2]: http://lwn.net/Articles/454389 Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Sumit Semwal <sumit.semwal@ti.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-and-Tested-by: Rob Clark <rob.clark@linaro.org> Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-12-26 09:23:15 +00:00
};
dma-buf: add dynamic DMA-buf handling v15 On the exporter side we add optional explicit pinning callbacks. Which are called when the importer doesn't implement dynamic handling, move notification or need the DMA-buf locked in place for its use case. On the importer side we add an optional move_notify callback. This callback is used by the exporter to inform the importers that their mappings should be destroyed as soon as possible. This allows the exporter to provide the mappings without the need to pin the backing store. v2: don't try to invalidate mappings when the callback is NULL, lock the reservation obj while using the attachments, add helper to set the callback v3: move flag for invalidation support into the DMA-buf, use new attach_info structure to set the callback v4: use importer_priv field instead of mangling exporter priv. v5: drop invalidation_supported flag v6: squash together with pin/unpin changes v7: pin/unpin takes an attachment now v8: nuke dma_buf_attachment_(map|unmap)_locked, everything is now handled backward compatible v9: always cache when export/importer don't agree on dynamic handling v10: minimal style cleanup v11: drop automatically re-entry avoidance v12: rename callback to move_notify v13: add might_lock in appropriate places v14: rebase on separated locking change v15: add EXPERIMENTAL flag, some more code comments Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/353993/?series=73646&rev=1
2018-07-03 14:42:26 +00:00
/**
* struct dma_buf_attach_ops - importer operations for an attachment
*
* Attachment operations implemented by the importer.
*/
struct dma_buf_attach_ops {
/**
* @allow_peer2peer:
*
* If this is set to true the importer must be able to handle peer
* resources without struct pages.
*/
bool allow_peer2peer;
dma-buf: add dynamic DMA-buf handling v15 On the exporter side we add optional explicit pinning callbacks. Which are called when the importer doesn't implement dynamic handling, move notification or need the DMA-buf locked in place for its use case. On the importer side we add an optional move_notify callback. This callback is used by the exporter to inform the importers that their mappings should be destroyed as soon as possible. This allows the exporter to provide the mappings without the need to pin the backing store. v2: don't try to invalidate mappings when the callback is NULL, lock the reservation obj while using the attachments, add helper to set the callback v3: move flag for invalidation support into the DMA-buf, use new attach_info structure to set the callback v4: use importer_priv field instead of mangling exporter priv. v5: drop invalidation_supported flag v6: squash together with pin/unpin changes v7: pin/unpin takes an attachment now v8: nuke dma_buf_attachment_(map|unmap)_locked, everything is now handled backward compatible v9: always cache when export/importer don't agree on dynamic handling v10: minimal style cleanup v11: drop automatically re-entry avoidance v12: rename callback to move_notify v13: add might_lock in appropriate places v14: rebase on separated locking change v15: add EXPERIMENTAL flag, some more code comments Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/353993/?series=73646&rev=1
2018-07-03 14:42:26 +00:00
/**
* @move_notify: [optional] notification that the DMA-buf is moving
dma-buf: add dynamic DMA-buf handling v15 On the exporter side we add optional explicit pinning callbacks. Which are called when the importer doesn't implement dynamic handling, move notification or need the DMA-buf locked in place for its use case. On the importer side we add an optional move_notify callback. This callback is used by the exporter to inform the importers that their mappings should be destroyed as soon as possible. This allows the exporter to provide the mappings without the need to pin the backing store. v2: don't try to invalidate mappings when the callback is NULL, lock the reservation obj while using the attachments, add helper to set the callback v3: move flag for invalidation support into the DMA-buf, use new attach_info structure to set the callback v4: use importer_priv field instead of mangling exporter priv. v5: drop invalidation_supported flag v6: squash together with pin/unpin changes v7: pin/unpin takes an attachment now v8: nuke dma_buf_attachment_(map|unmap)_locked, everything is now handled backward compatible v9: always cache when export/importer don't agree on dynamic handling v10: minimal style cleanup v11: drop automatically re-entry avoidance v12: rename callback to move_notify v13: add might_lock in appropriate places v14: rebase on separated locking change v15: add EXPERIMENTAL flag, some more code comments Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/353993/?series=73646&rev=1
2018-07-03 14:42:26 +00:00
*
* If this callback is provided the framework can avoid pinning the
* backing store while mappings exists.
*
* This callback is called with the lock of the reservation object
* associated with the dma_buf held and the mapping function must be
* called with this lock held as well. This makes sure that no mapping
* is created concurrently with an ongoing move operation.
*
* Mappings stay valid and are not directly affected by this callback.
* But the DMA-buf can now be in a different physical location, so all
* mappings should be destroyed and re-created as soon as possible.
*
* New mappings can be created after this callback returns, and will
* point to the new location of the DMA-buf.
*/
void (*move_notify)(struct dma_buf_attachment *attach);
};
dma-buf: Introduce dma buffer sharing mechanism This is the first step in defining a dma buffer sharing mechanism. A new buffer object dma_buf is added, with operations and API to allow easy sharing of this buffer object across devices. The framework allows: - creation of a buffer object, its association with a file pointer, and associated allocator-defined operations on that buffer. This operation is called the 'export' operation. - different devices to 'attach' themselves to this exported buffer object, to facilitate backing storage negotiation, using dma_buf_attach() API. - the exported buffer object to be shared with the other entity by asking for its 'file-descriptor (fd)', and sharing the fd across. - a received fd to get the buffer object back, where it can be accessed using the associated exporter-defined operations. - the exporter and user to share the scatterlist associated with this buffer object using map_dma_buf and unmap_dma_buf operations. Atleast one 'attach()' call is required to be made prior to calling the map_dma_buf() operation. Couple of building blocks in map_dma_buf() are added to ease introduction of sync'ing across exporter and users, and late allocation by the exporter. For this first version, this framework will work with certain conditions: - *ONLY* exporter will be allowed to mmap to userspace (outside of this framework - mmap is not a buffer object operation), - currently, *ONLY* users that do not need CPU access to the buffer are allowed. More details are there in the documentation patch. This is based on design suggestions from many people at the mini-summits[1], most notably from Arnd Bergmann <arnd@arndb.de>, Rob Clark <rob@ti.com> and Daniel Vetter <daniel@ffwll.ch>. The implementation is inspired from proof-of-concept patch-set from Tomasz Stanislawski <t.stanislaws@samsung.com>, who demonstrated buffer sharing between two v4l2 devices. [2] [1]: https://wiki.linaro.org/OfficeofCTO/MemoryManagement [2]: http://lwn.net/Articles/454389 Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Sumit Semwal <sumit.semwal@ti.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-and-Tested-by: Rob Clark <rob.clark@linaro.org> Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-12-26 09:23:15 +00:00
/**
* struct dma_buf_attachment - holds device-buffer attachment data
* @dmabuf: buffer for this attachment.
* @dev: device attached to the buffer.
dma-buf: change DMA-buf locking convention v3 This patch is a stripped down version of the locking changes necessary to support dynamic DMA-buf handling. It adds a dynamic flag for both importers as well as exporters so that drivers can choose if they want the reservation object locked or unlocked during mapping of attachments. For compatibility between drivers we cache the DMA-buf mapping during attaching an importer as soon as exporter/importer disagree on the dynamic handling. Issues and solutions we considered: - We can't change all existing drivers, and existing improters have strong opinions about which locks they're holding while calling dma_buf_attachment_map/unmap. Exporters also have strong opinions about which locks they can acquire in their ->map/unmap callbacks, levaing no room for change. The solution to avoid this was to move the actual map/unmap out from this call, into the attach/detach callbacks, and cache the mapping. This works because drivers don't call attach/detach from deep within their code callchains (like deep in memory management code called from cs/execbuf ioctl), but directly from the fd2handle implementation. - The caching has some troubles on some soc drivers, which set other modes than DMA_BIDIRECTIONAL. We can't have 2 incompatible mappings, and we can't re-create the mapping at _map time due to the above locking fun. We very carefuly step around that by only caching at attach time if the dynamic mode between importer/expoert mismatches. - There's been quite some discussion on dma-buf mappings which need active cache management, which would all break down when caching, plus we don't have explicit flush operations on the attachment side. The solution to this was to shrug and keep the current discrepancy between what the dma-buf docs claim and what implementations do, with the hope that the begin/end_cpu_access hooks are good enough and that all necessary flushing to keep device mappings consistent will be done there. v2: cleanup set_name merge, improve kerneldoc v3: update commit message, kerneldoc and cleanup _debug_show() Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/336788/
2018-07-03 14:42:26 +00:00
* @node: list of dma_buf_attachment, protected by dma_resv lock of the dmabuf.
* @sgt: cached mapping.
* @dir: direction of cached mapping.
* @peer2peer: true if the importer can handle peer resources without pages.
dma-buf: Introduce dma buffer sharing mechanism This is the first step in defining a dma buffer sharing mechanism. A new buffer object dma_buf is added, with operations and API to allow easy sharing of this buffer object across devices. The framework allows: - creation of a buffer object, its association with a file pointer, and associated allocator-defined operations on that buffer. This operation is called the 'export' operation. - different devices to 'attach' themselves to this exported buffer object, to facilitate backing storage negotiation, using dma_buf_attach() API. - the exported buffer object to be shared with the other entity by asking for its 'file-descriptor (fd)', and sharing the fd across. - a received fd to get the buffer object back, where it can be accessed using the associated exporter-defined operations. - the exporter and user to share the scatterlist associated with this buffer object using map_dma_buf and unmap_dma_buf operations. Atleast one 'attach()' call is required to be made prior to calling the map_dma_buf() operation. Couple of building blocks in map_dma_buf() are added to ease introduction of sync'ing across exporter and users, and late allocation by the exporter. For this first version, this framework will work with certain conditions: - *ONLY* exporter will be allowed to mmap to userspace (outside of this framework - mmap is not a buffer object operation), - currently, *ONLY* users that do not need CPU access to the buffer are allowed. More details are there in the documentation patch. This is based on design suggestions from many people at the mini-summits[1], most notably from Arnd Bergmann <arnd@arndb.de>, Rob Clark <rob@ti.com> and Daniel Vetter <daniel@ffwll.ch>. The implementation is inspired from proof-of-concept patch-set from Tomasz Stanislawski <t.stanislaws@samsung.com>, who demonstrated buffer sharing between two v4l2 devices. [2] [1]: https://wiki.linaro.org/OfficeofCTO/MemoryManagement [2]: http://lwn.net/Articles/454389 Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Sumit Semwal <sumit.semwal@ti.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-and-Tested-by: Rob Clark <rob.clark@linaro.org> Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-12-26 09:23:15 +00:00
* @priv: exporter specific attachment data.
dma-buf: add dynamic DMA-buf handling v15 On the exporter side we add optional explicit pinning callbacks. Which are called when the importer doesn't implement dynamic handling, move notification or need the DMA-buf locked in place for its use case. On the importer side we add an optional move_notify callback. This callback is used by the exporter to inform the importers that their mappings should be destroyed as soon as possible. This allows the exporter to provide the mappings without the need to pin the backing store. v2: don't try to invalidate mappings when the callback is NULL, lock the reservation obj while using the attachments, add helper to set the callback v3: move flag for invalidation support into the DMA-buf, use new attach_info structure to set the callback v4: use importer_priv field instead of mangling exporter priv. v5: drop invalidation_supported flag v6: squash together with pin/unpin changes v7: pin/unpin takes an attachment now v8: nuke dma_buf_attachment_(map|unmap)_locked, everything is now handled backward compatible v9: always cache when export/importer don't agree on dynamic handling v10: minimal style cleanup v11: drop automatically re-entry avoidance v12: rename callback to move_notify v13: add might_lock in appropriate places v14: rebase on separated locking change v15: add EXPERIMENTAL flag, some more code comments Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/353993/?series=73646&rev=1
2018-07-03 14:42:26 +00:00
* @importer_ops: importer operations for this attachment, if provided
* dma_buf_map/unmap_attachment() must be called with the dma_resv lock held.
* @importer_priv: importer specific attachment data.
dma-buf: Introduce dma buffer sharing mechanism This is the first step in defining a dma buffer sharing mechanism. A new buffer object dma_buf is added, with operations and API to allow easy sharing of this buffer object across devices. The framework allows: - creation of a buffer object, its association with a file pointer, and associated allocator-defined operations on that buffer. This operation is called the 'export' operation. - different devices to 'attach' themselves to this exported buffer object, to facilitate backing storage negotiation, using dma_buf_attach() API. - the exported buffer object to be shared with the other entity by asking for its 'file-descriptor (fd)', and sharing the fd across. - a received fd to get the buffer object back, where it can be accessed using the associated exporter-defined operations. - the exporter and user to share the scatterlist associated with this buffer object using map_dma_buf and unmap_dma_buf operations. Atleast one 'attach()' call is required to be made prior to calling the map_dma_buf() operation. Couple of building blocks in map_dma_buf() are added to ease introduction of sync'ing across exporter and users, and late allocation by the exporter. For this first version, this framework will work with certain conditions: - *ONLY* exporter will be allowed to mmap to userspace (outside of this framework - mmap is not a buffer object operation), - currently, *ONLY* users that do not need CPU access to the buffer are allowed. More details are there in the documentation patch. This is based on design suggestions from many people at the mini-summits[1], most notably from Arnd Bergmann <arnd@arndb.de>, Rob Clark <rob@ti.com> and Daniel Vetter <daniel@ffwll.ch>. The implementation is inspired from proof-of-concept patch-set from Tomasz Stanislawski <t.stanislaws@samsung.com>, who demonstrated buffer sharing between two v4l2 devices. [2] [1]: https://wiki.linaro.org/OfficeofCTO/MemoryManagement [2]: http://lwn.net/Articles/454389 Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Sumit Semwal <sumit.semwal@ti.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-and-Tested-by: Rob Clark <rob.clark@linaro.org> Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-12-26 09:23:15 +00:00
*
* This structure holds the attachment information between the dma_buf buffer
* and its user device(s). The list contains one attachment struct per device
* attached to the buffer.
*
* An attachment is created by calling dma_buf_attach(), and released again by
* calling dma_buf_detach(). The DMA mapping itself needed to initiate a
* transfer is created by dma_buf_map_attachment() and freed again by calling
* dma_buf_unmap_attachment().
dma-buf: Introduce dma buffer sharing mechanism This is the first step in defining a dma buffer sharing mechanism. A new buffer object dma_buf is added, with operations and API to allow easy sharing of this buffer object across devices. The framework allows: - creation of a buffer object, its association with a file pointer, and associated allocator-defined operations on that buffer. This operation is called the 'export' operation. - different devices to 'attach' themselves to this exported buffer object, to facilitate backing storage negotiation, using dma_buf_attach() API. - the exported buffer object to be shared with the other entity by asking for its 'file-descriptor (fd)', and sharing the fd across. - a received fd to get the buffer object back, where it can be accessed using the associated exporter-defined operations. - the exporter and user to share the scatterlist associated with this buffer object using map_dma_buf and unmap_dma_buf operations. Atleast one 'attach()' call is required to be made prior to calling the map_dma_buf() operation. Couple of building blocks in map_dma_buf() are added to ease introduction of sync'ing across exporter and users, and late allocation by the exporter. For this first version, this framework will work with certain conditions: - *ONLY* exporter will be allowed to mmap to userspace (outside of this framework - mmap is not a buffer object operation), - currently, *ONLY* users that do not need CPU access to the buffer are allowed. More details are there in the documentation patch. This is based on design suggestions from many people at the mini-summits[1], most notably from Arnd Bergmann <arnd@arndb.de>, Rob Clark <rob@ti.com> and Daniel Vetter <daniel@ffwll.ch>. The implementation is inspired from proof-of-concept patch-set from Tomasz Stanislawski <t.stanislaws@samsung.com>, who demonstrated buffer sharing between two v4l2 devices. [2] [1]: https://wiki.linaro.org/OfficeofCTO/MemoryManagement [2]: http://lwn.net/Articles/454389 Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Sumit Semwal <sumit.semwal@ti.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-and-Tested-by: Rob Clark <rob.clark@linaro.org> Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-12-26 09:23:15 +00:00
*/
struct dma_buf_attachment {
struct dma_buf *dmabuf;
struct device *dev;
struct list_head node;
struct sg_table *sgt;
enum dma_data_direction dir;
bool peer2peer;
dma-buf: add dynamic DMA-buf handling v15 On the exporter side we add optional explicit pinning callbacks. Which are called when the importer doesn't implement dynamic handling, move notification or need the DMA-buf locked in place for its use case. On the importer side we add an optional move_notify callback. This callback is used by the exporter to inform the importers that their mappings should be destroyed as soon as possible. This allows the exporter to provide the mappings without the need to pin the backing store. v2: don't try to invalidate mappings when the callback is NULL, lock the reservation obj while using the attachments, add helper to set the callback v3: move flag for invalidation support into the DMA-buf, use new attach_info structure to set the callback v4: use importer_priv field instead of mangling exporter priv. v5: drop invalidation_supported flag v6: squash together with pin/unpin changes v7: pin/unpin takes an attachment now v8: nuke dma_buf_attachment_(map|unmap)_locked, everything is now handled backward compatible v9: always cache when export/importer don't agree on dynamic handling v10: minimal style cleanup v11: drop automatically re-entry avoidance v12: rename callback to move_notify v13: add might_lock in appropriate places v14: rebase on separated locking change v15: add EXPERIMENTAL flag, some more code comments Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/353993/?series=73646&rev=1
2018-07-03 14:42:26 +00:00
const struct dma_buf_attach_ops *importer_ops;
void *importer_priv;
dma-buf: Introduce dma buffer sharing mechanism This is the first step in defining a dma buffer sharing mechanism. A new buffer object dma_buf is added, with operations and API to allow easy sharing of this buffer object across devices. The framework allows: - creation of a buffer object, its association with a file pointer, and associated allocator-defined operations on that buffer. This operation is called the 'export' operation. - different devices to 'attach' themselves to this exported buffer object, to facilitate backing storage negotiation, using dma_buf_attach() API. - the exported buffer object to be shared with the other entity by asking for its 'file-descriptor (fd)', and sharing the fd across. - a received fd to get the buffer object back, where it can be accessed using the associated exporter-defined operations. - the exporter and user to share the scatterlist associated with this buffer object using map_dma_buf and unmap_dma_buf operations. Atleast one 'attach()' call is required to be made prior to calling the map_dma_buf() operation. Couple of building blocks in map_dma_buf() are added to ease introduction of sync'ing across exporter and users, and late allocation by the exporter. For this first version, this framework will work with certain conditions: - *ONLY* exporter will be allowed to mmap to userspace (outside of this framework - mmap is not a buffer object operation), - currently, *ONLY* users that do not need CPU access to the buffer are allowed. More details are there in the documentation patch. This is based on design suggestions from many people at the mini-summits[1], most notably from Arnd Bergmann <arnd@arndb.de>, Rob Clark <rob@ti.com> and Daniel Vetter <daniel@ffwll.ch>. The implementation is inspired from proof-of-concept patch-set from Tomasz Stanislawski <t.stanislaws@samsung.com>, who demonstrated buffer sharing between two v4l2 devices. [2] [1]: https://wiki.linaro.org/OfficeofCTO/MemoryManagement [2]: http://lwn.net/Articles/454389 Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Sumit Semwal <sumit.semwal@ti.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-and-Tested-by: Rob Clark <rob.clark@linaro.org> Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-12-26 09:23:15 +00:00
void *priv;
};
/**
* struct dma_buf_export_info - holds information needed to export a dma_buf
* @exp_name: name of the exporter - useful for debugging.
* @owner: pointer to exporter module - used for refcounting kernel module
* @ops: Attach allocator-defined dma buf ops to the new buffer
* @size: Size of the buffer - invariant over the lifetime of the buffer
* @flags: mode flags for the file
* @resv: reservation-object, NULL to allocate default one
* @priv: Attach private data of allocator to this buffer
*
* This structure holds the information required to export the buffer. Used
* with dma_buf_export() only.
*/
struct dma_buf_export_info {
const char *exp_name;
struct module *owner;
const struct dma_buf_ops *ops;
size_t size;
int flags;
struct dma_resv *resv;
void *priv;
};
/**
* DEFINE_DMA_BUF_EXPORT_INFO - helper macro for exporters
* @name: export-info name
*
* DEFINE_DMA_BUF_EXPORT_INFO macro defines the &struct dma_buf_export_info,
* zeroes it out and pre-populates exp_name in it.
*/
#define DEFINE_DMA_BUF_EXPORT_INFO(name) \
struct dma_buf_export_info name = { .exp_name = KBUILD_MODNAME, \
.owner = THIS_MODULE }
/**
* get_dma_buf - convenience wrapper for get_file.
* @dmabuf: [in] pointer to dma_buf
*
* Increments the reference count on the dma-buf, needed in case of drivers
* that either need to create additional references to the dmabuf on the
* kernel side. For example, an exporter that needs to keep a dmabuf ptr
* so that subsequent exports don't create a new dmabuf.
*/
static inline void get_dma_buf(struct dma_buf *dmabuf)
{
get_file(dmabuf->file);
}
dma-buf: change DMA-buf locking convention v3 This patch is a stripped down version of the locking changes necessary to support dynamic DMA-buf handling. It adds a dynamic flag for both importers as well as exporters so that drivers can choose if they want the reservation object locked or unlocked during mapping of attachments. For compatibility between drivers we cache the DMA-buf mapping during attaching an importer as soon as exporter/importer disagree on the dynamic handling. Issues and solutions we considered: - We can't change all existing drivers, and existing improters have strong opinions about which locks they're holding while calling dma_buf_attachment_map/unmap. Exporters also have strong opinions about which locks they can acquire in their ->map/unmap callbacks, levaing no room for change. The solution to avoid this was to move the actual map/unmap out from this call, into the attach/detach callbacks, and cache the mapping. This works because drivers don't call attach/detach from deep within their code callchains (like deep in memory management code called from cs/execbuf ioctl), but directly from the fd2handle implementation. - The caching has some troubles on some soc drivers, which set other modes than DMA_BIDIRECTIONAL. We can't have 2 incompatible mappings, and we can't re-create the mapping at _map time due to the above locking fun. We very carefuly step around that by only caching at attach time if the dynamic mode between importer/expoert mismatches. - There's been quite some discussion on dma-buf mappings which need active cache management, which would all break down when caching, plus we don't have explicit flush operations on the attachment side. The solution to this was to shrug and keep the current discrepancy between what the dma-buf docs claim and what implementations do, with the hope that the begin/end_cpu_access hooks are good enough and that all necessary flushing to keep device mappings consistent will be done there. v2: cleanup set_name merge, improve kerneldoc v3: update commit message, kerneldoc and cleanup _debug_show() Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/336788/
2018-07-03 14:42:26 +00:00
/**
* dma_buf_is_dynamic - check if a DMA-buf uses dynamic mappings.
* @dmabuf: the DMA-buf to check
*
* Returns true if a DMA-buf exporter wants to be called with the dma_resv
* locked for the map/unmap callbacks, false if it doesn't wants to be called
* with the lock held.
*/
static inline bool dma_buf_is_dynamic(struct dma_buf *dmabuf)
{
return !!dmabuf->ops->pin;
dma-buf: change DMA-buf locking convention v3 This patch is a stripped down version of the locking changes necessary to support dynamic DMA-buf handling. It adds a dynamic flag for both importers as well as exporters so that drivers can choose if they want the reservation object locked or unlocked during mapping of attachments. For compatibility between drivers we cache the DMA-buf mapping during attaching an importer as soon as exporter/importer disagree on the dynamic handling. Issues and solutions we considered: - We can't change all existing drivers, and existing improters have strong opinions about which locks they're holding while calling dma_buf_attachment_map/unmap. Exporters also have strong opinions about which locks they can acquire in their ->map/unmap callbacks, levaing no room for change. The solution to avoid this was to move the actual map/unmap out from this call, into the attach/detach callbacks, and cache the mapping. This works because drivers don't call attach/detach from deep within their code callchains (like deep in memory management code called from cs/execbuf ioctl), but directly from the fd2handle implementation. - The caching has some troubles on some soc drivers, which set other modes than DMA_BIDIRECTIONAL. We can't have 2 incompatible mappings, and we can't re-create the mapping at _map time due to the above locking fun. We very carefuly step around that by only caching at attach time if the dynamic mode between importer/expoert mismatches. - There's been quite some discussion on dma-buf mappings which need active cache management, which would all break down when caching, plus we don't have explicit flush operations on the attachment side. The solution to this was to shrug and keep the current discrepancy between what the dma-buf docs claim and what implementations do, with the hope that the begin/end_cpu_access hooks are good enough and that all necessary flushing to keep device mappings consistent will be done there. v2: cleanup set_name merge, improve kerneldoc v3: update commit message, kerneldoc and cleanup _debug_show() Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/336788/
2018-07-03 14:42:26 +00:00
}
/**
* dma_buf_attachment_is_dynamic - check if a DMA-buf attachment uses dynamic
* mappings
dma-buf: change DMA-buf locking convention v3 This patch is a stripped down version of the locking changes necessary to support dynamic DMA-buf handling. It adds a dynamic flag for both importers as well as exporters so that drivers can choose if they want the reservation object locked or unlocked during mapping of attachments. For compatibility between drivers we cache the DMA-buf mapping during attaching an importer as soon as exporter/importer disagree on the dynamic handling. Issues and solutions we considered: - We can't change all existing drivers, and existing improters have strong opinions about which locks they're holding while calling dma_buf_attachment_map/unmap. Exporters also have strong opinions about which locks they can acquire in their ->map/unmap callbacks, levaing no room for change. The solution to avoid this was to move the actual map/unmap out from this call, into the attach/detach callbacks, and cache the mapping. This works because drivers don't call attach/detach from deep within their code callchains (like deep in memory management code called from cs/execbuf ioctl), but directly from the fd2handle implementation. - The caching has some troubles on some soc drivers, which set other modes than DMA_BIDIRECTIONAL. We can't have 2 incompatible mappings, and we can't re-create the mapping at _map time due to the above locking fun. We very carefuly step around that by only caching at attach time if the dynamic mode between importer/expoert mismatches. - There's been quite some discussion on dma-buf mappings which need active cache management, which would all break down when caching, plus we don't have explicit flush operations on the attachment side. The solution to this was to shrug and keep the current discrepancy between what the dma-buf docs claim and what implementations do, with the hope that the begin/end_cpu_access hooks are good enough and that all necessary flushing to keep device mappings consistent will be done there. v2: cleanup set_name merge, improve kerneldoc v3: update commit message, kerneldoc and cleanup _debug_show() Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/336788/
2018-07-03 14:42:26 +00:00
* @attach: the DMA-buf attachment to check
*
* Returns true if a DMA-buf importer wants to call the map/unmap functions with
* the dma_resv lock held.
*/
static inline bool
dma_buf_attachment_is_dynamic(struct dma_buf_attachment *attach)
{
dma-buf: add dynamic DMA-buf handling v15 On the exporter side we add optional explicit pinning callbacks. Which are called when the importer doesn't implement dynamic handling, move notification or need the DMA-buf locked in place for its use case. On the importer side we add an optional move_notify callback. This callback is used by the exporter to inform the importers that their mappings should be destroyed as soon as possible. This allows the exporter to provide the mappings without the need to pin the backing store. v2: don't try to invalidate mappings when the callback is NULL, lock the reservation obj while using the attachments, add helper to set the callback v3: move flag for invalidation support into the DMA-buf, use new attach_info structure to set the callback v4: use importer_priv field instead of mangling exporter priv. v5: drop invalidation_supported flag v6: squash together with pin/unpin changes v7: pin/unpin takes an attachment now v8: nuke dma_buf_attachment_(map|unmap)_locked, everything is now handled backward compatible v9: always cache when export/importer don't agree on dynamic handling v10: minimal style cleanup v11: drop automatically re-entry avoidance v12: rename callback to move_notify v13: add might_lock in appropriate places v14: rebase on separated locking change v15: add EXPERIMENTAL flag, some more code comments Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/353993/?series=73646&rev=1
2018-07-03 14:42:26 +00:00
return !!attach->importer_ops;
dma-buf: change DMA-buf locking convention v3 This patch is a stripped down version of the locking changes necessary to support dynamic DMA-buf handling. It adds a dynamic flag for both importers as well as exporters so that drivers can choose if they want the reservation object locked or unlocked during mapping of attachments. For compatibility between drivers we cache the DMA-buf mapping during attaching an importer as soon as exporter/importer disagree on the dynamic handling. Issues and solutions we considered: - We can't change all existing drivers, and existing improters have strong opinions about which locks they're holding while calling dma_buf_attachment_map/unmap. Exporters also have strong opinions about which locks they can acquire in their ->map/unmap callbacks, levaing no room for change. The solution to avoid this was to move the actual map/unmap out from this call, into the attach/detach callbacks, and cache the mapping. This works because drivers don't call attach/detach from deep within their code callchains (like deep in memory management code called from cs/execbuf ioctl), but directly from the fd2handle implementation. - The caching has some troubles on some soc drivers, which set other modes than DMA_BIDIRECTIONAL. We can't have 2 incompatible mappings, and we can't re-create the mapping at _map time due to the above locking fun. We very carefuly step around that by only caching at attach time if the dynamic mode between importer/expoert mismatches. - There's been quite some discussion on dma-buf mappings which need active cache management, which would all break down when caching, plus we don't have explicit flush operations on the attachment side. The solution to this was to shrug and keep the current discrepancy between what the dma-buf docs claim and what implementations do, with the hope that the begin/end_cpu_access hooks are good enough and that all necessary flushing to keep device mappings consistent will be done there. v2: cleanup set_name merge, improve kerneldoc v3: update commit message, kerneldoc and cleanup _debug_show() Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/336788/
2018-07-03 14:42:26 +00:00
}
dma-buf: Introduce dma buffer sharing mechanism This is the first step in defining a dma buffer sharing mechanism. A new buffer object dma_buf is added, with operations and API to allow easy sharing of this buffer object across devices. The framework allows: - creation of a buffer object, its association with a file pointer, and associated allocator-defined operations on that buffer. This operation is called the 'export' operation. - different devices to 'attach' themselves to this exported buffer object, to facilitate backing storage negotiation, using dma_buf_attach() API. - the exported buffer object to be shared with the other entity by asking for its 'file-descriptor (fd)', and sharing the fd across. - a received fd to get the buffer object back, where it can be accessed using the associated exporter-defined operations. - the exporter and user to share the scatterlist associated with this buffer object using map_dma_buf and unmap_dma_buf operations. Atleast one 'attach()' call is required to be made prior to calling the map_dma_buf() operation. Couple of building blocks in map_dma_buf() are added to ease introduction of sync'ing across exporter and users, and late allocation by the exporter. For this first version, this framework will work with certain conditions: - *ONLY* exporter will be allowed to mmap to userspace (outside of this framework - mmap is not a buffer object operation), - currently, *ONLY* users that do not need CPU access to the buffer are allowed. More details are there in the documentation patch. This is based on design suggestions from many people at the mini-summits[1], most notably from Arnd Bergmann <arnd@arndb.de>, Rob Clark <rob@ti.com> and Daniel Vetter <daniel@ffwll.ch>. The implementation is inspired from proof-of-concept patch-set from Tomasz Stanislawski <t.stanislaws@samsung.com>, who demonstrated buffer sharing between two v4l2 devices. [2] [1]: https://wiki.linaro.org/OfficeofCTO/MemoryManagement [2]: http://lwn.net/Articles/454389 Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Sumit Semwal <sumit.semwal@ti.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-and-Tested-by: Rob Clark <rob.clark@linaro.org> Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-12-26 09:23:15 +00:00
struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
dma-buf: change DMA-buf locking convention v3 This patch is a stripped down version of the locking changes necessary to support dynamic DMA-buf handling. It adds a dynamic flag for both importers as well as exporters so that drivers can choose if they want the reservation object locked or unlocked during mapping of attachments. For compatibility between drivers we cache the DMA-buf mapping during attaching an importer as soon as exporter/importer disagree on the dynamic handling. Issues and solutions we considered: - We can't change all existing drivers, and existing improters have strong opinions about which locks they're holding while calling dma_buf_attachment_map/unmap. Exporters also have strong opinions about which locks they can acquire in their ->map/unmap callbacks, levaing no room for change. The solution to avoid this was to move the actual map/unmap out from this call, into the attach/detach callbacks, and cache the mapping. This works because drivers don't call attach/detach from deep within their code callchains (like deep in memory management code called from cs/execbuf ioctl), but directly from the fd2handle implementation. - The caching has some troubles on some soc drivers, which set other modes than DMA_BIDIRECTIONAL. We can't have 2 incompatible mappings, and we can't re-create the mapping at _map time due to the above locking fun. We very carefuly step around that by only caching at attach time if the dynamic mode between importer/expoert mismatches. - There's been quite some discussion on dma-buf mappings which need active cache management, which would all break down when caching, plus we don't have explicit flush operations on the attachment side. The solution to this was to shrug and keep the current discrepancy between what the dma-buf docs claim and what implementations do, with the hope that the begin/end_cpu_access hooks are good enough and that all necessary flushing to keep device mappings consistent will be done there. v2: cleanup set_name merge, improve kerneldoc v3: update commit message, kerneldoc and cleanup _debug_show() Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/336788/
2018-07-03 14:42:26 +00:00
struct device *dev);
struct dma_buf_attachment *
dma_buf_dynamic_attach(struct dma_buf *dmabuf, struct device *dev,
dma-buf: add dynamic DMA-buf handling v15 On the exporter side we add optional explicit pinning callbacks. Which are called when the importer doesn't implement dynamic handling, move notification or need the DMA-buf locked in place for its use case. On the importer side we add an optional move_notify callback. This callback is used by the exporter to inform the importers that their mappings should be destroyed as soon as possible. This allows the exporter to provide the mappings without the need to pin the backing store. v2: don't try to invalidate mappings when the callback is NULL, lock the reservation obj while using the attachments, add helper to set the callback v3: move flag for invalidation support into the DMA-buf, use new attach_info structure to set the callback v4: use importer_priv field instead of mangling exporter priv. v5: drop invalidation_supported flag v6: squash together with pin/unpin changes v7: pin/unpin takes an attachment now v8: nuke dma_buf_attachment_(map|unmap)_locked, everything is now handled backward compatible v9: always cache when export/importer don't agree on dynamic handling v10: minimal style cleanup v11: drop automatically re-entry avoidance v12: rename callback to move_notify v13: add might_lock in appropriate places v14: rebase on separated locking change v15: add EXPERIMENTAL flag, some more code comments Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/353993/?series=73646&rev=1
2018-07-03 14:42:26 +00:00
const struct dma_buf_attach_ops *importer_ops,
void *importer_priv);
dma-buf: Introduce dma buffer sharing mechanism This is the first step in defining a dma buffer sharing mechanism. A new buffer object dma_buf is added, with operations and API to allow easy sharing of this buffer object across devices. The framework allows: - creation of a buffer object, its association with a file pointer, and associated allocator-defined operations on that buffer. This operation is called the 'export' operation. - different devices to 'attach' themselves to this exported buffer object, to facilitate backing storage negotiation, using dma_buf_attach() API. - the exported buffer object to be shared with the other entity by asking for its 'file-descriptor (fd)', and sharing the fd across. - a received fd to get the buffer object back, where it can be accessed using the associated exporter-defined operations. - the exporter and user to share the scatterlist associated with this buffer object using map_dma_buf and unmap_dma_buf operations. Atleast one 'attach()' call is required to be made prior to calling the map_dma_buf() operation. Couple of building blocks in map_dma_buf() are added to ease introduction of sync'ing across exporter and users, and late allocation by the exporter. For this first version, this framework will work with certain conditions: - *ONLY* exporter will be allowed to mmap to userspace (outside of this framework - mmap is not a buffer object operation), - currently, *ONLY* users that do not need CPU access to the buffer are allowed. More details are there in the documentation patch. This is based on design suggestions from many people at the mini-summits[1], most notably from Arnd Bergmann <arnd@arndb.de>, Rob Clark <rob@ti.com> and Daniel Vetter <daniel@ffwll.ch>. The implementation is inspired from proof-of-concept patch-set from Tomasz Stanislawski <t.stanislaws@samsung.com>, who demonstrated buffer sharing between two v4l2 devices. [2] [1]: https://wiki.linaro.org/OfficeofCTO/MemoryManagement [2]: http://lwn.net/Articles/454389 Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Sumit Semwal <sumit.semwal@ti.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-and-Tested-by: Rob Clark <rob.clark@linaro.org> Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-12-26 09:23:15 +00:00
void dma_buf_detach(struct dma_buf *dmabuf,
dma-buf: change DMA-buf locking convention v3 This patch is a stripped down version of the locking changes necessary to support dynamic DMA-buf handling. It adds a dynamic flag for both importers as well as exporters so that drivers can choose if they want the reservation object locked or unlocked during mapping of attachments. For compatibility between drivers we cache the DMA-buf mapping during attaching an importer as soon as exporter/importer disagree on the dynamic handling. Issues and solutions we considered: - We can't change all existing drivers, and existing improters have strong opinions about which locks they're holding while calling dma_buf_attachment_map/unmap. Exporters also have strong opinions about which locks they can acquire in their ->map/unmap callbacks, levaing no room for change. The solution to avoid this was to move the actual map/unmap out from this call, into the attach/detach callbacks, and cache the mapping. This works because drivers don't call attach/detach from deep within their code callchains (like deep in memory management code called from cs/execbuf ioctl), but directly from the fd2handle implementation. - The caching has some troubles on some soc drivers, which set other modes than DMA_BIDIRECTIONAL. We can't have 2 incompatible mappings, and we can't re-create the mapping at _map time due to the above locking fun. We very carefuly step around that by only caching at attach time if the dynamic mode between importer/expoert mismatches. - There's been quite some discussion on dma-buf mappings which need active cache management, which would all break down when caching, plus we don't have explicit flush operations on the attachment side. The solution to this was to shrug and keep the current discrepancy between what the dma-buf docs claim and what implementations do, with the hope that the begin/end_cpu_access hooks are good enough and that all necessary flushing to keep device mappings consistent will be done there. v2: cleanup set_name merge, improve kerneldoc v3: update commit message, kerneldoc and cleanup _debug_show() Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/336788/
2018-07-03 14:42:26 +00:00
struct dma_buf_attachment *attach);
dma-buf: add dynamic DMA-buf handling v15 On the exporter side we add optional explicit pinning callbacks. Which are called when the importer doesn't implement dynamic handling, move notification or need the DMA-buf locked in place for its use case. On the importer side we add an optional move_notify callback. This callback is used by the exporter to inform the importers that their mappings should be destroyed as soon as possible. This allows the exporter to provide the mappings without the need to pin the backing store. v2: don't try to invalidate mappings when the callback is NULL, lock the reservation obj while using the attachments, add helper to set the callback v3: move flag for invalidation support into the DMA-buf, use new attach_info structure to set the callback v4: use importer_priv field instead of mangling exporter priv. v5: drop invalidation_supported flag v6: squash together with pin/unpin changes v7: pin/unpin takes an attachment now v8: nuke dma_buf_attachment_(map|unmap)_locked, everything is now handled backward compatible v9: always cache when export/importer don't agree on dynamic handling v10: minimal style cleanup v11: drop automatically re-entry avoidance v12: rename callback to move_notify v13: add might_lock in appropriate places v14: rebase on separated locking change v15: add EXPERIMENTAL flag, some more code comments Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/353993/?series=73646&rev=1
2018-07-03 14:42:26 +00:00
int dma_buf_pin(struct dma_buf_attachment *attach);
void dma_buf_unpin(struct dma_buf_attachment *attach);
struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info);
int dma_buf_fd(struct dma_buf *dmabuf, int flags);
dma-buf: Introduce dma buffer sharing mechanism This is the first step in defining a dma buffer sharing mechanism. A new buffer object dma_buf is added, with operations and API to allow easy sharing of this buffer object across devices. The framework allows: - creation of a buffer object, its association with a file pointer, and associated allocator-defined operations on that buffer. This operation is called the 'export' operation. - different devices to 'attach' themselves to this exported buffer object, to facilitate backing storage negotiation, using dma_buf_attach() API. - the exported buffer object to be shared with the other entity by asking for its 'file-descriptor (fd)', and sharing the fd across. - a received fd to get the buffer object back, where it can be accessed using the associated exporter-defined operations. - the exporter and user to share the scatterlist associated with this buffer object using map_dma_buf and unmap_dma_buf operations. Atleast one 'attach()' call is required to be made prior to calling the map_dma_buf() operation. Couple of building blocks in map_dma_buf() are added to ease introduction of sync'ing across exporter and users, and late allocation by the exporter. For this first version, this framework will work with certain conditions: - *ONLY* exporter will be allowed to mmap to userspace (outside of this framework - mmap is not a buffer object operation), - currently, *ONLY* users that do not need CPU access to the buffer are allowed. More details are there in the documentation patch. This is based on design suggestions from many people at the mini-summits[1], most notably from Arnd Bergmann <arnd@arndb.de>, Rob Clark <rob@ti.com> and Daniel Vetter <daniel@ffwll.ch>. The implementation is inspired from proof-of-concept patch-set from Tomasz Stanislawski <t.stanislaws@samsung.com>, who demonstrated buffer sharing between two v4l2 devices. [2] [1]: https://wiki.linaro.org/OfficeofCTO/MemoryManagement [2]: http://lwn.net/Articles/454389 Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Sumit Semwal <sumit.semwal@ti.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-and-Tested-by: Rob Clark <rob.clark@linaro.org> Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-12-26 09:23:15 +00:00
struct dma_buf *dma_buf_get(int fd);
void dma_buf_put(struct dma_buf *dmabuf);
struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *,
enum dma_data_direction);
void dma_buf_unmap_attachment(struct dma_buf_attachment *, struct sg_table *,
enum dma_data_direction);
dma-buf: change DMA-buf locking convention v3 This patch is a stripped down version of the locking changes necessary to support dynamic DMA-buf handling. It adds a dynamic flag for both importers as well as exporters so that drivers can choose if they want the reservation object locked or unlocked during mapping of attachments. For compatibility between drivers we cache the DMA-buf mapping during attaching an importer as soon as exporter/importer disagree on the dynamic handling. Issues and solutions we considered: - We can't change all existing drivers, and existing improters have strong opinions about which locks they're holding while calling dma_buf_attachment_map/unmap. Exporters also have strong opinions about which locks they can acquire in their ->map/unmap callbacks, levaing no room for change. The solution to avoid this was to move the actual map/unmap out from this call, into the attach/detach callbacks, and cache the mapping. This works because drivers don't call attach/detach from deep within their code callchains (like deep in memory management code called from cs/execbuf ioctl), but directly from the fd2handle implementation. - The caching has some troubles on some soc drivers, which set other modes than DMA_BIDIRECTIONAL. We can't have 2 incompatible mappings, and we can't re-create the mapping at _map time due to the above locking fun. We very carefuly step around that by only caching at attach time if the dynamic mode between importer/expoert mismatches. - There's been quite some discussion on dma-buf mappings which need active cache management, which would all break down when caching, plus we don't have explicit flush operations on the attachment side. The solution to this was to shrug and keep the current discrepancy between what the dma-buf docs claim and what implementations do, with the hope that the begin/end_cpu_access hooks are good enough and that all necessary flushing to keep device mappings consistent will be done there. v2: cleanup set_name merge, improve kerneldoc v3: update commit message, kerneldoc and cleanup _debug_show() Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/336788/
2018-07-03 14:42:26 +00:00
void dma_buf_move_notify(struct dma_buf *dma_buf);
int dma_buf_begin_cpu_access(struct dma_buf *dma_buf,
enum dma_data_direction dir);
dma-buf, drm, ion: Propagate error code from dma_buf_start_cpu_access() Drivers, especially i915.ko, can fail during the initial migration of a dma-buf for CPU access. However, the error code from the driver was not being propagated back to ioctl and so userspace was blissfully ignorant of the failure. Rendering corruption ensues. Whilst fixing the ioctl to return the error code from dma_buf_start_cpu_access(), also do the same for dma_buf_end_cpu_access(). For most drivers, dma_buf_end_cpu_access() cannot fail. i915.ko however, as most drivers would, wants to avoid being uninterruptible (as would be required to guarrantee no failure when flushing the buffer to the device). As userspace already has to handle errors from the SYNC_IOCTL, take advantage of this to be able to restart the syscall across signals. This fixes a coherency issue for i915.ko as well as reducing the uninterruptible hold upon its BKL, the struct_mutex. Fixes commit c11e391da2a8fe973c3c2398452000bed505851e Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Thu Feb 11 20:04:51 2016 -0200 dma-buf: Add ioctls to allow userspace to flush Testcase: igt/gem_concurrent_blit/*dmabuf*interruptible Testcase: igt/prime_mmap_coherency/ioctl-errors Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tiago Vignatti <tiago.vignatti@intel.com> Cc: Stéphane Marchesin <marcheu@chromium.org> Cc: David Herrmann <dh.herrmann@gmail.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Daniel Vetter <daniel.vetter@intel.com> CC: linux-media@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Cc: linaro-mm-sig@lists.linaro.org Cc: intel-gfx@lists.freedesktop.org Cc: devel@driverdev.osuosl.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1458331359-2634-1-git-send-email-chris@chris-wilson.co.uk
2016-03-18 20:02:39 +00:00
int dma_buf_end_cpu_access(struct dma_buf *dma_buf,
enum dma_data_direction dir);
struct sg_table *
dma_buf_map_attachment_unlocked(struct dma_buf_attachment *attach,
enum dma_data_direction direction);
void dma_buf_unmap_attachment_unlocked(struct dma_buf_attachment *attach,
struct sg_table *sg_table,
enum dma_data_direction direction);
dma-buf: mmap support Compared to Rob Clark's RFC I've ditched the prepare/finish hooks and corresponding ioctls on the dma_buf file. The major reason for that is that many people seem to be under the impression that this is also for synchronization with outstanding asynchronous processsing. I'm pretty massively opposed to this because: - It boils down reinventing a new rather general-purpose userspace synchronization interface. If we look at things like futexes, this is hard to get right. - Furthermore a lot of kernel code has to interact with this synchronization primitive. This smells a look like the dri1 hw_lock, a horror show I prefer not to reinvent. - Even more fun is that multiple different subsystems would interact here, so we have plenty of opportunities to create funny deadlock scenarios. I think synchronization is a wholesale different problem from data sharing and should be tackled as an orthogonal problem. Now we could demand that prepare/finish may only ensure cache coherency (as Rob intended), but that runs up into the next problem: We not only need mmap support to facilitate sw-only processing nodes in a pipeline (without jumping through hoops by importing the dma_buf into some sw-access only importer), which allows for a nicer ION->dma-buf upgrade path for existing Android userspace. We also need mmap support for existing importing subsystems to support existing userspace libraries. And a loot of these subsystems are expected to export coherent userspace mappings. So prepare/finish can only ever be optional and the exporter /needs/ to support coherent mappings. Given that mmap access is always somewhat fallback-y in nature I've decided to drop this optimization, instead of just making it optional. If we demonstrate a clear need for this, supported by benchmark results, we can always add it in again later as an optional extension. Other differences compared to Rob's RFC is the above mentioned support for mapping a dma-buf through facilities provided by the importer. Which results in mmap support no longer being optional. Note that this dma-buf mmap patch does _not_ support every possible insanity an existing subsystem could pull of with mmap: Because it does not allow to intercept pagefaults and shoot down ptes importing subsystems can't add some magic of their own at these points (e.g. to automatically synchronize with outstanding rendering or set up some special resources). I've done a cursory read through a few mmap implementions of various subsytems and I'm hopeful that we can avoid this (and the complexity it'd bring with it). Additonally I've extended the documentation a bit to explain the hows and whys of this mmap extension. In case we ever want to add support for explicitly cache maneged userspace mmap with a prepare/finish ioctl pair, we could specify that userspace needs to mmap a different part of the dma_buf, e.g. the range starting at dma_buf->size up to dma_buf->size*2. This works because the size of a dma_buf is invariant over it's lifetime. The exporter would obviously need to fall back to coherent mappings for both ranges if a legacy clients maps the coherent range and the architecture cannot suppor conflicting caching policies. Also, this would obviously be optional and userspace needs to be able to fall back to coherent mappings. v2: - Spelling fixes from Rob Clark. - Compile fix for !DMA_BUF from Rob Clark. - Extend commit message to explain how explicitly cache managed mmap support could be added later. - Extend the documentation with implementations notes for exporters that need to manually fake coherency. v3: - dma_buf pointer initialization goof-up noticed by Rebecca Schultz Zavin. Cc: Rob Clark <rob.clark@linaro.org> Cc: Rebecca Schultz Zavin <rebecca@android.com> Acked-by: Rob Clark <rob.clark@linaro.org> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
2012-04-24 09:08:52 +00:00
int dma_buf_mmap(struct dma_buf *, struct vm_area_struct *,
unsigned long);
dma-buf-map: Rename to iosys-map Rename struct dma_buf_map to struct iosys_map and corresponding APIs. Over time dma-buf-map grew up to more functionality than the one used by dma-buf: in fact it's just a shim layer to abstract system memory, that can be accessed via regular load and store, from IO memory that needs to be acessed via arch helpers. The idea is to extend this API so it can fulfill other needs, internal to a single driver. Example: in the i915 driver it's desired to share the implementation for integrated graphics, which uses mostly system memory, with discrete graphics, which may need to access IO memory. The conversion was mostly done with the following semantic patch: @r1@ @@ - struct dma_buf_map + struct iosys_map @r2@ @@ ( - DMA_BUF_MAP_INIT_VADDR + IOSYS_MAP_INIT_VADDR | - dma_buf_map_set_vaddr + iosys_map_set_vaddr | - dma_buf_map_set_vaddr_iomem + iosys_map_set_vaddr_iomem | - dma_buf_map_is_equal + iosys_map_is_equal | - dma_buf_map_is_null + iosys_map_is_null | - dma_buf_map_is_set + iosys_map_is_set | - dma_buf_map_clear + iosys_map_clear | - dma_buf_map_memcpy_to + iosys_map_memcpy_to | - dma_buf_map_incr + iosys_map_incr ) @@ @@ - #include <linux/dma-buf-map.h> + #include <linux/iosys-map.h> Then some files had their includes adjusted and some comments were update to remove mentions to dma-buf-map. Since this is not specific to dma-buf anymore, move the documentation to the "Bus-Independent Device Accesses" section. v2: - Squash patches v3: - Fix wrong removal of dma-buf.h from MAINTAINERS - Move documentation from dma-buf.rst to device-io.rst v4: - Change documentation title and level Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Sumit Semwal <sumit.semwal@linaro.org> Acked-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20220204170541.829227-1-lucas.demarchi@intel.com
2022-02-04 17:05:41 +00:00
int dma_buf_vmap(struct dma_buf *dmabuf, struct iosys_map *map);
void dma_buf_vunmap(struct dma_buf *dmabuf, struct iosys_map *map);
int dma_buf_vmap_unlocked(struct dma_buf *dmabuf, struct iosys_map *map);
void dma_buf_vunmap_unlocked(struct dma_buf *dmabuf, struct iosys_map *map);
dma-buf: Introduce dma buffer sharing mechanism This is the first step in defining a dma buffer sharing mechanism. A new buffer object dma_buf is added, with operations and API to allow easy sharing of this buffer object across devices. The framework allows: - creation of a buffer object, its association with a file pointer, and associated allocator-defined operations on that buffer. This operation is called the 'export' operation. - different devices to 'attach' themselves to this exported buffer object, to facilitate backing storage negotiation, using dma_buf_attach() API. - the exported buffer object to be shared with the other entity by asking for its 'file-descriptor (fd)', and sharing the fd across. - a received fd to get the buffer object back, where it can be accessed using the associated exporter-defined operations. - the exporter and user to share the scatterlist associated with this buffer object using map_dma_buf and unmap_dma_buf operations. Atleast one 'attach()' call is required to be made prior to calling the map_dma_buf() operation. Couple of building blocks in map_dma_buf() are added to ease introduction of sync'ing across exporter and users, and late allocation by the exporter. For this first version, this framework will work with certain conditions: - *ONLY* exporter will be allowed to mmap to userspace (outside of this framework - mmap is not a buffer object operation), - currently, *ONLY* users that do not need CPU access to the buffer are allowed. More details are there in the documentation patch. This is based on design suggestions from many people at the mini-summits[1], most notably from Arnd Bergmann <arnd@arndb.de>, Rob Clark <rob@ti.com> and Daniel Vetter <daniel@ffwll.ch>. The implementation is inspired from proof-of-concept patch-set from Tomasz Stanislawski <t.stanislaws@samsung.com>, who demonstrated buffer sharing between two v4l2 devices. [2] [1]: https://wiki.linaro.org/OfficeofCTO/MemoryManagement [2]: http://lwn.net/Articles/454389 Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Sumit Semwal <sumit.semwal@ti.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-and-Tested-by: Rob Clark <rob.clark@linaro.org> Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-12-26 09:23:15 +00:00
#endif /* __DMA_BUF_H__ */