linux-stable

Commit Graph

Author	SHA1	Message	Date
Chris Wilson	f7978a0c58	drm/i915: Allow the user to pass a context to any ring With full-ppgtt, we want the user to have full control over their memory layout, with a separate instance per context. Forcing them to use a shared memory layout for !RCS not only duplicates the amount of work we have to do, but also defeats the memory segregation on offer. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20160822080350.4964-4-chris@chris-wilson.co.uk Reviewed-by: John Harrison <john.c.harrison@intel.com> Reviewed-by: Thomas Daniel <thomas.daniel@intel.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2016-08-26 20:26:53 +01:00
Chris Wilson	f7bbe7883c	drm/i915: Embed the io-mapping struct inside drm_i915_private As io_mapping.h now always allocates the struct, we can avoid that allocation and extra pointer dance by embedding the struct inside drm_i915_private Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160819155428.1670-5-chris@chris-wilson.co.uk	2016-08-19 17:13:35 +01:00
Chris Wilson	0b5372727b	drm/i915/cmdparser: Use cached vmappings The single largest factor in the overhead of parsing the commands is the setup of the virtual mapping to provide a continuous block for the batch buffer. If we keep those vmappings around (against the better judgement of mm/vmalloc.c, which we offset by handwaving and looking suggestively at the shrinker) we can dramatically improve the performance of the parser for small batches (such as media workloads). Furthermore, we can use the prepare shmem read/write functions to determine how best we need to clflush the range (rather than every page of the object). The impact of caching both src/dst vmaps is +80% on ivb and +140% on byt for the throughput on small batches. (Caching just the dst vmap and iterating over the src, doing a page by page copy is roughly 5% slower on both platforms. That may be an acceptable trade-off to eliminate one cached vmapping, and we may be able to reduce the per-page copying overhead further.) For this simple test case, the cmdparser is now within a factor of 2 of ideal performance. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.william.auld@gmail.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-33-chris@chris-wilson.co.uk	2016-08-18 22:36:59 +01:00
Chris Wilson	49ef5294cd	drm/i915: Move fence tracking from object to vma In order to handle tiled partial GTT mmappings, we need to associate the fence with an individual vma. v2: A couple of silly drops replaced spotted by Joonas Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-21-chris@chris-wilson.co.uk	2016-08-18 22:36:50 +01:00
Chris Wilson	a1e5afbe4d	drm/i915: Rename fence.lru_list to link Our current practice is to only name the actual list (here dev_priv->fence_list) using "list", and elements upon that list are referred to as "link". Further, the lru nature is of the list and not of the node and including in the name does not disambiguate the link from anything else. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-20-chris@chris-wilson.co.uk	2016-08-18 22:36:49 +01:00
Chris Wilson	05a20d098d	drm/i915: Move map-and-fenceable tracking to the VMA By moving map-and-fenceable tracking from the object to the VMA, we gain fine-grained tracking and the ability to track individual fences on the VMA (subsequent patch). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-16-chris@chris-wilson.co.uk	2016-08-18 22:36:48 +01:00
Chris Wilson	9e53d9be0d	drm/i915: Disallow direct CPU access to stolen pages for relocations As we cannot access the backing pages behind stolen objects, we should not attempt to do so for relocations. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-15-chris@chris-wilson.co.uk	2016-08-18 22:36:47 +01:00
Chris Wilson	e8cb909ac3	drm/i915: Fallback to single page GTT mmappings for relocations If we cannot pin the entire object into the mappable region of the GTT, try to pin a single page instead. This is much more likely to succeed, and prevents us falling back to the clflush slow path. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-14-chris@chris-wilson.co.uk	2016-08-18 22:36:47 +01:00
Chris Wilson	d50415cc6c	drm/i915: Refactor execbuffer relocation writing With the introduction of the reloc page cache, we are just one step away from refactoring the relocation write functions into one. Not only does it tidy the code (slightly), but it greatly simplifies the control logic much to gcc's satisfaction. v2: Add selftests to document the relationship between the clflush flags, the KMAP bit and packing into the page-aligned pointer. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-13-chris@chris-wilson.co.uk	2016-08-18 22:36:46 +01:00
Chris Wilson	31a39207f0	drm/i915: Cache kmap between relocations When doing relocations, we have to obtain a mapping to the page containing the target address. This is either a kmap or iomap depending on GPU and its cache coherency. Neighbouring relocation entries are typically within the same page and so we can cache our kmapping between them and avoid those pesky TLB flushes. Note that there is some sleight-of-hand in how the slow relocate works as the reloc_entry_cache implies pagefaults disabled (as we are inside a kmap_atomic section). However, the slow relocate code is meant to be the fallback from the atomic fast path failing. Fortunately it works as we already have performed the copy_from_user for the relocation array (no more pagefaults there) and the kmap_atomic cache is enabled after we have waited upon an active buffer (so no more sleeping in atomic). Magic! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-7-chris@chris-wilson.co.uk	2016-08-18 22:36:43 +01:00
Chris Wilson	600f436801	drm/i915: Unconditionally flush any chipset buffers before execbuf If userspace is asynchronously streaming into the batch or other execobjects, we may not flush those writes along with a change in cache domain (as there is no change). Therefore those writes may end up in internal chipset buffers and not visible to the GPU upon execution. We must issue a flush command or otherwise we encounter incoherency in the batchbuffers and the GPU executing invalid commands (i.e. hanging) quite regularly. v2: Throw a paranoid wmb() into the general flush so that we remain consistent with before. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90841 Fixes: `1816f92363` ("drm/i915: Support creation of unbound wc user...") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Akash Goel <akash.goel@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Tested-by: Matti Hämäläinen <ccr@tnsp.org> Cc: stable@vger.kernel.org Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-1-chris@chris-wilson.co.uk	2016-08-18 22:36:15 +01:00
Chris Wilson	058d88c433	drm/i915: Track pinned VMA Treat the VMA as the primary struct responsible for tracking bindings into the GPU's VM. That is we want to treat the VMA returned after we pin an object into the VM as the cookie we hold and eventually release when unpinning. Doing so eliminates the ambiguity in pinning the object and then searching for the relevant pin later. v2: Joonas' stylistic nitpicks, a fun rebase. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-27-git-send-email-chris@chris-wilson.co.uk	2016-08-15 11:01:13 +01:00
Chris Wilson	624192cfd3	drm/i915: Add convenience wrappers for vma's object get/put The VMA are unreferenced, they belong to the object and live until they are closed. However, if we want to use the VMA as a cookie and use it to keep the object alive, we want to hold onto a reference to the object for the lifetime of the VMA cookie. To facilitate this, add a couple of simple wrappers for managing the reference count on the object owning the VMA. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-11-git-send-email-chris@chris-wilson.co.uk	2016-08-15 11:00:57 +01:00
Chris Wilson	17f298cf54	drm/i915: Move setting of request->batch into its single callsite request->batch_obj is only set by execbuffer for the convenience of debugging hangs. By moving that operation to the callsite, we can simplify all other callers and future patches. We also move the complications of reference handling of the request->batch_obj next to where the active tracking is set up for the request. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470832906-13972-2-git-send-email-chris@chris-wilson.co.uk	2016-08-10 16:07:52 +01:00
Chris Wilson	3e510a8e65	drm/i915: Repack fence tiling mode and stride into a single integer In the previous commit, we moved the obj->tiling_mode out of a bitfield and into its own integer so that we could safely use READ_ONCE(). Let us now repair some of that damage by sharing the tiling_mode with its companion, the fence stride. v2: New magic Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-18-git-send-email-chris@chris-wilson.co.uk	2016-08-05 10:54:43 +01:00
Chris Wilson	ad778f8967	drm/i915: Export our request as a dma-buf fence on the reservation object If the GEM objects being rendered with in this request have been exported via dma-buf to a third party, hook ourselves into the dma-buf reservation object so that the third party can serialise with our rendering via the dma-buf fences. Testcase: igt/prime_busy Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-26-git-send-email-chris@chris-wilson.co.uk	2016-08-04 20:20:06 +01:00
Chris Wilson	573adb3962	drm/i915: Move obj->active:5 to obj->flags We are motivated to avoid using a bitfield for obj->active for a couple of reasons. Firstly, we wish to document our lockless read of obj->active using READ_ONCE inside i915_gem_busy_ioctl() and that requires an integral type (i.e. not a bitfield). Secondly, gcc produces abysmal code when presented with a bitfield and that shows up high on the profiles of request tracking (mainly due to excess memory traffic as it converts the bitfield to a register and back and generates frequent AGI in the process). v2: BIT, break up a long line in compute the other engines, new paint for i915_gem_object_is_active (now i915_gem_object_get_active). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-23-git-send-email-chris@chris-wilson.co.uk	2016-08-04 20:20:04 +01:00
Chris Wilson	5d723d7afd	drm/i915: Separate intel_frontbuffer into its own header In view of adding inline functions into the intel_frontbuffer section, we first split the header into its own file so that we can integrate it more easily with kerneldoc. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-19-git-send-email-chris@chris-wilson.co.uk	2016-08-04 20:20:01 +01:00
Chris Wilson	de895082f7	drm/i915: Remove highly confusing i915_gem_obj_ggtt_pin() Since i915_gem_obj_ggtt_pin() is an idiom breaking curry function for i915_gem_object_ggtt_pin(), spare us the confusion and remove it. Removing it now simplifies later patches to change the i915_vma_pin() (and friends) interface. v2: Add a redundant GEM_BUG_ON(!view) to i915_gem_obj_lookup_or_create_ggtt_vma() Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-18-git-send-email-chris@chris-wilson.co.uk	2016-08-04 20:20:00 +01:00
Chris Wilson	3272db5313	drm/i915: Combine all i915_vma bitfields into a single set of flags In preparation to perform some magic to speed up i915_vma_pin(), which is among the hottest of hot paths in execbuf, refactor all the bitfields accessed by i915_vma_pin() into a single unified set of flags. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-16-git-send-email-chris@chris-wilson.co.uk	2016-08-04 20:19:59 +01:00
Chris Wilson	59bfa1248e	drm/i915: Start passing around i915_vma from execbuffer During execbuffer we look up the i915_vma in order to reserve them in the VM. However, we then do a double lookup of the vma in order to then pin them, all because we lack the necessary interfaces to operate on i915_vma - so introduce i915_vma_pin()! v2: Tidy parameter lists to remove one level of redirection in the hot path. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-15-git-send-email-chris@chris-wilson.co.uk	2016-08-04 20:19:58 +01:00
Chris Wilson	20dfbde463	drm/i915: Wrap vma->pin_count accessors with small inline helpers In the next few patches, the VMA pinning API is overhauled and to reduce the churn we pull out the update to the accessors into a prep patch. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-14-git-send-email-chris@chris-wilson.co.uk	2016-08-04 20:19:58 +01:00
Chris Wilson	91b2db6f65	drm/i915: Pad GTT views of exec objects up to user specified size Our GPUs impose certain requirements upon buffers that depend upon how exactly they are used. Typically this is expressed as that they require a larger surface than would be naively computed by pitch * height. Normally such requirements are hidden away in the userspace driver, but when we accept pointers from strangers and later impose extra conditions on them, the original client allocator has no idea about the monstrosities in the GPU and we require the userspace driver to inform the kernel how many padding pages are required beyond the client allocation. v2: Long time, no see v3: Try an anonymous union for uapi struct compatibility Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-7-git-send-email-chris@chris-wilson.co.uk	2016-08-04 20:19:53 +01:00
Chris Wilson	e655bc35fd	drm/i915: Remove i915_gem_execbuffer_retire_commands() Move the single line to the callsite as the name is now misleading, and the purpose is solely to add the request to the execution queue. Here, we can see that if we failed to dispatch the batch from the request, we can forgo flushing the GPU when closing the request. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-5-git-send-email-chris@chris-wilson.co.uk	2016-08-04 20:19:52 +01:00
Chris Wilson	0340d9fd0f	drm/i915: Remove request retirement before each batch This reimplements the denial-of-service protection against igt from commit `227f782e46` ("drm/i915: Retire requests before creating a new one") and transfers the stall from before each batch into get_pages(). The issue is that the stall is increasing latency between batches which is detrimental in some cases (especially coupled with execlists) to keeping the GPU well fed. Also we have made the observation that retiring requests can of itself free objects (and requests) and therefore makes a good first step when shrinking. v2: Recycle objects prior to i915_gem_object_get_pages() v3: Remove the reference to the ring from i915_gem_requests_ring() as it operates on an intel_engine_cs. v4: Since commit `9b5f4e5ed6` ("drm/i915: Retire oldest completed request before allocating next") we no longer need the safeguard to retire requests before get_pages(). We no longer see the huge latencies when hitting the shrinker between allocations. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-4-git-send-email-chris@chris-wilson.co.uk	2016-08-04 20:19:51 +01:00
Chris Wilson	b0decaf75b	drm/i915: Track active vma requests Hook the vma itself into the i915_gem_request_retire() so that we can accurately track when a solitary vma is inactive (as opposed to having to wait for the entire object to be idle). This improves the interaction when using multiple contexts (with full-ppgtt) and eliminates some frequent list walking when retiring objects after a completed request. A side-effect is that we get an active vma reference for free. The consequence of this is shown in the next patch... v2: Update inline names to be consistent with i915_gem_object_get_active() Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-25-git-send-email-chris@chris-wilson.co.uk	2016-08-04 08:09:32 +01:00
Chris Wilson	5cf3d28098	drm/i915: i915_vma_move_to_active prep patch This patch is broken out of the next just to remove the code motion from that patch and make it more readable. What we do here is move the i915_vma_move_to_active() to i915_gem_execbuffer.c and put the three stages (read, write, fenced) together so that future modifications to active handling are all located in the same spot. The importance of this is so that we can more simply control the order in which the requests are place in the retirement list (i.e. control the order at which we retire and so control the lifetimes to avoid having to hold onto references). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-24-git-send-email-chris@chris-wilson.co.uk	2016-08-04 08:09:31 +01:00
Chris Wilson	909d074c31	drm/i915: Double check activity before relocations If the object is active and we need to perform a relocation upon it, we need to take the slow relocation path. Before we do, double check the active requests to see if they have completed. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-22-git-send-email-chris@chris-wilson.co.uk	2016-08-04 08:09:29 +01:00
Chris Wilson	381f371b25	drm/i915: Introduce i915_gem_active for request tracking In the next patch, request tracking is made more generic and for that we need a new expanded struct and to separate out the logic changes from the mechanical churn, we split out the structure renaming into this patch. v2: Writer's block. Add some spiel about why we track requests. v3: Now i915_gem_active. v4: Now with i915_gem_active_set() for attaching to the active request. v5: Use i915_gem_active_set() from inside the retirement handlers Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-10-git-send-email-chris@chris-wilson.co.uk	2016-08-04 08:09:20 +01:00
Chris Wilson	5b043f4e60	drm/i915: Unify legacy/execlists submit_execbuf callbacks Now that emitting requests is identical between legacy and execlists, we can use the same function to build up the ring for submitting to either engine. (With the exception of i915_switch_contexts(), but in time that will also be handled gracefully.) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-30-git-send-email-chris@chris-wilson.co.uk Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-21-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:31 +01:00
Chris Wilson	803688babd	drm/i915: Unify legacy/execlists emission of MI_BATCHBUFFER_START Both the ->dispatch_execbuffer and ->emit_bb_start callbacks do exactly the same thing, add MI_BATCHBUFFER_START to the request's ringbuffer - we need only one vfunc. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-20-git-send-email-chris@chris-wilson.co.uk Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-10-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:21 +01:00
Chris Wilson	8e63717837	drm/i915: Simplify request_alloc by returning the allocated request If is simpler and leads to more readable code through the callstack if the allocation returns the allocated struct through the return value. The importance of this is that it no longer looks like we accidentally allocate requests as side-effect of calling certain functions. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-19-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-9-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:20 +01:00
Chris Wilson	7c9cf4e33a	drm/i915: Reduce engine->emit_flush() to a single mode parameter Rather than passing a complete set of GPU cache domains for either invalidation or for flushing, or even both, just pass a single parameter to the engine->emit_flush to determine the required operations. engine->emit_flush(GPU, 0) -> engine->emit_flush(EMIT_INVALIDATE) engine->emit_flush(0, GPU) -> engine->emit_flush(EMIT_FLUSH) engine->emit_flush(GPU, GPU) -> engine->emit_flush(EMIT_FLUSH \| EMIT_INVALIDATE) This allows us to extend the behaviour easily in future, for example if we want just a command barrier without the overhead of flushing. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Dave Gordon <david.s.gordon@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-8-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:19 +01:00
Chris Wilson	c7fe7d25ed	drm/i915: Remove obsolete engine->gpu_caches_dirty Space for flushing the GPU cache prior to completing the request is preallocated and so cannot fail - the GPU caches will always be flushed along with the completed request. This means we no longer have to track whether the GPU cache is dirty between batches like we had to with the outstanding_lazy_seqno. With the removal of the duplication in the per-backend entry points for emitting the obsolete lazy flush, we can then further unify the engine->emit_flush. v2: Expand a bit on the legacy of gpu_caches_dirty Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-18-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-7-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:19 +01:00
Chris Wilson	7e37f889b5	drm/i915: Rename struct intel_ringbuffer to struct intel_ring The state stored in this struct is not only the information about the buffer object, but the ring used to communicate with the hardware. Using buffer here is overly specific and, for me at least, conflates with the notion of buffer objects themselves. s/struct intel_ringbuffer/struct intel_ring/ s/enum intel_ring_hangcheck/enum intel_engine_hangcheck/ s/describe_ctx_ringbuf()/describe_ctx_ring()/ s/intel_ring_get_active_head()/intel_engine_get_active_head()/ s/intel_ring_sync_index()/intel_engine_sync_index()/ s/intel_ring_init_seqno()/intel_engine_init_seqno()/ s/ring_stuck()/engine_stuck()/ s/intel_cleanup_engine()/intel_engine_cleanup()/ s/intel_stop_engine()/intel_engine_stop()/ s/intel_pin_and_map_ringbuffer_obj()/intel_pin_and_map_ring()/ s/intel_unpin_ringbuffer()/intel_unpin_ring()/ s/intel_engine_create_ringbuffer()/intel_engine_create_ring()/ s/intel_ring_flush_all_caches()/intel_engine_flush_all_caches()/ s/intel_ring_invalidate_all_caches()/intel_engine_invalidate_all_caches()/ s/intel_ringbuffer_free()/intel_ring_free()/ Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-15-git-send-email-chris@chris-wilson.co.uk Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-4-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:16 +01:00
Chris Wilson	1dae2dfb0b	drm/i915: Rename request->ringbuf to request->ring Now that we have disambuigated ring and engine, we can use the clearer and more consistent name for the intel_ringbuffer pointer in the request. @@ struct drm_i915_gem_request *r; @@ - r->ringbuf + r->ring Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-12-git-send-email-chris@chris-wilson.co.uk Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-2-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:15 +01:00
Chris Wilson	b5321f309b	drm/i915: Unify intel_logical_ring_emit and intel_ring_emit Both perform the same actions with more or less indirection, so just unify the code. v2: Add back a few intel_engine_cs locals Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-11-git-send-email-chris@chris-wilson.co.uk Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-1-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:13 +01:00
Chris Wilson	c80ff16e11	drm/i915: Use engine to refer to the user's BSD intel_engine_cs This patch transitions the execbuf engine selection away from using the ring nomenclature - though we still refer to the user's incoming selector as their user_ring_id. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-7-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469606850-28659-2-git-send-email-chris@chris-wilson.co.uk	2016-07-27 16:23:05 +01:00
Chris Wilson	33a051a5fc	drm/i915/cmdparser: Remove stray intel_engine_cs *ring When we refer to intel_engine_cs, we want to use engine so as not to confuse ourselves about ringbuffers. v2: Rename all the functions as well, as well as a few more stray comments. v3: Split the really long error message strings Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-6-git-send-email-chris@chris-wilson.co.uk Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469606850-28659-1-git-send-email-chris@chris-wilson.co.uk	2016-07-27 16:23:05 +01:00
Dave Gordon	f8ca0c07f6	drm/i915: rename & update eb_select_ring() 'ring' is an old deprecated term for a GPU engine, so we're trying to phase out all such terminology. eb_select_ring() not only has 'ring' (meaning engine) in its name, but it has an ugly calling convention whereby it returns an errno and stores a pointer-to-engine indirectly through an output parameter. As there is only one error it ever returns (-EINVAL), we can make it return the pointer directly, and have the caller pass back the error code -EINVAL if the pointer result is NULL. Thus we can replace - ret = eb_select_ring(dev_priv, file, args, &engine); - if (ret) - return ret; with + engine = eb_select_engine(dev_priv, file, args); + if (!engine) + return -EINVAL; for increased clarity and maybe save a few cycles too. Signed-off-by: Dave Gordon <david.s.gordon@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469034967-15840-4-git-send-email-david.s.gordon@intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2016-07-21 09:59:46 +01:00
Chris Wilson	f8c417cdb1	drm/i915: Rename drm_gem_object_unreference in preparation for lockless free Ultimately wraps kref_put(), so adopt its nomenclature for consistency with other subsystems. s/drm_gem_object_unreference/i915_gem_object_put/ Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1469005202-9659-6-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469017917-15134-5-git-send-email-chris@chris-wilson.co.uk	2016-07-20 13:40:12 +01:00
Chris Wilson	25dc556a2a	drm/i915: Wrap drm_gem_object_reference in i915_gem_object_get Ultimately wraps kref_get(), so adopt its nomenclature for consistency with other subsystems. s/drm_gem_object_reference/i915_gem_object_get/ Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1469005202-9659-5-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Dave Gordon <david.s.gordon@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469017917-15134-4-git-send-email-chris@chris-wilson.co.uk	2016-07-20 13:40:11 +01:00
Chris Wilson	9a6feaf0d7	drm/i915: Rename i915_gem_context_reference/unreference() As these are wrappers around kref_get/kref_put() it is preferable to follow the naming convention and use the same verb get/put in our wrapper names for manipulating a reference to the context. s/i915_gem_context_reference/i915_gem_context_get/ s/i915_gem_context_unreference/i915_gem_context_put/ Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469005202-9659-3-git-send-email-chris@chris-wilson.co.uk Link: http://patchwork.freedesktop.org/patch/msgid/1469017917-15134-2-git-send-email-chris@chris-wilson.co.uk	2016-07-20 13:40:10 +01:00
Dave Gordon	4bfa339aa4	drm/i915: refactor eb_get_batch() Precursor for fix to secure batch execution. We will need to be able to retrieve the batch VMA (as well as the batch itself) from the eb list, so this patch extracts that part of eb_get_batch() into a separate function, and moves both parts to a more logical place in the file, near where the eb list is created. Also, it may not be obvious, but the current execbuffer2 ioctl interface requires that the buffer object containing the batch-to-be-executed be the LAST entry in the exec2_list[] array (I expected it to be the first!). To clarify this, we can replace the rather obscure construct "list_entry(eb->vmas.prev, ...)" in the old version of eb_get_batch() with the equivalent but more explicit "list_last_entry(&eb->vmas,...)" in the new eb_get_batch_vma() and of course add an explanatory comment. Signed-off-by: Dave Gordon <david.s.gordon@intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1468504324-12690-2-git-send-email-david.s.gordon@intel.com	2016-07-19 09:06:32 +02:00
Dave Gordon	9e2793f6e4	drm/i915: compile-time consistency check on __EXEC_OBJECT flags Two different sets of flag bits are stored in the 'flags' member of a 'struct drm_i915_gem_exec_object2', and they're defined in two different source files, increasing the risk of an accidental clash. Some flags in this field are supplied by the user; these are defined in i915_drm.h, and they start from the LSB and work up. Other flags are defined in i915_gem_execbuffer, for internal use within that file only; they start from the MSB and work down. So here we add a compile-time check that the two sets of flags do not overlap, which would cause all sorts of confusion. Signed-off-by: Dave Gordon <david.s.gordon@intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1468504324-12690-1-git-send-email-david.s.gordon@intel.com	2016-07-19 09:06:16 +02:00
Chris Wilson	91c8a326a1	drm/i915: Convert dev_priv->dev backpointers to dev_priv->drm Since drm_i915_private is now a subclass of drm_device we do not need to chase the drm_i915_private->dev backpointer and can instead simply access drm_i915_private->drm directly. text data bss dec hex filename `1068757` 4565 416 1073738 10624a drivers/gpu/drm/i915/i915.ko 1066949 4565 416 1071930 105b3a drivers/gpu/drm/i915/i915.ko Created by the coccinelle script: @@ struct drm_i915_private *d; identifier i; @@ ( - d->dev->i + d->drm.i \| - d->dev + &d->drm ) and for good measure the dev_priv->dev backpointer was removed entirely. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467711623-2905-4-git-send-email-chris@chris-wilson.co.uk	2016-07-05 11:58:45 +01:00
Chris Wilson	fac5e23e3c	drm/i915: Mass convert dev->dev_private to to_i915(dev) Since we now subclass struct drm_device, we can save pointer dances by noting the equivalence of struct drm_device and struct drm_i915_private, i.e. by using to_i915(). text data bss dec hex filename 1073824 4562 416 1078802 107612 drivers/gpu/drm/i915/i915.ko 1068976 4562 416 1073954 106322 drivers/gpu/drm/i915/i915.ko Created by the coccinelle script: @@ expression E; identifier p; @@ - struct drm_i915_private p = E->dev_private; + struct drm_i915_private p = to_i915(E); Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Dave Gordon <david.s.gordon@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467628477-25379-1-git-send-email-chris@chris-wilson.co.uk	2016-07-04 12:54:07 +01:00
Chris Wilson	67d97da349	drm/i915: Only start retire worker when idle The retire worker is a low frequency task that makes sure we retire outstanding requests if userspace is being lax. We only need to start it once as it remains active until the GPU is idle, so do a cheap test before the more expensive queue_work(). A consequence of this is that we need correct locking in the worker to make the hot path of request submission cheap. To keep the symmetry and keep hangcheck strictly bound by the GPU's wakelock, we move the cancel_sync(hangcheck) to the idle worker before dropping the wakelock. v2: Guard against RCU fouling the breadcrumbs bottom-half whilst we kick the waiter. v3: Remove the wakeref assertion squelching (now we hold a wakeref for the hangcheck, any rpm error there is genuine). v4: To prevent excess work when retiring requests, we split the busy flag into two, a boolean to denote whether we hold the wakeref and a bitmask of active engines. v5: Reorder cancelling hangcheck upon idling to avoid a race where we might cancel a hangcheck after being preempted by a new task Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> References: https://bugs.freedesktop.org/show_bug.cgi?id=88437 Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467616119-4093-1-git-send-email-chris@chris-wilson.co.uk	2016-07-04 08:18:19 +01:00
Daniel Vetter	b3ac9f2591	drm: Extract drm_is_current_master Just rolling out a bit of abstraction to be able to clean up the master logic in the next step. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2016-06-21 21:58:12 +02:00
Daniel Vetter	5599617ec0	Merge remote-tracking branch 'airlied/drm-next' into drm-intel-next-queued Git got absolutely destroyed with all our cherry-picking from drm-intel-next-queued to various branches. It ended up inserting intel_crtc_page_flip 2x even in intel_display.c. Backmerge to get back to sanity. Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>	2016-06-02 09:54:12 +02:00

1 2 3 4 5 ...

350 Commits