Commit Graph

470 Commits

Author SHA1 Message Date
Chris Wilson 697b9a8714 drm/i915: Make closing request flush mandatory
For symmetry, simplicity and ensuring the request is always truly idle
upon its completion, always emit the closing flush prior to emitting the
request breadcrumb. Previously, we would only emit the flush if we had
started a user batch, but this just leaves all the other paths open to
speculation (do they affect the GPU caches or not?) With mm switching, a
key requirement is that the GPU is flushed and invalidated before hand,
so for absolute safety, we want that closing flush be mandatory.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180612105135.4459-1-chris@chris-wilson.co.uk
2018-06-14 08:16:12 +01:00
Chris Wilson 746c8f143a drm/i915: Apply batch location restrictions before pinning
We special case the position of the batch within the GTT to prevent
negative self-relocation deltas from underflowing. However, that
restriction is being applied after a trial pin of the batch in its
current position. Thus we are not rejecting an invalid location if the
batch has been used before, leading to an assertion if we happen to need
to rearrange the entire payload. In the worst case, this may cause a GPU
hang on gen7 or perhaps missing state.

References: https://bugs.freedesktop.org/show_bug.cgi?id=105720
Fixes: 2889caa923 ("drm/i915: Eliminate lots of iterations over the execobjects array")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Martin Peres <martin.peres@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180610194325.13467-2-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2018-06-11 11:00:26 +01:00
Chris Wilson 82ad6443a5 drm/i915/gtt: Rename i915_hw_ppgtt base member
In the near future, I want to subclass gen6_hw_ppgtt as it contains a
few specialised members and I wish to add more. To avoid the ugliness of
using ppgtt->base.base, rename the i915_hw_ppgtt base member
(i915_address_space) as vm, which is our common shorthand for an
i915_address_space local.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180605153758.18422-1-chris@chris-wilson.co.uk
2018-06-05 21:11:20 +01:00
Chris Wilson 3365e2268b drm/i915: Lazily unbind vma on close
When userspace is passing around swapbuffers using DRI, we frequently
have to open and close the same object in the foreign address space.
This shows itself as the same object being rebound at roughly 30fps
(with a second object also being rebound at 30fps), which involves us
having to rewrite the page tables and maintain the drm_mm range manager
every time.

However, since the object still exists and it is only the local handle
that disappears, if we are lazy and do not unbind the VMA immediately
when the local user closes the object but defer it until the GPU is
idle, then we can reuse the same VMA binding. We still have to be
careful to mark the handle and lookup tables as closed to maintain the
uABI, just allowing the underlying VMA to be resurrected if the user is
able to access the same object from the same context again.

If the object itself is destroyed (neither userspace keeping a handle to
it), the VMA will be reaped immediately as usual.

In the future, this will be even more useful as instantiating a new VMA
for use on the GPU will become heavier. A nuisance indeed, so nip it in
the bud.

v2: s/__i915_vma_final_close/i915_vma_destroy/ etc.
v3: Leave a hint as to why we deferred the unbind on close.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180503195115.22309-1-chris@chris-wilson.co.uk
2018-05-04 07:26:56 +01:00
Kevin Rogovin 99d7e4eeea drm/i915: Describe the bottom of stack in processing a batchbuffer
Now that "DOC: User command execution" of i915_gem_execbuffer.c is included
in the i915.rst, it is benecifial (for new developers) to read what happens
at the bottom of the driver stack (in terms of bytes written to be read
by the GPU) when processing a user-space batchbuffer.

v5:
  Typo correction of lacking double tick.

Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
[Joonas: correcting the patch title]
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1523001957-6427-4-git-send-email-kevin.rogovin@intel.com
2018-04-06 13:12:23 +03:00
Xidong Wang 6be1187dbf drm/i915: Do no use kfree() to free a kmem_cache_alloc() return value
Along the eb_lookup_vmas() error path, the return value from
kmem_cache_alloc() was freed using kfree(). Fix it to use the proper
kmem_cache_free() instead.

Fixes: d1b48c1e71 ("drm/i915: Replace execbuf vma ht with an idr")
Signed-off-by: Xidong Wang <wangxidong_97@163.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v4.14+
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180404093824.9313-1-chris@chris-wilson.co.uk
2018-04-04 20:53:51 +01:00
Chris Wilson e61e0f51ba drm/i915: Rename drm_i915_gem_request to i915_request
We want to de-emphasize the link between the request (dependency,
execution and fence tracking) from GEM and so rename the struct from
drm_i915_gem_request to i915_request. That is we may implement the GEM
user interface on top of requests, but they are an abstraction for
tracking execution rather than an implementation detail of GEM. (Since
they are not tied to HW, we keep the i915 prefix as opposed to intel.)

In short, the spatch:
@@

@@
- struct drm_i915_gem_request
+ struct i915_request

A corollary to contracting the type name, we also harmonise on using
'rq' shorthand for local variables where space if of the essence and
repetition makes 'request' unwieldy. For globals and struct members,
'request' is still much preferred for its clarity.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180221095636.6649-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2018-02-21 20:57:22 +00:00
Chris Wilson ed2f353232 drm/i915: Clear the in-use marker on execbuf failure
If we fail to unbind the vma (due to a signal on an active buffer that
needs to be moved for the next execbuf), then we need to clear the
persistent tracking state we setup for this execbuf.

Fixes: c7c6e46f91 ("drm/i915: Convert execbuf to use struct-of-array packing for critical fields")
Testcase: igt/gem_fenced_exec_thrash/no-spare-fences-busy*
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v4.14+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180219140144.24004-1-chris@chris-wilson.co.uk
2018-02-19 19:43:05 +00:00
Christian König c0a51fd07b drm: move read_domains and write_domain into i915
i915 is the only driver using those fields in the drm_gem_object
structure, so they only waste memory for all other drivers.

Move the fields into drm_i915_gem_object instead and patch the i915 code
with the following sed commands:

sed -i "s/obj->base.read_domains/obj->read_domains/g" drivers/gpu/drm/i915/*.c drivers/gpu/drm/i915/*/*.c
sed -i "s/obj->base.write_domain/obj->write_domain/g" drivers/gpu/drm/i915/*.c drivers/gpu/drm/i915/*/*.c

Change is only compile tested.

v2: move fields around as suggested by Chris.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180216124338.9087-1-christian.koenig@amd.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2018-02-16 14:12:48 +00:00
Daniele Ceraolo Spurio b6a88e4a80 drm/i915: Fix rsvd2 mask when out-fence is returned
GENMASK_ULL wants the high bit of the mask first. The current value
cancels the in-fence when an out-fence is returned.

Fixes: fec0445caa ("drm/i915: Support explicit fencing for execbuf")
Testcase: igt/gem_exec_fence/keep-in-fence*
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180214191827.8465-1-daniele.ceraolospurio@intel.com
Cc: <stable@vger.kernel.org> # v4.12+
2018-02-14 20:57:44 +00:00
Ville Syrjälä 6a20fe7b17 drm/i915: Give all ioctl functions an _ioctl suffix
Most of our ioctl functions have an _ioctl suffix in the name. I like
that idea since it makes it easy to figure out how the function is
going to get called. Rename the handful of exceptions to follow the
same pattern.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180207164841.19431-1-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2018-02-09 18:03:36 +02:00
Chris Wilson 204bcfef60 drm/i915: Fix kerneldoc warnings in i915_gem_execbuffer
drivers/gpu/drm/i915/i915_gem_execbuffer.c:1983: warning: No description found for parameter 'dev_priv'
drivers/gpu/drm/i915/i915_gem_execbuffer.c:1983: warning: No description found for parameter 'file'

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180208113917.8439-1-chris@chris-wilson.co.uk
2018-02-08 15:17:21 +00:00
Matthew Auld 73ebd50303 drm/i915: make mappable struct resource centric
Now that we are using struct resource to track the stolen region, it is
more convenient if we track the mappable region in a resource as well.

v2: prefer iomap and gmadr naming scheme
    prefer DEFINE_RES_MEM

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171211151822.20953-8-matthew.auld@intel.com
2017-12-12 12:30:21 +02:00
Tvrtko Ursulin 439e2ee4ca drm/i915: Move engine->needs_cmd_parser to engine->flags
Will be adding a new per-engine flags shortly so it makes sense
to consolidate.

v2: Keep the original code flow in intel_engine_cleanup_cmd_parser.
    (Joonas Lahtinen)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171129082409.18189-1-tvrtko.ursulin@linux.intel.com
2017-11-29 12:29:39 +00:00
Chris Wilson 3fef5cda97 drm/i915: Automatic i915_switch_context for legacy
During request construction, after pinning the context we know whether
or not we have to emit a context switch. So move this common operation
from every caller into i915_gem_request_alloc() itself.

v2: Always submit the request if we emitted some commands during request
construction, as typically it also involves changes in global state.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171120102002.22254-2-chris@chris-wilson.co.uk
2017-11-20 15:56:16 +00:00
Chris Wilson 2113184c6f drm/i915: Pull the unconditional GPU cache invalidation into request construction
As the request will, in the following patch, implicitly invoke a
context-switch on construction, we should precede that with a GPU TLB
invalidation. Also, even before using GGTT, we always want to invalidate
the TLBs for any updates (as well as the ppgtt invalidates that are
unconditionally applied by execbuf). Since we almost always require the
TLB invalidate, do it unconditionally on request allocation and so we can
remove it from all other paths.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171120102002.22254-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
2017-11-20 15:56:16 +00:00
Chris Wilson d710fc16ff drm/i915: Prevent overflow of execbuf.buffer_count and num_cliprects
We check whether the multiplies will overflow prior to calling
kmalloc_array so that we can respond with -EINVAL for the invalid user
arguments rather than treating it as an -ENOMEM that would otherwise
occur. However, as Dan Carpenter pointed out, we did an addition on the
unsigned int prior to passing to kmalloc_array where it would be
promoted to size_t for the calculation, thereby allowing it to overflow
and underallocate.

v2: buffer_count is currently limited to INT_MAX because we treat it as
signaled variable for LUT_HANDLE in eb_lookup_vma
v3: Move common checks for eb1/eb2 into the same function
v4: Put the check back for nfence*sizeof(user_fence) overflow
v5: access_ok uses ULONG_MAX but kvmalloc_array uses SIZE_MAX
v6: size_t and unsigned long are not type-equivalent on 32b

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171116105059.25142-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-11-16 14:17:08 +00:00
Tvrtko Ursulin ebcaa1ff8b drm/i915: Reject unknown syncobj flags
We have to reject unknown flags for uAPI considerations, and also
because the curent implementation limits their i915 storage space
to two bits.

v2: (Chris Wilson)
 * Fix fail in ABI check.
 * Added unknown flags and BUILD_BUG_ON.

v3:
 * Use ARCH_KMALLOC_MINALIGN instead of alignof. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: cf6e7bac63 ("drm/i915: Add support for drm syncobjs")
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: David Airlie <airlied@linux.ie>
Cc: intel-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171031102326.9738-1-tvrtko.ursulin@linux.intel.com
2017-11-03 09:28:05 +00:00
Chris Wilson 1d033beb20 drm/i915: Check incoming alignment for unfenced buffers (on i915gm)
In case the object has changed tiling between calls to execbuf, we need
to check if the existing offset inside the GTT matches the new tiling
constraint. We even need to do this for "unfenced" tiled objects, where
the 3D commands use an implied fence and so the object still needs to
match the physical fence restrictions on alignment (only required for
gen2 and early gen3).

In commit 2889caa923 ("drm/i915: Eliminate lots of iterations over
the execobjects array"), the idea was to remove the second guessing and
only set the NEEDS_MAP flag when required. However, the entire check
for an unusable offset for fencing was removed and not just the
secondary check. I.e.

	/* avoid costly ping-pong once a batch bo ended up non-mappable */
        if (entry->flags & __EXEC_OBJECT_NEEDS_MAP &&
            !i915_vma_is_map_and_fenceable(vma))
                return !only_mappable_for_reloc(entry->flags);

was entirely removed as the ping-pong between execbuf passes was fixed,
but its primary purpose in forcing unaligned unfenced access to be
rebound was forgotten.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103502
Fixes: 2889caa923 ("drm/i915: Eliminate lots of iterations over the execobjects array")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171031103607.17836-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-11-01 13:25:25 +00:00
Chris Wilson 3c755c5b56 drm/i915: Try a minimal attempt to insert the whole object for relocations
As we have a lightweight fallback to insert a single page into the
aperture, try to avoid any heavier evictions when attempting to insert
the entire object.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171009084401.29090-5-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-10-09 17:07:29 +01:00
Chris Wilson 3bd4073524 drm/i915: Consolidate get_fence with pin_fence
Following the pattern now used for obj->mm.pages, use just pin_fence and
unpin_fence to control access to the fence registers. I.e. instead of
calling get_fence(); pin_fence(), we now just need to call pin_fence().
This will make it easier to reduce the locking requirements around
fence registers.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171009084401.29090-2-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-10-09 17:07:29 +01:00
Jani Nikula 32f35b8634 Merge drm-upstream/drm-next into drm-intel-next-queued
Need MST sideband message transaction to power up/down nodes.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2017-09-28 15:56:49 +03:00
Dave Airlie 9afafdbfbf Merge tag 'drm-intel-next-2017-09-07' of git://anongit.freedesktop.org/git/drm-intel into drm-next
Getting started with v4.15 features:

- Cannonlake workarounds (Rodrigo, Oscar)
- Infoframe refactoring and fixes to enable infoframes for DP (Ville)
- VBT definition updates (Jani)
- Sparse warning fixes (Ville, Chris)
- Crtc state usage fixes and cleanups (Ville)
- DP vswing, pre-emph and buffer translation refactoring and fixes (Rodrigo)
- Prevent IPS from interfering with CRC capture (Ville, Marta)
- Enable Mesa to advertise ARB_timer_query (Nanley)
- Refactor GT number into intel_device_info (Lionel)
- Avoid eDP DP AUX CH timeouts harder (Manasi)
- CDCLK check improvements (Ville)
- Restore GPU clock boost on missed pageflip vblanks (Chris)
- Fence register reservation API for vGPU (Changbin)
- First batch of CCS fixes (Ville)
- Finally, numerous GEM fixes, cleanups and improvements (Chris)

* tag 'drm-intel-next-2017-09-07' of git://anongit.freedesktop.org/git/drm-intel: (100 commits)
  drm/i915: Update DRIVER_DATE to 20170907
  drm/i915/cnl: WaThrottleEUPerfToAvoidTDBackPressure:cnl(pre-prod)
  drm/i915: Lift has-pinned-pages assert to caller of ____i915_gem_object_get_pages
  drm/i915: Display WA #1133 WaFbcSkipSegments:cnl, glk
  drm/i915/cnl: Allow the reg_read ioctl to read the RCS TIMESTAMP register
  drm/i915: Move device_info.has_snoop into the static tables
  drm/i915: Disable MI_STORE_DATA_IMM for i915g/i915gm
  drm/i915: Re-enable GTT following a device reset
  drm/i915/cnp: Wa 1181: Fix Backlight issue
  drm/i915: Annotate user relocs with __user
  drm/i915: Constify load detect mode
  drm/i915/perf: Remove __user from u64 in drm_i915_perf_oa_config
  drm/i915: Silence sparse by using gfp_t
  drm/i915: io unmap functions want __iomem
  drm/i915: Add __rcu to radix tree slot pointer
  drm/i915: Wake up the device for the fbdev setup
  drm/i915: Add interface to reserve fence registers for vGPU
  drm/i915: Use correct path to trace include
  drm/i915: Fix the missing PPAT cache attributes on CNL
  drm/i915: Fix enum pipe vs. enum transcoder for the PCH transcoder
  ...
2017-09-28 07:12:44 +10:00
Michal Wajdeczko 4f044a88a8 drm/i915: Rename global i915 to i915_modparams
Our global struct with params is named exactly the same way
as new preferred name for the drm_i915_private function parameter.
To avoid such name reuse lets use different name for the global.

v5: pure rename
v6: fix

Credits-to: Coccinelle

@@
identifier n;
@@
(
-	i915.n
+	i915_modparams.n
)

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Ville Syrjala <ville.syrjala@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170919193846.38060-1-michal.wajdeczko@intel.com
2017-09-22 14:50:36 +03:00
Chris Wilson 74c1c694a2 drm/i915: Document the split in internal and public execbuf flags
Since we reuse the same field for the user passing in their control
flags, and for the kernel to track a couple of bits of state, document
and check that those do not overlap.

Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170921110135.15990-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-09-21 17:19:24 +01:00
Michal Hocko 0ee931c4e3 mm: treewide: remove GFP_TEMPORARY allocation flag
GFP_TEMPORARY was introduced by commit e12ba74d8f ("Group short-lived
and reclaimable kernel allocations") along with __GFP_RECLAIMABLE.  It's
primary motivation was to allow users to tell that an allocation is
short lived and so the allocator can try to place such allocations close
together and prevent long term fragmentation.  As much as this sounds
like a reasonable semantic it becomes much less clear when to use the
highlevel GFP_TEMPORARY allocation flag.  How long is temporary? Can the
context holding that memory sleep? Can it take locks? It seems there is
no good answer for those questions.

The current implementation of GFP_TEMPORARY is basically GFP_KERNEL |
__GFP_RECLAIMABLE which in itself is tricky because basically none of
the existing caller provide a way to reclaim the allocated memory.  So
this is rather misleading and hard to evaluate for any benefits.

I have checked some random users and none of them has added the flag
with a specific justification.  I suspect most of them just copied from
other existing users and others just thought it might be a good idea to
use without any measuring.  This suggests that GFP_TEMPORARY just
motivates for cargo cult usage without any reasoning.

I believe that our gfp flags are quite complex already and especially
those with highlevel semantic should be clearly defined to prevent from
confusion and abuse.  Therefore I propose dropping GFP_TEMPORARY and
replace all existing users to simply use GFP_KERNEL.  Please note that
SLAB users with shrinkers will still get __GFP_RECLAIMABLE heuristic and
so they will be placed properly for memory fragmentation prevention.

I can see reasons we might want some gfp flag to reflect shorterm
allocations but I propose starting from a clear semantic definition and
only then add users with proper justification.

This was been brought up before LSF this year by Matthew [1] and it
turned out that GFP_TEMPORARY really doesn't have a clear semantic.  It
seems to be a heuristic without any measured advantage for most (if not
all) its current users.  The follow up discussion has revealed that
opinions on what might be temporary allocation differ a lot between
developers.  So rather than trying to tweak existing users into a
semantic which they haven't expected I propose to simply remove the flag
and start from scratch if we really need a semantic for short term
allocations.

[1] http://lkml.kernel.org/r/20170118054945.GD18349@bombadil.infradead.org

[akpm@linux-foundation.org: fix typo]
[akpm@linux-foundation.org: coding-style fixes]
[sfr@canb.auug.org.au: drm/i915: fix up]
  Link: http://lkml.kernel.org/r/20170816144703.378d4f4d@canb.auug.org.au
Link: http://lkml.kernel.org/r/20170728091904.14627-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-13 18:53:16 -07:00
Chris Wilson ac70ebe873 drm/i915: Cleanup error paths through eb_lookup_vma()
Following the simplification to a single lookup loop in commit
170fa29b14 ("drm/i915: Simplify eb_lookup_vmas()") and commit
d1b48c1e71 ("drm/i915: Replace execbuf vma ht with an idr"), we can go
one step further and reorder the error paths so that the state of the
local variable obj is always known to the compiler and doesn't need the
uninitialized_var markup to squelch a compiler warning.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170912150752.20411-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-09-12 20:45:04 +01:00
Ville Syrjälä 389e0d3d3f drm/i915: Annotate user relocs with __user
Add the missing __user to the urelocs cast to fix the following sparse
warning:
i915_gem_execbuffer.c:1541:47: warning: cast removes address space of expression
i915_gem_execbuffer.c:1541:62: warning: incorrect type in argument 2 (different address spaces)
i915_gem_execbuffer.c:1541:62:    expected void const [noderef] <asn:1>*from
i915_gem_execbuffer.c:1541:62:    got char *

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Fixes: 2889caa923 ("drm/i915: Eliminate lots of iterations over the execobjects array")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170901165434.24636-1-ville.syrjala@linux.intel.com
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> #irc
(cherry picked from commit 908a610557)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2017-09-06 10:18:33 -07:00
Chris Wilson 90cad095ee drm/i915: Disable MI_STORE_DATA_IMM for i915g/i915gm
The early gen3 machines (i915g/Grantsdale and i915gm/Alviso) share a lot
of characteristics in their MI/GTT blocks with gen2, and in particular
can only use physical addresses in MI_STORE_DATA_IMM. This makes it
incompatible with our usage, so include those two machines in the
blacklist to prevent usage.

v2: Make it easy for gcc and rewrite it as a switch to save some space.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> #v1
Link: https://patchwork.freedesktop.org/patch/msgid/20170906152859.5304-1-chris@chris-wilson.co.uk
2017-09-06 17:36:30 +01:00
Ville Syrjälä 908a610557 drm/i915: Annotate user relocs with __user
Add the missing __user to the urelocs cast to fix the following sparse
warning:
i915_gem_execbuffer.c:1541:47: warning: cast removes address space of expression
i915_gem_execbuffer.c:1541:62: warning: incorrect type in argument 2 (different address spaces)
i915_gem_execbuffer.c:1541:62:    expected void const [noderef] <asn:1>*from
i915_gem_execbuffer.c:1541:62:    got char *

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Fixes: 2889caa923 ("drm/i915: Eliminate lots of iterations over the execobjects array")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170901165434.24636-1-ville.syrjala@linux.intel.com
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> #irc
2017-09-05 21:21:17 +03:00
Chris Wilson 4bce4e98c2 drm/i915: Silence sparse by using gfp_t
Sparse enforces that GFP flags are only manipulated inside gfp_t locals.

Fixes: 4d470f7359 ("drm/i915: Avoid undefined behaviour of "u32 >> 32"")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170901145729.21363-1-chris@chris-wilson.co.uk
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
(cherry picked from commit 0d95c883ba)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2017-09-05 10:00:16 -07:00
Chris Wilson 0d95c883ba drm/i915: Silence sparse by using gfp_t
Sparse enforces that GFP flags are only manipulated inside gfp_t locals.

Fixes: 4d470f7359 ("drm/i915: Avoid undefined behaviour of "u32 >> 32"")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170901145729.21363-1-chris@chris-wilson.co.uk
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-09-05 11:56:26 +01:00
Jani Nikula d149d6ae17 Merge drm-upstream/drm-next into drm-intel-next-queued
Catch up with upstream while it's easy.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2017-09-04 21:40:34 +03:00
Chris Wilson fa3722f649 drm/i915: Ignore duplicate VMA stored within the per-object handle LUT
By using drm_gem_flink/drm_gem_open on an object using the same fd, it
is possible for a client to create multiple handles pointing to the same
object (tied to the same contexts and VMA), as exemplified by
igt::gem_handle_to_libdrm_bo(). Since this duplication has been possible
since forever, we cannot assume that the handle:(fpriv, object) is
unique and so must handle the multiple users of a single VMA.

v2: Added commentary noise.

Testcase: igt/gem_close
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102355
Fixes: d1b48c1e71 ("drm/i915: Replace execbuf vma ht with an idr")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170822110517.22277-3-chris@chris-wilson.co.uk
Tested-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
(cherry-picked from commit 3ffff01749)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2017-08-30 12:17:22 -07:00
Chris Wilson 3b24e7e810 drm/i915: Recreate vmapping even when the object is pinned
Sometimes we know we are the only user of the bo, but since we take a
protective pin_pages early on, an attempt to change the vmap on the
object is denied because it is busy. i915_gem_object_pin_map() cannot
tell from our single pin_count if the operation is safe. Instead we must
pass that information down from the caller in the manner of
I915_MAP_OVERRIDE.

This issue has existed from the introduction of the mapping, but was
never noticed as the only place where this conflict might happen is for
cached kernel buffers (such as allocated by i915_gem_batch_pool_get()).
Until recently there was only a single user (the cmdparser) so no
conflicts ever occurred. However, we now use it to allocate batches for
different operations (using MAP_WC on !llc for writes) in addition to the
existing shadow batch (using MAP_WB for reads).

We could either keep both mappings cached, or use a different write
mechanism if we detect a MAP_WB already exists (i.e. clflush
afterwards), but as we haven't seen this issue in the wild (it requires
hitting the GPU reloc path in addition to the cmdparser) for simplicity
just allow the mappings to be recreated.

v2: Include the i915_MAP_OVERRIDE bit in the enum so the compiler knows
about all the valid values.

Fixes: 7dd4f6729f ("drm/i915: Async GPU relocation processing")
Testcase: igt/gem_lut_handle # byt, completely by accident
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170828104631.8606-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
(cherry picked from commit a575c67617)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2017-08-30 12:08:11 -07:00
Chris Wilson 3dbf26ed7b drm/i915: Don't use GPU relocations prior to cmdparser stalls
If we are using the cmdparser, we will have to copy the batch and so
stall for the relocations. Rather than prolong that stall by adding more
relocation requests, just use CPU relocations and do the stall upfront.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170826135620.25949-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-08-29 10:41:33 +01:00
Chris Wilson a575c67617 drm/i915: Recreate vmapping even when the object is pinned
Sometimes we know we are the only user of the bo, but since we take a
protective pin_pages early on, an attempt to change the vmap on the
object is denied because it is busy. i915_gem_object_pin_map() cannot
tell from our single pin_count if the operation is safe. Instead we must
pass that information down from the caller in the manner of
I915_MAP_OVERRIDE.

This issue has existed from the introduction of the mapping, but was
never noticed as the only place where this conflict might happen is for
cached kernel buffers (such as allocated by i915_gem_batch_pool_get()).
Until recently there was only a single user (the cmdparser) so no
conflicts ever occurred. However, we now use it to allocate batches for
different operations (using MAP_WC on !llc for writes) in addition to the
existing shadow batch (using MAP_WB for reads).

We could either keep both mappings cached, or use a different write
mechanism if we detect a MAP_WB already exists (i.e. clflush
afterwards), but as we haven't seen this issue in the wild (it requires
hitting the GPU reloc path in addition to the cmdparser) for simplicity
just allow the mappings to be recreated.

v2: Include the i915_MAP_OVERRIDE bit in the enum so the compiler knows
about all the valid values.

Fixes: 7dd4f6729f ("drm/i915: Async GPU relocation processing")
Testcase: igt/gem_lut_handle # byt, completely by accident
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170828104631.8606-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-08-29 10:39:08 +01:00
Jason Ekstrand afca4216b8 i915: Use drm_syncobj_fence_get
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-08-29 06:20:31 +10:00
Chris Wilson 3ffff01749 drm/i915: Ignore duplicate VMA stored within the per-object handle LUT
By using drm_gem_flink/drm_gem_open on an object using the same fd, it
is possible for a client to create multiple handles pointing to the same
object (tied to the same contexts and VMA), as exemplified by
igt::gem_handle_to_libdrm_bo(). Since this duplication has been possible
since forever, we cannot assume that the handle:(fpriv, object) is
unique and so must handle the multiple users of a single VMA.

v2: Added commentary noise.

Testcase: igt/gem_close
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102355
Fixes: d1b48c1e71 ("drm/i915: Replace execbuf vma ht with an idr")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170822110517.22277-3-chris@chris-wilson.co.uk
Tested-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
2017-08-24 15:28:05 +01:00
Chris Wilson 0519bcb173 drm/i915: Trivial grammar fix s/opt of/opt out of/ in comment
The word out was dropped from the sentence across the line break, put it
back.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20170816085210.4199-6-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-08-18 11:59:28 +01:00
Chris Wilson d1b48c1e71 drm/i915: Replace execbuf vma ht with an idr
This was the competing idea long ago, but it was only with the rewrite
of the idr as an radixtree and using the radixtree directly ourselves,
along with the realisation that we can store the vma directly in the
radixtree and only need a list for the reverse mapping, that made the
patch performant enough to displace using a hashtable. Though the vma ht
is fast and doesn't require any extra allocation (as we can embed the node
inside the vma), it does require a thread for resizing and serialization
and will have the occasional slow lookup. That is hairy enough to
investigate alternatives and favour them if equivalent in peak performance.
One advantage of allocating an indirection entry is that we can support a
single shared bo between many clients, something that was done on a
first-come first-serve basis for shared GGTT vma previously. To offset
the extra allocations, we create yet another kmem_cache for them.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170816085210.4199-5-chris@chris-wilson.co.uk
2017-08-18 11:59:02 +01:00
Chris Wilson 170fa29b14 drm/i915: Simplify eb_lookup_vmas()
Since the introduction of being able to perform a lockless lookup of an
object (i915_gem_object_get_rcu() in fbbd37b36f ("drm/i915: Move object
release to a freelist + worker") we no longer need to split the
object/vma lookup into 3 phases and so combine them into a much simpler
single loop.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170816085210.4199-4-chris@chris-wilson.co.uk
2017-08-18 11:58:29 +01:00
Chris Wilson c7c6e46f91 drm/i915: Convert execbuf to use struct-of-array packing for critical fields
When userspace is doing most of the work, avoiding relocs (using
NO_RELOC) and opting out of implicit synchronisation (using ASYNC), we
still spend a lot of time processing the arrays in execbuf, even though
we now should have nothing to do most of the time. One issue that
becomes readily apparent in profiling anv is that iterating over the
large execobj[] is unfriendly to the loop prefetchers of the CPU and it
much prefers iterating over a pair of arrays rather than one big array.

v2: Clear vma[] on construction to handle errors during vma lookup

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170816085210.4199-3-chris@chris-wilson.co.uk
2017-08-18 11:57:36 +01:00
Chris Wilson 8bcbfb1281 drm/i915: Check context status before looking up our obj/vma
Since we keep the context around across the slow lookup where we may
drop the struct_mutex, we should double check that the context is still
valid upon reacquisition.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170816085210.4199-2-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-08-18 11:57:13 +01:00
Chris Wilson f2f5c0610f drm/i915: Don't use MI_STORE_DWORD_IMM on Sandybridge/vcs
MI_STORE_DWORD_IMM just doesn't work on the video decode engine under
Sandybridge, so refrain from using it. Then switch the selftests over to
using the now common test prior to using MI_STORE_DWORD_IMM.

Fixes: 7dd4f6729f ("drm/i915: Async GPU relocation processing")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.13-rc1+
Link: https://patchwork.freedesktop.org/patch/msgid/20170816085210.4199-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
2017-08-18 11:55:02 +01:00
Jason Ekstrand cf6e7bac63 drm/i915: Add support for drm syncobjs
This commit adds support for waiting on or signaling DRM syncobjs as
part of execbuf.  It does so by hijacking the currently unused cliprects
pointer to instead point to an array of i915_gem_exec_fence structs
which containe a DRM syncobj and a flags parameter which specifies
whether to wait on it or to signal it.  This implementation
theoretically allows for both flags to be set in which case it waits on
the dma_fence that was in the syncobj and then immediately replaces it
with the dma_fence from the current execbuf.

v2:
 - Rebase on new syncobj API
v3:
 - Pull everything out into helpers
 - Do all allocation in gem_execbuffer2
 - Pack the flags in the bottom 2 bits of the drm_syncobj*
v4:
 - Prevent a potential race on syncobj->fence

Testcase: igt/gem_exec_fence/syncobj*
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Link: https://patchwork.freedesktop.org/patch/msgid/1499289202-25441-1-git-send-email-jason.ekstrand@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20170815145733.4562-1-chris@chris-wilson.co.uk
2017-08-15 16:46:57 +01:00
Chris Wilson b8f55be644 drm/i915: Split obj->cache_coherent to track r/w
Another month, another story in the cache coherency saga. This time, we
come to the realisation that i915_gem_object_is_coherent() has been
reporting whether we can read from the target without requiring a cache
invalidate; but we were using it in places for testing whether we could
write into the object without requiring a cache flush. So split the
tracking into two, one to decide before reads, one after writes.

See commit e27ab73d17 ("drm/i915: Mark CPU cache as dirty on every
transition for CPU writes") for the previous entry in this saga.

v2: Be verbose
v3: Remove unused function (i915_gem_object_is_coherent)
v4: Fix inverted coherency check prior to execbuf (from v2)
v5: Add comment for nasty code where we are optimising on gcc's behalf.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101109
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101555
Testcase: igt/kms_mmap_write_crc
Testcase: igt/kms_pwrite_crc
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Dongwon Kim <dongwon.kim@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Tested-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170811111116.10373-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-08-15 15:46:57 +01:00
Chris Wilson 0f46daa1a2 drm/i915: Force CPU synchronisation even if userspace requests ASYNC
The goal here was to minimise doing any thing or any check inside the
kernel that was not strictly required. For a userspace that assumes
complete control over the cache domains, the kernel is usually using
outdated information and may trigger clflushes where none were
required.

However, swapping is a situation where userspace has no knowledge of the
domain transfer, and will leave the object in the CPU cache. The kernel
must flush this out to the backing storage prior to use with the GPU. As
we use an asynchronous task tracked by an implicit fence for this, we
also need to cancel the ASYNC flag on the object so that the object will
wait for the clflush to complete before being executed. This also absolves
userspace of the responsibility imposed by commit 77ae995789 ("drm/i915:
Enable userspace to opt-out of implicit fencing") that its needed to ensure
that the object was out of the CPU cache prior to use on the GPU.

Fixes: 77ae995789 ("drm/i915: Enable userspace to opt-out of implicit fencing")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101571
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20170721145037.25105-5-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-07-27 09:39:00 +02:00
Chris Wilson 1f727d9e72 drm/i915: Only skip updating execobject.offset after error
I was being overly paranoid in not updating the execobject.offset after
performing the fallback copy where we set reloc.presumed_offset to -1.
The thinking was to ensure that a subsequent NORELOC execbuf would be
forced to process the invalid relocations. However this is overkill so
long as we *only* update the execobject.offset following a successful
update of the relocation value witin the batch. If we have to repeat the
execbuf due to a later interruption, then we may skip the relocations on
the second pass (honouring NORELOC) since the execobject.offset match
the actual offsets (even though reloc.presumed_offset is garbage).

Subsequent calls to execbuf with NORELOC should themselves ensure that
the reloc.presumed_offset have been corrected in case of future
migration.

Reporting back the actual execobject.offset, even when
reloc.presumed_offset is garbage, ensures that reuse of those objects
use the latest information to avoid relocations.

Fixes: 2889caa923 ("drm/i915: Eliminate lots of iterations over the execobjects array")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101635
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170721145037.25105-4-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-07-27 09:39:00 +02:00
Chris Wilson 1da7b54c46 drm/i915: Only mark the execobject as pinned on success
If we fail to acquire a fence (for old school fenced GPU access) then we
unwind the vma reservation, including its pin. However, we were making
the execobject as holding the pin before erring out, leading to a double
unpin:

[ 3193.991802] kernel BUG at drivers/gpu/drm/i915/i915_vma.h:287!
[ 3193.998131] invalid opcode: 0000 [#1] PREEMPT SMP
[ 3194.002816] Modules linked in: snd_hda_intel i915 vgem snd_hda_codec_analog snd_hda_codec_generic coretemp snd_hda_codec snd_hwdep snd_hda_core snd_pcm lpc_ich mei_me e1000e mei prime_numbers ptp pps_core [last unloaded: i915]
[ 3194.022841] CPU: 0 PID: 8123 Comm: kms_flip Tainted: G     U          4.13.0-rc1-CI-CI_DRM_471+ #1
[ 3194.031765] Hardware name: Dell Inc. OptiPlex 755                 /0PU052, BIOS A04 11/05/2007
[ 3194.040343] task: ffff8800785d4c40 task.stack: ffffc90001768000
[ 3194.046339] RIP: 0010:eb_release_vmas.isra.6+0x119/0x180 [i915]
[ 3194.052234] RSP: 0018:ffffc9000176ba80 EFLAGS: 00010246
[ 3194.057439] RAX: 00000000000003c0 RBX: ffff8800710fc2d8 RCX: ffff8800588e4f48
[ 3194.064546] RDX: ffffffff1fffffff RSI: 00000000ffffffff RDI: ffff8800588e00d0
[ 3194.071654] RBP: ffffc9000176bab0 R08: 0000000000000000 R09: 0000000000000000
[ 3194.078761] R10: 0000000000000040 R11: 0000000000000001 R12: ffff880060822f00
[ 3194.085867] R13: 0000000000000310 R14: 00000000000003b8 R15: ffffc9000176bbb0
[ 3194.092975] FS:  00007fd2b94aba40(0000) GS:ffff88007d200000(0000) knlGS:0000000000000000
[ 3194.101033] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3194.106754] CR2: 00007ffbec3ff000 CR3: 0000000074e67000 CR4: 00000000000006f0
[ 3194.113861] Call Trace:
[ 3194.116321]  eb_relocate_slow+0x67/0x4e0 [i915]
[ 3194.120861]  i915_gem_do_execbuffer+0x429/0x1260 [i915]
[ 3194.126070]  ? lock_acquire+0xb5/0x210
[ 3194.129803]  ? __might_fault+0x39/0x90
[ 3194.133563]  i915_gem_execbuffer2+0x9b/0x1b0 [i915]
[ 3194.138447]  ? i915_gem_execbuffer+0x2b0/0x2b0 [i915]
[ 3194.143478]  drm_ioctl_kernel+0x64/0xb0
[ 3194.147298]  drm_ioctl+0x2cd/0x390
[ 3194.150710]  ? i915_gem_execbuffer+0x2b0/0x2b0 [i915]
[ 3194.155741]  ? finish_task_switch+0xa5/0x210
[ 3194.159993]  ? finish_task_switch+0x6a/0x210
[ 3194.164247]  do_vfs_ioctl+0x90/0x670
[ 3194.167806]  ? entry_SYSCALL_64_fastpath+0x5/0xb1
[ 3194.172492]  ? __this_cpu_preempt_check+0x13/0x20
[ 3194.177176]  ? trace_hardirqs_on_caller+0xe7/0x1c0
[ 3194.181946]  SyS_ioctl+0x3c/0x70
[ 3194.185159]  entry_SYSCALL_64_fastpath+0x1c/0xb1
[ 3194.189756] RIP: 0033:0x7fd2b76a8587
[ 3194.193314] RSP: 002b:00007fff074845b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 3194.200855] RAX: ffffffffffffffda RBX: ffffffff8146da43 RCX: 00007fd2b76a8587
[ 3194.207962] RDX: 00007fff074846e0 RSI: 0000000040406469 RDI: 0000000000000003
[ 3194.215068] RBP: ffffc9000176bf88 R08: 0000000000000000 R09: 0000000000000003
[ 3194.222175] R10: 00007fd2b796bb58 R11: 0000000000000246 R12: 00007fff07484880
[ 3194.229280] R13: 0000000000000003 R14: 0000000040406469 R15: 0000000000000000
[ 3194.236386]  ? __this_cpu_preempt_check+0x13/0x20
[ 3194.241070] Code: 24 b0 00 00 00 48 85 c9 0f 84 6c ff ff ff 8b 41 20 85 c0 7e 73 83 e8 01 89 41 20 41 8b 84 24 e8 00 00 00 a8 0f 0f 85 5f ff ff ff <0f> 0b 48 83 c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d f3 c3 49 8b 84
[ 3194.259943] RIP: eb_release_vmas.isra.6+0x119/0x180 [i915] RSP: ffffc9000176ba80
[ 3194.268047] ---[ end trace 1d7348c6575d8800 ]---
[ 3673.658819] softdog: Initiating panic
[ 3673.662471] Kernel panic - not syncing: Software Watchdog Timer expired
[ 3673.669066] Kernel Offset: disabled
[ 3673.672541] Rebooting in 1 seconds..

Reported-by: Tomi Sarvela <tomi.p.sarvela@intel.com>
Fixes: 2889caa923 ("drm/i915: Eliminate lots of iterations over the execobjects array")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170721145037.25105-3-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-07-27 09:38:59 +02:00