The bspec documents multiple MCR ranges; make sure they're all captured
by the driver.
Bspec: 13991, 52079
Fixes: 592a7c5e08 ("drm/i915: Extend non readable mcr range")
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200311162300.1838847-2-matthew.d.roper@intel.com
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
(cherry picked from commit 415d126997)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
This reverts commit 36a6b5d964.
The commit takes care Wa_1604544889 which was fixed on a0 stepping based on
a0 replan. So no SW workaround is required on any stepping now.
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Caz Yokoyama <caz.yokoyama@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Fixes: 36a6b5d964 ("drm/i915/tgl: Add extra hdc flush workaround")
Link: https://patchwork.freedesktop.org/patch/msgid/1c751032ce79c80c5485cae315f1a9904ce07cac.1583359940.git.caz.yokoyama@intel.com
(cherry picked from commit 175c4d9b3b)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Record the initial active element we use when building the next ELSP
submission, so that we can compare against it latter to see if there's
no change.
Fixes: 44d0a9c05b ("drm/i915/execlists: Skip redundant resubmission")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200311092624.10012-2-chris@chris-wilson.co.uk
(cherry picked from commit 60ef5b7ac6)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
If the cacheline may still be busy, atomically mark it for future
release, and only if we can determine that it will never be used again,
immediately free it.
Closes: https://gitlab.freedesktop.org/drm/intel/issues/1392
Fixes: ebece75392 ("drm/i915: Keep timeline HWSP allocated until idle across the system")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.2+
Link: https://patchwork.freedesktop.org/patch/msgid/20200306154647.3528345-1-chris@chris-wilson.co.uk
(cherry picked from commit 2d4bd971f5)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
If we stop filling the ELSP due to an incompatible virtual engine
request, check if we should enable the timeslice on behalf of the queue.
This fixes the case where we are inspecting the last->next element when
we know that the last element is the last request in the execution queue,
and so decided we did not need to enable timeslicing despite the intent
to do so!
Fixes: 8ee36e048c ("drm/i915/execlists: Minimalistic timeslicing")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v5.4+
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200306113012.3184606-1-chris@chris-wilson.co.uk
(cherry picked from commit 3df2deed41)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
The alignment is u64, and yet is_power_of_2() assumes unsigned long,
which might give different results between 32b and 64b kernel.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200305203534.210466-1-matthew.auld@intel.com
Cc: stable@vger.kernel.org
(cherry picked from commit 2920516b2f)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Commit c3b5a8430d ("drm/i915/gvt: Enable gfx virtualiztion for CFL")
added the support on CFL. The vgpu emulation hotplug support on CFL was
supposed to be included in that patch. Without the vgpu emulation
hotplug support, the dma-buf based display gives us a blur face.
So fix this issue by adding the vgpu emulation hotplug support on CFL.
Fixes: c3b5a8430d ("drm/i915/gvt: Enable gfx virtualiztion for CFL")
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Acked-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20200227010041.32248-1-tina.zhang@intel.com
(cherry picked from commit 135dde8853)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Requests within a timeline are ordered by that timeline, so awaiting for
the start of a request within the timeline is a no-op. This used to work
by falling out of the mutex_trylock() as the signaler and waiter had the
same timeline and not returning an error.
Fixes: 6a79d84840 ("drm/i915: Lock signaler timeline while navigating")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.5+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200305134822.2750496-1-chris@chris-wilson.co.uk
(cherry picked from commit ab7a69020f)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Fix the inverted test to emit the wait on the end of the previous
request if we /haven't/ already.
Fixes: 6a79d84840 ("drm/i915: Lock signaler timeline while navigating")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.5+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200305104210.2619967-1-chris@chris-wilson.co.uk
(cherry picked from commit 07e9c59d63)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
The emulated vbt doesn't tell its size correctly. According to the
intel_vbt_defs.h, vbt_header.vbt_size should the size of VBT (VBT Header,
BDB Header and data blocks), and bdb_header.bdb_size should be the size
of BDB (BDB Header and data blocks).
This patch fixes the issue and lets vbt provided by GVT-g pass the guest
i915's sanity test.
v2: refine the commit message. (Zhenyu)
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20200305131600.29640-1-tina.zhang@intel.com
As we have pinned the timeline (using tl->active_count), we can safely
drop the tl->mutex as we wait for what we believe to be the final
request on that timeline. This is useful for ensuring that we do not
block the engine heartbeat by hogging the kernel_context's timeline on a
dead GPU.
References: https://gitlab.freedesktop.org/drm/intel/issues/1364
Fixes: 058179e72e ("drm/i915/gt: Replace hangcheck by heartbeats")
Fixes: f33a8a5160 ("drm/i915: Merge wait_for_timelines with retire_request")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200303140009.1494819-1-chris@chris-wilson.co.uk
(cherry picked from commit 82126e596d)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
We still need to wait for the initial OA configuration to happen
before we enable OA report writes to the OA buffer.
Reported-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 15d0ace1f8 ("drm/i915/perf: execute OA configuration from command stream")
Closes: https://gitlab.freedesktop.org/drm/intel/issues/1356
Testcase: igt/perf/stream-open-close
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200302085812.4172450-7-chris@chris-wilson.co.uk
(cherry picked from commit 4b4e973d5e)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
From commit f25a49ab8a ("drm/i915/gvt: Use vgpu_lock to protect per
vgpu access") the vgpu idr destroy is moved later than vgpu resource
destroy, then it would fail to stop timer for schedule policy clean
which to check vgpu idr for any left vGPU. So this trys to destroy
vgpu idr earlier.
Cc: Colin Xu <colin.xu@intel.com>
Fixes: f25a49ab8a ("drm/i915/gvt: Use vgpu_lock to protect per vgpu access")
Acked-by: Colin Xu <colin.xu@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20200229055445.31481-1-zhenyuw@linux.intel.com
The assert_mmap_offset() returns type bool so if we return an error
pointer that is "return true;" or success. If we have an error, then
we should return false.
Fixes: 3d81d589d6 ("drm/i915: Test exhaustion of the mmap space")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200228141413.qfjf4abr323drlo4@kili.mountain
(cherry picked from commit efbf928824)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
We need to be extremely careful inside i915_request_await_start() as it
needs to walk the list of requests in the foreign timeline with very
little protection. As we hold our own timeline mutex, we can not nest
inside the signaler's timeline mutex, so all that remains is our RCU
protection. However, to be safe we need to tell the compiler that we may
be traversing the list only under RCU protection, and furthermore we
need to start declaring requests as elements of the timeline from their
construction.
Fixes: 9ddc8ec027 ("drm/i915: Eliminate the trylock for awaiting an earlier request")
Fixes: 6a79d84840 ("drm/i915: Lock signaler timeline while navigating")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200227085723.1961649-11-chris@chris-wilson.co.uk
(cherry picked from commit d22d2d073e)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Wa_1608008084 is an additional WA that applies to writes on FF_MODE2
register. We can't read it back either from CPU or GPU. Since the other
bits should be 0, recommendation to handle Wa_1604555607 is to actually
just write the timer value.
Do a write only and don't try to read it, neither before or after
the WA is applied.
Fixes: ff690b2111 ("drm/i915/tgl: Implement Wa_1604555607")
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200224191258.15668-1-lucas.demarchi@intel.com
(cherry picked from commit e94bda1432)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
We need to explicitly set the TLB Request Timer initial value in the
BW_BUDDY registers to 0x8 rather than relying on the hardware default.
v2: Apply missing REG_FIELD_PREP to ensure 0x8 is placed in the correct
bits during the rmw. (Jose)
Bspec: 52890
Bspec: 50044
Fixes: 3fa01d642f ("drm/i915/tgl: Program BW_BUDDY registers during display init")
Cc: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200219215655.2923650-1-matthew.d.roper@intel.com
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
(cherry picked from commit 87e04f7592)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200228004320.127142-2-matthew.d.roper@intel.com
Commit 60c6a14b48 ("drm/i915/display: Force the state compute phase
once to enable PSR") was forcing the state compute too earlier
causing errors because not everything was initialized, so here
moving to the end of i915_driver_modeset_probe() when the display is
all initialized.
Also fixing the place where it disarm the force probe as during the
atomic check phase errors could happen like the ones due locking and
it would cause PSR to never be enabled if that happens.
Leaving the disarm to the atomic commit phase, intel_psr_enable() or
intel_psr_update() will be called even if the current state do not
allow PSR to be enabled.
v2: Check if intel_dp is null in intel_psr_force_mode_changed_set()
v3: Check intel_dp before get dev_priv
v4:
- renamed intel_psr_force_mode_changed_set() to
intel_psr_set_force_mode_changed()
- removed the set parameter from intel_psr_set_force_mode_changed()
- not calling intel_psr_set_force_mode_changed() from
intel_psr_enable/update(), directly setting it after the same checks
that intel_psr_set_force_mode_changed() does
- moved intel_psr_set_force_mode_changed() arm call to
i915_driver_modeset_probe() as it is a better for a PSR call, all the
functions calls happening between the old and the new function call
will cause issue
[backported to v5.6-rc3]
Fixes: 60c6a14b48 ("drm/i915/display: Force the state compute phase once to enable PSR")
Closes: https://gitlab.freedesktop.org/drm/intel/issues/1151
Tested-by: Ross Zwisler <zwisler@google.com>
Reported-by: Ross Zwisler <zwisler@google.com>
Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Anshuman Gupta <anshuman.gupta@intel.com>
Reviewed-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200221212635.11614-1-jose.souza@intel.com
Link: https://patchwork.freedesktop.org/patch/msgid/20200227205540.126135-1-jose.souza@intel.com
(cherry picked from commit df1a5bfc16)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Call cond_resched() between each freed object in case we have a really,
really long list, and we don't want to block normal processes.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200221100953.2587176-1-chris@chris-wilson.co.uk
(cherry picked from commit deeee411a9)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Attempting to bind / unbind module from devices where we have both
integrated and discreete GPU handled by i915, will cause us to try and
double free the global state, hitting null ptr deref in free_event_attributes.
Let's move it to i915_pmu.
Fixes: 05488673a4 ("drm/i915/pmu: Support multiple GPUs")
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200219161822.24592-2-michal.winiarski@intel.com
(cherry picked from commit 46129dc10f)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Attempting to bind / unbind module from devices where we have both
integrated and discreete GPU handled by i915 can lead to leaks and
warnings from cpuhp:
Error: Removing state XXX which has instances left.
Let's move the state to i915_pmu.
Fixes: 05488673a4 ("drm/i915/pmu: Support multiple GPUs")
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200219161822.24592-1-michal.winiarski@intel.com
(cherry picked from commit f5a179d468)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Full-ppgtt on gen7 is proving to be highly unstable and not robust.
Closes: https://gitlab.freedesktop.org/drm/intel/issues/694
Fixes: 3cd6e8860e ("drm/i915/gen7: Re-enable full-ppgtt for ivb & hsw")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200224101120.4024481-1-chris@chris-wilson.co.uk
(cherry picked from commit 4fbe112a56)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
$(CC) with $(CFLAGS_GCOV) assumes the output filename with .gcno suffix
appended is writable. This is not the case when the output filename is
/dev/null:
HDRTEST drivers/gpu/drm/i915/display/intel_frontbuffer.h
/dev/null:1:0: error: cannot open /dev/null.gcno
HDRTEST drivers/gpu/drm/i915/display/intel_ddi.h
/dev/null:1:0: error: cannot open /dev/null.gcno
make[5]: *** [../drivers/gpu/drm/i915/Makefile:307:
drivers/gpu/drm/i915/display/intel_ddi.hdrtest] Error 1
make[5]: *** Waiting for unfinished jobs....
make[5]: *** [../drivers/gpu/drm/i915/Makefile:307:
drivers/gpu/drm/i915/display/intel_frontbuffer.hdrtest] Error 1
Filter out $(CFLAGS_GVOC) from the header test $(c_flags) as they don't
make sense here anyway.
References: http://lore.kernel.org/r/d8112767-4089-4c58-d7d3-2ce03139858a@infradead.org
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Fixes: c6d4a099a2 ("drm/i915: reimplement header test feature")
Cc: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200221105414.14358-1-jani.nikula@intel.com
(cherry picked from commit 408c1b3253)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Deleting dmabuf item's list head after releasing its container can lead
to KASAN-reported issue:
BUG: KASAN: use-after-free in __list_del_entry_valid+0x15/0xf0
Read of size 8 at addr ffff88818a4598a8 by task kworker/u8:3/13119
So fix this issue by puting deleting dmabuf_objs ahead of releasing its
container.
Fixes: dfb6ae4e14 ("drm/i915/gvt: Handle orphan dmabuf_objs")
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20200225053527.8336-2-tina.zhang@intel.com
We manipulate ring->head while active in i915_request_retire underneath
the timeline manipulation. We cannot rely on a stable ring->head outside
of the timeline->mutex, in particular while setting up the context for
resume and reset.
Closes: https://gitlab.freedesktop.org/drm/intel/issues/1126
Fixes: 0881954965 ("drm/i915: Introduce intel_context.pin_mutex for pin management")
Fixes: e5dadff4b0 ("drm/i915: Protect request retirement with timeline->mutex")
References: f3c0efc9fe ("drm/i915/execlists: Leave resetting ring to intel_ring")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200211120131.958949-1-chris@chris-wilson.co.uk
(cherry picked from commit 42827350f7)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
If we rewind the RING_TAIL on a context, due to a preemption event, we
must force the context restore for the RING_TAIL update to be properly
handled. Rather than note which preemption events may cause us to rewind
the tail, compare the new request's tail with the previously submitted
RING_TAIL, as it turns out that timeslicing was causing unexpected
rewinds.
<idle>-0 0d.s2 1280851190us : __execlists_submission_tasklet: 0000:00:02.0 rcs0: expired last=130:4698, prio=3, hint=3
<idle>-0 0d.s2 1280851192us : __i915_request_unsubmit: 0000:00:02.0 rcs0: fence 66:119966, current 119964
<idle>-0 0d.s2 1280851195us : __i915_request_unsubmit: 0000:00:02.0 rcs0: fence 130:4698, current 4695
<idle>-0 0d.s2 1280851198us : __i915_request_unsubmit: 0000:00:02.0 rcs0: fence 130:4696, current 4695
^---- Note we unwind 2 requests from the same context
<idle>-0 0d.s2 1280851208us : __i915_request_submit: 0000:00:02.0 rcs0: fence 130:4696, current 4695
<idle>-0 0d.s2 1280851213us : __i915_request_submit: 0000:00:02.0 rcs0: fence 134:1508, current 1506
^---- But to apply the new timeslice, we have to replay the first request
before the new client can start -- the unexpected RING_TAIL rewind
<idle>-0 0d.s2 1280851219us : trace_ports: 0000:00:02.0 rcs0: submit { 130:4696*, 134:1508 }
synmark2-5425 2..s. 1280851239us : process_csb: 0000:00:02.0 rcs0: cs-irq head=5, tail=0
synmark2-5425 2..s. 1280851240us : process_csb: 0000:00:02.0 rcs0: csb[0]: status=0x00008002:0x00000000
^---- Preemption event for the ELSP update; note the lite-restore
synmark2-5425 2..s. 1280851243us : trace_ports: 0000:00:02.0 rcs0: preempted { 130:4698, 66:119966 }
synmark2-5425 2..s. 1280851246us : trace_ports: 0000:00:02.0 rcs0: promote { 130:4696*, 134:1508 }
synmark2-5425 2.... 1280851462us : __i915_request_commit: 0000:00:02.0 rcs0: fence 130:4700, current 4695
synmark2-5425 2.... 1280852111us : __i915_request_commit: 0000:00:02.0 rcs0: fence 130:4702, current 4695
synmark2-5425 2.Ns1 1280852296us : process_csb: 0000:00:02.0 rcs0: cs-irq head=0, tail=2
synmark2-5425 2.Ns1 1280852297us : process_csb: 0000:00:02.0 rcs0: csb[1]: status=0x00000814:0x00000000
synmark2-5425 2.Ns1 1280852299us : trace_ports: 0000:00:02.0 rcs0: completed { 130:4696!, 134:1508 }
synmark2-5425 2.Ns1 1280852301us : process_csb: 0000:00:02.0 rcs0: csb[2]: status=0x00000818:0x00000040
synmark2-5425 2.Ns1 1280852302us : trace_ports: 0000:00:02.0 rcs0: completed { 134:1508, 0:0 }
synmark2-5425 2.Ns1 1280852313us : process_csb: process_csb:2336 GEM_BUG_ON(!i915_request_completed(*execlists->active) && !reset_in_progress(execlists))
Fixes: 8ee36e048c ("drm/i915/execlists: Minimalistic timeslicing")
Referenecs: 82c69bf586 ("drm/i915/gt: Detect if we miss WaIdleLiteRestore")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.4+
Link: https://patchwork.freedesktop.org/patch/msgid/20200207211452.2860634-1-chris@chris-wilson.co.uk
(cherry picked from commit 5ba32c7be8)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
drm_pci_alloc and drm_pci_free are just very thin wrappers around
dma_alloc_coherent, with a note that we should be removing them.
Furthermore since
commit de09d31dd3
Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Date: Fri Jan 15 16:51:42 2016 -0800
page-flags: define PG_reserved behavior on compound pages
As far as I can see there's no users of PG_reserved on compound pages.
Let's use PF_NO_COMPOUND here.
drm_pci_alloc has been declared broken since it mixes GFP_COMP and
SetPageReserved. Avoid this conflict by weaning ourselves off using the
abstraction and using the dma functions directly.
Reported-by: Taketo Kabe
Closes: https://gitlab.freedesktop.org/drm/intel/issues/1027
Fixes: de09d31dd3 ("page-flags: define PG_reserved behavior on compound pages")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: <stable@vger.kernel.org> # v4.5+
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20200202153934.3899472-1-chris@chris-wilson.co.uk
(cherry picked from commit c6790dc223)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Virtual engines are fleeting. They carry a reference count and may be freed
when their last request is retired. This makes them unsuitable for the
task of housing engine->retire.work so assert that it is not used.
Tvrtko tracked down an instance where we did indeed violate this rule.
In virtual_submit_request, we flush a completed request directly with
__i915_request_submit and this causes us to queue that request on the
veng's breadcrumb list and signal it. Leading us down a path where we
should not attach the retire.
Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: dc93c9b693 ("drm/i915/gt: Schedule request retirement when signaler idles")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200206204915.2636606-1-chris@chris-wilson.co.uk
(cherry picked from commit f91d8156ab)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
We lack full state readout of DSC config, which may lead to DSC enable
using a config that's all zeros, failing spectacularly. Force full
modeset and thus compute config at probe to get a sane state, until we
implement DSC state readout. Any fastset that did appear to work with
DSC at probe, worked by coincidence. [1] is an example of a change that
triggered the issue on TGL DSI DSC.
[1] http://patchwork.freedesktop.org/patch/msgid/20200212150102.7600-1-ville.syrjala@linux.intel.com
Cc: Manasi Navare <manasi.d.navare@intel.com>
Cc: Vandita Kulkarni <vandita.kulkarni@intel.com>
Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
Cc: stable@vger.kernel.org
Fixes: fbacb15ea8 ("drm/i915/dsc: add basic hardware state readout support")
Acked-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200213140412.32697-3-stanislav.lisovskiy@intel.com
(cherry picked from commit a4277aa398)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Voltage level depends not only on the cdclk, but also on the DDI clock.
Last time the bspec voltage level table for EHL was updated, we only
updated the cdclk requirements, but forgot to account for the new port
clock criteria.
Bspec: 21809
Fixes: d147483884 ("drm/i915/ehl: Update voltage level checks")
Cc: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200207001417.1229251-1-matthew.d.roper@intel.com
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
(cherry picked from commit 9d5fd37ed7)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Inside the intel_timeline_get_seqno(), we currently track the retirement
of the old cachelines by listening to the new request. This requires
that the new request is ready to be used and so requires a minimum bit
of initialisation prior to getting the new seqno.
Fixes: b1e3177bd1 ("drm/i915: Coordinate i915_active with its own mutex")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200203094152.4150550-2-chris@chris-wilson.co.uk
(cherry picked from commit 855e39e65c)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
To enable non-persistent contexts, we require a means of cancelling any
inflight work from that context. This is first done "gracefully" by
using preemption to kick the active context off the engine, and then
forcefully by resetting the engine if it is active. If we are unable to
reset the engine to remove hostile userspace, we should not allow
userspace to opt into using non-persistent contexts.
If the per-engine reset fails, we still do a full GPU reset, but that is
rare and usually indicative of much deeper issues. The damage is already
done. However, the goal of the interface to allow long running compute
jobs without causing collateral damage elsewhere, and if we are unable
to support that we should make that known by not providing the
interface (and falsely pretending we can).
Fixes: a0e047156c ("drm/i915/gem: Make context persistence optional")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200130164553.1937718-1-chris@chris-wilson.co.uk
(cherry picked from commit d1b9b5f127)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEFWWmW3ewYy4RJOWc05gHnSar7m8FAl5FGuMACgkQ05gHnSar
7m8vRxAAobELoLLkdf44Cf4f8hgIxLRYiWxSVavh6ucFXBlGEWpGVwf2t8fM6ZBB
yBNJWcxun6F8wy8aHZxjAHSK/LDL5sKUyMdU+GCUthkgtJxM/SYTJLvFL1y+Eacr
PuQ50IXQFHRRQI1Cp6kVo9Y91/oU2LuWzrX82ZOIcxglO35A8vm3iT4Ggno3cDli
1vAl1VIbXQX2GKhm1y4dGK2/lzbeN4byqJNpGQIq+1PDBEVgNsOPXRMhNLBFqIhA
yVn/t1Z780KSTh8Oa24xkLSFKj4y0Yj7TDdkmIsaxPADqxy6Ptiuysf+scuPEpOS
epRG3R3Dtajb+ZHzV2A5TmVAlgEvSDBKWKDA9wBzMIEKS8m5eW1UoDuJ4JhRy/IR
ZNVcPNRAX61owmjEhlncQh9Mx8cUF3ku1Oup17/cm5o9Tcphubl6ilGmC5JAO3zj
rX6NUyxbp4h9Gv6kY1eQfXtAe8Vo+vwejStew4ajo/r2PdAlRWzDUXyn6K7kXRb2
3btgaVKulLAQQayP5FPp3LXvyaU4/Zg6QYKaV+5sXDDy/onvwoK4m1z9dxxC511a
0DnpZOIIX0eVL5p7/FcIkfMan5wKK2QiWYfKe2jVC/9TooxK2g4brp+ImCXZlt8t
kT/M1sYKkblUy4KN+f/asKLM85jzRjw5lH7LPiCwChj8KVWS7BQ=
=BZol
-----END PGP SIGNATURE-----
Merge tag 'drm-intel-next-fixes-2020-02-13' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
drm/i915 fixes for v5.6-rc2
Most of these were aimed at a "next fixes" pull already during the merge
window, but there were issues with the baseline I used, which resulted
in a lot of issues in CI. I've regenerated this stuff piecemeal now,
adding gradually to it, and it seems healthy now.
Due to the issues this is much bigger than I'd like. But it was
obviously necessary to take the time to ensure it's not garbage...
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/878sl6yfrn.fsf@intel.com
Keep the rq->fence.flags consistent with the status of the
rq->sched.link, and clear the associated bits when decoupling the link
on retirement (as we may wish to inspect those flags independent of
other state).
Fixes: c3f1ed90e6 ("drm/i915/gt: Allow temporary suspension of inflight requests")
References: https://gitlab.freedesktop.org/drm/intel/issues/997
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200122140243.495621-3-chris@chris-wilson.co.uk
(cherry picked from commit b4a9a149f9)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
If we encounter a hang on a virtual engine, as we process the hang the
request may already have been moved back to the virtual engine (we are
processing the hang on the physical engine). We need to reclaim the
request from the virtual engine so that the locking is consistent and
local to the real engine on which we will hold the request for error
state capturing.
v2: Pull the reclamation into execlists_hold() and assert that cannot be
called from outside of the reset (i.e. with the tasklet disabled).
v3: Added selftest
v4: Drop the reference owned by the virtual engine
Fixes: ad18ba7b5e ("drm/i915/execlists: Offline error capture")
Testcase: igt/gem_exec_balancer/hang
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200122140243.495621-2-chris@chris-wilson.co.uk
(cherry picked from commit 989df3a7bd)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Thanks to preempt-to-busy, we leave the request on the HW as we submit
the preemption request. This means that the request may complete at any
moment as we process HW events, and in particular the request may be
retired as we are planning to capture it for a preemption timeout.
Be more careful while obtaining the request to capture after a
preemption timeout, and check to see if it completed before we were able
to put it on the on-hold list. If we do see it did complete just before
we capture the request, proclaim the preemption-timeout a false positive
and pardon the reset as we should hit an arbitration point momentarily
and so be able to process the preemption.
Note that even after we move the request to be on hold it may be retired
(as the reset to stop the HW comes after), so we do require to hold our
own reference as we work on the request for capture (and all of the
peeking at state within the request needs to be carefully protected).
Fixes: c3f1ed90e6 ("drm/i915/gt: Allow temporary suspension of inflight requests")
Closes: https://gitlab.freedesktop.org/drm/intel/issues/997
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200122140243.495621-1-chris@chris-wilson.co.uk
(cherry picked from commit 4ba5c086a1)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Currently, we skip error capture upon forced preemption. We apply forced
preemption when there is a higher priority request that should be
running but is being blocked, and we skip inline error capture so that
the preemption request is not further delayed by a user controlled
capture -- extending the denial of service.
However, preemption reset is also used for heartbeats and regular GPU
hangs. By skipping the error capture, we remove the ability to debug GPU
hangs.
In order to capture the error without delaying the preemption request
further, we can do an out-of-line capture by removing the guilty request
from the execution queue and scheduling a worker to dump that request.
When removing a request, we need to remove the entire context and all
descendants from the execution queue, so that they do not jump past.
Closes: https://gitlab.freedesktop.org/drm/intel/issues/738
Fixes: 3a7a92aba8 ("drm/i915/execlists: Force preemption")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200116184754.2860848-3-chris@chris-wilson.co.uk
(cherry picked from commit 748317386a)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
In order to support out-of-line error capture, we need to remove the
active request from HW and put it to one side while a worker compresses
and stores all the details associated with that request. (As that
compression may take an arbitrary user-controlled amount of time, we
want to let the engine continue running on other workloads while the
hanging request is dumped.) Not only do we need to remove the active
request, but we also have to remove its context and all requests that
were dependent on it (both in flight, queued and future submission).
Finally once the capture is complete, we need to be able to resubmit the
request and its dependents and allow them to execute.
v2: Replace stack recursion with a simple list.
v3: Check all the parents, not just the first, when searching for a
stuck ancestor!
References: https://gitlab.freedesktop.org/drm/intel/issues/738
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200116184754.2860848-2-chris@chris-wilson.co.uk
(cherry picked from commit 32ff621fd7)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
If we keep track of when the i915_request.sched.link is on the HW
runlist, or in the priority queue we can simplify our interactions with
the request (such as during rescheduling). This also simplifies the next
patch where we introduce a new in-between list, for requests that are
ready but neither on the run list or in the queue.
v2: Update i915_sched_node.link explanation for current usage where it
is a link on both the queue and on the runlists.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200116184754.2860848-1-chris@chris-wilson.co.uk
(cherry picked from commit 672c368f93)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Only the first and the last nodes were being added to
ref->preallocated_barriers.
Renaming variables to make it more easy to read.
Fixes: 8413502238 ("drm/i915/gt: Drop mutex serialisation between context pin/unpin")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200129232345.84512-1-jose.souza@intel.com
(cherry picked from commit d4c3c0b822)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>