linux-stable/drivers
Chris Wilson 9a4dc80399 drm/i915: Flush the ring stop bit after clearing RING_HEAD in reset
Inside the live_hangcheck (reset) selftests, we occasionally see
failures like

<7>[  239.094840] i915_gem_set_wedged rcs0
<7>[  239.094843] i915_gem_set_wedged 	current seqno 19a98, last 19a9a, hangcheck 0 [5158 ms]
<7>[  239.094846] i915_gem_set_wedged 	Reset count: 6239 (global 1)
<7>[  239.094848] i915_gem_set_wedged 	Requests:
<7>[  239.095052] i915_gem_set_wedged 		first  19a99 [e8c:5f] prio=1024 @ 5159ms: (null)
<7>[  239.095056] i915_gem_set_wedged 		last   19a9a [e81:1a] prio=139 @ 5159ms: igt/rcs0[5977]/1
<7>[  239.095059] i915_gem_set_wedged 		active 19a99 [e8c:5f] prio=1024 @ 5159ms: (null)
<7>[  239.095062] i915_gem_set_wedged 		[head 0220, postfix 0280, tail 02a8, batch 0xffffffff_ffffffff]
<7>[  239.100050] i915_gem_set_wedged 		ring->start:  0x00283000
<7>[  239.100053] i915_gem_set_wedged 		ring->head:   0x000001f8
<7>[  239.100055] i915_gem_set_wedged 		ring->tail:   0x000002a8
<7>[  239.100057] i915_gem_set_wedged 		ring->emit:   0x000002a8
<7>[  239.100059] i915_gem_set_wedged 		ring->space:  0x00000f10
<7>[  239.100085] i915_gem_set_wedged 	RING_START: 0x00283000
<7>[  239.100088] i915_gem_set_wedged 	RING_HEAD:  0x00000260
<7>[  239.100091] i915_gem_set_wedged 	RING_TAIL:  0x000002a8
<7>[  239.100094] i915_gem_set_wedged 	RING_CTL:   0x00000001
<7>[  239.100097] i915_gem_set_wedged 	RING_MODE:  0x00000300 [idle]
<7>[  239.100100] i915_gem_set_wedged 	RING_IMR: fffffefe
<7>[  239.100104] i915_gem_set_wedged 	ACTHD:  0x00000000_0000609c
<7>[  239.100108] i915_gem_set_wedged 	BBADDR: 0x00000000_0000609d
<7>[  239.100111] i915_gem_set_wedged 	DMA_FADDR: 0x00000000_00283260
<7>[  239.100114] i915_gem_set_wedged 	IPEIR: 0x00000000
<7>[  239.100117] i915_gem_set_wedged 	IPEHR: 0x02800000
<7>[  239.100120] i915_gem_set_wedged 	Execlist status: 0x00044052 00000002
<7>[  239.100124] i915_gem_set_wedged 	Execlist CSB read 5 [5 cached], write 5 [5 from hws], interrupt posted? no, tasklet queued? no (enabled)
<7>[  239.100128] i915_gem_set_wedged 		ELSP[0] count=1, ring->start=00283000, rq: 19a99 [e8c:5f] prio=1024 @ 5164ms: (null)
<7>[  239.100132] i915_gem_set_wedged 		ELSP[1] count=1, ring->start=00257000, rq: 19a9a [e81:1a] prio=139 @ 5164ms: igt/rcs0[5977]/1
<7>[  239.100135] i915_gem_set_wedged 		HW active? 0x5
<7>[  239.100250] i915_gem_set_wedged 		E 19a99 [e8c:5f] prio=1024 @ 5164ms: (null)
<7>[  239.100338] i915_gem_set_wedged 		E 19a9a [e81:1a] prio=139 @ 5164ms: igt/rcs0[5977]/1
<7>[  239.100340] i915_gem_set_wedged 		Queue priority: 139
<7>[  239.100343] i915_gem_set_wedged 		Q 0 [e98:19] prio=132 @ 5164ms: igt/rcs0[5977]/8
<7>[  239.100346] i915_gem_set_wedged 		Q 0 [e84:19] prio=121 @ 5165ms: igt/rcs0[5977]/2
<7>[  239.100349] i915_gem_set_wedged 		Q 0 [e87:19] prio=82 @ 5165ms: igt/rcs0[5977]/3
<7>[  239.100352] i915_gem_set_wedged 		Q 0 [e84:1a] prio=44 @ 5164ms: igt/rcs0[5977]/2
<7>[  239.100356] i915_gem_set_wedged 		Q 0 [e8b:19] prio=20 @ 5165ms: igt/rcs0[5977]/4
<7>[  239.100362] i915_gem_set_wedged 	drv_selftest [5894] waiting for 19a99

where the GPU saw an arbitration point and idled; AND HAS NOT BEEN RESET!
The RING_MODE indicates that it is idle and has the STOP_RING bit set, so
try clearing it.

v2: Only clear the bit on restarting the ring, as we want to be sure the
STOP_RING bit is kept if reset fails on wedging.
v3: Spot when the ring state doesn't make sense when re-initialising the
engine and dump it to the logs so that we don't have to wait for an
error later and try to guess what happened earlier.
v4: Prepare to print all the unexpected state, not just the first.
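
As a rough illustration of the RING_MODE value in the dump above (0x00000300), the sketch below decodes the two bits and models the masked write that would clear STOP_RING. It is not the patch itself. The bit positions (STOP_RING = bit 8, MODE_IDLE = bit 9) and the masked-register convention (upper 16 bits act as a write mask) are assumptions taken from how i915 is understood to define RING_MI_MODE, and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit layout of the register printed as RING_MODE (hypothetical here;
 * check against i915_reg.h in your tree). */
#define STOP_RING (1u << 8)
#define MODE_IDLE (1u << 9)

/* Masked-register convention (assumed): the upper 16 bits of the written
 * value select which of the lower 16 bits the write is allowed to change. */
static inline uint32_t masked_bit_enable(uint32_t bit)
{
	return (bit << 16) | bit;
}

static inline uint32_t masked_bit_disable(uint32_t bit)
{
	return bit << 16;
}

/* Software model of how the hardware would apply a masked write.
 * Illustrative only; the real register lives in MMIO space. */
static inline uint32_t apply_masked_write(uint32_t reg, uint32_t value)
{
	uint32_t mask = value >> 16;

	return (reg & ~mask) | (value & mask & 0xffff);
}

/* Decode helpers for the value shown in the dump. */
static inline bool ring_is_stopped(uint32_t mode)
{
	return (mode & STOP_RING) != 0;
}

static inline bool ring_is_idle(uint32_t mode)
{
	return (mode & MODE_IDLE) != 0;
}
```

Under these assumptions, 0x00000300 decodes as both stopped and idle, and applying masked_bit_disable(STOP_RING) to it leaves 0x00000200: the ring still reports idle, but is no longer held stopped.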

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180518100933.2239-1-chris@chris-wilson.co.uk
2018-05-25 09:51:49 +01:00
accessibility
acpi ACPI fixes for 4.17-rc3 2018-04-26 11:06:36 -07:00
amba ARM: amba: Fix race condition with driver_override 2018-04-26 10:35:04 +02:00
android ANDROID: binder: prevent transactions into own process. 2018-04-23 12:12:41 +02:00
ata
atm atm: iphase: fix spelling mistake: "Tansmit" -> "Transmit" 2018-04-19 13:41:49 -04:00
auxdisplay
base Driver core fixes for 4.17-rc3 2018-04-27 10:12:20 -07:00
bcma
block for-linus-20180425 2018-04-25 21:05:15 -07:00
bluetooth
bus HISI LPC: Add Kconfig MFD_CORE dependency 2018-04-26 16:53:23 +02:00
cdrom cdrom: information leak in cdrom_ioctl_media_changed() 2018-04-18 08:21:32 -06:00
char virtio: fixups 2018-04-26 16:36:11 -07:00
clk The large diff this time around is from the addition of a new clk driver 2018-04-13 15:51:06 -07:00
clocksource clocksource/imx-tpm: Correct -ETIME return condition check 2018-04-19 13:21:35 +02:00
connector
cpufreq powerpc fixes for 4.17 #4 2018-04-28 09:45:34 -07:00
cpuidle cpuidle: menu: Avoid selecting shallow states with stopped tick 2018-04-09 11:54:57 +02:00
crypto .gitignore: move *-asn1.[ch] patterns to the top-level .gitignore 2018-04-07 19:04:02 +09:00
dax device-dax: allow MAP_SYNC to succeed 2018-04-19 15:11:50 -07:00
dca
devfreq
dio
dma DMAengine updates for v4.17-rc1 2018-04-10 12:14:37 -07:00
dma-buf
edac * Add NVDIMM support to EDAC (Tony Luck) 2018-04-05 14:21:13 -07:00
eisa
extcon Char/Misc patches for 4.17-rc1 2018-04-04 20:07:20 -07:00
firewire
firmware firmware: arm_scmi: remove redundant null check on array 2018-04-16 10:15:58 +01:00
fmc
fpga fpga-manager: altera-ps-spi: preserve nCONFIG state 2018-04-23 13:27:05 +02:00
fsi
gpio DeviceTree updates for 4.17: 2018-04-05 21:03:42 -07:00
gpu drm/i915: Flush the ring stop bit after clearing RING_HEAD in reset 2018-05-25 09:51:49 +01:00
hid HID: i2c-hid: fix inverted return value from i2c_hid_command() 2018-04-19 09:25:15 +02:00
hsi
hv ARM: 2018-04-09 11:42:31 -07:00
hwmon hwmon: (k10temp) Add support for AMD Ryzen w/ Vega graphics 2018-04-25 05:31:06 -07:00
hwspinlock
hwtracing Char/Misc patches for 4.17-rc1 2018-04-04 20:07:20 -07:00
i2c i2c: sprd: Fix the i2c count issue 2018-04-27 14:12:43 +02:00
ide for-4.17/block-20180402 2018-04-05 14:27:02 -07:00
idle
iio This is the bulk of GPIO changes for the v4.17 kernel cycle: 2018-04-05 09:51:41 -07:00
infiniband Merge candidates for 4.17 merge window 2018-04-06 17:35:43 -07:00
input Revert: Unify CLOCK_MONOTONIC and CLOCK_BOOTTIME 2018-04-26 14:53:32 +02:00
iommu IOMMU Updates for Linux v4.17 2018-04-11 18:50:41 -07:00
ipack
irqchip IOMMU Updates for Linux v4.17 2018-04-11 18:50:41 -07:00
isdn mISDN: Remove VLAs 2018-04-12 21:46:10 -04:00
leds
lightnvm
macintosh powerpc updates for 4.17 2018-04-07 12:08:19 -07:00
mailbox
mcb
md Merge tag 'md/4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md 2018-04-20 10:39:44 -07:00
media remoteproc updates for v4.17 2018-04-10 12:09:27 -07:00
memory ARM: OMAP2+: Fix build when using split object directories 2018-04-18 10:07:13 -07:00
memstick
message scsi: mptsas: Disable WRITE SAME 2018-04-18 23:37:25 -04:00
mfd platform/chrome: mfd/cros_ec_dev: Add sysfs entry to set keyboard wake lid angle 2018-04-10 22:25:07 -07:00
misc * Fix 2032 time access issues and new compiler warnings 2018-04-12 10:21:19 -07:00
mmc MMC host: 2018-04-20 10:41:31 -07:00
mtd mtd: rawnand: marvell: fix the chip-select DT parsing logic 2018-04-26 19:06:42 +02:00
mux
net Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue 2018-04-24 16:17:59 -04:00
nfc
ntb
nubus
nvdimm Revert "libnvdimm, of_pmem: workaround OF_NUMA=n build error" 2018-04-19 15:10:56 -07:00
nvme nvme: expand nvmf_check_if_ready checks 2018-04-12 09:58:27 -06:00
nvmem Char/Misc patches for 4.17-rc1 2018-04-04 20:07:20 -07:00
of earlycon: Use a pointer table to fix __earlycon_table stride 2018-04-23 10:06:59 +02:00
opp
oprofile
parisc
parport Char/Misc patches for 4.17-rc1 2018-04-04 20:07:20 -07:00
pci pci-v4.17-fixes-1 2018-04-26 16:28:24 -07:00
pcmcia Merge branch 'for-linus-sa1100' of git://git.armlinux.org.uk/~rmk/linux-arm 2018-04-09 09:26:36 -07:00
perf ARM: SoC driver updates for 4.17 2018-04-05 21:29:35 -07:00
phy ARM: SoC platform updates for 4.17 2018-04-05 21:21:08 -07:00
pinctrl This is the bulk of GPIO changes for the v4.17 kernel cycle: 2018-04-05 09:51:41 -07:00
platform Changes to chrome-platform for v4.17 2018-04-13 16:20:36 -07:00
pnp
power ARM: SoC platform updates for 4.17 2018-04-05 21:21:08 -07:00
powercap
pps
ps3
ptp
pwm pwm: Changes for v4.17-rc1 2018-04-13 15:46:21 -07:00
rapidio rapidio: fix rio_dma_transfer error handling 2018-04-20 17:18:35 -07:00
ras
regulator
remoteproc remoteproc: fix null pointer dereference on glink only platforms 2018-04-05 22:53:16 -07:00
reset
rpmsg
rtc rtc: opal: Fix OPAL RTC driver OPAL_BUSY loops 2018-04-25 13:24:13 +10:00
s390 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux 2018-04-26 10:29:46 -07:00
sbus
scsi scsi: sd_zbc: Avoid that resetting a zone fails sporadically 2018-04-19 00:04:10 -04:00
sfi
sh
siox
slimbus slimbus: Fix out-of-bounds access in slim_slicesize() 2018-04-23 13:40:15 +02:00
sn
soc soc: bcm: raspberrypi-power: Fix use of __packed 2018-04-16 15:15:23 -07:00
soundwire
spi
spmi
ssb
staging drm-misc-next for v4.18: 2018-04-30 09:32:43 +10:00
target scsi: target: fix crash with iscsi target and dvd 2018-04-19 00:41:03 -04:00
tc
tee
thermal Merge branches 'thermal-core' and 'thermal-soc' into next 2018-04-13 14:11:53 +08:00
thunderbolt
tty tty: Use __GFP_NOFAIL for tty_ldisc_get() 2018-04-25 15:03:44 +02:00
uio uio_hv_generic: fix subchannel ring mmap 2018-04-23 12:43:48 +02:00
usb USB-serial fixes for v4.17-rc3 2018-04-26 19:29:24 +02:00
uwb
vfio VFIO updates for v4.17-rc1 2018-04-06 19:44:27 -07:00
vhost vhost: return bool from *_access_ok() functions 2018-04-11 10:54:06 -04:00
video fbdev changes for v4.17: 2018-04-10 10:20:00 -07:00
virt virt: vbox: Log an error when we fail to get the host version 2018-04-23 13:41:55 +02:00
virtio virtio: feature 2018-04-11 18:58:27 -07:00
visorbus
vlynq
vme
w1
watchdog aspeed: watchdog: Set bootstatus during probe 2018-04-16 10:22:40 +02:00
xen xen: fixes and one header update for 4.17-rc2 2018-04-20 08:36:04 -07:00
zorro
Kconfig
Makefile