linux-stable/drivers
Jeffrey Hugo bce3f77068 bus: mhi: host: Add MHI_PM_SYS_ERR_FAIL state
When processing a SYSERR, if the device does not respond to the MHI_RESET
from the host, the host will be stuck in a difficult to recover state.
The host will remain in MHI_PM_SYS_ERR_PROCESS and not clean up the host
channels.  Clients will not be notified of the SYSERR via the destruction
of their channel devices, which means clients may think that the device is
still up.  Subsequent SYSERR events such as a device fatal error will not
be processed as the state machine cannot transition from PROCESS back to
DETECT.  The only way to recover from this is to unload the mhi module
(wipe the state machine state) or for the mhi controller to initiate
SHUTDOWN.

This issue was discovered by stress testing soc_reset events on AIC100
via the sysfs node.

soc_reset is processed entirely in hardware.  When the register write
hits the endpoint hardware, it causes the soc to reset without firmware
involvement.  In stress testing, there is a rare race where soc_reset N
will cause the soc to reset and PBL to signal SYSERR (fatal error).  If
soc_reset N+1 is triggered before PBL can process the MHI_RESET from the
host, then the soc will reset again, and re-run PBL from the beginning.
This will cause PBL to lose all state.  PBL will be waiting for the host
to respond to the new syserr, but host will be stuck expecting the
previous MHI_RESET to be processed.

Additionally, the AMSS EE firmware (QSM) was hacked to synthetically
reproduce the issue by simulating a FW hang after the QSM issued a
SYSERR.  In this case, soc_reset would not recover the device.

For this failure case, to recover the device, we need a state similar to
PROCESS, but can transition to DETECT.  There is not a viable existing
state to use.  POR has the needed transitions, but assumes the device is
in a good state and could allow the host to attempt to use the device.
Allowing PROCESS to transition to DETECT invites the possibility of
parallel SYSERR processing which could get the host and device out of
sync.

Thus, invent a new state - MHI_PM_SYS_ERR_FAIL

This essentially a holding state.  It allows us to clean up the host
elements that are based on the old state of the device (channels), but
does not allow us to directly advance back to an operational state.  It
does allow the detection and processing of another SYSERR which may
recover the device, or allows the controller to do a clean shutdown.

Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20240112180800.536733-1-quic_jhugo@quicinc.com
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
2024-01-30 23:52:40 +05:30
..
accel
accessibility
acpi
amba
android
ata
atm
auxdisplay
base
bcma
block
bluetooth
bus bus: mhi: host: Add MHI_PM_SYS_ERR_FAIL state 2024-01-30 23:52:40 +05:30
cache
cdrom
cdx
char
clk
clocksource
comedi
connector
counter
cpufreq
cpuidle
crypto
cxl
dax
dca
devfreq
dio
dma dmaengine fixes for v6.8-rc1 2024-01-20 15:03:25 -08:00
dma-buf
dpll
edac
eisa
extcon
firewire
firmware Revert "firmware/sysfb: Clear screen_info state after consuming it" 2024-01-19 22:22:26 +01:00
fpga
fsi
gnss
gpio
gpu drm fixes for 6.8-rc1 2024-01-19 11:50:00 -08:00
greybus
hid
hsi
hte
hv
hwmon
hwspinlock
hwtracing
i2c
i3c
idle
iio
infiniband
input
interconnect
iommu
ipack
irqchip
isdn
leds
macintosh
mailbox
mcb
md
media media: solo6x10: replace max(a, min(b, c)) by clamp(b, a, c) 2024-01-20 13:30:00 -08:00
memory
memstick
message
mfd
misc
mmc
most
mtd
mux
net
nfc
ntb
nubus
nvdimm
nvme
nvmem
of
opp
parisc
parport
pci
pcmcia
peci
perf
phy
pinctrl
platform
pmdomain
pnp
power
powercap
pps
ps3
ptp
pwm
rapidio
ras
regulator
remoteproc
reset
rpmsg
rtc
s390
sbus
scsi SCSI misc on 20240120 2024-01-20 09:42:32 -08:00
sh
siox
slimbus
soc
soundwire
spi spi: Fix for v6.8 2024-01-19 12:50:09 -08:00
spmi
ssb
staging
target SCSI misc on 20240120 2024-01-20 09:42:32 -08:00
tc
tee
thermal
thunderbolt
tty RISC-V Patches for the 6.8 Merge Window, Part 4 2024-01-20 11:06:04 -08:00
ufs SCSI misc on 20240120 2024-01-20 09:42:32 -08:00
uio
usb
vdpa
vfio
vhost
video
virt
virtio
w1
watchdog
xen
zorro
Kconfig
Makefile