linux-stable/drivers/s390/cio
Peter Oberparleiter 5ef1dc40ff s390/cio: fix invalid -EBUSY on ccw_device_start
The s390 common I/O layer (CIO) returns an unexpected -EBUSY return code
when drivers try to start I/O while a path-verification (PV) process is
pending. This can lead to failed device initialization attempts with
symptoms like broken network connectivity after boot.

Fix this by replacing the -EBUSY return code with a deferred condition
code 1 reply to make path-verification handling consistent from a
driver's point of view.

The problem can be reproduced semi-regularly using the following process,
while repeating steps 2-3 as necessary (example assumes an OSA device
with bus-IDs 0.0.a000-0.0.a002 on CHPID 0.02):

1. echo 0.0.a000,0.0.a001,0.0.a002 >/sys/bus/ccwgroup/drivers/qeth/group
2. echo 0 > /sys/bus/ccwgroup/devices/0.0.a000/online
3. echo 1 > /sys/bus/ccwgroup/devices/0.0.a000/online ; \
   echo on > /sys/devices/css0/chp0.02/status

Background information:

The common I/O layer starts path-verification I/Os when it receives
indications about changes in a device path's availability. This occurs
for example when hardware events indicate a change in channel-path
status, or when a manual operation such as a CHPID vary or configure
operation is performed.

If a driver attempts to start I/O while a PV is running, CIO reports a
successful I/O start (ccw_device_start() return code 0). Then, after
completion of PV, CIO synthesizes an interrupt response that indicates
an asynchronous status condition that prevented the start of the I/O
(deferred condition code 1).
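
The sequence described above can be sketched as a small user-space model.
This is only an illustration under simplifying assumptions, not the CIO
implementation: struct model_device, model_start_io() and model_pv_done()
are hypothetical names that do not exist in the kernel, and the real
interrupt response is reduced to a single enum value.

```c
#include <stdbool.h>

/* Hypothetical, simplified model; none of these names exist in the kernel. */
enum model_irq_status {
	MODEL_IRQ_NONE,
	MODEL_IRQ_DEFERRED_CC1,	/* I/O was accepted but never actually started */
};

struct model_device {
	bool pv_running;		/* path verification is in progress */
	bool fake_irq_pending;		/* a synthesized interrupt is owed */
	enum model_irq_status last_irq;	/* last status seen by the driver */
};

/* Models an I/O start: reports success even if PV prevents the start. */
static int model_start_io(struct model_device *dev)
{
	if (dev->pv_running) {
		/* Remember to synthesize an interrupt once PV is done. */
		dev->fake_irq_pending = true;
		return 0;
	}
	/* A real I/O start would happen here; it is not modeled. */
	return 0;
}

/* Models PV completion: deliver the synthesized deferred cc 1 status. */
static void model_pv_done(struct model_device *dev)
{
	dev->pv_running = false;
	if (dev->fake_irq_pending) {
		dev->fake_irq_pending = false;
		dev->last_irq = MODEL_IRQ_DEFERRED_CC1;
	}
}
```

In this model, a driver that calls model_start_io() while pv_running is
set sees return code 0 first and MODEL_IRQ_DEFERRED_CC1 only after
model_pv_done() has run, so it can retry the I/O from its interrupt
handler.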

If a PV indication arrives while a device is busy with driver-owned I/O,
PV is delayed until after I/O completion has been reported to the driver's
interrupt handler. To ensure that PV can be started eventually, CIO
reports a device busy condition (ccw_device_start() return code -EBUSY)
if a driver tries to start another I/O while PV is pending.

In some cases this -EBUSY return code causes device drivers to consider
a device not operational, resulting in failed device initialization.
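
The difference between the old and the new behavior for a pending PV can
be illustrated with the following user-space sketch. It is a simplified
model and not the actual change to device_ops.c: struct pvmodel_dev,
pvmodel_start_before_fix(), pvmodel_start_after_fix() and
pvmodel_driver_init_ok() are hypothetical names, and the driver is
assumed to treat any non-zero start result as a fatal setup error.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified model; none of these names exist in the kernel. */
struct pvmodel_dev {
	bool pv_pending;	/* PV was requested but has not started yet */
	bool deferred_cc1_owed;	/* a deferred cc 1 reply will be delivered */
};

/* Old behavior: reject the I/O start while PV is pending. */
static int pvmodel_start_before_fix(struct pvmodel_dev *dev)
{
	if (dev->pv_pending)
		return -EBUSY;
	return 0;
}

/* New behavior: accept the start and owe the driver a deferred cc 1. */
static int pvmodel_start_after_fix(struct pvmodel_dev *dev)
{
	if (dev->pv_pending)
		dev->deferred_cc1_owed = true;
	return 0;
}

/* Assumed driver setup step that gives up on any start error. */
static bool pvmodel_driver_init_ok(int (*start)(struct pvmodel_dev *),
				   struct pvmodel_dev *dev)
{
	return start(dev) == 0;
}
```

With pv_pending set, the assumed driver fails initialization against the
old start function, while against the new one it proceeds and later
handles the deferred condition code 1 the same way as for a running PV.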

Note: The code that introduced the problem was added in 2003. Symptoms
started appearing with the following CIO commit that causes a PV
indication when a device is removed from the cio_ignore list after the
associated parent subchannel device was probed, but before online
processing of the CCW device has started:

2297791c92 ("s390/cio: dont unregister subchannel from child-drivers")

During boot, the cio_ignore list is modified by the cio_ignore dracut
module [1] as well as Linux vendor-specific systemd service scripts [2].
Combined, this commit and these boot scripts frequently trigger the
problem during boot.

[1] https://github.com/dracutdevs/dracut/tree/master/modules.d/81cio_ignore
[2] https://github.com/SUSE/s390-tools/blob/master/cio_ignore.service

Cc: stable@vger.kernel.org # v5.15+
Fixes: 2297791c92 ("s390/cio: dont unregister subchannel from child-drivers")
Tested-By: Thorsten Winkler <twinkler@linux.ibm.com>
Reviewed-by: Thorsten Winkler <twinkler@linux.ibm.com>
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-22 15:53:15 +01:00
Makefile
airq.c s390/airq: remove lsi_mask from airq_struct 2023-08-30 11:03:28 +02:00
blacklist.c s390/cio: avoid excessive path-verification requests 2021-09-27 13:54:38 +02:00
blacklist.h
ccwgroup.c s390: fix various typos 2023-07-03 11:19:42 +02:00
ccwreq.c
chp.c s390/cio: export CMG value as decimal 2023-10-25 15:08:29 +02:00
chp.h
chsc.c s390/cio: make sch->lock spinlock pointer a member 2023-12-12 14:41:58 +01:00
chsc.h s390/cio: replace zero-length array with flexible-array member 2023-04-13 17:36:29 +02:00
chsc_sch.c s390/cio: make sch->lock spinlock pointer a member 2023-12-12 14:41:58 +01:00
chsc_sch.h
cio.c s390/cio: make sch->lock spinlock pointer a member 2023-12-12 14:41:58 +01:00
cio.h s390/cio: make sch->lock spinlock pointer a member 2023-12-12 14:41:58 +01:00
cio_debug.h
cio_debugfs.c
cio_inject.c
cio_inject.h
cmf.c
crw.c s390: use control register bit defines 2023-09-19 13:26:57 +02:00
css.c s390/cio: make sch->lock spinlock pointer a member 2023-12-12 14:41:58 +01:00
css.h s390/cio: evaluate devices with non-operational paths 2023-01-22 18:42:34 +01:00
device.c s390/cio: make sch->lock spinlock pointer a member 2023-12-12 14:41:58 +01:00
device.h
device_fsm.c s390: fix various typos 2023-07-03 11:19:42 +02:00
device_id.c s390/cio: sort out physical vs virtual pointers usage 2022-12-01 10:58:04 +01:00
device_ops.c s390/cio: fix invalid -EBUSY on ccw_device_start 2024-02-22 15:53:15 +01:00
device_pgid.c s390/cio: make sch->lock spinlock pointer a member 2023-12-12 14:41:58 +01:00
device_status.c s390/cio: sort out physical vs virtual pointers usage 2022-12-01 10:58:04 +01:00
eadm_sch.c s390/cio: make sch->lock spinlock pointer a member 2023-12-12 14:41:58 +01:00
eadm_sch.h
fcx.c s390/cio: sort out physical vs virtual pointers usage 2022-12-01 10:58:04 +01:00
idset.c
idset.h
io_sch.h
ioasm.c s390/extable: move EX_TABLE define to asm-extable.h 2022-03-08 00:33:00 +01:00
ioasm.h
isc.c s390/ctlreg: add local and system prefix to some functions 2023-09-19 13:26:56 +02:00
itcw.c s390/cio: sort out physical vs virtual pointers usage 2022-12-01 10:58:04 +01:00
orb.h
qdio.h s390/qdio: fix do_sqbs() inline assembly constraint 2023-05-17 15:20:17 +02:00
qdio_debug.c s390: move from strlcpy with unused retval to strscpy 2022-08-30 22:00:33 +02:00
qdio_debug.h
qdio_main.c s390/qdio: remove unneeded sanity check in qdio_do_sqbs() 2021-12-06 14:42:26 +01:00
qdio_setup.c s390/qdio: clarify handler logic for qdio_handle_activate_check() 2021-12-06 14:42:25 +01:00
qdio_thinint.c s390/airq: pass more TPI info to airq handlers 2022-07-11 09:54:10 +02:00
scm.c driver core: make struct bus_type.uevent() take a const * 2023-01-27 13:45:52 +01:00
trace.c
trace.h
vfio_ccw_async.c vfio/ccw: Remove private->mdev 2022-07-07 14:06:12 -06:00
vfio_ccw_chp.c eventfd: simplify eventfd_signal() 2023-11-28 14:08:38 +01:00
vfio_ccw_cp.c s390: fix various typos 2023-07-03 11:19:42 +02:00
vfio_ccw_cp.h vfio/ccw: simplify the cp_get_orb interface 2023-01-09 14:34:07 +01:00
vfio_ccw_drv.c s390 updates for 6.8 merge window 2024-01-10 18:18:20 -08:00
vfio_ccw_fsm.c s390/cio: make sch->lock spinlock pointer a member 2023-12-12 14:41:58 +01:00
vfio_ccw_ops.c eventfd: simplify eventfd_signal() 2023-11-28 14:08:38 +01:00
vfio_ccw_private.h vfio/ccw: replace one-element array with flexible-array member 2023-06-01 17:07:55 +02:00
vfio_ccw_trace.c
vfio_ccw_trace.h