linux-stable/drivers/scsi
Brian King c03ecc192c scsi: ibmvfc: Set default timeout to avoid crash during migration
[ Upstream commit 764907293e ]

While testing live partition mobility, we have observed occasional crashes
of the Linux partition. What we've seen is that during the live migration,
for specific configurations with large amounts of memory, slow network
links, and workloads that are changing memory a lot, the partition can end
up being suspended for 30 seconds or longer. This resulted in the following
scenario:

CPU 0                            CPU 1
-------------------------------  ----------------------------------
scsi_queue_rq                    migration_store
 -> blk_mq_start_request          -> rtas_ibm_suspend_me
  -> blk_add_timer                 -> on_each_cpu(rtas_percpu_suspend_me
              _______________________________________V
             |
             V
    -> IPI from CPU 1
     -> rtas_percpu_suspend_me
                                     -> __rtas_suspend_last_cpu

-- Linux partition suspended for > 30 seconds --
                                      -> for_each_online_cpu(cpu)
                                           plpar_hcall_norets(H_PROD
 -> scsi_dispatch_cmd
                                      -> scsi_times_out
                                       -> scsi_abort_command
                                        -> queue_delayed_work
  -> ibmvfc_queuecommand_lck
   -> ibmvfc_send_event
    -> ibmvfc_send_crq
     - returns H_CLOSED
   <- returns SCSI_MLQUEUE_HOST_BUSY
-> __blk_mq_requeue_request

                                      -> scmd_eh_abort_handler
                                       -> scsi_try_to_abort_cmd
                                         - returns SUCCESS
                                       -> scsi_queue_insert

Normally, the SCMD_STATE_COMPLETE bit would protect against a race between
command completion and the timeout, but that doesn't work here, since that
bit is not checked at all in the SCSI_MLQUEUE_HOST_BUSY path.

In this case we end up calling scsi_queue_insert on a request that has
already been queued, or possibly even freed, and we crash.

This patch simply increases the default I/O timeout to avoid this race
condition. The new value is also the timeout that nearly all IBM SAN
storage recommends as the default.
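
The timing argument above can be checked with a small model. This is only a
sketch in Python, not kernel code: the Timeline type and the
timer_expired_on_resume() helper are hypothetical names, the model assumes the
request timer keeps counting across the suspension, and the 120-second value
is an assumption used for illustration, since the message does not state the
new timeout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timeline:
    """Hypothetical model of one request caught by a partition suspend."""
    timeout_s: float       # request timeout armed by blk_add_timer
    armed_at_s: float      # time at which the timer was armed
    suspended_at_s: float  # time at which the partition was suspended
    suspend_len_s: float   # how long the partition stayed suspended


def timer_expired_on_resume(t: Timeline) -> bool:
    """Return True if the request timer has already expired at resume.

    In that case the timeout handler (CPU 1 column of the trace) can run
    while the original dispatch (CPU 0 column) is still in flight, which
    is the window in which the double scsi_queue_insert can happen.
    """
    deadline = t.armed_at_s + t.timeout_s
    resume = t.suspended_at_s + t.suspend_len_s
    return deadline <= resume


if __name__ == "__main__":
    # A 35 s suspension right after the timer was armed.
    short = Timeline(timeout_s=30, armed_at_s=0.0,
                     suspended_at_s=0.001, suspend_len_s=35)
    # Same suspension with a longer timeout (120 s is an assumed value).
    longer = Timeline(timeout_s=120, armed_at_s=0.0,
                      suspended_at_s=0.001, suspend_len_s=35)
    print(timer_expired_on_resume(short))   # True: timeout races with dispatch
    print(timer_expired_on_resume(longer))  # False: dispatch finishes first
```

The model only shows why a larger timeout narrows the window. A suspension
longer than the new timeout would still reach the same path.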

Link: https://lore.kernel.org/r/1610463998-19791-1-git-send-email-brking@linux.vnet.ibm.com
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-02-07 15:35:48 +01:00
aacraid scsi: aacraid: Fix error handling paths in aac_probe_one() 2020-10-01 13:17:56 +02:00
aic7xxx scsi: aic7xxx: Adjust indentation in ahc_find_syncrate 2020-02-24 08:36:38 +01:00
aic94xx scsi: aic94xx: Remove unnecessary null check 2019-07-30 12:12:59 -04:00
arcmsr
arm scsi: eesox: Fix different dev_id between request_irq() and free_irq() 2020-08-19 08:16:09 +02:00
be2iscsi scsi: be2iscsi: Revert "Fix a theoretical leak in beiscsi_create_eqs()" 2020-12-16 10:56:58 +01:00
bfa scsi: bfa: Fix error return in bfad_pci_init() 2020-10-29 09:57:57 +01:00
bnx2fc SCSI fixes on 20191004 2019-10-05 12:53:27 -07:00
bnx2i scsi: bnx2i: Requires MMU 2020-12-30 11:50:53 +01:00
csiostor scsi: csiostor: Fix wrong return value in csio_hw_prep_fw() 2020-10-29 09:57:37 +01:00
cxgbi scsi: cxgb4i: Fix TLS dependency 2021-01-06 14:48:38 +01:00
cxlflash scsi: cxlflash: Fix error return code in cxlflash_probe() 2020-10-01 13:18:02 +02:00
device_handler scsi: scsi_dh_alua: Avoid crash during alua_bus_detach() 2020-11-18 19:20:23 +01:00
dpt
esas2r scsi: esas2r: unlock on error in esas2r_nvram_read_direct() 2020-01-23 08:22:58 +01:00
fcoe scsi: fcoe: Memory leak fix in fcoe_sysfs_fcf_del() 2020-09-03 11:26:47 +02:00
fnic scsi: fnic: Fix memleak in vnic_dev_init_devcmd2 2021-02-07 15:35:48 +01:00
hisi_sas scsi: hisi_sas: Do not reset phy timer to wait for stray phy up 2020-06-24 17:50:15 +02:00
ibmvscsi scsi: ibmvfc: Set default timeout to avoid crash during migration 2021-02-07 15:35:48 +01:00
ibmvscsi_tgt scsi: ibmvscsi_tgt: Mark expected switch fall-throughs 2019-07-30 15:59:53 -04:00
isci scsi: libsas: aic94xx: hisi_sas: mvsas: pm8001: Use dev_is_expander() 2019-06-20 15:37:02 -04:00
libfc scsi: libfc: Avoid invoking response handler twice if ep is already completed 2021-02-07 15:35:48 +01:00
libsas scsi: libsas: Fix error path in sas_notify_lldd_dev_found() 2020-09-23 12:40:40 +02:00
lpfc scsi: lpfc: Make lpfc_defer_acc_rsp static 2021-01-23 15:57:55 +01:00
megaraid scsi: megaraid_sas: Fix MEGASAS_IOC_FIRMWARE regression 2021-01-27 11:47:47 +01:00
mpt3sas scsi: mpt3sas: Increase IOCInit request timeout to 30s 2020-12-30 11:50:57 +01:00
mvsas SCSI misc on 20190709 2019-07-11 15:14:01 -07:00
pcmcia SCSI sg on 20190709 2019-07-11 15:17:41 -07:00
pm8001 scsi: pm80xx: Fix error return in pm8001_pci_probe() 2020-12-30 11:51:21 +01:00
qedf scsi: qedf: Return SUCCESS if stale rport is encountered 2020-10-29 09:58:07 +01:00
qedi scsi: qedi: Correct max length of CHAP secret 2021-01-27 11:47:43 +01:00
qla2xxx scsi: qla2xxx: Fix crash during driver load on big endian machines 2020-12-30 11:51:43 +01:00
qla4xxx scsi: qla4xxx: Fix an error handling path in 'qla4xxx_get_host_stats()' 2020-10-29 09:57:36 +01:00
smartpqi scsi: smartpqi: Avoid crashing kernel for controller issues 2020-10-29 09:58:09 +01:00
snic
sym53c8xx_2 scsi: sym53c8xx_2: remove redundant assignment to retv 2019-08-12 21:58:07 -04:00
ufs scsi: ufs: Correct the LUN used in eh_device_reset_handler() callback 2021-01-27 11:47:43 +01:00
.gitignore
3w-9xxx.c
3w-9xxx.h
3w-sas.c
3w-sas.h
3w-xxxx.c
3w-xxxx.h
53c700.c
53c700.h
53c700.scr
53c700_d.h_shipped
a100u2w.c
a100u2w.h
a2091.c
a2091.h
a3000.c
a3000.h
a4000t.c
advansys.c SCSI sg on 20190709 2019-07-11 15:17:41 -07:00
aha152x.c SCSI sg on 20190709 2019-07-11 15:17:41 -07:00
aha152x.h
aha1542.c
aha1542.h
aha1740.c
aha1740.h
am53c974.c
atari_scsi.c scsi: atari_scsi: sun3_scsi: Set sg_tablesize to 1 instead of SG_NONE 2020-01-04 19:18:10 +01:00
atp870u.c
atp870u.h
BusLogic.c
BusLogic.h
bvme6000_scsi.c
ch.c scsi: ch: Make it possible to open a ch device multiple times again 2019-10-09 23:39:35 -04:00
constants.c
dc395x.c
dc395x.h
dmx3191d.c
dpt_i2o.c
dpti.h
esp_scsi.c SCSI sg on 20190709 2019-07-11 15:17:41 -07:00
esp_scsi.h
fdomain.c scsi: fdomain: use BSTAT_{MSG|CMD|IO} in fdomain_work() 2019-07-30 12:17:28 -04:00
fdomain.h
fdomain_isa.c scsi: fdomain_isa: use CFG1_IRQ_MASK 2019-07-30 12:18:24 -04:00
fdomain_pci.c
FlashPoint.c
g_NCR5380.c
gdth.c
gdth.h
gdth_ioctl.h
gdth_proc.c
gdth_proc.h
gvp11.c
gvp11.h
hosts.c SCSI fixes on 20190720 2019-07-20 10:04:58 -07:00
hpsa.c scsi: hpsa: Fix memory leak in hpsa_init_one() 2020-11-18 19:20:22 +01:00
hpsa.h
hpsa_cmd.h SCSI misc on 20190709 2019-07-11 15:14:01 -07:00
hptiop.c
hptiop.h
imm.c SCSI sg on 20190709 2019-07-11 15:17:41 -07:00
imm.h
initio.c
initio.h
ipr.c scsi: ipr: Fix softlockup when rescanning devices in petitboot 2020-04-01 11:01:54 +02:00
ipr.h scsi: ipr: Fix softlockup when rescanning devices in petitboot 2020-04-01 11:01:54 +02:00
ips.c
ips.h
iscsi_boot_sysfs.c scsi: iscsi: Fix reference count leak in iscsi_boot_create_kobj 2020-06-24 17:50:37 +02:00
iscsi_tcp.c scsi: iscsi: Don't destroy session if there are outstanding connections 2020-02-24 08:36:50 +01:00
iscsi_tcp.h
jazz_esp.c
Kconfig scsi: sr: remove references to BLK_DEV_SR_VENDOR, leave it enabled 2020-07-22 09:32:57 +02:00
lasi700.c
libiscsi.c scsi: libiscsi: Fix NOP race condition 2020-12-02 08:49:49 +01:00
libiscsi_tcp.c SCSI misc on 20190709 2019-07-11 15:14:01 -07:00
mac53c94.c
mac53c94.h
mac_esp.c
mac_scsi.c scsi: atari_scsi: sun3_scsi: Set sg_tablesize to 1 instead of SG_NONE 2020-01-04 19:18:10 +01:00
Makefile scsi: remove pointless $(MODVERDIR)/$(obj)/53c700.ver 2019-07-17 22:39:27 +09:00
megaraid.c scsi: megaraid: disable device when probe failed after enabled device 2019-09-23 23:09:42 -04:00
megaraid.h
mesh.c scsi: mesh: Fix panic after host or bus reset 2020-08-19 08:16:15 +02:00
mesh.h
mvme16x_scsi.c
mvme147.c
mvme147.h
mvumi.c scsi: mvumi: Fix error return in mvumi_io_attach() 2020-10-29 09:58:04 +01:00
mvumi.h
myrb.c
myrb.h
myrs.c
myrs.h
ncr53c8xx.c scsi: ncr53c8xx: Mark expected switch fall-through 2019-08-07 21:53:23 -04:00
ncr53c8xx.h
NCR5380.c scsi: NCR5380: Add disconnect_mask module parameter 2020-01-04 19:18:16 +01:00
NCR5380.h Revert "scsi: ncr5380: Increase register polling limit" 2019-06-20 15:37:02 -04:00
nsp32.c
nsp32.h
nsp32_debug.c
nsp32_io.h
pmcraid.c scsi: pmcraid: Fix a typo - pcmraid --> pmcraid 2019-08-12 21:57:13 -04:00
pmcraid.h
ppa.c
ppa.h
ps3rom.c
qla1280.c qla1280: remove SGI SN2 support 2019-08-16 11:33:56 -07:00
qla1280.h qla1280: remove SGI SN2 support 2019-08-16 11:33:56 -07:00
qlogicfas.c
qlogicfas408.c
qlogicfas408.h
qlogicpti.c scsi: qlogicpti: Mark expected switch fall-throughs 2019-08-07 21:32:53 -04:00
qlogicpti.h
raid_class.c
script_asm.pl
scsi.c SCSI misc on 20190709 2019-07-11 15:14:01 -07:00
scsi.h
scsi_common.c
scsi_debug.c scsi: scsi_debug: Add check for sdebug_max_queue during module init 2020-08-19 08:16:11 +02:00
scsi_debugfs.c scsi: scsi_debugfs: Use for_each_set_bit to simplify code 2019-07-30 12:42:55 -04:00
scsi_debugfs.h
scsi_devinfo.c scsi: dh: Add Fujitsu device to devinfo and dh lists 2020-07-29 10:18:27 +02:00
scsi_dh.c scsi: dh: Add Fujitsu device to devinfo and dh lists 2020-07-29 10:18:27 +02:00
scsi_error.c scsi: core: save/restore command resid for error handling 2019-10-03 21:43:04 -04:00
scsi_ioctl.c
scsi_lib.c scsi: core: Fix VPD LUN ID designator priorities 2020-12-30 11:51:08 +01:00
scsi_lib_dma.c
scsi_logging.c scsi: core: Reduce memory required for SCSI logging 2019-08-07 21:47:29 -04:00
scsi_logging.h
scsi_netlink.c
scsi_pm.c scsi: pm: Balance pm_only counter of request queue during system resume 2020-06-07 13:18:50 +02:00
scsi_priv.h
scsi_proc.c drivers: Add generic helper to match any device 2019-07-30 13:07:42 +02:00
scsi_sas_internal.h
scsi_scan.c scsi: core: Don't start concurrent async scan on same host 2020-11-10 12:37:30 +01:00
scsi_sysctl.c
scsi_sysfs.c scsi: core: try to get module before removing device 2019-10-17 21:57:09 -04:00
scsi_trace.c scsi: core: scsi_trace: Use get_unaligned_be*() 2020-01-23 08:22:59 +01:00
scsi_transport_api.h
scsi_transport_fc.c SCSI misc on 20190709 2019-07-11 15:14:01 -07:00
scsi_transport_iscsi.c scsi: iscsi: Do not put host in iscsi_set_flashnode_param() 2020-09-03 11:26:47 +02:00
scsi_transport_sas.c scsi: scsi_transport_sas: Fix memory leak when removing devices 2020-01-23 08:22:58 +01:00
scsi_transport_spi.c scsi: scsi_transport_spi: Set RQF_PM for domain validation commands 2021-01-12 20:16:09 +01:00
scsi_transport_srp.c scsi: scsi_transport_srp: Don't block target in failfast state 2021-02-07 15:35:48 +01:00
scsicam.c
sd.c scsi: sd: Suppress spurious errors when WRITE SAME is being disabled 2021-01-27 11:47:43 +01:00
sd.h scsi: implement REQ_OP_ZONE_RESET_ALL 2019-08-04 21:41:29 -06:00
sd_dif.c
sd_zbc.c scsi: sd_zbc: Fix sd_zbc_complete() 2019-11-05 23:17:53 -05:00
sense_codes.h
ses.c SCSI misc on 20190709 2019-07-11 15:14:01 -07:00
sg.c scsi: sg: add sg_remove_request in sg_write 2020-05-20 08:20:07 +02:00
sgiwd93.c
sim710.c
sni_53c710.c scsi: sni_53c710: fix compilation error 2019-10-09 23:35:42 -04:00
sr.c scsi: sr: Fix sr_probe() missing deallocate of device minor 2020-06-24 17:50:19 +02:00
sr.h
sr_ioctl.c
sr_vendor.c scsi: sr: remove references to BLK_DEV_SR_VENDOR, leave it enabled 2020-07-22 09:32:57 +02:00
st.c SCSI misc on 20190709 2019-07-11 15:14:01 -07:00
st.h
st_options.h
stex.c
storvsc_drv.c scsi: storvsc: Correctly set number of hardware queues for IDE disk 2020-01-23 08:22:38 +01:00
sun3_scsi.c scsi: atari_scsi: sun3_scsi: Set sg_tablesize to 1 instead of SG_NONE 2020-01-04 19:18:10 +01:00
sun3_scsi_vme.c
sun3x_esp.c
sun_esp.c
virtio_scsi.c scsi: virtio_scsi: unplug LUNs when events missed 2019-09-10 22:10:17 -04:00
vmw_pvscsi.c SCSI sg on 20190709 2019-07-11 15:17:41 -07:00
vmw_pvscsi.h
wd33c93.c scsi: wd33c93: Mark expected switch fall-through 2019-08-07 21:35:59 -04:00
wd33c93.h
wd719x.c SCSI misc on 20190709 2019-07-11 15:14:01 -07:00
wd719x.h
xen-scsifront.c
zalon.c
zorro7xx.c
zorro_esp.c scsi: zorro_esp: Limit DMA transfers to 65536 bytes (except on Fastlane) 2020-01-04 19:17:37 +01:00