Commit graph

661979 commits

Author SHA1 Message Date
Hannes Reinecke
8e8c9d01c5 scsi: make eh_eflags persistent
If a failed command is retried and fails again we need
to enter SCSI EH, otherwise we will never be able to
recover the command.
To detect this situation we must not clear scmd->eh_eflags
when EH finishes but rather make it persistent throughout
the lifetime of the command.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-06 13:07:33 -04:00
Christoph Hellwig
909657615d scsi: libsas: allow async aborts
We now first try to call ->eh_abort_handler from a work queue, but libsas
was always failing that for no good reason.  Allow async aborts.

Reviewed-by: Johannes Thumshirn <jth@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-06 13:07:32 -04:00
Hannes Reinecke
1bcb93047e scsi: always send command aborts
When a command has timed out we always should be sending an
abort; with the previous code a failed abort might signal
SCSI EH to start, and all other timed out commands will
never be aborted, even though they might belong to a
different ITL nexus.

Cc: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-06 13:07:32 -04:00
Hannes Reinecke
e8f8d50e07 scsi: sd: Return SUCCESS in sd_eh_action() after device offline
If sd_eh_action() decides to take the device offline there is
no point in returning FAILED, as taking the device offline
is the ultimate step in SCSI EH anyway.
So further escalation via SCSI EH is not likely to make a
difference and we can as well return SUCCESS.

Cc: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-06 13:07:32 -04:00
Hannes Reinecke
7a38dc0bfb scsi: scsi_error: count medium access timeout only once per EH run
The current medium access timeout counter will be increased for
each command, so if there are enough failed commands we'll hit
the medium access timeout for even a single device failure and
the following kernel message is displayed:

sd H:C:T:L: [sdXY] Medium access timeout failure. Offlining disk!

Fix this by making the timeout per EH run, ie the counter will
only be increased once per device and EH run.

Fixes: 18a4d0a ("[SCSI] Handle disk devices which can not process medium access commands")
Cc: Ewan Milne <emilne@redhat.com>
Cc: Lawrence Obermann <loberman@redhat.com>
Cc: Benjamin Block <bblock@linux.vnet.ibm.com>
Cc: Steffen Maier <maier@linux.vnet.ibm.com>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-06 13:07:32 -04:00
Christoph Hellwig
104d9c7f94 scsi: csiostor: switch to pci_alloc_irq_vectors
And get automatic MSI-X affinity for free.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-06 12:57:03 -04:00
Mauricio Faria de Oliveira
75106523f3 scsi: ses: don't get power status of SES device slot on probe
The commit 08024885a2 ("ses: Add power_status to SES device slot")
introduced the 'power_status' attribute to enclosure components and
the associated callbacks.

There are 2 callbacks available to get the power status of a device:
1) ses_get_power_status() for 'struct enclosure_component_callbacks'
2) get_component_power_status() for the sysfs device attribute
(these are available for kernel-space and user-space, respectively.)

However, despite both methods being available to get power status
on demand, that commit also introduced a call to get power status
in ses_enclosure_data_process().

This dramatically increased the total probe time for SCSI devices
on larger configurations, because ses_enclosure_data_process() is
called several times during the SCSI devices probe and loops over
the component devices (but that is another problem, another patch).

That results in a tremendous continuous hammering of SCSI Receive
Diagnostics commands to the enclosure-services device, which does
delay the total probe time for the SCSI devices __significantly__:

  Originally, ~34 minutes on a system attached to ~170 disks:

    [ 9214.490703] mpt3sas version 13.100.00.00 loaded
    ...
    [11256.580231] scsi 17:0:177:0: qdepth(16), tagged(1), simple(0),
                   ordered(0), scsi_level(6), cmd_que(1)

  With this patch, it decreased to ~2.5 minutes -- a 13.6x faster

    [ 1002.992533] mpt3sas version 13.100.00.00 loaded
    ...
    [ 1151.978831] scsi 11:0:177:0: qdepth(16), tagged(1), simple(0),
                   ordered(0), scsi_level(6), cmd_que(1)

Back to the commit discussion.. on the ses_get_power_status() call
introduced in ses_enclosure_data_process(): impact of removing it.

That may possibly be in place to initialize the power status value
on device probe.  However, those 2 functions available to retrieve
that value _do_ automatically refresh/update it.  So the potential
benefit would be a direct access of the 'power_status' field which
does not use the callbacks...

But the only reader of 'struct enclosure_component::power_status'
is the get_component_power_status() callback for sysfs attribute,
and it _does_ check for and call the .get_power_status callback,
(which indeed is defined and implemented by that commit), so the
power status value is, again, automatically updated.

So, the remaining potential for a direct/non-callback access to
the power_status attribute would be out-of-tree modules -- well,
for those, if they are for whatever reason interested in values
that are set during device probe and not up-to-date by the time
they need it.. well, that would be curious.

Well, to handle that more properly, set the initial power state
value to '-1' (i.e., uninitialized) instead of '1' (power 'on'),
and check for it in that callback which may do an direct access
to the field value _if_ a callback function is not defined.

Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Fixes: 08024885a2 ("ses: Add power_status to SES device slot")
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-06 12:48:05 -04:00
Bart Van Assche
ed12e031b0 scsi: Make checking the scsi_device_get() return value mandatory
Now that all scsi_device_get() callers check the return value of this
function, make checking that return value mandatory.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-06 12:45:38 -04:00
Bart Van Assche
c02465fa13 scsi: osd_uld: Check scsi_device_get() return value
scsi_device_get() can fail. Hence check its return value.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Boaz Harrosh <bharrosh@panasas.com>
Acked-by: Boaz Harrosh <ooo@electrozaur.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-06 12:43:12 -04:00
Johannes Thumshirn
8690218a4c scsi: sas: remove sas_domain_release_transport
sas_domain_release_transport is unused since at least v3.13, remove it.

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-04 20:16:38 -04:00
Milan P Gandhi
5a68a1c29f scsi: qla2xxx: Fix typo in driver
Signed-off-by: Milan P Gandhi <mgandhi@redhat.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-04 20:13:57 -04:00
Arnd Bergmann
44a5b97712 scsi: advansys: fix uninitialized data access
gcc-7.0.1 now warns about a previously unnoticed access of uninitialized
struct members:

drivers/scsi/advansys.c: In function 'AscMsgOutSDTR':
drivers/scsi/advansys.c:3860:26: error: '*((void *)&sdtr_buf+5)' may be used uninitialized in this function [-Werror=maybe-uninitialized]
         ((ushort)s_buffer[i + 1] << 8) | s_buffer[i]);
                          ^
drivers/scsi/advansys.c:3860:26: error: '*((void *)&sdtr_buf+7)' may be used uninitialized in this function [-Werror=maybe-uninitialized]
drivers/scsi/advansys.c:3860:26: error: '*((void *)&sdtr_buf+5)' may be used uninitialized in this function [-Werror=maybe-uninitialized]
drivers/scsi/advansys.c:3860:26: error: '*((void *)&sdtr_buf+7)' may be used uninitialized in this function [-Werror=maybe-uninitialized]

The code has existed in this exact form at least since v2.6.12, and the
warning seems correct. This uses named initializers to ensure we
initialize all members of the structure.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-04 19:39:39 -04:00
Christoph Hellwig
831488669a scsi: be2iscsi: switch to pci_alloc_irq_vectors
And get automatic MSI-X affinity for free.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-30 11:37:16 -04:00
Szymon Mielczarek
dbd34a61c3 Revert "scsi: ufs: add queries retry mechanism"
This reverts commit 61e073590b.

The patch introduced redundant query retries as we already had such
mechanism provided with _retry functions.  Both ufshcd_read_desc and
ufshcd_read_unit_desc_param functions call ufshcd_query_descriptor_retry
wrapper.

Signed-off-by: Szymon Mielczarek <szymonx.mielczarek@intel.com>
Reviewed-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-29 22:51:40 -04:00
Don Brace
5765d180fd scsi: hpsa: change driver version
Reviewed-by: Gerry Morong <gerry.morong@microsemi.com>
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-29 22:50:09 -04:00
Don Brace
7f1974a76d scsi: hpsa: update pci ids
Reviewed-by: Gerry Morong <gerry.morong@microsemi.com>
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-29 22:49:48 -04:00
Arnd Bergmann
8bb74d3661 scsi: hisi_sas: fix SATA dependency
Removing the 'select SCSI_SAS_LIBSAS' statement in Kconfig resulted in a
link failure in configurations that have hisi_sas built-in but libsas as
a loadable module:

drivers/scsi/built-in.o: In function `hisi_sas_scan_finished':
hisi_sas_main.c:(.text+0x37ce9): undefined reference to `sas_drain_work'
drivers/scsi/built-in.o: In function `hisi_sas_slave_configure':
hisi_sas_main.c:(.text+0x37d17): undefined reference to `sas_slave_configure'
hisi_sas_main.c:(.text+0x37d40): undefined reference to `sas_change_queue_depth'
drivers/scsi/built-in.o: In function `hisi_sas_remove':

All other libsas users have the 'select' statement, so we should do the
same here for consistency. For all I can tell, the patch that added the
sata softreset does not actually introduce a dependency on SCSI_SAS_ATA
but instead adds calls into libata itself, so we can express that with a
more specific dependency.

We cannot have 'select SCSI_SAS_LIBSAS; depends on SCSI_SAS_ATA' as that
would cause a dependency loop.

Fixes: 7c594f0407 ("scsi: hisi_sas: add softreset function for SATA disk")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-29 22:44:53 -04:00
Tomohiro Kusumi
d985c6eaa8 scsi: ufs: just use sizeof() for snprintf()
Not much reason to use ARRAY_SIZE() when we know it's for a C string.

Signed-off-by: Tomohiro Kusumi <tkusumi@tuxera.com>
Reviewed-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-29 22:42:21 -04:00
Tomohiro Kusumi
0b9961be10 scsi: ufs: remove deprecated enum for hw interrupt
These flags are no longer needed after 2fbd009b in 2013.

Signed-off-by: Tomohiro Kusumi <tkusumi@tuxera.com>
Reviewed-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-29 22:42:21 -04:00
Tomohiro Kusumi
f6b254513c scsi: ufs: add missing macros for register bits from UFSHCI spec
Add macros for register bits that can be found in JESD223C (v2.1).

Not all registers are defined in ufshci.h (i.e. some are unused whether
macros are defined or undefined), but all the bits for those registers
that are already defined should appear here.

Signed-off-by: Tomohiro Kusumi <tkusumi@tuxera.com>
Reviewed-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-29 22:42:21 -04:00
Tomohiro Kusumi
9c490d2dd3 scsi: ufs: non functional macro fix
Not having () isn't likely to do any harm in this case, but all the
other macros below do have it. Also add "are" in a comment.

Signed-off-by: Tomohiro Kusumi <tkusumi@tuxera.com>
Reviewed-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-29 22:42:21 -04:00
Tomohiro Kusumi
4a8eec2bac scsi: ufs: use existing macro CONTROLLER_ENABLE to test register bit
Signed-off-by: Tomohiro Kusumi <tkusumi@tuxera.com>
Reviewed-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-29 22:42:21 -04:00
Tomohiro Kusumi
c9e6010b4e scsi: ufs: make ufshcd_is_{device_present, hba_active}() return bool
ufshcd driver generally uses bool for is_xxx type things instead of int,
so conform to its style.

Signed-off-by: Tomohiro Kusumi <tkusumi@tuxera.com>
Reviewed-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-29 22:42:21 -04:00
Colin Ian King
a28b259b43 scsi: hisi_sas: add missing break in switch statement
It appears that a break in the TRANS_TX_OPEN_CNX_ERR_NO_DESTINATION case
got accidentally removed in an earlier commit, as it stands, the
ts->stat and ts->open_rej_reason are being updated twice for this case
which looks incorrect.  Fix this by adding in the missing break
statement.

Detected by CoverityScan, CID#1422110 ("Missing break in switch")

Fixes: 634a9585f4 ("scsi: hisi_sas: process error codes according to their priority")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-29 22:38:53 -04:00
Jack Wang
ce8c4a1ec9 MAINTAINERS: remove pmchba list for PM8001
The email address is undeliverable for some time now, so just remove it.

Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-27 22:05:07 -04:00
Jitendra Bhivare
cbe9fc8594 scsi: be2iscsi: Update driver version
Version 11.4.0.0

Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-27 22:03:42 -04:00
Jitendra Bhivare
942b76542e scsi: be2iscsi: Update Copyright
Update Broadcom Copyright markings in all files.

Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-27 22:03:41 -04:00
Jitendra Bhivare
0ddee50e3f scsi: be2iscsi: Check size before copying ASYNC handle
Data in buffers are gathered into a single buffer before giving to iSCSI
layer. Though less likely to have payload more than 8K in ASYNC PDU, the
data length is provide by FW and check is missing for overrun.

Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-27 22:03:41 -04:00
Jitendra Bhivare
ba6983a745 scsi: be2iscsi: Remove free_list for ASYNC handles
With previous patch adding ASYNC Rx buffers to free_list is not
required.  Remove all free_list related operations.

Add in_use to track if buffer posted is being processed by driver and
purge all buffers received for connection if found so.

Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-27 22:03:40 -04:00
Jitendra Bhivare
1e2931f134 scsi: be2iscsi: Use num_cons field in Rx CQE
FW runs out of buffer if buffers are not posted back soon.  ASYNC Rx CQE
indicates that FW has consumed 8 RQEs.  Use it to post back buffers
instead of waiting for buffers to be processed and freed by driver.

Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-27 22:03:39 -04:00
Jitendra Bhivare
fecc382469 scsi: be2iscsi: Increase HDQ default queue size
Currently, ASYNC PDU default queue size is set to max connections.  This
leaves only one buffer per connection for any ASYNC PDUs from targets.

Double the size of the default queue.

Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-27 22:03:39 -04:00
Jitendra Bhivare
90e96313a9 scsi: scsi_transport_iscsi: Use flush_work in iscsi_remove_session
scsi_flush_work flushes workqueue for the Scsi_Host.  In iSCSI offload
enabled host, this would wait for all other sessions under the host.

Use flush_work for the session being removed instead.

Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-27 22:03:33 -04:00
Jitendra Bhivare
d1e1d63b32 scsi: be2iscsi: Replace spin_unlock_bh with spin_lock
spin_unlock_bh back_lock is used in beiscsi_eh_device_reset instead of
spin_lock.

Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-27 22:03:04 -04:00
Jitendra Bhivare
49fc5152f5 scsi: be2iscsi: Fix closing of connection
CID needs to be freed even when invalidate or upload connection fails.
Attempt to close connection 3 times before freeing CID.

Set cleanup_type to INVALIDATE instead of force TCP_RST.  This
unnecessarily is terminating connection with reset instead of gracefully
closing it.

Set save_cfg to 0 - session not to be saved on flash.

Add delay and process CQ before uploading connection.

Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-27 22:03:04 -04:00
Jitendra Bhivare
eb419229be scsi: be2iscsi: Check tag in beiscsi_mccq_compl_wait
scsi host12: BS_1377 : mgmt_invalidate_connection Failed for cid=256
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffff81332ebf>] __list_add+0xf/0xc0
PGD 0
Oops: 0000 [#1] SMP
Modules linked in:
...
CPU: 9 PID: 1542 Comm: iscsid Tainted: G               ------------ T 3.10.0-514.el7.x86_64 #1
Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 09/12/2016
task: ffff88076f310fb0 ti: ffff88076bba8000 task.ti: ffff88076bba8000
RIP: 0010:[<ffffffff81332ebf>]  [<ffffffff81332ebf>] __list_add+0xf/0xc0
RSP: 0018:ffff88076bbab8e8  EFLAGS: 00010046
RAX: 0000000000000246 RBX: ffff88076bbab990 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff880468badf58 RDI: ffff88076bbab990
RBP: ffff88076bbab900 R08: 0000000000000246 R09: 00000000000020de
R10: 0000000000000000 R11: ffff88076bbab5be R12: 0000000000000000
R13: ffff880468badf58 R14: 000000000001adb0 R15: ffff88076f310fb0
FS:  00007f377124a880(0000) GS:ffff88046fa40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 0000000771318000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
ffff88076bbab990 ffff880468badf50 0000000000000001 ffff88076bbab938
ffffffff810b128b 0000000000000246 00000000cf9b7040 ffff880468bac7a0
0000000000000000 ffff880468bac7a0 ffff88076bbab9d0 ffffffffa05a6ea3

Call Trace:
[<ffffffff810b128b>] prepare_to_wait+0x7b/0x90
[<ffffffffa05a6ea3>] beiscsi_mccq_compl_wait+0x153/0x330 [be2iscsi]
[<ffffffff810b1600>] ? wake_up_atomic_t+0x30/0x30
[<ffffffffa05981b1>] beiscsi_ep_disconnect+0x91/0x2d0 [be2iscsi]
[<ffffffffa0202ffa>] iscsi_if_ep_disconnect.isra.14+0x5a/0x70 [scsi_transport_iscsi]
[<ffffffffa02042fb>] iscsi_if_recv_msg+0x113b/0x14a0 [scsi_transport_iscsi]
[<ffffffff811dffd8>] ? __kmalloc_node_track_caller+0x58/0x290
[<ffffffffa02046ee>] iscsi_if_rx+0x8e/0x1f0 [scsi_transport_iscsi]
[<ffffffff815a351d>] netlink_unicast+0xed/0x1b0
[<ffffffff815a38fe>] netlink_sendmsg+0x31e/0x690
[<ffffffff815a03e4>] ? netlink_rcv_wake+0x44/0x60
[<ffffffff815a19e3>] ? netlink_recvmsg+0x1e3/0x450

beiscsi_mccq_compl_wait gets called even when MCC tag allocation failed
for mgmt_invalidate_connection.  mcc_wait is not initialized for tag 0
so causes crash in prepare_to_wait.

Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-27 22:03:04 -04:00
Tomohiro Kusumi
031d1e0f2d scsi: ufs: fix wrong/ambiguous fall through comments
These aren't really falling through to anywhere meaningful.

Signed-off-by: Tomohiro Kusumi <tkusumi@tuxera.com>
Reviewed-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-27 21:56:03 -04:00
Dan Carpenter
03b1a06203 scsi: osd_uld: remove an unneeded NULL check
We don't call the remove() function unless probe() succeeds so "oud"
can't be NULL here.  Plus, if it were NULL, we dereference it on the
next line so it would crash anyway.

[mkp: applied by hand]

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Boaz Harrosh <ooo@electrozaur.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-27 21:55:14 -04:00
Brian King
16a20b52d1 scsi: ipr: Driver version 2.6.4
Bump driver version

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Reviewed-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Tested-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 12:04:05 -04:00
Brian King
ef97d8ae12 scsi: ipr: Fix SATA EH hang
This patch fixes a hang that can occur in ATA EH with ipr. With ipr's
usage of libata, commands should never end up on ap->eh_done_q. The
timeout function we use for ipr, even for SATA devices, is
scsi_times_out, so ATA_QCFLAG_EH_SCHEDULED never gets set for ipr and EH
is driven completely by ipr and SCSI. The SCSI EH thread ends up calling
ipr's eh_device_reset_handler, which then calls
ata_std_error_handler. This ends up calling ipr_sata_reset, which issues
a reset to the device. This should result in all pending commands
getting failed back and having ata_qc_complete called for them, which
should end up clearing ATA_QCFLAG_FAILED as qc->flags gets zeroed in
ata_qc_free.  This ensures that when we end up in ata_eh_finish, we
don't do anything more with the command.

On adapters that only support a single interrupt and when running with
two MSI-X vectors or less, the adapter firmware guarantees that
responses to all outstanding commands are sent back prior to sending the
response to the SATA reset command.  On newer adapters supporting
multiple HRRQs, however, this can no longer be guaranteed, since the
command responses and reset response may be processed on different
HRRQs.

If ipr returns from ipr_sata_reset before the outstanding command was
returned, this sends us down the path of __ata_eh_qc_complete which then
moves the associated scsi_cmd from the work_q in
scsi_eh_bus_device_reset to ap->eh_done_q, which then will sit there
forever and we will be wedged.

This patch fixes this up by ensuring that any outstanding commands are
flushed before returning from eh_device_reset_handler for a SATA device.

Reported-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Reviewed-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Tested-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 12:04:05 -04:00
Brian King
f646f325a8 scsi: ipr: Error path locking fixes
This patch closes up some potential race conditions observed in the
error handling paths in ipr while debugging an issue resulting in a hang
with SATA error handling. These patches ensure we are holding the
correct lock when adding and removing commands from the free and pending
queues in some error scenarios.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Reviewed-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Tested-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 12:04:05 -04:00
Brian King
439ae285b9 scsi: ipr: Fix abort path race condition
This fixes a race condition in the error handlomg paths of ipr. While a
command is outstanding to the adapter, it is placed on a pending queue
for the hrrq it is associated with, while holding the HRRQ lock. When a
command is completed, it is removed from the pending queue, under HRRQ
lock, and placed on a local list.  This list is then iterated through
without any locks and each command's done function is invoked, inside of
which, the command gets returned to the free list while grabbing the
HRRQ lock. This fixes two race conditions when commands have been
removed from the pending list but have not yet been added to the free
list. Both of these changes fix race conditions that could result in
returning success from eh_abort_handler and then later calling scsi_done
for the same request.

The first race condition is in ipr_cancel_op. It looks through each
pending queue to see if the command to be aborted is still outstanding
or not. Rather than looking on the pending queue, reverse the logic to
check to look for commands that are NOT on the free queue.  The second
race condition can occur when in ipr_wait_for_ops where we are waiting
for responses for commands we've aborted.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Reviewed-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Tested-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 12:04:05 -04:00
Brian King
960e96480e scsi: ipr: Remove redundant initialization
Removes some code in __ipr_eh_dev_reset which was modifying the ipr_cmd
done function. This should have already been setup at command allocation
time and if its since been changed, it means we are in the ipr_erp*
functions and need to wait for them to complete and don't want to
override that here.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Reviewed-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Tested-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 12:04:05 -04:00
Brian King
66a0d59cdd scsi: ipr: Fix missed EH wakeup
Following a command abort or device reset, ipr's EH handlers wait for
the commands getting aborted to get sent back from the adapter prior to
returning from the EH handler. This fixes up some cases where the
completion handler was not getting called, which would have resulted in
the EH thread waiting until it timed out, greatly extending EH time.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Reviewed-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Tested-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 12:04:05 -04:00
Xiaofei Tan
4935933e07 scsi: hisi_sas: add is_sata_phy_v2_hw()
Add helper function is_sata_phy_v2_hw() to judge whether the attached
device is SATA disk for a root PHY.

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:02 -04:00
Xiang Chen
6073b7719a scsi: hisi_sas: use dev_is_sata to identify SATA or SAS disk
When SMP IO is sent, sas_protocol_ata couldn't judge whether the disk is
SATA or SAS disk.  So use dev_is_sata to identify SATA or SAS disk.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:02 -04:00
John Garry
14d3f397f6 scsi: hisi_sas: check hisi_sas_lu_reset() error message
Unless we actually get some sort of failure in hisi_sas_lu_reset(),
don't print a message.

Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:02 -04:00
Xiang Chen
ccbfe5a05a scsi: hisi_sas: release SMP slot in lldd_abort_task
When an SMP task timeouts, it will call lldd_abort_task to release the
associated slot, and then will release the sas_task.

Currently in lldd_abort_task, if we fail to internally abort IO, then
the slot of SMP IO is not released, but sas_task will still be later
released, so the slot's sas_task is NULL, which will cause NULL pointer
when hisi_sas_slot_task_free happens later.

To resolve, check the return value of internal abort, and release the
slot if it failed.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:02 -04:00
John Garry
8b05ad6a9d scsi: hisi_sas: add hisi_sas_clear_nexus_ha()
Add function for upper-layer to reset controller when all else fails.

Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:02 -04:00
John Garry
4df642db5b scsi: hisi_sas: rename hisi_sas_link_timeout_{enable, disable}_link
For consistency, remove the "hisi_sas_" prefix.

Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:02 -04:00
Xiaofei Tan
981843c6c4 scsi: hisi_sas: handle PHY UP+DOWN simultaneous irq
Handle the situation that PHY UP and DOWN irq happen simultaneously.
There is no mechanism of SoC HW to ensure this situation will never
happen. So, we add this handle just in case.

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:02 -04:00