Currently we set the protection parameters after calling scsi_add_host()
for v3 hw.
They should be set beforehand, so make this change.
Appearantly this fixes our DIX issue (not mainline yet) also, but more
testing required.
Fixes: d6a9000b81 ("scsi: hisi_sas: Add support for DIF feature for v2 hw")
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This is mostly update of the usual drivers: smarpqi, lpfc, qedi,
megaraid_sas, libsas, zfcp, mpt3sas, hisi_sas. Additionally, we have
a pile of annotation, unused variable and minor updates. The big API
change is the updates for Christoph's DMA rework which include
removing the DISABLE_CLUSTERING flag. And finally there are a couple
of target tree updates.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXCEUNiYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishdjKAP9vrTTv
qFaYmAoRSbPq9ZiixaXLMy0K/6o76Uay0gnBqgD/fgn3jg/KQ6alNaCjmfeV3wAj
u1j3H7tha9j1it+4pUw=
=GDa+
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"This is mostly update of the usual drivers: smarpqi, lpfc, qedi,
megaraid_sas, libsas, zfcp, mpt3sas, hisi_sas.
Additionally, we have a pile of annotation, unused variable and minor
updates.
The big API change is the updates for Christoph's DMA rework which
include removing the DISABLE_CLUSTERING flag.
And finally there are a couple of target tree updates"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (259 commits)
scsi: isci: request: mark expected switch fall-through
scsi: isci: remote_node_context: mark expected switch fall-throughs
scsi: isci: remote_device: Mark expected switch fall-throughs
scsi: isci: phy: Mark expected switch fall-through
scsi: iscsi: Capture iscsi debug messages using tracepoints
scsi: myrb: Mark expected switch fall-throughs
scsi: megaraid: fix out-of-bound array accesses
scsi: mpt3sas: mpt3sas_scsih: Mark expected switch fall-through
scsi: fcoe: remove set but not used variable 'port'
scsi: smartpqi: call pqi_free_interrupts() in pqi_shutdown()
scsi: smartpqi: fix build warnings
scsi: smartpqi: update driver version
scsi: smartpqi: add ofa support
scsi: smartpqi: increase fw status register read timeout
scsi: smartpqi: bump driver version
scsi: smartpqi: add smp_utils support
scsi: smartpqi: correct lun reset issues
scsi: smartpqi: correct volume status
scsi: smartpqi: do not offline disks for transient did no connect conditions
scsi: smartpqi: allow for larger raid maps
...
For v3 hw, we support DIF operation for SAS, but not SATA.
In addition, DIF CRC16 is supported.
This patchset adds the SW support for the described features. The main
components are as follows:
- Get protection mask from module param
- Fill PI fields
- Fill related to DIF in DQ and protection iu memories
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Most SCSI drivers want to enable "clustering", that is merging of
segments so that they might span more than a single page. Remove the
ENABLE_CLUSTERING define, and require drivers to explicitly set
DISABLE_CLUSTERING to disable this feature.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Sht->sg_tablesize is set in the driver, and it will be assigned to
shost->sg_tablesize in SCSI mid-layer. So it is not necessary to assign
shost->sg_table one more time in the driver.
In addition to the change, change each scsi_host_template.sg_tablesize
to HISI_SAS_SGE_PAGE_CNT instead of SG_ALL.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Relocate the codes related to dma_map/unmap in hisi_sas_task_prep() to
reduce complexity, with a view to add DIF/DIX support.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patchset fixes some warnings detected by the sparse tool, like these:
drivers/scsi/hisi_sas/hisi_sas_main.c:1469:52: warning: incorrect type in assignment (different base types)
drivers/scsi/hisi_sas/hisi_sas_main.c:1469:52: expected unsigned short [unsigned] [assigned] [usertype] tag_of_task_to_be_managed
drivers/scsi/hisi_sas/hisi_sas_main.c:1469:52: got restricted __le16 [usertype] <noident>
drivers/scsi/hisi_sas/hisi_sas_main.c:1723:52: warning: incorrect type in assignment (different base types)
drivers/scsi/hisi_sas/hisi_sas_main.c:1723:52: expected unsigned short [unsigned] [assigned] [usertype] tag_of_task_to_be_managed
drivers/scsi/hisi_sas/hisi_sas_main.c:1723:52: got restricted __le16 [usertype] <noident>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently the time of SAS SSP connection is 1ms, which means the link
connection will fail if no IO response after this period.
For some disks handling large IO (such as 512k), 1ms is not enough, so
change it to 5ms.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In function hisi_sas_task_prep(), we check asd_sas_port, but in function
hisi_sas_task_exec(), we already refer to asd_sas_port by using function
dev_to_hisi_hba() implicitly. So to avoid this possible invalid
dereference, relocate the check to function hisi_sas_task_prep().
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If INT_COAL_EN is enabled, configure time and count of interrupt
coalescing. Then if CQ collects count of CQ entries in time, it will
report the interrupt. Or if CQ doesn't collect enough CQ entries in time,
it will report the interrupt at timeout.
As all the registers are not supported to be changed dynamically, we need
to config those register between disable and enable PHYs.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If CQ_INT_CONVERGE_EN is enabled, the interrupts of all the 16 CQ queues
will be reported by CQ0.
So we need to change the process of CQ tasklet for this situation.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently all the three HBA (v1/v2/v3 HW) share the same host attributes.
To support each HBA having separate attributes in future, create per-HBA
attributes.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver currently uses pci_set_dma_mask despite otherwise using the
generic DMA API. Switch it over to the better generic DMA API.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/scsi/hisi_sas/hisi_sas_v1_hw.c: In function 'start_delivery_v1_hw':
drivers/scsi/hisi_sas/hisi_sas_v1_hw.c:907:20: warning:
variable 'dq_list' set but not used [-Wunused-but-set-variable]
drivers/scsi/hisi_sas/hisi_sas_v2_hw.c: In function 'start_delivery_v2_hw':
drivers/scsi/hisi_sas/hisi_sas_v2_hw.c:1671:20: warning:
variable 'dq_list' set but not used [-Wunused-but-set-variable]
drivers/scsi/hisi_sas/hisi_sas_v3_hw.c: In function 'start_delivery_v3_hw':
drivers/scsi/hisi_sas/hisi_sas_v3_hw.c:889:20: warning:
variable 'dq_list' set but not used [-Wunused-but-set-variable]
It never used since introduction in commit
fa222db0b0 ("scsi: hisi_sas: Don't lock DQ for complete task sending")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There is a NULL pointer dereference in case *slot* happens to be NULL at
lines 1053 and 1878:
struct hisi_sas_cq *cq =
&hisi_hba->cq[slot->dlvry_queue];
Notice that *slot* is being NULL checked at lines 1057 and 1881:
if (slot), which implies it may be NULL.
Fix this by placing the declaration and definition of variable cq, which
contains the pointer dereference slot->dlvry_queue, after slot has been
properly NULL checked.
Addresses-Coverity-ID: 1474515 ("Dereference before null check")
Addresses-Coverity-ID: 1474520 ("Dereference before null check")
Fixes: 584f53fe5f ("scsi: hisi_sas: Fix the race between IO completion and timeout for SMP/internal IO")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently a spin_unlock_irqrestore() call is missing on the error path,
so add it.
Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Update registers as follows:
- Default value of AIP timer is 1ms, and it is easy for some expanders to
cause IO error. Change the value to max value 65ms to avoid IO error for
those expanders.
- A CQ completion will be reported by HW when 4 CQs have occurred or the
aging timer expires, whichever happens first. Sor serial IO scenario, it
will still wait 8us for every IO before it is reported. So in the
situation, the performance is poor. So to improve it, change the limit
time to the least value.
For other scenario, it does little affect to the performance.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently we use the IPTT defined in LLDD to identify IOs. Actually for
IOs which are from the block layer, they have tags to identify them. So
for those IOs, use tag of the block layer directly, and for IOs which is
not from the block layer (such as internal IOs from libsas/LLDD), reserve
96 IPTTs for them.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The interrupts of ent72 and ent74 are not processed by PCIe AER handling,
so we need to unmask the interrupts and process them first in the driver.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If an SSP/SMP IO times out, it may be actually in reality be
simultaneously processing completion of the slot in
slot_complete_vx_hw().
Then if the slot is freed in slot_complete_vx_hw() (this IPTT is freed
and it may be re-used by other slot), and we may abort the wrong slot in
hisi_sas_abort_task().
So to solve the issue, free the slot after the check of
SAS_TASK_STATE_ABORTED in slot_complete_vx_hw().
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If SMP/internal IO times out, we will possibly free the task immediately.
However if the IO actually completes at the same time, the IO completion
may refer to task which has been freed.
So to solve the issue, flush the tasklet to finish IO completion before
free'ing slot/task.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In evaluating hisi_hba, the sas_port may be NULL, so for safety relocate
the the check to value possible NULL deference.
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
At directly attached situation, if the user modifies the sysfs interface
of maximum_linkrate and minimum_linkrate to renegotiate the linkrate
between SAS controller and target, the value of both files mentioned
above should have change to user setting after renegotiate is over, but
it remains unchanged.
To fix this bug, maximum_linkrate and minimum_linkrate will be directly
fed back to relevant sas_phy structure.
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Now LLDDs have to implement lldd_port_deformed method otherwise NULL
dereference will happen. Make it optional and remove the dummy implementation
in hisi_sas.
Signed-off-by: Jason Yan <yanaijie@huawei.com>
CC: John Garry <john.garry@huawei.com>
CC: Johannes Thumshirn <jthumshirn@suse.de>
CC: Ewan Milne <emilne@redhat.com>
CC: Christoph Hellwig <hch@lst.de>
CC: Tomas Henzl <thenzl@redhat.com>
CC: Dan Williams <dan.j.williams@intel.com>
CC: Hannes Reinecke <hare@suse.com>
Acked-by: John Garry <john.garry@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add a check ERR bit of status to decide whether there is something wrong
with initial register-D2H FIS. If error exist, PHY link reset the channel
to restart OOB.
Directly call work HISI_PHYE_LINK_RESET replacing disable_phy_vx_hw() and
enable_phy_vx_hw().
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In task start delivery function, we need to add a memory barrier to prevent
re-ordering of reading memory by hardware. Because the slot data is set in
task prepare function and it could be running in another CPU.
This patch adds an memory barrier after s->ready is read in the task start
delivery function, and uses WRITE_ONCE() in the places where s->ready is
set to ensure that the compiler does not re-order.
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
To decrease the usage of spinlock during delivery IO, relocate some code in
hisi_sas_task_prep().
Also an invalid comment is removed.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch implements handlers of PCIe FLR for v3 hw, reset_prepare() and
reset_done().
User can issue FLR through sysfs interface, as v3 hw support PCIe FLR.
Then if we don't implement these two handlers, our SAS controller will not
work after executing FLR.
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Much code of PM suspend function also exists in soft reset function. This
is not concise. So, this patch relocates the common code of these two
functions to a separate function.
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch tidies host controller reset function by putting some code to
two new functions, and exports these two functions out, so that they could
be used by FLR feature to be realised.
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There is an issue that link reset can't recover PHY when STP link timeout.
Because current process of enabling PHY for v3 hw will wait last
transmission done. The time of one transmission depends IO size, disk model
and so on. Normally, it should be shorter than 50ms. But the last
transmission could be never done for some abnormal scenarios, such as STP
link timeout.
This patch is to fix the issue. Check PHY status after starting process of
enabling PHY for 50ms. If the PHY is still active, we disable it forcibly
by PHY reset. Of course, we need to clear the PHY reset bit when enable
PHY.
Besides, the function disable_phy_v3_hw() should not be suitable to call in
interrupts for hilink bug for this 50ms delay. Then, we do link reset for
hilink bug directly. The change is that we don't clear the invalid dword
count register. This is better. Because we should not clear such error
count while not saved.
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The ISR of channel interrupt of v3 hw is a little long and messy. This
patch tidies it by relocating CHL_INT1 and CHL_INT2 handling to new
function separately.
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
For some time now we have not used hisi_sas_slot_abort() to handle erroring
slots, apart from in archaic v1 hw.
As such, remove this function and associated code. For v1 hw, move error
handling to same scheme as other hw revisions, where we allow erroring
commands to timeout.
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Update CFG_1US_TIMER_TRSH and CON_CFG_DRIVER settings.
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The init is missed for hisi_sas_phy spinlock, so add it.
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently the driver spends much time allocating and freeing the slot DMA
buffer for command delivery/completion. To boost the performance,
pre-allocate the buffers for all IPTT. The downside of this approach is
that we are reallocating all buffer memory upfront, so hog memory which we
may not need.
However, the current method - DMA buffer pool - also caches all buffers and
does not free them until the pool is destroyed, so is not exactly efficient
either.
On top of this, since the slot DMA buffer is slightly bigger than a 4K
page, we need to allocate 2x4K pages per buffer (for 4K page kernel), which
is quite wasteful. For 64K page size this is not such an issue.
So, for the 4K page case, in order to make memory usage more efficient,
pre-allocating larger blocks of DMA memory for the buffers can be more
efficient.
To make DMA memory usage most efficient, we would choose a single
contiguous DMA memory block, but this could use up all the DMA memory in
the system (when CMA enabled and no IOMMU), or we may just not be able to
allocate a DMA buffer large enough when no CMA or IOMMU.
To decide the block size we use the LCM (least common multiple) of the
buffer size and the page size. We roundup(64) to ensure the LCM is not too
large, even though a little memory may be wasted per block.
So, with this, the total memory requirement is about is about 17MB for 4096
max IPTT.
Previously (for 4K pages case), it would be 32MB (for all slots
allocated).
With this change, the relative increase of IOPS for bs=4K read when
PAGE_SIZE=4K and PAGE_SIZE=64K is as follows:
IODEPTH 4K PAGE_SIZE 64K PAGE_SIZE
32 56% 47%
64 53% 44%
128 64% 43%
256 67% 45%
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In host reset, we use TMF or soft-reset to re-init device, and if success,
we will release all LLDD resources of this device. If the init fails -
maybe because the device was removed or link has not come up - then do not
release the LLDD resources, but rather rely on SCSI EH to handle the
timeout for these resources later on.
But if clear nexus ha calls host reset, which is the last effort of SCSI
EH, we should release all LLDD remain resources. Because SCSI EH will
release all tasks after clear nexus ha.
Before release, we do I_T nexus reset to try to clear target remain IOs.
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
During reset, we don't want PHY events reported to libsas for PHYs which
were previously attached prior to reset.
So check hisi_hba->flags for HISI_SAS_RESET_BIT to filter PHY events during
reset.
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
After soft_reset() for host reset, we should not be allowed to send
commands to the HW before the PHYs have come up and the port ids have been
refreshed.
Prior to this point, any commands cannot be successfully completed.
This exclusion is achieved by grabbing the host reset semaphore.
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There is a possible conflict when a device is removed and host reset occurs
concurrently.
The reason is that then the device is notified as gone, we try to clear the
ITCT, which is notified via an interrupt. The dev gone function pends on
this event with a completion, which is completed when the ITCT interrupt
occurs.
But host reset will disable all interrupts, the wait_for_completion() may
wait indefinitely.
This patch adds an semaphore to synchronise this two processes. The
semaphore is taken by the host reset as the basis of synchronising.
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There are many BROADCAST primitives generated by the host. We are only
interested in BROADCAST (CHANGE) primitives currently, so only process
this.
We have applied this processing for v2 hw before, and it is also needed for
v3 hw.
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch replaces the usage of dma_alloc_coherent() with the managed
version, dmam_alloc_coherent(), hereby reducing replicated code.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by; John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When issuing a nexus reset for directly attached device, we want to ignore
the PHY down events so libsas will not deform and reform the port.
In the case that the attached SAS changes for the reset, libsas will deform
and form a port.
For scenario that the PHY does not come up after a timeout period, then
report the PHY down to libsas.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
It is an step of executing task to get free slot. If the step fails, we
will cleanup LLDD resources and should return failure to upper layer or
internal caller to abort task execution of this time.
But in the current code, the caller of get_free_slot() doesn't return
failure when get_free_slot() failed. This patch is to fix it.
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
For v2 hw, STP link from target is rejected after host reset because of a
SoC bug. The STP reject will be terminated after we have sent IO from each
PHY of a port.
This is not an problem before, as we don't need to setup STP link from
target immediately after host reset. But now, it is. Because we want to
send soft-reset immediately after host reset.
In order to terminate STP reject quickly, this patch send ATA reset command
through each PHY of a port. Notes: ATA reset command don't need target's
response.
Besides, we do abort dev for each device before terminating STP reject.
This is a quirk of v2 hw.
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch adds a force PHY function for internal ATA command for v2 hw.
Because there is an SoC bug in v2 hw, and need send an IO through each PHY
of a port to work around a bug which occurs after a controller reset.
This force PHY function will be used in the later patch.
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In future scenarios we will want to use the TMF struct for more task types
than SSP.
As such, we can add struct hisi_sas_tmf_task directly into struct
hisi_sas_slot, and this will mean we can remove the TMF parameters from the
task prep functions.
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
We may reset the controller in many scenarios, such as SCSI EH and HW
errors. There should be no IO which returns from target when SCSI EH is
active. But for other scenarios, there may be. It is not necessary to make
such IOs fail.
This patch adds an function of trying to wait for any commands, or IO, to
complete before host reset. If no more CQ returned from host controller in
100ms, we assume no more IO can return, and then stop waiting. We wait 5s
at most.
The HW has a register CQE_SEND_CNT to indicate the total number of CQs that
has been reported to driver. We can use this register and it is reliable to
resd this register in such scenarios that require host reset.
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
After the controller is reset, it is possible that the disks attached still
have outstanding IO to complete.
Thus, when the PHYs come back up after controller reset, it is possible
that these IOs complete at some unknown point later.
We want to ensure that all IOs are complete after the controller reset so
that all associated IPTT and other resources can be recycled safely.
To achieve this, re-init the disks by TMF or softreset (in case of ATA
devices).
If the init fails - maybe because the device was removed or link has not
come up - then do not release the device resources, but rather rely on SCSI
EH to handle the timeout for these resources later on.
This patch also does some cleanup to hisi_sas_init_disk(), including
removing superfluous cases in the switch statement.
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When a SCSI host is registered, the SCSI mid-layer takes a reference to a
module in Scsi_host.hostt.module. In doing this, we are prevented from
removing the driver module for the host in dangerous scenario, like when a
disk is mounted.
Currently there is only one scsi_host_template (sht) for all HW versions,
and this is the main.c module. So this means that we can possibly remove
the HW module in this dangerous scenario, as SCSI mid-layer is only
referencing the main.c module.
To fix this, create a sht per module, referencing that same module to
create the Scsi host.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>