linux-stable

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2024-10-30 08:02:30 +00:00

Author	SHA1	Message	Date
James Smart	c26c265b16	scsi: lpfc: Fix sg_seg_cnt for HBAs that don't support NVME On an SLI-3 adapter which does not support NVMe, but with the driver global attribute to enable nvme on any adapter if it does support NVMe (e.g. module parameter lpfc_enable_fc4_type=3), the SGL and total SGE values are being munged by the protocol enablement when it shouldn't be. Correct by changing the location of where the NVME sgl information is being applied, which will avoid any SLI-3-based adapter. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-08-19 22:41:10 -04:00
James Smart	a643c6de14	scsi: lpfc: Fix propagation of devloss_tmo setting to nvme transport If admin changes the devloss_tmo on an rport via the fc_remote_port rport dev_loss_tmo attribute, the value is on set on scsi stack. The change is not propagated to NVMe. The set routine in the lldd lacks the call to nvme_fc_set_remoteport_devloss() to set the value. Fix by adding the call to the lldd set routine. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-08-19 22:41:10 -04:00
James Smart	07f50997d6	scsi: lpfc: Fix null ptr oops updating lpfc_devloss_tmo via sysfs attribute If an admin updates lpfc's devloss_tmo sysfs attribute, the kernel will oops. Coding of a loop allowed a new value (rport) to be set/checked for null followed by an older value (remoteport) checked for null to allow progress where the new value, even though null, will be referenced. Rework the logic to validate and prevent any reference to the null ptr. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-08-19 22:41:09 -04:00
Linus Torvalds	ba6d10ab80	SCSI misc on 20190709 This is mostly update of the usual drivers: qla2xxx, hpsa, lpfc, ufs, mpt3sas, ibmvscsi, megaraid_sas, bnx2fc and hisi_sas as well as the removal of the osst driver (I heard from Willem privately that he would like the driver removed because all his test hardware has failed). Plus number of minor changes, spelling fixes and other trivia. Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com> -----BEGIN PGP SIGNATURE----- iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXSTl4yYcamFtZXMuYm90 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishdcxAQDCJVbd fPUX76/V1ldupunF97+3DTharxxbst+VnkOnCwD8D4c0KFFFOI9+F36cnMGCPegE fjy17dQLvsJ4GsidHy8= =aS5B -----END PGP SIGNATURE----- Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI updates from James Bottomley: "This is mostly update of the usual drivers: qla2xxx, hpsa, lpfc, ufs, mpt3sas, ibmvscsi, megaraid_sas, bnx2fc and hisi_sas as well as the removal of the osst driver (I heard from Willem privately that he would like the driver removed because all his test hardware has failed). Plus number of minor changes, spelling fixes and other trivia. The big merge conflict this time around is the SPDX licence tags. Following discussion on linux-next, we believe our version to be more accurate than the one in the tree, so the resolution is to take our version for all the SPDX conflicts" Note on the SPDX license tag conversion conflicts: the SCSI tree had done its own SPDX conversion, which in some cases conflicted with the treewide ones done by Thomas & co. In almost all cases, the conflicts were purely syntactic: the SCSI tree used the old-style SPDX tags ("GPL-2.0" and "GPL-2.0+") while the treewide conversion had used the new-style ones ("GPL-2.0-only" and "GPL-2.0-or-later"). In these cases I picked the new-style one. In a few cases, the SPDX conversion was actually different, though. As explained by James above, and in more detail in a pre-pull-request thread: "The other problem is actually substantive: In the libsas code Luben Tuikov originally specified gpl 2.0 only by dint of stating: * This file is licensed under GPLv2. In all the libsas files, but then muddied the water by quoting GPLv2 verbatim (which includes the or later than language). So for these files Christoph did the conversion to v2 only SPDX tags and Thomas converted to v2 or later tags" So in those cases, where the spdx tag substantially mattered, I took the SCSI tree conversion of it, but then also took the opportunity to turn the old-style "GPL-2.0" into a new-style "GPL-2.0-only" tag. Similarly, when there were whitespace differences or other differences to the comments around the copyright notices, I took the version from the SCSI tree as being the more specific conversion. Finally, in the spdx conversions that had no conflicts (because the treewide ones hadn't been done for those files), I just took the SCSI tree version as-is, even if it was old-style. The old-style conversions are perfectly valid, even if the "-only" and "-or-later" versions are perhaps more descriptive. * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (185 commits) scsi: qla2xxx: move IO flush to the front of NVME rport unregistration scsi: qla2xxx: Fix NVME cmd and LS cmd timeout race condition scsi: qla2xxx: on session delete, return nvme cmd scsi: qla2xxx: Fix kernel crash after disconnecting NVMe devices scsi: megaraid_sas: Update driver version to 07.710.06.00-rc1 scsi: megaraid_sas: Introduce various Aero performance modes scsi: megaraid_sas: Use high IOPS queues based on IO workload scsi: megaraid_sas: Set affinity for high IOPS reply queues scsi: megaraid_sas: Enable coalescing for high IOPS queues scsi: megaraid_sas: Add support for High IOPS queues scsi: megaraid_sas: Add support for MPI toolbox commands scsi: megaraid_sas: Offload Aero RAID5/6 division calculations to driver scsi: megaraid_sas: RAID1 PCI bandwidth limit algorithm is applicable for only Ventura scsi: megaraid_sas: megaraid_sas: Add check for count returned by HOST_DEVICE_LIST DCMD scsi: megaraid_sas: Handle sequence JBOD map failure at driver level scsi: megaraid_sas: Don't send FPIO to RL Bypass queue scsi: megaraid_sas: In probe context, retry IOC INIT once if firmware is in fault scsi: megaraid_sas: Release Mutex lock before OCR in case of DCMD timeout scsi: megaraid_sas: Call disable_irq from process IRQ poll scsi: megaraid_sas: Remove few debug counters from IO path ...	2019-07-11 15:14:01 -07:00
James Smart	41b194b843	lpfc: add sysfs interface to post NVME RSCN To support scenarios which aren't bound to nvmetcli add port scenarios, which is currently where the nvmet_fc transport invokes the discovery event callbacks, a syfs attribute is added to lpfc which can be written to cause an RSCN to be generated for the nport. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Arun Easi <aeasi@marvell.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-06-21 11:08:38 +02:00
James Smart	657add4e5e	scsi: lpfc: Fix poor use of hardware queues if fewer irq vectors While fixing the resources per socket, realized the driver was not using hardware queues (up to 1 per cpu) if there were fewer interrupt vectors. The driver was only using the hardware queue assigned to the cpu with the vector. Rework the affinity map check to use the additional hardware queue elements that had been allocated. If the cpu count exceeds the hardware queue count - share, but choose what is shared with by: hyperthread peer, core peer, socket peer, or finally similar cpu in a different socket. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-06-18 19:46:22 -04:00
James Smart	d9954a2d18	scsi: lpfc: Fix oops when driver is loaded with 1 interrupt vector The driver was coded expecting enough hardware queues and interrupt vectors such that at least there was one per socket. In the case where there were fewer than sockets, cpus were left unassigned thus null pointers. Rework the affinity mappings. Map settings for the cpu's that are in the irq cpu mask. For each cpu not in the mask, map to another cpu that does have a mask. Choice of the "other" cpu will attempt to map to the same cpu but differing hyperthread, or cpu within in same core, or cpu within same socket, or finally cpu in the base socket. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-06-18 19:46:22 -04:00
James Smart	f6978f4163	scsi: lpfc: Revert message logging on unsupported topology Turns out the message change in 12.2.0.1 for unsupported topology makes the linux driver out of sync with other products. Revert the message back to the prior content for product consistency. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-06-18 19:46:21 -04:00
James Smart	79080d349f	scsi: lpfc: correct rcu unlock issue in lpfc_nvme_info_show Many of the exit cases were not releasing the rcu read lock. Corrected the exit paths. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Tested-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-05-13 20:32:50 -04:00
James Smart	e2a8be5696	scsi: lpfc: resolve lockdep warnings There were a number of erroneous comments and incorrect older lockdep checks that were causing a number of warnings. Resolve the following: - Inconsistent lock state warnings in lpfc_nvme_info_show(). - Fixed comments and code on sequences where ring lock is now held instead of hbalock. - Reworked calling sequences around lpfc_sli_iocbq_lookup(). Rather than locking prior to the routine and have routine guess on what lock, take the lock within the routine. The lockdep check becomes unnecessary. - Fixed comments and removed erroneous hbalock checks. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> CC: Bart Van Assche <bvanassche@acm.org> Tested-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-05-13 20:32:49 -04:00
Bart Van Assche	a73cb81492	scsi: lpfc: Move trunk_errmsg[] from a header file into a .c file Arrays should be defined in .c files instead of in a header file. This patch reduces the size of the lpfc kernel module. Cc: James Smart <james.smart@broadcom.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-04-03 23:11:36 -04:00
Bart Van Assche	3999df75bc	scsi: lpfc: Declare local functions static This patch avoids that the compiler complains about missing declarations when building with W=1. Cc: James Smart <james.smart@broadcom.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-04-03 23:11:35 -04:00
Silvio Cesare	e7f7b6f38a	scsi: lpfc: change snprintf to scnprintf for possible overflow Change snprintf to scnprintf. There are generally two cases where using snprintf causes problems. 1) Uses of size += snprintf(buf, SIZE - size, fmt, ...) In this case, if snprintf would have written more characters than what the buffer size (SIZE) is, then size will end up larger than SIZE. In later uses of snprintf, SIZE - size will result in a negative number, leading to problems. Note that size might already be too large by using size = snprintf before the code reaches a case of size += snprintf. 2) If size is ultimately used as a length parameter for a copy back to user space, then it will potentially allow for a buffer overflow and information disclosure when size is greater than SIZE. When the size is used to index the buffer directly, we can have memory corruption. This also means when size = snprintf... is used, it may also cause problems since size may become large. Copying to userspace is mitigated by the HARDENED_USERCOPY kernel configuration. The solution to these issues is to use scnprintf which returns the number of characters actually written to the buffer, so the size variable will never exceed SIZE. Signed-off-by: Silvio Cesare <silvio.cesare@gmail.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: James Smart <james.smart@broadcom.com> Cc: Dick Kennedy <dick.kennedy@broadcom.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Kees Cook <keescook@chromium.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-03-25 22:14:16 -04:00
James Smart	e4771ec3c8	scsi: lpfc: Fix protocol support on G6 and G7 adapters Invalid test is allowing Loop to be a supported topology on G6 and G7 adapters. The chips do not support loop as their link speeds prohibit loop per standard. Correct the conditional so that loop is not reported. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-03-19 13:15:09 -04:00
James Smart	4645f7b56a	scsi: lpfc: Coordinate adapter error handling with offline handling The driver periodically checks for adapter error in a background thread. If the thread detects an error, the adapter will be reset including the deletion and reallocation of workqueues on the adapter. Simultaneously, there may be a user-space request to offline the adapter which may try to do many of the same steps, in parallel, on a different thread. As memory was deallocated while unexpected, the parallel offline request hit a bad pointer. Add coordination between the two threads. The error recovery thread has precedence. So, when an error is detected, a flag is set on the adapter to indicate the error thread is terminating the adapter. But, before doing that work, it will look for a flag that is set by the offline flow, and if set, will wait for it to complete before then processing the error handling path. Similarly, in the offline thread, it first checks for whether the error thread is resetting the adapter, and if so, will then wait for the error thread to finish. Only after it has finished, will it set its flag and offline the adapter. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-03-19 12:57:02 -04:00
James Smart	982ab128dc	scsi: lpfc: Fix lpfc_nvmet_mrq attribute handling when 0 Currently, when lpfc_nvmet_mrq is 0 it could mean 2 different things depending on when its looked at. If at module load time it specifies the default number of hardware queues to allocate, with 0 meaning default to the number of CPUs. But post module load, a value of zero means to disable mrq use. Changed the driver so that enablement of mrq is based on whether nvme target mode is enabled or not. When enabled, mrq is enabled. Thus, the cfg_nvemt_mrq field only specifies the number of mrq queues to enable, with 0 defaulting to the number of cpus. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-03-19 12:57:02 -04:00
James Smart	0d041215f0	scsi: lpfc: Update 12.2.0.0 file copyrights to 2019 For files modified as part of 12.2.0.0 patches, update copyright to 2019 Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-02-05 22:29:50 -05:00
James Smart	f6e8479052	scsi: lpfc: Fix default driver parameter collision for allowing NPIV support The conversion to enable SCSI and NVME fc4 support ran into an issue with NPIV support. With NVME, NPIV is not currently supported, but with SCSI it was. The driver reverted to its lowest setting meaning NPIV with SCSI was not allowed. Convert the NPIV checks and implementation so that SCSI can continue to allow NPIV support. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-02-05 22:29:50 -05:00
James Smart	b1684a0b42	scsi: lpfc: Enable SCSI and NVME fc4s by default Now that performance mods don't split resources by protocol and enable both protocols by default, there's no reason not to enable concurrent SCSI and NVME fc4 support. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-02-05 22:29:50 -05:00
James Smart	222e9239c6	scsi: lpfc: Resize cpu maps structures based on possible cpus The work done to date utilized the number of present cpus when sizing per-cpu structures. Structures should have been sized based on the max possible cpu count. Convert the driver over to possible cpu count for sizing allocation. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-02-05 22:29:50 -05:00
James Smart	32517fc097	scsi: lpfc: Rework EQ/CQ processing to address interrupt coalescing When driving high iop counts, auto_imax coalescing kicks in and drives the performance to extremely small iops levels. There are two issues: 1) auto_imax is enabled by default. The auto algorithm, when iops gets high, divides the iops by the hdwq count and uses that value to calculate EQ_Delay. The EQ_Delay is set uniformly on all EQs whether they have load or not. The EQ_delay is only manipulated every 5s (a long time). Thus there were large 5s swings of no interrupt delay followed by large/maximum delay, before repeating. 2) When processing a CQ, the driver got mixed up on the rate of when to ring the doorbell to keep the chip appraised of the eqe or cqe consumption as well as how how long to sit in the thread and process queue entries. Currently, the driver capped its work at 64 entries (very small) and exited/rearmed the CQ. Thus, on heavy loads, additional overheads were taken to exit and re-enter the interrupt handler. Worse, if in the large/maximum coalescing windows,k it could be a while before getting back to servicing. The issues are corrected by the following: - A change in defaults. Auto_imax is turned OFF and fcp_imax is set to 0. Thus all interrupts are immediate. - Cleanup of field names and their meanings. Existing names were non-intuitive or used for duplicate things. - Added max_proc_limit field, to control the length of time the handlers would service completions. - Reworked EQ handling: Added common routine that walks eq, applying notify interval and max processing limits. Use queue_claimed to claim ownership of the queue while processing. Always rearm the queue whenever the common routine is called. Rework queue element processing, namely to eliminate hba_index vs host_index. Only one index is necessary. The queue entry can be marked invalid and the host_index updated immediately after eqe processing. After rework, xx_release routines are now DB write functions. Renamed the routines as such. Moved lpfc_sli4_eq_flush(), which does similar action, to same area. Replaced the 2 individual loops that walk an eq with a call to the common routine. Slightly revised lpfc_sli4_hba_handle_eqe() calling syntax. Added per-cpu counters to detect interrupt rates and scale interrupt coalescing values. - Reworked CQ handling: Added common routine that walks cq, applying notify interval and max processing limits. Use queue_claimed to claim ownership of the queue while processing. Always rearm the queue whenever the common routine is called. Rework queue element processing, namely to eliminate hba_index vs host_index. Only one index is necessary. The queue entry can be marked invalid and the host_index updated immediately after cqe processing. After rework, xx_release routines are now DB write functions. Renamed the routines as such. Replaced the 3 individual loops that walk a cq with a call to the common routine. Redefined lpfc_sli4_sp_handle_mcqe() to commong handler definition with queue reference. Add increment for mbox completion to handler. - Added a new module/sysfs attribute: lpfc_cq_max_proc_limit To allow dynamic changing of the CQ max_proc_limit value being used. Although this leaves an EQ as an immediate interrupt, that interrupt will only occur if a CQ bound to it is in an armed state and has cqe's to process. By staying in the cq processing routine longer, high loads will avoid generating more interrupts as they will only rearm as the processing thread exits. The immediately interrupt is also beneficial to idle or lower-processing CQ's as they get serviced immediately without being penalized by sharing an EQ with a more loaded CQ. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-02-05 22:29:49 -05:00
James Smart	cb733e3587	scsi: lpfc: cleanup: convert eq_delay to usdelay Review of the eq coalescing logic showed the code was a bit fragmented. Sometimes it would save/set via an interrupt max value, while in others it would do so via a usdelay. There were also two places changing eq delay, one place that issued mailbox commands, and another that changed via register writes if supported. Clean this up by: - Standardizing the operation of lpfc_modify_hba_eq_delay() routine so that it is always told of a us delay to impose. The routine then chooses the best way to set that - via register or via mbx. - Rather than two value types stored in eq->q_mode (usdelay if change via register, imax if change via mbox) - q_mode always contains usdelay. Before any value change, old vs new value is compared and only if different is a change done. - Revised the dmult calculation. dmult is not set based on overall imax divided by hardware queues - instead imax applies to a single cpu and the value will be replicated to all cpus. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-02-05 22:29:49 -05:00
James Smart	6a828b0f61	scsi: lpfc: Support non-uniform allocation of MSIX vectors to hardware queues So far MSIX vector allocation assumed it would be 1:1 with hardware queues. However, there are several reasons why fewer MSIX vectors may be allocated than hardware queues such as the platform being out of vectors or adapter limits being less than cpu count. This patch reworks the MSIX/EQ relationships with the per-cpu hardware queues so they can function independently. MSIX vectors will be equitably split been cpu sockets/cores and then the per-cpu hardware queues will be mapped to the vectors most efficient for them. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-02-05 22:29:49 -05:00
James Smart	b3295c2a75	scsi: lpfc: Fix setting affinity hints to correlate with hardware queues The desired affinity for the hardware queue behavior is for hdwq 0 to be affinitized with cpu 0, hdwq 1 to cpu 1, and so on. The implementation so far does not do this if the number of cpus is greater than the number of hardware queues (e.g. hardware queue allocation was administratively reduced or hardware queue resources could not scale to the cpu count). Correct the queue affinitization logic when queue count is less than cpu count. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-02-05 22:29:09 -05:00
James Smart	45aa312e21	scsi: lpfc: Allow override of hardware queue selection policies Default behavior is to use the information from the upper IO stacks to select the hardware queue to use for IO submission. Which typically has good cpu affinity. However, the driver, when used on some variants of the upstream kernel, has found queuing information to be suboptimal for FCP or IO completion locked on particular cpus. For command submission situations, the lpfc_fcp_io_sched module parameter can be set to specify a hardware queue selection policy that overrides the os stack information. For IO completion situations, rather than queing cq processing based on the cpu servicing the interrupting event, schedule the cq processing on the cpu associated with the hardware queue's cq. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-02-05 22:29:09 -05:00
James Smart	c490850a09	scsi: lpfc: Adapt partitioned XRI lists to efficient sharing The XRI get/put lists were partitioned per hardware queue. However, the adapter rarely had sufficient resources to give a large number of resources per queue. As such, it became common for a cpu to encounter a lack of XRI resource and request the upper io stack to retry after returning a BUSY condition. This occurred even though other cpus were idle and not using their resources. Create as efficient a scheme as possible to move resources to the cpus that need them. Each cpu maintains a small private pool which it allocates from for io. There is a watermark that the cpu attempts to keep in the private pool. The private pool, when empty, pulls from a global pool from the cpu. When the cpu's global pool is empty it will pull from other cpu's global pool. As there many cpu global pools (1 per cpu or hardware queue count) and as each cpu selects what cpu to pull from at different rates and at different times, it creates a radomizing effect that minimizes the number of cpu's that will contend with each other when the steal XRI's from another cpu's global pool. On io completion, a cpu will push the XRI back on to its private pool. A watermark level is maintained for the private pool such that when it is exceeded it will move XRI's to the CPU global pool so that other cpu's may allocate them. On NVME, as heartbeat commands are critical to get placed on the wire, a single expedite pool is maintained. When a heartbeat is to be sent, it will allocate an XRI from the expedite pool rather than the normal cpu private/global pools. On any io completion, if a reduction in the expedite pools is seen, it will be replenished before the XRI is placed on the cpu private pool. Statistics are added to aid understanding the XRI levels on each cpu and their behaviors. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-02-05 22:29:09 -05:00
James Smart	4c47efc140	scsi: lpfc: Move SCSI and NVME Stats to hardware queue structures Many io statistics were being sampled and saved using adapter-based data structures. This was creating a lot of contention and cache thrashing in the I/O path. Move the statistics to the hardware queue data structures. Given the per-queue data structures, use of atomic types is lessened. Add new sysfs and debugfs stat routines to collate the per hardware queue values and report at an adapter level. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-02-05 22:29:08 -05:00
James Smart	5e5b511d8b	scsi: lpfc: Partition XRI buffer list across Hardware Queues Once the IO buff allocations were made shared, there was a single XRI buffer list shared by all hardware queues. A single list isn't great for performance when shared across the per-cpu hardware queues. Create a separate XRI IO buffer get/put list for each Hardware Queue. As SGLs and associated IO buffers get allocated/posted to the firmware; round robin their assignment across all available hardware Queues so that there is an equitable assignment. Modify SCSI and NVME IO submit code paths to use the Hardware Queue logic for XRI allocation. Add a debugfs interface to display hardware queue statistics Added new empty_io_bufs counter to track if a cpu runs out of XRIs. Replace common_ variables/names with io_ to make meanings clearer. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-02-05 22:24:22 -05:00
James Smart	cdb42becdd	scsi: lpfc: Replace io_channels for nvme and fcp with general hdw_queues per cpu Currently, both nvme and fcp each have their own concept of an io_channel, which is a combination wq/cq and associated msix. Different cpus would share an io_channel. The driver is now moving to per-cpu wq/cq pairs and msix vectors. The driver will still use separate wq/cq pairs per protocol on each cpu, but the protocols will share the msix vector. Given the elimination of the nvme and fcp io channels, the module parameters will be removed. A new parameter, lpfc_hdw_queue is added which allows the wq/cq pair allocation per cpu to be overridden and allocated to lesser value. If lpfc_hdw_queue is zero, the number of pairs allocated will be based on the number of cpus. If non-zero, the parameter specifies the number of queues to allocate. At this time, the maximum non-zero value is 64. To manage this new paradigm, a new hardware queue structure is created to track queue activity and relationships. As MSIX vector allocation must be known before setting up the relationships, msix allocation now occurs before queue datastructures are allocated. If the number of vectors allocated is less than the desired hardware queues, the hardware queue counts will be reduced to the number of vectors Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-02-05 22:22:42 -05:00
James Smart	0794d601d1	scsi: lpfc: Implement common IO buffers between NVME and SCSI Currently, both NVME and SCSI get their IO buffers from separate pools. XRI's are associated 1:1 with IO buffers, so XRI's are also split between protocols. Eliminate the independent pools and use a single pool. Each buffer structure now has a common section and a protocol section. Per protocol routines for SGL initialization are removed and replaced by common routines. Initialization of the buffers is only done on the common area. All other fields, which are protocol specific, are initialized when the buffer is allocated for use in the per-protocol allocation routine. In the past, the SCSI side allocated IO buffers as part of slave_alloc calls until the maximum XRIs for SCSI was reached. As all XRIs are now common and may be used for either protocol, allocation for everything is done as part of adapter initialization and the scsi side has no action in slave alloc. As XRI's are no longer split, the lpfc_xri_split module parameter is removed. Adapters based on SLI3 will continue to use the older scsi_buf_list_get/put routines. All SLI4 adapters utilize the new IO buffer scheme Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-02-05 22:22:42 -05:00
James Smart	5021267af1	scsi: lpfc: Adding ability to reset chip via pci bus reset This patch adds a "pci_bus_reset" option to the board_mode sysfs attribute. This option uses the pci_reset_bus() api to reset the PCIe link the adapter is on, which will reset the chip/adapter. Prior to issuing this option, all functions on the same chip must be placed in the offline state by the admin. After the reset, all of the instances may be brought online again. The primary purpose of this functionality is to support cases where firmware update required a chip reset but the admin did not want to reboot the machine in order to instantiate the firmware update. Sanity checks take place prior to the reset to ensure the adapter is the sole entity on the PCIe bus and that all functions are in the offline state. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-12-19 22:13:08 -05:00
James Smart	719162bd5b	scsi: lpfc: Enable Management features for IF_TYPE=6 Addition of support for if_type=6 missed several checks for interface type, resulting in the failure of several key management features such as firmware dump and loopback testing. Correct the checks on the if_type so that both SLI4 IF_TYPE's 2 and 6 are supported. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-12-12 20:33:08 -05:00
James Smart	76558b2573	scsi: lpfc: Correct topology type reporting on G7 adapters Driver missed classifying the chip type for G7 when reporting supported topologies. This resulted in loop being shown as supported on FC links that are not supported per the standard. Add the chip classifications to the topology checks in the driver. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-12-07 22:35:33 -05:00
James Smart	8b47ae69e0	scsi: lpfc: Cap NPIV vports to 256 Depending on the chipset, the number of NPIV vports may vary and be in excess of what most switches support (256). To avoid confusion with the users, limit the reported NPIV vports to 256. Additionally correct the 16G adapter which is reporting a bogus NPIV vport number if the link is down. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-12-07 22:35:32 -05:00
James Smart	3e1f071892	scsi: lpfc: refactor mailbox structure context fields The driver data structure for managing a mailbox command contained two context fields. Unfortunately, the context were considered "generic" to be used at the whim of the command code. Of course, one section of code used fields this way, while another did it that way, and eventually there were mixups. Refactored the structure so that the generic contexts become a node context and a buffer context and all code standardizes on their use. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-12-07 22:35:32 -05:00
James Smart	1dc5ec2452	scsi: lpfc: add Trunking support Add trunking support to the driver. Trunking is found on more recent asics. In general, trunking appears as a single "port" to the driver and overall behavior doesn't differ. Link speed is reported as an aggregate value, while link speed control is done on a per-physical link basis with all links in the trunk symmetrical. Some commands returning port information are updated to additionally provide trunking information. And new ACQEs are generated to report physical link events relative to the trunk. This patch contains the following modifications: - Added link speed settings of 128GB and 256GB. - Added handling of trunk-related ACQEs, mainly logging and trapping of physical link statuses. - Added additional bsg interface to query trunk state by applications. - Augment link_state sysfs attribtute to display trunk link status Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-11-06 20:42:51 -05:00
James Smart	7ea92eb458	scsi: lpfc: Implement GID_PT on Nameserver query to support faster failover The switches seem to respond faster to GID_PT vs GID_FT NameServer queries. Add support for GID_PT to be used over GID_FT to enable faster storage failover detection. Includes addition of new module parameter to select between GID_PT and GID_FT (GID_FT is default). Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-11-06 20:42:51 -05:00
Linus Torvalds	d49f8a52b1	SCSI misc on 20181024 This is mostly updates of the usual drivers: UFS, esp_scsi, NCR5380, qla2xxx, lpfc, libsas, hisi_sas. In addition there's a set of mostly small updates to the target subsystem a set of conversions to the generic DMA API, which do have some potential for issues in the older drivers but we'll handle those as case by case fixes. A new myrs for the DAC960/mylex raid controllers to replace the block based DAC960 which is also being removed by Jens in this merge window. Plus the usual slew of trivial changes. Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com> -----BEGIN PGP SIGNATURE----- iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCW9BQJSYcamFtZXMuYm90 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishU3MAP41T8yW UJQDCprj65pCR+9mOUWzgMvgAW/15ouK89x/7AD/XAEQZqoAgpFUbgnoZWGddZkS LykIzSiLHP4qeDOh1TQ= =2JMU -----END PGP SIGNATURE----- Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI updates from James Bottomley: "This is mostly updates of the usual drivers: UFS, esp_scsi, NCR5380, qla2xxx, lpfc, libsas, hisi_sas. In addition there's a set of mostly small updates to the target subsystem a set of conversions to the generic DMA API, which do have some potential for issues in the older drivers but we'll handle those as case by case fixes. A new myrs driver for the DAC960/mylex raid controllers to replace the block based DAC960 which is also being removed by Jens in this merge window. Plus the usual slew of trivial changes" [ "myrs" stands for "MYlex Raid Scsi". Obviously. Silly of me to even wonder. There's also a "myrb" driver, where the 'b' stands for 'block'. Truly, somebody has got mad naming skillz. - Linus ] * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (237 commits) scsi: myrs: Fix the processor absent message in processor_show() scsi: myrs: Fix a logical vs bitwise bug scsi: hisi_sas: Fix NULL pointer dereference scsi: myrs: fix build failure on 32 bit scsi: fnic: replace gross legacy tag hack with blk-mq hack scsi: mesh: switch to generic DMA API scsi: ips: switch to generic DMA API scsi: smartpqi: fully convert to the generic DMA API scsi: vmw_pscsi: switch to generic DMA API scsi: snic: switch to generic DMA API scsi: qla4xxx: fully convert to the generic DMA API scsi: qla2xxx: fully convert to the generic DMA API scsi: qla1280: switch to generic DMA API scsi: qedi: fully convert to the generic DMA API scsi: qedf: fully convert to the generic DMA API scsi: pm8001: switch to generic DMA API scsi: nsp32: switch to generic DMA API scsi: mvsas: fully convert to the generic DMA API scsi: mvumi: switch to generic DMA API scsi: mpt3sas: switch to generic DMA API ...	2018-10-25 07:40:30 -07:00
James Smart	9e21017826	scsi: lpfc: Synchronize access to remoteport via rport The driver currently uses the ndlp to get the local rport which is then used to get the nvme transport remoteport pointer. There can be cases where a stale remoteport pointer is obtained as synchronization isn't done through the different dereferences. Correct by using locks to synchronize the dereferences. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-09-20 22:02:36 -04:00
James Smart	d2cc9bcd7f	scsi: lpfc: add support to retrieve firmware logs This patch adds the ability to read firmware logs from the adapter. The driver registers a buffer with the adapter that is then written to by the adapter. The adapter posts CQEs to indicate content updates in the buffer. While the adapter is writing to the buffer in a circular fashion, an application will poll the driver to read the next amount of log data from the buffer. Driver log buffer size is configurable via the ras_fwlog_buffsize sysfs attribute. Verbosity to be used by firmware when logging to host memory is controlled through the ras_fwlog_level attribute. The ras_fwlog_func attribute enables or disables loggy by firmware. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-09-11 20:37:33 -04:00
James Smart	faf0a5f829	scsi: lpfc: Raise nvme defaults to support a larger io and more connectivity When nvme is enabled, change the default for two parameters: sg_seg_cnt - raise the per-io sg list size so that 1MB ios are supported (based on a 4k buffer per element). iocb_cnt - raise the number of buffers used for things like NVME LS request/responses to allow more concurrent requests to for larger nvme configs. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-09-11 20:37:33 -04:00
James Smart	5b9e70b22c	scsi: lpfc: raise sg count for nvme to use available sg resources The driver allocates a sg list per io struture based on a fixed maximum size. When it registers with the protocol transports and indicates the max sg list size it supports, the driver manipulates the fixed value to report a lesser amount so that it has reserved space for sg elements that are used for DIF. The driver initialization path sets the cfg_sg_seg_cnt field to the manipulated value for scsi. NVME initialization ran afterward and capped it's maximum by the manipulated value for SCSI. This erroneously made NVME report the SCSI-reduce-for-DIF value that reduced the max io size for nvme and wasted sg elements. Rework the driver so that cfg_sg_seg_cnt becomes the overall maximum size and allow the max size to be tunable. A separate (new) scsi sg count is then setup with the scsi-modified reduced value. NVME then initializes based off the overall maximum. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-09-11 20:37:33 -04:00
James Smart	9abd9990e9	scsi: lpfc: Default fdmi_on to on Change default behavior for fdmi registration to on. [mkp: patch was mangled] Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-08-27 12:26:10 -04:00
James Smart	06b6fa3815	scsi: lpfc: Remove lpfc_enable_pbde as module parameter Enablement of the PBDE optimization brought out some incompatible behaviors under error scenarios. Best to disable and remove the PBDE optimization. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-08-02 15:45:19 -04:00
James Smart	b615a20adf	scsi: lpfc: Fix sysfs Speed value on CNA ports CNA ports were showing speed as "unknown" even if the link is up. Add speed decoding for FCOE-based adapters. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-08-02 15:45:18 -04:00
James Smart	414abe0ab6	scsi: lpfc: Make PBDE optimizations configurable The PBDE optimizations aren't supported in all firmware revs. Make optimizations configurable in case there's a side effect on old firmware. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-07-10 22:15:09 -04:00
James Smart	afff0d2321	scsi: lpfc: Add Buffer overflow check, when nvme_info larger than PAGE_SIZE Kernel crashes during fill_read_buffer when nvme_info sysfs file read. With multiple NVME targets, approx 40, nvme_info may grow larger than PAGE_SIZE bytes. snprintf(buf + len, PAGE_SIZE - len, ...) logic is flawed as PAGE_SIZE - len can be < 0 and is accepted by snprintf. This results in buffer overflow, and is detected with check from dev_attr_show and fill_read_buffer. Change to use scnprintf to a tmp array, before calling strlcat to ensure no buffer overflow over PAGE_SIZE bytes. Message "6314" created as a new message indicating when there is more nvme info, but is truncated to fit within PAGE_SIZE bytes. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-07-10 22:15:08 -04:00
Arnd Bergmann	c4d6204dc1	scsi: lpfc: use monotonic timestamps for statistics The get_seconds() function suffers from a possible overflow in 2038 or 2106, as well as jitter due to settimeofday or leap second updates, and is deprecated. As we are interested in elapsed time only, using ktime_get_seconds() to read the CLOCK_MONOTONIC timebase is ideal here. This also lets us remove the hack that tries to deal with get_seconds() going slightly backwards, which cannot happen with montonic timestamps. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-06-26 12:00:27 -04:00
James Smart	4d5e789a2e	scsi: lpfc: correct oversubscription of nvme io requests for an adapter Under large configurations, the driver would start to log message 6065 - NVME out of buffers (exchanges). The driver is using the ndlp cmd_qdepth value when determining the max outstanding ios for an adapter. This value, by default, is set to 65536, which exceeds the maximum exchange counts supported on an adapter. The ndlp cmd_qdepth has no relevance and outstanding io count should be capped at the max exchange count with IO requests beyond that level getting bounced back with an EBUSY status so that they are retried by the block layer. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-05-28 22:40:33 -04:00
James Smart	3e21d1cb0f	scsi: lpfc: Comment cleanup regarding Broadcom copyright header Fix small formatting and wording nits in Broadcom copyright header Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-05-08 01:03:16 -04:00

1 2 3 4 5 ...

324 commits