linux-stable

Commit Graph

Author	SHA1	Message	Date
Justin Tee	a7b94c1592	scsi: lpfc: Replace blk_irq_poll intr handler with threaded IRQ It has been determined that the threaded IRQ API accomplishes effectively the same performance metrics as blk_irq_poll. As blk_irq_poll is mostly scheduled by the softirqd and handled in softirq context, this is not entirely desired from a Fibre Channel driver context. A threaded IRQ model fits cleaner. This patch replaces the blk_irq_poll logic with threaded IRQ. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20230417191558.83100-7-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2023-05-08 07:16:05 -04:00
Justin Tee	27c2bcf00a	scsi: lpfc: Skip waiting for register ready bits when in unrecoverable state During tolerance tests that force an HBA to become unresponsive, rmmod hangs resulting in the inability to remove the driver. The lpfc_pci_remove_one_s4() routine attempts to submit a clean up mailbox command via the lpfc_sli4_post_sync_mbox() routine, but ends up waiting forever for a mailbox register to set its ready bit. Because the HBA is in an unrecoverable and unresponsive state, the ready bit will never be set. Create a new routine called lpfc_sli4_unrecoverable_port(), which checks a port status register's error notification bits. Use the lpfc_sli4_unrecoverable_port() routine in ready bit check routines to early return error if port is deemed unrecoverable. Also, when the lpfc_handle_eratt_s4() handler detects an unrecoverable state, call the lpfc_sli4_offline_eratt() routine to kick off flushing outstanding I/O. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20230301231626.9621-8-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2023-03-09 21:21:45 -05:00
Justin Tee	191b5a3877	scsi: lpfc: Copyright updates for 14.2.0.10 patches Update copyrights to 2023 for files modified in the 14.2.0.10 patch set. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2023-01-12 00:03:15 -05:00
Justin Tee	96fb8c34e5	scsi: lpfc: Introduce new attention types for lpfc_sli4_async_fc_evt() handler Define new FC Link ACQE with new attention types 0x8 (Link Activation Failure) and 0x9 (Link Reset Protocol Event). Both attention types are meant to be informational-only type ACQEs with no action required. 0x8 is reported for diagnostic purposes, while 0x9 is posted during a normal link up transition when activating BB Credit Recovery feature. As such, modify lpfc_sli4_async_fc_evt() logic to log the attention types according to its severity and early return when informational-only attention types are encountered. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2023-01-12 00:03:15 -05:00
Justin Tee	d99af587d5	scsi: lpfc: Fix MI capability display in cmf_info sysfs attribute The dynamic mi_ver value holds the currently configured MI setting. mi_ver was being displayed as part of the cmf_info sysfs attribute, when the output string meant to display MI capabilities instead. Add a mi_cap member in the lpfc_pc_sli4_params structure that will store MI capabilities during initialization so that cmf_info prints out capabilities instead of current configuration. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20221116011921.105995-4-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-11-17 18:18:42 +00:00
James Smart	a4de8356b6	scsi: lpfc: Fix various issues reported by tools This patch fixes the following Smatch-reported issues: 1. lpfc_hbadisc.c:3020 lpfc_mbx_cmpl_fcf_rr_read_fcf_rec() error: uninitialized symbol 'vlan_id'. 2. lpfc_hbadisc.c:3121 lpfc_mbx_cmpl_read_fcf_rec() error: uninitialized symbol 'vlan_id'. 3. lpfc_init.c:335 lpfc_dump_wakeup_param_cmpl() warn: always true condition '(prg->dist < 4) => (0-3 < 4)' 4. lpfc_init.c:2419 lpfc_parse_vpd() warn: inconsistent indenting. 5. lpfc_init.c:13248 lpfc_sli4_enable_msi() warn: 'phba->pcidev->irq' 2147483648 can't fit into 65535 'eqhdl->irq' 6. lpfc_debugfs.c:5300 lpfc_idiag_extacc_avail_get() error: uninitialized symbol 'ext_cnt' 7. lpfc_debugfs.c:5300 lpfc_idiag_extacc_avail_get() error: uninitialized symbol 'ext_size' 8. lpfc_vmid.c:248 lpfc_vmid_get_appid() warn: sleeping in atomic context. 9. lpfc_init.c:8342 lpfc_sli4_driver_resource_setup() warn: missing error code 'rc'. 10. lpfc_init.c:13573 lpfc_sli4_hba_unset() warn: variable dereferenced before check 'phba->pport' (see line 13546) 11. lpfc_auth.c:1923 lpfc_auth_handle_dhchap_reply() error: double free of 'hash_value' Fixes: 1. Initialize vlan_id to LPFC_FCOE_NULL_VID. 2. Initialize vlan_id to LPFC_FCOE_NULL_VID. 3. prg->dist is a 2 bit field. Its value can only be between 0-3. Remove redundant check 'if (prg->dist < 4)'. 4. Fix inconsistent indenting. Moved logic into helper function lpfc_fill_vpd(). 5. Define 'eqhdl->irq' as int value as pci_irq_vector() returns int. Also, check for return value of pci_irq_vector() and log message in case of failure. 6. Initialize 'ext_cnt' to 0. 7. Initialize 'ext_size' to 0. 8. Use alloc_percpu_gfp() with GFP_ATOMIC flag. 9. 'rc' was not updated when dma_pool_create() fails. Update 'rc = -ENOMEM' when dma_pool_create() fails before calling goto statement. 10. Add check for 'phba->pport' in lpfc_cpuhp_remove(). 11. Initialize 'hash_value' to NULL, the same as the 'aug_chal' variable. 
Link: https://lore.kernel.org/r/20220911221505.117655-13-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-09-15 22:18:28 -04:00
James Smart	1b6f71f7fc	scsi: lpfc: Change FA-PWWN detection methodology Do not rely on vendor version field of the CSPs to determine if we are in a FA-PWWN environment. Instead, use the following procedure: First, during HBA initialization, driver does a READ_CONFIG to determine if FA-PWWN is configured on the HBA. A LPFC_FAWWPN_CONFIG hba_flag is set accordingly. Next, when the link comes up before the driver gets a link up event, the firmware logs into the fabric with FA-PWWN. If the fabric port does not support FA-PWWN, the driver will get a Misconfigured FA-WWN async event before the link up. A LPFC_FAWWPN_FABRIC hba_flag will be set accordingly. Finally, if the fabric supports FA-PWWN, the firmware will replace its CSPs WWN with the Fabric Assigned ones. Then after link up, the driver will retrieve the Fabric Assigned WWN when it does a READ_SPARAM mbox command. Link: https://lore.kernel.org/r/20220412222008.126521-23-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-04-18 22:48:47 -04:00
James Smart	f45775bf56	scsi: lpfc: Copyright updates for 14.2.0.0 patches Update copyrights to 2022 for files modified in the 14.2.0.0 patch set. Link: https://lore.kernel.org/r/20220225022308.16486-18-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-03-15 13:51:50 -04:00
James Smart	25ac2c970b	scsi: lpfc: Fix EEH support for NVMe I/O Injecting errors on the PCI slot while the driver is handling NVMe I/O will cause crashes and hangs. There are several rather difficult scenarios occurring. The main issue is that the adapter can report a PCI error before or simultaneously with the PCI subsystem reporting the error. Both paths have different entry points and currently there is no interlock between them. Thus multiple teardown paths are competing and all heck breaks loose. Complicating things is the NVMe path. To a large degree, I/O was able to be shutdown for a full FC port on the SCSI stack. But on NVMe, there isn't a similar call. At best, it works on a per-controller basis, but even at the controller level, it's a controller "reset" call. All of which means I/O is still flowing on different CPUs with reset paths expecting hw access (mailbox commands) to execute properly. The following modifications are made: - A new flag is set in PCI error entrypoints so the driver can track being called by that path. - An interlock is added in the SLI hw error path and the PCI error path such that only one of the paths proceeds with the teardown logic. - RPI cleanup is patched such that RPIs are marked unregistered w/o mbx cmds in cases of hw error. - If entering the SLI port re-init calls, a case where SLI error teardown was quick and beat the PCI calls now reporting error, check whether the SLI port is still live on the PCI bus. - In the PCI reset code to bring the adapter back, recheck the IRQ settings. Different checks for SLI3 vs SLI4. - In I/O completions, that may be called as part of the cleanup or underway just before the hw error, check the state of the adapter. If in error, shortcut handling that would expect further adapter completions as the hw error won't be sending them. 
- In routines waiting on I/O completions, which may have been in progress prior to the hw error, detect the device is being torn down and abort from their waits and just give up. This points to a larger issue in the driver on ref-counting for data structures, as it doesn't have ref-counting on q and port structures. We'll do this fix for now as it would be a major rework to be done differently. - Fix the NVMe cleanup to simulate NVMe I/O completions if I/O is being failed back due to hw error. - In I/O buf allocation, done at the start of new I/Os, check hw state and fail if hw error. Link: https://lore.kernel.org/r/20210910233159.115896-10-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-09-14 23:33:21 -04:00
James Smart	02243836ad	scsi: lpfc: Add support for the CM framework Complete the enablement of the cm framework feature in the adapter. Perform the following: - Detect the presence of the congestion management framework feature. When the cm framework is present: - Issue the SET_FEATURE command to enable the feature. - Register the cm statistics buffer with the adapter. - Read the cm enablement buffer to determine the cm framework state for cm management. When cm management is enabled: - Monitor all FPIN and congestion signalling events, incrementing counters. - Regularly sync with the adapter to communicate congestion events and to receive an rx request limit. - Monitor requests for rx data and ensure that no more than the adapter prescribed limit is issued on the link. If the limit is exceeded, SCSI and/or NVMe traffic is temporarily suspended. - Maintain the minute, hourly, daily statistics buffer. - Monitor for congestion enablement change events, causing a reread of the enablement buffer and acting on any change in enablement. And: - Add teardown logic, including buffer deregistration, on adapter detachment or reset. Link: https://lore.kernel.org/r/20210816162901.121235-10-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-08-24 22:56:34 -04:00
James Smart	f2af8ffc63	scsi: lpfc: Copyright updates for 12.8.0.11 patches Update copyrights for files modified by the 12.8.0.11 patch set. Link: https://lore.kernel.org/r/20210707184351.67872-21-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-07-18 22:30:38 -04:00
James Smart	16a93e83c8	scsi: lpfc: Improve firmware download logging Define additional status fields in mailbox commands to help provide additional information when downloading new firmware. Link: https://lore.kernel.org/r/20210707184351.67872-4-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-07-18 22:30:35 -04:00
James Smart	8aaa7bcf07	scsi: lpfc: Add FDMI Vendor MIB support Created new attribute lpfc_enable_mi, which by default is enabled. Add command definition bits for SLI-4 parameters that recognize whether the adapter has MIB information support and what revision of MIB data. Using the adapter information, register vendor-specific MIB support with FDMI. The registration will be done every link up. During FDMI registration, encountered a couple of errors when reverting to FDMI rev1. Code needed to exist once reverting. Fixed these. Link: https://lore.kernel.org/r/20201020202719.54726-8-james.smart@broadcom.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-10-26 21:42:39 -04:00
James Smart	e7dab164a9	scsi: lpfc: Fix scheduling call while in softirq context in lpfc_unreg_rpi The following call trace was seen during HBA reset testing: BUG: scheduling while atomic: swapper/2/0/0x10000100 ... Call Trace: dump_stack+0x19/0x1b __schedule_bug+0x64/0x72 __schedule+0x782/0x840 __cond_resched+0x26/0x30 _cond_resched+0x3a/0x50 mempool_alloc+0xa0/0x170 lpfc_unreg_rpi+0x151/0x630 [lpfc] lpfc_sli_abts_recover_port+0x171/0x190 [lpfc] lpfc_sli4_abts_err_handler+0xb2/0x1f0 [lpfc] lpfc_sli4_io_xri_aborted+0x256/0x300 [lpfc] lpfc_sli4_sp_handle_abort_xri_wcqe.isra.51+0xa3/0x190 [lpfc] lpfc_sli4_fp_handle_cqe+0x89/0x4d0 [lpfc] __lpfc_sli4_process_cq+0xdb/0x2e0 [lpfc] __lpfc_sli4_hba_process_cq+0x41/0x100 [lpfc] lpfc_cq_poll_hdler+0x1a/0x30 [lpfc] irq_poll_softirq+0xc7/0x100 __do_softirq+0xf5/0x280 call_softirq+0x1c/0x30 do_softirq+0x65/0xa0 irq_exit+0x105/0x110 do_IRQ+0x56/0xf0 common_interrupt+0x16a/0x16a With the conversion to blk_io_poll for better interrupt latency in normal cases, it introduced this code path, executed when I/O aborts or logouts are seen, which attempts to allocate memory for a mailbox command to be issued. The allocation is GFP_KERNEL, thus it could attempt to sleep. Fix by creating a work element that performs the event handling for the remote port. This will have the mailbox commands and other items performed in the work element, not the irq. A much better method as the "irq" routine does not stall while performing all this deep handling code. Ensure that allocation failures are handled and send LOGO on failure. Additionally, enlarge the mailbox memory pool to reduce the possibility of additional allocation in this path. 
Link: https://lore.kernel.org/r/20201020202719.54726-3-james.smart@broadcom.com Fixes: `317aeb83c9` ("scsi: lpfc: Add blk_io_poll support for latency improvment") Cc: <stable@vger.kernel.org> # v5.9+ Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-10-26 21:42:38 -04:00
Dick Kennedy	317aeb83c9	scsi: lpfc: Add blk_io_poll support for latency improvment Although the existing implementation is very good at high I/O load, on tests involving light load, especially on only a few hardware queues, latency was a little higher than it can be due to using workqueue scheduling. Other tasks in the system can delay handling. Change the lower level to use irq_poll by default which uses a softirq for I/O completion. This gives better latency as variance in when the cq is processed is reduced over the workqueue interface. However, as high load is better served by not being in softirq when the CPU is loaded, work queues are still used under high I/O load. Link: https://lore.kernel.org/r/20200630215001.70793-13-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-07-02 23:06:42 -04:00
Dick Kennedy	3048e3e805	scsi: lpfc: Change default queue allocation for reduced memory consumption By default, the driver attempts to allocate a hdwq per logical cpu in order to provide good cpu affinity. Some systems have extremely high cpu counts and this can significantly raise memory consumption. In testing on x86 platforms (non-AMD) it is found that sharing of a hdwq by a physical cpu and its HT cpu can occur with little performance degradation. By sharing, the hdwq count can be halved, significantly reducing the memory overhead. Change the default behavior of the driver on non-AMD x86 platforms to share a hdwq by the cpu and its HT cpu. Link: https://lore.kernel.org/r/20200501214310.91713-6-jsmart2021@gmail.com Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-05-07 22:47:24 -04:00
James Smart	840eda9602	scsi: lpfc: Fix erroneous cpu limit of 128 on I/O statistics The cpu io statistics were capped by a hard define limit of 128. This effectively was a max number of CPUs, not an actual CPU count, nor actual CPU numbers which can be even larger than both of those values. This made stats off/misleading and on large CPU count systems, wrong. Fix the stats so that all CPUs can have a stats struct. Fix the looping such that it loops by hdwq, finds CPUs that used the hdwq, and sum the stats, then display. Link: https://lore.kernel.org/r/20200322181304.37655-9-jsmart2021@gmail.com Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-03-29 18:10:48 -04:00
James Smart	dcaa213679	scsi: lpfc: Change default IRQ model on AMD architectures The current driver attempts to allocate an interrupt vector per cpu using the system's managed IRQ allocator (flag PCI_IRQ_AFFINITY). The system IRQ allocator will either provide the per-cpu vector, or return fewer vectors. When fewer vectors are returned, they are evenly spread between the numa nodes on the system. When run on an AMD architecture, if interrupts occur to a cpu that is not in the same numa node as the adapter generating the interrupt, there are extreme costs and overheads in performance. Thus, if 1:1 vector allocation is used, or the "balanced" vectors in the other numa nodes, performance can be hit significantly. A much more performant model is to allocate interrupts only on the cpus that are in the numa node where the adapter resides. I/O completion is still performed by the cpu where the I/O was generated. Unfortunately, there is no flag to request the managed IRQ subsystem allocate vectors only for the CPUs in the same numa node as the adapter. On AMD architecture, revert the irq allocation to the normal style (non-managed) and then use irq_set_affinity_hint() to set the cpu affinity and disable user-space rebalancing. Tie the support into CPU offline/online. If the cpu being offlined owns a vector, the vector is re-affinitized to one of the other CPUs on the same numa node. If there are no more CPUs on the numa node, the vector has all affinity removed and lets the system determine where it's serviced. Similarly, when the cpu that owned a vector comes online, the vector is reaffinitized to the cpu. Link: https://lore.kernel.org/r/20191105005708.7399-10-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-11-06 00:04:04 -05:00
James Smart	93a4d6f401	scsi: lpfc: Add registration for CPU Offline/Online events The recent affinitization didn't address cpu offlining/onlining. If an interrupt vector is shared and the low order cpu owning the vector is offlined, as interrupts are managed, the vector is taken offline. This causes the other CPUs sharing the vector to hang, as they can't get io completions. Correct by registering callbacks with the system for Offline/Online events. When a cpu is taken offline, its eq, which is tied to an interrupt vector, is found. If the cpu is the "owner" of the vector and if the eq/vector is shared by other CPUs, the eq is placed into a polled mode. Additionally, code paths that perform io submission on the "sharing CPUs" will check the eq state and poll for completion after submission of new io to a wq that uses the eq. Similarly, when a cpu comes back online and owns an offlined vector, the eq is taken out of polled mode and rearmed to start driving interrupts for eq. Link: https://lore.kernel.org/r/20191105005708.7399-9-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-11-06 00:04:04 -05:00
James Smart	83c6cb1ae8	scsi: lpfc: Add FC-AL support to lpe32000 models In the past, the lpe32000 models, based their main support being for 32G, and as FC-AL is not supported in the FC standards past 8G, did not support FC-AL operation. This patch adds private-loop FC-AL support for the LPE32000 adapters when a link is 8G or below. To avoid conditions where link rate may change, which would cause non-connectivity to the AL device, FC-AL mode must become a persistent setting and the link kept at a speed supporting FC-AL. The patch: - Adds a pls attribute indicating whether the adapter properly supports FC-AL. - Adds support for the adapter to indicate that topology should be fixed and the topology types to be configured. - Adds a pt attribute to report the persistent topology if present. Link: https://lore.kernel.org/r/20191018211832.7917-15-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-10-24 21:02:06 -04:00
James Smart	8156d378c4	scsi: lpfc: Revise interrupt coalescing for missing scenarios The existing "auto eq delay" mechanism was sometimes skipping over an EQ, not ramping the coalescing down under light load fast enough, and in other cases never kicked in as cpu sharing by multiple vectors didn't quite add up right. Tweak the interrupt mechanism such that: - Add a flag to the EQ to force checking for coalescing values when being serviced in the interrupt handler. The flag will be set by any CQ bound to the EQ whenever the number of CQ elements processed in a single scan meets or exceeds the hardware queue notify level. E.g. there's a significant number of completions happening. - In the heartbeat work item that checks coalescing: - Replace the structure that was counting the number of EQs that interrupted on a single cpu with a new structure that looks at the EQ to see whether EQ currently has a coalescing value (thus it should be re-evaluated) or was marked by the new flag indicating heavy completions. - When a cpu, which may be servicing multiple vectors, had at least 1 EQ that should be checked, a new coalescing delay is calculated based on the number of interrupts that occurred on the cpu. - The new coalescing value is then applied to the EQs that had interrupted on the cpu. Link: https://lore.kernel.org/r/20191018211832.7917-11-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-10-24 21:02:05 -04:00
Linus Torvalds	10fd71780f	SCSI misc on 20190919 This is mostly update of the usual drivers: qla2xxx, ufs, smartpqi, lpfc, hisi_sas, qedf, mpt3sas; plus a whole load of minor updates. The only core change this time around is the addition of request batching for virtio. Since batching requires an additional flag to use, it should be invisible to the rest of the drivers. Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com> Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI updates from James Bottomley: "This is mostly update of the usual drivers: qla2xxx, ufs, smartpqi, lpfc, hisi_sas, qedf, mpt3sas; plus a whole load of minor updates. The only core change this time around is the addition of request batching for virtio. 
Since batching requires an additional flag to use, it should be invisible to the rest of the drivers" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (264 commits) scsi: hisi_sas: Fix the conflict between device gone and host reset scsi: hisi_sas: Add BIST support for phy loopback scsi: hisi_sas: Add hisi_sas_debugfs_alloc() to centralise allocation scsi: hisi_sas: Remove some unused function arguments scsi: hisi_sas: Remove redundant work declaration scsi: hisi_sas: Remove hisi_sas_hw.slot_complete scsi: hisi_sas: Assign NCQ tag for all NCQ commands scsi: hisi_sas: Update all the registers after suspend and resume scsi: hisi_sas: Retry 3 times TMF IO for SAS disks when init device scsi: hisi_sas: Remove sleep after issue phy reset if sas_smp_phy_control() fails scsi: hisi_sas: Directly return when running I_T_nexus reset if phy disabled scsi: hisi_sas: Use true/false as input parameter of sas_phy_reset() scsi: hisi_sas: add debugfs auto-trigger for internal abort time out scsi: virtio_scsi: unplug LUNs when events missed scsi: scsi_dh_rdac: zero cdb in send_mode_select() scsi: fcoe: fix null-ptr-deref Read in fc_release_transport scsi: ufs-hisi: use devm_platform_ioremap_resource() to simplify code scsi: ufshcd: use devm_platform_ioremap_resource() to simplify code scsi: hisi_sas: use devm_platform_ioremap_resource() to simplify code scsi: ufs: Use kmemdup in ufshcd_read_string_desc() ...	2019-09-21 10:50:15 -07:00
James Smart	0622800d2e	scsi: lpfc: Raise config max for lpfc_fcp_mq_threshold variable Raise the config max for lpfc_fcp_mq_threshold variable to 256. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> CC: Hannes Reinecke <hare@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-08-29 18:12:27 -04:00
James Smart	7f9989bace	scsi: lpfc: Resolve checker warning for lpfc_new_io_buf() Per Dan Carpenter: The patch d79c9e9d4b3d: "scsi: lpfc: Support dynamic unbounded SGL lists on G7 hardware." from Aug 14, 2019, leads to the following static checker warning: drivers/scsi/lpfc/lpfc_init.c:4107 lpfc_new_io_buf() error: not allocating enough data 784 vs 768 There was no need to compare sizes nor to allocate size based on a define. Change allocation to use actual structure length Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> CC: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-08-29 18:07:59 -04:00
James Smart	c00f62e6c5	scsi: lpfc: Merge per-protocol WQ/CQ pairs into single per-cpu pair Currently, each hardware queue, typically allocated per-cpu, consists of a WQ/CQ pair per protocol. Meaning if both SCSI and NVMe are supported 2 WQ/CQ pairs will exist for the hardware queue. Separate queues are unnecessary. The current implementation wastes memory backing the 2nd set of queues, and the use of double the SLI-4 WQ/CQ's means less hardware queues can be supported which means there may not always be enough to have a pair per cpu. If there is only 1 pair per cpu, more cpu's may get their own WQ/CQ. Rework the implementation to use a single WQ/CQ pair by both protocols. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-08-19 22:41:12 -04:00
James Smart	d79c9e9d4b	scsi: lpfc: Support dynamic unbounded SGL lists on G7 hardware. Typical SLI-4 hardware supports up to 2 4KB pages to be registered per XRI to contain the exchange's Scatter/Gather List. This caps the number of SGL elements that can be in the SGL. There are no extensions to extend the list out of the 2 pages. The G7 hardware adds a SGE type that allows the SGL to be vectored to a different scatter/gather list segment. And that segment can contain a SGE to go to another segment and so on. The initial segment must still be pre-registered for the XRI, but it can be a much smaller amount (256Bytes) as it can now be dynamically grown. This much smaller allocation can handle the SG list for most normal I/O, and the dynamic aspect allows it to support many MB's if needed. The implementation creates a pool which contains "segments" and which is initially sized to hold the initial small segment per xri. If an I/O requires additional segments, they are allocated from the pool. If the pool has no more segments, the pool is grown based on what is now needed. After the I/O completes, the additional segments are returned to the pool for use by other I/Os. Once allocated, the additional segments are not released under the assumption of "if needed once, it will be needed again". Pools are kept on a per-hardware queue basis, which is typically 1:1 per cpu, but may be shared by multiple cpus. The switch to the smaller initial allocation significantly reduces the memory footprint of the driver (which only grows if large ios are issued). Based on the several K of XRIs for the adapter, the 8KB->256B reduction can conserve 32MBs or more. It has been observed with per-cpu resource pools that allocating a resource on CPU A, may be put back on CPU B. While the get routines are distributed evenly, only a limited subset of CPUs may be handling the put routines. 
This can put a strain on the lpfc_put_cmd_rsp_buf_per_cpu routine because all the resources are being put on a limited subset of CPUs. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-08-19 22:41:12 -04:00
James Smart	77ffd3465b	scsi: lpfc: Mitigate high memory pre-allocation by SCSI-MQ When SCSI-MQ is enabled, the SCSI-MQ layers will do pre-allocation of MQ resources based on shost values set by the driver. In newer cases of the driver, which attempts to set nr_hw_queues to the cpu count, the multipliers become excessive, with a single shost having SCSI-MQ pre-allocation reaching into the multiple GBytes range. NPIV, which creates additional shosts, only multiplies this overhead. On lower-memory systems, this can exhaust system memory very quickly, resulting in a system crash or failures in the driver or elsewhere due to low memory conditions. After testing several scenarios, the situation can be mitigated by limiting the value set in shost->nr_hw_queues to 4. Although the shost values were changed, the driver still had per-cpu hardware queues of its own that allowed parallelization per-cpu. Testing revealed that even with the smallish number for nr_hw_queues for SCSI-MQ, performance levels remained near maximum with the within-driver affinitization. A module parameter was created to allow the value set for the nr_hw_queues to be tunable. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-08-19 22:14:10 -04:00
James Smart	657add4e5e	scsi: lpfc: Fix poor use of hardware queues if fewer irq vectors While fixing the resources per socket, realized the driver was not using hardware queues (up to 1 per cpu) if there were fewer interrupt vectors. The driver was only using the hardware queue assigned to the cpu with the vector. Rework the affinity map check to use the additional hardware queue elements that had been allocated. If the cpu count exceeds the hardware queue count - share, but choose what is shared with by: hyperthread peer, core peer, socket peer, or finally similar cpu in a different socket. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-06-18 19:46:22 -04:00
James Smart	d9954a2d18	scsi: lpfc: Fix oops when driver is loaded with 1 interrupt vector The driver was coded expecting enough hardware queues and interrupt vectors such that at least there was one per socket. In the case where there were fewer than sockets, cpus were left unassigned thus null pointers. Rework the affinity mappings. Map settings for the cpu's that are in the irq cpu mask. For each cpu not in the mask, map to another cpu that does have a mask. Choice of the "other" cpu will attempt to map to the same cpu but differing hyperthread, or cpu within the same core, or cpu within same socket, or finally cpu in the base socket. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-06-18 19:46:22 -04:00
James Smart	c15e07047e	scsi: lpfc: Rework misleading nvme not supported in firmware message The driver unconditionally says fw doesn't support nvme when in truth it was a driver parameter setting that disabled nvme support. Rework the code validating nvme support to accurately report what condition is disabling nvme support. Save state on whether fw supports nvme in case sysfs attributes change dynamically. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-06-18 19:46:21 -04:00
James Smart	79d8c4ce01	scsi: lpfc: Fix nvmet handling of received ABTS for unmapped frames The driver currently is relying on firmware to match ABTSs to existing exchanges. This works fine as long as an exchange has been assigned to the io and work posted to it. However, for unmapped frames (rxid=0xFFFF), the driver has yet to assign an xri. The driver was blindly saying it couldn't match the ABTS and sending the BA_xxx. However, the command frame may have been in queues waiting on xri's before posting to the nvmet_fc layer. When xri's became available, the command frame would still be pushed to the transport and that io would execute, even though the io had been killed by ABTS. The initiator, seeing the io ABTS'd, would reuse the exchange for a different io which would be received on the target and pushed up. If the "zombie" io then came back down and started transmitting, the initiator would match the oxid and accept erroneous data. Bad things happened. Add tracking of active exchanges in the target to allow matching of a received ABTS against active or pending IO requests. If the ABTS is matched to a pending or active IO, the driver initiates cleanup and conditionally notifies the transport. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-06-18 19:46:21 -04:00
James Smart	d74a89aab9	scsi: lpfc: Separate CQ processing for nvmet_fc upcalls Currently the driver is notified of new command frame receipt by CQEs. As part of the CQE processing, the driver upcalls the nvmet_fc transport to deliver the command. nvmet_fc, as part of receiving the command builds out a context for it, where one of the first steps is to allocate memory for the io. When running with tests that do large ios (1MB), it was found on some systems, the total number of outstanding I/O's, at 1MB per, completely consumed the system's memory. Thus additional ios were getting blocked in the memory allocator. Given that this blocked the lpfc thread processing CQEs, there were lots of other commands that were received and which are then held up, and given CQEs are serially processed, the aggregate delays for an IO waiting behind the others became cumulative - enough so that the initiator hit timeouts for the ios. The basic fix is to avoid the direct upcall and instead schedule a work item for each io as it is received. This allows the cq processing to complete very quickly, and each io can then run or block on its own. However, this general solution hurts latency when there are few ios. As such, implemented the fix such that the driver watches how many CQEs it has processed sequentially in one run. As long as the count is below a threshold, the direct nvmet_fc upcall will be made. Only when the count is exceeded will it revert to work scheduling. Given that debug of this showed a surprisingly long delay in cq processing, the io timer stats were updated to better reflect the processing of the different points. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-06-18 19:46:21 -04:00
James Smart	92f3b32718	scsi: lpfc: Fixup eq_clr_intr references Declaring interrupt clear routines as inline is bogus as they are used as an indirect pointer. Remove the inline references. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-03-20 20:03:47 -04:00
James Bottomley	c88725dd14	scsi: lpfc: Fix build error You can't declare a function inline in a header if it doesn't have a body available to the compiler. So realistically you either don't declare it inline or you make it a static inline in the header. I think the latter applies in this case, so this should be the fix Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Acked-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-03-20 20:03:47 -04:00
James Smart	c1a21ebc0f	scsi: lpfc: Specify node affinity for queue memory allocation Change the SLI4 queue creation code to use NUMA node based memory allocation based on the cpu the queues will be related to. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-03-19 13:15:09 -04:00
James Smart	9afbee3d62	scsi: lpfc: Reduce memory footprint for lpfc_queue Currently the driver maintains a sideband structure which has a pointer for each queue element. However, at 8 bytes per pointer, and up to 4k elements per queue, and 100s of queues, this can take up a lot of memory. Convert the driver to using an access routine that calculates the element address based on its index rather than using the pointer table. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-03-19 13:15:09 -04:00
James Smart	9a66d990c7	scsi: lpfc: Add loopback testing to trunking mode When in trunking mode, the adapter can be placed into diagnostic mode and each link in the trunk tested via loopback. Add support to the driver to perform per-link loopback testing when in trunking mode. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-03-19 13:15:09 -04:00
James Smart	f3339800f9	scsi: lpfc: Fix link speed reporting for 4-link trunk Driver is using uint16_t and is encountering an overflow of the 16bits when calculating link speed. Fix by using a u32 type. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-03-19 13:15:09 -04:00
James Smart	0d041215f0	scsi: lpfc: Update 12.2.0.0 file copyrights to 2019 For files modified as part of 12.2.0.0 patches, update copyright to 2019 Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-02-05 22:29:50 -05:00
James Smart	222e9239c6	scsi: lpfc: Resize cpu maps structures based on possible cpus The work done to date utilized the number of present cpus when sizing per-cpu structures. Structures should have been sized based on the max possible cpu count. Convert the driver over to possible cpu count for sizing allocation. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-02-05 22:29:50 -05:00
James Smart	32517fc097	scsi: lpfc: Rework EQ/CQ processing to address interrupt coalescing When driving high iop counts, auto_imax coalescing kicks in and drives the performance to extremely small iops levels. There are two issues: 1) auto_imax is enabled by default. The auto algorithm, when iops gets high, divides the iops by the hdwq count and uses that value to calculate EQ_Delay. The EQ_Delay is set uniformly on all EQs whether they have load or not. The EQ_delay is only manipulated every 5s (a long time). Thus there were large 5s swings of no interrupt delay followed by large/maximum delay, before repeating. 2) When processing a CQ, the driver got mixed up on the rate of when to ring the doorbell to keep the chip apprised of the eqe or cqe consumption as well as how long to sit in the thread and process queue entries. Currently, the driver capped its work at 64 entries (very small) and exited/rearmed the CQ. Thus, on heavy loads, additional overheads were taken to exit and re-enter the interrupt handler. Worse, if in the large/maximum coalescing windows, it could be a while before getting back to servicing. The issues are corrected by the following: - A change in defaults. Auto_imax is turned OFF and fcp_imax is set to 0. Thus all interrupts are immediate. - Cleanup of field names and their meanings. Existing names were non-intuitive or used for duplicate things. - Added max_proc_limit field, to control the length of time the handlers would service completions. - Reworked EQ handling: Added common routine that walks eq, applying notify interval and max processing limits. Use queue_claimed to claim ownership of the queue while processing. Always rearm the queue whenever the common routine is called. Rework queue element processing, namely to eliminate hba_index vs host_index. Only one index is necessary. The queue entry can be marked invalid and the host_index updated immediately after eqe processing.
After rework, xx_release routines are now DB write functions. Renamed the routines as such. Moved lpfc_sli4_eq_flush(), which does similar action, to same area. Replaced the 2 individual loops that walk an eq with a call to the common routine. Slightly revised lpfc_sli4_hba_handle_eqe() calling syntax. Added per-cpu counters to detect interrupt rates and scale interrupt coalescing values. - Reworked CQ handling: Added common routine that walks cq, applying notify interval and max processing limits. Use queue_claimed to claim ownership of the queue while processing. Always rearm the queue whenever the common routine is called. Rework queue element processing, namely to eliminate hba_index vs host_index. Only one index is necessary. The queue entry can be marked invalid and the host_index updated immediately after cqe processing. After rework, xx_release routines are now DB write functions. Renamed the routines as such. Replaced the 3 individual loops that walk a cq with a call to the common routine. Redefined lpfc_sli4_sp_handle_mcqe() to common handler definition with queue reference. Add increment for mbox completion to handler. - Added a new module/sysfs attribute: lpfc_cq_max_proc_limit To allow dynamic changing of the CQ max_proc_limit value being used. Although this leaves an EQ as an immediate interrupt, that interrupt will only occur if a CQ bound to it is in an armed state and has cqe's to process. By staying in the cq processing routine longer, high loads will avoid generating more interrupts as they will only rearm as the processing thread exits. The immediate interrupt is also beneficial to idle or lower-processing CQ's as they get serviced immediately without being penalized by sharing an EQ with a more loaded CQ. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-02-05 22:29:49 -05:00
James Smart	cb733e3587	scsi: lpfc: cleanup: convert eq_delay to usdelay Review of the eq coalescing logic showed the code was a bit fragmented. Sometimes it would save/set via an interrupt max value, while in others it would do so via a usdelay. There were also two places changing eq delay, one place that issued mailbox commands, and another that changed via register writes if supported. Clean this up by: - Standardizing the operation of lpfc_modify_hba_eq_delay() routine so that it is always told of a us delay to impose. The routine then chooses the best way to set that - via register or via mbx. - Rather than two value types stored in eq->q_mode (usdelay if change via register, imax if change via mbox) - q_mode always contains usdelay. Before any value change, old vs new value is compared and only if different is a change done. - Revised the dmult calculation. dmult is not set based on overall imax divided by hardware queues - instead imax applies to a single cpu and the value will be replicated to all cpus. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-02-05 22:29:49 -05:00
James Smart	6a828b0f61	scsi: lpfc: Support non-uniform allocation of MSIX vectors to hardware queues So far MSIX vector allocation assumed it would be 1:1 with hardware queues. However, there are several reasons why fewer MSIX vectors may be allocated than hardware queues such as the platform being out of vectors or adapter limits being less than cpu count. This patch reworks the MSIX/EQ relationships with the per-cpu hardware queues so they can function independently. MSIX vectors will be equitably split between cpu sockets/cores and then the per-cpu hardware queues will be mapped to the vectors most efficient for them. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-02-05 22:29:49 -05:00
James Smart	b3295c2a75	scsi: lpfc: Fix setting affinity hints to correlate with hardware queues The desired affinity for the hardware queue behavior is for hdwq 0 to be affinitized with cpu 0, hdwq 1 to cpu 1, and so on. The implementation so far does not do this if the number of cpus is greater than the number of hardware queues (e.g. hardware queue allocation was administratively reduced or hardware queue resources could not scale to the cpu count). Correct the queue affinitization logic when queue count is less than cpu count. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-02-05 22:29:09 -05:00
James Smart	c490850a09	scsi: lpfc: Adapt partitioned XRI lists to efficient sharing The XRI get/put lists were partitioned per hardware queue. However, the adapter rarely had sufficient resources to give a large number of resources per queue. As such, it became common for a cpu to encounter a lack of XRI resource and request the upper io stack to retry after returning a BUSY condition. This occurred even though other cpus were idle and not using their resources. Create as efficient a scheme as possible to move resources to the cpus that need them. Each cpu maintains a small private pool which it allocates from for io. There is a watermark that the cpu attempts to keep in the private pool. The private pool, when empty, pulls from a global pool from the cpu. When the cpu's global pool is empty it will pull from other cpu's global pool. As there are many cpu global pools (1 per cpu or hardware queue count) and as each cpu selects what cpu to pull from at different rates and at different times, it creates a randomizing effect that minimizes the number of cpu's that will contend with each other when they steal XRI's from another cpu's global pool. On io completion, a cpu will push the XRI back on to its private pool. A watermark level is maintained for the private pool such that when it is exceeded it will move XRI's to the CPU global pool so that other cpu's may allocate them. On NVME, as heartbeat commands are critical to get placed on the wire, a single expedite pool is maintained. When a heartbeat is to be sent, it will allocate an XRI from the expedite pool rather than the normal cpu private/global pools. On any io completion, if a reduction in the expedite pools is seen, it will be replenished before the XRI is placed on the cpu private pool. Statistics are added to aid understanding the XRI levels on each cpu and their behaviors.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-02-05 22:29:09 -05:00
James Smart	4c47efc140	scsi: lpfc: Move SCSI and NVME Stats to hardware queue structures Many io statistics were being sampled and saved using adapter-based data structures. This was creating a lot of contention and cache thrashing in the I/O path. Move the statistics to the hardware queue data structures. Given the per-queue data structures, use of atomic types is lessened. Add new sysfs and debugfs stat routines to collate the per hardware queue values and report at an adapter level. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-02-05 22:29:08 -05:00
James Smart	63df6d637e	scsi: lpfc: Adapt cpucheck debugfs logic to Hardware Queues Similar to the io execution path that reports cpu context information, the debugfs routines for cpu information need to be aligned with new hardware queue implementation. Convert debugfs and nvme cpucheck statistics to report information per Hardware Queue. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-02-05 22:28:11 -05:00
James Smart	5e5b511d8b	scsi: lpfc: Partition XRI buffer list across Hardware Queues Once the IO buff allocations were made shared, there was a single XRI buffer list shared by all hardware queues. A single list isn't great for performance when shared across the per-cpu hardware queues. Create a separate XRI IO buffer get/put list for each Hardware Queue. As SGLs and associated IO buffers get allocated/posted to the firmware; round robin their assignment across all available hardware Queues so that there is an equitable assignment. Modify SCSI and NVME IO submit code paths to use the Hardware Queue logic for XRI allocation. Add a debugfs interface to display hardware queue statistics Added new empty_io_bufs counter to track if a cpu runs out of XRIs. Replace common_ variables/names with io_ to make meanings clearer. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-02-05 22:24:22 -05:00
James Smart	cdb42becdd	scsi: lpfc: Replace io_channels for nvme and fcp with general hdw_queues per cpu Currently, both nvme and fcp each have their own concept of an io_channel, which is a combination wq/cq and associated msix. Different cpus would share an io_channel. The driver is now moving to per-cpu wq/cq pairs and msix vectors. The driver will still use separate wq/cq pairs per protocol on each cpu, but the protocols will share the msix vector. Given the elimination of the nvme and fcp io channels, the module parameters will be removed. A new parameter, lpfc_hdw_queue is added which allows the wq/cq pair allocation per cpu to be overridden and allocated to a lesser value. If lpfc_hdw_queue is zero, the number of pairs allocated will be based on the number of cpus. If non-zero, the parameter specifies the number of queues to allocate. At this time, the maximum non-zero value is 64. To manage this new paradigm, a new hardware queue structure is created to track queue activity and relationships. As MSIX vector allocation must be known before setting up the relationships, msix allocation now occurs before queue data structures are allocated. If the number of vectors allocated is less than the desired hardware queues, the hardware queue counts will be reduced to the number of vectors. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-02-05 22:22:42 -05:00
James Smart	7370d10ac9	scsi: lpfc: Remove extra vector and SLI4 queue for Expresslane There is an extra queue and msix vector for expresslane. Now that the driver will be doing queues per cpu, this oddball queue is no longer needed. Expresslane will utilize the normal per-cpu queues. Updated debugfs sli4 queue output to go along with the change Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2019-02-05 22:22:42 -05:00
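
Commit 9afbee3d62 in the list above describes replacing a per-element pointer table with an access routine that computes each element's address from its index. The following is a minimal sketch of that idea in C, not the driver's code: the struct, its field names, and demo_queue_entry() are hypothetical, and it assumes a queue backed by equally sized pages. At 8 bytes per pointer and up to 4k elements, the table being removed would cost up to 32 KB per queue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical queue descriptor; names are illustrative, not the driver's. */
struct demo_queue {
    uint8_t **pages;            /* one pointer per backing page, not per element */
    uint32_t entry_size;        /* bytes per queue element */
    uint32_t entries_per_page;  /* elements that fit in one backing page */
    uint32_t entry_count;       /* total elements in the queue */
};

/* Compute the address of element idx instead of reading it from a table. */
static void *demo_queue_entry(const struct demo_queue *q, uint32_t idx)
{
    assert(idx < q->entry_count);
    return q->pages[idx / q->entries_per_page] +
           (size_t)q->entry_size * (idx % q->entries_per_page);
}
```

The trade is a divide and a multiply per access in exchange for dropping the sideband table; only the short per-page pointer array remains.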

166 Commits