linux-stable/drivers/scsi/lpfc/lpfc_sli.h

/*******************************************************************
* This file is part of the Emulex Linux Device Driver for *
* Fibre Channel Host Bus Adapters. *
* Copyright (C) 2017-2022 Broadcom. All Rights Reserved. The term *
* Broadcom refers to Broadcom Inc. and/or its subsidiaries. *
* Copyright (C) 2004-2016 Emulex. All rights reserved. *
* EMULEX and SLI are trademarks of Emulex. *
* www.broadcom.com *
* *
* This program is free software; you can redistribute it and/or *
* modify it under the terms of version 2 of the GNU General *
* Public License as published by the Free Software Foundation. *
* This program is distributed in the hope that it will be useful. *
* ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND *
* WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, *
* FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT, ARE *
* DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD *
* TO BE LEGALLY INVALID. See the GNU General Public License for *
* more details, a copy of which can be found in the file COPYING *
* included with this package. *
*******************************************************************/
#if defined(CONFIG_DEBUG_FS) && !defined(CONFIG_SCSI_LPFC_DEBUG_FS)
#define CONFIG_SCSI_LPFC_DEBUG_FS
#endif
/* forward declaration for LPFC_IOCB_t's use */
struct lpfc_hba;
struct lpfc_vport;
/* Define the context types that SLI handles for abort and sums. */
typedef enum _lpfc_ctx_cmd {
LPFC_CTX_LUN,
LPFC_CTX_TGT,
LPFC_CTX_HOST
} lpfc_ctx_cmd;
/* Enumeration to describe the thread lock context. */
enum lpfc_mbox_ctx {
MBOX_THD_UNLOCKED,
MBOX_THD_LOCKED
};
union lpfc_vmid_tag {
uint32_t app_id;
uint8_t cs_ctl_vmid;
struct lpfc_vmid_context *vmid_context; /* UVEM context information */
};
struct lpfc_cq_event {
struct list_head list;
uint16_t hdwq;
union {
struct lpfc_mcqe mcqe_cmpl;
struct lpfc_acqe_link acqe_link;
struct lpfc_acqe_fip acqe_fip;
struct lpfc_acqe_dcbx acqe_dcbx;
struct lpfc_acqe_grp5 acqe_grp5;
struct lpfc_acqe_fc_la acqe_fc;
struct lpfc_acqe_sli acqe_sli;
struct lpfc_rcqe rcqe_cmpl;
struct sli4_wcqe_xri_aborted wcqe_axri;
struct lpfc_wcqe_complete wcqe_cmpl;
} cqe;
};
/* This structure is used to handle IOCB requests / responses */
struct lpfc_iocbq {
/* lpfc_iocbqs are used in doubly linked lists */
struct list_head list;
struct list_head clist;
struct list_head dlist;
uint16_t iotag; /* pre-assigned IO tag */
uint16_t sli4_lxritag; /* logical pre-assigned XRI. */
uint16_t sli4_xritag; /* pre-assigned XRI, (OXID) tag. */
uint16_t hba_wqidx; /* index to HBA work queue */
struct lpfc_cq_event cq_event;
uint64_t isr_timestamp;
union lpfc_wqe128 wqe; /* SLI-4 */
IOCB_t iocb; /* SLI-3 */
struct lpfc_wcqe_complete wcqe_cmpl; /* WQE cmpl */
u32 unsol_rcv_len; /* Receive len in unsol path */
/* Pack the u8's together and make them modulo-4. */
u8 num_bdes; /* Number of BDEs */
u8 abort_bls; /* ABTS by initiator or responder */
u8 abort_rctl; /* ACC or RJT flag */
u8 priority; /* OAS priority */
u8 retry; /* retry counter for IOCB cmd - if needed */
u8 rsvd1; /* Pad for u32 */
u8 rsvd2; /* Pad for u32 */
u8 rsvd3; /* Pad for u32 */
u32 cmd_flag;
#define LPFC_IO_LIBDFC 1 /* libdfc iocb */
#define LPFC_IO_WAKE 2 /* Synchronous I/O completed */
#define LPFC_IO_WAKE_TMO LPFC_IO_WAKE /* Synchronous I/O timed out */
#define LPFC_IO_FCP 4 /* FCP command -- iocbq in scsi_buf */
#define LPFC_DRIVER_ABORTED 8 /* driver aborted this request */
#define LPFC_IO_FABRIC 0x10 /* Iocb send using fabric scheduler */
#define LPFC_DELAY_MEM_FREE 0x20 /* Defer free'ing of FC data */
#define LPFC_EXCHANGE_BUSY 0x40 /* SLI4 hba reported XB in response */
#define LPFC_USE_FCPWQIDX 0x80 /* Submit to specified FCPWQ index */
#define DSS_SECURITY_OP 0x100 /* security IO */
#define LPFC_IO_ON_TXCMPLQ 0x200 /* The IO is still on the TXCMPLQ */
#define LPFC_IO_DIF_PASS 0x400 /* T10 DIF IO pass-thru prot */
#define LPFC_IO_DIF_STRIP 0x800 /* T10 DIF IO strip prot */
#define LPFC_IO_DIF_INSERT 0x1000 /* T10 DIF IO insert prot */
#define LPFC_IO_CMD_OUTSTANDING 0x2000 /* timeout handler abort window */
#define LPFC_FIP_ELS_ID_MASK 0xc000 /* ELS_ID range 0-3, non-shifted mask */
#define LPFC_FIP_ELS_ID_SHIFT 14
#define LPFC_IO_OAS 0x10000 /* OAS FCP IO */
#define LPFC_IO_FOF 0x20000 /* FOF FCP IO */
#define LPFC_IO_LOOPBACK 0x40000 /* Loopback IO */
#define LPFC_PRLI_NVME_REQ 0x80000 /* This is an NVME PRLI. */
#define LPFC_PRLI_FCP_REQ 0x100000 /* This is an FCP PRLI. */
#define LPFC_IO_NVME 0x200000 /* NVME FCP command */
#define LPFC_IO_NVME_LS 0x400000 /* NVME LS command */
#define LPFC_IO_NVMET 0x800000 /* NVMET command */
#define LPFC_IO_VMID 0x1000000 /* VMID tagged IO */
#define LPFC_IO_CMF 0x4000000 /* CMF command */
uint32_t drvrTimeout; /* driver timeout in seconds */
struct lpfc_vport *vport;/* virtual port pointer */
struct lpfc_dmabuf *cmd_dmabuf;
struct lpfc_dmabuf *rsp_dmabuf;
struct lpfc_dmabuf *bpl_dmabuf;
uint32_t event_tag; /* LA Event tag */
union {
wait_queue_head_t *wait_queue;
struct lpfcMboxq *mbox;
struct lpfc_node_rrq *rrq;
struct nvmefc_ls_req *nvme_lsreq;
struct lpfc_async_xchg_ctx *axchg;
struct bsg_job_data *dd_data;
} context_un;
struct lpfc_io_buf *io_buf;
struct lpfc_iocbq *rsp_iocb;
struct lpfc_nodelist *ndlp;
union lpfc_vmid_tag vmid_tag;
void (*fabric_cmd_cmpl)(struct lpfc_hba *phba, struct lpfc_iocbq *cmd,
struct lpfc_iocbq *rsp);
void (*wait_cmd_cmpl)(struct lpfc_hba *phba, struct lpfc_iocbq *cmd,
struct lpfc_iocbq *rsp);
void (*cmd_cmpl)(struct lpfc_hba *phba, struct lpfc_iocbq *cmd,
struct lpfc_iocbq *rsp);
};
#define SLI_IOCB_RET_IOCB 1 /* Return IOCB if cmd ring full */
#define IOCB_SUCCESS 0
#define IOCB_BUSY 1
#define IOCB_ERROR 2
#define IOCB_TIMEDOUT 3
#define IOCB_ABORTED 4
#define IOCB_ABORTING 5
#define IOCB_NORESOURCE 6
#define SLI_WQE_RET_WQE 1 /* Return WQE if cmd ring full */
#define WQE_SUCCESS 0
#define WQE_BUSY 1
#define WQE_ERROR 2
#define WQE_TIMEDOUT 3
#define WQE_ABORTED 4
#define WQE_ABORTING 5
#define WQE_NORESOURCE 6
#define LPFC_MBX_WAKE 1
#define LPFC_MBX_IMED_UNREG 2
typedef struct lpfcMboxq {
/* MBOXQs are queued on doubly linked lists (struct list_head) */
struct list_head list; /* list linkage for queued mailbox commands */
union {
MAILBOX_t mb; /* Mailbox cmd */
struct lpfc_mqe mqe;
} u;
struct lpfc_vport *vport; /* virtual port pointer */
void *ctx_ndlp; /* caller ndlp information */
void *ctx_buf; /* caller buffer information */
void *context3;
void (*mbox_cmpl) (struct lpfc_hba *, struct lpfcMboxq *);
uint8_t mbox_flag;
uint16_t in_ext_byte_len;
uint16_t out_ext_byte_len;
uint8_t mbox_offset_word;
struct lpfc_mcqe mcqe;
struct lpfc_mbx_nembed_sge_virt *sge_array;
} LPFC_MBOXQ_t;
#define MBX_POLL 1 /* poll mailbox till command done, then
return */
#define MBX_NOWAIT 2 /* issue command then return immediately */
#define LPFC_MAX_RING_MASK 5 /* max num of rctl/type masks allowed per
ring */
#define LPFC_SLI3_MAX_RING 4 /* Max num of SLI3 rings used by driver.
For SLI4, an additional ring for each
FCP WQ will be allocated. */
struct lpfc_sli_ring;
struct lpfc_sli_ring_mask {
uint8_t profile; /* profile associated with ring */
uint8_t rctl; /* rctl / type pair configured for ring */
uint8_t type; /* rctl / type pair configured for ring */
uint8_t rsvd;
/* rcv'd unsol event */
void (*lpfc_sli_rcv_unsol_event) (struct lpfc_hba *,
struct lpfc_sli_ring *,
struct lpfc_iocbq *);
};
/* Structure used to hold SLI statistical counters and info */
struct lpfc_sli_ring_stat {
uint64_t iocb_event; /* IOCB event counters */
uint64_t iocb_cmd; /* IOCB cmd issued */
uint64_t iocb_rsp; /* IOCB rsp received */
uint64_t iocb_cmd_delay; /* IOCB cmd ring delay */
uint64_t iocb_cmd_full; /* IOCB cmd ring full */
uint64_t iocb_cmd_empty; /* IOCB cmd ring is now empty */
uint64_t iocb_rsp_full; /* IOCB rsp ring full */
};
struct lpfc_sli3_ring {
uint32_t local_getidx; /* last available cmd index (from cmdGetInx) */
uint32_t next_cmdidx; /* next_cmd index */
uint32_t rspidx; /* current index in response ring */
uint32_t cmdidx; /* current index in command ring */
uint16_t numCiocb; /* number of command iocb's per ring */
uint16_t numRiocb; /* number of rsp iocb's per ring */
uint16_t sizeCiocb; /* Size of command iocb's in this ring */
uint16_t sizeRiocb; /* Size of response iocb's in this ring */
uint32_t *cmdringaddr; /* virtual address for cmd rings */
uint32_t *rspringaddr; /* virtual address for rsp rings */
};
struct lpfc_sli4_ring {
struct lpfc_queue *wqp; /* Pointer to associated WQ */
};
/* Structure used to hold SLI ring information */
struct lpfc_sli_ring {
uint16_t flag; /* ring flags */
#define LPFC_DEFERRED_RING_EVENT 0x001 /* Deferred processing a ring event */
#define LPFC_CALL_RING_AVAILABLE 0x002 /* indicates cmd was full */
#define LPFC_STOP_IOCB_EVENT 0x020 /* Stop processing IOCB cmds event */
uint16_t abtsiotag; /* tracks next iotag to use for ABTS */
uint8_t rsvd;
uint8_t ringno; /* ring number */
spinlock_t ring_lock; /* lock for issuing commands */
uint32_t fast_iotag; /* max fastlookup based iotag */
uint32_t iotag_ctr; /* keeps track of the next iotag to use */
uint32_t iotag_max; /* max iotag value to use */
struct list_head txq;
uint16_t txq_cnt; /* current length of queue */
uint16_t txq_max; /* max length */
struct list_head txcmplq;
uint16_t txcmplq_cnt; /* current length of queue */
uint16_t txcmplq_max; /* max length */
uint32_t missbufcnt; /* keep track of buffers to post */
struct list_head postbufq;
uint16_t postbufq_cnt; /* current length of queue */
uint16_t postbufq_max; /* max length */
struct list_head iocb_continueq;
uint16_t iocb_continueq_cnt; /* current length of queue */
uint16_t iocb_continueq_max; /* max length */
struct list_head iocb_continue_saveq;
struct lpfc_sli_ring_mask prt[LPFC_MAX_RING_MASK];
uint32_t num_mask; /* number of mask entries in prt array */
void (*lpfc_sli_rcv_async_status) (struct lpfc_hba *,
struct lpfc_sli_ring *, struct lpfc_iocbq *);
struct lpfc_sli_ring_stat stats; /* SLI statistical info */
/* cmd ring available */
void (*lpfc_sli_cmd_available) (struct lpfc_hba *,
struct lpfc_sli_ring *);
union {
struct lpfc_sli3_ring sli3;
struct lpfc_sli4_ring sli4;
} sli;
};
/* Structure used for configuring rings to a specific profile or rctl / type */
struct lpfc_hbq_init {
uint32_t rn; /* Receive buffer notification */
uint32_t entry_count; /* max # of entries in HBQ */
uint32_t headerLen; /* 0 if not profile 4 or 5 */
uint32_t logEntry; /* Set to 1 if this HBQ used for LogEntry */
uint32_t profile; /* Selection profile 0=all, 7=logentry */
uint32_t ring_mask; /* Binds HBQ to a ring e.g. Ring0=b0001,
* ring2=b0100 */
uint32_t hbq_index; /* index of this hbq in ring .HBQs[] */
uint32_t seqlenoff;
uint32_t maxlen;
uint32_t seqlenbcnt;
uint32_t cmdcodeoff;
uint32_t cmdmatch[8];
uint32_t mask_count; /* number of mask entries in prt array */
struct hbq_mask hbqMasks[6];
/* Non-config rings fields to keep track of buffer allocations */
uint32_t buffer_count; /* number of buffers allocated */
uint32_t init_count; /* number to allocate when initialized */
uint32_t add_count; /* number to allocate when starved */
};
/* Structure used to hold SLI statistical counters and info */
struct lpfc_sli_stat {
uint64_t mbox_stat_err; /* Mbox cmds completed status error */
uint64_t mbox_cmd; /* Mailbox commands issued */
uint64_t sli_intr; /* Count of Host Attention interrupts */
uint64_t sli_prev_intr; /* Previous cnt of Host Attention interrupts */
uint64_t sli_ips; /* Host Attention interrupts per sec */
uint32_t err_attn_event; /* Error Attn event counters */
uint32_t link_event; /* Link event counters */
uint32_t mbox_event; /* Mailbox event counters */
uint32_t mbox_busy; /* Mailbox cmd busy */
};
/* Structure to store link status values when port stats are reset */
struct lpfc_lnk_stat {
uint32_t link_failure_count;
uint32_t loss_of_sync_count;
uint32_t loss_of_signal_count;
uint32_t prim_seq_protocol_err_count;
uint32_t invalid_tx_word_count;
uint32_t invalid_crc_count;
uint32_t error_frames;
uint32_t link_events;
};
/* Structure used to hold SLI information */
struct lpfc_sli {
uint32_t num_rings;
uint32_t sli_flag;
/* Additional sli_flags */
#define LPFC_SLI_MBOX_ACTIVE 0x100 /* HBA mailbox is currently active */
#define LPFC_SLI_ACTIVE 0x200 /* SLI in firmware is active */
#define LPFC_PROCESS_LA 0x400 /* Able to process link attention */
#define LPFC_BLOCK_MGMT_IO 0x800 /* Don't allow mgmt mbx or iocb cmds */
#define LPFC_SLI_ASYNC_MBX_BLK 0x2000 /* Async mailbox is blocked */
#define LPFC_SLI_SUPPRESS_RSP 0x4000 /* Suppress RSP feature is supported */
#define LPFC_SLI_USE_EQDR 0x8000 /* EQ Delay Register is supported */
#define LPFC_QUEUE_FREE_INIT 0x10000 /* Queue freeing is in progress */
#define LPFC_QUEUE_FREE_WAIT 0x20000 /* Hold Queue free as it is being
* used outside worker thread
*/
struct lpfc_sli_ring *sli3_ring;
struct lpfc_sli_stat slistat; /* SLI statistical info */
struct list_head mboxq;
uint16_t mboxq_cnt; /* current length of queue */
uint16_t mboxq_max; /* max length */
LPFC_MBOXQ_t *mbox_active; /* active mboxq information */
struct list_head mboxq_cmpl;
struct timer_list mbox_tmo; /* Hold clk to timeout active mbox
cmd */
#define LPFC_IOCBQ_LOOKUP_INCREMENT 1024
struct lpfc_iocbq **iocbq_lookup; /* array to lookup IOCB by IOTAG */
size_t iocbq_lookup_len; /* current length of the array */
uint16_t last_iotag; /* last allocated IOTAG */
time64_t stats_start; /* in seconds */
struct lpfc_lnk_stat lnk_stat_offsets;
};
/* Timeout for normal outstanding mbox command (Seconds) */
#define LPFC_MBOX_TMO 30
/* Timeout for non-flash-based outstanding sli_config mbox command (Seconds) */
#define LPFC_MBOX_SLI4_CONFIG_TMO 60
/* Timeout for flash-based outstanding sli_config mbox command (Seconds) */
#define LPFC_MBOX_SLI4_CONFIG_EXTENDED_TMO 300
/* Timeout for other flash-based outstanding mbox command (Seconds) */
#define LPFC_MBOX_TMO_FLASH_CMD 300
struct lpfc_io_buf {
/* Common fields */
struct list_head list;
void *data;
dma_addr_t dma_handle;
dma_addr_t dma_phys_sgl;
struct sli4_sge *dma_sgl; /* initial segment chunk */
/* linked list of extra sli4_hybrid_sge */
struct list_head dma_sgl_xtra_list;
/* list head for fcp_cmd_rsp buf */
struct list_head dma_cmd_rsp_list;
struct lpfc_iocbq cur_iocbq;
struct lpfc_sli4_hdw_queue *hdwq;
uint16_t hdwq_no;
uint16_t cpu;
struct lpfc_nodelist *ndlp;
uint32_t timeout;
uint16_t flags;
#define LPFC_SBUF_XBUSY 0x1 /* SLI4 hba reported XB on WCQE cmpl */
#define LPFC_SBUF_BUMP_QDEPTH 0x2 /* bumped queue depth counter */
/* External DIF device IO conversions */
#define LPFC_SBUF_NORMAL_DIF 0x4 /* normal mode to insert/strip */
#define LPFC_SBUF_PASS_DIF 0x8 /* insert/strip mode to passthru */
#define LPFC_SBUF_NOT_POSTED 0x10 /* SGL failed post to FW. */
uint16_t status; /* From IOCB Word 7- ulpStatus */
uint32_t result; /* From IOCB Word 4. */
uint32_t seg_cnt; /* Number of scatter-gather segments returned by
* dma_map_sg. The driver needs this for calls
* to dma_unmap_sg.
*/
unsigned long start_time;
spinlock_t buf_lock; /* lock used in case of simultaneous abort */
bool expedite; /* this is an expedite io_buf */
union {
/* SCSI specific fields */
struct {
struct scsi_cmnd *pCmd;
struct lpfc_rport_data *rdata;
uint32_t prot_seg_cnt; /* seg_cnt's counterpart for
* protection data
*/
/*
* data and dma_handle are the kernel virtual and bus
* address of the dma-able buffer containing the
* fcp_cmd, fcp_rsp and a scatter gather bde list that
* supports the sg_tablesize value.
*/
struct fcp_cmnd *fcp_cmnd;
struct fcp_rsp *fcp_rsp;
wait_queue_head_t *waitq;
#ifdef CONFIG_SCSI_LPFC_DEBUG_FS
/* Used to restore any changes to protection data for
* error injection
*/
void *prot_data_segment;
uint32_t prot_data;
uint32_t prot_data_type;
#define LPFC_INJERR_REFTAG 1
#define LPFC_INJERR_APPTAG 2
#define LPFC_INJERR_GUARD 3
#endif
};
/* NVME specific fields */
struct {
struct nvmefc_fcp_req *nvmeCmd;
uint16_t qidx;
};
};
#ifdef CONFIG_SCSI_LPFC_DEBUG_FS
uint64_t ts_cmd_start;
uint64_t ts_last_cmd;
uint64_t ts_cmd_wqput;
uint64_t ts_isr_cmpl;
uint64_t ts_data_io;
#endif
uint64_t rx_cmd_start;
};