Commit graph

1881 commits

Author SHA1 Message Date
Linus Torvalds
bdb39c9509 SCSI misc on 20210219
This series consists of the usual driver updates (ufs, ibmvfc,
 qla2xxx, hisi_sas, pm80xx) plus the removal of the gdth driver (which
 is bound to cause conflicts with a trivial change somewhere).  The
 only big major rework of note is the one from Hannes trying to clean
 up our result handling code in the drivers to make it consistent.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCYDAdliYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishTblAQCk6wD8
 fcb4TItSRp0DpRzs37zhppEbrBgveuAFHhr5swEA0gL2mHcq0vnyNBinCLnERrE7
 TPYJqUKJNktnjVG7ZWc=
 =wW6p
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "This series consists of the usual driver updates (ufs, ibmvfc,
  qla2xxx, hisi_sas, pm80xx) plus the removal of the gdth driver (which
  is bound to cause conflicts with a trivial change somewhere).

  The only big major rework of note is the one from Hannes trying to
  clean up our result handling code in the drivers to make it
  consistent"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (194 commits)
  scsi: MAINTAINERS: Adjust to reflect gdth scsi driver removal
  scsi: ufs: Give clk scaling min gear a value
  scsi: lpfc: Fix 'physical' typos
  scsi: megaraid_mbox: Fix spelling of 'allocated'
  scsi: qla2xxx: Simplify the calculation of variables
  scsi: message: fusion: Fix 'physical' typos
  scsi: target: core: Change ASCQ for residual write
  scsi: target: core: Signal WRITE residuals
  scsi: target: core: Set residuals for 4Kn devices
  scsi: hisi_sas: Add trace FIFO debugfs support
  scsi: hisi_sas: Flush workqueue in hisi_sas_v3_remove()
  scsi: hisi_sas: Enable debugfs support by default
  scsi: hisi_sas: Don't check .nr_hw_queues in hisi_sas_task_prep()
  scsi: hisi_sas: Remove deferred probe check in hisi_sas_v2_probe()
  scsi: lpfc: Add auto select on IRQ_POLL
  scsi: ncr53c8xx: Fix typos
  scsi: lpfc: Fix ancient double free
  scsi: qla2xxx: Fix some memory corruption
  scsi: qla2xxx: Remove redundant NULL check
  scsi: megaraid: Fix ifnullfree.cocci warnings
  ...
2021-02-22 10:24:58 -08:00
James Smart
8c65830ae1 scsi: lpfc: Fix EEH encountering oops with NVMe traffic
In testing, in a configuration with Redfish and native NVMe multipath when
an EEH is injected, a kernel oops is being encountered:

(unreliable)
lpfc_nvme_ls_req+0x328/0x720 [lpfc]
__nvme_fc_send_ls_req.constprop.13+0x1d8/0x3d0 [nvme_fc]
nvme_fc_create_association+0x224/0xd10 [nvme_fc]
nvme_fc_reset_ctrl_work+0x110/0x154 [nvme_fc]
process_one_work+0x304/0x5d

the NBMe transport is issuing a Disconnect LS request, which the driver
receives and tries to post but the work queue used by the driver is already
being torn down by the eeh.

Fix by validating the validity of the work queue before proceeding with the
LS transmit.

Link: https://lore.kernel.org/r/20210127221601.84878-1-jsmart2021@gmail.com
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-29 13:41:39 -05:00
Bjorn Helgaas
2468d20a48 scsi: lpfc: Fix 'physical' typos
Fix misspellings of "physical".

Link: https://lore.kernel.org/r/20210126211248.2920028-1-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-29 13:38:33 -05:00
Dan Carpenter
0be310979e scsi: lpfc: Fix ancient double free
The "pmb" pointer is freed at the start of the function and then freed
again in the error handling code.

Link: https://lore.kernel.org/r/YA6E8rO51hE56SVw@mwanda
Fixes: 92d7f7b0cd ("[SCSI] lpfc: NPIV: add NPIV support on top of SLI-3")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-26 22:08:57 -05:00
Eric Curtin
0196e37909 scsi: lpfc: Fix kerneldoc inconsistency in lpfc_sli4_dump_page_a0()
Comment did not match function signature.

Link: https://lore.kernel.org/r/20210119205022.40000-1-ericcurtin17@gmail.com
Signed-off-by: Eric Curtin <ericcurtin17@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-22 22:01:57 -05:00
Muneendra Kumar
7f3a79a7fd scsi: lpfc: Add support for eh_should_retry_cmd()
Add support for eh_should_retry_cmd callback in lpfc_template.

Link: https://lore.kernel.org/r/1609969748-17684-6-git-send-email-muneendra.kumar@broadcom.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Muneendra Kumar <muneendra.kumar@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-14 22:55:18 -05:00
YANG LI
af0c94afc0 scsi: lpfc: Simplify bool comparison
Fix the following coccicheck warning:

./drivers/scsi/lpfc/lpfc_bsg.c:5392:5-29: WARNING: Comparison to bool

Link: https://lore.kernel.org/r/1610439893-64872-1-git-send-email-abaci-bugfix@linux.alibaba.com
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: YANG LI <abaci-bugfix@linux.alibaba.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-13 00:28:22 -05:00
James Smart
181dd9a4c2 scsi: lpfc: Update lpfc version to 12.8.0.7
Update lpfc version to 12.8.0.7

Link: https://lore.kernel.org/r/20210104180240.46824-16-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07 23:02:37 -05:00
James Smart
0b3ad32e26 scsi: lpfc: Enhancements to LOG_TRACE_EVENT for better readability
While testing recent discovery node rework, several items were seen that
could be done better with respect to the new trace event logic.

1) in the following msg:
      kernel: lpfc 0000:44:00.0: start 35 end 35 cnt 0
   If cnt is zero in the 1st message, there is no reason to display the
   1st message, which is just giving start/end positioning.

   Fix by not displaying message if cnt is 0.

2) If the driver is loaded with module log verbosity off, and later a
   single NPIV host instance verbosity is enabled via sysfs, it enables
   messages on all instances. This is due to the trace log verbosity checks
   (lpfc_dmp_dbg) looking at the phba only. It should look at the phba and
   the vport.

   Fix by enabling a check on both phba and vport.

3) in the following messages:
       2904 Firmware Dump Image Present on Adapter
       2887 Reset Needed: Attempting Port Recovery...
   These messages are not necessary for the trace event log, which is
   primarily for discovery.

   Fix by changing log level on these 2 messages to LOG_SLI.

Link: https://lore.kernel.org/r/20210104180240.46824-15-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07 23:02:37 -05:00
James Smart
a22d73b655 scsi: lpfc: Implement health checking when aborting I/O
Several errors have occurred where the adapter stops or fails but does not
raise the register values for the driver to detect failure. Thus driver is
unaware of the failure. The failure typically results in I/O timeouts, the
I/O timeout handler failing (after several seconds), and the error handler
escalating recovery policy and resulting in more errors. Eventually, the
driver is in a position where things have spiraled and it can't do recovery
because other recovery ops are still outstanding and it becomes unusable.

Resolve the situation by having the I/O timeout handler (actually a els,
SCSI I/O, NVMe ls, or NVMe I/O timeout), in addition to aborting the I/O,
perform a mailbox command and look for a response from the hardware.  If
the mailbox command fails, it will mark the adapter offline and then invoke
the adapter reset handler to clean up.

The new I/O timeout test will be limited to a test every 5s. If there are
multiple I/O timeouts concurrently, only the 1st I/O timeout will generate
the mailbox command. Further testing will only occur once a timeout occurs
after a 5s delay from the last mailbox command has expired.

Link: https://lore.kernel.org/r/20210104180240.46824-14-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07 23:02:37 -05:00
James Smart
243156c010 scsi: lpfc: Fix crash when nvmet transport calls host_release
When lpfc is running in NVMET mode and supports the NVME-1 addendum
changes, a LIP on a bound NVME Initiator or lipping the lpfc NVMET's link
resulted in an Oops in lpfc_nvmet_host_release.

The fix requires lpfc NVMET to maintain an additional reference on any node
structure that acts as the hosthandle for the NVMET transport.  This
reference get is a one-time addition, is taken prior to the upcall of an
unsolicited LS_REQ, and is released when the NVMET transport releases the
hosthandle during the host_release downcall.

Link: https://lore.kernel.org/r/20210104180240.46824-13-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07 23:02:37 -05:00
James Smart
ff8a44bff5 scsi: lpfc: Fix vport create logging
When with testing with large numbers of npiv vports and link bounces, the
driver is flooding the messages file, even with log_verbose = 0.

The new LOG_TRACE_EVENT messages are still generating events to the
messages files.

Fix by converting the vport create msg from LOG_TRACE_EVENT to LOG_VPORT.

Link: https://lore.kernel.org/r/20210104180240.46824-12-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07 23:02:36 -05:00
James Smart
9ec58ec7d4 scsi: lpfc: Fix NVMe recovery after mailbox timeout
If a mailbox command times out, the SLI port is deemed in error and the
port is reset.  The HBA cleanup is not returning I/Os to the NVMe layer
before the port is unregistered. This is due to the HBA being marked
offline (!SLI_ACTIVE) and cleanup being done by the mailbox timeout handler
rather than an general adapter reset routine.  The mailbox timeout handler
mailbox handler only cleaned up SCSI I/Os.

Fix by reworking the mailbox handler to:

 - After handling the mailbox error, detect the board is already in
   failure (may be due to another error), and leave cleanup to the
   other handler.

 - If the mailbox command timeout is initial detector of the port error,
   continue with the board cleanup and marking the adapter offline
   (!SLI_ACTIVE). Remove the SCSI-only I/O cleanup routine. The generic
   reset adapter routine that is subsequently invoked, will clean up the
   I/Os.

 - Have the reset adapter routine flush all NVMe and SCSI I/Os if the
   adapter has been marked failed (!SLI_ACTIVE).

 - Rework the NVMe I/O terminate routine to take a status code to fail the
   I/O with and update so that cleaned up I/O calls the wqe completion
   routine. Currently it is bypassing the wqe cleanup and calling the NVMe
   I/O completion directly. The wqe completion routine will take care of
   data structure and node cleanup then call the NVMe I/O completion
   handler.

Link: https://lore.kernel.org/r/20210104180240.46824-11-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07 23:02:36 -05:00
James Smart
31051249f1 scsi: lpfc: Fix target reset failing
Target reset is failed by the target as an invalid command.

The Target Reset TMF has been obsoleted in T10 for a while, but continues
to be used. On (newer) devices, the TMF is rejected causing the reset
handler to escalate to adapter resets.

Fix by having Target Reset TMF rejections be translated into a LOGO and
re-PLOGI with the target device. This provides the same semantic action
(although, if the device also supports nvme traffic, it will terminate nvme
traffic as well - but it's still recoverable).

Link: https://lore.kernel.org/r/20210104180240.46824-10-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07 23:02:36 -05:00
James Smart
da09ae4864 scsi: lpfc: Fix error log messages being logged following SCSI task mgnt
A successful task mgmt command is logging errors, making it look like
problems were encountered.  This is due to log messages for the
device/target and bus reset handlers having the LOG_TRACE_EVENT flag set.

Fix by adjusting the event flag such that the call to the logging routine
only receives a LOG_TRACE_EVENT if a prior call actually failed.

Link: https://lore.kernel.org/r/20210104180240.46824-9-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07 23:02:36 -05:00
James Smart
f0871ab68a scsi: lpfc: Prevent duplicate requests to unregister with cpuhp framework
In the lpfc offline routine, called for various reasons such as sysfs
attribute, driver unload, or port error, the driver is calling
__lpfc_cpuhp_remove() to destroy the hot plug data. If the offline routine
is called while the driver is in the process of being unloaded, a request
using lpfc_cpuhp_remove() is also made from lpfc_sli4_hba_unset(). The
cpuhp elements are no longer valid when the second removal request is made.

Fix by only calling the cpuhp removal once when the adapter is in the
process of unloading.

Link: https://lore.kernel.org/r/20210104180240.46824-8-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07 23:02:36 -05:00
James Smart
3ba6216aad scsi: lpfc: Fix FW reset action if I/Os are outstanding
If the port is configured for NVME and has any outstanding IOs when a FW
reset is requesteed, outstanding I/Os are not properly cleaned up. This
causes the fw download request to fail.

Fix by clearing the LPFC_SLI_ACTIVE flag to signify the I/O must be
manually flushed by the driver on port reset.

Link: https://lore.kernel.org/r/20210104180240.46824-7-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07 23:02:36 -05:00
James Smart
c33b160934 scsi: lpfc: Use the nvme-fc transport supplied timeout for LS requests
When lpfc generates a GEN_REQUEST wqe for the nvme LS (such as Create
Association), the timeout is set to R_A_TOV without regard to the timeout
value supplied by the nvme-fc transport. The driver should be setting the
timeout to the value passed into the routine. Additionally the caller
should be setting the timeout value to the value in the ls request set by
the nvme transport. Instead, it unconditionally is setting it to a driver
defined value.  So the driver actually overrode the value twice.

Fix by using the timeout provided to the routine, and for the caller, set
the timeout to the ls request timeout value.

Link: https://lore.kernel.org/r/20210104180240.46824-6-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07 23:02:35 -05:00
James Smart
07aaefdf75 scsi: lpfc: Fix crash when a fabric node is released prematurely
The driver's management of the fabric controller (aka pseudo-scsi
initiator) node in SLI3 mode is causing this crash. The crash occurs
because of a node reference imbalance that frees the fabric controller node
while devloss is outstanding from the SCSI transport.  This is triggered by
an odd behavior where the switch reacts to a rejected RDP request with a
PLOGI and nothing else, not even a LOGO.  The driver ACKS the PLOGI and
after successfully registering the RPI, incorrectly registers the fabric
controller node because it has the NLP_FC4_FCP flag still set from the
fabric controller PRLI.  If a LIP is issued, the driver attempts to cleanup
on Link Up and ends up executing too many puts.

Fix by detecting the fabric node type and clearing out the nodes internal
flags that triggered a SCSI transport registration and subsequence dev_loss
event.  The driver cannot count on any persistence from fabric controller
nodes.

Link: https://lore.kernel.org/r/20210104180240.46824-5-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07 23:02:35 -05:00
James Smart
ecf041fe98 scsi: lpfc: Refresh ndlp when a new PRLI is received in the PRLI issue state
Testing with target ports coming and going, the driver eventually reached a
state where it no longer discovered the target. When the driver has issued
a PRLI and receives a PRLI from the target, it is not properly updating the
node's initiator/target role flags. Thus, when a subsequent RSCN is
received for a target loss, the driver mis-identifies the target as an
initiator and does not initiate LUN scanning.

Fix by always refreshing the ndlp with the latest PRLI state information
whenever a PRLI is processed.  Also clear the ndlp flags when processing a
PLOGI so that there is no carry over through a re-login.

Link: https://lore.kernel.org/r/20210104180240.46824-4-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07 23:02:35 -05:00
James Smart
d2f2547efd scsi: lpfc: Fix auto sli_mode and its effect on CONFIG_PORT for SLI3
A very long time ago, there was a feature: auto sli mode. It gave the user
the ability to auto select the SLI mode (SLI2 or SLI3) to run the port in,
or even force SLI2 mode if configured.  Because of the convoluted logic,
the CONFIG_PORT mbox command ends up being called 2 or 3 times. It should
have been called only once.  Additionally, the driver no longer supports
SLI-2, so only SLI-3 mode should be allowed.

The following changes were made:

 - Force module parameter to SLI3 only.

 - Rip out redundant CONFIG_PORT mbox commands.

 - Force CONFIG_PORT mbox command to be in beginning of enable ISR routine.

 - Added changes for offline to online behavior

Link: https://lore.kernel.org/r/20210104180240.46824-3-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07 23:02:35 -05:00
James Smart
8e062ce305 scsi: lpfc: Fix PLOGI S_ID of 0 on pt2pt config
Under some pt2pt situations, the other end of the link may issue a LOGO
after successfully completing PLOGI and assigning addresses to the port.
Thus the driver may attempt a new PLOGI to re-create the login, but the
LOGO handling cleared the address back to 0. Once this happens, the other
end, which may be address 0, gets all confused and this cannot be resolved
without an administrative action to bounce the link.

Fix by assuming that address assignment only occurs on the 1st PLOGI after
link up, and regardless of login state, the address assignment sticks.  The
FC standards aren't particularly clear in this situation (it only describes
initial PLOGI), but there is nothing that contradicts this and behaviors on
the devices tested appears to conform to the understanding.

Thus, don't reset the port address to 0 as part of LOGO handling. Port
addresses will only reset on link down.

Link: https://lore.kernel.org/r/20210104180240.46824-2-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07 23:02:35 -05:00
Gustavo A. R. Silva
e9a7c71171 scsi: lpfc: Fix fall-through warnings for Clang
In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning by
explicitly adding a break statement instead of letting the code fall
through to the next case.

Link: https://github.com/KSPP/linux/issues/115
Link: https://lore.kernel.org/r/fff8d6f1d33b9e2c94dbe024a4f8df22866d3bf8.1605896060.git.gustavoars@kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-12-02 12:59:47 -05:00
James Smart
9d8de441db scsi: lpfc: Correct null ndlp reference on routine exit
smatch correctly called out a logic error with accessing a pointer after
checking it for null:

 drivers/scsi/lpfc/lpfc_els.c:2043 lpfc_cmpl_els_plogi()
 error: we previously assumed 'ndlp' could be null (see line 1942)

Adjust the exit point to avoid the trace printf ndlp reference. A trace
entry was already generated when the ndlp was checked for null.

Link: https://lore.kernel.org/r/20201130181226.16675-1-james.smart@broadcom.com
Fixes: 4430f7fd09 ("scsi: lpfc: Rework locations of ndlp reference taking")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-12-01 00:19:14 -05:00
Vaibhav Gupta
ef6fa16b5d scsi: lpfc: Use generic power management
Drivers should do only device-specific jobs. But in general, drivers using
legacy PCI PM framework for .suspend()/.resume() have to manage many PCI
PM-related tasks themselves which can be done by PCI Core itself. This
brings extra load on the driver and it directly calls PCI helper functions
to handle them.

Switch to the new generic framework by updating function signatures and
define a "struct dev_pm_ops" variable to bind PM callbacks. Also, remove
unnecessary calls to the PCI Helper functions along with the legacy
.suspend & .resume bindings.

Link: https://lore.kernel.org/r/20201102164730.324035-18-vaibhavgupta40@gmail.com
Signed-off-by: Vaibhav Gupta <vaibhavgupta40@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-25 23:14:31 -05:00
James Smart
6998ff4e21 scsi: lpfc: Fix variable 'vport' set but not used in lpfc_sli4_abts_err_handler()
Remove vport variable that is assigned but not used in
lpfc_sli4_abts_err_handler().

Link: https://lore.kernel.org/r/20201119203407.121913-1-james.smart@broadcom.com
Fixes: e7dab164a9 ("scsi: lpfc: Fix scheduling call while in softirq context in lpfc_unreg_rpi")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-19 22:22:28 -05:00
James Smart
185d17e11e scsi: lpfc: Fix missing prototype for lpfc_nvmet_prep_abort_wqe()
lpfc_nvmet_prep_abort_wqe() needs to be declared static.

Link: https://lore.kernel.org/r/20201119203316.121725-1-james.smart@broadcom.com
Fixes: db7531d2b3 ("scsi: lpfc: Convert abort handling to SLI-3 and SLI-4 handlers")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-19 22:21:50 -05:00
James Smart
09b15e3507 scsi: lpfc: Fix set but unused variables in lpfc_dev_loss_tmo_handler()
Remove set but not used variable shost in lpfc_dev_loss_tmo_handler().

Link: https://lore.kernel.org/r/20201119203353.121866-1-james.smart@broadcom.com
Fixes: 52edb2caf6 ("scsi: lpfc: Remove ndlp when a PLOGI/ADISC/PRLI/REG_RPI ultimately fails")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-19 22:21:04 -05:00
James Smart
4a119d8a4c scsi: lpfc: Fix set but not used warnings from Rework remote port lock handling
Remove local variables that are set but not used.

Link: https://lore.kernel.org/r/20201119203340.121819-1-james.smart@broadcom.com
Fixes: c6adba1501 ("scsi: lpfc: Rework remote port lock handling")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-19 22:20:26 -05:00
James Smart
809032ddf9 scsi: lpfc: Fix missing prototype warning for lpfc_fdmi_vendor_attr_mi()
Function needs to be declared as static.

Link: https://lore.kernel.org/r/20201119203328.121772-1-james.smart@broadcom.com
Fixes: 8aaa7bcf07 ("scsi: lpfc: Add FDMI Vendor MIB support")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-19 22:19:41 -05:00
Colin Ian King
14c1dd9504 scsi: lpfc: Fix memory leak on lcb_context
Currently there is an error return path that neglects to free the
allocation for lcb_context.  Fix this by adding a new error free exit path
that kfree's lcb_context before returning.  Use this new kfree exit path in
another exit error path that also kfree's the same object, allowing a line
of code to be removed.

Link: https://lore.kernel.org/r/20201118141314.462471-1-colin.king@canonical.com
Fixes: 4430f7fd09 ("scsi: lpfc: Rework locations of ndlp reference taking")
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Addresses-Coverity: ("Resource leak")
2020-11-19 22:16:00 -05:00
Colin Ian King
61795a5316 scsi: lpfc: Remove dead code on second !ndlp check
Currently there is a null check on the pointer ndlp that exits via error
path issue_ct_rsp_exit followed by another null check on the same pointer
that is almost identical to the previous null check stanza and yet can
never can be reached because the previous check exited via
issue_ct_rsp_exit. This is deadcode and can be removed.

Link: https://lore.kernel.org/r/20201118133744.461385-1-colin.king@canonical.com
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Addresses-Coverity: ("Logically dead code")
2020-11-19 22:15:19 -05:00
Colin Ian King
1e7dddb2e7 scsi: lpfc: Fix pointer defereference before it is null checked issue
There is a null check on pointer lpfc_cmd after the pointer has been
dereferenced when pointers rdata and ndlp are initialized at the start of
the function. Fix this by only assigning rdata and ndlp after the pointer
lpfc_cmd has been null checked.

Link: https://lore.kernel.org/r/20201118131345.460631-1-colin.king@canonical.com
Fixes: 96e209be6e ("scsi: lpfc: Convert SCSI I/O completions to SLI-3 and SLI-4 handlers")
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Addresses-Coverity: ("Dereference before null check")
2020-11-19 22:13:42 -05:00
James Smart
983f761cd5 scsi: lpfc: Update changed file copyrights for 2020
Update Copyright in files changed by the 12.8.0.6 patch set to 2020

Link: https://lore.kernel.org/r/20201115192646.12977-18-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-17 00:43:56 -05:00
James Smart
ab4dfa4dd5 scsi: lpfc: Update lpfc version to 12.8.0.6
Update lpfc version to 12.8.0.6

Link: https://lore.kernel.org/r/20201115192646.12977-17-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-17 00:43:56 -05:00
James Smart
db7531d2b3 scsi: lpfc: Convert abort handling to SLI-3 and SLI-4 handlers
This patch reworks the abort interfaces such that SLI-3 retains the
iocb-based formatting and completions and SLI-4 now uses native WQEs and
completion routines.

The following changes are made:

 - The code is refactored from a confusing 2 routine sequence of
   xx_abort_iotag_issue(), which creates/formats and abort cmd, and
   xx_issue_abort_tag(), which then issues and handles the completion of
   the abort cmd - into a single interface of xx_issue_abort_iotag().  The
   new interface will determine whether SLI-3 or SLI-4 and then call the
   appropriate handler. A completion handler can now be specified to
   address the differences in completion handling.  Note: original code is
   all iocb based, with SLI-4 converting to SLI-3 for the SCSI/ELS path,
   and NVMe natively using wqes.

 - The SLI-3 side is refactored:

   The older iocb-base lpfc_sli_issue_abort_iotag() routine is combined
   with the logic of lpfc_sli_abort_iotag_issue() as well as the
   iocb-specific code in lpfc_abort_handler() and lpfc_sli_abort_iocb() to
   create the new single SLI-3 abort routine that formats and issues the
   iocb.

 - The SLI-4 side is refactored and added to:

   The native WQE abort code in NVMe is moved to the new SLI-4
   issue_abort_iotag() routine. Items in SCSI that set fields not set by
   NVMe is migrated into the new routine. Thus the routine supports NVMe
   and SCSI initiators. The nvmet block (target) formats the abort slightly
   different (like the old NVMe initiator) thus it has its own prep routine
   stolen from NVMe initiator and it retains the current code it has for
   issuing the WQE (does not use the commonized routine the initiators
   do). SLI-4 completion handlers were also added.

 - lpfc_abort_handler now becomes a wrapper that determines whether
   SLI-3 or SLI-4 and calls the proper abort handler.

Link: https://lore.kernel.org/r/20201115192646.12977-16-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-17 00:43:56 -05:00
James Smart
96e209be6e scsi: lpfc: Convert SCSI I/O completions to SLI-3 and SLI-4 handlers
The current driver implementation uses SLI-4 WQE to iocb conversion before
calling the cmpl callback function.

Rework the FCP I/O completion path to utilize the SLI-4 WQE.

This patch converts the SCSI I/O completion paths from the iocb-centric
interfaces to the routines are native for whether I/Os are iocb-based
(SLI-3) or WQE-based (SLI-4).

Most existing routines were iocb-based, so this creates a lot of SLI-4
specific routines to provide the functionality.

Link: https://lore.kernel.org/r/20201115192646.12977-15-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-17 00:43:56 -05:00
James Smart
da255e2e7c scsi: lpfc: Convert SCSI path to use common I/O submission path
This patch converts the SCSI I/O path from the iocb-centric interfaces to
the common I/O submission path which supports native SLI-4 WQEs.

A wrapper routine is put in place to distinguish SLI-3 from SLI. If SLI-3,
the same iocb-centric paths are used, perhaps with refactored code that is
explicitly for SLI-3.  For SLI-4, any iocb-related formatting is replaced
by wqe-based formatting, although much of that is addressed by the common
wqe templates in the SLI-4 path.

Link: https://lore.kernel.org/r/20201115192646.12977-14-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-17 00:43:56 -05:00
James Smart
47ff4c510f scsi: lpfc: Enable common send_io interface for SCSI and NVMe
To set up common use by the SCSI and NVMe I/O paths, create a new routine
that issues FCP I/O commands which can be used by either protocol.  The new
routine addresses SLI-3 vs SLI-4 differences within its implementation.

Replace the (SLI-3 centric) iocb routine in the SCSI path with this new
WQE-centric common routine.

Link: https://lore.kernel.org/r/20201115192646.12977-13-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-17 00:43:56 -05:00
James Smart
840a470181 scsi: lpfc: Enable common wqe_template support for both SCSI and NVMe
The driver is currently using SLI-4 WQE templates only for NVMe.  Refactor
the template and the placement of the service routine so that it can be
used by both SCSI and NVMe.

Link: https://lore.kernel.org/r/20201115192646.12977-12-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-17 00:43:55 -05:00
James Smart
b101eb27fd scsi: lpfc: Refactor WQE structure definitions for common use
In preparation of reworking the driver to use a native SLI-4 WQE interface
for the SCSI and NVMe I/O paths, start by commonizing the WQE exchange type
and command type attributes.

While adjusting these options also noted the variance in the pbde field.
Fix this by setting templates to 0 and in NVMe, which explicitly uses this
option, setting the value.

Link: https://lore.kernel.org/r/20201115192646.12977-11-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-17 00:43:55 -05:00
James Smart
a70e63eee1 scsi: lpfc: Fix NPIV Fabric Node reference counting
While testing initiator-side cable swaps with NPIV, oops occur.  The
reference counts for the Fabric nodes on the NPIV vports isn't balanced,
resulting in premature node removal.

The following fixes were made:

 - Removed the FC_LBIT check in lpfc_linkup_port. This removed the special
   case for vports that didn't have them clean up just like the physical
   port.

 - Removed the unreg_rpi call in lpfc_cleanup_node. In this section, the
   node is being removed in the context of a reference count release and a
   mailbox command can't be issued at this point.

 - Remove special case handling in the default mailbox completion handler
   that allowed the skipping of a node reference. Now, reference counting
   always requires the removal of the reference.

 - Move the location of the DEVICE_RM event is done during LOGO handling as
   the driver has additional work to do on the ndlp before puts/releases
   can be performed.

Link: https://lore.kernel.org/r/20201115192646.12977-10-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-17 00:43:55 -05:00
James Smart
b3f2e67cc2 scsi: lpfc: Fix NPIV discovery and Fabric Node detection
While testing NPIV and link bounces, the vport would not show a fabric node
for the F_Port, would not transition into NPR state during a link fault, or
leave the FDMI node untouched during error injection. Cause for this was
determined to be an inconsistent manner in which F_Port, Nameserver, and
FDMI controller nodes were created and linked. In some cases, the nodes
would never be unregistered from the transport, leaving references
active. In other cases, the fabric nodes may register with the transport
multiple times while still registered.

The following changes were made:

 - Fix the FDISC issue routine, which starts vport (re)creation, to mark
   the F_Port as a fabric node (NLP_FABRIC) and allow the F_Port node to
   fully be created and show up in the node list.

 - When remote ports are cleaned up on vport termination, cleanup the
   nameserver and FDMI controller nodes on the vport so they unregister
   from the transport.

 - On link bounces, don't exclude the NPIV Fabric remote ports from
   transitioning to the NPR state, allowing them to avoid re-registration
   if already registered.

Link: https://lore.kernel.org/r/20201115192646.12977-9-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-17 00:43:55 -05:00
James Smart
9d76d46751 scsi: lpfc: Unsolicited ELS leaves node in incorrect state while dropping it
When a target swap happens, under certain conditions the node sends a
LOGO. The unsolicited ELS logic responds with a reject. The logic may
allocate a new node to handle this. Afterward, the new nodes are dropped
incorrectly leaving them in a mis-matched state and refcounting causes a
use-after-free situation leading to a crash.

It is also possible that the unsolicited els handling finds a node which is
in an UNUSED state. The handling moves these nodes to NPR state with a
refcount of 1. Although the end of the discovery logic assumes a final put
will free such a node, there are codes paths which could increment the
reference count, thus the node is in NPR state and not released.
Eventually this mismatch in state and refcount leads to premature release
of the node causing a crash.

Fix by always using the discovery engine DEVICE RM event to decrement and
release the nodes (rather than explicit code that tried to do it before).
This will take care of moving the node to the UNUSED state and then removes
the final ref count. If there is a trigger to reuse this node, the
transition from the UNUSED state clearly indicates that the initial
reference is then incremented and use can continue.

Link: https://lore.kernel.org/r/20201115192646.12977-8-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-17 00:43:55 -05:00
James Smart
52edb2caf6 scsi: lpfc: Remove ndlp when a PLOGI/ADISC/PRLI/REG_RPI ultimately fails
When a PLOGI/ADISC/PRLI/REG_RPI fails, the node remains in the nodelist in
that state.  Although the driver now frees a node when the ref count goes
to zero, in this case the ref cnt doesn't reach zero because there isn't a
mechanism to release the final reference.  Discovery just stops.

Fix by calling the node discovery state machine DEVICE_RM event whenever
one of these commands fail. This will remove the final reference count and
trigger node release.

Link: https://lore.kernel.org/r/20201115192646.12977-7-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-17 00:43:55 -05:00
James Smart
c6adba1501 scsi: lpfc: Rework remote port lock handling
Currently the discovery layers within the driver use the SCSI midlayer
host_lock to access node-specific structures. This can contend with the I/O
path and is too coarse of a lock.

Rework the driver so that it uses a lock specific to the remote port node
structure when accessing the structure contents. A few of the changes
brought out spots were some slightly reorganized routines worked better.

Link: https://lore.kernel.org/r/20201115192646.12977-6-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-17 00:43:54 -05:00
James Smart
e9b1108316 scsi: lpfc: Fix refcounting around SCSI and NVMe transport APIs
Due to bug history and code review, the node reference counting approach in
the driver isn't implemented consistently with how the scsi and nvme
transport perform registrations and unregistrations and their callbacks.
This resulted in many bad/stale node pointers.

Reword the driver so that reference handling is performed as follows:

 - The initial node reference is taken on structure allocation

 - Take a reference on any add/register call to the transport

 - Remove a reference on any delete/unregister call to the transport

 - After the node has fully removed from both the SCSI and NVMEe transports
   (dev_loss_callbacks have called back) call the discovery engine
   DEVICE_RM event which will remove the final reference and release the
   node structure.

 - Alter dev_loss handling when a vport or base port is unloading.

 - Remove the put_node handling - no longer needed.

 - Rewrite the vport_delete handling on reference counts.  Part of this
   effort was driven from the FDISC not registering with the transport and
   disrupting the model for node reference counting.

 - Deleted lpfc_nlp_remove.  Pushed it's remaining ops into
   lpfc_nlp_release.

 - Several other small code cleanups.

Link: https://lore.kernel.org/r/20201115192646.12977-5-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-17 00:43:54 -05:00
James Smart
95f0ef8a83 scsi: lpfc: Fix removal of SCSI transport device get and put on dev structure
The lpfc driver is calling get_device and put_device on scsi_fc_transport
device structure. When this code was removed, the driver triggered an oops
in "scsi_is_host_dev" when the first SCSI target was unregistered from the
transport.

The reason the calls were necessary is that the driver is calling
scsi_remove_host too early, before the target rports are unregistered and
the scsi devices disconnected from the scsi_host.  The fc_host was torn
down during fc_remove_host.

Fix by moving the lpfc_pci_remove_one_s3/s4 calls to scsi_remove_host to
after the nodes are cleaned up.  Remove the get_device and put_device calls
and the supporting code.

Link: https://lore.kernel.org/r/20201115192646.12977-4-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-17 00:43:54 -05:00
James Smart
4430f7fd09 scsi: lpfc: Rework locations of ndlp reference taking
Now that the driver has gone to a normal ref interface (with no odd logic)
the discovery logic needs to be updated to reworked so that it properly
takes references when it should and give them up when it should.

Rework the driver for the following get/put model:

 - Move gets to just before an I/O is issued. Add gets for places where an
   I/O was issued without one.

 - Ensure that failures from lpfc_nlp_get() are handled by the driver.

 - Check and fix the placement of lpfc_nlp_puts relative to io completions.
   Note: some of these paths may not release the reference on the exact io
   completion as the reference is held as the code takes another step in
   the discovery thread and which may cause another io to be issued.

 - Rearrange some code for error processing and calling lpfc_nlp_put.

 - Fix some places of incorrect reference freeing that was causing the
   premature releasing of the structure.

 - Nvmet plogi handling performs unreg_rpi's. The reference counts were
   unbalanced resulting in premature node removal. In some cases this
   caused loss of node discovery. Corrected the reftaking around nvmet
   plogis.

Nodes that experience devloss now get released from the node list now that
there is a proper reference taking.

Link: https://lore.kernel.org/r/20201115192646.12977-3-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-17 00:43:54 -05:00
James Smart
307e338097 scsi: lpfc: Rework remote port ref counting and node freeing
When a remote port is disconnected and disappears, its node structure
(ndlp) stays allocated and on a vport node list. While on the list it can
be matched, thus requires validation checks on state to be added in
numerous code paths. If the node comes back, its possible for there to be
multiple node structures for the same device on the vport node list. There
is no reason to keep the node structure around after it is no longer in
existence, and the current implementation creates problems for itself
(multiple nodes) and lots of unnecessary code for state validation.

Additionally, the reference taking on the node structure didn't follow the
normal model used by the kernel kref api. It included lots of odd logic to
match state with reference count.  The combination of this odd logic plus
the way it was implicitly used in the discovery engine made its reference
taking implementation suspect and extremely hard to follow.

Change the driver such that the reference taking routines are now normal
ref increments/decrements and callout on refcount=0.

With this in place, the rework can be done such that the node structure is
fully removed and deallocated when the remote port no longer exists and all
references are removed.  This removal logic, and the basic ref counting are
intrically tied, thus in a single patch.

Link: https://lore.kernel.org/r/20201115192646.12977-2-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-17 00:43:54 -05:00