Commit graph

757 commits

Author SHA1 Message Date
Julian Wiedmann
c3bfffa5ec scsi: zfcp: Avoid benign overflow of the Request Queue's free-level
zfcp_qdio_send() and zfcp_qdio_int_req() run concurrently, adding and
completing SBALs on the Request Queue. There's a theoretical race where
zfcp_qdio_int_req() completes a number of SBALs & increments the queue's
free-level _before_ zfcp_qdio_send() was able to decrement it.

This can cause ->req_q_free to momentarily hold a value larger than
QDIO_MAX_BUFFERS_PER_Q. Luckily zfcp_qdio_send() is always called under
->req_q_lock, and all readers of the free-level also take this lock. So we
can trust that zfcp_qdio_send() will clean up such a temporary overflow
before anyone can actually observe it.

But it's still confusing and annoying to worry about. So adjust the code to
avoid this race.

Link: https://lore.kernel.org/r/7f61f59a1f8db270312e64644f9173b8f1ac895f.1593780621.git.bblock@linux.ibm.com
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-07-08 00:50:56 -04:00
Julian Wiedmann
6bcb7c171a scsi: zfcp: Replace open-coded list move
Instead of manually moving each element of the unit and port lists into our
temporary on-stack lists, splice them over in one go.

Link: https://lore.kernel.org/r/cacb179f49ece50fd4dce119c61252d632cdc1d4.1593780621.git.bblock@linux.ibm.com
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-07-08 00:50:55 -04:00
Julian Wiedmann
b43cdb5ac8 scsi: zfcp: Clean up zfcp_erp_action_ready()
We already maintain a pointer to act->adapter. Use it consistently to avoid
any confusion about whose ->erp_ready_head and ->erp_ready_wq we are
accessing.

Link: https://lore.kernel.org/r/d1bb04322f240dee32f4c4a551bc93bc736f4b01.1593780621.git.bblock@linux.ibm.com
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-07-08 00:50:55 -04:00
Julian Wiedmann
459ad085d8 scsi: zfcp: Fix an outdated comment for zfcp_qdio_send()
zfcp no longer uses the qdio PCI flag, update the comment.

Link: https://lore.kernel.org/r/6717c26fc986bff8776d110e27c199b523684c63.1593780621.git.bblock@linux.ibm.com
Fixes: 21ddaa53f9 ("[SCSI] zfcp: Remove PCI flag")
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Fedor Loshakov <loshakov@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-07-08 00:50:53 -04:00
George Spelvin
0cd0e57ec8 scsi: zfcp: Use prandom_u32_max() for backoff
We don't need crypto-grade random numbers for randomized backoffs.  Instead
use prandom_u32_max(ep_ro) which generates a pseudo-random number uniformly
distributed in the interval [0, ep_ro).

Link: https://lore.kernel.org/r/8fc7c4c4069ff1783f4a9ccd84a923f581a09ec5.1593780621.git.bblock@linux.ibm.com
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: George Spelvin <lkml@sdf.org>
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-07-08 00:50:52 -04:00
Steffen Maier
936e6b85da scsi: zfcp: Fix panic on ERP timeout for previously dismissed ERP action
Suppose that, for unrelated reasons, FSF requests on behalf of recovery are
very slow and can run into the ERP timeout.

In the case at hand, we did adapter recovery to a large degree.  However
due to the slowness a LUN open is pending so the corresponding fc_rport
remains blocked.  After fast_io_fail_tmo we trigger close physical port
recovery for the port under which the LUN should have been opened.  The new
higher order port recovery dismisses the pending LUN open ERP action and
dismisses the pending LUN open FSF request.  Such dismissal decouples the
ERP action from the pending corresponding FSF request by setting
zfcp_fsf_req->erp_action to NULL (among other things)
[zfcp_erp_strategy_check_fsfreq()].

If now the ERP timeout for the pending open LUN request runs out, we must
not use zfcp_fsf_req->erp_action in the ERP timeout handler.  This is a
problem since v4.15 commit 75492a5156 ("s390/scsi: Convert timers to use
timer_setup()"). Before that we intentionally only passed zfcp_erp_action
as context argument to zfcp_erp_timeout_handler().

Note: The lifetime of the corresponding zfcp_fsf_req object continues until
a (late) response or an (unrelated) adapter recovery.

Just like the regular response path ignores dismissed requests
[zfcp_fsf_req_complete() => zfcp_fsf_protstatus_eval() => return early] the
ERP timeout handler now needs to ignore dismissed requests.  So simply
return early in the ERP timeout handler if the FSF request is marked as
dismissed in its status flags.  To protect against the race where
zfcp_erp_strategy_check_fsfreq() dismisses and sets
zfcp_fsf_req->erp_action to NULL after our previous status flag check,
return early if zfcp_fsf_req->erp_action is NULL.  After all, the former
ERP action does not need to be woken up as that was already done as part of
the dismissal above [zfcp_erp_action_dismiss()].

This fixes the following panic due to kernel page fault in IRQ context:

Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 0000000000000000 TEID: 0000000000000483
Fault in home space mode while using kernel ASCE.
AS:000009859238c00b R2:00000e3e7ffd000b R3:00000e3e7ffcc007 S:00000e3e7ffd7000 P:000000000000013d
Oops: 0004 ilc:2 [#1] SMP
Modules linked in: ...
CPU: 82 PID: 311273 Comm: stress Kdump: loaded Tainted: G            E  X   ...
Hardware name: IBM 8561 T01 701 (LPAR)
Krnl PSW : 0404c00180000000 001fffff80549be0 (zfcp_erp_notify+0x40/0xc0 [zfcp])
           R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
Krnl GPRS: 0000000000000080 00000e3d00000000 00000000000000f0 0000000000030000
           000000010028e700 000000000400a39c 000000010028e700 00000e3e7cf87e02
           0000000010000000 0700098591cb67f0 0000000000000000 0000000000000000
           0000033840e9a000 0000000000000000 001fffe008d6bc18 001fffe008d6bbc8
Krnl Code: 001fffff80549bd4: a7180000            lhi     %r1,0
           001fffff80549bd8: 4120a0f0            la      %r2,240(%r10)
          #001fffff80549bdc: a53e0003            llilh   %r3,3
          >001fffff80549be0: ba132000            cs      %r1,%r3,0(%r2)
           001fffff80549be4: a7740037            brc     7,1fffff80549c52
           001fffff80549be8: e320b0180004        lg      %r2,24(%r11)
           001fffff80549bee: e31020e00004        lg      %r1,224(%r2)
           001fffff80549bf4: 412020e0            la      %r2,224(%r2)
Call Trace:
 [<001fffff80549be0>] zfcp_erp_notify+0x40/0xc0 [zfcp]
 [<00000985915e26f0>] call_timer_fn+0x38/0x190
 [<00000985915e2944>] expire_timers+0xfc/0x190
 [<00000985915e2ac4>] run_timer_softirq+0xec/0x218
 [<0000098591ca7c4c>] __do_softirq+0x144/0x398
 [<00000985915110aa>] do_softirq_own_stack+0x72/0x88
 [<0000098591551b58>] irq_exit+0xb0/0xb8
 [<0000098591510c6a>] do_IRQ+0x82/0xb0
 [<0000098591ca7140>] ext_int_handler+0x128/0x12c
 [<0000098591722d98>] clear_subpage.constprop.13+0x38/0x60
([<000009859172ae4c>] clear_huge_page+0xec/0x250)
 [<000009859177e7a2>] do_huge_pmd_anonymous_page+0x32a/0x768
 [<000009859172a712>] __handle_mm_fault+0x88a/0x900
 [<000009859172a860>] handle_mm_fault+0xd8/0x1b0
 [<0000098591529ef6>] do_dat_exception+0x136/0x3e8
 [<0000098591ca6d34>] pgm_check_handler+0x1c8/0x220
Last Breaking-Event-Address:
 [<001fffff80549c88>] zfcp_erp_timeout_handler+0x10/0x18 [zfcp]
Kernel panic - not syncing: Fatal exception in interrupt

Link: https://lore.kernel.org/r/20200623140242.98864-1-maier@linux.ibm.com
Fixes: 75492a5156 ("s390/scsi: Convert timers to use timer_setup()")
Cc: <stable@vger.kernel.org> #4.15+
Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-06-24 00:01:09 -04:00
Benjamin Block
d0dff2ac98 scsi: zfcp: Move allocation of the shost object to after xconf- and xport-data
At the moment we allocate and register the Scsi_Host object corresponding
to a zfcp adapter (FCP device) very early in the life cycle of the adapter
- even before we fully discover and initialize the underlying
firmware/hardware. This had the advantage that we could already use the
Scsi_Host object, and fill in all its information during said discover and
initialize.

Due to commit 737eb78e82 ("block: Delay default elevator initialization")
(first released in v5.4), we noticed a regression that would prevent us
from using any storage volume if zfcp is configured with support for DIF or
DIX (zfcp.dif=1 || zfcp.dix=1). Doing so would result in an illegal memory
access as soon as the first request is sent with such an configuration. As
example for a crash resulting from this:

  scsi host0: scsi_eh_0: sleeping
  scsi host0: zfcp
  qdio: 0.0.1900 ZFCP on SC 4bd using AI:1 QEBSM:0 PRI:1 TDD:1 SIGA: W AP
  scsi 0:0:0:0: scsi scan: INQUIRY pass 1 length 36
  Unable to handle kernel pointer dereference in virtual kernel address space
  Failing address: 0000000000000000 TEID: 0000000000000483
  Fault in home space mode while using kernel ASCE.
  AS:0000000035c7c007 R3:00000001effcc007 S:00000001effd1000 P:000000000000003d
  Oops: 0004 ilc:3 [#1] PREEMPT SMP DEBUG_PAGEALLOC
  Modules linked in: ...
  CPU: 1 PID: 783 Comm: kworker/u760:5 Kdump: loaded Not tainted 5.6.0-rc2-bb-next+ #1
  Hardware name: ...
  Workqueue: scsi_wq_0 fc_scsi_scan_rport [scsi_transport_fc]
  Krnl PSW : 0704e00180000000 000003ff801fcdae (scsi_queue_rq+0x436/0x740 [scsi_mod])
             R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
  Krnl GPRS: 0fffffffffffffff 0000000000000000 0000000187150120 0000000000000000
             000003ff80223d20 000000000000018e 000000018adc6400 0000000187711000
             000003e0062337e8 00000001ae719000 0000000187711000 0000000187150000
             00000001ab808100 0000000187150120 000003ff801fcd74 000003e0062336a0
  Krnl Code: 000003ff801fcd9e: e310a35c0012        lt      %r1,860(%r10)
             000003ff801fcda4: a7840010           brc     8,000003ff801fcdc4
            #000003ff801fcda8: e310b2900004       lg      %r1,656(%r11)
            >000003ff801fcdae: d71710001000       xc      0(24,%r1),0(%r1)
             000003ff801fcdb4: e310b2900004       lg      %r1,656(%r11)
             000003ff801fcdba: 41201018           la      %r2,24(%r1)
             000003ff801fcdbe: e32010000024       stg     %r2,0(%r1)
             000003ff801fcdc4: b904002b           lgr     %r2,%r11
  Call Trace:
   [<000003ff801fcdae>] scsi_queue_rq+0x436/0x740 [scsi_mod]
  ([<000003ff801fcd74>] scsi_queue_rq+0x3fc/0x740 [scsi_mod])
   [<00000000349c9970>] blk_mq_dispatch_rq_list+0x390/0x680
   [<00000000349d1596>] blk_mq_sched_dispatch_requests+0x196/0x1a8
   [<00000000349c7a04>] __blk_mq_run_hw_queue+0x144/0x160
   [<00000000349c7ab6>] __blk_mq_delay_run_hw_queue+0x96/0x228
   [<00000000349c7d5a>] blk_mq_run_hw_queue+0xd2/0xe0
   [<00000000349d194a>] blk_mq_sched_insert_request+0x192/0x1d8
   [<00000000349c17b8>] blk_execute_rq_nowait+0x80/0x90
   [<00000000349c1856>] blk_execute_rq+0x6e/0xb0
   [<000003ff801f8ac2>] __scsi_execute+0xe2/0x1f0 [scsi_mod]
   [<000003ff801fef98>] scsi_probe_and_add_lun+0x358/0x840 [scsi_mod]
   [<000003ff8020001c>] __scsi_scan_target+0xc4/0x228 [scsi_mod]
   [<000003ff80200254>] scsi_scan_target+0xd4/0x100 [scsi_mod]
   [<000003ff802d8b96>] fc_scsi_scan_rport+0x96/0xc0 [scsi_transport_fc]
   [<0000000034245ce8>] process_one_work+0x458/0x7d0
   [<00000000342462a2>] worker_thread+0x242/0x448
   [<0000000034250994>] kthread+0x15c/0x170
   [<0000000034e1979c>] ret_from_fork+0x30/0x38
  INFO: lockdep is turned off.
  Last Breaking-Event-Address:
   [<000003ff801fbc36>] scsi_add_cmd_to_list+0x9e/0xa8 [scsi_mod]
  Kernel panic - not syncing: Fatal exception: panic_on_oops

While this issue is exposed by the commit named above, this is only by
accident. The real issue exists for longer already - basically since it's
possible to use blk-mq via scsi-mq, and blk-mq pre-allocates all requests
for a tag-set during initialization of the same. For a given Scsi_Host
object this is done when adding the object to the midlayer
(`scsi_add_host()` and such). In `scsi_mq_setup_tags()` the midlayer
calculates how much memory is required for a single scsi_cmnd, and its
additional data, which also might include space for additional protection
data - depending on whether the Scsi_Host has any form of protection
capabilities (`scsi_host_get_prot()`).

The problem is now thus, because zfcp does this step before we actually
know whether the firmware/hardware has these capabilities, we don't set any
protection capabilities in the Scsi_Host object. And so, no space is
allocated for additional protection data for requests in the Scsi_Host
tag-set.

Once we go through discover and initialize the FCP device firmware/hardware
fully (this is done via the firmware commands "Exchange Config Data" and
"Exchange Port Data") we find out whether it actually supports DIF and DIX,
and we set the corresponding capabilities in the Scsi_Host object (in
`zfcp_scsi_set_prot()`). Now the Scsi_Host potentially has protection
capabilities, but the already allocated requests in the tag-set don't have
any space allocated for that.

When we then trigger target scanning or add scsi_devices manually, the
midlayer will use requests from that tag-set, and before sending most
requests, it will also call `scsi_mq_prep_fn()`. To prepare the scsi_cmnd
this function will check again whether the used Scsi_Host has any
protection capabilities - and now it potentially has - and if so, it will
try to initialize the assumed to be preallocated structures and thus it
causes the crash, like shown above.

Before delaying the default elevator initialization with the commit named
above, we always would also allocate an elevator for any scsi_device before
ever sending any requests - in contrast to now, where we do it after
device-probing. That elevator in turn would have its own tag-set, and that
is initialized after we went through discovery and initialization of the
underlying firmware/hardware. So requests from that tag-set can be
allocated properly, and if used - unless the user changes/disabled the
default elevator - this would hide the underlying issue.

To fix this for any configuration - with or without an elevator - we move
the allocation and registration of the Scsi_Host object for a given FCP
device to after the first complete discovery and initialization of the
underlying firmware/hardware. By doing that we can make all basic
properties of the Scsi_Host known to the midlayer by the time we call
`scsi_add_host()`, including whether we have any protection capabilities.

To do that we have to delay all the accesses that we would have done in the
past during discovery and initialization, and do them instead once we are
finished with it. The previous patches ramp up to this by fencing and
factoring out all these accesses, and make it possible to re-do them later
on. In addition we make also use of the diagnostic buffers we recently
added with

commit 92953c6e0a ("scsi: zfcp: signal incomplete or error for sync exchange config/port data")
commit 7e418833e6 ("scsi: zfcp: diagnostics buffer caching and use for exchange port data")
commit 088210233e ("scsi: zfcp: add diagnostics buffer for exchange config data")

(first released in v5.5), because these already cache all the information
we need for that "re-do operation" - the information cached are always
updated during xconf or xport data, so it won't be stale.

In addition to the move and re-do, this patch also updates the
function-documentation of `zfcp_scsi_adapter_register()` and changes how it
reports if a Scsi_Host object already exists. In that case future
recovery-operations can skip this step completely and behave much like they
would do in the past - zfcp does not release a once allocated Scsi_Host
object unless the corresponding FCP device is deconstructed completely.

Link: https://lore.kernel.org/r/030dd6da318bbb529f0b5268ec65cebcd20fc0a3.1588956679.git.bblock@linux.ibm.com
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-05-11 23:19:52 -04:00
Benjamin Block
71159b6ecb scsi: zfcp: Fence early sysfs interfaces for accesses of shost objects
When setting an adapter online for the first time, we also create a couple
of entries for it in the sysfs device tree. This is also true even if the
adapter has not yet ever gone successfully through exchange config and
exchange port data.

When moving the scsi host object allocation and registration to after the
first exchange config and exchange port data, this make the `port_rescan`
attribute susceptible to invalid pointer-dereferences of the shost field
before the adapter is fully initialized.

When written to, it schedules a `scan_work` item that will in turn make use
of the associated fibre channel host object to check the topology used for
this FCP device.

Because scanning for remote ports can't be done successfully without
completing exchange config and exchange port data first, we can simply
fence `port_rescan`, and so prevent the illegal access.

As with cases where we can't get a reference to the adapter, we also return
-ENODEV here. Applications need to handle that errno today already.

After a successful allocation of the scsi host object nothing changes in
the work flow.

Link: https://lore.kernel.org/r/ef65366d309993ca91b6917727590ca7ca166c8f.1588956679.git.bblock@linux.ibm.com
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-05-11 23:19:50 -04:00
Benjamin Block
971f2abb4c scsi: zfcp: Fence adapter status propagation for common statuses
Common status flags that all main objects - adapter, port, and unit -
support are propagated to sub-objects when set or cleared. For instance,
when setting the status ZFCP_STATUS_COMMON_ERP_INUSE for an adapter object,
we will propagate this to all its child ports and units - same for when
clearing a common status flag.

Units of an adapter object are enumerated via __shost_for_each_device()
over the scsi host object of the corresponding adapter.

Once we move the scsi host object allocation and registration to after the
first exchange config and exchange port data, this won't be possible for
cases where we set or clear common statuses during the very first adapter
recovery.

But since we won't have any port or unit objects yet at that point of time,
we can just fence the status propagation for cases where the scsi host
object is not yet set in the adapter object. It won't change any effective
status propagations, but will prevent us from dereferencing invalid
pointers.

For any later point in the work flow the scsi host object will be set and
thus nothing is changed then.

Link: https://lore.kernel.org/r/f51fe5f236a1e3d1ce53379c308777561bfe35e1.1588956679.git.bblock@linux.ibm.com
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-05-11 23:19:50 -04:00
Benjamin Block
ac007adc4d scsi: zfcp: Move p-t-p port allocation to after xport data
When doing the very first adapter recovery - initialization - for a FCP
device in a point-to-point topology we also allocate the port object
corresponding to the attached remote port, and trigger a port recovery for
it that will run after the adapter recovery finished.

Right now this happens right after we finished with the exchange config
data command, and uses the fibre channel host object corresponding to the
FCP device to determine whether a point-to-point topology is used.

When moving the scsi host object allocation and registration - and thus
also the fibre channel host object allocation - to after the first exchange
config and exchange port data, this use of the fc_host object is not
possible anymore at that point in the work flow.

But the allocation and recovery trigger doesn't have notable side-effects
on the following exchange port data processing, so we can move those to
after xport data, and thus also to after the scsi host object allocation,
once we move it. Then the fc_host object can be used again, like it is now.

For any further adapter recoveries this doesn't change anything, because at
that point the port object already exists and recovery is triggered
elsewhere for existing port objects.

Link: https://lore.kernel.org/r/73e5d4ac21e2b37bf0c3ca8e530bc5a5c6e74f8f.1588956679.git.bblock@linux.ibm.com
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-05-11 23:19:49 -04:00
Benjamin Block
990486f3a8 scsi: zfcp: Fence fc_host updates during link-down handling
When receiving a notification that a FCP device lost its local link we
usually update the fibre channel host object which represents that FCP
device to reflect that.

This notification/information can also surface when the FCP device is
running through adapter recovery (exchange config and exchange port data
return incomplete).

When moving the scsi host object allocation and registration - and thus
also the fibre channel host object allocation - to after the first exchange
config and exchange port data, and this happens during the very first
adapter recovery, these updates can not be done until after the scsi host
object is allocated.

Reorder the fc_host updates in zfcp_fsf_fc_host_link_down() so that they
only happen after a check of whether the scsi host object is already
allocated or not.

During the first adapter recovery this will cause the skip of these updates
if a link-down condition is detected, but we can repeat them after we
allocated the scsi host object, if necessary.

For any further link-down handling the only changes in the work flow are
the slightly reordered assignments in zfcp_fsf_fc_host_link_down().

Link: https://lore.kernel.org/r/f841f2cda61dcd7b8549910c44e1831927459edf.1588956679.git.bblock@linux.ibm.com
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-05-11 23:19:48 -04:00
Benjamin Block
52e61fde5e scsi: zfcp: Move fc_host updates during xport data handling into fenced function
When executing exchange port data for a FCP device for the first time, or
after an adapter recovery, we update several properties of the fibre
channel host object which represents that FCP device.

When moving the scsi host object allocation and registration - and thus
also the fibre channel host object allocation - to after the first exchange
config and exchange port data, this is not possible for the former case.

Move all these update into separate, and fenced function that first checks
whether the scsi host object already exists or not, before making the
updates.

During the first ever exchange port data in the adapter life cycle this
will make the exchange port data handler skip over this update step, but we
can repeat it later, after we allocated the scsi host object.

For any further recovery of that adapter the work flow is only changed
slightly because then the scsi host object already exists and we don't free
it until we release the adapter completely at the end of its life cycle.

Link: https://lore.kernel.org/r/ae454c2dc6da0b02907c489af91d0b211d331825.1588956679.git.bblock@linux.ibm.com
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-05-11 23:19:48 -04:00
Benjamin Block
bd1684817d scsi: zfcp: Move shost updates during xconfig data handling into fenced function
When executing exchange config data for a FCP device for the first time, or
after an adapter recovery, we update several properties of the scsi host or
fibre channel host object that represent that FCP device.

When moving the scsi host object allocation and registration - and thus
also the fibre channel host object allocation - to after the first exchange
config and exchange port data, this is not possible for the former case.

Move all these update into separate, and fenced function that first checks
whether the scsi host object already exists or not, before making the
updates.

During the first ever exchange config data in the adapter life cycle this
will make the exchange config data handler skip over this update step, but
we can repeat it later, after we allocated the scsi host object.

For any further recovery of that adapter the work flow is only changed
slightly because then the scsi host object already exists and we don't free
it until we release the adapter completely at the end of its life cycle.

Link: https://lore.kernel.org/r/5fc3f4d38d4334f7aa595497c6f7865fb1102e0f.1588956679.git.bblock@linux.ibm.com
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-05-11 23:19:47 -04:00
Benjamin Block
978857c7e3 scsi: zfcp: Move shost modification after QDIO (re-)open into fenced function
When establishing and activating the QDIO queue pair for a FCP device for
the first time, or after an adapter recovery, we publish some of its
characteristics to the scsi host object representing that FCP device.

When moving the scsi host object allocation and registration to after the
first exchange config and exchange port data, this is not possible for the
former case - QDIO open for the first time - because that happens before
exchange config and exchange port data.

Move the scsi host object update into a fenced function that checks whether
the object already exists or not. This way we can repeat that step later,
once we are past the allocation.

Once the first recovery succeeds we don't release the scsi host object
anymore, so further recoveries do work as before.

Link: https://lore.kernel.org/r/a214ebf508f71e3690113e3e90edab1cea0e24e3.1588956679.git.bblock@linux.ibm.com
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-05-11 23:19:46 -04:00
Linus Torvalds
93f3321f65 SCSI misc on 20200410
This is a batch of changes that didn't make it in the initial pull
 request because the lpfc series had to be rebased to redo an incorrect
 split.  It's basically driver updates to lpfc, target, bnx2fc and ufs
 with the rest being minor updates except the sr_block_release one
 which fixes a use after free introduced by the removal of the global
 mutex in the first patch set.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXpC3hSYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishRTaAP9umhxu
 8rRnJ5hsxXRmxOUzO5BGe403ffcBeAiEKQ2n3gEAjeoxZAaqKuDDDRfXyRnBpt9Z
 QuBrgpm1gdXrJT5DDj4=
 =+4Qg
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull more SCSI updates from James Bottomley:
 "This is a batch of changes that didn't make it in the initial pull
  request because the lpfc series had to be rebased to redo an incorrect
  split.

  It's basically driver updates to lpfc, target, bnx2fc and ufs with the
  rest being minor updates except the sr_block_release one which fixes a
  use after free introduced by the removal of the global mutex in the
  first patch set"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (35 commits)
  scsi: core: Add DID_ALLOC_FAILURE and DID_MEDIUM_ERROR to hostbyte_table
  scsi: ufs: Use ufshcd_config_pwr_mode() when scaling gear
  scsi: bnx2fc: fix boolreturn.cocci warnings
  scsi: zfcp: use fallthrough;
  scsi: aacraid: do not overwrite retval in aac_reset_adapter()
  scsi: sr: Fix sr_block_release()
  scsi: aic7xxx: Remove more FreeBSD-specific code
  scsi: mpt3sas: Fix kernel panic observed on soft HBA unplug
  scsi: ufs: set device as active power mode after resetting device
  scsi: iscsi: Report unbind session event when the target has been removed
  scsi: lpfc: Change default SCSI LUN QD to 64
  scsi: libfc: rport state move to PLOGI if all PRLI retry exhausted
  scsi: libfc: If PRLI rejected, move rport to PLOGI state
  scsi: bnx2fc: Update the driver version to 2.12.13
  scsi: bnx2fc: Fix SCSI command completion after cleanup is posted
  scsi: bnx2fc: Process the RQE with CQE in interrupt context
  scsi: target: use the stack for XCOPY passthrough cmds
  scsi: target: increase XCOPY I/O size
  scsi: target: avoid per-loop XCOPY buffer allocations
  scsi: target: drop xcopy DISK BLOCK LENGTH debug
  ...
2020-04-10 12:21:11 -07:00
Julian Wiedmann
1da1092dbf s390/qdio: remove cdev from init_data
It's no longer needed.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-04-06 13:13:50 +02:00
Julian Wiedmann
d8564e19da s390/qdio: allow for non-contiguous SBAL array in init_data
Upper-layer drivers allocate their SBALs by calling qdio_alloc_buffers()
for each individual queue. But when later passing the SBAL addresses to
qdio_establish(), they need to be in a single array of pointers.
So if the driver uses multiple Input or Output queues, it needs to
allocate a temporary array just to present all its SBAL pointers in this
layout.

This patch slightly changes the format of the QDIO initialization data,
so that drivers can pass a per-queue array where each element points to
a queue's SBAL array.
zfcp doesn't use multiple queues, so the impact there is trivial.
For qeth this brings a nice reduction in complexity, and removes
a page-sized allocation.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-04-06 13:13:50 +02:00
Julian Wiedmann
ad96401cdb zfcp: inline zfcp_qdio_setup_init_data()
In preparation for a subsequent patch, move the setup of init_data into
the only caller.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-04-06 13:13:50 +02:00
Julian Wiedmann
3db1db93e3 s390/qdio: cleanly split alloc and establish
All that qdio_allocate() actually uses from the init_data is the cdev,
and the number of Input and Output Queues. Have the driver pass those as
parameters, and defer the init_data processing into qdio_establish().
This includes writing per-device(!) trace entries, and most of the
sanity checks.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-04-06 13:13:50 +02:00
Linus Torvalds
79f51b7b9c SCSI misc on 20200402
update changing all our txt files to rst ones.  Excluding that, we
 have the usual driver updates (qla2xxx, ufs, lpfc, zfcp, ibmvfc,
 pm80xx, aacraid), a treewide update for scnprintf and some other minor
 updates.  The major core update is Hannes moving functions out of the
 aacraid driver and into the core.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXoYKiyYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishSasAP4iGwSB
 Y8tFaZgWadu76+wj5MdqTBoXdhnIuFF0rZG3pQEAiIKdsfQlbSFdm75+gUtx5hG/
 GOilX/pJczTRJDCGNis=
 =g7Sk
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "This series has a huge amount of churn because it pulls in Mauro's doc
  update changing all our txt files to rst ones.

  Excluding that, we have the usual driver updates (qla2xxx, ufs, lpfc,
  zfcp, ibmvfc, pm80xx, aacraid), a treewide update for scnprintf and
  some other minor updates.

  The major core change is Hannes moving functions out of the aacraid
  driver and into the core"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (223 commits)
  scsi: aic7xxx: aic97xx: Remove FreeBSD-specific code
  scsi: ufs: Do not rely on prefetched data
  scsi: dc395x: remove dc395x_bios_param
  scsi: libiscsi: Fix error count for active session
  scsi: hpsa: correct race condition in offload enabled
  scsi: message: fusion: Replace zero-length array with flexible-array member
  scsi: qedi: Add PCI shutdown handler support
  scsi: qedi: Add MFW error recovery process
  scsi: ufs: Enable block layer runtime PM for well-known logical units
  scsi: ufs-qcom: Override devfreq parameters
  scsi: ufshcd: Let vendor override devfreq parameters
  scsi: ufshcd: Update the set frequency to devfreq
  scsi: ufs: Resume ufs host before accessing ufs device
  scsi: ufs-mediatek: customize the delay for enabling host
  scsi: ufs: make HCE polling more compact to improve initialization latency
  scsi: ufs: allow custom delay prior to host enabling
  scsi: ufs-mediatek: use common delay function
  scsi: ufs: introduce common and flexible delay function
  scsi: ufs: use an enum for host capabilities
  scsi: ufs: fix uninitialized tx_lanes in ufshcd_disable_tx_lcc()
  ...
2020-04-02 17:03:53 -07:00
Joe Perches
cec9cbac52 scsi: zfcp: use fallthrough;
Convert the various uses of fallthrough comments to fallthrough;

Done via script
Link: https://lore.kernel.org/lkml/b56602fcf79f849e733e7b521bb0e17895d390fa.1582230379.git.joe.com/

Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Fedor Loshakov <loshakov@linux.ibm.com>
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
[bblock@linux.ibm.com: resolved merge conflict with recently upstream-sent patch "zfcp: expose fabric name as common fc_host sysfs attribute"]
Link: https://lore.kernel.org/r/d14669a67a17392490d3184117941123765db1a4.1585663010.git.bblock@linux.ibm.com
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-03-31 22:24:02 -04:00
Jens Remus
42cabdaf10 scsi: zfcp: log FC Endpoint Security errors
Log any FC Endpoint Security errors to the kernel ring buffer with rate-
limiting.

Link: https://lore.kernel.org/r/20200312174505.51294-11-maier@linux.ibm.com
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-03-17 13:12:42 -04:00
Jens Remus
e53d92856e scsi: zfcp: enhance handling of FC Endpoint Security errors
Enable for explicit FCP channel FC Endpoint Security error reporting and
handle any FSF security errors according to specification. Take the
following recovery actions when a FSF_SECURITY_ERROR is reported for the
specified FSF commands:

- Open Port: Retry the command if possible
- Send FCP : Physically close the remote port and reopen

For Open Port the command status is set to error, which triggers a retry.
For Send FCP the command status is set to error and recovery is triggered
to physically reopen the remote port.

Link: https://lore.kernel.org/r/20200312174505.51294-10-maier@linux.ibm.com
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-03-17 13:12:41 -04:00
Jens Remus
616da39e00 scsi: zfcp: trace FC Endpoint Security of FCP devices and connections
Trace changes in Fibre Channel Endpoint Security capabilities of FCP
devices as well as changes in Fibre Channel Endpoint Security state of
their connections to FC remote ports as FC Endpoint Security changes with
trace level 3 in HBA DBF.

A change in FC Endpoint Security capabilities of FCP devices is traced as
response to FSF command FSF_QTCB_EXCHANGE_PORT_DATA with a trace tag of
"fsfcesa" and a WWPN of ZFCP_DBF_INVALID_WWPN = 0x0000000000000000 (see
FC-FS-4 §18 "Name_Identifier Formats", NAA field).

A change in FC Endpoint Security state of connections between FCP devices
and FC remote ports is traced as response to FSF command
FSF_QTCB_OPEN_PORT_WITH_DID with a trace tag of "fsfcesp".

Example trace record of FC Endpoint Security capability change of FCP
device formatted with zfcpdbf from s390-tools:

Timestamp      : ...
Area           : HBA
Subarea        : 00
Level          : 3
Exception      : -
CPU ID         : ...
Caller         : 0x...
Record ID      : 5                    ZFCP_DBF_HBA_FCES
Tag            : fsfcesa              FSF FC Endpoint Security adapter
Request ID     : 0x...
Request status : 0x00000010
FSF cmnd       : 0x0000000e           FSF_QTCB_EXCHANGE_PORT_DATA
FSF sequence no: 0x...
FSF issued     : ...
FSF stat       : 0x00000000           FSF_GOOD
FSF stat qual  : n/a
Prot stat      : n/a
Prot stat qual : n/a
Port handle    : 0x00000000           none (invalid)
LUN handle     : n/a
WWPN           : 0x0000000000000000   ZFCP_DBF_INVALID_WWPN
FCES old       : 0x00000000           old FC Endpoint Security
FCES new       : 0x00000007           new FC Endpoint Security

Example trace record of FC Endpoint Security change of connection to
FC remote port formatted with zfcpdbf from s390-tools:

Timestamp      : ...
Area           : HBA
Subarea        : 00
Level          : 3
Exception      : -
CPU ID         : ...
Caller         : 0x...
Record ID      : 5                    ZFCP_DBF_HBA_FCES
Tag            : fsfcesp              FSF FC Endpoint Security port
Request ID     : 0x...
Request status : 0x00000010
FSF cmnd       : 0x00000005           FSF_QTCB_OPEN_PORT_WITH_DID
FSF sequence no: 0x...
FSF issued     : ...
FSF stat       : 0x00000000           FSF_GOOD
FSF stat qual  : n/a
Prot stat      : n/a
Prot stat qual : n/a
Port handle    : 0x...
WWPN           : 0x500507630401120c   WWPN
FCES old       : 0x00000000           old FC Endpoint Security
FCES new       : 0x00000004           new FC Endpoint Security

Link: https://lore.kernel.org/r/20200312174505.51294-9-maier@linux.ibm.com
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-03-17 13:12:40 -04:00
Jens Remus
f0d26ae847 scsi: zfcp: log FC Endpoint Security of connections
Log the usage of and subsequent changes in FC Endpoint Security of
connections between FCP devices and FC remote ports to the kernel ring
buffer. Activation of FC Endpoint Security is logged as informational.
Change and deactivation are logged as warning.

No logging takes place, if FC Endpoint Security is not used (i.e. never
activated) on a connection or if it does not change during reopen of a port
(e.g. due to adapter or port recovery).

Link: https://lore.kernel.org/r/20200312174505.51294-8-maier@linux.ibm.com
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Fedor Loshakov <loshakov@linux.ibm.com>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-03-17 13:12:39 -04:00
Jens Remus
a17c784600 scsi: zfcp: report FC Endpoint Security in sysfs
Add an interface to read Fibre Channel Endpoint Security information of FCP
channels and their connections to FC remote ports. It comes in the form of
new sysfs attributes that are attached to the CCW device representing the
FCP device and its zfcp port objects.

The read-only sysfs attribute "fc_security" of a CCW device representing a
FCP device shows the FC Endpoint Security capabilities of the device.
Possible values are: "unknown", "unsupported", "none", or a comma-
separated list of one or more mnemonics and/or one hexadecimal value
representing the supported FC Endpoint Security:

  Authentication: Authentication supported
  Encryption    : Encryption supported

The read-only sysfs attribute "fc_security" of a zfcp port object shows the
FC Endpoint Security used on the connection between its parent FCP device
and the FC remote port. Possible values are: "unknown", "unsupported",
"none", or a mnemonic or hexadecimal value representing the FC Endpoint
Security used:

  Authentication: Connection has been authenticated
  Encryption    : Connection is encrypted

Both sysfs attributes may return hexadecimal values instead of mnemonics,
if the mnemonic lookup table does not contain an entry for the FC Endpoint
Security reported by the FCP device.

Link: https://lore.kernel.org/r/20200312174505.51294-7-maier@linux.ibm.com
Reviewed-by: Fedor Loshakov <loshakov@linux.ibm.com>
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-03-17 13:12:38 -04:00
Jens Remus
185f2d2d59 scsi: zfcp: auto variables for dereferenced structs in open port handler
Introduce automatic variables for adapter and QTCB bottom in
zfcp_fsf_open_port_handler(). This facilitates subsequent changes to meet
the 80 character per line limit.

Link: https://lore.kernel.org/r/20200312174505.51294-6-maier@linux.ibm.com
Reviewed-by: Fedor Loshakov <loshakov@linux.ibm.com>
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-03-17 13:12:36 -04:00
Steffen Maier
7e0e4e0958 scsi: zfcp: fix fc_host attributes that should be unknown on local link down
When we get an unsolicited notification on local link went down,
zfcp_fsf_status_read_link_down() calls zfcp_fsf_link_down_info_eval().
This only blocks rports, and sets ZFCP_STATUS_ADAPTER_LINK_UNPLUGGED and
ZFCP_STATUS_COMMON_ERP_FAILED. Only the fc_host port_state changes to
"Linkdown", because zfcp_scsi_get_host_port_state() is an active callback
and uses the adapter status.

Other fc_host attributes model, port_id, port_type, speed, fabric_name (and
zfcp device attributes card_version, peer_wwpn, peer_wwnn, peer_d_id) which
depend on a local link, continued to show their last known "good" value.

Only if something triggered an exchange config data, some values were
updated to their unknown equivalent via case
FSF_EXCHANGE_CONFIG_DATA_INCOMPLETE due to local link down.  Triggers for
exchange config data are adapter recovery, or reading any of the following
zfcp-specific scsi host sysfs attributes "requests", "megabytes", or
"seconds_active" in /sys/devices/css*/*.*.*/*.*.*/host*/scsi_host/host*/.

The other fc_host attributes active_fc4s and permanent_port_name continued
to show their last known "good" value.  Only if something triggered an
exchange port data, some values changed.  Active_fc4s became all zeros as
unknown equivalent during link down.  Permanent_port_name does not depend
on a local link. But for non-NPIV FCP devices, permanent_port_name
erroneously became whatever value fc_host port_name had at that point in
time (see previous paragraph).  Triggers for exchange port data are the
zfcp-specific scsi host sysfs attribute "utilization", or
[{reset,get}_fc_host_stats] write anything into "reset_statistics" or read
any of the other attributes under
/sys/devices/css*/*.*.*/*.*.*/host*/fc_host/host*/statistics/.

(cf. v4.9 commit bd77befa5b ("zfcp: fix fc_host port_type with NPIV"))

This is particularly confusing when using "lszfcp -b <fcpdevbusid> -Ha" or
dbginfo.sh which read fc_host attributes and also scsi_host attributes.
After link down, the first invocation produces (abbreviated):

Class = "fc_host"
    active_fc4s         = "0x00 0x00 0x01 0x00 ..."
    ...
    fabric_name         = "0x10000027f8e04c49"
    ...
    permanent_port_name = "0xc05076e4588059c1"
    port_id             = "0x244800"
    port_state          = "Linkdown"
    port_type           = "NPort (fabric via point-to-point)"
    ...
    speed               = "16 Gbit"
Class = "scsi_host"
    ...
    megabytes           = "0 0"
    ...
    requests            = "0 0 0"
    seconds_active      = "37"
    ...
    utilization         = "0 0 0"

The second and next invocations produce (abbreviated):

Class = "fc_host"
    active_fc4s         = "0x00 0x00 0x00 0x00 ..."
    ...
    fabric_name         = "0x0"
    ...
    permanent_port_name = "0x0"
    port_id             = "0x000000"
    port_state          = "Linkdown"
    port_type           = "Unknown"
    ...
    speed               = "unknown"
Class = "scsi_host"
    ...
    megabytes           = "0 0"
    ...
    requests            = "0 0 0"
    seconds_active      = "38"
    ...
    utilization         = "0 0 0"

Factor out the resetting of local link dependent fc_host attributes from
zfcp_fsf_exchange_config_data_handler() case
FSF_EXCHANGE_CONFIG_DATA_INCOMPLETE into a new helper function
zfcp_fsf_fc_host_link_down().  All code places that detect local link down
(SRB, FSF_PROT_LINK_DOWN, xconf data/port incomplete) call
zfcp_fsf_link_down_info_eval().  Call the new helper from there. This works
because zfcp_fsf_link_down_info_eval() and thus the helper is called before
zfcp_fsf_exchange_{config,port}_evaluate().

Port_name and node_name are always valid, so never reset them.

Get the permanent_port_name from exchange port data unconditionally as it
always has a valid known good value, even during link down.

Note: Rather than hardcode in zfcp_fsf_exchange_config_evaluate(), fc_host
supported_classes could theoretically get its value from
fsf_qtcb_bottom_port.class_of_service in zfcp_fsf_exchange_port_evaluate().

When the link comes back, we get a different notification, perform adapter
recovery, and this triggers an implicit exchange config data followed by
exchange port data filling in the link dependent fc_host attributes with
known good values again.

Link: https://lore.kernel.org/r/20200312174505.51294-5-maier@linux.ibm.com
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-03-17 13:12:36 -04:00
Steffen Maier
538c6e910b scsi: zfcp: wire previously driver-specific sysfs attributes also to fc_host
Manufacturer, HBA model, firmware version, and hardware version.  Use the
same value format as for the driver-specific attributes.  Keep the
driver-specific attributes for stable user space sysfs API.

Link: https://lore.kernel.org/r/20200312174505.51294-4-maier@linux.ibm.com
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-03-17 13:12:35 -04:00
Steffen Maier
e05a10a055 scsi: zfcp: expose fabric name as common fc_host sysfs attribute
FICON Express8S or older, as well as card features newer than FICON
Express16S+ have no certain firmware level requirement.

FICON Express16S or FICON Express16S+ have the following
minimum firmware level requirements to show a proper fabric name value:

 z13 machine
  FICON Express16S  , MCL P08424.005 , LIC version 0x00000721
 z14 machine
  FICON Express16S  , MCL P42611.008 , LIC version 0x10200069
  FICON Express16S+ , MCL P42625.010 , LIC version 0x10300147

Otherwise, the read value is not the fabric name.

Each FCP channel of these card features might need one SAN fabric re-login
after concurrent microcode update in order to show the proper fabric name.
Possible ways to trigger a SAN fabric re-login are one of: Pull fibres
between FCP channel port and SAN switch port on either side and re-plug,
disable SAN switch port adjacent to FCP channel port and re-enable switch
port, or at Service Element toggle off all CHPIDs of FCP channel over all
LPARs and toggle CHPIDs on again.  Zfcp operating subchannels (FCP devices)
on such FCP channel recovers a fabric re-login.

Initialize fabric name for any topology and have it an invalid WWPN 0x0 for
anything but fabric topology.  Otherwise for e.g. point-to-point topology
one could see the initial -1 from fc_host_setup() and after a link unplug
our fabric name would turn to 0x0 (with subsequent commit ("zfcp: fix
fc_host attributes that should be unknown on local link down") and stay 0x0
on link replug.  I did not initialize to 0x0 somewhere even earlier in the
code path such that it would not flap from real to 0x0 to real on e.g. an
exchange config data with fabric topology.

Link: https://lore.kernel.org/r/20200312174505.51294-3-maier@linux.ibm.com
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-03-17 13:12:34 -04:00
Steffen Maier
819732be9f scsi: zfcp: fix missing erp_lock in port recovery trigger for point-to-point
v2.6.27 commit cc8c282963 ("[SCSI] zfcp: Automatically attach remote
ports") introduced zfcp automatic port scan.

Before that, the user had to use the sysfs attribute "port_add" of an FCP
device (adapter) to add and open remote (target) ports, even for the remote
peer port in point-to-point topology. That code path did a proper port open
recovery trigger taking the erp_lock.

Since above commit, a new helper function zfcp_erp_open_ptp_port()
performed an UNlocked port open recovery trigger. This can race with other
parallel recovery triggers. In zfcp_erp_action_enqueue() this could corrupt
e.g. adapter->erp_total_count or adapter->erp_ready_head.

As already found for fabric topology in v4.17 commit fa89adba19 ("scsi:
zfcp: fix infinite iteration on ERP ready list"), there was an endless loop
during tracing of rport (un)block.  A subsequent v4.18 commit 9e156c54ac
("scsi: zfcp: assert that the ERP lock is held when tracing a recovery
trigger") introduced a lockdep assertion for that case.

As a side effect, that lockdep assertion now uncovered the unlocked code
path for PtP. It is from within an adapter ERP action:

zfcp_erp_strategy[1479]  intentionally DROPs erp lock around
                         zfcp_erp_strategy_do_action()
zfcp_erp_strategy_do_action[1441]      NO erp lock
zfcp_erp_adapter_strategy[876]         NO erp lock
zfcp_erp_adapter_strategy_open[855]    NO erp lock
zfcp_erp_adapter_strategy_open_fsf[806]NO erp lock
zfcp_erp_adapter_strat_fsf_xconf[772]  erp lock only around
                                       zfcp_erp_action_to_running(),
                                       BUT *_not_* around
                                       zfcp_erp_enqueue_ptp_port()
zfcp_erp_enqueue_ptp_port[728]         BUG: *_not_* taking erp lock
_zfcp_erp_port_reopen[432]             assumes to be called with erp lock
zfcp_erp_action_enqueue[314]           assumes to be called with erp lock
zfcp_dbf_rec_trig[288]                 _checks_ to be called with erp lock:
	lockdep_assert_held(&adapter->erp_lock);

It causes the following lockdep warning:

WARNING: CPU: 2 PID: 775 at drivers/s390/scsi/zfcp_dbf.c:288
                            zfcp_dbf_rec_trig+0x16a/0x188
no locks held by zfcperp0.0.17c0/775.

Fix this by using the proper locked recovery trigger helper function.

Link: https://lore.kernel.org/r/20200312174505.51294-2-maier@linux.ibm.com
Fixes: cc8c282963 ("[SCSI] zfcp: Automatically attach remote ports")
Cc: <stable@vger.kernel.org> #v2.6.27+
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-03-17 13:12:33 -04:00
Linus Torvalds
7557c1b3f7 SCSI fixes on 20200229
Four small fixes.  Three are in drivers for fairly obvious bugs.  The
 fourth is a set of regressions introduced by the compat_ioctl changes
 because some of the compat updates wrongly replaced .ioctl instead of
 .compat_ioctl.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXlpxDCYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishSXsAPwOGPkU
 ObFbUs75Tdmk1M7jqtxgBsNhuNta0S8d7dJ3aAEA/YBtGGQWoeEGivUKwzwA4cwL
 1w1GbhPEblpMNO8keVA=
 =I7qk
 -----END PGP SIGNATURE-----

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Four small fixes.

  Three are in drivers for fairly obvious bugs. The fourth is a set of
  regressions introduced by the compat_ioctl changes because some of the
  compat updates wrongly replaced .ioctl instead of .compat_ioctl"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: compat_ioctl: cdrom: Replace .ioctl with .compat_ioctl in four appropriate places
  scsi: zfcp: fix wrong data and display format of SFP+ temperature
  scsi: sd_sbc: Fix sd_zbc_report_zones()
  scsi: libfc: free response frame from GPN_ID
2020-02-29 09:58:47 -06:00
Benjamin Block
a3fd4bfe85 scsi: zfcp: fix wrong data and display format of SFP+ temperature
When implementing support for retrieval of local diagnostic data from the
FCP channel, the wrong data format was assumed for the temperature of the
local SFP+ connector. The Fibre Channel Link Services (FC-LS-3)
specification is not clear on the format of the stored integer, and only
after consulting the SNIA specification SFF-8472 did we realize it is
stored as two's complement. Thus, the used data and display format is
wrong, and highly misleading for users when the temperature should drop
below 0°C (however unlikely that may be).

To fix this, change the data format in `struct fsf_qtcb_bottom_port` from
unsigned to signed, and change the printf format string used to generate
`zfcp_sysfs_adapter_diag_sfp_temperature_show()` from `%hu` to `%hd`.

Link: https://lore.kernel.org/r/d6e3be5428da5c9490cfff4df7cae868bc9f1a7e.1582039501.git.bblock@linux.ibm.com
Fixes: a10a61e807 ("scsi: zfcp: support retrieval of SFP Data via Exchange Port Data")
Fixes: 6028f7c4cd ("scsi: zfcp: introduce sysfs interface for diagnostics of local SFP transceiver")
Cc: <stable@vger.kernel.org> # 5.5+
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Reviewed-by: Fedor Loshakov <loshakov@linux.ibm.com>
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-02-24 12:51:15 -05:00
Julian Wiedmann
2db01da8d2 s390/qdio: fill SBALEs with absolute addresses
sbale->addr holds an absolute address (or for some FCP usage, an opaque
request ID), and should only be used with proper virt/phys translation.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-02-19 17:26:32 +01:00
Steffen Maier
100843f176 scsi: zfcp: trace channel log even for FCP command responses
While v2.6.26 commit b75db73159 ("[SCSI] zfcp: Add qtcb dump to hba debug
trace") is right that we don't want to flood the (payload) trace ring
buffer, we don't trace successful FCP command responses by default.  So we
can include the channel log for problem determination with failed responses
of any FSF request type.

Fixes: b75db73159 ("[SCSI] zfcp: Add qtcb dump to hba debug trace")
Fixes: a54ca0f62f ("[SCSI] zfcp: Redesign of the debug tracing for HBA records.")
Cc: <stable@vger.kernel.org> #2.6.38+
Link: https://lore.kernel.org/r/e37597b5c4ae123aaa85fd86c23a9f71e994e4a9.1572018132.git.bblock@linux.ibm.com
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-28 22:16:15 -04:00
Steffen Maier
e76acc5194 scsi: zfcp: proper indentation to reduce confusion in zfcp_erp_required_act
No functional change.

The unary not operator only applies to the sub expression before the
logical or. So we return early if (not running) or failed.

Link: https://lore.kernel.org/r/df4f897f6e83eaa528465d0858d5a22daac47a2f.1572018132.git.bblock@linux.ibm.com
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-28 22:16:15 -04:00
Benjamin Block
48910f8c35 scsi: zfcp: move maximum age of diagnostic buffers into a per-adapter variable
Replace the static define (ZFCP_DIAG_MAX_AGE) with a per-adapter variable
(${adapter}->diagnostics->max_age). This new variable is exported via
sysfs, along with other, already existing adapter variables, and can both
be read and written. This way users can choose how much time should pass
between refreshes of diagnostic buffers. The default value for the age
remains to be five seconds.

By setting this new variable to 0, the caching of diagnostic buffers for
userspace accesses can also be completely removed.

All diagnostic buffers of a given adapter are subject to this setting in
the same way.

Link: https://lore.kernel.org/r/b1d0977cc884b16dd4ca6418e4320c56a4c31d63.1572018132.git.bblock@linux.ibm.com
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-28 22:16:15 -04:00
Benjamin Block
8a72db70b5 scsi: zfcp: implicitly refresh config-data diagnostics when reading sysfs
Adds implicit updates of cached diagnostics via Exchange Config Data when
reading sysfs attributes interfacing them. Right now this only affects the
new B2B-Credit diagnostic attribute.

This uses the same mechanism previously also used for cached diagnostics
of Exchange Port Data.

Link: https://lore.kernel.org/r/60a94f55f2630b74b468fed5f39880208abb2679.1572018132.git.bblock@linux.ibm.com
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-28 22:16:15 -04:00
Benjamin Block
5a2876f0d1 scsi: zfcp: introduce sysfs interface to read the local B2B-Credit
In addition to the diagnostic data from the local SFP transceiver this
patch adds an interface to read the advertised buffer-to-buffer credit from
the local FC_Port.

With this patch the userspace-interface will only read data stored in the
corresponding "diagnostic buffer" (that was stored during completion of a
previous Exchange Config Data command). Implicit updating will follow later
in this series.

Link: https://lore.kernel.org/r/8a53aef87b53c50cfb1a3425b799bacb6f82b832.1572018132.git.bblock@linux.ibm.com
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-28 22:16:15 -04:00
Benjamin Block
8155eb0785 scsi: zfcp: implicitly refresh port-data diagnostics when reading sysfs
This patch adds implicit updates to the sysfs entries that read the
diagnostic data stored in the "caching buffer" for Exchange Port Data.

An update is triggered once the buffer is older than ZFCP_DIAG_MAX_AGE
milliseconds (5s). This entails sending an Exchange Port Data command to
the FCP-Channel, and during its ingress path updating the cached data and
the timestamp. To prevent multiple concurrent userspace-applications from
triggering this update in parallel we synchronize all of them using a
wait-queue (waiting threads are interruptible; the updating thread is not).

Link: https://lore.kernel.org/r/c145b5cfc99a63b6a018b1184fbd27bb09c955f5.1572018132.git.bblock@linux.ibm.com
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-28 22:16:15 -04:00
Benjamin Block
6028f7c4cd scsi: zfcp: introduce sysfs interface for diagnostics of local SFP transceiver
This adds an interface to read the diagnostics of the local SFP transceiver
of an FCP-Channel from userspace. This comes in the form of new sysfs
entries that are attached to the CCW device representing the FCP
device. Each type of data gets its own sysfs entry; the whole collection of
entries is pooled into a new child-directory of the CCW device node:
"diagnostics".

Adds sysfs entries for:
 * sfp_invalid:    boolean value evaluating to whether the following 5
                   fields are invalid; {0, 1}; 1 - invalid
 * temperature:    transceiver temp.; unit 1/256°C;
                   range [-128°C, +128°C]
 * vcc:            supply voltage; unit 100μV; range [0, 6.55V]
 * tx_bias:        transmitter laser bias current; unit 2μA;
                   range [0, 131mA]
 * tx_power:       coupled TX output power; unit 0.1μW; range [0, 6.5mW]
 * rx_power:       received optical power; unit 0.1μW; range [0, 6.5mW]

 * optical_port:   boolean value evaluating to whether the FCP-Channel has
                   an optical port; {0, 1}; 1 - optical
 * fec_active:     boolean value evaluating to whether 16G FEC is active;
                   {0, 1}; 1 - active
 * port_tx_type:   nibble describing the port type; {0, 1, 2, 3};
                   0 - unknown,             1 - short wave,
                   2 - long wave LC 1310nm, 3 - long wave LL 1550nm
 * connector_type: two bits describing the connector type; {0, 1};
                   0 - unknown,             1 - SFP+

This is only supported if the FCP-Channel in turn supports reporting the
SFP Diagnostic Data, otherwise read() on these new entries will return
EOPNOTSUPP (this affects only adapters older than FICON Express8S, on
Mainframe generations older than z14). Other possible errors for read()
include ENOLINK, ENODEV and ENOMEM.

With this patch the userspace-interface will only read data stored in
the corresponding "diagnostic buffer" (that was stored during completion
of an previous Exchange Port Data command). Implicit updating will
follow later in this series.

Link: https://lore.kernel.org/r/1f9cce7c829c881e7d71a3f10c5b57f3dd84ab32.1572018132.git.bblock@linux.ibm.com
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-28 22:16:15 -04:00
Benjamin Block
a10a61e807 scsi: zfcp: support retrieval of SFP Data via Exchange Port Data
A new FCP channel feature allows us to read the diagnostics from our local
SFP transceivers. To make use of that add a flag
(FSF_FEATURE_REQUEST_SFP_DATA) to the feature-set we request from the FCP
channel. Whether the channel actually implements this can be determined via
an other new flag (FSF_FEATURE_REPORT_SFP_DATA), that is set in the
adapter_features field of the adapter structure after Exchange Config Data
finished.

Also add the corresponding definitions in the QTCB Bottom for Exchange Port
Data. These new definitions are only valid, if FSF_FEATURE_REPORT_SFP_DATA
is set.

Link: https://lore.kernel.org/r/ee1eba4de71eb06b4d82207ad4f428429346156f.1572018132.git.bblock@linux.ibm.com
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-28 22:16:15 -04:00
Benjamin Block
088210233e scsi: zfcp: add diagnostics buffer for exchange config data
In the same vein as the previous patch, add diagnostic data capture for the
Exchange Config Data command.

Link: https://lore.kernel.org/r/7d8ac0a6cad403fa8f8b888693476a84e80a277b.1572018131.git.bblock@linux.ibm.com
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-28 22:16:14 -04:00
Benjamin Block
7e418833e6 scsi: zfcp: diagnostics buffer caching and use for exchange port data
The FCP channel exposes two central interfaces to receive information about
the local FCP-Adapter/-Port: Exchange Port and Exchange Config Data. Using
these commands can negatively impact the adapter if we allow them to be
sent at a very high rate.

The later parts of this patchset will introduce new user-interfaces to
receive more diagnostics from the adapter. To prevent any negative impact
from using those, this patch adds a simple caching-mechanism that will
prevent a malicious/faulty userspace-application from generating an
abnormal high amount of Exchange Port/Config Data traffic.

Relevant diagnostic data that is received via Exchange Config/Port Data is
cached in buffers associated with the corresponding adapter-struct.  Each
buffer is associated with a timestamp that signals how old the data is,
and, added via a following patch in this series, lets userspace-interfaces
determine when the data is too old and needs to be updated.

Buffer-updates are made during the normal response path of the
corresponding command. With this patch only the output of the Exchange Port
Data command is captured.

Link: https://lore.kernel.org/r/054ca020ce0a53dc0d9176428bea373898944e6a.1572018130.git.bblock@linux.ibm.com
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-28 22:16:14 -04:00
Benjamin Block
92953c6e0a scsi: zfcp: signal incomplete or error for sync exchange config/port data
Adds a new FSF-Request status flag (ZFCP_STATUS_FSFREQ_XDATAINCOMPLETE)
that signal that the data received using Exchange Config Data or Exchange
Port Data was incomplete. This new flags is set in the respective handlers
during the response path.

With this patch, only the synchronous FSF-functions for each command got
support for the new flag, otherwise it is transparent.

Together with this new flag and already existing status flags the
synchronous FSF-functions are extended to now detect whether the received
data is complete, incomplete or completely invalid (this includes cases
where a command ran into a timeout). This is now signaled back to the
caller, where previously only failures on the request path would result in
a bad return-code.

For complete data the return-code remains 0. For incomplete data a new
return-code -EAGAIN is added to the function-interface. For completely
invalid data the already existing return-code -EIO is reused - formerly
this was used to signal failures on the request path.

Existing callers of the FSF-functions are adjusted so that they behave as
before for return-code 0 and -EAGAIN, to not change the user-interface. As
-EIO existed all along, it was already exposed to the user - and needed
handling - and will now also be exposed in this new special case.

Link: https://lore.kernel.org/r/e14f0702fa2b00a4d1f37c7981a13f2dd1ea2c83.1572018130.git.bblock@linux.ibm.com
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-28 22:16:14 -04:00
Steffen Maier
2190168aae scsi: zfcp: fix reaction on bit error threshold notification
On excessive bit errors for the FCP channel ingress fibre path, the channel
notifies us.  Previously, we only emitted a kernel message and a trace
record.  Since performance can become suboptimal with I/O timeouts due to
bit errors, we now stop using an FCP device by default on channel
notification so multipath on top can timely failover to other paths.  A new
module parameter zfcp.ber_stop can be used to get zfcp old behavior.

User explanation of new kernel message:

 * Description:
 * The FCP channel reported that its bit error threshold has been exceeded.
 * These errors might result from a problem with the physical components
 * of the local fibre link into the FCP channel.
 * The problem might be damage or malfunction of the cable or
 * cable connection between the FCP channel and
 * the adjacent fabric switch port or the point-to-point peer.
 * Find details about the errors in the HBA trace for the FCP device.
 * The zfcp device driver closed down the FCP device
 * to limit the performance impact from possible I/O command timeouts.
 * User action:
 * Check for problems on the local fibre link, ensure that fibre optics are
 * clean and functional, and all cables are properly plugged.
 * After the repair action, you can manually recover the FCP device by
 * writing "0" into its "failed" sysfs attribute.
 * If recovery through sysfs is not possible, set the CHPID of the device
 * offline and back online on the service element.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Cc: <stable@vger.kernel.org> #2.6.30+
Link: https://lore.kernel.org/r/20191001104949.42810-1-maier@linux.ibm.com
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-03 21:45:53 -04:00
Linus Torvalds
f65420df91 SCSI fixes on 20190720
This is the final round of mostly small fixes in our initial submit.
 It's mostly minor fixes and driver updates.  The only change of note
 is adding a virt_boundary_mask to the SCSI host and host template to
 parametrise this for NVMe devices instead of having them do a call in
 slave_alloc.  It's a fairly straightforward conversion except in the
 two NVMe handling drivers that didn't set it who now have a virtual
 infinity parameter added.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXTJS/yYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishQTNAQCsTdkA
 IN1BvDBbE+KO8mvL5DuRxLtnDU6Pq5K6fkrE3gD/a1GkqyPPaJIuspq7fQY87DH/
 o7VsJd/5uGphIE2Ls+M=
 =38XV
 -----END PGP SIGNATURE-----

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "This is the final round of mostly small fixes in our initial submit.

  It's mostly minor fixes and driver updates. The only change of note is
  adding a virt_boundary_mask to the SCSI host and host template to
  parametrise this for NVMe devices instead of having them do a call in
  slave_alloc. It's a fairly straightforward conversion except in the
  two NVMe handling drivers that didn't set it who now have a virtual
  infinity parameter added"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (24 commits)
  scsi: megaraid_sas: set an unlimited max_segment_size
  scsi: mpt3sas: set an unlimited max_segment_size for SAS 3.0 HBAs
  scsi: IB/srp: set virt_boundary_mask in the scsi host
  scsi: IB/iser: set virt_boundary_mask in the scsi host
  scsi: storvsc: set virt_boundary_mask in the scsi host template
  scsi: ufshcd: set max_segment_size in the scsi host template
  scsi: core: take the DMA max mapping size into account
  scsi: core: add a host / host template field for the virt boundary
  scsi: core: Fix race on creating sense cache
  scsi: sd_zbc: Fix compilation warning
  scsi: libfc: fix null pointer dereference on a null lport
  scsi: zfcp: fix GCC compiler warning emitted with -Wmaybe-uninitialized
  scsi: zfcp: fix request object use-after-free in send path causing wrong traces
  scsi: zfcp: fix request object use-after-free in send path causing seqno errors
  scsi: megaraid_sas: Update driver version to 07.710.50.00
  scsi: megaraid_sas: Add module parameter for FW Async event logging
  scsi: megaraid_sas: Enable msix_load_balance for Invader and later controllers
  scsi: megaraid_sas: Fix calculation of target ID
  scsi: lpfc: reduce stack size with CONFIG_GCC_PLUGIN_STRUCTLEAK_VERBOSE
  scsi: devinfo: BLIST_TRY_VPD_PAGES for SanDisk Cruzer Blade
  ...
2019-07-20 10:04:58 -07:00
Benjamin Block
4846470888 scsi: zfcp: fix GCC compiler warning emitted with -Wmaybe-uninitialized
GCC v9 emits this warning:
      CC      drivers/s390/scsi/zfcp_erp.o
    drivers/s390/scsi/zfcp_erp.c: In function 'zfcp_erp_action_enqueue':
    drivers/s390/scsi/zfcp_erp.c:217:26: warning: 'erp_action' may be used uninitialized in this function [-Wmaybe-uninitialized]
      217 |  struct zfcp_erp_action *erp_action;
          |                          ^~~~~~~~~~

This is a possible false positive case, as also documented in the GCC
documentations:
    https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wmaybe-uninitialized

The actual code-sequence is like this:
    Various callers can invoke the function below with the argument "want"
    being one of:
    ZFCP_ERP_ACTION_REOPEN_ADAPTER,
    ZFCP_ERP_ACTION_REOPEN_PORT_FORCED,
    ZFCP_ERP_ACTION_REOPEN_PORT, or
    ZFCP_ERP_ACTION_REOPEN_LUN.

    zfcp_erp_action_enqueue(want, ...)
        ...
        need = zfcp_erp_required_act(want, ...)
            need = want
            ...
            maybe: need = ZFCP_ERP_ACTION_REOPEN_PORT
            maybe: need = ZFCP_ERP_ACTION_REOPEN_ADAPTER
            ...
            return need
        ...
        zfcp_erp_setup_act(need, ...)
            struct zfcp_erp_action *erp_action; // <== line 217
            ...
            switch(need) {
            case ZFCP_ERP_ACTION_REOPEN_LUN:
                    ...
                    erp_action = &zfcp_sdev->erp_action;
                    WARN_ON_ONCE(erp_action->port != port); // <== access
                    ...
                    break;
            case ZFCP_ERP_ACTION_REOPEN_PORT:
            case ZFCP_ERP_ACTION_REOPEN_PORT_FORCED:
                    ...
                    erp_action = &port->erp_action;
                    WARN_ON_ONCE(erp_action->port != port); // <== access
                    ...
                    break;
            case ZFCP_ERP_ACTION_REOPEN_ADAPTER:
                    ...
                    erp_action = &adapter->erp_action;
                    WARN_ON_ONCE(erp_action->port != NULL); // <== access
                    ...
                    break;
            }
            ...
            WARN_ON_ONCE(erp_action->adapter != adapter); // <== access

When zfcp_erp_setup_act() is called, 'need' will never be anything else
than one of the 4 possible enumeration-names that are used in the
switch-case, and 'erp_action' is initialized for every one of them, before
it is used. Thus the warning is a false positive, as documented.

We introduce the extra if{} in the beginning to create an extra code-flow,
so the compiler can be convinced that the switch-case will never see any
other value.

BUG_ON()/BUG() is intentionally not used to not crash anything, should
this ever happen anyway - right now it's impossible, as argued above; and
it doesn't introduce a 'default:' switch-case to retain warnings should
'enum zfcp_erp_act_type' ever be extended and no explicit case be
introduced. See also v5.0 commit 399b6c8bc9 ("scsi: zfcp: drop old
default switch case which might paper over missing case").

Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-07-11 21:04:23 -04:00
Benjamin Block
106d45f350 scsi: zfcp: fix request object use-after-free in send path causing wrong traces
When tracing instances where we open and close WKA ports, we also pass the
request-ID of the respective FSF command.

But after successfully sending the FSF command we must not use the
request-object anymore, as this might result in an use-after-free (see
"zfcp: fix request object use-after-free in send path causing seqno
errors" ).

To fix this add a new variable that caches the request-ID before sending
the request. This won't change during the hand-off to the FCP channel,
and so it's safe to trace this cached request-ID later, instead of using
the request object.

Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Fixes: d27a7cb919 ("zfcp: trace on request for open and close of WKA port")
Cc: <stable@vger.kernel.org> #2.6.38+
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-07-11 21:04:22 -04:00
Benjamin Block
b76becde2b scsi: zfcp: fix request object use-after-free in send path causing seqno errors
With a recent change to our send path for FSF commands we introduced a
possible use-after-free of request-objects, that might further lead to
zfcp crafting bad requests, which the FCP channel correctly complains
about with an error (FSF_PROT_SEQ_NUMB_ERROR). This error is then handled
by an adapter-wide recovery.

The following sequence illustrates the possible use-after-free:

    Send Path:

        int zfcp_fsf_open_port(struct zfcp_erp_action *erp_action)
        {
                struct zfcp_fsf_req *req;
                ...
                spin_lock_irq(&qdio->req_q_lock);
        //                     ^^^^^^^^^^^^^^^^
        //                     protects QDIO queue during sending
                ...
                req = zfcp_fsf_req_create(qdio,
                                          FSF_QTCB_OPEN_PORT_WITH_DID,
                                          SBAL_SFLAGS0_TYPE_READ,
                                          qdio->adapter->pool.erp_req);
        //            ^^^^^^^^^^^^^^^^^^^
        //            allocation of the request-object
                ...
                retval = zfcp_fsf_req_send(req);
                ...
                spin_unlock_irq(&qdio->req_q_lock);
                return retval;
        }

        static int zfcp_fsf_req_send(struct zfcp_fsf_req *req)
        {
                struct zfcp_adapter *adapter = req->adapter;
                struct zfcp_qdio *qdio = adapter->qdio;
                ...
                zfcp_reqlist_add(adapter->req_list, req);
        //      ^^^^^^^^^^^^^^^^
        //      add request to our driver-internal hash-table for tracking
        //      (protected by separate lock req_list->lock)
                ...
                if (zfcp_qdio_send(qdio, &req->qdio_req)) {
        //          ^^^^^^^^^^^^^^
        //          hand-off the request to FCP channel;
        //          the request can complete at any point now
                        ...
                }

                /* Don't increase for unsolicited status */
                if (!zfcp_fsf_req_is_status_read_buffer(req))
        //           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        //           possible use-after-free
                        adapter->fsf_req_seq_no++;
        //                       ^^^^^^^^^^^^^^^^
        //                       because of the use-after-free we might
        //                       miss this accounting, and as follow-up
        //                       this results in the FCP channel error
        //                       FSF_PROT_SEQ_NUMB_ERROR
                adapter->req_no++;

                return 0;
        }

        static inline bool
        zfcp_fsf_req_is_status_read_buffer(struct zfcp_fsf_req *req)
        {
                return req->qtcb == NULL;
        //             ^^^^^^^^^
        //             possible use-after-free
        }

    Response Path:

        void zfcp_fsf_reqid_check(struct zfcp_qdio *qdio, int sbal_idx)
        {
                ...
                struct zfcp_fsf_req *fsf_req;
                ...
                for (idx = 0; idx < QDIO_MAX_ELEMENTS_PER_BUFFER; idx++) {
                        ...
                        fsf_req = zfcp_reqlist_find_rm(adapter->req_list,
                                                       req_id);
        //                        ^^^^^^^^^^^^^^^^^^^^
        //                        remove request from our driver-internal
        //                        hash-table (lock req_list->lock)
                        ...
                        zfcp_fsf_req_complete(fsf_req);
                }
        }

        static void zfcp_fsf_req_complete(struct zfcp_fsf_req *req)
        {
                ...
                if (likely(req->status & ZFCP_STATUS_FSFREQ_CLEANUP))
                        zfcp_fsf_req_free(req);
        //              ^^^^^^^^^^^^^^^^^
        //              free memory for request-object
                else
                        complete(&req->completion);
        //              ^^^^^^^^
        //              completion notification for code-paths that wait
        //              synchronous for the completion of the request; in
        //              those the memory is freed separately
        }

The result of the use-after-free only affects the send path, and can not
lead to any data corruption. In case we miss the sequence-number
accounting, because the memory was already re-purposed, the next FSF
command will fail with said FCP channel error, and we will recover the
whole adapter. This causes no additional errors, but it slows down
traffic.  There is a slight chance of the same thing happen again
recursively after the adapter recovery, but so far this has not been seen.

This was seen under z/VM, where the send path might run on a virtual CPU
that gets scheduled away by z/VM, while the return path might still run,
and so create the necessary timing. Running with KASAN can also slow down
the kernel sufficiently to run into this user-after-free, and then see the
report by KASAN.

To fix this, simply pull the test for the sequence-number accounting in
front of the hand-off to the FCP channel (this information doesn't change
during hand-off), but leave the sequence-number accounting itself where it
is.

To make future regressions of the same kind less likely, add comments to
all closely related code-paths.

Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Fixes: f9eca02276 ("scsi: zfcp: drop duplicate fsf_command from zfcp_fsf_req which is also in QTCB header")
Cc: <stable@vger.kernel.org> #5.0+
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-07-11 21:04:22 -04:00
Linus Torvalds
1f7563f743 SCSI sg on 20190709
This topic branch covers a fundamental change in how our sg lists are
 allocated to make mq more efficient by reducing the size of the
 preallocated sg list.  This necessitates a large number of driver
 changes because the previous guarantee that if a driver specified
 SG_ALL as the size of its scatter list, it would get a non-chained
 list and didn't need to bother with scatterlist iterators is now
 broken and every driver *must* use scatterlist iterators.
 
 This was broken out as a separate topic because we need to convert all
 the drivers before pulling the trigger and unconverted drivers kept
 being found, necessitating a rebase.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXSTzzCYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishZB+AP9I8j/s
 wWfg0Z3WNuf4D5I3rH4x1J3cQTqPJed+RjwgcQEA1gZvtOTg1ZEn/CYMVnaB92x0
 t6MZSchIaFXeqfD+E7U=
 =cv8o
 -----END PGP SIGNATURE-----

Merge tag 'scsi-sg' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI scatter-gather list updates from James Bottomley:
 "This topic branch covers a fundamental change in how our sg lists are
  allocated to make mq more efficient by reducing the size of the
  preallocated sg list.

  This necessitates a large number of driver changes because the
  previous guarantee that if a driver specified SG_ALL as the size of
  its scatter list, it would get a non-chained list and didn't need to
  bother with scatterlist iterators is now broken and every driver
  *must* use scatterlist iterators.

  This was broken out as a separate topic because we need to convert all
  the drivers before pulling the trigger and unconverted drivers kept
  being found, necessitating a rebase"

* tag 'scsi-sg' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (21 commits)
  scsi: core: don't preallocate small SGL in case of NO_SG_CHAIN
  scsi: lib/sg_pool.c: clear 'first_chunk' in case of no preallocation
  scsi: core: avoid preallocating big SGL for data
  scsi: core: avoid preallocating big SGL for protection information
  scsi: lib/sg_pool.c: improve APIs for allocating sg pool
  scsi: esp: use sg helper to iterate over scatterlist
  scsi: NCR5380: use sg helper to iterate over scatterlist
  scsi: wd33c93: use sg helper to iterate over scatterlist
  scsi: ppa: use sg helper to iterate over scatterlist
  scsi: pcmcia: nsp_cs: use sg helper to iterate over scatterlist
  scsi: imm: use sg helper to iterate over scatterlist
  scsi: aha152x: use sg helper to iterate over scatterlist
  scsi: s390: zfcp_fc: use sg helper to iterate over scatterlist
  scsi: staging: unisys: visorhba: use sg helper to iterate over scatterlist
  scsi: usb: image: microtek: use sg helper to iterate over scatterlist
  scsi: pmcraid: use sg helper to iterate over scatterlist
  scsi: ipr: use sg helper to iterate over scatterlist
  scsi: mvumi: use sg helper to iterate over scatterlist
  scsi: lpfc: use sg helper to iterate over scatterlist
  scsi: advansys: use sg helper to iterate over scatterlist
  ...
2019-07-11 15:17:41 -07:00
Ming Lei
013be03840 scsi: s390: zfcp_fc: use sg helper to iterate over scatterlist
Unlike the legacy I/O path, scsi-mq preallocates a large array to hold
the scatterlist for each request. This static allocation can consume
substantial amounts of memory on modern controllers which support a
large number of concurrently outstanding requests.

To facilitate a switch to a smaller static allocation combined with a
dynamic allocation for requests that need it, we need to make sure all
SCSI drivers handle chained scatterlists correctly.

Convert remaining drivers that directly dereference the scatterlist
array to using the iterator functions.

[mkp: clarified commit message]

Cc: Steffen Maier <maier@linux.ibm.com>
Cc: Benjamin Block <bblock@linux.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux-s390@vger.kernel.org
Acked-by: Benjamin Block <bblock@linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-20 15:21:33 -04:00
Steffen Maier
ef4021fe5f scsi: zfcp: fix to prevent port_remove with pure auto scan LUNs (only sdevs)
When the user tries to remove a zfcp port via sysfs, we only rejected it if
there are zfcp unit children under the port. With purely automatically
scanned LUNs there are no zfcp units but only SCSI devices. In such cases,
the port_remove erroneously continued. We close the port and this
implicitly closes all LUNs under the port. The SCSI devices survive with
their private zfcp_scsi_dev still holding a reference to the "removed"
zfcp_port (still allocated but invisible in sysfs) [zfcp_get_port_by_wwpn
in zfcp_scsi_slave_alloc]. This is not a problem as long as the fc_rport
stays blocked. Once (auto) port scan brings back the removed port, we
unblock its fc_rport again by design.  However, there is no mechanism that
would recover (open) the LUNs under the port (no "ersfs_3" without
zfcp_unit [zfcp_erp_strategy_followup_success]).  Any pending or new I/O to
such LUN leads to repeated:

  Done: NEEDS_RETRY Result: hostbyte=DID_IMM_RETRY driverbyte=DRIVER_OK

See also v4.10 commit 6f2ce1c6af ("scsi: zfcp: fix rport unblock race
with LUN recovery"). Even a manual LUN recovery
(echo 0 > /sys/bus/scsi/devices/H:C:T:L/zfcp_failed)
does not help, as the LUN links to the old "removed" port which remains
to lack ZFCP_STATUS_COMMON_RUNNING [zfcp_erp_required_act].
The only workaround is to first ensure that the fc_rport is blocked
(e.g. port_remove again in case it was re-discovered by (auto) port scan),
then delete the SCSI devices, and finally re-discover by (auto) port scan.
The port scan includes an fc_rport unblock, which in turn triggers
a new scan on the scsi target to freshly get new pure auto scan LUNs.

Fix this by rejecting port_remove also if there are SCSI devices
(even without any zfcp_unit) under this port. Re-use mechanics from v3.7
commit d99b601b63 ("[SCSI] zfcp: restore refcount check on port_remove").
However, we have to give up zfcp_sysfs_port_units_mutex earlier in unit_add
to prevent a deadlock with scsi_host scan taking shost->scan_mutex first
and then zfcp_sysfs_port_units_mutex now in our zfcp_scsi_slave_alloc().

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Fixes: b62a8d9b45 ("[SCSI] zfcp: Use SCSI device data zfcp scsi dev instead of zfcp unit")
Fixes: f8210e3488 ("[SCSI] zfcp: Allow midlayer to scan for LUNs when running in NPIV mode")
Cc: <stable@vger.kernel.org> #2.6.37+
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-05-29 21:52:31 -04:00
Steffen Maier
d27e5e07f9 scsi: zfcp: fix missing zfcp_port reference put on -EBUSY from port_remove
With this early return due to zfcp_unit child(ren), we don't use the
zfcp_port reference from the earlier zfcp_get_port_by_wwpn() anymore and
need to put it.

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Fixes: d99b601b63 ("[SCSI] zfcp: restore refcount check on port_remove")
Cc: <stable@vger.kernel.org> #3.7+
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-05-29 21:50:51 -04:00
Steffen Maier
c820657917 scsi: zfcp: reduce flood of fcrscn1 trace records on multi-element RSCN
If an incoming ELS of type RSCN contains more than one element, zfcp
suboptimally causes repeated erp trigger NOP trace records for each
previously failed port. These could be ports that went away.  It loops over
each RSCN element, and for each of those in an inner loop over all
zfcp_ports.

The trigger to recover failed ports should be just the reception of some
RSCN, no matter how many elements it has. So we can loop over failed ports
separately, and only then loop over each RSCN element to handle the
non-failed ports.

The call chain was:

  zfcp_fc_incoming_rscn
    for (i = 1; i < no_entries; i++)
      _zfcp_fc_incoming_rscn
        list_for_each_entry(port, &adapter->port_list, list)
          if (masked port->d_id match) zfcp_fc_test_link
          if (!port->d_id) zfcp_erp_port_reopen "fcrscn1"   <===

In order the reduce the "flooding" of the REC trace area in such cases, we
factor out handling the failed ports to be outside of the entries loop:

  zfcp_fc_incoming_rscn
    if (no_entries > 1)                                     <===
      list_for_each_entry(port, &adapter->port_list, list)  <===
        if (!port->d_id) zfcp_erp_port_reopen "fcrscn1"     <===
    for (i = 1; i < no_entries; i++)
      _zfcp_fc_incoming_rscn
        list_for_each_entry(port, &adapter->port_list, list)
          if (masked port->d_id match) zfcp_fc_test_link

Abbreviated example trace records before this code change:

Tag            : fcrscn1
WWPN           : 0x500507630310d327
ERP want       : 0x02
ERP need       : 0x02

Tag            : fcrscn1
WWPN           : 0x500507630310d327
ERP want       : 0x02
ERP need       : 0x00                 NOP => superfluous trace record

The last trace entry repeats if there are more than 2 RSCN elements.

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-27 21:26:12 -04:00
Steffen Maier
242ec14551 scsi: zfcp: fix scsi_eh host reset with port_forced ERP for non-NPIV FCP devices
Suppose more than one non-NPIV FCP device is active on the same channel.
Send I/O to storage and have some of the pending I/O run into a SCSI
command timeout, e.g. due to bit errors on the fibre. Now the error
situation stops. However, we saw FCP requests continue to timeout in the
channel. The abort will be successful, but the subsequent TUR fails.
Scsi_eh starts. The LUN reset fails. The target reset fails.  The host
reset only did an FCP device recovery. However, for non-NPIV FCP devices,
this does not close and reopen ports on the SAN-side if other non-NPIV FCP
device(s) share the same open ports.

In order to resolve the continuing FCP request timeouts, we need to
explicitly close and reopen ports on the SAN-side.

This was missing since the beginning of zfcp in v2.6.0 history commit
ea127f975424 ("[PATCH] s390 (7/7): zfcp host adapter.").

Note: The FSF requests for forced port reopen could run into FSF request
timeouts due to other reasons. This would trigger an internal FCP device
recovery. Pending forced port reopen recoveries would get dismissed. So
some ports might not get fully reopened during this host reset handler.
However, subsequent I/O would trigger the above described escalation and
eventually all ports would be forced reopen to resolve any continuing FCP
request timeouts due to earlier bit errors.

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Cc: <stable@vger.kernel.org> #3.0+
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-27 21:26:12 -04:00
Steffen Maier
fe67888fc0 scsi: zfcp: fix rport unblock if deleted SCSI devices on Scsi_Host
An already deleted SCSI device can exist on the Scsi_Host and remain there
because something still holds a reference.  A new SCSI device with the same
H:C:T:L and FCP device, target port WWPN, and FCP LUN can be created.  When
we try to unblock an rport, we still find the deleted SCSI device and
return early because the zfcp_scsi_dev of that SCSI device is not
ZFCP_STATUS_COMMON_UNBLOCKED. Hence we miss to unblock the rport, even if
the new proper SCSI device would be in good state.

Therefore, skip deleted SCSI devices when iterating the sdevs of the shost.
[cf. __scsi_device_lookup{_by_target}() or scsi_device_get()]

The following abbreviated trace sequence can indicate such problem:

Area           : REC
Tag            : ersfs_3
LUN            : 0x4045400300000000
WWPN           : 0x50050763031bd327
LUN status     : 0x40000000     not ZFCP_STATUS_COMMON_UNBLOCKED
Ready count    : n		not incremented yet
Running count  : 0x00000000
ERP want       : 0x01
ERP need       : 0xc1		ZFCP_ERP_ACTION_NONE

Area           : REC
Tag            : ersfs_3
LUN            : 0x4045400300000000
WWPN           : 0x50050763031bd327
LUN status     : 0x41000000
Ready count    : n+1
Running count  : 0x00000000
ERP want       : 0x01
ERP need       : 0x01

...

Area           : REC
Level          : 4		only with increased trace level
Tag            : ertru_l
LUN            : 0x4045400300000000
WWPN           : 0x50050763031bd327
LUN status     : 0x40000000
Request ID     : 0x0000000000000000
ERP status     : 0x01800000
ERP step       : 0x1000
ERP action     : 0x01
ERP count      : 0x00

NOT followed by a trace record with tag "scpaddy"
for WWPN 0x50050763031bd327.

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Fixes: 6f2ce1c6af ("scsi: zfcp: fix rport unblock race with LUN recovery")
Cc: <stable@vger.kernel.org> #2.6.32+
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-27 21:26:12 -04:00
Linus Torvalds
3591b19511 s390 updates for the 5.1 merge window
- A copy of Arnds compat wrapper generation series
 
  - Pass information about the KVM guest to the host in form the control
    program code and the control program version code
 
  - Map IOV resources to support PCI physical functions on s390
 
  - Add vector load and store alignment hints to improve performance
 
  - Use the "jdd" constraint with gcc 9 to make jump labels working again
 
  - Remove amode workaround for old z/VM releases from the DCSS code
 
  - Add support for in-kernel performance measurements using the
    CPU measurement counter facility
 
  - Introduce a new PMU device cpum_cf_diag to capture counters and
    store thenn as event raw data.
 
  - Bug fixes and cleanups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJcfh4QAAoJEDjwexyKj9rgXVAH/RzVbi3vznldujSNfCFTZKPu
 EmFFAZIfbhifW3szfylyOJL52pFhxjcWzY0hkFEkbs2t90sn8l1BNkDscYZtfNHC
 XvN3N9LsHyxOeyxvQuWLSio58qm+Lr1L0UrIhbMvqyAVkOLmIHvybFwi83OkMptm
 djoL8NbuNsAA2s26y2bZLNtU7FmOW5smJIlnt7H4dmK4SFylqZKS/EnUZxGDgn+7
 UrrTTOQUir0QZ8vraANsP1M0/LqPcd2YusLmj4jOdZ5Muc2Ch2AA991FofqdKShO
 /8cGlsIzwHWGgdnP/YDea5gbetvonayYduixKy3EnYpWQ9iogiBjH4G7QNxcncs=
 =v26J
 -----END PGP SIGNATURE-----

Merge tag 's390-5.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 updates from Martin Schwidefsky:

 - A copy of Arnds compat wrapper generation series

 - Pass information about the KVM guest to the host in form the control
   program code and the control program version code

 - Map IOV resources to support PCI physical functions on s390

 - Add vector load and store alignment hints to improve performance

 - Use the "jdd" constraint with gcc 9 to make jump labels working again

 - Remove amode workaround for old z/VM releases from the DCSS code

 - Add support for in-kernel performance measurements using the CPU
   measurement counter facility

 - Introduce a new PMU device cpum_cf_diag to capture counters and store
   thenn as event raw data.

 - Bug fixes and cleanups

* tag 's390-5.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (54 commits)
  Revert "s390/cpum_cf: Add kernel message exaplanations"
  s390/dasd: fix read device characteristic with CONFIG_VMAP_STACK=y
  s390/suspend: fix prefix register reset in swsusp_arch_resume
  s390: warn about clearing als implied facilities
  s390: allow overriding facilities via command line
  s390: clean up redundant facilities list setup
  s390/als: remove duplicated in-place implementation of stfle
  s390/cio: Use cpa range elsewhere within vfio-ccw
  s390/cio: Fix vfio-ccw handling of recursive TICs
  s390: vfio_ap: link the vfio_ap devices to the vfio_ap bus subsystem
  s390/cpum_cf: Handle EBUSY return code from CPU counter facility reservation
  s390/cpum_cf: Add kernel message exaplanations
  s390/cpum_cf_diag: Add support for s390 counter facility diagnostic trace
  s390/cpum_cf: add ctr_stcctm() function
  s390/cpum_cf: move common functions into a separate file
  s390/cpum_cf: introduce kernel_cpumcf_avail() function
  s390/cpu_mf: replace stcctm5() with the stcctm() function
  s390/cpu_mf: add store cpu counter multiple instruction support
  s390/cpum_cf: Add minimal in-kernel interface for counter measurements
  s390/cpum_cf: introduce kernel_cpumcf_alert() to obtain measurement alerts
  ...
2019-03-05 11:13:10 -08:00
Julian Wiedmann
bdf117674e s390/qdio: make SBAL address array type-safe
There is no need to use void pointers, all drivers are in agreement
about the underlying data structure of the SBAL arrays.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2019-02-07 11:57:07 +01:00
Steffen Maier
b63195698d scsi: zfcp: fix sysfs block queue limit output for max_segment_size
Since v2.6.35 commit 683229845f ("[SCSI] zfcp: Report scatter-gather
limits to SCSI and block layer"), zfcp set dma_parms.max_segment_size ==
PAGE_SIZE (but without using the setter dma_set_max_seg_size()) and
scsi_host_template.dma_boundary == PAGE_SIZE - 1.

v5.0-rc1 commit 50c2e9107f ("scsi: introduce a max_segment_size
host_template parameters") introduced a new field
scsi_host_template.max_segment_size. If an LLDD such as zfcp does not set
it, scsi_host_alloc() uses BLK_MAX_SEGMENT_SIZE = 65536 for
Scsi_Host.max_segment_size. __scsi_init_queue() announced the minimum of
Scsi_Host.max_segment_size and dma_parms.max_segment_size to the block
layer. For zfcp: min(65536, 4096) == 4096 which was still good.

v5.0 commit a8cf59a669 ("scsi: communicate max segment size to the DMA
mapping code") announces Scsi_Host.max_segment_size to the block layer and
overwrites dma_parms.max_segment_size with Scsi_Host.max_segment_size.  For
zfcp dma_parms.max_segment_size == Scsi_Host.max_segment_size == 65536
which is also reflected in block queue limits.

$ cd /sys/bus/ccw/drivers/zfcp
$ cd 0.0.3c40/host5/rport-5:0-4/target5:0:4/5:0:4:10/block/sdi/queue
$ cat max_segment_size
65536

Zfcp I/O still works because dma_boundary implicitly still keeps the
effective max segment size <= PAGE_SIZE.  However, dma_boundary does not
seem visible to user space, but max_segment_size is visible and shows a
misleading wrong value.  Fix it and inherit the stable tag of a8cf59a669.

Devices on our bus ccw support DMA but no DMA mapping. Of multiple device
types on the ccw bus, only zfcp needs dma_parms for SCSI limits.  So, leave
dma_parms setup in zfcp and do not move it to the bus.

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Fixes: 50c2e9107f ("scsi: introduce a max_segment_size host_template parameters")
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-01-29 01:14:59 -05:00
Christoph Hellwig
2a3d4eb8e2 scsi: flip the default on use_clustering
Most SCSI drivers want to enable "clustering", that is merging of
segments so that they might span more than a single page.  Remove the
ENABLE_CLUSTERING define, and require drivers to explicitly set
DISABLE_CLUSTERING to disable this feature.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-12-18 23:13:12 -05:00
Steffen Maier
7171455354 scsi: zfcp: improve kdoc for return of zfcp_status_read_refill()
Complements

v2.6.35 commit 64deb6efdc ("[SCSI] zfcp: Use status_read_buf_num
provided by FCP channel") which replaced the hardcoded 16 with a
variable value

Also complements already existing fixups for above commit

v2.6.35 commit 8d88cf3f3b ("[SCSI] zfcp: Update status read mempool")
v3.10   commit 9edf7d75ee ("[SCSI] zfcp: status read buffers on first adapter open with link down")

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-12-07 22:04:25 -05:00
Steffen Maier
60a161b7e5 scsi: zfcp: fix posting too many status read buffers leading to adapter shutdown
Suppose adapter (open) recovery is between opened QDIO queues and before
(the end of) initial posting of status read buffers (SRBs). This time
window can be seconds long due to FSF_PROT_HOST_CONNECTION_INITIALIZING
causing by design looping with exponential increase sleeps in the function
performing exchange config data during recovery
[zfcp_erp_adapter_strat_fsf_xconf()]. Recovery triggered by local link up.

Suppose an event occurs for which the FCP channel would send an unsolicited
notification to zfcp by means of a previously posted SRB.  We saw it with
local cable pull (link down) in multi-initiator zoning with multiple
NPIV-enabled subchannels of the same shared FCP channel.

As soon as zfcp_erp_adapter_strategy_open_fsf() starts posting the initial
status read buffers from within the adapter's ERP thread, the channel does
send an unsolicited notification.

Since v2.6.27 commit d26ab06ede ("[SCSI] zfcp: receiving an unsolicted
status can lead to I/O stall"), zfcp_fsf_status_read_handler() schedules
adapter->stat_work to re-fill the just consumed SRB from a work item.

Now the ERP thread and the work item post SRBs in parallel.  Both contexts
call the helper function zfcp_status_read_refill().  The tracking of
missing (to be posted / re-filled) SRBs is not thread-safe due to separate
atomic_read() and atomic_dec(), in order to depend on posting
success. Hence, both contexts can see
atomic_read(&adapter->stat_miss) == 1. One of the two contexts posts
one too many SRB. Zfcp gets QDIO_ERROR_SLSB_STATE on the output queue
(trace tag "qdireq1") leading to zfcp_erp_adapter_shutdown() in
zfcp_qdio_handler_error().

An obvious and seemingly clean fix would be to schedule stat_work from the
ERP thread and wait for it to finish. This would serialize all SRB
re-fills. However, we already have another work item wait on the ERP
thread: adapter->scan_work runs zfcp_fc_scan_ports() which calls
zfcp_fc_eval_gpn_ft(). The latter calls zfcp_erp_wait() to wait for all the
open port recoveries during zfcp auto port scan, but in fact it waits for
any pending recovery including an adapter recovery. This approach leads to
a deadlock.  [see also v3.19 commit 18f87a67e6 ("zfcp: auto port scan
resiliency"); v2.6.37 commit d3e1088d68
("[SCSI] zfcp: No ERP escalation on gpn_ft eval");
v2.6.28 commit fca55b6fb5
("[SCSI] zfcp: fix deadlock between wq triggered port scan and ERP")
fixing v2.6.27 commit c57a39a45a
("[SCSI] zfcp: wait until adapter is finished with ERP during auto-port");
v2.6.27 commit cc8c282963
("[SCSI] zfcp: Automatically attach remote ports")]

Instead make the accounting of missing SRBs atomic for parallel execution
in both the ERP thread and adapter->stat_work.

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Fixes: d26ab06ede ("[SCSI] zfcp: receiving an unsolicted status can lead to I/O stall")
Cc: <stable@vger.kernel.org> #2.6.27+
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-12-07 22:03:24 -05:00
Fedor Loshakov
636db60b8e scsi: zfcp: make DIX experimental, disabled, and independent of DIF
Introduce separate zfcp module parameters to individually select support
for: DIF which should work (zfcp.dif, which used to be DIF+DIX, disabled)
or DIX+DIF which can cause trouble (zfcp.dix, new, disabled).

If DIX is enabled, we warn on zfcp driver initialization.  As before, this
also reduces the maximum I/O request size to half, to support the worst
case of merged single sector requests with one protection data scatter
gather element per sector. This can impact the maximum throughput.

In DIF-only mode (zfcp.dif=1 zfcp.dix=0), we can use the full maximum I/O
request size as there is no protection data for zfcp.

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Fedor Loshakov <loshakov@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-12-07 21:36:40 -05:00
Steffen Maier
399b6c8bc9 scsi: zfcp: drop old default switch case which might paper over missing case
This was introduced with v2.6.27 commit 287ac01acf ("[SCSI] zfcp: Cleanup
code in zfcp_erp.c") but would now suppress helpful -Wswitch compiler
warnings when building with W=1 such as the following forced example:

drivers/s390/scsi/zfcp_erp.c: In function 'zfcp_erp_setup_act':
drivers/s390/scsi/zfcp_erp.c:220:2: warning: enumeration value 'ZFCP_ERP_ACTION_REOPEN_PORT' not handled in switch [-Wswitch]
  switch (need) {
  ^~~~~~

But then again, only with W=1 we would notice unhandled enum cases.
Without the default cases and a missed unhandled enum case, the code might
perform unforeseen things we might not want...

As of today, we never run through the removed default case, so removing it
is no functional change.  In the future, we never should run through a
default case but introduce the necessary specific case(s) to handle new
functionality.

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-11-15 15:01:18 -05:00
Steffen Maier
0c902936e5 scsi: zfcp: drop default switch case which might paper over missing case
This was introduced with v4.18 commit 8c3d20aada ("scsi: zfcp: fix
missing REC trigger trace for all objects in ERP_FAILED") but would now
suppress helpful -Wswitch compiler warnings when building with W=1 such as
the following forced example:

drivers/s390/scsi/zfcp_erp.c: In function 'zfcp_erp_handle_failed':
drivers/s390/scsi/zfcp_erp.c:126:2: warning: enumeration value 'ZFCP_ERP_ACTION_REOPEN_PORT_FORCED' not handled in switch [-Wswitch]
  switch (want) {
  ^~~~~~

But then again, only with W=1 we would notice unhandled enum cases.
Without the default cases and a missed unhandled enum case, the code might
perform unforeseen things we might not want...

As of today, we never run through the removed default case, so removing it
is no functional change.  In the future, we never should run through a
default case but introduce the necessary specific case(s) to handle new
functionality.

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-11-15 15:01:18 -05:00
Steffen Maier
3505144e54 scsi: zfcp: silence -Wimplicit-fallthrough in zfcp_erp_lun_strategy()
For some reason the already existing substring "fall through" in the
comment is not sufficient for GCC to silence -Wimplicit-fallthrough.

  CC [M]  drivers/s390/scsi/zfcp_erp.o
drivers/s390/scsi/zfcp_erp.c: In function 'zfcp_erp_lun_strategy':
drivers/s390/scsi/zfcp_erp.c:1065:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
  if (atomic_read(&zfcp_sdev->status) & ZFCP_STATUS_COMMON_OPEN)
      ^
drivers/s390/scsi/zfcp_erp.c:1068:2: note: here
  case ZFCP_ERP_STEP_LUN_CLOSING:
  ^~~~

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-11-15 15:01:18 -05:00
Steffen Maier
623cd180c1 scsi: zfcp: silence remaining kdoc warnings in header files
Improve whatever the following simple invocation reported:
$ ./scripts/kernel-doc -none drivers/s390/scsi/*.h

While at it, improve some related kdoc,
including struct zfcp_fsf_ct_els in zfcp_fsf.h.

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-11-15 15:01:18 -05:00
Steffen Maier
8684d61481 scsi: zfcp: silence all W=1 build warnings for existing kdoc
While at it also improve some copy & paste kdoc mistakes.

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-11-15 15:01:18 -05:00
Steffen Maier
e0effe8935 scsi: zfcp: properly format LUN (and WWPN) for LUN sharing violation kmsg
zfcp: <devbusid>: LUN 0x0 on port 0x5005076......... ...
zfcp: <devbusid>: LUN 0x1000000000000 on port 0x5005076......... ...

should be

zfcp: <devbusid>: LUN 0x0000000000000000 on port 0x5005076......... ...
zfcp: <devbusid>: LUN 0x0001000000000000 on port 0x5005076.........
                  is already in use by CSS., MIF Image ID .

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-11-15 15:01:18 -05:00
Steffen Maier
d5fcdced31 scsi: zfcp: use enum zfcp_erp_act_result for argument/return of affected functions
With that instead of just "int" it becomes clear which functions return
this type and which ones also accept it as argument they just pass through
in some cases or modify in other cases.  v2.6.27 commit 287ac01acf
("[SCSI] zfcp: Cleanup code in zfcp_erp.c") introduced the enum which was
cpp defines previously.

Silence some false -Wswitch compiler warning cases with individual
NOP cases. When adding more enum values and building with W=1 we
would get compiler warnings about missed new cases.

Consistently use the variable name "result", so change "retval" in
zfcp_erp_strategy() to "result". This avoids confusion with other compile
unit variables "retval" having different semantics and type.

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-11-15 15:01:18 -05:00
Steffen Maier
0023beece0 scsi: zfcp: use enum zfcp_erp_steps for struct zfcp_erp_action.step
Use the already defined enum for this purpose to get at least some build
checking (even though an enum is type equivalent to an int in C).  v2.6.27
commit 287ac01acf ("[SCSI] zfcp: Cleanup code in zfcp_erp.c") introduced
the enum which was cpp defines previously.

Since struct zfcp_erp_action type is embedded into other structures living
in zfcp_def.h, we have to move enum zfcp_erp_act_type from its private
definition in zfcp_erp.c to the zfcp-global zfcp_def.h

Silence some false -Wswitch compiler warning cases with individual NOP
cases. When adding more enum values and building with W=1 we would get
compiler warnings about missed new cases.

Add missing break statements in some of the above switch cases.  No
functional change, but making it future-proof.  I think all of these should
have had a break statement ever since, even if these switch cases happened
to be the last ones in the switch statement body.

"Fall through" in the context of switch case usually means not to have a
break and fall through to the subsequent switch case. However, I think this
old comment meant that here we do not have an _early return_ in the switch
case but the code path continues after the switch case body.

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-11-15 15:01:18 -05:00
Steffen Maier
df91eefd08 scsi: zfcp: the action field of zfcp_erp_action is actually the type
&zfcp_erp_action.action ==> &zfcp_erp_action.type

While at it, make use of the already defined enum for this purpose to get
at least some build checking (even though an enum is type equivalent to an
int in C). v2.6.27 commit 287ac01acf ("[SCSI] zfcp: Cleanup code in
zfcp_erp.c") introduced the enum which was cpp defines previously.

To prevent compiler warnings with the switch(act->type), we have to
separate the recently added eyecatchers from enum zfcp_erp_act_type.

Since struct zfcp_erp_action type is embedded into other structures living
in zfcp_def.h, we have to move enum zfcp_erp_act_type from its private
definition in zfcp_erp.c to the zfcp-global zfcp_def.h.

Silence one false -Wswitch compiler warning case: LUNs as the leaves in our
object tree do not have any follow-up success recovery.

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-11-15 15:01:18 -05:00
Steffen Maier
208d096154 scsi: zfcp: clarify function argument name for trace tag string
v2.6.30 commit 5ffd51a5e4 ("[SCSI] zfcp: replace current ERP logging with
a more convenient version") changed trace record distinguishing from a
numerical ID to a 7 character string called "trace tag". While starting to
use function arguments with different type and semantics, it did not change
the argument name accordingly.

v2.6.38 commit ae0904f60f ("[SCSI] zfcp: Redesign of the debug tracing
for recovery actions.") renamed variable names "id" into "tag" but only
within zfcp_dbf.*, not within zfcp_erp.c.

This was a bit confusing since the remainder of zfcp does use the term
"trace tag". Also "id" is quite generic and it's not obvious for what.
Just unify it consistently and use the "dbf" prefix to relate the arguments
to the code in zfcp_dbf.*.

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-11-15 15:01:18 -05:00
Steffen Maier
64eba38418 scsi: zfcp: ERP thread setup kdoc update
zfcp_erp_thread_setup() update complements v2.6.32 commit 347c6a965d
("[SCSI] zfcp: Use kthread API for zfcp erp thread").

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-11-15 15:01:18 -05:00
Steffen Maier
724e144387 scsi: zfcp: update kernel message for invalid FCP_CMND length, it's not the CDB
The CDB is just a part inside of FCP_CMND, see zfcp_fc_scsi_to_fcp().
While at it, fix the device driver reaction: adapter not LUN shutdown.

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-11-15 15:01:17 -05:00
Steffen Maier
9704154fa0 scsi: zfcp: drop duplicate seq_no from zfcp_fsf_req which is also in QTCB header
There is no point for double bookkeeping especially just for tracing.  The
trace can take it from the QTCB which always exists for non-SRB responses
traced with zfcp_dbf_hba_fsf_res().

As a side effect, this removes an alignment hole and reduces the size of
struct zfcp_fsf_req, and thus of each pending request, by 8 bytes.

Before:
$ pahole -C zfcp_fsf_req drivers/s390/scsi/zfcp.ko
...
	struct fsf_qtcb *          qtcb;                 /*   144     8 */
	u32                        seq_no;               /*   152     4 */
	/* XXX 4 bytes hole, try to pack */
	void *                     data;                 /*   160     8 */
...
	/* size: 296, cachelines: 2, members: 14 */
	/* sum members: 288, holes: 2, sum holes: 8 */
	/* last cacheline: 40 bytes */
After:
$ pahole -C zfcp_fsf_req drivers/s390/scsi/zfcp.ko
...
	struct fsf_qtcb *          qtcb;                 /*   144     8 */
	void *                     data;                 /*   152     8 */
...
	/* size: 288, cachelines: 2, members: 13 */
        /* sum members: 284, holes: 1, sum holes: 4 */

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-11-15 15:01:17 -05:00
Steffen Maier
f9eca02276 scsi: zfcp: drop duplicate fsf_command from zfcp_fsf_req which is also in QTCB header
Status read buffers (SRBs, unsolicited notifications) never use a QTCB
[zfcp_fsf_req_create()]. zfcp_fsf_req_send() already uses this to
distinguish SRBs from other FSF request types. We can re-use this method in
zfcp_fsf_req_complete(). Introduce a helper function to make the check for
req->qtcb less magic.

SRBs always are FSF_QTCB_UNSOLICITED_STATUS, so we can hard-code this for
the two trace functions dealing with SRBs.

All other FSF request types have a QTCB and we can get the fsf_command from
there.

zfcp_dbf_hba_fsf_response() and thus zfcp_dbf_hba_fsf_res() are only called
for non-SRB requests so it's safe to dereference the QTCB
[zfcp_fsf_req_complete() returns early on SRB, else calls
zfcp_fsf_protstatus_eval() which calls zfcp_dbf_hba_fsf_response()].  In
zfcp_scsi_forget_cmnd() we guard the QTCB dereference with a preceding NULL
check and rely on boolean shortcut evaluation.

As a side effect, this causes an alignment hole which we can close in
a later patch after having cleaned up all fields of struct zfcp_fsf_req.
Before:
$ pahole -C zfcp_fsf_req drivers/s390/scsi/zfcp.ko
...
	u32                        status;               /*   136     4 */
	u32                        fsf_command;          /*   140     4 */
	struct fsf_qtcb *          qtcb;                 /*   144     8 */
...
After:
$ pahole -C zfcp_fsf_req drivers/s390/scsi/zfcp.ko
...
	u32                        status;               /*   136     4 */
	/* XXX 4 bytes hole, try to pack */
	struct fsf_qtcb *          qtcb;                 /*   144     8 */
...

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-11-15 15:01:17 -05:00
Steffen Maier
2c53d8a0cc scsi: zfcp: drop unnecessary forward prototype for struct zfcp_fsf_req
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-11-15 15:01:17 -05:00
Steffen Maier
21cb0bcc73 scsi: zfcp: group sort internal structure definitions for proximity
Have structures just before the structures that use them (without
disrupting sequences of using structures such as zfcp_unit and
zfcp_scsi_dev):

 - zfcp_adapter_mempool embedded in zfcp_adapter,

 - zfcp_latenc... embedded in zfcp_scsi_dev.

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-11-15 15:01:17 -05:00
Steffen Maier
eb67f93ffa scsi: zfcp: namespace prefix for internal latency data structures
In contrast to struct fsf_qual_latency_info, the ones here are not FSF but
software defined zfcp-internal.

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-11-15 15:01:17 -05:00
Steffen Maier
e0c1da39d7 scsi: zfcp: update width in comment for ZFCP_COMMON_FLAGS mask
v2.6.10 history commit 4062e12b2ba2 ("[PATCH] s390: zfcp act enhancements")
extended this mask by one nibble with the introduction of
ZFCP_STATUS_COMMON_ACCESS_DENIED == 0x00800000 for ACT (access control
table).

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-11-15 15:01:17 -05:00
Steffen Maier
a0e86d9555 scsi: zfcp: move scsi_eh & non-ERP timeout defines owned by and local to zfcp_fsf.c
Also clarify namespace prefix for the timeout used for FSF requests on
behalf of SCSI error recovery: It is zfcp_fsf_ not zfcp_scsi_.

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-11-15 15:01:17 -05:00
Steffen Maier
c24635acce scsi: zfcp: drop unnecessary forward prototype for struct zfcp_reqlist
While struct zfcp_adapter contains a pointer to zfcp_reqlist, the pointer
field does not need to know the structure or even a prototype.

The prototype was introduced with v2.6.34 commit b6bd2fb92a ("[SCSI]
zfcp: Move FSF request tracking code to new file").

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-11-15 15:01:17 -05:00
Steffen Maier
58f3ead547 scsi: zfcp: move SG table helper from aux to fc and make them static
Since commit 663e0890e3 ("[SCSI] zfcp: remove access control tables
interface") these helper functions are only used for auto port scan in
zfcp_fc.c. Also change them to the corresponding namespace prefix.

This is a small cleanup for the miscellaneous catchall compile unit
zfcp_aux.c.

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-11-15 15:01:17 -05:00
zhong jiang
6be552276e scsi: zfcp: remove unnecessary null pointer check before mempool_destroy
mempool_destroy has taken null pointer check into account. so remove the
redundant check.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Acked-by: Benjamin Block <bblock@linux.ibm.com>
[maier@linux.ibm.com: depends on v4.3 4e3ca3e033 ("mm/mempool: allow NULL `pool' pointer in mempool_destroy()")]
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-11-15 15:01:17 -05:00
Vasily Gorbik
c6756b7d0c s390/dasd,zfcp: fix gcc 8 stringop-truncation warnings
ccw "busid" should always be NUL-terminated, as evident from e.g.
get_ccwdev_by_busid doing "return (strcmp(bus_id, dev_name(dev)) == 0)".

Replace all strncpy initializing busid with strlcpy. This fixes the
following gcc 8 warnings:

drivers/s390/scsi/zfcp_aux.c:104:2: warning: 'strncpy' specified bound 20
equals destination size [-Wstringop-truncation]
  strncpy(busid, token, ZFCP_BUS_ID_SIZE);

drivers/s390/block/dasd_eer.c:316:2: warning: 'strncpy' specified bound 10
equals destination size [-Wstringop-truncation]
  strncpy(header.busid, dev_name(&device->cdev->dev), DASD_EER_BUSID_SIZE);

drivers/s390/block/dasd_eer.c:359:2: warning: 'strncpy' specified bound 10
equals destination size [-Wstringop-truncation]
  strncpy(header.busid, dev_name(&device->cdev->dev), DASD_EER_BUSID_SIZE);

drivers/s390/block/dasd_devmap.c:429:3: warning: 'strncpy' specified bound
20 equals destination size [-Wstringop-truncation]
   strncpy(new->bus_id, bus_id, DASD_BUS_ID_SIZE);

Acked-by: Stefan Haberland <sth@linux.ibm.com>
Acked-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-07-02 11:24:52 +02:00
Linus Torvalds
5f85942c2e SCSI misc on 20180610
This is mostly updates to the usual drivers: ufs, qedf, mpt3sas, lpfc,
 xfcp, hisi_sas, cxlflash, qla2xxx.  In the absence of Nic, we're also
 taking target updates which are mostly minor except for the tcmu
 refactor. The only real core change to worry about is the removal of
 high page bouncing (in sas, storvsc and iscsi).  This has been well
 tested and no problems have shown up so far.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCWx1pbCYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishUucAP42pccS
 ziKyiOizuxv9fZ4Q+nXd1A9zhI5tqqpkHjcQegEA40qiZSi3EKGKR8W0UpX7Ntmo
 tqrZJGojx9lnrAM2RbQ=
 =NMXg
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "This is mostly updates to the usual drivers: ufs, qedf, mpt3sas, lpfc,
  xfcp, hisi_sas, cxlflash, qla2xxx.

  In the absence of Nic, we're also taking target updates which are
  mostly minor except for the tcmu refactor.

  The only real core change to worry about is the removal of high page
  bouncing (in sas, storvsc and iscsi). This has been well tested and no
  problems have shown up so far"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (268 commits)
  scsi: lpfc: update driver version to 12.0.0.4
  scsi: lpfc: Fix port initialization failure.
  scsi: lpfc: Fix 16gb hbas failing cq create.
  scsi: lpfc: Fix crash in blk_mq layer when executing modprobe -r lpfc
  scsi: lpfc: correct oversubscription of nvme io requests for an adapter
  scsi: lpfc: Fix MDS diagnostics failure (Rx < Tx)
  scsi: hisi_sas: Mark PHY as in reset for nexus reset
  scsi: hisi_sas: Fix return value when get_free_slot() failed
  scsi: hisi_sas: Terminate STP reject quickly for v2 hw
  scsi: hisi_sas: Add v2 hw force PHY function for internal ATA command
  scsi: hisi_sas: Include TMF elements in struct hisi_sas_slot
  scsi: hisi_sas: Try wait commands before before controller reset
  scsi: hisi_sas: Init disks after controller reset
  scsi: hisi_sas: Create a scsi_host_template per HW module
  scsi: hisi_sas: Reset disks when discovered
  scsi: hisi_sas: Add LED feature for v3 hw
  scsi: hisi_sas: Change common allocation mode of device id
  scsi: hisi_sas: change slot index allocation mode
  scsi: hisi_sas: Introduce hisi_sas_phy_set_linkrate()
  scsi: hisi_sas: fix a typo in hisi_sas_task_prep()
  ...
2018-06-10 13:01:12 -07:00
Jens Remus
16dad27988 scsi: zfcp: enhance comments on fc_link_speed and supported_speed
The comment on fsf_qtcb_bottom_port.supported_speed did read as if the field
can only assume one of two possible values (i.e. 0x1 for 1 GBit/s or 0x2 for
2 GBit/s). This is not true for two reasons: first it is a flag field and
can thus assume any combination and second there are meanwhile more speeds.

Clarify comment on fsf_qtcb_bottom_port.supported_speed and add a comment to
fsf_qtcb_bottom_config.fc_link_speed.

Signed-off-by: Jens Remus <jremus@linux.ibm.com>
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Fedor Loshakov <loshakov@linux.ibm.com>
Acked-by: Benjamin Block <bblock@linux.ibm.com>
Acked-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-05-18 11:28:17 -04:00
Jens Remus
6e2e490080 scsi: zfcp: add port speed capabilities
Add port speed capabilities as defined in FC-LS RPSC ELS that have a
counterpart FC_PORTSPEED_* defined in scsi/scsi_transport_fc.h.

Suggested-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Fedor Loshakov <loshakov@linux.ibm.com>
Acked-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Acked-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-05-18 11:28:16 -04:00
Jens Remus
9e156c54ac scsi: zfcp: assert that the ERP lock is held when tracing a recovery trigger
Otherwise iterating with list_for_each() over the adapter->erp_ready_head
and adapter->erp_running_head lists can lead to an infinite loop. See commit
"zfcp: fix infinite iteration on erp_ready_head list".

The run-time check is only performed for debug kernels which have the kernel
lock validator enabled. Following is an example of the warning that is
reported, if the ERP lock is not held when calling zfcp_dbf_rec_trig():

WARNING: CPU: 0 PID: 604 at drivers/s390/scsi/zfcp_dbf.c:288 zfcp_dbf_rec_trig+0x172/0x188
Modules linked in: ...
CPU: 0 PID: 604 Comm: kworker/u128:3 Not tainted 4.16.0-... #1
Hardware name: IBM 2964 N96 702 (z/VM 6.4.0)
Workqueue: zfcp_q_0.0.1906 zfcp_scsi_rport_work
Krnl PSW : 00000000330fdbf9 00000000367e9728 (zfcp_dbf_rec_trig+0x172/0x188)
           R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:3 PM:0 RI:0 EA:3
Krnl GPRS: 00000000c57a5d99 3288200000000000 0000000000000000 000000006cc82740
           00000000009d09d6 0000000000000000 00000000000000ff 0000000000000000
           0000000000000000 0000000000e1b5fe 000000006de01d38 0000000076130958
           000000006cc82548 000000006de01a98 00000000009d09d6 000000006a6d3c80
Krnl Code: 00000000009d0ad2: eb7ff0b80004        lmg        %r7,%r15,184(%r15)
           00000000009d0ad8: c0f4000d7dd0        brcl       15,b80678
          #00000000009d0ade: a7f40001            brc        15,9d0ae0
          >00000000009d0ae2: a7f4ff7d            brc        15,9d09dc
           00000000009d0ae6: e340f0f00004        lg         %r4,240(%r15)
           00000000009d0aec: eb7ff0b80004        lmg        %r7,%r15,184(%r15)
           00000000009d0af2: 07f4                bcr        15,%r4
           00000000009d0af4: 0707                bcr        0,%r7
Call Trace:
([<00000000009d09d6>] zfcp_dbf_rec_trig+0x66/0x188)
 [<00000000009dd740>] zfcp_scsi_rport_work+0x98/0x190
 [<0000000000169b34>] process_one_work+0x3d4/0x6f8
 [<000000000016a08a>] worker_thread+0x232/0x418
 [<000000000017219e>] kthread+0x166/0x178
 [<0000000000b815ea>] kernel_thread_starter+0x6/0xc
 [<0000000000b815e4>] kernel_thread_starter+0x0/0xc
2 locks held by kworker/u128:3/604:
 #0:  ((wq_completion)name){+.+.}, at: [<0000000082af1024>] process_one_work+0x1dc/0x6f8
 #1:  ((work_completion)(&port->rport_work)){+.+.}, at: [<0000000082af1024>] process_one_work+0x1dc/0x6f8
Last Breaking-Event-Address:
 [<00000000009d0ade>] zfcp_dbf_rec_trig+0x16e/0x188
---[ end trace b2f4020572e2c124 ]---

Suggested-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-05-18 11:28:16 -04:00
Steffen Maier
6919298c98 scsi: zfcp: cleanup indentation for posting FC events
I just happened to see the function header indentation of
zfcp_fc_enqueue_event() and I picked some more from checkpatch:

$ checkpatch.pl --strict -f drivers/s390/scsi/zfcp_fc.c
...
CHECK: Alignment should match open parenthesis
 #113: FILE: drivers/s390/scsi/zfcp_fc.c:113:
+		fc_host_post_event(adapter->scsi_host, fc_get_event_number(),
+				event->code, event->data);

CHECK: Blank lines aren't necessary before a close brace '}'
 #118: FILE: drivers/s390/scsi/zfcp_fc.c:118:
+
+}
...

The change complements v2.6.36 commit 2d1e547f75 ("[SCSI] zfcp: Post
events through FC transport class").

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-05-18 11:28:15 -04:00
Steffen Maier
35e9111a1e scsi: zfcp: support SCSI_ADAPTER_RESET via scsi_host sysfs attribute host_reset
Make use of feature introduced with v3.2 commit 2944369144 ("[SCSI] scsi:
Added support for adapter and firmware reset").  The common code interface
was introduced for commit 95d31262b3 ("[SCSI] qla4xxx: Added support for
adapter and firmware reset").

$ echo adapter > /sys/class/scsi_host/host<N>/host_reset

Example trace record formatted with zfcpdbf from s390-tools:

Timestamp      : ...
Area           : REC
Subarea        : 00
Level          : 1
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 1                      ZFCP_DBF_REC_TRIG
Tag            : scshr_y                SCSI sysfs host_reset yes
LUN            : 0xffffffffffffffff                     none (invalid)
WWPN           : 0x0000000000000000                     none (invalid)
D_ID           : 0x00000000                             none (invalid)
Adapter status : 0x4500050b
Port status    : 0x00000000                             none (invalid)
LUN status     : 0x00000000                             none (invalid)
Ready count    : 0x00000001
Running count  : 0x00000000
ERP want       : 0x04                   ZFCP_ERP_ACTION_REOPEN_ADAPTER
ERP need       : 0x04                   ZFCP_ERP_ACTION_REOPEN_ADAPTER

This is the common code equivalent to the zfcp-specific
&dev_attr_adapter_failed.attr in zfcp_sysfs_adapter_attrs.attrs[]:

$ echo 0 > /sys/bus/ccw/drivers/zfcp/<devbusid>/failed

The unsupported case returns EOPNOTSUPP:

$ echo firmware > /sys/class/scsi_host/host<N>/host_reset
-bash: echo: write error: Operation not supported

Example trace record formatted with zfcpdbf from s390-tools:

Timestamp      : ...
Area           : SCSI
Subarea        : 00
Level          : 1
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 1
Tag            : scshr_n                        SCSI sysfs host_reset no
Request ID     : 0x0000000000000000                     none (invalid)
SCSI ID        : 0xffffffff                             none (invalid)
SCSI LUN       : 0xffffffff                             none (invalid)
SCSI LUN high  : 0xffffffff                             none (invalid)
SCSI result    : 0xffffffa1                     -EOPNOTSUPP==-95
SCSI retries   : 0xff                                   none (invalid)
SCSI allowed   : 0xff                                   none (invalid)
SCSI scribble  : 0xffffffffffffffff                     none (invalid)
SCSI opcode    : ffffffff ffffffff ffffffff ffffffff    none (invalid)
FCP rsp inf cod: 0xff                                   none (invalid)
FCP rsp IU     : 00000000 00000000 00000000 00000000    none (invalid)
                 00000000 00000000

For any other invalid value, common code returns EINVAL without invoking
our callback:

$ echo foo > /sys/class/scsi_host/host<N>/host_reset
-bash: echo: write error: Invalid argument

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-05-18 11:28:15 -04:00
Steffen Maier
b24bf22d72 scsi: zfcp: explicitly support initiator in scsi_host_template
While the default did already correctly print "Initiator" let's make it
explicit and convert zfcp to the feature.

$ cat /sys/class/scsi_host/host0/supported_mode
Initiator

$ cat /sys/class/scsi_host/host0/active_mode
Initiator

The default worked, because not setting the field has it initialized to zero
== MODE_UNKNOWN. scsi_host_alloc() sets shost->active_mode = MODE_INITIATOR
in this case. The sysfs accessor function show_shost_supported_mode()
assumes MODE_INITIATOR in this case.  This default behavior was introduced
with v2.6.24 commit 7a39ac3f25 ("[SCSI] make supported_mode default to
initiator.").  The feature flag was introduced with v2.6.24 commit
5dc2b89e12 ("[SCSI] add supported_mode and active_mode attributes to the
host").  So there was no release where zfcp would have shown "unknown".

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-05-18 11:28:14 -04:00
Steffen Maier
2fdd45fd20 scsi: zfcp: remove unused return values of ERP trigger functions
Since v2.6.27 commit 553448f6c4 ("[SCSI] zfcp: Message cleanup"), none of
the callers has been interested any more.  Values were not returned
consistently in all ERP trigger functions.

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-05-18 11:28:14 -04:00
Steffen Maier
013af857d8 scsi: zfcp: zfcp_erp_action_exists() does only check for running
Simplify its signature to return boolean and rename it to
zfcp_erp_action_is_running() to indicate its actual unmodified semantics.
It has always been used like this since v2.6.0 history commit ea127f975424
("[PATCH] s390 (7/7): zfcp host adapter.").

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-05-18 11:28:13 -04:00
Steffen Maier
cd4a186aaa scsi: zfcp: remove unused ERP enum values
All constant defines were introduced with v2.6.0 history commit ea127f975424
("[PATCH] s390 (7/7): zfcp host adapter.") and refactored into enums with
commit 287ac01acf ("[SCSI] zfcp: Cleanup code in zfcp_erp.c").

ZFCP_STATUS_ERP_DISMISSING and ZFCP_ERP_STEP_FSF_XCONFIG were never used.

v2.6.27 commit 287ac01acf ("[SCSI] zfcp: Cleanup code in zfcp_erp.c")
removed the use of ZFCP_ERP_ACTION_READY on refactoring
zfcp_erp_action_exists() to now only check adapter->erp_running_head but no
longer adapter->erp_ready_head. The same commit could have changed the
function return type from int to "enum zfcp_erp_act_state".
ZFCP_ERP_ACTION_READY was never used outside of zfcp_erp_action_exists().

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-05-18 11:28:12 -04:00
Steffen Maier
d39eda54b7 scsi: zfcp: consistently use function name space prefix
I've been mixing up
zfcp_task_mgmt_function() [SCSI] and
zfcp_fsf_fcp_task_mgmt()  [FSF]
so often lately that I wanted to fix this.

SCSI changes complement v2.6.27 commit f76af7d7e3 ("[SCSI] zfcp: Cleanup
of code in zfcp_scsi.c").

While at it, also fixup the other inconsistencies elsewhere.

ERP changes complement v2.6.27 commit 287ac01acf ("[SCSI] zfcp: Cleanup
code in zfcp_erp.c") which introduced status_change_set().

FC changes complement v2.6.32 commit 6f53a2d2ec ("[SCSI] zfcp: Apply
common naming conventions to zfcp_fc").  by renaming a leftover introduced
with v2.6.27 commit cc8c282963 ("[SCSI] zfcp: Automatically attach remote
ports").

FSF changes fixup v2.6.32 commit a4623c467f ("[SCSI] zfcp: Improve request
allocation through mempools").  which replaced zfcp_fsf_alloc_qtcb()
introduced with v2.6.27 commit c41f8cbddd ("[SCSI] zfcp: zfcp_fsf
cleanup.").

SCSI fc_host statistics were introduced with v2.6.16 commit f6cd94b126
("[SCSI] zfcp: transport class adaptations").

SCSI fc_host port_state was introduced with v2.6.27 commit 85a82392fe
("[SCSI] zfcp: Add port_state attribute to sysfs").

SCSI rport setter for dev_loss_tmo was introduced with v2.6.18 commit
338151e066 ("[SCSI] zfcp: make use of fc_remote_port_delete when target
port is unavailable").

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-05-18 11:28:12 -04:00
Steffen Maier
5c750d58e9 scsi: zfcp: workqueue: set description for port work items with their WWPN as context
As a prerequisite, complement commit 3d1cb2059d ("workqueue: include
workqueue info when printing debug dump of a worker task") to be usable with
kernel modules by exporting the symbol set_worker_desc().  Current built-in
user was introduced with commit ef3b101925 ("writeback: set worker desc to
identify writeback workers in task dumps").

Can help distinguishing work items which do not have adapter scope.
Description is printed out with task dump for debugging on WARN, BUG, panic,
or magic-sysrq [show-task-states(t)].

Example:
$ echo 0 >| /sys/bus/ccw/drivers/zfcp/0.0.1880/0x50050763031bd327/failed &
$ echo 't' >| /proc/sysrq-trigger
$ dmesg
sysrq: SysRq : Show State
  task                        PC stack   pid father
...
zfcp_q_0.0.1880 S14640  2165      2 0x02000000
Call Trace:
([<00000000009df464>] __schedule+0xbf4/0xc78)
 [<00000000009df57c>] schedule+0x94/0xc0
 [<0000000000168654>] rescuer_thread+0x33c/0x3a0
 [<000000000016f8be>] kthread+0x166/0x178
 [<00000000009e71f2>] kernel_thread_starter+0x6/0xc
 [<00000000009e71ec>] kernel_thread_starter+0x0/0xc
no locks held by zfcp_q_0.0.1880/2165.
...
kworker/u512:2  D11280  2193      2 0x02000000
Workqueue: zfcp_q_0.0.1880 zfcp_scsi_rport_work [zfcp] (zrpd-50050763031bd327)
                                                        ^^^^^^^^^^^^^^^^^^^^^
Call Trace:
([<00000000009df464>] __schedule+0xbf4/0xc78)
 [<00000000009df57c>] schedule+0x94/0xc0
 [<00000000009e50c0>] schedule_timeout+0x488/0x4d0
 [<00000000001e425c>] msleep+0x5c/0x78                  >>test code only<<
 [<000003ff8008a21e>] zfcp_scsi_rport_work+0xbe/0x100 [zfcp]
 [<0000000000167154>] process_one_work+0x3b4/0x718
 [<000000000016771c>] worker_thread+0x264/0x408
 [<000000000016f8be>] kthread+0x166/0x178
 [<00000000009e71f2>] kernel_thread_starter+0x6/0xc
 [<00000000009e71ec>] kernel_thread_starter+0x0/0xc
2 locks held by kworker/u512:2/2193:
 #0:  (name){++++.+}, at: [<0000000000166f4e>] process_one_work+0x1ae/0x718
 #1:  ((&(&port->rport_work)->work)){+.+.+.}, at: [<0000000000166f4e>] process_one_work+0x1ae/0x718
...

=============================================
Showing busy workqueues and worker pools:
workqueue zfcp_q_0.0.1880: flags=0x2000a
  pwq 512: cpus=0-255 flags=0x4 nice=0 active=1/1
    in-flight: 2193:zfcp_scsi_rport_work [zfcp]
pool 512: cpus=0-255 flags=0x4 nice=0 hung=0s workers=4 idle: 5 2354 2311

Work items with adapter scope are already identified by the workqueue name
"zfcp_q_<devbusid>" and the work item function name.

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-05-18 11:27:21 -04:00
Steffen Maier
674595d851 scsi: zfcp: decouple our scsi_eh callbacks from scsi_cmnd
Note: zfcp_scsi_eh_host_reset_handler() will be converted in a later patch.

zfcp_scsi_eh_device_reset_handler() now only depends on scsi_device.
zfcp_scsi_eh_target_reset_handler() now only depends on scsi_target.
All derive other objects from these intended callback arguments.

zfcp_scsi_eh_target_reset_handler() is special: The FCP channel requires a
valid LUN handle so we try to find ourselves a stand-in scsi_device as
suggested by Hannes Reinecke. If it cannot find a stand-in scsi device,
trace a record like the following (formatted with zfcpdbf from s390-tools):

Timestamp      : ...
Area           : SCSI
Subarea        : 00
Level          : 1
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 1
Tag            : tr_nosd        target reset, no SCSI device
Request ID     : 0x0000000000000000                     none (invalid)
SCSI ID        : 0x00000000     SCSI ID/target denoting scope
SCSI LUN       : 0xffffffff                             none (invalid)
SCSI LUN high  : 0xffffffff                             none (invalid)
SCSI result    : 0x00002003     field re-used for midlayer value: FAILED
SCSI retries   : 0xff                                   none (invalid)
SCSI allowed   : 0xff                                   none (invalid)
SCSI scribble  : 0xffffffffffffffff                     none (invalid)
SCSI opcode    : ffffffff ffffffff ffffffff ffffffff    none (invalid)
FCP rsp inf cod: 0xff                                   none (invalid)
FCP rsp IU     : 00000000 00000000 00000000 00000000    none (invalid)
                 00000000 00000000

Actually change the signature of zfcp_task_mgmt_function() used by
zfcp_scsi_eh_device_reset_handler() & zfcp_scsi_eh_target_reset_handler().
Since it was prepared in a previous patch, we only need to delete a local
auto variable which is now the intended argument.

Suggested-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-05-18 11:22:11 -04:00