Commit graph

647972 commits

Author SHA1 Message Date
subhashj@codeaurora.org
4e768e7645 scsi: ufs: add capability to keep auto bkops always enabled
UFS device requires to perform bkops (back ground operations) periodically
but host can control (via auto-bkops parameter of device) when device can
perform bkops based on its performance requirements. In general, host
would like to enable the device's auto-bkops only when it's not doing any
regular data transfer but sometimes device may not behave properly if host
keeps the auto-bkops disabled. This change adds the capability to let the
device auto-bkops always enabled except suspend.

Reviewed-by: Sahitya Tummala <stummala@codeaurora.org>
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-05 18:10:04 -05:00
subhashj@codeaurora.org
0c8f75869e scsi: ufs: set default UFS power management level
UFS device and link can be put in multiple different low power modes hence
UFS driver supports multiple different low power modes.
This change sets the default UFS power management level which should put
the link hibernate state and device in sleep state. This default power
management level gives good  power savings with relatively less enter/exit
latencies.

Reviewed-by: Yaniv Gardi <ygardi@codeaurora.org>
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-05 18:10:04 -05:00
subhashj@codeaurora.org
09690d5a6a scsi: ufs: provide sysfs attribute to select the PM level
This patch provides the sysfs attribute to choose the power management
level for UFS runtime and system suspend.

Reviewed-by: Sujit Reddy Thumma <sthumma@codeaurora.org>
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-05 18:10:03 -05:00
Sahitya Tummala
fcb0c4b08a scsi: ufs: Add sysfs node to dynamically control clock scaling
Provide an option to enable/disable clock scaling during runtime.
Write 1/0 to "clkscale_enable" sysfs node to enable/disable clock
scaling.

Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-05 18:10:03 -05:00
Sahitya Tummala
b427411abb scsi: ufs: Add sysfs node to dynamically control clock gating
Provide an option to enable/disable clock gating during runtime.
Write 1 or 0 to "clkgate_enable" sysfs node to enable/disable
clock gating.

Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-05 18:10:03 -05:00
Dolev Raviv
e7d38257a4 scsi: ufs: fix multiple ufs spec violation
When a command to a W-LU is timed out via scsi, error handling
will treat it as any other LU and send commands such as
START_STOP with wrong format or task abort. Those commands are
illegal for W-LU according to the UFS spec.
To solve it, when an error is recognized those steps are skipped
and the last step, reset and restore process, is initiated.

Signed-off-by: Dolev Raviv <draviv@codeaurora.org>
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-05 18:10:03 -05:00
subhashj@codeaurora.org
7ff5ab4736 scsi: ufs: add tracing support
This change adds the ftrace support for following:
1. UFS initialization time
2. Clock gating states
3. Clock scaling states
4. Power management APIs latency
5. BKOPs enable/disable

Usage:
	echo 1 > /sys/kernel/debug/tracing/events/ufs/enable
	cat /sys/kernel/debug/tracing/trace_pipe

Reviewed-by: Sahitya Tummala <stummala@codeaurora.org>
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-05 18:10:03 -05:00
Dolev Raviv
66cc820f9c scsi: ufs: dump debug info during failures
Inserts driver dumps for UFS Host Controller registers, Transfer Requests
and Task Management Requests.
The dumps will occur on driver initialization failure, ufshcd_abort() and
on error handling path.

Signed-off-by: Dolev Raviv <draviv@codeaurora.org>
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-05 18:10:03 -05:00
Cao jin
364757e468 scsi: qla4xxx: comments correction
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Acked-by: Nilesh Javali <nilesh.javali@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-05 00:34:12 -05:00
Colin Ian King
703e747a6b scsi: qedi: return via va_end to match corresponding va_start
Although on most systems va_end is a no-op, it is good practice to use
va_end on the function return path, especially since the va_start
documenation states:

  "Each invocation of va_start() must be matched by a corresponding
   invocation of va_end() in the same function."

Found with static analysis by CoverityScan, CIDs 1389477-1389479

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Manish Rangankar <manish.rangankar@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-05 00:31:13 -05:00
Jitendra Bhivare
a5c1be7005 scsi: be2iscsi: Update driver version
Version 11.2.1.0

Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-05 00:21:13 -05:00
Ketan Mukadam
5fa7db2111 scsi: be2iscsi: Add warning message for unsupported adapter
Add a warning message to indicate obsolete/unsupported
BE2 Adapter Family devices

Signed-off-by: Ketan Mukadam <ketan.mukadam@avagotech.com>
Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-05 00:21:13 -05:00
Jitendra Bhivare
dd940972f3 scsi: be2iscsi: Reinit SGL handle, CID tables after TPE
After TPE recovery, CID table needs to be repopulated as per CIDs in
WRBQ creation responses.

SGL handles table needs to be recreated for posting and its indices need
to be resetted.

This is achieved by calling beiscsi_cleanup_port when disabling and
beiscsi_init_port in enabling port.

Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-05 00:21:13 -05:00
Jitendra Bhivare
413f365657 scsi: be2iscsi: Add checks to validate CID alloc/free
Set CID slot to 0xffff to indicate empty.
Check if connection already exists in conn_table before binding.
Check if endpoint already NULL before putting back CID.
Break ep->conn link in free_ep to ignore completions after freeing.

Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-05 00:21:13 -05:00
Jitendra Bhivare
29e80b7ce3 scsi: be2iscsi: Remove wq_name from beiscsi_hba
wq_name is used only to set WQ name when its being allocated.
Remove it from beiscsi_hba structure and define locally.

Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-05 00:21:13 -05:00
Jitendra Bhivare
fa1261c4b6 scsi: be2iscsi: Remove unused struct members
Fix errors reported in static analysis.

Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-05 00:21:13 -05:00
Jitendra Bhivare
b7d98ca7fb scsi: be2iscsi: Remove redundant receive buffers posting
This duplicate code got added during manual merging.

Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-05 00:21:13 -05:00
Jitendra Bhivare
d740105548 scsi: be2iscsi: Fix iSCSI cmd cleanup IOCTL
Prepare the IOCTL with appropriate sizes of buffers of V0 and V1.
Set missing chute number in V1 IOCTL.

Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-05 00:21:13 -05:00
Jitendra Bhivare
3f7f62ee5b scsi: be2iscsi: Add checks to validate completions
Added check in beiscsi_process_cq for pio_handle.
pio_handle is cleared in beiscsi_put_wrb_handle.
This catches any case where task gets cleaned up just before completion.

Use back_lock before accessing pio_handle.

Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-05 00:21:13 -05:00
Jitendra Bhivare
392b7d2f12 scsi: be2iscsi: Set WRB invalid bit for SkyHawk
invalid bit in WRB indicates to FW that IO was invalidated before WRB
was fetched from host memory.

For SkyHawk, this invalid bit in WRB is at a different offset.
Use amap_iscsi_wrb_v2 to mark invalid bit for SkyHawk.

Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-05 00:21:13 -05:00
Jitendra Bhivare
faa0a22d54 scsi: be2iscsi: Take iscsi_task ref in abort handler
Hold the reference of iscsi_task till invalidation completes.
This prevents use of ICD when invalidation of that ICD is being processed.

Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-05 00:21:12 -05:00
Jitendra Bhivare
987132167f scsi: be2iscsi: Fix for crash in beiscsi_eh_device_reset
System crashes when sg_reset is executed in a loop.
CPU: 13 PID: 7073 Comm: sg_reset Tainted: G            E   4.8.0-rc1+ #4
RIP: 0010:[<ffffffffa0825370>]  [<ffffffffa0825370>]
beiscsi_eh_device_reset+0x160/0x520 [be2iscsi]
Call Trace:
[<ffffffff814c7c77>] ? scsi_host_alloc_command+0x47/0xc0
[<ffffffff814caafa>] scsi_try_bus_device_reset+0x2a/0x50
[<ffffffff814cb46e>] scsi_ioctl_reset+0x13e/0x260
[<ffffffff814ca477>] scsi_ioctl+0x137/0x3d0
[<ffffffffa05e4ba2>] sg_ioctl+0x572/0xc20 [sg]
[<ffffffff8123f627>] do_vfs_ioctl+0xa7/0x5d0

The accesses to beiscsi_io_task is being protected in device reset handler
with frwd_lock but the freeing of task can happen under back_lock.

Hold the reference of iscsi_task till invalidation completes.
This prevents use of ICD when invalidation of that ICD is being processed.
Use frwd_lock for iscsi_tasks looping and back_lock to access
beiscsi_io_task structures.

Rewrite mgmt_invalidation_icds to handle allocation and freeing of IOCTL
buffer in one place.

Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-05 00:21:12 -05:00
Jitendra Bhivare
f350501377 scsi: be2iscsi: Fix use of invalidate command table req
Remove shared structure inv_tbl in phba for all sessions to post
invalidation IOCTL.
Always allocate and then free the table after use in reset handler.
Abort handler needs just one instance so define it on stack.
Add checks for BE_INVLDT_CMD_TBL_SZ to not exceed invalidation
command table size in IOCTL.

Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-05 00:21:12 -05:00
Dan Carpenter
19099dc393 scsi: dpt_i2o: double free if adpt_i2o_online_hba() fails
There are two places where adpt_i2o_online_hba() is called.  Both
callers call adpt_i2o_delete_hba(pHba) if adpt_i2o_online_hba() fails
and since we also free it here that causes a double free bug.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-05 00:21:12 -05:00
Corentin Labbe
e0075478ca scsi: mptlan: Remove linux/miscdevice.h from mptlan.h
This patch remove linux/miscdevice.h from mptlan.h since mptlan.h does
not contain any miscdevice.  The only user of it is mptctl.c which
already include linux/miscdevice.h So no need to include it twice.

Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-05 00:21:12 -05:00
Kees Cook
5f5fca6db3 scsi: cciss: use designated initializers
Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. These were identified during
allyesconfig builds of x86, arm, and arm64, with most initializer fixes
extracted from grsecurity.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-05 00:21:12 -05:00
Kees Cook
93380123fb scsi: hpsa: use designated initializers
Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. These were identified during
allyesconfig builds of x86, arm, and arm64, with most initializer fixes
extracted from grsecurity.

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-05 00:21:12 -05:00
James Smart
4b089d18ed scsi: lpfc: lpfc version change to 11.2.0.4
lpfc version change to 11.2.0.4

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-05 00:21:12 -05:00
James Smart
6b3b3bdb83 scsi: lpfc: Add missing memory barrier
On loosely ordered memory systems (PPC for example), the WQE elements
were being updated in memory, but not necessarily flushed before the
separate doorbell was written to hw which would cause hw to dma the
WQE element. Thus, the hardware occasionally received partially
updated WQE data.

Add the memory barrier after updating the WQE memory.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-05 00:21:12 -05:00
James Smart
2f07784f05 scsi: lpfc: Correct oops on vport port resets
Correct oops on vport port resets. Incorrect WQE type, thus the clearing
code actually overstepped the WQE.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-05 00:21:12 -05:00
James Smart
f2bf460cf5 scsi: lpfc: Deprecate lpfc_prot_sg_seg_cnt parameter
Deprecate lpfc_prot_sg_seg_cnt parameter. Eliminates driver from
unnecessarily limiting DIF s/g list length.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-05 00:21:12 -05:00
James Smart
b5749fe182 scsi: lpfc: Fix Xlane dynamic LUN set for LUN priority.
Fix Xlane dynamic LUN set for LUN priority. Dynamic changing of the
priority was not getting reflected on the LUN.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-05 00:21:12 -05:00
James Smart
104450eb08 scsi: lpfc: FCoE VPort enable-disable does not bring up the VPort
FCoE VPort enable-disable does not bring up the VPort.
VPI structure needed to be initialized before being re-registered.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-05 00:21:12 -05:00
James Smart
6c9231f604 scsi: lpfc: Correct host name in symbolic_name field
Correct host name in symbolic_name field of nameserver registrations

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-05 00:21:12 -05:00
James Smart
e6c6acc0e0 scsi: lpfc: Correct issue leading to oops during link reset
Correct issue leading to oops during link reset. Missing vport pointer.

[mkp: fixed typo]

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-05 00:21:12 -05:00
James Smart
8c6a6f4076 scsi: lpfc: Deprecate lpfc_soft_wwn parameter
Deprecate lpfc_soft_wwn parameter.
No longer allow override of hw-assigned wwns

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-05 00:21:12 -05:00
James Smart
b2fd103b05 scsi: lpfc: Correct error in setting OS Driver Version with FW
Correct error in setting OS Driver Version with FW.  Prior length was
too short.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@Suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-05 00:21:12 -05:00
James Smart
e0165f2044 scsi: lpfc: Clear the VendorVersion in the PLOGI/PLOGI ACC payload
Clear the VendorVersion in the PLOGI/PLOGI ACC payload

Vendor version info may have been set on fabric login. Before sending
PLOGI payloads, ensure that it's cleared.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-05 00:21:11 -05:00
Long Li
40630f4628 scsi: storvsc: properly set residual data length on errors
On I/O errors, the Windows driver doesn't set data_transfer_length
on error conditions other than SRB_STATUS_DATA_OVERRUN.
In these cases we need to set data_transfer_length to 0,
indicating there is no data transferred. On SRB_STATUS_DATA_OVERRUN,
data_transfer_length is set by the Windows driver to the actual data transferred.

Reported-by: Shiva Krishna <Shiva.Krishna@nimblestorage.com>
Signed-off-by: Long Li <longli@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-05 00:21:11 -05:00
Long Li
bba5dc332e scsi: storvsc: properly handle SRB_ERROR when sense message is present
When sense message is present on error, we should pass along to the upper
layer to decide how to deal with the error.
This patch fixes connectivity issues with Fiber Channel devices.

Signed-off-by: Long Li <longli@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-05 00:21:11 -05:00
Long Li
3cd6d3d9b1 scsi: storvsc: use tagged SRB requests if supported by the device
Properly set SRB flags when hosting device supports tagged queuing.
This patch improves the performance on Fiber Channel disks.

Signed-off-by: Long Li <longli@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-05 00:21:11 -05:00
K. Y. Srinivasan
d86adf482b scsi: storvsc: Enable multi-queue support
Enable multi-q support. We will allocate the outgoing channel using
the following policy:

        1. We will make every effort to pick a channel that is in the
           same NUMA node that is initiating the I/O
        2. The mapping between the guest CPU and the outgoing channel
           is persistent.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Long Li <longli@microsoft.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-05 00:21:11 -05:00
K. Y. Srinivasan
9779652835 scsi: storvsc: Remove the restriction on max segment size
Remove the artificially imposed restriction on max segment size.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Long Li <longli@microsoft.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-05 00:21:11 -05:00
K. Y. Srinivasan
f64dad2628 scsi: storvsc: Enable tracking of queue depth
Enable tracking of queue depth.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Long Li <longli@microsoft.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-05 00:21:11 -05:00
Linus Torvalds
0c744ea4f7 Linux 4.10-rc2 2017-01-01 14:31:53 -08:00
Linus Torvalds
4759d386d5 Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull DAX updates from Dan Williams:
 "The completion of Jan's DAX work for 4.10.

  As I mentioned in the libnvdimm-for-4.10 pull request, these are some
  final fixes for the DAX dirty-cacheline-tracking invalidation work
  that was merged through the -mm, ext4, and xfs trees in -rc1. These
  patches were prepared prior to the merge window, but we waited for
  4.10-rc1 to have a stable merge base after all the prerequisites were
  merged.

  Quoting Jan on the overall changes in these patches:

     "So I'd like all these 6 patches to go for rc2. The first three
      patches fix invalidation of exceptional DAX entries (a bug which
      is there for a long time) - without these patches data loss can
      occur on power failure even though user called fsync(2). The other
      three patches change locking of DAX faults so that ->iomap_begin()
      is called in a more relaxed locking context and we are safe to
      start a transaction there for ext4"

  These have received a build success notification from the kbuild
  robot, and pass the latest libnvdimm unit tests. There have not been
  any -next releases since -rc1, so they have not appeared there"

* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  ext4: Simplify DAX fault path
  dax: Call ->iomap_begin without entry lock during dax fault
  dax: Finish fault completely when loading holes
  dax: Avoid page invalidation races and unnecessary radix tree traversals
  mm: Invalidate DAX radix tree entries only if appropriate
  ext2: Return BH_New buffers for zeroed blocks
2017-01-01 12:27:05 -08:00
Linus Torvalds
238d1d0f79 Two small fixes:
- A merge error on my part broke the DocBook build.  I've requisitioned
    one of tglx's frozen sharks for appropriate disciplinary action and
    resolved to be more careful about testing the DocBook stuff as long as
    it's still around.
 
  - Fix an error in unaligned-memory-access.txt
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJYZoL/AAoJEI3ONVYwIuV6zCQP/2q0yjKH0qVe+FQqjqBkAcfs
 m6B/8QXKLEiTfS4sthvIa2iQgN6ZKWTg8DyVndXoGhE5qgZZAuctxxNj2nwpt0XD
 jjTrfLVhyASXCCCfSDPcujtgrQElNOm814+ZsAo5e4KCJk1V7Ap1NlwJpd6beld5
 ac6INH3r8X3Qgm2BNbiMJ3VApubcjAkjJRi75mc1JIZGtIAf8ePzjkKVpGyTsaaM
 qQQDbx5mXNrYPXFJYHgCOGcVGkWRAqIcDiYscS6XHpmj5DWfz/x6OdicdeB9l7b4
 Jw4TWVNTXTWz0THRUD74N1SdHwwetnWnZ3abcaorFIA/yr8tw1OuMmEqGn5VNGX5
 Bxk8YYMjjezH4y7+XC7IGhZ1dcrGrdc3AJTqZYIoTOI+HxGJ4RjZsqHghgHy3Qvi
 YgLumVmXLQQ6gSTE7sRixq/2hpLUlUrR0po+zQcwO/8Ka0JA860qhYcTjNzRdO1s
 csQIJtybytbtX28lz4eMxadFcwTtMyYMyVTslUsP/HJQefNcpPz97OQ0zV7UTmi8
 zpHRBo5GnDJNL0u/R6tNDsDo5XaXzy+uR/HDUz/oBn7QofhtrUsgBX2DHeV6pqr7
 guvfVVhQ4I3r7/BGyFaiXf28W9f3ADwE1jbibLjEA2IrLxaClIGLxLS/fGql5MAI
 FQhqlQhsKFeVTAxMtC9j
 =W4e5
 -----END PGP SIGNATURE-----

Merge tag 'docs-4.10-rc1-fix' of git://git.lwn.net/linux

Pull documentation fixes from Jonathan Corbet:
 "Two small fixes:

   - A merge error on my part broke the DocBook build. I've
     requisitioned one of tglx's frozen sharks for appropriate
     disciplinary action and resolved to be more careful about testing
     the DocBook stuff as long as it's still around.

   - Fix an error in unaligned-memory-access.txt"

* tag 'docs-4.10-rc1-fix' of git://git.lwn.net/linux:
  Documentation/unaligned-memory-access.txt: fix incorrect comparison operator
  docs: Fix build failure
2016-12-30 09:32:26 -08:00
Linus Torvalds
f3de082c12 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
 "This fixes a boot failure on some platforms when crypto self test is
  enabled along with the new acomp interface"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: testmgr - Use heap buffer for acomp test input
2016-12-30 09:29:50 -08:00
Olof Johansson
98473f9f3f mm/filemap: fix parameters to test_bit()
mm/filemap.c: In function 'clear_bit_unlock_is_negative_byte':
  mm/filemap.c:933:9: error: too few arguments to function 'test_bit'
    return test_bit(PG_waiters);
         ^~~~~~~~

Fixes: b91e1302ad ('mm: optimize PageWaiters bit use for unlock_page()')
Signed-off-by: Olof Johansson <olof@lixom.net>
Brown-paper-bag-by: Linus Torvalds <dummy@duh.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-29 14:46:39 -08:00
Linus Torvalds
b91e1302ad mm: optimize PageWaiters bit use for unlock_page()
In commit 6290602709 ("mm: add PageWaiters indicating tasks are
waiting for a page bit") Nick Piggin made our page locking no longer
unconditionally touch the hashed page waitqueue, which not only helps
performance in general, but is particularly helpful on NUMA machines
where the hashed wait queues can bounce around a lot.

However, the "clear lock bit atomically and then test the waiters bit"
sequence turns out to be much more expensive than it needs to be,
because you get a nasty stall when trying to access the same word that
just got updated atomically.

On architectures where locking is done with LL/SC, this would be trivial
to fix with a new primitive that clears one bit and tests another
atomically, but that ends up not working on x86, where the only atomic
operations that return the result end up being cmpxchg and xadd.  The
atomic bit operations return the old value of the same bit we changed,
not the value of an unrelated bit.

On x86, we could put the lock bit in the high bit of the byte, and use
"xadd" with that bit (where the overflow ends up not touching other
bits), and look at the other bits of the result.  However, an even
simpler model is to just use a regular atomic "and" to clear the lock
bit, and then the sign bit in eflags will indicate the resulting state
of the unrelated bit #7.

So by moving the PageWaiters bit up to bit #7, we can atomically clear
the lock bit and test the waiters bit on x86 too.  And architectures
with LL/SC (which is all the usual RISC suspects), the particular bit
doesn't matter, so they are fine with this approach too.

This avoids the extra access to the same atomic word, and thus avoids
the costly stall at page unlock time.

The only downside is that the interface ends up being a bit odd and
specialized: clear a bit in a byte, and test the sign bit.  Nick doesn't
love the resulting name of the new primitive, but I'd rather make the
name be descriptive and very clear about the limitation imposed by
trying to work across all relevant architectures than make it be some
generic thing that doesn't make the odd semantics explicit.

So this introduces the new architecture primitive

    clear_bit_unlock_is_negative_byte();

and adds the trivial implementation for x86.  We have a generic
non-optimized fallback (that just does a "clear_bit()"+"test_bit(7)"
combination) which can be overridden by any architecture that can do
better.  According to Nick, Power has the same hickup x86 has, for
example, but some other architectures may not even care.

All these optimizations mean that my page locking stress-test (which is
just executing a lot of small short-lived shell scripts: "make test" in
the git source tree) no longer makes our page locking look horribly bad.
Before all these optimizations, just the unlock_page() costs were just
over 3% of all CPU overhead on "make test".  After this, it's down to
0.66%, so just a quarter of the cost it used to be.

(The difference on NUMA is bigger, but there this micro-optimization is
likely less noticeable, since the big issue on NUMA was not the accesses
to 'struct page', but the waitqueue accesses that were already removed
by Nick's earlier commit).

Acked-by: Nick Piggin <npiggin@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Bob Peterson <rpeterso@redhat.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Andrew Lutomirski <luto@kernel.org>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-29 11:03:15 -08:00