Commit graph

367 commits

Author SHA1 Message Date
Michael Kelley
1463575044 nvme: Fix IOC_PR_CLEAR and IOC_PR_RELEASE ioctls for nvme devices
[ Upstream commit c292a337d0 ]

The IOC_PR_CLEAR and IOC_PR_RELEASE ioctls are
non-functional on NVMe devices because the nvme_pr_clear()
and nvme_pr_release() functions set the IEKEY field incorrectly.
The IEKEY field should be set only when the key is zero (i.e,
not specified).  The current code does it backwards.

Furthermore, the NVMe spec describes the persistent
reservation "clear" function as an option on the reservation
release command. The current implementation of nvme_pr_clear()
erroneously uses the reservation register command.

Fix these errors. Note that NVMe version 1.3 and later specify
that setting the IEKEY field will return an error of Invalid
Field in Command.  The fix will set IEKEY when the key is zero,
which is appropriate as these ioctls consider a zero key to
be "unspecified", and the intention of the spec change is
to require a valid key.

Tested on a version 1.4 PCI NVMe device in an Azure VM.

Fixes: 1673f1f08c ("nvme: move block_device_operations and ns/ctrl freeing to common code")
Fixes: 1d277a637a ("NVMe: Add persistent reservation ops")
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-26 13:15:33 +02:00
Chaitanya Kulkarni
2d1e103c6d nvme: add new line after variable declatation
[ Upstream commit f1c772d581 ]

Add a new line in functions nvme_pr_preempt(), nvme_pr_clear(), and
nvme_pr_release() after variable declaration which follows the rest of
the code in the nvme/host/core.c.

No functional change(s) in this patch.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Stable-dep-of: c292a337d0 ("nvme: Fix IOC_PR_CLEAR and IOC_PR_RELEASE ioctls for nvme devices")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-26 13:15:33 +02:00
Smith, Kyle Miller (Nimble Kernel)
8321b17789 nvme-pci: fix a NULL pointer dereference in nvme_alloc_admin_tags
[ Upstream commit da42761181 ]

In nvme_alloc_admin_tags, the admin_q can be set to an error (typically
-ENOMEM) if the blk_mq_init_queue call fails to set up the queue, which
is checked immediately after the call. However, when we return the error
message up the stack, to nvme_reset_work the error takes us to
nvme_remove_dead_ctrl()
  nvme_dev_disable()
   nvme_suspend_queue(&dev->queues[0]).

Here, we only check that the admin_q is non-NULL, rather than not
an error or NULL, and begin quiescing a queue that never existed, leading
to bad / NULL pointer dereference.

Signed-off-by: Kyle Smith <kyles@hpe.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 16:52:29 +02:00
Christophe JAILLET
da6fe93816 nvme-pci: Fix an error handling path in 'nvme_probe()'
commit b00c9b7aa0 upstream.

Release resources in the correct order in order not to miss a
'put_device()' if 'nvme_dev_map()' fails.

Fixes: b00a726a9f ("NVMe: Don't unmap controller registers on reset")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-22 11:42:57 +02:00
Sagi Grimberg
8daf2ab063 nvmet: don't check iosqes,iocqes for discovery controllers
commit d218a8a300 upstream.

From the base spec, Figure 78:

  "Controller Configuration, these fields are defined as parameters to
   configure an "I/O Controller (IOC)" and not to configure a "Discovery
   Controller (DC).

   ...
   If the controller does not support I/O queues, then this field shall
   be read-only with a value of 0h

Just perform this check for I/O controllers.

Fixes: a07b4970f4 ("nvmet: add a generic NVMe target")
Reported-by: Belanger, Martin <Martin.Belanger@dell.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-24 10:59:23 +01:00
zhenwei pi
482851c978 nvmet: fix uninitialized work for zero kato
[ Upstream commit 85bd23f3dc ]

When connecting a controller with a zero kato value using the following
command line

   nvme connect -t tcp -n NQN -a ADDR -s PORT --keep-alive-tmo=0

the warning below can be reproduced:

WARNING: CPU: 1 PID: 241 at kernel/workqueue.c:1627 __queue_delayed_work+0x6d/0x90
with trace:
  mod_delayed_work_on+0x59/0x90
  nvmet_update_cc+0xee/0x100 [nvmet]
  nvmet_execute_prop_set+0x72/0x80 [nvmet]
  nvmet_tcp_try_recv_pdu+0x2f7/0x770 [nvmet_tcp]
  nvmet_tcp_io_work+0x63f/0xb2d [nvmet_tcp]
  ...

This is caused by queuing up an uninitialized work.  Althrough the
keep-alive timer is disabled during allocating the controller (fixed in
0d3b6a8d21), ka_work still has a chance to run (called by
nvmet_start_ctrl).

Fixes: 0d3b6a8d21 ("nvmet: Disable keep-alive timer when kato is cleared to 0h")
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-29 09:05:41 +01:00
Amit Engel
20696cd172 nvmet: Disable keep-alive timer when kato is cleared to 0h
[ Upstream commit 0d3b6a8d21 ]

Based on nvme spec, when keep alive timeout is set to zero
the keep-alive timer should be disabled.

Signed-off-by: Amit Engel <amit.engel@dell.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-09-12 11:47:32 +02:00
Ard Biesheuvel
9ca487b760 nvme: retain split access workaround for capability reads
[ Upstream commit 3a8ecc935e ]

Commit 7fd8930f26

  "nvme: add a common helper to read Identify Controller data"

has re-introduced an issue that we have attempted to work around in the
past, in commit a310acd7a7 ("NVMe: use split lo_hi_{read,write}q").

The problem is that some PCIe NVMe controllers do not implement 64-bit
outbound accesses correctly, which is why the commit above switched
to using lo_hi_[read|write]q for all 64-bit BAR accesses occuring in
the code.

In the mean time, the NVMe subsystem has been refactored, and now calls
into the PCIe support layer for NVMe via a .reg_read64() method, which
fails to use lo_hi_readq(), and thus reintroduces the problem that the
workaround above aimed to address.

Given that, at the moment, .reg_read64() is only used to read the
capability register [which is known to tolerate split reads], let's
switch .reg_read64() to lo_hi_readq() as well.

This fixes a boot issue on some ARM boxes with NVMe behind a Synopsys
DesignWare PCIe host controller.

Fixes: 7fd8930f26 ("nvme: add a common helper to read Identify Controller data")
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-29 10:24:33 +01:00
Ivan Bornyakov
c39c0be92c nvme: host: core: fix precedence of ternary operator
commit e9a9853c23 upstream.

Ternary operator have lower precedence then bitwise or, so 'cdw10' was
calculated wrong.

Signed-off-by: Ivan Bornyakov <brnkv.i1@gmail.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-21 10:42:21 +01:00
Tom Wu
d722a4f1c3 nvmet: fix data units read and written counters in SMART log
[ Upstream commit 3bec2e3754 ]

In nvme spec 1.3 there is a definition for data write/read counters
from SMART log, (See section 5.14.1.2):
	This value is reported in thousands (i.e., a value of 1
	corresponds to 1000 units of 512 bytes read) and is rounded up.

However, in nvme target where value is reported with actual units,
but not thousands of units as the spec requires.

Signed-off-by: Tom Wu <tomwu@mellanox.com>
Reviewed-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-05 12:30:25 +02:00
Jaesoo Lee
0e6a89a37a nvme: Fix u32 overflow in the number of namespace list calculation
[ Upstream commit c8e8c77b3b ]

The Number of Namespaces (nn) field in the identify controller data structure is
defined as u32 and the maximum allowed value in NVMe specification is
0xFFFFFFFEUL. This change fixes the possible overflow of the DIV_ROUND_UP()
operation used in nvme_scan_ns_list() by casting the nn to u64.

Signed-off-by: Jaesoo Lee <jalee@purestorage.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-10 09:55:32 +02:00
Yufen Yu
87b622f949 nvme-loop: init nvmet_ctrl fatal_err_work when allocate
[ Upstream commit d11de63f2b ]

After commit 4d43d395fe (workqueue: Try to catch flush_work() without
INIT_WORK()), it can cause warning when delete nvme-loop device, trace
like:

[   76.601272] Call Trace:
[   76.601646]  ? del_timer+0x72/0xa0
[   76.602156]  __cancel_work_timer+0x1ae/0x270
[   76.602791]  cancel_work_sync+0x14/0x20
[   76.603407]  nvmet_ctrl_free+0x1b7/0x2f0 [nvmet]
[   76.604091]  ? free_percpu+0x168/0x300
[   76.604652]  nvmet_sq_destroy+0x106/0x240 [nvmet]
[   76.605346]  nvme_loop_destroy_admin_queue+0x30/0x60 [nvme_loop]
[   76.606220]  nvme_loop_shutdown_ctrl+0xc3/0xf0 [nvme_loop]
[   76.607026]  nvme_loop_delete_ctrl_host+0x19/0x30 [nvme_loop]
[   76.607871]  nvme_do_delete_ctrl+0x75/0xb0
[   76.608477]  nvme_sysfs_delete+0x7d/0xc0
[   76.609057]  dev_attr_store+0x24/0x40
[   76.609603]  sysfs_kf_write+0x4c/0x60
[   76.610144]  kernfs_fop_write+0x19a/0x260
[   76.610742]  __vfs_write+0x1c/0x60
[   76.611246]  vfs_write+0xfa/0x280
[   76.611739]  ksys_write+0x6e/0x120
[   76.612238]  __x64_sys_write+0x1e/0x30
[   76.612787]  do_syscall_64+0xbf/0x3a0
[   76.613329]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

We fix it by moving fatal_err_work init to nvmet_alloc_ctrl(), which may
more reasonable.

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-05-08 07:19:08 +02:00
Raju Rangoju
f63ee3bb14 nvmet-rdma: fix null dereference under heavy load
commit 5cbab6303b upstream.

Under heavy load if we don't have any pre-allocated rsps left, we
dynamically allocate a rsp, but we are not actually allocating memory
for nvme_completion (rsp->req.rsp). In such a case, accessing pointer
fields (req->rsp->status) in nvmet_req_init() will result in crash.

To fix this, allocate the memory for nvme_completion by calling
nvmet_rdma_alloc_rsp()

Fixes: 8407879c("nvmet-rdma:fix possible bogus dereference under heavy load")

Cc: <stable@vger.kernel.org>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-31 08:12:37 +01:00
Israel Rukshin
8d1ee2d54d nvmet-rdma: Add unlikely for response allocated check
commit ad1f824948 upstream.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Cc: Raju  Rangoju <rajur@chelsio.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-31 08:12:37 +01:00
Israel Rukshin
36764b4a43 nvmet-rdma: fix response use after free
[ Upstream commit d7dcdf9d4e ]

nvmet_rdma_release_rsp() may free the response before using it at error
flow.

Fixes: 8407879 ("nvmet-rdma: fix possible bogus dereference under heavy load")
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-12-21 14:11:39 +01:00
Daniel Verkamp
f9dde41946 nvmet: fix space padding in serial number
[ Upstream commit c739969849 ]

Commit 42de82a8b5 previously attempted to fix this, and it did
correctly pad the MN and FR fields with spaces, but the SN field still
contains 0 bytes.  The current code fills out the first 16 bytes with
hex2bin, leaving the last 4 bytes zeroed.  Rather than adding a lot of
error-prone math to avoid overwriting SN twice, just set the whole thing
to spaces up front (it's only 20 bytes).

Fixes: 42de82a8b5 ("nvmet: don't report 0-bytes in serial number")
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-11-10 07:42:55 -08:00
Max Gurtovoy
3cd731e952 nvme-pci: fix CMB sysfs file removal in reset path
[ Upstream commit 1c78f7735b ]

Currently we create the sysfs entry even if we fail mapping
it. In that case, the unmapping will not remove the sysfs created
file. There is no good reason to create a sysfs entry for a non
working CMB and show his characteristics.

Fixes: f63572dff ("nvme: unmap CMB and remove sysfs file in reset path")
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Stephen Bates <sbates@raithlin.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-11-10 07:42:46 -08:00
Sagi Grimberg
4483073ed3 nvmet-rdma: fix possible bogus dereference under heavy load
[ Upstream commit 8407879c4e ]

Currently we always repost the recv buffer before we send a response
capsule back to the host. Since ordering is not guaranteed for send
and recv completions, it is posible that we will receive a new request
from the host before we got a send completion for the response capsule.

Today, we pre-allocate 2x rsps the length of the queue, but in reality,
under heavy load there is nothing that is really preventing the gap to
expand until we exhaust all our rsps.

To fix this, if we don't have any pre-allocated rsps left, we dynamically
allocate a rsp and make sure to free it when we are done. If under memory
pressure we fail to allocate a rsp, we silently drop the command and
wait for the host to retry.

Reported-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
[hch: dropped a superflous assignment]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-10 08:53:21 +02:00
Max Gurtuvoy
313d65a648 nvmet: reset keep alive timer in controller enable
[ Upstream commit d68a90e148 ]

Controllers that are not yet enabled should not really enforce keep alive
timeouts, but we still want to track a timeout and cleanup in case a host
died before it enabled the controller.  Hence, simply reset the keep
alive timer when the controller is enabled.

Suggested-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-24 13:12:30 +02:00
Keith Busch
062c4965a1 nvme-pci: Remap CMB SQ entries on every controller reset
commit 815c6704bf upstream.

The controller memory buffer is remapped into a kernel address on each
reset, but the driver was setting the submission queue base address
only on the very first queue creation. The remapped address is likely to
change after a reset, so accessing the old address will hit a kernel bug.

This patch fixes that by setting the queue's CMB base address each time
the queue is created.

Fixes: f63572dff1 ("nvme: unmap CMB and remove sysfs file in reset path")
Reported-by: Christian Black <christian.d.black@intel.com>
Cc: Jon Derrick <jonathan.derrick@intel.com>
Cc: <stable@vger.kernel.org> # 4.9+
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Scott Bauer <scott.bauer@intel.com>
Reviewed-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-17 11:37:54 +02:00
Scott Bauer
93e54f40c8 nvme: validate admin queue before unquiesce
commit 7dd1ab163c upstream.

With a misbehaving controller it's possible we'll never
enter the live state and create an admin queue. When we
fail out of reset work it's possible we failed out early
enough without setting up the admin queue. We tear down
queues after a failed reset, but needed to do some more
sanitization.

Fixes 443bd90f2c: "nvme: host: unquiesce queue in nvme_kill_queues()"

[  189.650995] nvme nvme1: pci function 0000:0b:00.0
[  317.680055] nvme nvme0: Device not ready; aborting reset
[  317.680183] nvme nvme0: Removing after probe failure status: -19
[  317.681258] kasan: GPF could be caused by NULL-ptr deref or user memory access
[  317.681397] general protection fault: 0000 [#1] SMP KASAN
[  317.682984] CPU: 3 PID: 477 Comm: kworker/3:2 Not tainted 4.13.0-rc1+ #5
[  317.683112] Hardware name: Gigabyte Technology Co., Ltd. Z170X-UD5/Z170X-UD5-CF, BIOS F5 03/07/2016
[  317.683284] Workqueue: events nvme_remove_dead_ctrl_work [nvme]
[  317.683398] task: ffff8803b0990000 task.stack: ffff8803c2ef0000
[  317.683516] RIP: 0010:blk_mq_unquiesce_queue+0x2b/0xa0
[  317.683614] RSP: 0018:ffff8803c2ef7d40 EFLAGS: 00010282
[  317.683716] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 1ffff1006fbdcde3
[  317.683847] RDX: 0000000000000038 RSI: 1ffff1006f5a9245 RDI: 0000000000000000
[  317.683978] RBP: ffff8803c2ef7d58 R08: 1ffff1007bcdc974 R09: 0000000000000000
[  317.684108] R10: 1ffff1007bcdc975 R11: 0000000000000000 R12: 00000000000001c0
[  317.684239] R13: ffff88037ad49228 R14: ffff88037ad492d0 R15: ffff88037ad492e0
[  317.684371] FS:  0000000000000000(0000) GS:ffff8803de6c0000(0000) knlGS:0000000000000000
[  317.684519] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  317.684627] CR2: 0000002d1860c000 CR3: 000000045b40d000 CR4: 00000000003406e0
[  317.684758] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  317.684888] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  317.685018] Call Trace:
[  317.685084]  nvme_kill_queues+0x4d/0x170 [nvme_core]
[  317.685185]  nvme_remove_dead_ctrl_work+0x3a/0x90 [nvme]
[  317.685289]  process_one_work+0x771/0x1170
[  317.685372]  worker_thread+0xde/0x11e0
[  317.685452]  ? pci_mmcfg_check_reserved+0x110/0x110
[  317.685550]  kthread+0x2d3/0x3d0
[  317.685617]  ? process_one_work+0x1170/0x1170
[  317.685704]  ? kthread_create_on_node+0xc0/0xc0
[  317.685785]  ret_from_fork+0x25/0x30
[  317.685798] Code: 0f 1f 44 00 00 55 48 b8 00 00 00 00 00 fc ff df 48 89 e5 41 54 4c 8d a7 c0 01 00 00 53 48 89 fb 4c 89 e2 48 c1 ea 03 48 83 ec 08 <80> 3c 02 00 75 50 48 8b bb c0 01 00 00 e8 33 8a f9 00 0f ba b3
[  317.685872] RIP: blk_mq_unquiesce_queue+0x2b/0xa0 RSP: ffff8803c2ef7d40
[  317.685908] ---[ end trace a3f8704150b1e8b4 ]---

Signed-off-by: Scott Bauer <scott.bauer@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
[ adapted for 4.9: added check around blk_mq_start_hw_queues() call
  instead of upstream blk_mq_unquiesce_queue() ]
Fixes: 4aae438816 ("nvme: fix hang in remove path")
Signed-off-by: Simon Veith <sveith@amazon.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Amit Shah <aams@amazon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-17 11:37:51 +02:00
Martin Wilck
1c4eb2a50e nvmet: don't overwrite identify sn/fr with 0-bytes
commit 42819eb7a0 upstream.

The merged version of my patch "nvmet: don't report 0-bytes in serial
number" fails to remove two lines which should have been replaced,
so that the space-padded strings are overwritten again with 0-bytes.
Fix it.

Fixes: 42de82a8b5 nvmet: don't report 0-bytes in serial number
Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Sagi Grimberg <sagi@grimbeg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-16 09:52:33 +02:00
Martin Wilck
f43d8e4c86 nvmet: don't report 0-bytes in serial number
commit 42de82a8b5 upstream.

The NVME standard mandates that the SN, MN, and FR fields of the Identify
Controller Data Structure be "ASCII strings".  That means that they may
not contain 0-bytes, not even string terminators.

Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
[hch: fixed for the move of the serial field, updated description]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-16 09:52:32 +02:00
Johannes Thumshirn
1e38f8e986 nvmet: Move serial number from controller to subsystem
commit 2e7f5d2af2 upstream.

The NVMe specification defines the serial number as:

"Serial Number (SN): Contains the serial number for the NVM subsystem
that is assigned by the vendor as an ASCII string. Refer to section
7.10 for unique identifier requirements. Refer to section 1.5 for ASCII
string requirements"

So move it from the controller to the subsystem, where it belongs.

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-16 09:52:32 +02:00
Keith Busch
b53761a18e nvme-pci: initialize queue memory before interrupts
commit 161b8be2bd upstream.

A spurious interrupt before the nvme driver has initialized the completion
queue may inadvertently cause the driver to believe it has a completion
to process. This may result in a NULL dereference since the nvmeq's tags
are not set at this point.

The patch initializes the host's CQ memory so that a spurious interrupt
isn't mistaken for a real completion.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-16 09:52:32 +02:00
Johannes Thumshirn
f2a1bf4511 nvme: don't send keep-alives to the discovery controller
[ Upstream commit 74c6c71530 ]

NVMe over Fabrics 1.0 Section 5.2 "Discovery Controller Properties and
Command Support" Figure 31 "Discovery Controller – Admin Commands"
explicitly listst all commands but "Get Log Page" and "Identify" as
reserved, but NetApp report the Linux host is sending Keep Alive
commands to the discovery controller, which is a violation of the
Spec.

We're already checking for discovery controllers when configuring the
keep alive timeout but when creating a discovery controller we're not
hard wiring the keep alive timeout to 0 and thus remain on
NVME_DEFAULT_KATO for the discovery controller.

This can be easily remproduced when issuing a direct connect to the
discovery susbsystem using:
'nvme connect [...] --nqn=nqn.2014-08.org.nvmexpress.discovery'

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Fixes: 07bfcd09a2 ("nvme-fabrics: add a generic NVMe over Fabrics library")
Reported-by: Martin George <marting@netapp.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30 07:50:39 +02:00
Max Gurtovoy
c0074250ea nvmet: fix PSDT field check in command format
[ Upstream commit bffd2b6167 ]

PSDT field section according to NVM_Express-1.3:
"This field specifies whether PRPs or SGLs are used for any data
transfer associated with the command. PRPs shall be used for all
Admin commands for NVMe over PCIe. SGLs shall be used for all Admin
and I/O commands for NVMe over Fabrics. This field shall be set to
01b for NVMe over Fabrics 1.0 implementations.

Suggested-by: Idan Burstein <idanb@mellanox.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30 07:50:33 +02:00
Jianchao Wang
23f9fb0f53 nvme-pci: Fix nvme queue cleanup if IRQ setup fails
[ Upstream commit f25a2dfc20 ]

This patch fixes nvme queue cleanup if requesting an IRQ handler for
the queue's vector fails. It does this by resetting the cq_vector to
the uninitialized value of -1 so it is ignored for a controller reset.

Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
[changelog updates, removed misc whitespace changes]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30 07:50:31 +02:00
Ming Lei
4aae438816 nvme: fix hang in remove path
[ Upstream commit 82654b6b8e ]

We need to start admin queues too in nvme_kill_queues()
for avoiding hang in remove path[1].

This patch is very similar with 806f026f9b901eaf(nvme: use
blk_mq_start_hw_queues() in nvme_kill_queues()).

[1] hang stack trace
[<ffffffff813c9716>] blk_execute_rq+0x56/0x80
[<ffffffff815cb6e9>] __nvme_submit_sync_cmd+0x89/0xf0
[<ffffffff815ce7be>] nvme_set_features+0x5e/0x90
[<ffffffff815ce9f6>] nvme_configure_apst+0x166/0x200
[<ffffffff815cef45>] nvme_set_latency_tolerance+0x35/0x50
[<ffffffff8157bd11>] apply_constraint+0xb1/0xc0
[<ffffffff8157cbb4>] dev_pm_qos_constraints_destroy+0xf4/0x1f0
[<ffffffff8157b44a>] dpm_sysfs_remove+0x2a/0x60
[<ffffffff8156d951>] device_del+0x101/0x320
[<ffffffff8156db8a>] device_unregister+0x1a/0x60
[<ffffffff8156dc4c>] device_destroy+0x3c/0x50
[<ffffffff815cd295>] nvme_uninit_ctrl+0x45/0xa0
[<ffffffff815d4858>] nvme_remove+0x78/0x110
[<ffffffff81452b69>] pci_device_remove+0x39/0xb0
[<ffffffff81572935>] device_release_driver_internal+0x155/0x210
[<ffffffff81572a02>] device_release_driver+0x12/0x20
[<ffffffff815d36fb>] nvme_remove_dead_ctrl_work+0x6b/0x70
[<ffffffff810bf3bc>] process_one_work+0x18c/0x3a0
[<ffffffff810bf61e>] worker_thread+0x4e/0x3b0
[<ffffffff810c5ac9>] kthread+0x109/0x140
[<ffffffff8185800c>] ret_from_fork+0x2c/0x40
[<ffffffffffffffff>] 0xffffffffffffffff

Fixes: c5552fde102fc("nvme: Enable autonomous power state transitions")
Reported-by: Rakesh Pandit <rakesh@tuxera.com>
Tested-by: Rakesh Pandit <rakesh@tuxera.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-13 19:48:23 +02:00
Rakesh Pandit
383f15ee34 nvme-pci: fix multiple ctrl removal scheduling
[ Upstream commit 82b057caef ]

Commit c5f6ce97c1 tries to address multiple resets but fails as
work_busy doesn't involve any synchronization and can fail.  This is
reproducible easily as can be seen by WARNING below which is triggered
with line:

WARN_ON(dev->ctrl.state == NVME_CTRL_RESETTING)

Allowing multiple resets can result in multiple controller removal as
well if different conditions inside nvme_reset_work fail and which
might deadlock on device_release_driver.

[  480.327007] WARNING: CPU: 3 PID: 150 at drivers/nvme/host/pci.c:1900 nvme_reset_work+0x36c/0xec0
[  480.327008] Modules linked in: rfcomm fuse nf_conntrack_netbios_ns nf_conntrack_broadcast...
[  480.327044]  btusb videobuf2_core ghash_clmulni_intel snd_hwdep cfg80211 acer_wmi hci_uart..
[  480.327065] CPU: 3 PID: 150 Comm: kworker/u16:2 Not tainted 4.12.0-rc1+ #13
[  480.327065] Hardware name: Acer Predator G9-591/Mustang_SLS, BIOS V1.10 03/03/2016
[  480.327066] Workqueue: nvme nvme_reset_work
[  480.327067] task: ffff880498ad8000 task.stack: ffffc90002218000
[  480.327068] RIP: 0010:nvme_reset_work+0x36c/0xec0
[  480.327069] RSP: 0018:ffffc9000221bdb8 EFLAGS: 00010246
[  480.327070] RAX: 0000000000460000 RBX: ffff880498a98128 RCX: dead000000000200
[  480.327070] RDX: 0000000000000001 RSI: ffff8804b1028020 RDI: ffff880498a98128
[  480.327071] RBP: ffffc9000221be50 R08: 0000000000000000 R09: 0000000000000000
[  480.327071] R10: ffffc90001963ce8 R11: 000000000000020d R12: ffff880498a98000
[  480.327072] R13: ffff880498a53500 R14: ffff880498a98130 R15: ffff880498a98128
[  480.327072] FS:  0000000000000000(0000) GS:ffff8804c1cc0000(0000) knlGS:0000000000000000
[  480.327073] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  480.327074] CR2: 00007ffcf3c37f78 CR3: 0000000001e09000 CR4: 00000000003406e0
[  480.327074] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  480.327075] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  480.327075] Call Trace:
[  480.327079]  ? __switch_to+0x227/0x400
[  480.327081]  process_one_work+0x18c/0x3a0
[  480.327082]  worker_thread+0x4e/0x3b0
[  480.327084]  kthread+0x109/0x140
[  480.327085]  ? process_one_work+0x3a0/0x3a0
[  480.327087]  ? kthread_park+0x60/0x60
[  480.327102]  ret_from_fork+0x2c/0x40
[  480.327103] Code: e8 5a dc ff ff 85 c0 41 89 c1 0f.....

This patch addresses the problem by using state of controller to
decide whether reset should be queued or not as state change is
synchronizated using controller spinlock.  Also cancel_work_sync is
used to make sure remove cancels the reset_work and waits for it to
finish.  This patch also changes return value from -ENODEV to more
appropriate -EBUSY if nvme_reset fails to change state.

Fixes: c5f6ce97c1 ("nvme: don't schedule multiple resets")
Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-13 19:48:22 +02:00
Keith Busch
d4ea6118d3 nvme: check hw sectors before setting chunk sectors
[ Upstream commit 249159c5f1 ]

Some devices with IDs matching the "stripe" quirk don't actually have
this quirk, and don't have an MDTS value. When MDTS is not set, the
driver sets the max sectors to UINT_MAX, which is not a power of 2,
hitting a BUG_ON from blk_queue_chunk_sectors. This patch skips setting
chunk sectors for such devices.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-03 10:23:20 +01:00
Sagi Grimberg
8f21b63c9d nvme-loop: handle cpu unplug when re-establishing the controller
[ Upstream commit 945dd5bacc ]

If a cpu unplug event has occured, we need to take the minimum
of the provided nr_io_queues and the number of online cpus,
otherwise we won't be able to connect them as blk-mq mapping
won't dispatch to those queues.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-25 14:23:37 +01:00
Christoph Hellwig
6a559523ee nvme: use kref_get_unless_zero in nvme_find_get_ns
[ Upstream commit 2dd4122854 ]

For kref_get_unless_zero to protect against lookup vs free races we need
to use it in all places where we aren't guaranteed to already hold a
reference.  There is no such guarantee in nvme_find_get_ns, so switch to
kref_get_unless_zero in this function.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-20 10:07:31 +01:00
Sagi Grimberg
65bfe003dc nvmet-rdma: Fix a possible uninitialized variable dereference
[ Upstream commit b25634e2a0 ]

When handling a new recv command, we grab a new rsp resource and
check for the queue state being live. In case the queue is not in
live state, we simply restore the rsp back to the free list. However
in this flow we didn't set rsp->queue yet, so we cannot dereference it.

Instead, make sure to initialize rsp->queue (and other rsp members)
as soon as possible so we won't reference uninitialized variables.

Reported-by: Yi Zhang <yizhan@redhat.com>
Reported-by: Raju Rangoju <rajur@chelsio.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-20 10:07:25 +01:00
Sagi Grimberg
571e47760d nvmet: confirm sq percpu has scheduled and switched to atomic
[ Upstream commit d11ea004a4 ]

percpu_ref_kill is not enough to prevent subsequent
percpu_ref_tryget_live from failing. Hence call
perfcpu_ref_kill_confirm to make it safe.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-20 10:07:25 +01:00
Sagi Grimberg
af0cee086b nvme-loop: fix a possible use-after-free when destroying the admin queue
[ Upstream commit e4c5d3762e ]

we need to destroy the nvmet sq and let it finish gracefully
before continue to cleanup the queue.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-20 10:07:25 +01:00
Sagi Grimberg
3976dd677e nvmet: cancel fatal error and flush async work before free controller
[ Upstream commit 06406d81a2 ]

Make sure they are not running and we can free the controller
safely.

Signed-off-by: Roy Shterman <roys@lightbitslabs.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-09 22:01:54 +01:00
Jeff Lien
f425b05025 nvme-pci: add quirk for delay before CHK RDY for WDC SN200
commit 8c97eeccf0 upstream.

And increase the existing delay to cover this device as well.

Signed-off-by: Jeff Lien <jeff.lien@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-05 11:24:34 +01:00
Daniel Verkamp
82040f5c1b nvmet: fix KATO offset in Set Features
[ Upstream commit 6c73f94930 ]

The Set Features implementation for Keep Alive Timer was using the wrong
structure when retrieving the KATO value; it was treating the Set
Features command as a Property Set command.

The NVMe spec defines the Keep Alive Timer feature as having one input
in CDW11 (4 bytes at offset 44 in the command) whereas the code was
reading 8 bytes at offset 48.

Since the Linux NVMe over Fabrics host never sets this feature, this
code has presumably never been tested.

Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-30 08:39:15 +00:00
Christoph Hellwig
c83bbed234 nvme-pci: Use PCI bus address for data/queues in CMB
commit 8969f1f829 upstream.

Currently, NVMe PCI host driver is programming CMB dma address as
I/O SQs addresses. This results in failures on systems where 1:1
outbound mapping is not used (example Broadcom iProc SOCs) because
CMB BAR will be progammed with PCI bus address but NVMe PCI EP will
try to access CMB using dma address.

To have CMB working on systems without 1:1 outbound mapping, we
program PCI bus address for I/O SQs instead of dma address. This
approach will work on systems with/without 1:1 outbound mapping.

Based on a report and previous patch from Abhishek Shah.

Fixes: 8ffaadf7 ("NVMe: Use CMB for the IO SQes if available")
Reported-by: Abhishek Shah <abhishek.shah@broadcom.com>
Tested-by: Abhishek Shah <abhishek.shah@broadcom.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-12 11:51:25 +02:00
Sagi Grimberg
9b6f9da9e5 nvme-rdma: handle cpu unplug when re-establishing the controller
[ Upstream commit c248c64387 ]

If a cpu unplug event has occured, we need to take the minimum
of the provided nr_io_queues and the number of online cpus,
otherwise we won't be able to connect them as blk-mq mapping
won't dispatch to those queues.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-08 10:26:10 +02:00
Daniel Verkamp
f52a535c84 nvme-fabrics: generate spec-compliant UUID NQNs
commit 40a5fce495 upstream.

The default host NQN, which is generated based on the host's UUID,
does not follow the UUID-based NQN format laid out in the NVMe 1.3
specification.  Remove the "NVMf:" portion of the NQN to match the spec.

Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-13 14:13:36 -07:00
Marta Rybczynska
d17cc7b7a7 nvme-rdma: remove race conditions from IB signalling
commit 5e599d73c1 upstream.

This patch improves the way the RDMA IB signalling is done by using atomic
operations for the signalling variable. This avoids race conditions on
sig_count.

The signalling interval changes slightly and is now the largest power of
two not larger than queue depth / 2.

ilog() usage idea by Bart Van Assche.

Signed-off-by: Marta Rybczynska <marta.rybczynska@kalray.eu>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27 15:08:03 -07:00
Parav Pandit
6149abe7f4 nvmet-rdma: Fix missing dma sync to nvme data structures
[ Upstream commit 748ff8408f ]

This patch performs dma sync operations on nvme_command
and nvme_completion.

nvme_command is synced
(a) on receiving of the recv queue completion for cpu access.
(b) before posting recv wqe back to rdma adapter for device access.

nvme_completion is synced
(a) on receiving of the recv queue completion of associated
nvme_command for cpu access.
(b) before posting send wqe to rdma adapter for device access.

This patch is generated for git://git.infradead.org/nvme-fabrics.git
Branch: nvmf-4.10

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-17 06:41:55 +02:00
Ming Lei
21f33b1577 nvme: avoid to use blk_mq_abort_requeue_list()
commit 986f75c876 upstream.

NVMe may add request into requeue list simply and not kick off the
requeue if hw queues are stopped. Then blk_mq_abort_requeue_list()
is called in both nvme_kill_queues() and nvme_ns_remove() for
dealing with this issue.

Unfortunately blk_mq_abort_requeue_list() is absolutely a
race maker, for example, one request may be requeued during
the aborting. So this patch just calls blk_mq_kick_requeue_list() in
nvme_kill_queues() to handle this issue like what nvme_start_queues()
does. Now all requests in requeue list when queues are stopped will be
handled by blk_mq_kick_requeue_list() when queues are restarted, either
in nvme_start_queues() or in nvme_kill_queues().

Reported-by: Zhang Yi <yizhan@redhat.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:07:48 +02:00
Ming Lei
510b0ec7f6 nvme: use blk_mq_start_hw_queues() in nvme_kill_queues()
commit 806f026f9b upstream.

Inside nvme_kill_queues(), we have to start hw queues for
draining requests in sw queues, .dispatch list and requeue list,
so use blk_mq_start_hw_queues() instead of blk_mq_start_stopped_hw_queues()
which only run queues if queues are stopped, but the queues may have
been started already, for example nvme_start_queues() is called in reset work
function.

blk_mq_start_hw_queues() run hw queues in current context, instead
of running asynchronously like before. Given nvme_kill_queues() is
run from either remove context or reset worker context, both are fine
to run hw queue directly. And the mutex of namespaces_mutex isn't a
problem too becasue nvme_start_freeze() runs hw queue in this way
already.

Reported-by: Zhang Yi <yizhan@redhat.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:07:48 +02:00
Marta Rybczynska
ae05780892 nvme-rdma: support devices with queue size < 32
commit 0544f5494a upstream.

In the case of small NVMe-oF queue size (<32) we may enter a deadlock
caused by the fact that the IB completions aren't sent waiting for 32
and the send queue will fill up.

The error is seen as (using mlx5):
[ 2048.693355] mlx5_0:mlx5_ib_post_send:3765:(pid 7273):
[ 2048.693360] nvme nvme1: nvme_rdma_post_send failed with error code -12

This patch changes the way the signaling is done so that it depends on
the queue depth now. The magic define has been removed completely.

Signed-off-by: Marta Rybczynska <marta.rybczynska@kalray.eu>
Signed-off-by: Samuel Jones <sjones@kalray.eu>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:07:48 +02:00
Jon Derrick
6d6a43a086 nvme: unmap CMB and remove sysfs file in reset path
commit f63572dff1 upstream.

CMB doesn't get unmapped until removal while getting remapped on every
reset. Add the unmapping and sysfs file removal to the reset path in
nvme_pci_disable to match the mapping path in nvme_pci_enable.

Fixes: 202021c1a ("nvme : Add sysfs entry for NVMe CMBs when appropriate")

Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Acked-by: Keith Busch <keith.busch@intel.com>
Reviewed-By: Stephen Bates <sbates@raithlin.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:44:46 +02:00
Keith Busch
43cfff65c9 nvme: simplify stripe quirk
[ Upstream commit e6282aef7b ]

Some OEMs believe they own the Identify Controller vendor specific
region and will repurpose it with their own values. While not common,
we can't rely on the PCI VID:DID to tell use how to decode the field
we reserved for this as the stripe size so we need to do something else
for the list of devices using this quirk.

The field was supposed to allow flexibility on the device's back-end
striping, but it turned out that never materialized; the chunk is always
the same as MDTS in the products subscribing to this quirk, so this
patch removes the stripe_size field and sets the chunk to the max hw
transfer size for the devices using this quirk.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-12 12:41:18 +02:00
Keith Busch
02b23e059a nvme/pci: Disable on removal when disconnected
commit 6db28eda26 upstream.

If the device is not present, the driver should disable the queues
immediately. Prior to this, the driver was relying on the watchdog timer
to kill the queues if requests were outstanding to the device, and that
just delays removal up to one second.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-08 09:30:36 +02:00