linux-stable

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2024-10-27 22:51:31 +00:00

Author	SHA1	Message	Date
Marcin Dziegielewski	f41d427cdd	lightnvm: prevent race condition on pblk remove When we trigger nvm target remove during device hot unplug, there is a probability to hit a general protection fault. This is caused by use of nvm_dev thay may be freed from another (hot unplug) thread (in the nvm_unregister function). Introduce lock in nvme_ioctl_dev_remove function to prevent this situation. Signed-off-by: Marcin Dziegielewski <marcin.dziegielewski@intel.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-05-06 10:19:18 -06:00
Marcin Dziegielewski	4bbae69922	lightnvm: pblk: set propper line as data_line after gc In current implementation of l2p recovery, when we are after gc and we have open line, we are not setting current data line properly (we set last line from the device instead of last line ordered by seq_nr) and in consequence, kernel panic and data corruption. Signed-off-by: Marcin Dziegielewski <marcin.dziegielewski@intel.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-05-06 10:19:18 -06:00
Chansol Kim	0503871223	lightnvm: pblk: fix bio leak when bio is split For large size io where blk_queue_split needs to be called inside pblk_rw_io, results in bio leak as bio_endio is not called on the newly allocated. One way to observe this is to mounting ext4 filesystem on the target and issuing 1MB io with dd, e.g., dd bs=1MB if=/dev/null of=/mount/myvolume. kmemleak reports: unreferenced object 0xffff88803d7d0100 (size 256): comm "kworker/u16:1", pid 68, jiffies 4294899333 (age 284.120s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 60 e8 31 81 88 ff ff .........`.1.... 01 40 00 00 06 06 00 00 00 00 00 00 05 00 00 00 .@.............. backtrace: [<000000001f5aa04f>] kmem_cache_alloc+0x204/0x3c0 [<0000000040945aab>] mempool_alloc_slab+0x1d/0x30 [<00000000b4959ab4>] mempool_alloc+0x83/0x220 [<00000000646bad9b>] bio_alloc_bioset+0x229/0x320 [<000000009264b251>] bio_clone_fast+0x26/0xc0 [<0000000008250252>] bio_split+0x41/0x110 [<00000000e365cad0>] blk_queue_split+0x349/0x930 [<00000000eb5426bc>] pblk_make_rq+0x1b5/0x1f0 [<00000000eea09cec>] generic_make_request+0x2f9/0x690 [<00000000ae6acede>] submit_bio+0x12e/0x1f0 [<00000000f9b8b82a>] ext4_io_submit+0x64/0x80 [<000000009e4f817d>] ext4_bio_write_page+0x32e/0x890 [<00000000cbd0d106>] mpage_submit_page+0x65/0xc0 [<000000000eec7359>] mpage_map_and_submit_buffers+0x171/0x330 [<000000009a7afcb6>] ext4_writepages+0xd5e/0x1650 [<000000004476b096>] do_writepages+0x39/0xc0 In case there is a need for a split, blk_queue_split returns the newly allocated bio to the caller by changing the value of pointer passed as a reference, while the original is passed to generic_make_requests. Although pblk_rw_io's local variable bio* has changed and passed to pblk_submit_read and pblk_write_to_cache, work is done on this new bio, and pblk_rw_io returns NVM_IO_DONE, pblk_make_rq calls bio_endio on the old bio because it passed bio pointer by value to pblk_rw_io. pblk_rw_io is unfolded into pblk_make_rq so that there is no copying of bio* and bio_endio is called on the correct bio*. Signed-off-by: Chansol Kim <chansol.kim@samsung.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-05-06 10:19:17 -06:00
Igor Konopko	a14669ebc0	lightnvm: Inherit mdts from the parent nvme device Current lightnvm and pblk implementation does not care about NVMe max data transfer size, which can be smaller than 64*K=256K. There are existing NVMe controllers which NVMe max data transfer size is lower that 256K (for example 128K, which happens for existing NVMe controllers which are NVMe spec compliant). Such a controllers are not able to handle command which contains 64 PPAs, since the the size of DMAed buffer will be above the capabilities of such a controller. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-05-06 10:19:17 -06:00
Igor Konopko	d38954ed1b	lightnvm: pblk: set proper read status in bio Currently in case of read errors, bi_status is not set properly which leads to returning inproper data to layers above. This patch fix that by setting proper status in case of read errors. Also remove unnecessary warn_once(), which does not make sense in that place, since user bio is not used for interation with drive and thus bi_status will not be set here. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Javier González <javier@javigon.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-05-06 10:19:17 -06:00
Igor Konopko	6e46b8b24f	lightnvm: pblk: cleanly fail when there is not enough memory L2P table can be huge in many cases, since it typically requires 1GB of DRAM for 1TB of drive. When there is not enough memory available, OOM killer turns on and kills random processes, which can be very annoying for users. This patch changes the flag for L2P table allocation on order to handle this situation in more user friendly way. GFP_KERNEL and __GPF_HIGHMEM are default flags used in parameterless vmalloc() calls, so they are also keeped in that patch. Additionally __GFP_NOWARN flag is added in order to hide very long dmesg warn in case of the allocation failures. The most important flag introduced in that patch is __GFP_RETRY_MAYFAIL, which would cause allocator to try use free memory and if not available to drop caches, but not to run OOM killer. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-05-06 10:19:17 -06:00
Igor Konopko	75c89bef6a	lightnvm: pblk: ensure that erase is chunk aligned The sector bits in the erase command may be uninitialized are uninitialized, causing the erase LBA to be unaligned to the chunk size. This is unexpected situation, since erase shall always be chunk aligned based on OCSSD the 2.0 specification. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Javier González <javier@javigon.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-05-06 10:19:17 -06:00
Igor Konopko	4ca8852419	lightnvm: pblk: fix race during put line In the pblk_put_line_back function, a race condition with __pblk_map_invalidate can make a line not part of any lists. Fix gc_list by resetting it to null fixes the above issue. Fixes: `a4bd217` ("lightnvm: physical block device (pblk) target") Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Javier González <javier@javigon.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-05-06 10:19:17 -06:00
Igor Konopko	d378561b8e	lightnvm: pblk: gracefully handle GC vmalloc fail Currently when we fail on rq data allocation in gc, it skips moving active data and moves line straigt to its free state. Losing user data in the process. Move the data allocation to an earlier phase of GC, where we can still fail gracefully by moving line back to the closed state. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Javier González <javier@javigon.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-05-06 10:19:17 -06:00
Igor Konopko	605bcef7f7	lightnvm: pblk: remove unused smeta_ssec field smeta_ssec field in pblk_line is never used after it was replaced by the function pblk_line_smeta_start(). Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-05-06 10:19:17 -06:00
Igor Konopko	847a3a2788	lightnvm: pblk: reduce L2P memory footprint Currently L2P map size is calculated based on the total number of available sectors, which is redundant, since it contains mapping for overprovisioning as well (11% by default). Change this size to the real capacity and thus reduce the memory footprint significantly - with default op value it is approx. 110MB of DRAM less for every 1TB of media. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-05-06 10:19:16 -06:00
Igor Konopko	8935ebfc5d	lightnvm: pblk: rollback on error during gc read A line is left unsigned to the blocks lists in case pblk_gc_line returns an error. This moves the line back to be appropriate list, which can then be picked up by the garbage collector. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-05-06 10:19:16 -06:00
Igor Konopko	7e5434eece	lightnvm: pblk: line reference fix in GC Fixes the GC error case when moving a line back to closed state while releasing additional references. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-05-06 10:19:16 -06:00
Ming Lei	662156641b	block: don't drain in-progress dispatch in blk_cleanup_queue() Now freeing hw queue resource is moved to hctx's release handler, we don't need to worry about the race between blk_cleanup_queue and run queue any more. So don't drain in-progress dispatch in blk_cleanup_queue(). This is basically revert of `c2856ae2f3` ("blk-mq: quiesce queue before freeing queue"). Cc: Dongli Zhang <dongli.zhang@oracle.com> Cc: James Smart <james.smart@broadcom.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: linux-scsi@vger.kernel.org, Cc: Martin K . Petersen <martin.petersen@oracle.com>, Cc: Christoph Hellwig <hch@lst.de>, Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>, Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Hannes Reinecke <hare@suse.com> Tested-by: James Smart <james.smart@broadcom.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-05-04 07:24:11 -06:00
Ming Lei	1b97871b50	blk-mq: move cancel of hctx->run_work into blk_mq_hw_sysfs_release hctx is always released after requeue is freed. With holding queue's kobject refcount, it is safe for driver to run queue, so one run queue might be scheduled after blk_sync_queue() is done. So moving the cancel of hctx->run_work into blk_mq_hw_sysfs_release() for avoiding run released queue. Cc: Dongli Zhang <dongli.zhang@oracle.com> Cc: James Smart <james.smart@broadcom.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: linux-scsi@vger.kernel.org, Cc: Martin K . Petersen <martin.petersen@oracle.com>, Cc: Christoph Hellwig <hch@lst.de>, Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>, Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Hannes Reinecke <hare@suse.com> Tested-by: James Smart <james.smart@broadcom.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-05-04 07:24:09 -06:00
Ming Lei	2f8f1336a4	blk-mq: always free hctx after request queue is freed In normal queue cleanup path, hctx is released after request queue is freed, see blk_mq_release(). However, in __blk_mq_update_nr_hw_queues(), hctx may be freed because of hw queues shrinking. This way is easy to cause use-after-free, because: one implicit rule is that it is safe to call almost all block layer APIs if the request queue is alive; and one hctx may be retrieved by one API, then the hctx can be freed by blk_mq_update_nr_hw_queues(); finally use-after-free is triggered. Fixes this issue by always freeing hctx after releasing request queue. If some hctxs are removed in blk_mq_update_nr_hw_queues(), introduce a per-queue list to hold them, then try to resuse these hctxs if numa node is matched. Cc: Dongli Zhang <dongli.zhang@oracle.com> Cc: James Smart <james.smart@broadcom.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: linux-scsi@vger.kernel.org, Cc: Martin K . Petersen <martin.petersen@oracle.com>, Cc: Christoph Hellwig <hch@lst.de>, Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>, Reviewed-by: Hannes Reinecke <hare@suse.com> Tested-by: James Smart <james.smart@broadcom.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-05-04 07:24:08 -06:00
Ming Lei	7c6c5b7c91	blk-mq: split blk_mq_alloc_and_init_hctx into two parts Split blk_mq_alloc_and_init_hctx into two parts, and one is blk_mq_alloc_hctx() for allocating all hctx resources, another is blk_mq_init_hctx() for initializing hctx, which serves as counter-part of blk_mq_exit_hctx(). Cc: Dongli Zhang <dongli.zhang@oracle.com> Cc: James Smart <james.smart@broadcom.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: linux-scsi@vger.kernel.org Cc: Martin K . Petersen <martin.petersen@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: James Smart <james.smart@broadcom.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-05-04 07:24:06 -06:00
Ming Lei	c7e2d94b3d	blk-mq: free hw queue's resource in hctx's release handler Once blk_cleanup_queue() returns, tags shouldn't be used any more, because blk_mq_free_tag_set() may be called. Commit `45a9c9d909` ("blk-mq: Fix a use-after-free") fixes this issue exactly. However, that commit introduces another issue. Before `45a9c9d909`, we are allowed to run queue during cleaning up queue if the queue's kobj refcount is held. After that commit, queue can't be run during queue cleaning up, otherwise oops can be triggered easily because some fields of hctx are freed by blk_mq_free_queue() in blk_cleanup_queue(). We have invented ways for addressing this kind of issue before, such as: `8dc765d438` ("SCSI: fix queue cleanup race before queue initialization is done") `c2856ae2f3` ("blk-mq: quiesce queue before freeing queue") But still can't cover all cases, recently James reports another such kind of issue: https://marc.info/?l=linux-scsi&m=155389088124782&w=2 This issue can be quite hard to address by previous way, given scsi_run_queue() may run requeues for other LUNs. Fixes the above issue by freeing hctx's resources in its release handler, and this way is safe becasue tags isn't needed for freeing such hctx resource. This approach follows typical design pattern wrt. kobject's release handler. Cc: Dongli Zhang <dongli.zhang@oracle.com> Cc: James Smart <james.smart@broadcom.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: linux-scsi@vger.kernel.org, Cc: Martin K . Petersen <martin.petersen@oracle.com>, Cc: Christoph Hellwig <hch@lst.de>, Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>, Reported-by: James Smart <james.smart@broadcom.com> Fixes: `45a9c9d909` ("blk-mq: Fix a use-after-free") Cc: stable@vger.kernel.org Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: James Smart <james.smart@broadcom.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-05-04 07:24:05 -06:00
Ming Lei	fbc2a15e34	blk-mq: move cancel of requeue_work into blk_mq_release With holding queue's kobject refcount, it is safe for driver to schedule requeue. However, blk_mq_kick_requeue_list() may be called after blk_sync_queue() is done because of concurrent requeue activities, then requeue work may not be completed when freeing queue, and kernel oops is triggered. So moving the cancel of requeue_work into blk_mq_release() for avoiding race between requeue and freeing queue. Cc: Dongli Zhang <dongli.zhang@oracle.com> Cc: James Smart <james.smart@broadcom.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: linux-scsi@vger.kernel.org, Cc: Martin K . Petersen <martin.petersen@oracle.com>, Cc: Christoph Hellwig <hch@lst.de>, Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>, Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: James Smart <james.smart@broadcom.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-05-04 07:24:04 -06:00
Ming Lei	e87eb301be	blk-mq: grab .q_usage_counter when queuing request from plug code path Just like aio/io_uring, we need to grab 2 refcount for queuing one request, one is for submission, another is for completion. If the request isn't queued from plug code path, the refcount grabbed in generic_make_request() serves for submission. In theroy, this refcount should have been released after the sumission(async run queue) is done. blk_freeze_queue() works with blk_sync_queue() together for avoiding race between cleanup queue and IO submission, given async run queue activities are canceled because hctx->run_work is scheduled with the refcount held, so it is fine to not hold the refcount when running the run queue work function for dispatch IO. However, if request is staggered into plug list, and finally queued from plug code path, the refcount in submission side is actually missed. And we may start to run queue after queue is removed because the queue's kobject refcount isn't guaranteed to be grabbed in flushing plug list context, then kernel oops is triggered, see the following race: blk_mq_flush_plug_list(): blk_mq_sched_insert_requests() insert requests to sw queue or scheduler queue blk_mq_run_hw_queue Because of concurrent run queue, all requests inserted above may be completed before calling the above blk_mq_run_hw_queue. Then queue can be freed during the above blk_mq_run_hw_queue(). Fixes the issue by grab .q_usage_counter before calling blk_mq_sched_insert_requests() in blk_mq_flush_plug_list(). This way is safe because the queue is absolutely alive before inserting request. Cc: Dongli Zhang <dongli.zhang@oracle.com> Cc: James Smart <james.smart@broadcom.com> Cc: linux-scsi@vger.kernel.org, Cc: Martin K . Petersen <martin.petersen@oracle.com>, Cc: Christoph Hellwig <hch@lst.de>, Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>, Reviewed-by: Bart Van Assche <bvanassche@acm.org> Tested-by: James Smart <james.smart@broadcom.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-05-04 07:24:02 -06:00
Jens Axboe	6143393c1b	Merge branch 'nvme-5.2' of git://git.infradead.org/nvme into for-5.2/block Pull NVMe updates from Christoph. * 'nvme-5.2' of git://git.infradead.org/nvme: nvmet: protect discovery change log event list iteration nvme: mark nvme_core_init and nvme_core_exit static nvme: move command size checks to the core nvme-fabrics: check more command sizes nvme-pci: check more command sizes nvme-pci: remove an unneeded variable initialization nvme-pci: unquiesce admin queue on shutdown nvme-pci: shutdown on timeout during deletion nvme-pci: fix psdt field for single segment sgls nvme-multipath: don't print ANA group state by default nvme-multipath: split bios with the ns_head bio_set before submitting nvme-tcp: fix possible null deref on a timed out io queue connect	2019-05-02 16:07:05 -06:00
Raul E Rangel	273938bf7a	block: fix function name in comment The comment was out of date. Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Raul E Rangel <rrangel@chromium.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-05-02 15:51:52 -06:00
Sagi Grimberg	6f53e73b9e	nvmet: protect discovery change log event list iteration When we iterate on the discovery subsystem controllers we need to protect against concurrent mutations to it. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Minwoo Im <minwoo.im@samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-05-01 09:18:47 -04:00
Christoph Hellwig	893a74b7a7	nvme: mark nvme_core_init and nvme_core_exit static Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>	2019-05-01 09:18:47 -04:00
Christoph Hellwig	811015409f	nvme: move command size checks to the core Most command aren't PCIe specific, so move the size checking for them to core.c Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>	2019-05-01 09:18:36 -04:00
Minwoo Im	a2faf94e57	nvme-fabrics: check more command sizes struct common_command provides a common structure for NVMe-oF command format. It also needs to be checked for unintended size growth. Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-05-01 09:17:27 -04:00
Minwoo Im	a97234e1ff	nvme-pci: check more command sizes All the NVMe command has 64bytes fixed size so that it has been assured with BUILD_BUG_ON(). The remaining command structures in linux/nvme.h also need to be checked here. Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-05-01 09:17:27 -04:00
Minwoo Im	665648673e	nvme-pci: remove an unneeded variable initialization Variable "n" will be assigned once kstrtoint() succeeds, otherwise it will not be referred because kstrtoint() will return an error which means go out from this function. Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-05-01 09:17:27 -04:00
Keith Busch	c8e9e9b764	nvme-pci: unquiesce admin queue on shutdown Just like IO queues, the admin queue also will not be restarted after a controller shutdown. Unquiesce this queue so that we do not block request dispatch on a permanently disabled controller. Reported-by: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-05-01 09:17:22 -04:00
Keith Busch	9dc1a38ef1	nvme-pci: shutdown on timeout during deletion We do not restart a controller in a deleting state for timeout errors. When in this state, unblock potential request dispatchers with failed completions by shutting down the controller on timeout detection. Reported-by: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-05-01 09:17:18 -04:00
Klaus Birkelund Jensen	049bf37262	nvme-pci: fix psdt field for single segment sgls The shortcut for single segment SGL requests did not set the PSDT field to mark the request as using SGLs. Fixes: `297910571f` ("nvme-pci: optimize mapping single segment requests using SGLs") Signed-off-by: Klaus Birkelund Jensen <klaus.jensen@cnexlabs.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-05-01 09:17:16 -04:00
Hannes Reinecke	592b6e7b02	nvme-multipath: don't print ANA group state by default Signed-off-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-05-01 09:17:16 -04:00
Hannes Reinecke	525aa5a705	nvme-multipath: split bios with the ns_head bio_set before submitting If the bio is moved to a different queue via blk_steal_bios() and the original queue is destroyed in nvme_remove_ns() we'll be ending with a crash in bio_endio() as the mempool for the split bio bvecs had already been destroyed. So split the bio using the original queue (which will remain during the lifetime of the bio) before sending it down to the underlying device. Signed-off-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-05-01 09:17:15 -04:00
Sagi Grimberg	f34e25898a	nvme-tcp: fix possible null deref on a timed out io queue connect If I/O queue connect times out, we might have freed the queue socket already, so check for that on the error path in nvme_tcp_start_queue. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-05-01 09:17:15 -04:00
Jens Axboe	2d5abb9a1e	bcache: make is_discard_enabled() static It's not used outside this file. Fixes: `631207314d` ("bcache: fix failure in journal relplay") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-05-01 06:34:09 -06:00
Christoph Hellwig	12adb7a013	block: remove the unused blk_queue_dma_pad function Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-04-30 16:12:36 -06:00
Christoph Hellwig	3dcf60bcb6	block: add SPDX tags to block layer files missing licensing information Various block layer files do not have any licensing information at all. Add SPDX tags for the default kernel GPLv2 license to those. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-04-30 16:12:03 -06:00
Christoph Hellwig	6353599813	block: add a SPDX tag to blk-mq-rdma.h This file has no copyright notice, but was added as part of a commit adding another file using the default kernel GPLv2 license. Add a matching SPDX tag. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-04-30 16:12:02 -06:00
Christoph Hellwig	9fcd030baa	sed-opal.h: remove redundant licence boilerplate The file already has the correct SPDX header. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-04-30 16:12:00 -06:00
Christoph Hellwig	a497ee34a4	block: switch all files cleared marked as GPLv2 or later to SPDX tags All these files have some form of the usual GPLv2 or later boilerplate. Switch them to use SPDX tags instead. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-04-30 16:11:59 -06:00
Christoph Hellwig	8c16567d86	block: switch all files cleared marked as GPLv2 to SPDX tags All these files have some form of the usual GPLv2 boilerplate. Switch them to use SPDX tags instead. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-04-30 16:11:57 -06:00
Christoph Hellwig	dcdca753c1	block: clean up __bio_add_pc_page a bit Share the bi_size update by moving the done label up, and duplicate the bv_len update in the two callers to get rid of the bvec_merge label. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-04-30 09:26:44 -06:00
Christoph Hellwig	6601e44efd	block: remove bogus comments in __bio_add_pc_page We are never called with file system pages by defintions for the passthrough interface, and we also never undo any addition later these days. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-04-30 09:26:42 -06:00
Christoph Hellwig	4713839dfe	block: remove the __bio_add_pc_page export The same page optimization is a rather odd corner case, which is not used outside bio.c and which really should not be used outside of bio.c either - we have better highlevel helpers like the rq/bio mapping helpers. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-04-30 09:26:41 -06:00
Christoph Hellwig	2b070cfe58	block: remove the i argument to bio_for_each_segment_all We only have two callers that need the integer loop iterator, and they can easily maintain it themselves. Suggested-by: Matthew Wilcox <willy@infradead.org> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Acked-by: David Sterba <dsterba@suse.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Acked-by: Coly Li <colyli@suse.de> Reviewed-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-04-30 09:26:13 -06:00
Christoph Hellwig	f936b06ae5	bcache: clean up do_btree_node_write a bit Use a variable containing the buffer address instead of the to be removed integer iterator from bio_for_each_segment_all. Suggested-by: Matthew Wilcox <willy@infradead.org> Reviewed-by: Hannes Reinecke <hare@suse.com> Acked-by: Coly Li <colyli@suse.de> Reviewed-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-04-30 09:26:11 -06:00
Coly Li	cdca22bcbc	bcache: remove redundant LIST_HEAD(journal) from run_cache_set() Commit `95f18c9d13` ("bcache: avoid potential memleak of list of journal_replay(s) in the CACHE_SYNC branch of run_cache_set") forgets to remove the original define of LIST_HEAD(journal), which makes the change no take effect. This patch removes redundant variable LIST_HEAD(journal) from run_cache_set(), to make Shenghui's fix working. Fixes: `95f18c9d13` ("bcache: avoid potential memleak of list of journal_replay(s) in the CACHE_SYNC branch of run_cache_set") Reported-by: Juha Aatrokoski <juha.aatrokoski@aalto.fi> Cc: Shenghui Wang <shhuiw@foxmail.com> Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-04-30 08:20:46 -06:00
Jens Axboe	41d7f2ed84	Merge branch 'nvme-5.2' of git://git.infradead.org/nvme into for-5.2/block Pull NVMe changes from Christoph. * 'nvme-5.2' of git://git.infradead.org/nvme: nvme: set 0 capacity if namespace block size exceeds PAGE_SIZE nvme-rdma: fix typo in struct comment nvme-loop: kill timeout handler nvme-tcp: rename function to have nvme_tcp prefix nvme-rdma: fix a NULL deref when an admin connect times out nvme-tcp: fix a NULL deref when an admin connect times out nvmet-tcp: don't fail maxr2t greater than 1 nvmet-file: clamp-down file namespace lba_shift nvmet: include <linux/scatterlist.h> nvmet: return a specified error it subsys_alloc fails nvmet: rename nvme_completion instances from rsp to cqe nvmet-rdma: remove p2p_client initialization from fast-path	2019-04-26 10:25:19 -06:00
Christoph Hellwig	cc6be13159	mtip32xx: remove trim support The trim support in mtip32xx has been "temporarily" disabled for 6 years, which is 3/4 of the time the driver even exists in the tree. Remove it as it obviously is dead code now. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-04-25 09:11:53 -06:00
Sagi Grimberg	01fa017484	nvme: set 0 capacity if namespace block size exceeds PAGE_SIZE If our target exposed a namespace with a block size that is greater than PAGE_SIZE, set 0 capacity on the namespace as we do not support it. This issue encountered when the nvmet namespace was backed by a tempfile. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-04-25 16:51:42 +02:00

1 2 3 4 5 ...

826623 commits