linux-stable

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2024-10-05 08:26:59 +00:00

Author	SHA1	Message	Date
Christoph Hellwig	44e44b29fb	nvme: move the retries count to struct nvme_request The way NVMe uses this field is entirely different from the older SCSI/BLOCK_PC usage, so move it into struct nvme_request. Also reduce the size of the field to an unsigned char so that we leave space for additional smaller fields that will appear soon. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-05 12:05:08 -06:00
Christoph Hellwig	83f3aeb386	nvme: mark nvme_max_retries static Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-05 12:05:08 -06:00
Christoph Hellwig	f6324b1bb7	nvme: cleanup nvme_req_needs_retry Don't pass the status explicitly but derive it from the request, and unwind the complex condition to be more readable. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-05 12:05:08 -06:00
Christoph Hellwig	987f699a8f	nvme: move ->retries setup to nvme_setup_cmd ->retries counts the number of times a command is resubmitted, and should be cleared the first time we see the command. We currently don't do that for non-PCIe commands, which is easily fixed by moving the setup to common code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-05 12:05:08 -06:00
Christoph Hellwig	8e14be53f4	remove the obsolete hd driver This driver is for pre-IDE hard disks that are only found in PCs from the stone age of personal computing, and which we don't support elsewhere in the kernel these days. It's also been marked broken forever. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-05 09:50:34 -06:00
Bart Van Assche	f2fbc9dd78	blk-mq: Remove blk_mq_queue_data.list The block layer core sets blk_mq_queue_data.list but no block drivers read that member. Hence remove it and also the code that is used to set this member. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-05 09:40:15 -06:00
Jan Kara	142bbdfccc	cfq: Disable writeback throttling by default Writeback throttling does not play well with CFQ since that also tries to throttle async writes. As a result async writeback can get starved in the presence of readers. As an example take a benchmark simulating a PostgreSQL database running over a standard rotating SATA drive. There are 16 processes doing random reads from a huge file (2*machine memory), 1 process doing random writes to the huge file and calling fsync once per 50000 writes and 1 process doing sequential 8k writes to a relatively small file wrapping around at the end of the file and calling fsync every 5 writes. Under this load read latency easily exceeds the target latency of 75 ms (just because there are so many reads happening against a relatively slow disk) and thus writeback is throttled to a point where only 1 write request is allowed at a time. Blktrace data then looks like: 8,0 1 0 8.347751764 0 m N cfq workload slice:40000000 8,0 1 0 8.347755256 0 m N cfq293A / set_active wl_class: 0 wl_type:0 8,0 1 0 8.347784100 0 m N cfq293A / Not idling. st->count:1 8,0 1 3814 8.347763916 5839 UT N [kworker/u9:2] 1 8,0 0 0 8.347777605 0 m N cfq293A / Not idling. st->count:1 8,0 1 0 8.347784100 0 m N cfq293A / Not idling. st->count:1 8,0 3 1596 8.354364057 0 C R 156109528 + 8 (6906954) [0] 8,0 3 0 8.354383193 0 m N cfq6196SN / complete rqnoidle 0 8,0 3 0 8.354386476 0 m N cfq schedule dispatch 8,0 3 0 8.354399397 0 m N cfq293A / Not idling. st->count:1 8,0 3 0 8.354404705 0 m N cfq293A / dispatch_insert 8,0 3 0 8.354409454 0 m N cfq293A / dispatched a request 8,0 3 0 8.354412527 0 m N cfq293A / activate rq, drv=1 8,0 3 1597 8.354414692 0 D W 145961400 + 24 (6718452) [swapper/0] 8,0 3 0 8.354484184 0 m N cfq293A / Not idling. 
st->count:1 8,0 3 0 8.354487536 0 m N cfq293A / slice expired t=0 8,0 3 0 8.354498013 0 m N / served: vt=5888102466265088 min_vt=5888074869387264 8,0 3 0 8.354502692 0 m N cfq293A / sl_used=6737519 disp=1 charge=6737519 iops=0 sect=24 8,0 3 0 8.354505695 0 m N cfq293A / del_from_rr ... 8,0 0 1810 8.354728768 0 C W 145961400 + 24 (314076) [0] 8,0 0 0 8.354746927 0 m N cfq293A / complete rqnoidle 0 ... 8,0 1 3829 8.389886102 5839 G W 145962968 + 24 [kworker/u9:2] 8,0 1 3830 8.389888127 5839 P N [kworker/u9:2] 8,0 1 3831 8.389908102 5839 A W 145978336 + 24 <- (8,4) 44000 8,0 1 3832 8.389910477 5839 Q W 145978336 + 24 [kworker/u9:2] 8,0 1 3833 8.389914248 5839 I W 145962968 + 24 (28146) [kworker/u9:2] 8,0 1 0 8.389919137 0 m N cfq293A / insert_request 8,0 1 0 8.389924305 0 m N cfq293A / add_to_rr 8,0 1 3834 8.389933175 5839 UT N [kworker/u9:2] 1 ... 8,0 0 0 9.455290997 0 m N cfq workload slice:40000000 8,0 0 0 9.455294769 0 m N cfq293A / set_active wl_class:0 wl_type:0 8,0 0 0 9.455303499 0 m N cfq293A / fifo=ffff880003166090 8,0 0 0 9.455306851 0 m N cfq293A / dispatch_insert 8,0 0 0 9.455311251 0 m N cfq293A / dispatched a request 8,0 0 0 9.455314324 0 m N cfq293A / activate rq, drv=1 8,0 0 2043 9.455316210 6204 D W 145962968 + 24 (1065401962) [pgioperf] 8,0 0 0 9.455392407 0 m N cfq293A / Not idling. st->count:1 8,0 0 0 9.455395969 0 m N cfq293A / slice expired t=0 8,0 0 0 9.455404210 0 m N / served: vt=5888958194597888 min_vt=5888941810597888 8,0 0 0 9.455410077 0 m N cfq293A / sl_used=4000000 disp=1 charge=4000000 iops=0 sect=24 8,0 0 0 9.455416851 0 m N cfq293A / del_from_rr ... 8,0 0 2045 9.455648515 0 C W 145962968 + 24 (332305) [0] 8,0 0 0 9.455668350 0 m N cfq293A / complete rqnoidle 0 ... 
8,0 1 4371 9.455710115 5839 G W 145978336 + 24 [kworker/u9:2] 8,0 1 4372 9.455712350 5839 P N [kworker/u9:2] 8,0 1 4373 9.455730159 5839 A W 145986616 + 24 <- (8,4) 52280 8,0 1 4374 9.455732674 5839 Q W 145986616 + 24 [kworker/u9:2] 8,0 1 4375 9.455737563 5839 I W 145978336 + 24 (27448) [kworker/u9:2] 8,0 1 0 9.455742871 0 m N cfq293A / insert_request 8,0 1 0 9.455747550 0 m N cfq293A / add_to_rr 8,0 1 4376 9.455756629 5839 UT N [kworker/u9:2] 1 So we can see a Q event for a write request, then IO is blocked by writeback throttling and G and I events for the request happen only once other writeback IO is completed. Thus CFQ always sees only one write request. When it sees it, it queues the async queue behind all the read queues and the async queue gets scheduled after about one second. When it is scheduled, that one request gets dispatched and async queue is expired as it has no more requests to submit. Overall we submit about one write request per second. Although this scheduling is beneficial for read latency, writes are heavily starved and this causes large delays all over the system (due to processes blocking on page lock, transaction starts, etc.). When writeback throttling is disabled, write throughput is about one fifth of a read throughput which roughly matches readers/writers ratio and overall the system stalls are much shorter. Mixing writeback throttling logic with CFQ throttling logic is always a recipe for surprises as CFQ assumes it sees the big part of the picture which is not necessarily true when writeback throttling is blocking requests. So disable writeback throttling logic by default when CFQ is used as an IO scheduler. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-05 08:15:08 -06:00
Adam Manzanares	85003a446e	block: fix inheriting request priority from bio In 4.10 I introduced a patch that associates the ioc priority with each request in the block layer. This work was done in the single queue block layer code. This patch unifies ioc priority to request mapping across the single/multi queue block layers. I have tested this patch with the null block device driver with the following parameters. null_blk queue_mode=2 irqmode=0 use_per_node_hctx=1 nr_devices=1 I have not seen a performance regression with this patch and I would appreciate any feedback or additional testing. I have also verified that io priorities are passed to the device when using the SQ and MQ path to a SATA HDD that supports io priorities. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Adam Manzanares <adam.manzanares@wdc.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-04 15:39:47 -06:00
Christoph Hellwig	77f02a7acd	nvme: factor request completion code into a common helper This avoids duplicating the logic four times, and it also allows keeping some helpers static in core.c or just open-coding them. Note that this loses printing the aborted status on completions in the PCI driver as that uses a data structure not available any more. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-04 09:48:23 -06:00
Christoph Hellwig	4bca70d067	nvme-fc: drop ctrl for all command completions A requeue means we go through nvme_fc_start_fcp_op again and get another controller reference. To make sure the refcount doesn't leak we also need to drop it for every completion that came from the LLDD. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-04 09:48:23 -06:00
Sagi Grimberg	f2cd54d3eb	nvme-fc: increment request retries counter before requeuing This way our max retry limit holds as well. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-04 09:48:23 -06:00
Sagi Grimberg	7d9a5e7176	nvme-loop: increment request retries counter before requeuing This way our max retry limit holds as well. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-04 09:48:23 -06:00
Sagi Grimberg	e806666e25	nvme-rdma: increment request retries counter before requeuing This way our max retry limit holds as well. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-04 09:48:23 -06:00
James Smart	62eeacb0e0	nvme_fc: Clean up host fcpio done status handling As Dan Carpenter pointed out: mixing 16-bit nvme status with 32-bit error status from driver. Corrected comment on fcp request struct status field, and converted done routine to explicitly set nvme status codes for nvme status. Signed-off-by: James Smart <james.smart@broadcom.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-04 09:48:23 -06:00
James Smart	c820ad4cda	nvmet_fc: Clear SG list to avoid double frees Clear SG list to avoid double frees of payload page list Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-04 09:48:23 -06:00
James Smart	f77fc87c37	nvme_fc: correct LS validation LS validations shouldn't have been independent checks. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-04 09:48:23 -06:00
James Smart	4083aa986f	nvmet_fc: Sync NVME LS reject reasons with spec nvmet_fc: Sync NVME LS reject reasons with spec Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-04 09:48:23 -06:00
James Smart	726a1080e5	nvme_fc: Add check of status_code in ERSP_IU Add check of status_code in ERSP_IU Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-04 09:48:23 -06:00
James Smart	0f222ccce3	nvme_fc: Sync FC-NVME header with standard Update FC-NVME definitions to match FC-NVME r1.14 (16-020vB) plus change voted in by 2/22 FC-NVME Adhoc (see HOSTID below). Includes the following: - Addition of "status_code" field to ERSP IU - Addition of FC-NVME LS RJT reason_codes and reason_explanations - CreateAssociation payload, HostID field shortened to 16 bytes Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-04 09:48:23 -06:00
Sagi Grimberg	fd8563ced8	nvme-rdma: Support ctrl_loss_tmo Before scheduling a reconnect attempt, check nr_reconnects against max_reconnects, if not exhausted (or max_reconnects is not -1), schedule a reconnect attempt, otherwise schedule ctrl removal. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-04 09:48:23 -06:00
Sagi Grimberg	42a45274c2	nvme-fabrics: Allow ctrl loss timeout configuration When a host senses that its controller session is damaged, it tries to re-establish it periodically (reconnect every reconnect_delay). It may very well be that the controller is gone and never coming back; in this case the host will try to reconnect forever. Add a ctrl_loss_tmo to bound the number of reconnect attempts to a specific controller (default to a reasonable 10 minutes). The timeout is not scheduled on its own; it is translated into a number of reconnect attempts by dividing it by reconnect_delay. This is useful to prevent racing flows of remove and reconnect, and it doesn't really matter if we remove slightly sooner than what the user requested. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-04 09:48:23 -06:00
Sagi Grimberg	7777bdedf3	nvme-rdma: get rid of local reconnect_delay we already have it in opts. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-04 09:48:23 -06:00
Sagi Grimberg	3b06837630	nvme-loop: retrieve iod from the cqe command_id useful to validate that we didn't mess up the command_id. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-04 09:48:23 -06:00
Sagi Grimberg	d89a39be5f	nvme-loop: remove unneeded includes Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-04 09:48:23 -06:00
Sagi Grimberg	c0e4a6f594	nvme-fc: fix module_init (theoretical) error path If nvmf_register_transport happened to fail (it can't, but theoretically) we leak memory. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-04 09:48:23 -06:00
Sagi Grimberg	d19eef029d	nvme-loop: fix module_init (theoretical) error path If nvmf_register_transport happened to fail, we need to call nvmet_unregister_transport as well. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-04 09:48:23 -06:00
Sagi Grimberg	a56c79cfd3	nvme-rdma: fix module_init (theoretical) error path If nvmf_register_transport happened to fail (it can't, but theoretically) we leak memory. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-04 09:48:23 -06:00
Max Gurtovoy	2ca0786d5a	nvmet: use symbolic constants for log identifiers Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-04 09:48:23 -06:00
Parav Pandit	64a0ca88ea	nvmet: Introduced helper routine for controller status check. This patch introduces a helper function for checking controller status during admin and io command processing which returns u16 status. To bring consistency in returning status, other friend functions also now return u16 status instead of int to match the spec. As part of this, error log prints now also print the qid on which the command error occurred. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-04 09:48:23 -06:00
Parav Pandit	4151dd9a58	nvmet: Fixed avoided printing nvmet: twice in error logs. This patch avoids printing "nvmet:" twice in error logs as it's already coming through the pr_fmt macro. Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-04 09:48:23 -06:00
Sagi Grimberg	4459e04297	iscsi-target: use generic inet_pton_with_scope Instead of parsing address strings, use a generic helper. Acked-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-04 09:48:23 -06:00
Sagi Grimberg	0928f9b4f1	nvme-rdma: use inet_pton_with_scope helper Both the destination and the host addresses are now parsed using inet_pton_with_scope helper. We also get ipv6 (with address scopes support). Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-04 09:48:23 -06:00
Sagi Grimberg	670c2a3ad5	nvmet-rdma: use generic inet_pton_with_scope Instead of parsing address strings, use a generic helper. This also adds ipv6 (with address scopes) support. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-04 09:48:23 -06:00
Sagi Grimberg	b1a951fe46	net/utils: generic inet_pton_with_scope helper Several locations in the stack need to handle ipv4/ipv6 (with scope) and port strings conversion to sockaddr. Add a helper that takes either AF_INET, AF_INET6 or AF_UNSPEC (for wildcard) to centralize this handling. Suggested-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-04 09:48:23 -06:00
Sagi Grimberg	297186d640	nvme-loop: remove some code duplication Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-04 09:48:23 -06:00
Sagi Grimberg	782d820ca4	nvme-rdma: Give some more grace for rdma connection establishment The target might be occupied with multiple hosts so let's give it some more grace before failing the connection establishment. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-04 09:48:23 -06:00
Sagi Grimberg	777dc82395	nvmet-rdma: occasionally flush ongoing controller teardown If we are attacked with establishments/teardowns we need to make sure we do not consume too much system memory. Thus let ongoing controller teardowns complete before accepting new controller establishments. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-04 09:48:23 -06:00
Sagi Grimberg	dc2ad16ab2	nvme-rdma: handle cpu unplug when re-establishing the controller If a cpu unplug event has occurred, we need to take the minimum of the provided nr_io_queues and the number of online cpus, otherwise we won't be able to connect them as blk-mq mapping won't dispatch to those queues. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-04 09:48:23 -06:00
Sagi Grimberg	8d61413db6	nvmet-rdma: Fix a possible uninitialized variable dereference When handling a new recv command, we grab a new rsp resource and check for the queue state being live. In case the queue is not in live state, we simply restore the rsp back to the free list. However in this flow we didn't set rsp->queue yet, so we cannot dereference it. Instead, make sure to initialize rsp->queue (and other rsp members) as soon as possible so we won't reference uninitialized variables. Reported-by: Yi Zhang <yizhan@redhat.com> Reported-by: Raju Rangoju <rajur@chelsio.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Raju Rangoju <rajur@chelsio.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-04 09:48:23 -06:00
Sagi Grimberg	427242ce99	nvmet: confirm sq percpu has scheduled and switched to atomic percpu_ref_kill is not enough to prevent subsequent percpu_ref_tryget_live from succeeding. Hence call percpu_ref_kill_and_confirm to make it safe. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-04 09:48:23 -06:00
Sagi Grimberg	6ecda70ea9	nvme-loop: handle cpu unplug when re-establishing the controller If a cpu unplug event has occurred, we need to take the minimum of the provided nr_io_queues and the number of online cpus, otherwise we won't be able to connect them as blk-mq mapping won't dispatch to those queues. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>	2017-04-04 09:44:43 -06:00
Sagi Grimberg	d476983ea0	nvme-loop: fix a possible use-after-free when destroying the admin queue We need to destroy the nvmet sq and let it finish gracefully before continuing to clean up the queue. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>	2017-04-04 09:44:41 -06:00
Eric Biggers	f363b089be	blk-mq: constify struct blk_mq_ops Constify all instances of blk_mq_ops, as they are never modified. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-03-31 08:28:58 -06:00
Jens Axboe	db5bcf87bb	null_blk: add blocking mode This adds a new module parameter to null_blk, blocking. If set, null_blk will set the BLK_MQ_F_BLOCKING flag, indicating that it sometimes/always needs to block in its ->queue_rq() function. The intent is to help find regressions in blocking drivers, since not many of them exist. If null_blk is loaded with submit_queues > 1 and blocking=1, this shows the regression recently fixed by `bf4907c05e`. Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-03-30 13:44:26 -06:00
Jens Axboe	bf4907c05e	blk-mq: fix schedule-under-preempt for blocking drivers Commit `a4d907b6a3` unified the single and multi queue request handlers, but in the process, it also screwed up the locking balance and calls blk_mq_try_issue_directly() with the ctx preempt lock held. This is a problem for drivers that have set BLK_MQ_F_BLOCKING, since now they can't reliably sleep. While in there, protect against similar issues in the future, by adding a might_sleep() trigger in the BLOCKING path for direct issue or queue run. Reported-by: Josef Bacik <josef@toxicpanda.com> Tested-by: Josef Bacik <josef@toxicpanda.com> Fixes: `a4d907b6a3` ("blk-mq: streamline blk_mq_make_request") Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-03-30 12:30:39 -06:00
Colin Ian King	47d752076a	block/sed-opal: fix spelling mistake: "Lifcycle" -> "Lifecycle" trivial fix to spelling mistake in pr_err error message Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-03-30 09:22:53 -06:00
Minchan Kim	3e06eb3dac	block: do not put mq context in blk_mq_alloc_request_hctx In blk_mq_alloc_request_hctx, blk_mq_sched_get_request doesn't get sw context so we don't need to put the context with blk_mq_put_ctx. Otherwise, we will see a preempt counter underflow. Cc: Omar Sandoval <osandov@fb.com> Signed-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-03-30 08:13:05 -06:00
Jens Axboe	3e8a7069b9	blk-mq: include errors in did_work calculation Currently we return true in blk_mq_dispatch_rq_list() if we queued IO successfully, but we really want to return whether or not we made progress. Progress includes if we got an error return. If we don't, this can lead to a hang in blk_mq_sched_dispatch_requests() when a driver is draining IO by returning BLK_MQ_QUEUE_ERROR instead of manually ending the IO in error and returning BLK_MQ_QUEUE_OK. Tested-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-03-29 13:21:13 -06:00
Josef Bacik	b58e176914	block-mq: don't re-queue if we get a queue error When we try to issue a request directly and fail, we requeue the request, but call blk_mq_end_request() as well. This leads to the completed request being on a queuelist and getting ended twice, which causes list corruption in schedulers and other shenanigans. Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Ming Lei <tom.leiming@gmail.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-03-29 13:18:18 -06:00
Tahsin Erdogan	457e490f2b	blkcg: allocate struct blkcg_gq outside request queue spinlock blkg_conf_prep() currently calls blkg_lookup_create() while holding request queue spinlock. This means allocating memory for struct blkcg_gq has to be made non-blocking. This causes occasional -ENOMEM failures in call paths like below: pcpu_alloc+0x68f/0x710 __alloc_percpu_gfp+0xd/0x10 __percpu_counter_init+0x55/0xc0 cfq_pd_alloc+0x3b2/0x4e0 blkg_alloc+0x187/0x230 blkg_create+0x489/0x670 blkg_lookup_create+0x9a/0x230 blkg_conf_prep+0x1fb/0x240 __cfqg_set_weight_device.isra.105+0x5c/0x180 cfq_set_weight_on_dfl+0x69/0xc0 cgroup_file_write+0x39/0x1c0 kernfs_fop_write+0x13f/0x1d0 __vfs_write+0x23/0x120 vfs_write+0xc2/0x1f0 SyS_write+0x44/0xb0 entry_SYSCALL_64_fastpath+0x18/0xad In the code path above, percpu allocator cannot call vmalloc() due to queue spinlock. A failure in this call path gives grief to tools which are trying to configure io weights. We see occasional failures happen shortly after reboots even when system is not under any memory pressure. Machines with a lot of cpus are more vulnerable to this condition. Do struct blkcg_gq allocations outside the queue spinlock to allow blocking during memory allocations. Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Tahsin Erdogan <tahsin@google.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-03-29 11:27:19 -06:00
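
The ctrl_loss_tmo entries in the log above (42a45274c2, fd8563ced8) describe bounding reconnects by turning a timeout into an attempt budget, dividing it by reconnect_delay. A minimal user-space sketch of that arithmetic follows. The helper names `max_reconnects_from_tmo` and `should_reconnect` are hypothetical, and both the round-up and the treatment of -1 as "no limit" are assumptions of this sketch, not something the log confirms; the kernel code may well differ.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: turn a controller-loss timeout (seconds) into a
 * budget of reconnect attempts. A negative timeout is treated here as
 * "no limit" and encoded as -1 (an assumption of this sketch).
 */
static int max_reconnects_from_tmo(int ctrl_loss_tmo, int reconnect_delay)
{
	assert(reconnect_delay > 0);	/* avoid dividing by zero */

	if (ctrl_loss_tmo < 0)
		return -1;

	/* Round up so a partial interval still counts as one attempt. */
	return (ctrl_loss_tmo + reconnect_delay - 1) / reconnect_delay;
}

/*
 * Hypothetical helper: decide whether another reconnect attempt should
 * be scheduled, or whether the controller should be removed instead.
 */
static bool should_reconnect(int nr_reconnects, int max_reconnects)
{
	return max_reconnects == -1 || nr_reconnects < max_reconnects;
}
```

With the 10-minute default mentioned in the log and an assumed reconnect_delay of 10 seconds, `max_reconnects_from_tmo(600, 10)` yields a budget of 60 attempts, after which `should_reconnect` returns false. Counting attempts instead of arming a second timer is what lets removal and reconnect share one flow, at the cost of removing slightly earlier or later than the exact timeout.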

662396 commits