linux-stable

Commit Graph

Author	SHA1	Message	Date
Yu Kuai	7a5dc0f4bc	blk-wbt: fix that 'rwb->wc' is always set to 1 in wbt_init() commit `285febabac` upstream. commit `8c5035dfbb` ("blk-wbt: call rq_qos_add() after wb_normal is initialized") moves wbt_set_write_cache() before rq_qos_add(), which is wrong because wbt_rq_qos() is still NULL. Fix the problem by removing wbt_set_write_cache() and setting 'rwb->wc' directly. Noted that this patch also remove the redundant setting of 'rab->wc'. Fixes: `8c5035dfbb` ("blk-wbt: call rq_qos_add() after wb_normal is initialized") Reported-by: kernel test robot <yujie.liu@intel.com> Link: https://lore.kernel.org/r/202210081045.77ddf59b-yujie.liu@intel.com Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20221009101038.1692875-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-24 09:58:30 +02:00
Keith Busch	63a681bcc3	blk-mq: use quiesced elevator switch when reinitializing queues [ Upstream commit `8237c01f16` ] The hctx's run_work may be racing with the elevator switch when reinitializing hardware queues. The queue is merely frozen in this context, but that only prevents requests from allocating and doesn't stop the hctx work from running. The work may get an elevator pointer that's being torn down, and can result in use-after-free errors and kernel panics (example below). Use the quiesced elevator switch instead, and make the previous one static since it is now only used locally. nvme nvme0: resetting controller nvme nvme0: 32/0/0 default/read/poll queues BUG: kernel NULL pointer dereference, address: 0000000000000008 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 80000020c8861067 P4D 80000020c8861067 PUD 250f8c8067 PMD 0 Oops: 0000 [#1] SMP PTI Workqueue: kblockd blk_mq_run_work_fn RIP: 0010:kyber_has_work+0x29/0x70 ... Call Trace: __blk_mq_do_dispatch_sched+0x83/0x2b0 __blk_mq_sched_dispatch_requests+0x12e/0x170 blk_mq_sched_dispatch_requests+0x30/0x60 __blk_mq_run_hw_queue+0x2b/0x50 process_one_work+0x1ef/0x380 worker_thread+0x2d/0x3e0 Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220927155652.3260724-1-kbusch@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-10-24 09:58:28 +02:00
Yu Kuai	cc6f0855bf	blk-throttle: prevent overflow while calculating wait time [ Upstream commit `8d6bbaada2` ] There is a problem found by code review in tg_with_in_bps_limit() that 'bps_limit * jiffy_elapsed_rnd' might overflow. Fix the problem by calling mul_u64_u64_div_u64() instead. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220829022240.3348319-3-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-10-24 09:58:25 +02:00
Yu Kuai	dd54b94e72	blk-wbt: call rq_qos_add() after wb_normal is initialized commit `8c5035dfbb` upstream. Our test found a problem that wbt inflight counter is negative, which will cause io hang(noted that this problem doesn't exist in mainline): t1: device create t2: issue io add_disk blk_register_queue wbt_enable_default wbt_init rq_qos_add // wb_normal is still 0 /* * in mainline, disk can't be opened before * bdev_add(), however, in old kernels, disk * can be opened before blk_register_queue(). */ blkdev_issue_flush // disk size is 0, however, it's not checked submit_bio_wait submit_bio blk_mq_submit_bio rq_qos_throttle wbt_wait bio_to_wbt_flags rwb_enabled // wb_normal is 0, inflight is not increased wbt_queue_depth_changed(&rwb->rqos); wbt_update_limits // wb_normal is initialized rq_qos_track wbt_track rq->wbt_flags |= bio_to_wbt_flags(rwb, bio); // wb_normal is not 0, wbt_flags will be set t3: io completion blk_mq_free_request rq_qos_done wbt_done wbt_is_tracked // return true __wbt_done wbt_rqw_done atomic_dec_return(&rqw->inflight); // inflight is decreased commit `8235b5c1e8` ("block: call bdev_add later in device_add_disk") can avoid this problem, however it's better to fix this problem in wbt: 1) Lower kernel can't backport this patch due to lots of refactor. 2) Root cause is that wbt call rq_qos_add() before wb_normal is initialized. Fixes: `e34cbd3074` ("blk-wbt: add general throttling mechanism") Cc: <stable@vger.kernel.org> Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20220913105749.3086243-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-24 09:56:59 +02:00
Yu Kuai	8b1f9fde48	blk-throttle: fix that io throttle can only work for single bio commit `320fb0f91e` upstream. Test scripts: cd /sys/fs/cgroup/blkio/ echo "8:0 1024" > blkio.throttle.write_bps_device echo $$ > cgroup.procs dd if=/dev/zero of=/dev/sda bs=10k count=1 oflag=direct & dd if=/dev/zero of=/dev/sda bs=10k count=1 oflag=direct & Test result: 10240 bytes (10 kB, 10 KiB) copied, 10.0134 s, 1.0 kB/s 10240 bytes (10 kB, 10 KiB) copied, 10.0135 s, 1.0 kB/s The problem is that the second bio is finished after 10s instead of 20s. Root cause: 1) second bio will be flagged: __blk_throtl_bio while (true) { ... if (sq->nr_queued[rw]) -> some bio is throttled already break }; bio_set_flag(bio, BIO_THROTTLED); -> flag the bio 2) flagged bio will be dispatched without waiting: throtl_dispatch_tg tg_may_dispatch tg_with_in_bps_limit if (bps_limit == U64_MAX || bio_flagged(bio, BIO_THROTTLED)) *wait = 0; -> wait time is zero return true; commit `9f5ede3c01` ("block: throttle split bio in case of iops limit") support to count split bios for iops limit, thus it adds flagged bio checking in tg_with_in_bps_limit() so that split bios will only count once for bps limit, however, it introduce a new problem that io throttle won't work if multiple bios are throttled. In order to fix the problem, handle iops/bps limit in different ways: 1) for iops limit, there is no flag to record if the bio is throttled, and iops is always applied. 2) for bps limit, original bio will be flagged with BIO_BPS_THROTTLED, and io throttle will ignore bio with the flag. Noted this patch also remove the code to set flag in __bio_clone(), it's introduced in commit `111be88398` ("block-throttle: avoid double charge"), and author thinks split bio can be resubmited and throttled again, which is wrong because split bio will continue to dispatch from caller. Fixes: `9f5ede3c01` ("block: throttle split bio in case of iops limit") Cc: <stable@vger.kernel.org> Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220829022240.3348319-2-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-24 09:56:59 +02:00
Christoph Hellwig	48a12961e8	Revert "block: freeze the queue earlier in del_gendisk" commit `4c66a326b5` upstream. This reverts commit `a09b314005`. Dusty Mabe reported consistent hang during CoreOS shutdown with a MD RAID1 setup. Although apparently similar hangs happened before, and this patch most likely is not the root cause it made it much more severe. Revert it until we can figure out what is going on with the md driver. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220919144049.978907-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-09-28 11:32:28 +02:00
Rafael Mendonca	98756ca258	block: Do not call blk_put_queue() if gendisk allocation fails commit `aa0c680c3a` upstream. Commit `6f8191fdf4` ("block: simplify disk shutdown") removed the call to blk_get_queue() during gendisk allocation but missed to remove the corresponding cleanup code blk_put_queue() for it. Thus, if the gendisk allocation fails, the request_queue refcount gets decremented and reaches 0, causing blk_mq_release() to be called with a hctx still alive. That triggers a WARNING report, as found by syzkaller: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 23016 at block/blk-mq.c:3881 blk_mq_release+0xf8/0x3e0 block/blk-mq.c:3881 [...] stripped RIP: 0010:blk_mq_release+0xf8/0x3e0 block/blk-mq.c:3881 [...] stripped Call Trace: <TASK> blk_release_queue+0x153/0x270 block/blk-sysfs.c:780 kobject_cleanup lib/kobject.c:673 [inline] kobject_release lib/kobject.c:704 [inline] kref_put include/linux/kref.h:65 [inline] kobject_put+0x1c8/0x540 lib/kobject.c:721 __alloc_disk_node+0x4f7/0x610 block/genhd.c:1388 __blk_mq_alloc_disk+0x13b/0x1f0 block/blk-mq.c:3961 loop_add+0x3e2/0xaf0 drivers/block/loop.c:1978 loop_control_ioctl+0x133/0x620 drivers/block/loop.c:2150 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:870 [inline] __se_sys_ioctl fs/ioctl.c:856 [inline] __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:856 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd [...] stripped Fixes: `6f8191fdf4` ("block: simplify disk shutdown") Reported-by: syzbot+31c9594f6e43b9289b25@syzkaller.appspotmail.com Suggested-by: Hillf Danton <hdanton@sina.com> Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220811232338.254673-1-rafaelmendsr@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-09-28 11:32:23 +02:00
Christoph Hellwig	2f092fd2ce	block: call blk_mq_exit_queue from disk_release for never added disks commit `c5db2cfc62` upstream. To undo the all initialization from blk_mq_init_allocated_queue in case of a probe failure where add_disk is never called we have to call blk_mq_exit_queue from put_disk. This relies on the fact that drivers always call blk_mq_free_tag_set after calling put_disk in the probe error path if they have a gendisk at all. We should be doing this in general, but can't do it for the normal teardown case (yet) as the tagset can be gone by the time the disk is released once it was added. I hope to sort this out properly eventually but for now this isolated hack will do it. Fixes: `6f8191fdf4` ("block: simplify disk shutdown") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220720130541.1323531-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-09-28 11:32:23 +02:00
Christoph Hellwig	47f57236ba	blk-mq: fix error handling in __blk_mq_alloc_disk commit `0a3e5cc7bb` upstream. To fully clean up the queue if the disk allocation fails we need to call blk_mq_destroy_queue and not just blk_put_queue. Fixes: `6f8191fdf4` ("block: simplify disk shutdown") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220720130541.1323531-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-09-28 11:32:23 +02:00
Christoph Hellwig	d27b66257d	block: simplify disk shutdown [ Upstream commit `6f8191fdf4` ] Set the queue dying flag and call blk_mq_exit_queue from del_gendisk for all disks that do not have separately allocated queues, and thus remove the need to call blk_cleanup_queue for them. Rename blk_cleanup_disk to blk_mq_destroy_queue to make it clear that this function is intended only for separately allocated blk-mq queues. This saves an extra queue freeze for devices without a separately allocated queue. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20220619060552.1850436-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Stable-dep-of: `8fe4ce5836` ("scsi: core: Fix a use-after-free") Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-09-28 11:32:01 +02:00
Christoph Hellwig	fdb28e9688	block: stop setting the nomerges flags in blk_cleanup_queue [ Upstream commit `0e3534022f` ] These flags only apply to file system I/O, and all file system I/O is already drained by del_gendisk and thus can't be in progress when blk_cleanup_queue is called. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20220619060552.1850436-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Stable-dep-of: `8fe4ce5836` ("scsi: core: Fix a use-after-free") Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-09-28 11:32:00 +02:00
Christoph Hellwig	ab85cb5297	block: remove QUEUE_FLAG_DEAD [ Upstream commit `1f90307e5f` ] Disallow setting the blk-mq state on any queue that is already dying as setting the state even then is a bad idea, and remove the now unused QUEUE_FLAG_DEAD flag. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20220619060552.1850436-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Stable-dep-of: `8fe4ce5836` ("scsi: core: Fix a use-after-free") Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-09-28 11:32:00 +02:00
Mikulas Patocka	46c716a31f	blk-lib: fix blkdev_issue_secure_erase commit `c4fa368466` upstream. There's a bug in blkdev_issue_secure_erase. The statement "unsigned int len = min_t(sector_t, nr_sects, max_sectors);" sets the variable "len" to the length in sectors, but the statement "bio->bi_iter.bi_size = len" treats it as if it were in bytes. The statements "sector += len << SECTOR_SHIFT" and "nr_sects -= len << SECTOR_SHIFT" are thinko. This patch fixes it. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org # v5.19 Fixes: `44abff2c0b` ("block: decouple REQ_OP_SECURE_ERASE from REQ_OP_DISCARD") Link: https://lore.kernel.org/r/alpine.LRH.2.02.2209141549480.28100@file01.intranet.prod.int.rdu2.redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-09-23 14:14:05 +02:00
Stefan Roesch	248c48ced2	block: blk_queue_enter() / __bio_queue_enter() must return -EAGAIN for nowait [ Upstream commit `56f99b8d06` ] Today blk_queue_enter() and __bio_queue_enter() return -EBUSY for the nowait code path. This is not correct: they should return -EAGAIN instead. This problem was detected by fio. The following command exposed the above problem: t/io_uring -p0 -d128 -b4096 -s32 -c32 -F1 -B0 -R0 -X1 -n24 -P1 -u1 -O0 /dev/ng0n1 By applying the patch, the retry case is handled correctly in the slow path. Signed-off-by: Stefan Roesch <shr@fb.com> Fixes: `bfd343aa17` ("blk-mq: don't wait in blk_mq_queue_enter() if __GFP_WAIT isn't set") Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-09-23 14:14:04 +02:00
Ming Lei	4c9a8adb14	block: don't add partitions if GD_SUPPRESS_PART_SCAN is set [ Upstream commit `748008e1da` ] Commit `b9684a71fc` ("block, loop: support partitions without scanning") adds GD_SUPPRESS_PART_SCAN for replacing part function of GENHD_FL_NO_PART. But looks blk_add_partitions() is missed, since loop doesn't want to add partitions if GENHD_FL_NO_PART was set. And it causes regression on libblockdev (as called from udisks) which operates with the LO_FLAGS_PARTSCAN. Fixes the issue by not adding partitions if GD_SUPPRESS_PART_SCAN is set. Fixes: `b9684a71fc` ("block, loop: support partitions without scanning") Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220823103819.395776-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-09-15 10:47:15 +02:00
Yu Kuai	b2f10baf4d	blk-mq: fix io hung due to missing commit_rqs commit `65fac0d54f` upstream. Currently, in virtio_scsi, if 'bd->last' is not set to true while dispatching request, such io will stay in driver's queue, and driver will wait for block layer to dispatch more rqs. However, if block layer failed to dispatch more rq, it should trigger commit_rqs to inform driver. There is a problem in blk_mq_try_issue_list_directly() that commit_rqs won't be called: // assume that queue_depth is set to 1, list contains two rq blk_mq_try_issue_list_directly blk_mq_request_issue_directly // dispatch first rq // last is false __blk_mq_try_issue_directly blk_mq_get_dispatch_budget // succeed to get first budget __blk_mq_issue_directly scsi_queue_rq cmd->flags |= SCMD_LAST virtscsi_queuecommand kick = (sc->flags & SCMD_LAST) != 0 // kick is false, first rq won't issue to disk queued++ blk_mq_request_issue_directly // dispatch second rq __blk_mq_try_issue_directly blk_mq_get_dispatch_budget // failed to get second budget ret == BLK_STS_RESOURCE blk_mq_request_bypass_insert // errors is still 0 if (!list_empty(list) || errors && ...) // won't pass, commit_rqs won't be called In this situation, first rq relied on second rq to dispatch, while second rq relied on first rq to complete, thus they will both hung. Fix the problem by also treat 'BLK_STS_RESOURCE' as 'errors' since it means that request is not queued successfully. Same problem exists in blk_mq_dispatch_rq_list(), 'BLK_STS_RESOURCE' can't be treated as 'errors' here, fix the problem by calling commit_rqs if queue_rq return 'BLK_STS_*RESOURCE'. Fixes: `d666ba98f8` ("blk-mq: add mq_ops->commit_rqs()") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220726122224.1790882-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-31 17:18:19 +02:00
Yufen Yu	5d4dc30b4c	blk-mq: run queue no matter whether the request is the last request commit `d3b3859687` upstream. We do test on a virtio scsi device (/dev/sda) and the default mq scheduler is 'none'. We found a IO hung as following: blk_finish_plug blk_mq_plug_issue_direct scsi_mq_get_budget //get budget_token fail and sdev->restarts=1 scsi_end_request scsi_run_queue_async //sdev->restart=0 and run queue blk_mq_request_bypass_insert //add request to hctx->dispatch list //continue to dispath plug list blk_mq_dispatch_plug_list blk_mq_try_issue_list_directly //success issue all requests from plug list After .get_budget fail, scsi_mq_get_budget will increase 'restarts'. Normally, it will run hw queue when io complete and set 'restarts' as 0. But if we run queue before adding request to the dispatch list and blk_mq_dispatch_plug_list also success issue all requests, then on one will run queue, and the request will be stall in the dispatch list and cannot complete forever. It is wrong to use last request of plug list to decide if run queue is needed since all the remained requests in plug list may be from other hctxs. To fix the bug, pass run_queue as true always to blk_mq_request_bypass_insert(). Fix-suggested-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Yufen Yu <yuyufen@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Fixes: `dc5fc361d8` ("block: attempt direct issue of plug list") Link: https://lore.kernel.org/r/20220803023355.3687360-1-yuyufen@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-25 11:45:36 +02:00
Jinke Han	0c9bb1acd1	block: don't allow the same type rq_qos add more than once [ Upstream commit `14a6e2eb7d` ] In our test of iocost, we encountered some list add/del corruptions of inner_walk list in ioc_timer_fn. The reason can be described as follows: cpu 0 cpu 1 ioc_qos_write ioc_qos_write ioc = q_to_ioc(queue); if (!ioc) { ioc = kzalloc(); ioc = q_to_ioc(queue); if (!ioc) { ioc = kzalloc(); ... rq_qos_add(q, rqos); } ... rq_qos_add(q, rqos); ... } When the io.cost.qos file is written by two cpus concurrently, rq_qos may be added to one disk twice. In that case, there will be two iocs enabled and running on one disk. They own different iocgs on their active list. In the ioc_timer_fn function, because of the iocgs from two iocs have the same root iocg, the root iocg's walk_list may be overwritten by each other and this leads to list add/del corruptions in building or destroying the inner_walk list. And so far, the blk-rq-qos framework works in case that one instance for one type rq_qos per queue by default. This patch make this explicit and also fix the crash above. Signed-off-by: Jinke Han <hanjinke.666@bytedance.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220720093616.70584-1-hanjinke.666@bytedance.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 15:16:10 +02:00
Keith Busch	38fb0d7f39	block: ensure iov_iter advances for added pages [ Upstream commit `325347d965` ] There are cases where a bio may not accept additional pages, and the iov needs to advance to the last data length that was accepted. The zone append used to handle this correctly, but was inadvertently broken when the setup was made common with the normal r/w case. Fixes: `576ed91354` ("block: use bio_add_page in bio_iov_iter_get_pages") Fixes: `c58c0074c5` ("block/bio: remove duplicate append pages code") Signed-off-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20220712153256.2202024-1-kbusch@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 15:15:43 +02:00
Keith Busch	70edccb32b	block/bio: remove duplicate append pages code [ Upstream commit `c58c0074c5` ] The getting pages setup for zone append and normal IO are identical. Use common code for each. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220610195830.3574005-3-kbusch@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 15:15:43 +02:00
Ming Lei	53f2de4b7b	blk-mq: don't create hctx debugfs dir until q->debugfs_dir is created [ Upstream commit `f3ec5d1155` ] blk_mq_debugfs_register_hctx() can be called by blk_mq_update_nr_hw_queues when gendisk isn't added yet, such as nvme tcp. Fixes the warning of 'debugfs: Directory 'hctx0' with parent '/' already present!' which can be observed reliably when running blktests nvme/005. Fixes: `6cfc0081b0` ("blk-mq: no need to check return value of debugfs_create functions") Reported-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Tested-by: Yi Zhang <yi.zhang@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220711090808.259682-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 15:14:16 +02:00
Keith Busch	f3edcfd871	block: fix infinite loop for invalid zone append [ Upstream commit `b82d9fa257` ] Returning 0 early from __bio_iov_append_get_pages() for the max_append_sectors warning just creates an infinite loop since 0 means success, and the bio will never fill from the unadvancing iov_iter. We could turn the return into an error value, but it will already be turned into an error value later on, so just remove the warning. Clearly no one ever hit it anyway. Fixes: `0512a75b98` ("block: Introduce REQ_OP_ZONE_APPEND") Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220610195830.3574005-2-kbusch@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 15:14:10 +02:00
Jan Kara	28a9cbc1c9	block: fix default IO priority handling again commit `e589f46445` upstream. Commit `e70344c059` ("block: fix default IO priority handling") introduced an inconsistency in get_current_ioprio() that tasks without IO context return IOPRIO_DEFAULT priority while tasks with freshly allocated IO context will return 0 (IOPRIO_CLASS_NONE/0) IO priority. Tasks without IO context used to be rare before `5a9d041ba2` ("block: move io_context creation into where it's needed") but after this commit they became common because now only BFQ IO scheduler setups task's IO context. Similar inconsistency is there for get_task_ioprio() so this inconsistency is now exposed to userspace and userspace will see different IO priority for tasks operating on devices with BFQ compared to devices without BFQ. Furthemore the changes done by commit `e70344c059` change the behavior when no IO priority is set for BFQ IO scheduler which is also documented in ioprio_set(2) manpage: "If no I/O scheduler has been set for a thread, then by default the I/O priority will follow the CPU nice value (setpriority(2)). In Linux kernels before version 2.6.24, once an I/O priority had been set using ioprio_set(), there was no way to reset the I/O scheduling behavior to the default. Since Linux 2.6.24, specifying ioprio as 0 can be used to reset to the default I/O scheduling behavior." So make sure we default to IOPRIO_CLASS_NONE as used to be the case before commit `e70344c059`. Also cleanup alloc_io_context() to explicitely set this IO priority for the allocated IO context to avoid future surprises. Note that we tweak ioprio_best() to maintain ioprio_get(2) behavior and make this commit easily backportable. CC: stable@vger.kernel.org Fixes: `e70344c059` ("block: fix default IO priority handling") Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220623074840.5960-1-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-11 13:22:02 +02:00
Muchun Song	957a2b345c	block: fix missing blkcg_bio_issue_init The commit `513616843d` ("block: remove superfluous calls to blkcg_bio_issue_init") has removed blkcg_bio_issue_init from __bio_clone since submit_bio will override ->bi_issue. However, __blk_queue_split is called after blkcg_bio_issue_init (see blk_mq_submit_bio) in submit_bio. In this case, the ->bi_issue is 0. Fix it. Fixes: `513616843d` ("block: remove superfluous calls to blkcg_bio_issue_init") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Link: https://lore.kernel.org/r/20220713140226.68135-1-songmuchun@bytedance.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-14 10:54:49 -06:00
Li Nan	ca2a3343d6	block: remove WARN_ON() from bd_link_disk_holder Since commit 83cbce957446("block: add error handling for device_add_disk / add_disk"), bdev->bd_holder_dir can not be empty now, so remove WARN_ON() from bd_link_disk_holder. Signed-off-by: Li Nan <linan122@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220623074100.2251301-1-linan122@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-23 07:48:05 -06:00
Jens Axboe	2645672ffe	block: pop cached rq before potentially blocking rq_qos_throttle() If rq_qos_throttle() ends up blocking, then we will have invalidated and flushed our current plug. Since blk_mq_get_cached_request() hasn't popped the cached request off the plug list just yet, we end holding a pointer to a request that is no longer valid. This insta-crashes with rq->mq_hctx being NULL in the validity checks just after. Pop the request off the cached list before doing rq_qos_throttle() to avoid using a potentially stale request. Fixes: `0a5aa8d161` ("block: fix blk_mq_attempt_bio_merge and rq_qos_throttle protection") Reported-by: Dylan Yudaken <dylany@fb.com> Tested-by: Dylan Yudaken <dylany@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-21 10:59:58 -06:00
Damien Le Moal	9243fc4cd2	block: remove queue from struct blk_independent_access_range The request queue pointer in struct blk_independent_access_range is unused. Remove it. Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Fixes: `41e46b3c2a` ("block: Fix potential deadlock in blk_ia_range_sysfs_show()") Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220603053529.76405-1-damien.lemoal@opensource.wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-19 18:40:11 -06:00
Christoph Hellwig	a09b314005	block: freeze the queue earlier in del_gendisk Freeze the queue earlier in del_gendisk so that the state does not change while we remove debugfs and sysfs files. Ming mentioned that being able to observer request in debugfs might be useful while the queue is being frozen in del_gendisk, which is made possible by this change. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220614074827.458955-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-17 07:31:05 -06:00
Christoph Hellwig	99d055b4fd	block: remove per-disk debugfs files in blk_unregister_queue The block debugfs files are created in blk_register_queue, which is called by add_disk and use a naming scheme based on the disk_name. After del_gendisk returns that name can be reused and thus we must not leave these debugfs files around, otherwise the kernel is unhappy and spews messages like: Directory XXXXX with parent 'block' already present! and the newly created devices will not have working debugfs files. Move the unregistration to blk_unregister_queue instead (which matches the sysfs unregistration) to make sure the debugfs life time rules match those of the disk name. As part of the move also make sure the whole debugfs unregistration is inside a single debugfs_mutex critical section. Note that this breaks blktests block/002, which checks that the debugfs directory has not been removed while blktests is running, but that particular check should simply be removed from the test case. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220614074827.458955-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-17 07:31:05 -06:00
Christoph Hellwig	5cf9c91ba9	block: serialize all debugfs operations using q->debugfs_mutex Various places like I/O schedulers or the QOS infrastructure try to register debugfs files on demans, which can race with creating and removing the main queue debugfs directory. Use the existing debugfs_mutex to serialize all debugfs operations that rely on q->debugfs_dir or the directories hanging off it. To make the teardown code a little simpler declare all debugfs dentry pointers and not just the main one uncoditionally in blkdev.h. Move debugfs_mutex next to the dentries that it protects and document what it is used for. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220614074827.458955-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-17 07:31:05 -06:00
Christoph Hellwig	50e34d7881	block: disable the elevator int del_gendisk The elevator is only used for file system requests, which are stopped in del_gendisk. Move disabling the elevator and freeing the scheduler tags to the end of del_gendisk instead of doing that work in disk_release and blk_cleanup_queue to avoid a use after free on q->tag_set from disk_release as the tag_set might not be alive at that point. Move the blk_qos_exit call as well, as it just depends on the elevator exit and would be the only reason to keep the not exactly cheap queue freeze in disk_release. Fixes: `e155b0c238` ("blk-mq: Use shared tags for shared sbitmap support") Reported-by: syzbot+3e3f419f4a7816471838@syzkaller.appspotmail.com Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: syzbot+3e3f419f4a7816471838@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/20220614074827.458955-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-17 07:31:05 -06:00
Bart Van Assche	b96f3cab59	block/bfq: Enable I/O statistics BFQ uses io_start_time_ns. That member variable is only set if I/O statistics are enabled. Hence this patch that enables I/O statistics at the time BFQ is associated with a request queue. Compile-tested only. Reported-by: Cixi Geng <cixi.geng1@unisoc.com> Cc: Cixi Geng <cixi.geng1@unisoc.com> Cc: Yu Kuai <yukuai3@huawei.com> Cc: Paolo Valente <paolo.valente@unimore.it> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-16 16:59:28 -06:00
Ming Lei	6cfeadbff3	blk-mq: don't clear flush_rq from tags->rqs[] commit `364b61818f` ("blk-mq: clearing flush request reference in tags->rqs[]") is added to clear the to-be-free flush request from tags->rqs[] for avoiding use-after-free on the flush rq. Yu Kuai reported that blk_mq_clear_flush_rq_mapping() slows down boot time by ~8s because running scsi probe which may create and remove lots of unpresent LUNs on megaraid-sas which uses BLK_MQ_F_TAG_HCTX_SHARED and each request queue has lots of hw queues. Improve the situation by not running blk_mq_clear_flush_rq_mapping if disk isn't added when there can't be any flush request issued. Reviewed-by: Christoph Hellwig <hch@lst.de> Reported-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220616014401.817001-4-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-16 14:45:15 -06:00
Ming Lei	4d337cebcb	blk-mq: avoid to touch q->elevator without any protection q->elevator is referred in blk_mq_has_sqsched() without any protection, no .q_usage_counter is held, no queue srcu and rcu read lock is held, so potential use-after-free may be triggered. Fix the issue by adding one queue flag for checking if the elevator uses single queue style dispatch. Meantime the elevator feature flag of ELEVATOR_F_MQ_AWARE isn't needed any more. Cc: Jan Kara <jack@suse.cz> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220616014401.817001-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-16 14:45:15 -06:00
Ming Lei	5fd7a84a09	blk-mq: protect q->elevator by ->sysfs_lock in blk_mq_elv_switch_none The elevator can be torn down by the sysfs switch interface or by disk release, so hold ->sysfs_lock before referring to q->elevator; a potential use-after-free can then be avoided. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220616014401.817001-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-16 14:45:15 -06:00
Bart Van Assche	14dc7a18ab	block: Fix handling of offline queues in blk_mq_alloc_request_hctx() This patch prevents that test nvme/004 triggers the following: UBSAN: array-index-out-of-bounds in block/blk-mq.h:135:9 index 512 is out of range for type 'long unsigned int [512]' Call Trace: show_stack+0x52/0x58 dump_stack_lvl+0x49/0x5e dump_stack+0x10/0x12 ubsan_epilogue+0x9/0x3b __ubsan_handle_out_of_bounds.cold+0x44/0x49 blk_mq_alloc_request_hctx+0x304/0x310 __nvme_submit_sync_cmd+0x70/0x200 [nvme_core] nvmf_connect_io_queue+0x23e/0x2a0 [nvme_fabrics] nvme_loop_connect_io_queues+0x8d/0xb0 [nvme_loop] nvme_loop_create_ctrl+0x58e/0x7d0 [nvme_loop] nvmf_create_ctrl+0x1d7/0x4d0 [nvme_fabrics] nvmf_dev_write+0xae/0x111 [nvme_fabrics] vfs_write+0x144/0x560 ksys_write+0xb7/0x140 __x64_sys_write+0x42/0x50 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Fixes: `20e4d81393` ("blk-mq: simplify queue mapping & schedule with each possisble CPU") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220615210004.1031820-1-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-16 14:43:31 -06:00
Christoph Hellwig	d5a37b1998	block: remove bioset_init_from_src Unused now, and the interface never really made a whole lot of sense to start with. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@kernel.org>	2022-06-08 14:04:14 -04:00
Linus Torvalds	78c6499c92	for-5.19/drivers-2022-06-02 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmKZmkoQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpqyrD/4iyg2ULBPyljLoM3Ed8AONbrFBApenKDFN FjiFRZNll8zvJLTtP0GQqJIrljPuRlqb0IkbgqXl+yvPZ+wpB9HgIe2aohkOaqqJ KS49UNR/aIHwC2y7lwlcsdgVqqoPdc4wnZeaQvCsWPCBhCca/k0kR7s3uEhHMK92 OWpV9osl/thLfOBwwt4IEaO1Koz8PM/fCR4XA2KLbs8E4P8EcSFglqi0ap7foLNr pQAJIlPjkmF6nw4xg5fdBjVBo//kVcuf9IBMi5/XinmUL1taFAcn5WyeOvBbi0Fs Sqp/pKkveM8xWZKrDyA/wf8nzRNpBBl6TOQEMFtV6FkZrij3pbKHlCiyL2gBR13e 5gkbVvXgtdqDVlnqlvIV/Swfh5YQFtn7+vlHFOUjP+iObsBRo4fTvDhPTLoO/VCf tIAA7xnq/pquRKS/QGGC7ZxRVc3T1r+EpvbBP5Dc4CDGbbbyZLCSrOh5HWIb0I3k 95GSiipTtf54KqZ9HiG/u+xNAFIdapXgU4Xm+JyDRWLdxSs5Nmy8VgpdflvxBfuo hCvJJw3vtDusjyHc7IafxaZlJVQT9tPgPshs8GfrCMCP19RCALD/5irspFh6dD35 BQTIkhC68XNa0iNn/NTP3uxir/JwRoovxQkA9eD+r1NHsAbL8GTypfr5kKJxDaIK UhawfyZE3Q== =YqhC -----END PGP SIGNATURE----- Merge tag 'for-5.19/drivers-2022-06-02' of git://git.kernel.dk/linux-block Pull more block driver updates from Jens Axboe: "A collection of stragglers that were late on sending in their changes and just followup fixes. 
- NVMe fixes pull request via Christoph: - set controller enable bit in a separate write (Niklas Cassel) - disable namespace identifiers for the MAXIO MAP1001 (Christoph) - fix a comment typo (Julia Lawall)" - MD fixes pull request via Song: - Remove uses of bdevname (Christoph Hellwig) - Bug fixes (Guoqing Jiang, and Xiao Ni) - bcache fixes series (Coly) - null_blk zoned write fix (Damien) - nbd fixes (Yu, Zhang) - Fix for loop partition scanning (Christoph)" * tag 'for-5.19/drivers-2022-06-02' of git://git.kernel.dk/linux-block: (23 commits) block: null_blk: Fix null_zone_write() nvmet: fix typo in comment nvme: set controller enable bit in a separate write nvme-pci: disable namespace identifiers for the MAXIO MAP1001 bcache: avoid unnecessary soft lockup in kworker update_writeback_rate() nbd: use pr_err to output error message nbd: fix possible overflow on 'first_minor' in nbd_dev_add() nbd: fix io hung while disconnecting device nbd: don't clear 'NBD_CMD_INFLIGHT' flag if request is not completed nbd: fix race between nbd_alloc_config() and module removal nbd: call genl_unregister_family() first in nbd_cleanup() md: bcache: check the return value of kzalloc() in detached_dev_do_request() bcache: memset on stack variables in bch_btree_check() and bch_sectors_dirty_init() block, loop: support partitions without scanning bcache: avoid journal no-space deadlock by reserving 1 journal bucket bcache: remove incremental dirty sector counting for bch_sectors_dirty_init() bcache: improve multithreaded bch_sectors_dirty_init() bcache: improve multithreaded bch_btree_check() md: fix double free of io_acct_set bioset md: Don't set mddev private to NULL in raid0 pers->free ...	2022-06-03 10:25:56 -07:00
Linus Torvalds	72fbbc3d0e	for-5.19/block-exec-2022-06-02 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmKZmh0QHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpqg6EACCbkwoH7rrr38iU++xP3c9oqFJCfYR95ho qn9/3FiPulua0Dwg8Fbp12ubqqBy/iNj+4Mk7XTo28P7ahjGtKJec2DZguDuHC5X G3kucgQcDOLs1IMWoil+KrnjGC8qeT9ZPFNaUF0IY084NPxnj1wAOjo1J00QVieN WFgHX1sxzBje8abebf3UxAyXImzfyY2uXbp1F3thzf0ZwHXkSDsbWI3fvpdYF4QC p3z6CX0sR+5v7ZLWF3X6H8MBSO+eRlprYji3O/0jVslLBAS8FlTdizQtzx7C6Hsv JZVY4ZsUswYxtsHCBR0McglDeu/iXZRZ9HX4iiOobYJNaXfycMltvS+4Tb/TsFTB GaG6tbL4JS+NT063ctl5h355vUVhIbw6qEhsiF47+0hgawvRr/xxP0aSq1MPmfjw OgG4Jn0htXF47tpKnszfaj3BmgvgV56mV0IGwF5Sh5NXDnHF+MHFmrhRsP1NenjL 12FTnvWGYyTRGDVIVFDkuwaI9o9iNKdFw0JkKIZa/G5RVmmjukvMGTvvSnKmSxJg dgbYLSBA2ZxCcIPjJDvZroe3QxvGNyqUYtxwyWl4a1HK/qljZfwwyRE1rfjJ47hK F8jNEkOThcjr1anoV2nvSLE1mM3SyA/UqDsntIdwnUlG/ObYsByTgtKc+ETs1RzS 8Ovp6lv0Mw== =lBeb -----END PGP SIGNATURE----- Merge tag 'for-5.19/block-exec-2022-06-02' of git://git.kernel.dk/linux-block Pull block request execute cleanups from Jens Axboe: "This change was advertised in the initial core block pull request, but didn't actually make that branch as we deferred it to a post-merge pull request to avoid a bunch of cross branch issues. This series cleans up the block execute path quite nicely" * tag 'for-5.19/block-exec-2022-06-02' of git://git.kernel.dk/linux-block: blk-mq: remove the done argument to blk_execute_rq_nowait blk-mq: avoid a mess of casts for blk_end_sync_rq blk-mq: remove __blk_execute_rq_nowait	2022-06-03 10:21:43 -07:00
Linus Torvalds	34845d92bc	for-5.19/block-2022-06-02 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmKZmfQQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpsLlEACPbK/ms8dMDwKjfEF/RMoc7uL/j6oC0cpf 0D2sfMka8D41QdrUfMiUismXZ61dyKdsiX/U/Q0gcjIomnlco8ZeLcLa6DlafjwY DtvO2aCb+eBAkII5sX2WM4ANNgFTy08Y4wBmgEy5En5u4nPlIGZ8DsulQQodqygx 1lJh31OXQKw+2kIyUdAeC0GMiD9nddYDsH0CTFDSZsAijCcOBDOHbDPk27wHapzM GR1UAK5/SA7RfZgIMRHHclF6Ea49/uPJ45crD1T+8p6jLW+ldbxpiRD3ux9BnK2v U7EWS5MLMFAvb/nTLc8T37srJuEhBAT0r2bn614rjOiJofalPeD0eDeHfz4vRpPe +qTQREtpBUtJizYN+8rpcxP8f9S/hmPOBvIKD3XC0TlOo1NCf35fqWLWMli2hkTQ AfcY1auKjC/UYcnR0TQ91aHo1puM4fK5Pdc6lDGznrcxy9t1g1NvKAEL9Y3xK3No paglrliBCUbAN8vogKr4jc7jRkh/GLEqkxV2LIpOVp3lyT9GepvYM1xLQ8X/rszn /Il3fAwf5AyP+1RoVcmmOy1XW0ptUbKXWn03NlxN55Ya8x3tKCwWWDSmL2CP8SwV Vo5Qt+rKUkqA/TmHW8HOd7i+44Sa8oD/6WpSSPkwXN2cgRQmvmtaGmpXCKNTn5tk PMgFJOq3uw== =7JDU -----END PGP SIGNATURE----- Merge tag 'for-5.19/block-2022-06-02' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: "Just a collection of fixes that have been queued up since the initial merge window pull request, the majority of which are targeted for stable as well. One bio_set fix that fixes an issue with the dm adoption of cached bio structs that got introduced in this merge window" * tag 'for-5.19/block-2022-06-02' of git://git.kernel.dk/linux-block: block: Fix potential deadlock in blk_ia_range_sysfs_show() block: fix bio_clone_blkg_association() to associate with proper blkcg_gq block: remove useless BUG_ON() in blk_mq_put_tag() blk-mq: do not update io_ticks with passthrough requests block: make bioset_exit() fully resilient against being called twice block: use bio_queue_enter instead of blk_queue_enter in bio_poll block: document BLK_STS_AGAIN usage block: take destination bvec offsets into account in bio_copy_data_iter blk-iolatency: Fix inflight count imbalances and IO hangs on offline blk-mq: don't touch ->tagset in blk_mq_get_sq_hctx	2022-06-03 10:14:48 -07:00
Damien Le Moal	41e46b3c2a	block: Fix potential deadlock in blk_ia_range_sysfs_show() When being read, a sysfs attribute is already protected against removal with the kobject node active reference counter. As a result, in blk_ia_range_sysfs_show(), there is no need to take the queue sysfs lock when reading the value of a range attribute. Using the queue sysfs lock in this function creates a potential deadlock situation with the disk removal, something that a lockdep signals with a splat when the device is removed: [ 760.703551] Possible unsafe locking scenario: [ 760.703551] [ 760.703554] CPU0 CPU1 [ 760.703556] ---- ---- [ 760.703558] lock(&q->sysfs_lock); [ 760.703565] lock(kn->active#385); [ 760.703573] lock(&q->sysfs_lock); [ 760.703579] lock(kn->active#385); [ 760.703587] [ 760.703587] * DEADLOCK * Solve this by removing the mutex_lock()/mutex_unlock() calls from blk_ia_range_sysfs_show(). Fixes: `a2247f19ee` ("block: Add independent access ranges support") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Link: https://lore.kernel.org/r/20220603021905.1441419-1-damien.lemoal@opensource.wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-02 23:02:37 -06:00
Jan Kara	22b106e535	block: fix bio_clone_blkg_association() to associate with proper blkcg_gq Commit `d92c370a16` ("block: really clone the block cgroup in bio_clone_blkg_association") changed bio_clone_blkg_association() to just clone bio->bi_blkg reference from source to destination bio. This is however wrong if the source and destination bios are against different block devices because struct blkcg_gq is different for each bdev-blkcg pair. This will result in IOs being accounted (and throttled as a result) multiple times against the same device (src bdev) while throttling of the other device (dst bdev) is ignored. In case of BFQ the inconsistency can even result in crashes in bfq_bic_update_cgroup(). Fix the problem by looking up correct blkcg_gq for the cloned bio. Reported-by: Logan Gunthorpe <logang@deltatee.com> Reported-and-tested-by: Donald Buczek <buczek@molgen.mpg.de> Fixes: `d92c370a16` ("block: really clone the block cgroup in bio_clone_blkg_association") CC: stable@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220602081242.7731-1-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-02 02:15:05 -06:00
Damien Le Moal	ff47dbd18b	block: remove useless BUG_ON() in blk_mq_put_tag() Since the if condition in blk_mq_put_tag() checks that the tag to put is not a reserved one, the BUG_ON() check in the else branch checking if the tag is indeed a reserved one is useless. Remove it. Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Link: https://lore.kernel.org/r/20220602075159.1273366-1-damien.lemoal@opensource.wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-02 02:05:56 -06:00
Haisu Wang	b81c14ca14	blk-mq: do not update io_ticks with passthrough requests Flush or passthrough requests are not accounted as normal IO in completion. To reflect iostat for slow IO, io_ticks is updated when stat show called based on inflight numbers. It may cause inconsistent io_ticks calculation result. So do not account non-passthrough request when check inflight. Fixes: `86d7331299` ("block: update io_ticks when io hang") Signed-off-by: Haisu Wang <haisuwang@tencent.com> Reviewed-by: samuelliao <samuelliao@tencent.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220530064059.1120058-1-haisuwang@tencent.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-30 07:13:12 -06:00
Jens Axboe	605f7415ec	block: make bioset_exit() fully resilient against being called twice Most of bioset_exit() is fine being called twice, as it clears the various allocations etc when they are freed. The exception is bio_alloc_cache_destroy(), which does not clear ->cache when it has freed it. This isn't necessarily a bug, but can be if buggy users call the exit path more than once, or with just a memset() bioset which has never been initialized. dm appears to be one such user. Fixes: `be4d234d7a` ("bio: add allocation cache abstraction") Link: https://lore.kernel.org/linux-block/YpK7m+14A+pZKs5k@casper.infradead.org/ Reported-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-29 07:36:31 -06:00
Christoph Hellwig	e2e5308672	blk-mq: remove the done argument to blk_execute_rq_nowait Let the caller set it together with the end_io_data instead of passing a pointless argument. Note that the target code did in fact already set it and then just overrode it again by calling blk_execute_rq_nowait. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220524121530.943123-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-28 06:15:27 -06:00
Christoph Hellwig	32ac5a9b8b	blk-mq: avoid a mess of casts for blk_end_sync_rq Instead of trying to cast a __bitwise 32-bit integer to a larger integer and then a pointer, just allow a struct with the blk_status_t and the completion on stack and set the end_io_data to that. Use the opportunity to move the code to where it belongs and drop rather confusing comments. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220524121530.943123-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-28 06:15:27 -06:00
Christoph Hellwig	ae948fd6d0	blk-mq: remove __blk_execute_rq_nowait We don't want to plug for synchronous execution where we immediately wait for the request. Once that is done not a whole lot of code is shared, so just remove __blk_execute_rq_nowait. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220524121530.943123-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-28 06:15:27 -06:00
Christoph Hellwig	ebd076bf7d	block: use bio_queue_enter instead of blk_queue_enter in bio_poll We want to have a valid live gendisk to call ->poll and not just a request_queue, so call the right helper. Fixes: `3e08773c38` ("block: switch polling to be bio based") Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220523124302.526186-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-28 06:14:35 -06:00
Christoph Hellwig	403d50341c	block: take destination bvec offsets into account in bio_copy_data_iter Apparently bcache can copy into bios that do not just contain fresh pages but can have offsets into the bio_vecs. Restore support for that in bio_copy_data_iter. Fixes: `f8b679a070` ("block: rewrite bio_copy_data_iter to use bvec_kmap_local and memcpy_to_bvec") Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220524143919.1155501-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-27 20:35:55 -06:00

6280 Commits