linux-stable

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2024-09-25 11:55:37 +00:00

Author	SHA1	Message	Date
Christoph Hellwig	6cc37a672a	block: call blk_queue_free_zone_bitmaps from disk_release The zone bitmaps are only used for non-passthrough I/O, so free them as soon as the disk is released. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220706070350.1703384-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-06 06:46:25 -06:00
John Garry	4cf6e6c010	blk-mq: Drop local variable for reserved tag The local variable is now only referenced once so drop it. Signed-off-by: John Garry <john.garry@huawei.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/1657109034-206040-7-git-send-email-john.garry@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-06 06:33:53 -06:00
John Garry	2dd6532e95	blk-mq: Drop 'reserved' arg of busy_tag_iter_fn We no longer use the 'reserved' arg in busy_tag_iter_fn for any iter function so it may be dropped. Signed-off-by: John Garry <john.garry@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> #nvme Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/1657109034-206040-6-git-send-email-john.garry@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-06 06:33:53 -06:00
John Garry	9bdb4833dd	blk-mq: Drop blk_mq_ops.timeout 'reserved' arg With new API blk_mq_is_reserved_rq() we can tell if a request is from the reserved pool, so stop passing 'reserved' arg. There is actually only a single user of that arg for all the callback implementations, which can use blk_mq_is_reserved_rq() instead. This will also allow us to stop passing the same 'reserved' around the blk-mq iter functions next. Signed-off-by: John Garry <john.garry@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # For MMC Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/1657109034-206040-4-git-send-email-john.garry@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-06 06:33:53 -06:00
John Garry	99e48cd685	blk-mq: Add a flag for reserved requests Add a flag for reserved requests so that drivers may know this for any special handling. Signed-off-by: John Garry <john.garry@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/1657109034-206040-3-git-send-email-john.garry@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-06 06:33:53 -06:00
Jason Yan	e55cf79814	blk-cgroup: factor out blkcg_free_all_cpd() To reduce some duplicated code, factor out blkcg_free_all_cpd(). No functional change. Signed-off-by: Jason Yan <yanaijie@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220629070917.3113016-3-yanaijie@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-29 11:09:53 -06:00
Jason Yan	362b8c16f8	blk-cgroup: factor out blkcg_iostat_update() To reduce some duplicated code, factor out blkcg_iostat_update(). No functional change. Signed-off-by: Jason Yan <yanaijie@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220629070917.3113016-2-yanaijie@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-29 11:09:53 -06:00
Christoph Hellwig	22d0c4080f	block: simplify disk_set_independent_access_ranges Lift setting disk->ia_ranges from disk_register_independent_access_ranges into disk_set_independent_access_ranges, and make the behavior the same for the registered vs non-registered queue cases. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Link: https://lore.kernel.org/r/20220629062013.1331068-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-29 08:36:46 -06:00
Christoph Hellwig	6a27d28c81	block: move ->ia_ranges from the request_queue to the gendisk Independent access ranges only matter for file system I/O and are only valid with a registered gendisk, so move them there. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Link: https://lore.kernel.org/r/20220629062013.1331068-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-29 08:36:46 -06:00
Ying Sun	b9a1c179bd	block: remove "select BLK_RQ_IO_DATA_LEN" from BLK_CGROUP_IOCOST dependency The configuration item BLK_RQ_IO_DATA_LEN is not declared in the kernel. Select BLK_RQ_IO_DATA_LEN is meaningless which could be removed. Signed-off-by: Ying Sun <sunying@nj.iscas.ac.cn> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220629062409.19458-1-sunying@nj.iscas.ac.cn Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-29 08:35:57 -06:00
Matthew Wilcox (Oracle)	0e00fa5f83	block: Remove check of PageError If read_mapping_page() sees a page with PageError set, it returns a PTR_ERR(). Checking PageError again is simply dead code. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2022-06-29 08:51:06 -04:00
Christoph Hellwig	8682b92e5a	blk-mq: cleanup disk sysfs registration Pass a gendisk to the sysfs register/unregister functions and give them descriptive names. Also move the unregistration helper next to the one doing the registration. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220628171850.1313069-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-28 11:32:42 -06:00
Christoph Hellwig	eaa870f975	blk-mq: rename blk_mq_sysfs_{,un}register Add a _hctx postfix to better describe what the functions do, match the debugfs equivalents and release the old names for functions that should be using them. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220628171850.1313069-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-28 11:32:42 -06:00
Christoph Hellwig	81f0c2ef41	block: remove the extra gendisk reference in __blk_mq_register_dev kobject_add already grabs a reference to the parent, no need to have another one. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220628171850.1313069-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-28 11:32:42 -06:00
Christoph Hellwig	4a8d14bba4	block: use default groups to register the queue attributes Set up the default_groups for blk_queue_ktype instead of manually calling sysfs_create_group. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220628171850.1313069-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-28 11:32:42 -06:00
Christoph Hellwig	060f131e9c	block: remove a superflous queue kobject reference kobject_add already adds a reference to the parent that is dropped on deletion, so don't bother grabbing another one. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220628171850.1313069-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-28 11:32:42 -06:00
Christoph Hellwig	cc5c516df0	block: simplify blktrace sysfs attribute creation Add the trace attributes to the default gendisk attributes, just like we already do for partitions. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220628171850.1313069-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-28 11:32:42 -06:00
Christoph Hellwig	8b9ab62662	block: remove blk_cleanup_disk blk_cleanup_disk is nothing but a trivial wrapper for put_disk now, so remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20220619060552.1850436-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-28 06:33:15 -06:00
Christoph Hellwig	6f8191fdf4	block: simplify disk shutdown Set the queue dying flag and call blk_mq_exit_queue from del_gendisk for all disks that do not have separately allocated queues, and thus remove the need to call blk_cleanup_queue for them. Rename blk_cleanup_disk to blk_mq_destroy_queue to make it clear that this function is intended only for separately allocated blk-mq queues. This saves an extra queue freeze for devices without a separately allocated queue. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20220619060552.1850436-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-28 06:30:26 -06:00
Christoph Hellwig	0e3534022f	block: stop setting the nomerges flags in blk_cleanup_queue These flags only apply to file system I/O, and all file system I/O is already drained by del_gendisk and thus can't be in progress when blk_cleanup_queue is called. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20220619060552.1850436-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-28 06:30:26 -06:00
Christoph Hellwig	1f90307e5f	block: remove QUEUE_FLAG_DEAD Disallow setting the blk-mq state on any queue that is already dying as setting the state even then is a bad idea, and remove the now unused QUEUE_FLAG_DEAD flag. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20220619060552.1850436-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-28 06:30:26 -06:00
Liu Song	ee78ec1077	blk-mq: blk_mq_tag_busy is no need to return a value Currently "blk_mq_tag_busy" return value has no effect, so adjust it. Some code implementations have also been adjusted to enhance readability. Signed-off-by: Liu Song <liusong@linux.alibaba.com> Link: https://lore.kernel.org/r/1656170121-1619-1-git-send-email-liusong@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-27 06:29:12 -06:00
Jan Kara	a78418e6a0	block: Always initialize bio IO priority on submit Currently, IO priority set in task's IO context is not reflected in the bio->bi_ioprio for most IO (only io_uring and direct IO set it). This results in odd results where process is submitting some bios with one priority and other bios with a different (unset) priority and due to differing priorities bios cannot be merged. Make sure bio->bi_ioprio is always set on bio submission. Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220623074840.5960-9-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-27 06:29:12 -06:00
Jan Kara	9c6227e043	block: Initialize bio priority earlier Bio's IO priority needs to be initialized before we try to merge the bio with other bios. Otherwise we could merge bios which would otherwise receive different IO priorities leading to possible QoS issues. Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220623074840.5960-8-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-27 06:29:12 -06:00
Jan Kara	82b74cac28	blk-ioprio: Convert from rqos policy to direct call Convert blk-ioprio handling from a rqos policy to a direct call from blk_mq_submit_bio(). Firstly, blk-ioprio is not much of a rqos policy anyway, it just needs a hook in bio submission path to set the bio's IO priority. Secondly, the rqos .track hook gets actually called too late for blk-ioprio purposes and introducing a special rqos hook just for blk-ioprio looks even weirder. Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220623074840.5960-7-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-27 06:29:12 -06:00
Jan Kara	f258654472	blk-ioprio: Remove unneeded field blkcg->ioprio_set field is not really useful except for avoiding possibly more expensive checks inside blkcg_ioprio_track(). The check for blkcg->prio_policy being equal to POLICY_NO_CHANGE does the same service so just remove the ioprio_set field and replace the check. Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220623074840.5960-6-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-27 06:29:12 -06:00
Jan Kara	4b838d9ee9	block: Fix handling of tasks without ioprio in ioprio_get(2) ioprio_get(2) can be asked to return the best IO priority from several tasks (IOPRIO_WHO_PGRP, IOPRIO_WHO_USER). Currently the call treats tasks without set IO priority as having priority IOPRIO_CLASS_BE/IOPRIO_BE_NORM however this does not really reflect the IO priority the task will get (which depends on task's nice value). Fix the code to use the real IO priority task's IO will use. We have to modify code for ioprio_get(IOPRIO_WHO_PROCESS) to keep returning IOPRIO_CLASS_NONE priority for tasks without set IO priority as a special case to maintain userspace visible API. Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220623074840.5960-5-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-27 06:29:12 -06:00
Jan Kara	fc25545e17	block: Make ioprio_best() static Nobody outside of block/ioprio.c uses it. Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220623074840.5960-4-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-27 06:29:12 -06:00
Jan Kara	893e5d32d5	block: Generalize get_current_ioprio() for any task get_current_ioprio() operates only on current task. We will need the same functionality for other tasks as well. Generalize get_current_ioprio() for that and also move the bulk out of the header file because it is large enough. Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220623074840.5960-3-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-27 06:29:12 -06:00
Jan Kara	e589f46445	block: fix default IO priority handling again Commit `e70344c059` ("block: fix default IO priority handling") introduced an inconsistency in get_current_ioprio() that tasks without IO context return IOPRIO_DEFAULT priority while tasks with freshly allocated IO context will return 0 (IOPRIO_CLASS_NONE/0) IO priority. Tasks without IO context used to be rare before `5a9d041ba2` ("block: move io_context creation into where it's needed") but after this commit they became common because now only BFQ IO scheduler setups task's IO context. Similar inconsistency is there for get_task_ioprio() so this inconsistency is now exposed to userspace and userspace will see different IO priority for tasks operating on devices with BFQ compared to devices without BFQ. Furthemore the changes done by commit `e70344c059` change the behavior when no IO priority is set for BFQ IO scheduler which is also documented in ioprio_set(2) manpage: "If no I/O scheduler has been set for a thread, then by default the I/O priority will follow the CPU nice value (setpriority(2)). In Linux kernels before version 2.6.24, once an I/O priority had been set using ioprio_set(), there was no way to reset the I/O scheduling behavior to the default. Since Linux 2.6.24, specifying ioprio as 0 can be used to reset to the default I/O scheduling behavior." So make sure we default to IOPRIO_CLASS_NONE as used to be the case before commit `e70344c059`. Also cleanup alloc_io_context() to explicitely set this IO priority for the allocated IO context to avoid future surprises. Note that we tweak ioprio_best() to maintain ioprio_get(2) behavior and make this commit easily backportable. CC: stable@vger.kernel.org Fixes: `e70344c059` ("block: fix default IO priority handling") Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220623074840.5960-1-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-27 06:29:12 -06:00
Sebastian Andrzej Siewior	3c8f9da41e	blk-mq: Don't disable preemption around __blk_mq_run_hw_queue(). __blk_mq_delay_run_hw_queue() disables preemption to get a stable current CPU number and then invokes __blk_mq_run_hw_queue() if the CPU number is part the mask. __blk_mq_run_hw_queue() acquires a spin_lock_t which is a sleeping lock on PREEMPT_RT and can't be acquired with disabled preemption. It is not required for correctness to invoke __blk_mq_run_hw_queue() on a CPU matching hctx->cpumask. Both (async and direct requests) can run on a CPU not matching hctx->cpumask. The CPU mask without disabling preemption and invoking __blk_mq_run_hw_queue(). Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/YrLSEiNvagKJaDs5@linutronix.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-27 06:29:12 -06:00
Bart Van Assche	1d87be8212	block: bfq: Fix kernel-doc headers Fix the following warnings: block/bfq-cgroup.c:721: warning: Function parameter or member 'bfqg' not described in '__bfq_bic_change_cgroup' block/bfq-cgroup.c:721: warning: Excess function parameter 'blkcg' description in '__bfq_bic_change_cgroup' block/bfq-cgroup.c:870: warning: Function parameter or member 'ioprio_class' not described in 'bfq_reparent_leaf_entity' block/bfq-cgroup.c:900: warning: Function parameter or member 'ioprio_class' not described in 'bfq_reparent_active_queues' Cc: Jan Kara <jack@suse.cz> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20220617210859.106623-1-bvanassche@acm.org	2022-06-27 06:29:12 -06:00
Bart Van Assche	c28c49b09e	block: bfq: Remove an unused function definition This patch is the result of the analysis of a sparse report. Cc: Jan Kara <jack@suse.cz> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20220617204433.102022-1-bvanassche@acm.org	2022-06-27 06:29:12 -06:00
GuoYong Zheng	6c77b152f5	bfq: Remove useless code in bfq_lookup_next_entity It is no need to judge entity is null or not here, directly return entity is ok, so remove it. Signed-off-by: GuoYong Zheng <zhenggy@chinatelecom.cn> Link: https://lore.kernel.org/r/1655461684-19075-1-git-send-email-zhenggy@chinatelecom.cn Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-27 06:29:12 -06:00
Christoph Hellwig	2a9336c42a	block: move blk_queue_get_max_sectors to blk.h blk_queue_get_max_sectors is private to the block layer, so move it out of blkdev.h. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220614090934.570632-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-27 06:29:11 -06:00
Christoph Hellwig	efef739d5f	block: fold blk_max_size_offset into get_max_io_size Now that blk_max_size_offset has a single caller left, fold it into that and clean up the naming convention for the local variables there. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Pankaj Raghav <p.raghav@samsung.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220614090934.570632-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-27 06:29:11 -06:00
Christoph Hellwig	84613beda4	block: cleanup variable naming in get_max_io_size get_max_io_size has a very odd choice of variables names and initialization patterns. Switch to more descriptive names and more clear initialization of them. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220614090934.570632-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-27 06:29:11 -06:00
Christoph Hellwig	c887519074	block: open code blk_max_size_offset in blk_rq_get_max_sectors blk_rq_get_max_sectors always uses q->limits.chunk_sectors as the chunk_sectors argument, and already checks for max_sectors through the call to blk_queue_get_max_sectors. That means much of blk_max_size_offset is not needed and open coding it simplifies the code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220614090934.570632-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-27 06:29:11 -06:00
Bart Van Assche	51ab80f0aa	block: Make blk_mq_get_sq_hctx() select the proper hardware queue type Since the introduction of blk_mq_get_hctx_type() the operation type in the second argument of blk_mq_get_hctx_type() matters. The introduction of blk_mq_get_hctx_type() caused blk_mq_get_sq_hctx() to select a hardware queue of type HCTX_TYPE_READ instead of HCTX_TYPE_DEFAULT. Switch to hardware queue type HCTX_TYPE_DEFAULT since HCTX_TYPE_READ should only be used for read requests. Cc: Ming Lei <ming.lei@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220615225549.1054905-4-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-27 06:29:11 -06:00
Bart Van Assche	7e923f40a4	block: Rename a blk_mq_map_queue() argument Before the introduction of blk_mq_get_hctx_type(), blk_mq_map_queue() only used the flags from its second argument. Since the introduction of blk_mq_get_hctx_type(), blk_mq_map_queue() uses both the operation and the flags encoded in that argument. Rename the second argument of blk_mq_map_queue() to make this clear. Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220615225549.1054905-3-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-27 06:29:11 -06:00
Bart Van Assche	62c159a03d	blk-iocost: Simplify ioc_rqos_done() Leave out the superfluous "& REQ_OP_MASK" code. The definition of req_op() shows that that code is superfluous: #define req_op(req) ((req)->cmd_flags & REQ_OP_MASK) Compile-tested only. Cc: Tejun Heo <tj@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220615225549.1054905-2-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-27 06:29:11 -06:00
Bo Liu	798f2a6f73	block: Directly use ida_alloc()/free() Use ida_alloc()/ida_free() instead of ida_simple_get()/ida_simple_remove(). The latter is deprecated and more verbose. Signed-off-by: Bo Liu <liubo03@inspur.com> Reviewed-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/20220615081816.4342-1-liubo03@inspur.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-27 06:29:11 -06:00
Keith Busch	b1a000d3b8	block: relax direct io memory alignment Use the address alignment requirements from the block_device for direct io instead of requiring addresses be aligned to the block size. User space can discover the alignment requirements from the dma_alignment queue attribute. User space can specify any hardware compatible DMA offset for each segment, but every segment length is still required to be a multiple of the block size. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220610195830.3574005-11-kbusch@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-27 06:29:11 -06:00
Keith Busch	9cfe3ddecd	block/bounce: count bytes instead of sectors Individual bv_len's may not be a sector size. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Pankaj Raghav <p.raghav@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220610195830.3574005-8-kbusch@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-27 06:29:11 -06:00
Keith Busch	67927d2201	block/merge: count bytes instead of sectors Individual bv_len's may not be a sector size. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220610195830.3574005-7-kbusch@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-27 06:29:11 -06:00
Keith Busch	37fee2e42e	block: add a helper function for dio alignment This will make it easier to add more complex acceptable alignment criteria in the future. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220610195830.3574005-6-kbusch@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-27 06:29:11 -06:00
Keith Busch	3850e13f28	block: export dma_alignment attribute User space may want to know how to align their buffers to avoid bouncing. Export the queue attribute. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220610195830.3574005-4-kbusch@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-27 06:29:11 -06:00
Keith Busch	c58c0074c5	block/bio: remove duplicate append pages code The getting pages setup for zone append and normal IO are identical. Use common code for each. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220610195830.3574005-3-kbusch@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-27 06:29:11 -06:00
Keith Busch	b82d9fa257	block: fix infinite loop for invalid zone append Returning 0 early from __bio_iov_append_get_pages() for the max_append_sectors warning just creates an infinite loop since 0 means success, and the bio will never fill from the unadvancing iov_iter. We could turn the return into an error value, but it will already be turned into an error value later on, so just remove the warning. Clearly no one ever hit it anyway. Fixes: `0512a75b98` ("block: Introduce REQ_OP_ZONE_APPEND") Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220610195830.3574005-2-kbusch@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-27 06:29:11 -06:00
Li Nan	ca2a3343d6	block: remove WARN_ON() from bd_link_disk_holder Since commit 83cbce957446("block: add error handling for device_add_disk / add_disk"), bdev->bd_holder_dir can not be empty now, so remove WARN_ON() from bd_link_disk_holder. Signed-off-by: Li Nan <linan122@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220623074100.2251301-1-linan122@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-23 07:48:05 -06:00
Bo Liu	873cdda193	scsi: core: bsg: Remove usage of the deprecated ida_simple_xxx() API Use ida_alloc_xxx()/ida_free() instead of ida_simple_get()/ida_simple_remove(). The latter is deprecated and more verbose. Link: https://lore.kernel.org/r/20220617020430.2300-1-liubo03@inspur.com Signed-off-by: Bo Liu <liubo03@inspur.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-06-21 21:22:51 -04:00
Jens Axboe	2645672ffe	block: pop cached rq before potentially blocking rq_qos_throttle() If rq_qos_throttle() ends up blocking, then we will have invalidated and flushed our current plug. Since blk_mq_get_cached_request() hasn't popped the cached request off the plug list just yet, we end holding a pointer to a request that is no longer valid. This insta-crashes with rq->mq_hctx being NULL in the validity checks just after. Pop the request off the cached list before doing rq_qos_throttle() to avoid using a potentially stale request. Fixes: `0a5aa8d161` ("block: fix blk_mq_attempt_bio_merge and rq_qos_throttle protection") Reported-by: Dylan Yudaken <dylany@fb.com> Tested-by: Dylan Yudaken <dylany@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-21 10:59:58 -06:00
Damien Le Moal	9243fc4cd2	block: remove queue from struct blk_independent_access_range The request queue pointer in struct blk_independent_access_range is unused. Remove it. Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Fixes: `41e46b3c2a` ("block: Fix potential deadlock in blk_ia_range_sysfs_show()") Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220603053529.76405-1-damien.lemoal@opensource.wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-19 18:40:11 -06:00
Christoph Hellwig	a09b314005	block: freeze the queue earlier in del_gendisk Freeze the queue earlier in del_gendisk so that the state does not change while we remove debugfs and sysfs files. Ming mentioned that being able to observer request in debugfs might be useful while the queue is being frozen in del_gendisk, which is made possible by this change. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220614074827.458955-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-17 07:31:05 -06:00
Christoph Hellwig	99d055b4fd	block: remove per-disk debugfs files in blk_unregister_queue The block debugfs files are created in blk_register_queue, which is called by add_disk and use a naming scheme based on the disk_name. After del_gendisk returns that name can be reused and thus we must not leave these debugfs files around, otherwise the kernel is unhappy and spews messages like: Directory XXXXX with parent 'block' already present! and the newly created devices will not have working debugfs files. Move the unregistration to blk_unregister_queue instead (which matches the sysfs unregistration) to make sure the debugfs life time rules match those of the disk name. As part of the move also make sure the whole debugfs unregistration is inside a single debugfs_mutex critical section. Note that this breaks blktests block/002, which checks that the debugfs directory has not been removed while blktests is running, but that particular check should simply be removed from the test case. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220614074827.458955-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-17 07:31:05 -06:00
Christoph Hellwig	5cf9c91ba9	block: serialize all debugfs operations using q->debugfs_mutex Various places like I/O schedulers or the QOS infrastructure try to register debugfs files on demans, which can race with creating and removing the main queue debugfs directory. Use the existing debugfs_mutex to serialize all debugfs operations that rely on q->debugfs_dir or the directories hanging off it. To make the teardown code a little simpler declare all debugfs dentry pointers and not just the main one uncoditionally in blkdev.h. Move debugfs_mutex next to the dentries that it protects and document what it is used for. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220614074827.458955-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-17 07:31:05 -06:00
Christoph Hellwig	50e34d7881	block: disable the elevator int del_gendisk The elevator is only used for file system requests, which are stopped in del_gendisk. Move disabling the elevator and freeing the scheduler tags to the end of del_gendisk instead of doing that work in disk_release and blk_cleanup_queue to avoid a use after free on q->tag_set from disk_release as the tag_set might not be alive at that point. Move the blk_qos_exit call as well, as it just depends on the elevator exit and would be the only reason to keep the not exactly cheap queue freeze in disk_release. Fixes: `e155b0c238` ("blk-mq: Use shared tags for shared sbitmap support") Reported-by: syzbot+3e3f419f4a7816471838@syzkaller.appspotmail.com Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: syzbot+3e3f419f4a7816471838@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/20220614074827.458955-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-17 07:31:05 -06:00
Bart Van Assche	b96f3cab59	block/bfq: Enable I/O statistics BFQ uses io_start_time_ns. That member variable is only set if I/O statistics are enabled. Hence this patch that enables I/O statistics at the time BFQ is associated with a request queue. Compile-tested only. Reported-by: Cixi Geng <cixi.geng1@unisoc.com> Cc: Cixi Geng <cixi.geng1@unisoc.com> Cc: Yu Kuai <yukuai3@huawei.com> Cc: Paolo Valente <paolo.valente@unimore.it> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-16 16:59:28 -06:00
Ming Lei	6cfeadbff3	blk-mq: don't clear flush_rq from tags->rqs[] commit `364b61818f` ("blk-mq: clearing flush request reference in tags->rqs[]") is added to clear the to-be-free flush request from tags->rqs[] for avoiding use-after-free on the flush rq. Yu Kuai reported that blk_mq_clear_flush_rq_mapping() slows down boot time by ~8s because running scsi probe which may create and remove lots of unpresent LUNs on megaraid-sas which uses BLK_MQ_F_TAG_HCTX_SHARED and each request queue has lots of hw queues. Improve the situation by not running blk_mq_clear_flush_rq_mapping if disk isn't added when there can't be any flush request issued. Reviewed-by: Christoph Hellwig <hch@lst.de> Reported-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220616014401.817001-4-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-16 14:45:15 -06:00
Ming Lei	4d337cebcb	blk-mq: avoid to touch q->elevator without any protection q->elevator is referred in blk_mq_has_sqsched() without any protection, no .q_usage_counter is held, no queue srcu and rcu read lock is held, so potential use-after-free may be triggered. Fix the issue by adding one queue flag for checking if the elevator uses single queue style dispatch. Meantime the elevator feature flag of ELEVATOR_F_MQ_AWARE isn't needed any more. Cc: Jan Kara <jack@suse.cz> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220616014401.817001-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-16 14:45:15 -06:00
Ming Lei	5fd7a84a09	blk-mq: protect q->elevator by ->sysfs_lock in blk_mq_elv_switch_none elevator can be tore down by sysfs switch interface or disk release, so hold ->sysfs_lock before referring to q->elevator, then potential use-after-free can be avoided. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220616014401.817001-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-16 14:45:15 -06:00
Bart Van Assche	14dc7a18ab	block: Fix handling of offline queues in blk_mq_alloc_request_hctx() This patch prevents that test nvme/004 triggers the following: UBSAN: array-index-out-of-bounds in block/blk-mq.h:135:9 index 512 is out of range for type 'long unsigned int [512]' Call Trace: show_stack+0x52/0x58 dump_stack_lvl+0x49/0x5e dump_stack+0x10/0x12 ubsan_epilogue+0x9/0x3b __ubsan_handle_out_of_bounds.cold+0x44/0x49 blk_mq_alloc_request_hctx+0x304/0x310 __nvme_submit_sync_cmd+0x70/0x200 [nvme_core] nvmf_connect_io_queue+0x23e/0x2a0 [nvme_fabrics] nvme_loop_connect_io_queues+0x8d/0xb0 [nvme_loop] nvme_loop_create_ctrl+0x58e/0x7d0 [nvme_loop] nvmf_create_ctrl+0x1d7/0x4d0 [nvme_fabrics] nvmf_dev_write+0xae/0x111 [nvme_fabrics] vfs_write+0x144/0x560 ksys_write+0xb7/0x140 __x64_sys_write+0x42/0x50 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Fixes: `20e4d81393` ("blk-mq: simplify queue mapping & schedule with each possisble CPU") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220615210004.1031820-1-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-16 14:43:31 -06:00
Al Viro	91b94c5d6a	iocb: delay evaluation of IS_SYNC(...) until we want to check IOCB_DSYNC New helper to be used instead of direct checks for IOCB_DSYNC: iocb_is_dsync(iocb). Checks converted, which allows to avoid the IS_SYNC(iocb->ki_filp->f_mapping->host) part (4 cache lines) from iocb_flags() - it's checked in iocb_is_dsync() instead Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2022-06-10 16:05:15 -04:00
Christoph Hellwig	d5a37b1998	block: remove bioset_init_from_src Unused now, and the interface never really made a whole lot of sense to start with. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@kernel.org>	2022-06-08 14:04:14 -04:00
Linus Torvalds	78c6499c92	for-5.19/drivers-2022-06-02 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmKZmkoQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpqyrD/4iyg2ULBPyljLoM3Ed8AONbrFBApenKDFN FjiFRZNll8zvJLTtP0GQqJIrljPuRlqb0IkbgqXl+yvPZ+wpB9HgIe2aohkOaqqJ KS49UNR/aIHwC2y7lwlcsdgVqqoPdc4wnZeaQvCsWPCBhCca/k0kR7s3uEhHMK92 OWpV9osl/thLfOBwwt4IEaO1Koz8PM/fCR4XA2KLbs8E4P8EcSFglqi0ap7foLNr pQAJIlPjkmF6nw4xg5fdBjVBo//kVcuf9IBMi5/XinmUL1taFAcn5WyeOvBbi0Fs Sqp/pKkveM8xWZKrDyA/wf8nzRNpBBl6TOQEMFtV6FkZrij3pbKHlCiyL2gBR13e 5gkbVvXgtdqDVlnqlvIV/Swfh5YQFtn7+vlHFOUjP+iObsBRo4fTvDhPTLoO/VCf tIAA7xnq/pquRKS/QGGC7ZxRVc3T1r+EpvbBP5Dc4CDGbbbyZLCSrOh5HWIb0I3k 95GSiipTtf54KqZ9HiG/u+xNAFIdapXgU4Xm+JyDRWLdxSs5Nmy8VgpdflvxBfuo hCvJJw3vtDusjyHc7IafxaZlJVQT9tPgPshs8GfrCMCP19RCALD/5irspFh6dD35 BQTIkhC68XNa0iNn/NTP3uxir/JwRoovxQkA9eD+r1NHsAbL8GTypfr5kKJxDaIK UhawfyZE3Q== =YqhC -----END PGP SIGNATURE----- Merge tag 'for-5.19/drivers-2022-06-02' of git://git.kernel.dk/linux-block Pull more block driver updates from Jens Axboe: "A collection of stragglers that were late on sending in their changes and just followup fixes. - NVMe fixes pull request via Christoph: - set controller enable bit in a separate write (Niklas Cassel) - disable namespace identifiers for the MAXIO MAP1001 (Christoph) - fix a comment typo (Julia Lawall)" - MD fixes pull request via Song: - Remove uses of bdevname (Christoph Hellwig) - Bug fixes (Guoqing Jiang, and Xiao Ni) - bcache fixes series (Coly) - null_blk zoned write fix (Damien) - nbd fixes (Yu, Zhang) - Fix for loop partition scanning (Christoph)" * tag 'for-5.19/drivers-2022-06-02' of git://git.kernel.dk/linux-block: (23 commits) block: null_blk: Fix null_zone_write() nvmet: fix typo in comment nvme: set controller enable bit in a separate write nvme-pci: disable namespace identifiers for the MAXIO MAP1001 bcache: avoid unnecessary soft lockup in kworker update_writeback_rate() nbd: use pr_err to output error message nbd: fix possible overflow on 'first_minor' in nbd_dev_add() nbd: fix io hung while disconnecting device nbd: don't clear 'NBD_CMD_INFLIGHT' flag if request is not completed nbd: fix race between nbd_alloc_config() and module removal nbd: call genl_unregister_family() first in nbd_cleanup() md: bcache: check the return value of kzalloc() in detached_dev_do_request() bcache: memset on stack variables in bch_btree_check() and bch_sectors_dirty_init() block, loop: support partitions without scanning bcache: avoid journal no-space deadlock by reserving 1 journal bucket bcache: remove incremental dirty sector counting for bch_sectors_dirty_init() bcache: improve multithreaded bch_sectors_dirty_init() bcache: improve multithreaded bch_btree_check() md: fix double free of io_acct_set bioset md: Don't set mddev private to NULL in raid0 pers->free ...	2022-06-03 10:25:56 -07:00
Linus Torvalds	72fbbc3d0e	for-5.19/block-exec-2022-06-02 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmKZmh0QHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpqg6EACCbkwoH7rrr38iU++xP3c9oqFJCfYR95ho qn9/3FiPulua0Dwg8Fbp12ubqqBy/iNj+4Mk7XTo28P7ahjGtKJec2DZguDuHC5X G3kucgQcDOLs1IMWoil+KrnjGC8qeT9ZPFNaUF0IY084NPxnj1wAOjo1J00QVieN WFgHX1sxzBje8abebf3UxAyXImzfyY2uXbp1F3thzf0ZwHXkSDsbWI3fvpdYF4QC p3z6CX0sR+5v7ZLWF3X6H8MBSO+eRlprYji3O/0jVslLBAS8FlTdizQtzx7C6Hsv JZVY4ZsUswYxtsHCBR0McglDeu/iXZRZ9HX4iiOobYJNaXfycMltvS+4Tb/TsFTB GaG6tbL4JS+NT063ctl5h355vUVhIbw6qEhsiF47+0hgawvRr/xxP0aSq1MPmfjw OgG4Jn0htXF47tpKnszfaj3BmgvgV56mV0IGwF5Sh5NXDnHF+MHFmrhRsP1NenjL 12FTnvWGYyTRGDVIVFDkuwaI9o9iNKdFw0JkKIZa/G5RVmmjukvMGTvvSnKmSxJg dgbYLSBA2ZxCcIPjJDvZroe3QxvGNyqUYtxwyWl4a1HK/qljZfwwyRE1rfjJ47hK F8jNEkOThcjr1anoV2nvSLE1mM3SyA/UqDsntIdwnUlG/ObYsByTgtKc+ETs1RzS 8Ovp6lv0Mw== =lBeb -----END PGP SIGNATURE----- Merge tag 'for-5.19/block-exec-2022-06-02' of git://git.kernel.dk/linux-block Pull block request execute cleanups from Jens Axboe: "This change was advertised in the initial core block pull request, but didn't actually make that branch as we deferred it to a post-merge pull request to avoid a bunch of cross branch issues. This series cleans up the block execute path quite nicely" * tag 'for-5.19/block-exec-2022-06-02' of git://git.kernel.dk/linux-block: blk-mq: remove the done argument to blk_execute_rq_nowait blk-mq: avoid a mess of casts for blk_end_sync_rq blk-mq: remove __blk_execute_rq_nowait	2022-06-03 10:21:43 -07:00
Linus Torvalds	34845d92bc	for-5.19/block-2022-06-02 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmKZmfQQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpsLlEACPbK/ms8dMDwKjfEF/RMoc7uL/j6oC0cpf 0D2sfMka8D41QdrUfMiUismXZ61dyKdsiX/U/Q0gcjIomnlco8ZeLcLa6DlafjwY DtvO2aCb+eBAkII5sX2WM4ANNgFTy08Y4wBmgEy5En5u4nPlIGZ8DsulQQodqygx 1lJh31OXQKw+2kIyUdAeC0GMiD9nddYDsH0CTFDSZsAijCcOBDOHbDPk27wHapzM GR1UAK5/SA7RfZgIMRHHclF6Ea49/uPJ45crD1T+8p6jLW+ldbxpiRD3ux9BnK2v U7EWS5MLMFAvb/nTLc8T37srJuEhBAT0r2bn614rjOiJofalPeD0eDeHfz4vRpPe +qTQREtpBUtJizYN+8rpcxP8f9S/hmPOBvIKD3XC0TlOo1NCf35fqWLWMli2hkTQ AfcY1auKjC/UYcnR0TQ91aHo1puM4fK5Pdc6lDGznrcxy9t1g1NvKAEL9Y3xK3No paglrliBCUbAN8vogKr4jc7jRkh/GLEqkxV2LIpOVp3lyT9GepvYM1xLQ8X/rszn /Il3fAwf5AyP+1RoVcmmOy1XW0ptUbKXWn03NlxN55Ya8x3tKCwWWDSmL2CP8SwV Vo5Qt+rKUkqA/TmHW8HOd7i+44Sa8oD/6WpSSPkwXN2cgRQmvmtaGmpXCKNTn5tk PMgFJOq3uw== =7JDU -----END PGP SIGNATURE----- Merge tag 'for-5.19/block-2022-06-02' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: "Just a collection of fixes that have been queued up since the initial merge window pull request, the majority of which are targeted for stable as well. One bio_set fix that fixes an issue with the dm adoption of cached bio structs that got introduced in this merge window" * tag 'for-5.19/block-2022-06-02' of git://git.kernel.dk/linux-block: block: Fix potential deadlock in blk_ia_range_sysfs_show() block: fix bio_clone_blkg_association() to associate with proper blkcg_gq block: remove useless BUG_ON() in blk_mq_put_tag() blk-mq: do not update io_ticks with passthrough requests block: make bioset_exit() fully resilient against being called twice block: use bio_queue_enter instead of blk_queue_enter in bio_poll block: document BLK_STS_AGAIN usage block: take destination bvec offsets into account in bio_copy_data_iter blk-iolatency: Fix inflight count imbalances and IO hangs on offline blk-mq: don't touch ->tagset in blk_mq_get_sq_hctx	2022-06-03 10:14:48 -07:00
Damien Le Moal	41e46b3c2a	block: Fix potential deadlock in blk_ia_range_sysfs_show() When being read, a sysfs attribute is already protected against removal with the kobject node active reference counter. As a result, in blk_ia_range_sysfs_show(), there is no need to take the queue sysfs lock when reading the value of a range attribute. Using the queue sysfs lock in this function creates a potential deadlock situation with the disk removal, something that a lockdep signals with a splat when the device is removed: [ 760.703551] Possible unsafe locking scenario: [ 760.703551] [ 760.703554] CPU0 CPU1 [ 760.703556] ---- ---- [ 760.703558] lock(&q->sysfs_lock); [ 760.703565] lock(kn->active#385); [ 760.703573] lock(&q->sysfs_lock); [ 760.703579] lock(kn->active#385); [ 760.703587] [ 760.703587] * DEADLOCK * Solve this by removing the mutex_lock()/mutex_unlock() calls from blk_ia_range_sysfs_show(). Fixes: `a2247f19ee` ("block: Add independent access ranges support") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Link: https://lore.kernel.org/r/20220603021905.1441419-1-damien.lemoal@opensource.wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-02 23:02:37 -06:00
Jan Kara	22b106e535	block: fix bio_clone_blkg_association() to associate with proper blkcg_gq Commit `d92c370a16` ("block: really clone the block cgroup in bio_clone_blkg_association") changed bio_clone_blkg_association() to just clone bio->bi_blkg reference from source to destination bio. This is however wrong if the source and destination bios are against different block devices because struct blkcg_gq is different for each bdev-blkcg pair. This will result in IOs being accounted (and throttled as a result) multiple times against the same device (src bdev) while throttling of the other device (dst bdev) is ignored. In case of BFQ the inconsistency can even result in crashes in bfq_bic_update_cgroup(). Fix the problem by looking up correct blkcg_gq for the cloned bio. Reported-by: Logan Gunthorpe <logang@deltatee.com> Reported-and-tested-by: Donald Buczek <buczek@molgen.mpg.de> Fixes: `d92c370a16` ("block: really clone the block cgroup in bio_clone_blkg_association") CC: stable@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220602081242.7731-1-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-02 02:15:05 -06:00
Damien Le Moal	ff47dbd18b	block: remove useless BUG_ON() in blk_mq_put_tag() Since the if condition in blk_mq_put_tag() checks that the tag to put is not a reserved one, the BUG_ON() check in the else branch checking if the tag is indeed a reserved one is useless. Remove it. Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Link: https://lore.kernel.org/r/20220602075159.1273366-1-damien.lemoal@opensource.wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-02 02:05:56 -06:00
Haisu Wang	b81c14ca14	blk-mq: do not update io_ticks with passthrough requests Flush or passthrough requests are not accounted as normal IO in completion. To reflect iostat for slow IO, io_ticks is updated when stat show called based on inflight numbers. It may cause inconsistent io_ticks calculation result. So do not account non-passthrough request when check inflight. Fixes: `86d7331299` ("block: update io_ticks when io hang") Signed-off-by: Haisu Wang <haisuwang@tencent.com> Reviewed-by: samuelliao <samuelliao@tencent.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220530064059.1120058-1-haisuwang@tencent.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-30 07:13:12 -06:00
Jens Axboe	605f7415ec	block: make bioset_exit() fully resilient against being called twice Most of bioset_exit() is fine being called twice, as it clears the various allocations etc when they are freed. The exception is bio_alloc_cache_destroy(), which does not clear ->cache when it has freed it. This isn't necessarily a bug, but can be if buggy users does call the exit path more then once, or with just a memset() bioset which has never been initialized. dm appears to be one such user. Fixes: `be4d234d7a` ("bio: add allocation cache abstraction") Link: https://lore.kernel.org/linux-block/YpK7m+14A+pZKs5k@casper.infradead.org/ Reported-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-29 07:36:31 -06:00
Christoph Hellwig	e2e5308672	blk-mq: remove the done argument to blk_execute_rq_nowait Let the caller set it together with the end_io_data instead of passing a pointless argument. Note the the target code did in fact already set it and then just overrode it again by calling blk_execute_rq_nowait. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220524121530.943123-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-28 06:15:27 -06:00
Christoph Hellwig	32ac5a9b8b	blk-mq: avoid a mess of casts for blk_end_sync_rq Instead of trying to cast a __bitwise 32-bit integer to a larger integer and then a pointer, just allow a struct with the blk_status_t and the completion on stack and set the end_io_data to that. Use the opportunity to move the code to where it belongs and drop rather confusing comments. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220524121530.943123-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-28 06:15:27 -06:00
Christoph Hellwig	ae948fd6d0	blk-mq: remove __blk_execute_rq_nowait We don't want to plug for synchronous execution that where we immediately wait for the request. Once that is done not a whole lot of code is shared, so just remove __blk_execute_rq_nowait. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220524121530.943123-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-28 06:15:27 -06:00
Christoph Hellwig	ebd076bf7d	block: use bio_queue_enter instead of blk_queue_enter in bio_poll We want to have a valid live gendisk to call ->poll and not just a request_queue, so call the right helper. Fixes: `3e08773c38` ("block: switch polling to be bio based") Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220523124302.526186-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-28 06:14:35 -06:00
Christoph Hellwig	403d50341c	block: take destination bvec offsets into account in bio_copy_data_iter Appartly bcache can copy into bios that do not just contain fresh pages but can have offsets into the bio_vecs. Restore support for tht in bio_copy_data_iter. Fixes: `f8b679a070` ("block: rewrite bio_copy_data_iter to use bvec_kmap_local and memcpy_to_bvec") Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220524143919.1155501-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-27 20:35:55 -06:00
Christoph Hellwig	b9684a71fc	block, loop: support partitions without scanning Historically we did distinguish between a flag that surpressed partition scanning, and a combinations of the minors variable and another flag if any partitions were supported. This was generally confusing and doesn't make much sense, but some corner case uses of the loop driver actually do want to support manually added partitions on a device that does not actively scan for partitions. To make things worsee the loop driver also wants to dynamically toggle the scanning for partitions on a live gendisk, which makes the disk->flags updates non-atomic. Introduce a new GD_SUPPRESS_PART_SCAN bit in disk->state that disables just scanning for partitions, and toggle that instead of GENHD_FL_NO_PART in the loop driver. Fixes: `1ebe2e5f9d` ("block: remove GENHD_FL_EXT_DEVT") Reported-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220527055806.1972352-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-27 06:39:00 -06:00
Tejun Heo	8a177a36da	blk-iolatency: Fix inflight count imbalances and IO hangs on offline iolatency needs to track the number of inflight IOs per cgroup. As this tracking can be expensive, it is disabled when no cgroup has iolatency configured for the device. To ensure that the inflight counters stay balanced, iolatency_set_limit() freezes the request_queue while manipulating the enabled counter, which ensures that no IO is in flight and thus all counters are zero. Unfortunately, iolatency_set_limit() isn't the only place where the enabled counter is manipulated. iolatency_pd_offline() can also dec the counter and trigger disabling. As this disabling happens without freezing the q, this can easily happen while some IOs are in flight and thus leak the counts. This can be easily demonstrated by turning on iolatency on an one empty cgroup while IOs are in flight in other cgroups and then removing the cgroup. Note that iolatency shouldn't have been enabled elsewhere in the system to ensure that removing the cgroup disables iolatency for the whole device. The following keeps flipping on and off iolatency on sda: echo +io > /sys/fs/cgroup/cgroup.subtree_control while true; do mkdir -p /sys/fs/cgroup/test echo '8:0 target=100000' > /sys/fs/cgroup/test/io.latency sleep 1 rmdir /sys/fs/cgroup/test sleep 1 done and there's concurrent fio generating direct rand reads: fio --name test --filename=/dev/sda --direct=1 --rw=randread \ --runtime=600 --time_based --iodepth=256 --numjobs=4 --bs=4k while monitoring with the following drgn script: while True: for css in css_for_each_descendant_pre(prog['blkcg_root'].css.address_of_()): for pos in hlist_for_each(container_of(css, 'struct blkcg', 'css').blkg_list): blkg = container_of(pos, 'struct blkcg_gq', 'blkcg_node') pd = blkg.pd[prog['blkcg_policy_iolatency'].plid] if pd.value_() == 0: continue iolat = container_of(pd, 'struct iolatency_grp', 'pd') inflight = iolat.rq_wait.inflight.counter.value_() if inflight: print(f'inflight={inflight} {disk_name(blkg.q.disk).decode("utf-8")} ' f'{cgroup_path(css.cgroup).decode("utf-8")}') time.sleep(1) The monitoring output looks like the following: inflight=1 sda /user.slice inflight=1 sda /user.slice ... inflight=14 sda /user.slice inflight=13 sda /user.slice inflight=17 sda /user.slice inflight=15 sda /user.slice inflight=18 sda /user.slice inflight=17 sda /user.slice inflight=20 sda /user.slice inflight=19 sda /user.slice <- fio stopped, inflight stuck at 19 inflight=19 sda /user.slice inflight=19 sda /user.slice If a cgroup with stuck inflight ends up getting throttled, the throttled IOs will never get issued as there's no completion event to wake it up leading to an indefinite hang. This patch fixes the bug by unifying enable handling into a work item which is automatically kicked off from iolatency_set_min_lat_nsec() which is called from both iolatency_set_limit() and iolatency_pd_offline() paths. Punting to a work item is necessary as iolatency_pd_offline() is called under spinlocks while freezing a request_queue requires a sleepable context. This also simplifies the code reducing LOC sans the comments and avoids the unnecessary freezes which were happening whenever a cgroup's latency target is newly set or cleared. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Liu Bo <bo.liu@linux.alibaba.com> Fixes: `8c772a9bfc` ("blk-iolatency: fix IO hang due to negative inflight counter") Cc: stable@vger.kernel.org # v5.0+ Link: https://lore.kernel.org/r/Yn9ScX6Nx2qIiQQi@slm.duckdns.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-26 11:43:00 -06:00
Linus Torvalds	fdaf9a5840	Page cache changes for 5.19 - Appoint myself page cache maintainer - Fix how scsicam uses the page cache - Use the memalloc_nofs_save() API to replace AOP_FLAG_NOFS - Remove the AOP flags entirely - Remove pagecache_write_begin() and pagecache_write_end() - Documentation updates - Convert several address_space operations to use folios: - is_dirty_writeback - readpage becomes read_folio - releasepage becomes release_folio - freepage becomes free_folio - Change filler_t to require a struct file pointer be the first argument like ->read_folio -----BEGIN PGP SIGNATURE----- iQEzBAABCgAdFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAmKNMDUACgkQDpNsjXcp gj4/mwf/bpHhXH4ZoNIvtUpTF6rZbqeffmc0VrbxCZDZ6igRnRPglxZ9H9v6L53O 7B0FBQIfxgNKHZpdqGdOkv8cjg/GMe/HJUbEy5wOakYPo4L9fZpHbDZ9HM2Eankj xBqLIBgBJ7doKr+Y62DAN19TVD8jfRfVtli5mqXJoNKf65J7BkxljoTH1L3EXD9d nhLAgyQjR67JQrT/39KMW+17GqLhGefLQ4YnAMONtB6TVwX/lZmigKpzVaCi4r26 bnk5vaR/3PdjtNxIoYvxdc71y2Eg05n2jEq9Wcy1AaDv/5vbyZUlZ2aBSaIVbtKX WfrhN9O3L0bU5qS7p9PoyfLc9wpq8A== =djLv -----END PGP SIGNATURE----- Merge tag 'folio-5.19' of git://git.infradead.org/users/willy/pagecache Pull page cache updates from Matthew Wilcox: - Appoint myself page cache maintainer - Fix how scsicam uses the page cache - Use the memalloc_nofs_save() API to replace AOP_FLAG_NOFS - Remove the AOP flags entirely - Remove pagecache_write_begin() and pagecache_write_end() - Documentation updates - Convert several address_space operations to use folios: - is_dirty_writeback - readpage becomes read_folio - releasepage becomes release_folio - freepage becomes free_folio - Change filler_t to require a struct file pointer be the first argument like ->read_folio * tag 'folio-5.19' of git://git.infradead.org/users/willy/pagecache: (107 commits) nilfs2: Fix some kernel-doc comments Appoint myself page cache maintainer fs: Remove aops->freepage secretmem: Convert to free_folio nfs: Convert to free_folio orangefs: Convert to free_folio fs: Add free_folio address space operation fs: Convert drop_buffers() to use a folio fs: Change try_to_free_buffers() to take a folio jbd2: Convert release_buffer_page() to use a folio jbd2: Convert jbd2_journal_try_to_free_buffers to take a folio reiserfs: Convert release_buffer_page() to use a folio fs: Remove last vestiges of releasepage ubifs: Convert to release_folio reiserfs: Convert to release_folio orangefs: Convert to release_folio ocfs2: Convert to release_folio nilfs2: Remove comment about releasepage nfs: Convert to release_folio jfs: Convert to release_folio ...	2022-05-24 19:55:07 -07:00
Linus Torvalds	850f6033cd	Description for this pull request: - fix referencing wrong parent directory information during rename. - introduce a sys_tz mount option to use system timezone. - improve performance while zeroing a cluster with dirsync mount option. - fix slab-out-bounds in exat_clear_bitmap() reported from syzbot. -----BEGIN PGP SIGNATURE----- iQJKBAABCgA0FiEE6NzKS6Uv/XAAGHgyZwv7A1FEIQgFAmKLhPkWHGxpbmtpbmpl b25Aa2VybmVsLm9yZwAKCRBnC/sDUUQhCOeXD/9nJofEx/n9KK0pA1WP2zKoLBnP YgLfnWTBlXfErInklW4kg057S6q8M0pDm0iASLw9P6GZNe0VFV5PigrTyAjy5ghW hm3JiFAHIZgaOlOk2NQd/1Qv/IdlnkbRkngXqHcizxEX/LcKZpP+pb1mdW6NzWrt /HagLClFTUhb0Su3DT7TCqiam5lI+lkarRI0Jo4Scstgsn4aT+25jk9N0bfUih7f hGKJpii+5UWCLlBJnyyghrBRQiiPdsETadJdRnHgeDdzKg/UNWxMP0C+G6PmSko/ mScrR+FeH2toURSUESi1Q558z1+3Fhb8rMbl3aWV70FJDmzMwn9YyhPgBrX3x6Gb AF7UBHFvORStYRUmmSMbX9XkY2gNoI9qZMXghDRlgF8t/WWY8VeVnyslaWwqDQhw qXyOIThiCuhLfKTD+r+MM08oUPcyFBtuGvdzDOH7/b56zEDwzab+hSHZz94xPWEz ESk0hNhaJCEvcgEr7IlSSF4k5Ff+hWVKN4R/DD78yxmjHveOuNMibTeE2YgDIFEX SiKFaAiXUuWGVsAPuAeJ+np/7rW3OEBG8yKhtri5vsoXk+Mqd56rp0EkEHzqbVHQ Ki5gua549KNRuxnbXRtWLjCKucwN86mE45WD0P0ORnBOjlfmgg8adp6BBSW5yVD2 SZkfgI0FL8rWdBwRcQ== =fr78 -----END PGP SIGNATURE----- Merge tag 'exfat-for-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat updates from Namjae Jeon: - fix referencing wrong parent directory information during rename - introduce a sys_tz mount option to use system timezone - improve performance while zeroing a cluster with dirsync mount option - fix slab-out-bounds in exat_clear_bitmap() reported from syzbot * tag 'exfat-for-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: check if cluster num is valid exfat: reduce block requests when zeroing a cluster block: add sync_blockdev_range() exfat: introduce mount option 'sys_tz' exfat: fix referencing wrong parent directory information after renaming	2022-05-24 18:30:27 -07:00
Linus Torvalds	5dc921868c	for-5.19/drivers-2022-05-22 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmKKrTcQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgph/REAC0/7odRfJeTJ1PkJhSKFc7dhyS7rK4du2s 3+z+H6Yeua2yVIJb0mYYGEJcOUUQ9nD2T9424n3NzDOw88U4y8Vg2YEH+UiJBuj4 AJoxPNkQdxL7WzmwHmRNLCcOOFhISLqWiCJSr45d+LP1f6aO24Q9lewYWxtNA4TW mqb7Ne7e3Z77m9rmsCsZ26bzQHg1EEQ6qgjZM9tqMhOeTqYhmrqfrD9KtG8TIkpK N8277E5QcequHf7v6VpKqEOzf3d2kx55JaZdu+oxLPVMED3wJJFwcYF1/xmM7Fgx tp7xCjqqUHXwKvJNCFJpnvw+cXu0Ct7cWOIG4ROCvaTD4vBI1KzZLc0gO7pKFW0Y hNIlMXr4n8PmonS81tMV4TqmRWxedX/jxuaeJCVNr89PqYU4luPpigJZqv7rlGry KZUlktQot22M/7FC2MS6KhgbQKLPrRGTAEyY/JNwBHckCZiduWQFlmKLQ926xQIJ 6vdjSzHK5MrT/d+yow3bGFxAJWloGJ+L+RsH0b+WikF81+6ic9P3AoStgbVilfKD 6sbjcju8SShDlQ+W/Ocm0rHC+i/RDKT3QqItXgfhA/1FfMPODQGc/xcZg+AdTswn VSnUIkvk9/mTO0StilVfNJDfG1QkSpJ5Ilvs/DnIahZj6IG4QbJvtnVNbmQX6ptz AUB4DdGwXg== =geQL -----END PGP SIGNATURE----- Merge tag 'for-5.19/drivers-2022-05-22' of git://git.kernel.dk/linux-block Pull block driver updates from Jens Axboe: "Here are the driver updates queued up for 5.19. This contains: - NVMe pull requests via Christoph: - tighten the PCI presence check (Stefan Roese) - fix a potential NULL pointer dereference in an error path (Kyle Miller Smith) - fix interpretation of the DMRSL field (Tom Yan) - relax the data transfer alignment (Keith Busch) - verbose error logging improvements (Max Gurtovoy, Chaitanya Kulkarni) - misc cleanups (Chaitanya Kulkarni, Christoph) - set non-mdts limits in nvme_scan_work (Chaitanya Kulkarni) - add support for TP4084 - Time-to-Ready Enhancements (Christoph) - MD pull request via Song: - Improve annotation in raid5 code, by Logan Gunthorpe - Support MD_BROKEN flag in raid-1/5/10, by Mariusz Tkaczyk - Other small fixes/cleanups - null_blk series making the configfs side much saner (Damien) - Various minor drbd cleanups and fixes (Haowen, Uladzislau, Jiapeng, Arnd, Cai) - Avoid using the system workqueue (and hence flushing it) in rnbd (Jack) - Avoid using the system workqueue (and hence flushing it) in aoe (Tetsuo) - Series fixing discard_alignment issues in drivers (Christoph) - Small series fixing drivers poking at disk->part0 for openers information (Christoph) - Series fixing deadlocks in loop (Christoph, Tetsuo) - Remove loop.h and add SPDX headers (Christoph) - Various fixes and cleanups (Julia, Xie, Yu)" * tag 'for-5.19/drivers-2022-05-22' of git://git.kernel.dk/linux-block: (72 commits) mtip32xx: fix typo in comment nvme: set non-mdts limits in nvme_scan_work nvme: add support for TP4084 - Time-to-Ready Enhancements nvme: split the enum used for various register constants nbd: Fix hung on disconnect request if socket is closed before nvme-fabrics: add a request timeout helper nvme-pci: harden drive presence detect in nvme_dev_disable() nvme-pci: fix a NULL pointer dereference in nvme_alloc_admin_tags nvme: mark internal passthru request RQF_QUIET nvme: remove unneeded include from constants file nvme: add missing status values to verbose logging nvme: set dma alignment to dword nvme: fix interpretation of DMRSL loop: remove most the top-of-file boilerplate comment from the UAPI header loop: remove most the top-of-file boilerplate comment loop: add a SPDX header loop: remove loop.h block: null_blk: Improve device creation with configfs block: null_blk: Cleanup messages block: null_blk: Cleanup device creation and deletion ...	2022-05-23 14:04:14 -07:00
Linus Torvalds	115cd47132	for-5.19/block-2022-05-22 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmKKrUsQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpgDjD/44hY9h0JsOLoRH1IvFtuaH6n718JXuqG17 hHCfmnAUVqj2jT00IUbVlUTd905bCGpfrodBL3PAmPev1zZHOUd/MnJKrSynJ+/s NJEMZQaHxLmocNDpJ1sZo7UbAFErsZXB0gVYUO8cH2bFYNu84H1mhRCOReYyqmvQ aIAASX5qRB/ciBQCivzAJl2jTdn4WOn5hWi9RLidQB7kSbaXGPmgKAuN88WI4H7A zQgAkEl2EEquyMI5tV1uquS7engJaC/4PsenF0S9iTyrhJLjneczJBJZKMLeMR8d sOm6sKJdpkrfYDyaA4PIkgmLoEGTtwGpqGHl4iXTyinUAxJoca5tmPvBb3wp66GE 2Mr7pumxc1yJID2VHbsERXlOAX3aZNCowx2gum2MTRIO8g11Eu3aaVn2kv37MBJ2 4R2a/cJFl5zj9M8536cG+Yqpy0DDVCCQKUIqEupgEu1dyfpznyWH5BTAHXi1E8td nxUin7uXdD0AJkaR0m04McjS/Bcmc1dc6I8xvkdUFYBqYCZWpKOTiEpIBlHg0XJA sxdngyz5lSYTGVA4o4QCrdR0Tx1n36A1IYFuQj0wzxBJYZ02jEZuII/A3dd+8hiv EY+VeUQeVIXFFuOcY+e0ScPpn7Nr17hAd1en/j2Hcoe4ZE8plqG2QTcnwgflcbis iomvJ4yk0Q== =0Rw1 -----END PGP SIGNATURE----- Merge tag 'for-5.19/block-2022-05-22' of git://git.kernel.dk/linux-block Pull block updates from Jens Axboe: "Here are the core block changes for 5.19. This contains: - blk-throttle accounting fix (Laibin) - Series removing redundant assignments (Michal) - Expose bio cache via the bio_set, so that DM can use it (Mike) - Finish off the bio allocation interface cleanups by dealing with the weirdest member of the family. bio_kmalloc combines a kmalloc for the bio and bio_vecs with a hidden bio_init call and magic cleanup semantics (Christoph) - Clean up the block layer API so that APIs consumed by file systems are (almost) only struct block_device based, so that file systems don't have to poke into block layer internals like the request_queue (Christoph) - Clean up the blk_execute_rq* API (Christoph) - Clean up various lose end in the blk-cgroup code to make it easier to follow in preparation of reworking the blkcg assignment for bios (Christoph) - Fix use-after-free issues in BFQ when processes with merged queues get moved to different cgroups (Jan) - BFQ fixes (Jan) - Various fixes and cleanups (Bart, Chengming, Fanjun, Julia, Ming, Wolfgang, me)" * tag 'for-5.19/block-2022-05-22' of git://git.kernel.dk/linux-block: (83 commits) blk-mq: fix typo in comment bfq: Remove bfq_requeue_request_body() bfq: Remove superfluous conversion from RQ_BIC() bfq: Allow current waker to defend against a tentative one bfq: Relax waker detection for shared queues blk-cgroup: delete rcu_read_lock_held() WARN_ON_ONCE() blk-throttle: Set BIO_THROTTLED when bio has been throttled blk-cgroup: Remove unnecessary rcu_read_lock/unlock() blk-cgroup: always terminate io.stat lines block, bfq: make bfq_has_work() more accurate block, bfq: protect 'bfqd->queued' by 'bfqd->lock' block: cleanup the VM accounting in submit_bio block: Fix the bio.bi_opf comment block: reorder the REQ_ flags blk-iocost: combine local_stat and desc_stat to stat block: improve the error message from bio_check_eod block: allow passing a NULL bdev to bio_alloc_clone/bio_init_clone block: remove superfluous calls to blkcg_bio_issue_init kthread: unexport kthread_blkcg blk-cgroup: cleanup blkcg_maybe_throttle_current ...	2022-05-23 13:56:39 -07:00
Linus Torvalds	9836e93c0a	for-5.19/io_uring-passthrough-2022-05-22 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmKKovAQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpv9oD/4qCs7k3bPZZWZ6xoWb4EObyyWOUifi26lp vpsJHFUbA67S/i4++LV9H18SazWJ7h08ac4bjgZ+NQz40/1WkTN8/Fa76jo+BnNK 7T10Wp4Ak6uwWVrKaA81pnT+G9+xmHlJ3X27aKxzLuT7BEPpShZ6ouFVjTkx9CzN LrLjuCDTOBBN+ZoaroWYfdLwTQX2VCAl9B15lOtQIlFvuuU8VlrvLboY+80K8TvY 1wvTA2HTjnXoYx+/cTTMIFZIwQH3r1hsbwEDD8/YJj1+ouhSRQ1b0p/nk2pA+3ws HF5r/YS/rLBjlPF094IzeOBaUyA433AN1VhZqnII8ek7ViT3W3x+BRrgE9O6ZkWT 0AjX1BXReI5rdFmxBmwsSdBnrSoGaJOf2GdsCCdubXBIi+F/RvyajrPf7PTB5zbW 9WEK/uy3xvZsRVkUGAzOb9QGdvjcllgMzwPJsDegDCw5PdcPdT3mzy6KGIWipFLp j8R+br7hRMpOJv/YpihJDMzSDkQ/r1/SCwR4fpLid/QdSHG/eRTQK6c4Su5bNYEy QDy2F6kQdBVtEJCQHcEOsbhXzSTNBcdB+ujUUM5653FkaHe6y4JbomLrsNx407Id i/4ROwA5K1dioJx503Eap+OhbI5rV+PFytJTwxvLrNyVGccwbH2YOVq80fsVBP2e cZbn6EX4Vg== =/peE -----END PGP SIGNATURE----- Merge tag 'for-5.19/io_uring-passthrough-2022-05-22' of git://git.kernel.dk/linux-block Pull io_uring NVMe command passthrough from Jens Axboe: "On top of everything else, this adds support for passthrough for io_uring. The initial feature for this is NVMe passthrough support, which allows non-filesystem based IO commands and admin commands. To support this, io_uring grows support for SQE and CQE members that are twice as big, allowing to pass in a full NVMe command without having to copy data around. And to complete with more than just a single 32-bit value as the output" * tag 'for-5.19/io_uring-passthrough-2022-05-22' of git://git.kernel.dk/linux-block: (22 commits) io_uring: cleanup handling of the two task_work lists nvme: enable uring-passthrough for admin commands nvme: helper for uring-passthrough checks blk-mq: fix passthrough plugging nvme: add vectored-io support for uring-cmd nvme: wire-up uring-cmd support for io-passthru on char-device. nvme: refactor nvme_submit_user_cmd() block: wire-up support for passthrough plugging fs,io_uring: add infrastructure for uring-cmd io_uring: support CQE32 for nop operation io_uring: enable CQE32 io_uring: support CQE32 in /proc info io_uring: add tracing for additional CQE32 fields io_uring: overflow processing for CQE32 io_uring: flush completions for CQE32 io_uring: modify io_get_cqe for CQE32 io_uring: add CQE32 completion processing io_uring: add CQE32 setup processing io_uring: change ring size calculation for CQE32 io_uring: store add. return values for CQE32 ...	2022-05-23 13:06:15 -07:00
Ming Lei	5d05426e2d	blk-mq: don't touch ->tagset in blk_mq_get_sq_hctx blk_mq_run_hw_queues() could be run when there isn't queued request and after queue is cleaned up, at that time tagset is freed, because tagset lifetime is covered by driver, and often freed after blk_cleanup_queue() returns. So don't touch ->tagset for figuring out current default hctx by the mapping built in request queue, so use-after-free on tagset can be avoided. Meantime this way should be fast than retrieving mapping from tagset. Cc: "yukuai (C)" <yukuai3@huawei.com> Cc: Jan Kara <jack@suse.cz> Fixes: `b6e68ee825` ("blk-mq: Improve performance of non-mq IO schedulers with multiple HW queues") Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220522122350.743103-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-23 06:28:28 -06:00
Yuezhang Mo	97d6fb1b48	block: add sync_blockdev_range() sync_blockdev_range() is to support syncing multiple sectors with as few block device requests as possible, it is helpful to make the block device to give full play to its performance. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Suggested-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jens Axboe <axboe@kernel.dk> Acked-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2022-05-23 11:17:30 +09:00
Julia Lawall	2aaf516084	blk-mq: fix typo in comment Spelling mistake (triple letters) in comment. Detected with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Link: https://lore.kernel.org/r/20220521111145.81697-29-Julia.Lawall@inria.fr Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-21 06:32:16 -06:00
Jan Kara	a249ca7dfb	bfq: Remove bfq_requeue_request_body() The function has only a single caller and two lines. Just remove it since it is pointless and just harming readability. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220519105235.31397-4-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-19 06:52:36 -06:00
Jan Kara	e79cf8892e	bfq: Remove superfluous conversion from RQ_BIC() We store struct bfq_io_cq pointer in rq->elv.priv[0] in bfq_init_rq(). Thus a call to icq_to_bic() in RQ_BIC() is wrong. Luckily it does no harm currently because struct io_iq is the first one in struct bfq_io_cq. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220519105235.31397-3-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-19 06:52:36 -06:00
Jan Kara	c5ac56bb61	bfq: Allow current waker to defend against a tentative one The code in bfq_check_waker() ignores wake up events from the current waker. This makes it more likely we select a new tentative waker although the current one is generating more wake up events. Treat current waker the same way as any other process and allow it to reset the waker detection logic. Fixes: `71217df39d` ("block, bfq: make waker-queue detection more robust") Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220519105235.31397-2-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-19 06:52:36 -06:00
Jan Kara	f950667356	bfq: Relax waker detection for shared queues Currently we look for waker only if current queue has no requests. This makes sense for bfq queues with a single process however for shared queues when there is a larger number of processes the condition that queue has no requests is difficult to meet because often at least one process has some request in flight although all the others are waiting for the waker to do the work and this harms throughput. Relax the "no queued request for bfq queue" condition to "the current task has no queued requests yet". For this, we also need to start tracking number of requests in flight for each task. This patch (together with the following one) restores the performance for dbench with 128 clients that regressed with commit `c65e6fd460` ("bfq: Do not let waker requests skip proper accounting") because this commit makes requests of wakers properly enter BFQ queues and thus these queues become ineligible for the old waker detection logic. Dbench results: Vanilla 5.18-rc3 5.18-rc3 + revert 5.18-rc3 patched Mean 1237.36 ( 0.00%) 950.16 * 23.21%* 988.35 * 20.12%* Numbers are time to complete workload so lower is better. Fixes: `c65e6fd460` ("bfq: Do not let waker requests skip proper accounting") Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220519105235.31397-1-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-19 06:52:33 -06:00
Jens Axboe	1305e2c9d9	blk-cgroup: delete rcu_read_lock_held() WARN_ON_ONCE() A previous commit got rid of unnecessary rcu_read_lock() inside the IRQ disabling queue_lock, but this debug statement was left. It's now firing since we are indeed not inside a RCU read lock, but we don't need to be as we're still preempt safe. Get rid of the check, as we have a lockdep assert for holding the queue lock right after it anyway. Link: https://lore.kernel.org/linux-block/46253c48-81cb-0787-20ad-9133afdd9e21@samsung.com/ Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Fixes: `77c570a1ea` ("blk-cgroup: Remove unnecessary rcu_read_lock/unlock()") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-18 16:32:00 -06:00
Laibin Qiu	5a011f889b	blk-throttle: Set BIO_THROTTLED when bio has been throttled 1.In current process, all bio will set the BIO_THROTTLED flag after __blk_throtl_bio(). 2.If bio needs to be throttled, it will start the timer and stop submit bio directly. Bio will submit in blk_throtl_dispatch_work_fn() when the timer expires.But in the current process, if bio is throttled. The BIO_THROTTLED will be set to bio after timer start. If the bio has been completed, it may cause use-after-free blow. BUG: KASAN: use-after-free in blk_throtl_bio+0x12f0/0x2c70 Read of size 2 at addr ffff88801b8902d4 by task fio/26380 dump_stack+0x9b/0xce print_address_description.constprop.6+0x3e/0x60 kasan_report.cold.9+0x22/0x3a blk_throtl_bio+0x12f0/0x2c70 submit_bio_checks+0x701/0x1550 submit_bio_noacct+0x83/0xc80 submit_bio+0xa7/0x330 mpage_readahead+0x380/0x500 read_pages+0x1c1/0xbf0 page_cache_ra_unbounded+0x471/0x6f0 do_page_cache_ra+0xda/0x110 ondemand_readahead+0x442/0xae0 page_cache_async_ra+0x210/0x300 generic_file_buffered_read+0x4d9/0x2130 generic_file_read_iter+0x315/0x490 blkdev_read_iter+0x113/0x1b0 aio_read+0x2ad/0x450 io_submit_one+0xc8e/0x1d60 __se_sys_io_submit+0x125/0x350 do_syscall_64+0x2d/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Allocated by task 26380: kasan_save_stack+0x19/0x40 __kasan_kmalloc.constprop.2+0xc1/0xd0 kmem_cache_alloc+0x146/0x440 mempool_alloc+0x125/0x2f0 bio_alloc_bioset+0x353/0x590 mpage_alloc+0x3b/0x240 do_mpage_readpage+0xddf/0x1ef0 mpage_readahead+0x264/0x500 read_pages+0x1c1/0xbf0 page_cache_ra_unbounded+0x471/0x6f0 do_page_cache_ra+0xda/0x110 ondemand_readahead+0x442/0xae0 page_cache_async_ra+0x210/0x300 generic_file_buffered_read+0x4d9/0x2130 generic_file_read_iter+0x315/0x490 blkdev_read_iter+0x113/0x1b0 aio_read+0x2ad/0x450 io_submit_one+0xc8e/0x1d60 __se_sys_io_submit+0x125/0x350 do_syscall_64+0x2d/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Freed by task 0: kasan_save_stack+0x19/0x40 kasan_set_track+0x1c/0x30 kasan_set_free_info+0x1b/0x30 __kasan_slab_free+0x111/0x160 kmem_cache_free+0x94/0x460 mempool_free+0xd6/0x320 bio_free+0xe0/0x130 bio_put+0xab/0xe0 bio_endio+0x3a6/0x5d0 blk_update_request+0x590/0x1370 scsi_end_request+0x7d/0x400 scsi_io_completion+0x1aa/0xe50 scsi_softirq_done+0x11b/0x240 blk_mq_complete_request+0xd4/0x120 scsi_mq_done+0xf0/0x200 virtscsi_vq_done+0xbc/0x150 vring_interrupt+0x179/0x390 __handle_irq_event_percpu+0xf7/0x490 handle_irq_event_percpu+0x7b/0x160 handle_irq_event+0xcc/0x170 handle_edge_irq+0x215/0xb20 common_interrupt+0x60/0x120 asm_common_interrupt+0x1e/0x40 Fix this by move BIO_THROTTLED set into the queue_lock. Signed-off-by: Laibin Qiu <qiulaibin@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220301123919.2381579-1-qiulaibin@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-17 19:32:10 -06:00
Fanjun Kong	77c570a1ea	blk-cgroup: Remove unnecessary rcu_read_lock/unlock() spin_lock_irq/spin_unlock_irq contains preempt_disable/enable(). Which can serve as RCU read-side critical region, so remove rcu_read_lock/unlock(). Signed-off-by: Fanjun Kong <bh1scw@gmail.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220516173930.159535-1-bh1scw@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-17 06:12:23 -06:00
Wolfgang Bumiller	3607849df4	blk-cgroup: always terminate io.stat lines With the removal of seq_get_buf in blkcg_print_one_stat, we cannot make adding the newline conditional on there being relevant stats because the name was already written out unconditionally. Otherwise we may end up with multiple device names in one line which is confusing and doesn't follow the nested-keyed file format. Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com> Fixes: `252c651a4c` ("blk-cgroup: stop using seq_get_buf") Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220111083159.42340-1-w.bumiller@proxmox.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-17 06:11:17 -06:00
Yu Kuai	ddc25c86b4	block, bfq: make bfq_has_work() more accurate bfq_has_work() is using busy_queues currently, which is not accurate because bfq_queue is busy doesn't represent that it has requests. Since bfqd aready has a counter 'queued' to record how many requests are in bfq, use it instead of busy_queues. Noted that bfq_has_work() can be called with 'bfqd->lock' held, thus the lock can't be held in bfq_has_work() to protect 'bfqd->queued'. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220513023507.2625717-3-yukuai3@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-16 11:39:20 -06:00
Yu Kuai	181490d532	block, bfq: protect 'bfqd->queued' by 'bfqd->lock' If bfq_schedule_dispatch() is called from bfq_idle_slice_timer_body(), then 'bfqd->queued' is read without holding 'bfqd->lock'. This is wrong since it can be wrote concurrently. Fix the problem by holding 'bfqd->lock' in such case. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220513023507.2625717-2-yukuai3@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-16 11:39:20 -06:00
Christoph Hellwig	a3e7689bfa	block: cleanup the VM accounting in submit_bio submit_bio uses some extremely convoluted checks and confusing comments to only account REQ_OP_READ/REQ_OP_WRITE comments. Just switch to the plain obvious checks instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220516063654.2782792-1-hch@lst.de [axboe: fixup WRITE -> REQ_OP_WRITE] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-16 11:37:50 -06:00
Bart Van Assche	725f22a147	block/mq-deadline: Set the fifo_time member also if inserting at head Before commit `322cff70d4` the fifo_time member of requests on a dispatch list was not used. Commit `322cff70d4` introduces code that reads the fifo_time member of requests on dispatch lists. Hence this patch that sets the fifo_time member when adding a request to a dispatch list. Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com> Fixes: `322cff70d4` ("block/mq-deadline: Prioritize high-priority requests") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220513171307.32564-1-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-13 17:02:46 -06:00
Ming Lei	a327c341dc	blk-mq: fix passthrough plugging First we can't add request into plug list in blk_mq_request_bypass_insert which may be called when flushing plug list, so nested plug is caused. Second if polled passthrough request is inserted via blk_execute_rq(), it can't be added to plug list too since io polling needs the request to be issued to driver. Fixes the two by moving plugging into blk_execute_rq_no_wait(). Cc: Christoph Hellwig <hch@lst.de> Fixes: `1c2d2fff6d` ("block: wire-up support for passthrough plugging") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220512140010.1458645-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-12 08:45:39 -06:00
Chengming Zhou	2a371f7d5f	blk-iocost: combine local_stat and desc_stat to stat When we flush usage, wait, indebt stat in iocg_flush_stat(), we use local_stat and desc_stat, which has no point since the leaf iocg only has local_stat and the inner iocg only has desc_stat. Also we don't need to flush percpu abs_vusage for these inner iocgs. This patch combine local_stat and desc_stat to stat, only flush percpu abs_vusage for active leaf iocgs, then build inner walk list to propagate. Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220510034757.21761-1-zhouchengming@bytedance.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-11 17:05:52 -06:00
Jens Axboe	1c2d2fff6d	block: wire-up support for passthrough plugging Add support for plugging in passthrough path. When plugging is enabled, the requests are added to a plug instead of getting dispatched to the driver. And when the plug is finished, the whole batch gets dispatched via ->queue_rqs which turns out to be more efficient. Otherwise dispatching used to happen via ->queue_rq, one request at a time. Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220511054750.20432-3-joshi.k@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-11 07:40:58 -06:00
Matthew Wilcox (Oracle)	2c69e20579	fs: Convert block_read_full_page() to block_read_full_folio() This function is NOT converted to handle large folios, so include an assert that the filesystem isn't passing one in. Otherwise, use the folio functions instead of the page functions, where they exist. Convert all filesystems which use block_read_full_page(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2022-05-09 16:21:44 -04:00
Matthew Wilcox (Oracle)	9d6b0cd757	fs: Remove flags parameter from aops->write_begin There are no more aop flags left, so remove the parameter. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2022-05-08 14:28:19 -04:00
Matthew Wilcox (Oracle)	b3992d1e2e	fs: Remove aop flags parameter from block_write_begin() There are no more aop flags left, so remove the parameter. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2022-05-08 14:28:19 -04:00
Christoph Hellwig	069adbac2c	block: improve the error message from bio_check_eod Print the start sector and length separately instead of the combined value to help with debugging. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Link: https://lore.kernel.org/r/20220504143355.568660-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-04 18:30:10 -06:00
Christoph Hellwig	7ecc56c62b	block: allow passing a NULL bdev to bio_alloc_clone/bio_init_clone Device mapper wants to allocate a bio before knowing the device it gets send to, so add explicit support for that. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@kernel.org> Link: https://lore.kernel.org/r/20220504142950.567582-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-04 18:29:52 -06:00
Christoph Hellwig	513616843d	block: remove superfluous calls to blkcg_bio_issue_init blkcg_bio_issue_init is called in submit_bio. There is no need to have extra calls that just get overriden in __bio_clone and the two places that copy and pasted from it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@kernel.org> Link: https://lore.kernel.org/r/20220504142950.567582-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-04 18:29:52 -06:00
Christoph Hellwig	82778259eb	blk-cgroup: cleanup blkcg_maybe_throttle_current Use blkcg_css instead of opencoding it. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220420042723.1010598-15-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-02 14:06:20 -06:00
Christoph Hellwig	d200ca143a	blk-cgroup: cleanup blk_cgroup_congested Use blkcg_css instead of open coding it, and switch to a slightly more natural for loop. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220420042723.1010598-14-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-02 14:06:20 -06:00
Christoph Hellwig	bc5fee91f2	blk-cgroup: move blkcg_css to blk-cgroup.c blkcg_css is only used in blk-cgroup.c, so move it there. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220420042723.1010598-13-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-02 14:06:20 -06:00
Christoph Hellwig	c97ab27157	blk-cgroup: remove unneeded includes from <linux/blk-cgroup.h> Remove all the includes that aren't actually needed from <linux/blk-cgroup.h> and push them to the actual source files where needed. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220420042723.1010598-12-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-02 14:06:20 -06:00
Christoph Hellwig	7f20ba7c42	blk-cgroup: remove pointless CONFIG_BLOCK ifdefs No need to make BLK_CGROUP stubs conditional on CONFIG_BLOCK as they can't be used without that. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220420042723.1010598-11-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-02 14:06:20 -06:00
Christoph Hellwig	bbb1ebe7a9	blk-cgroup: replace bio_blkcg with bio_blkcg_css All callers of bio_blkcg actually want the CSS, so replace it with an interface that does return the CSS. This now allows to move struct blkcg_gq to block/blk-cgroup.h instead of exposing it in a public header. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220420042723.1010598-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-02 14:06:20 -06:00
Christoph Hellwig	f4a6a61cb6	blktrace: cleanup the __trace_note_message interface Pass the cgroup_subsys_state instead of a the blkg so that blktrace doesn't need to poke into blk-cgroup internals, and give the name a blk prefix as the current name is way too generic for a public interface. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220420042723.1010598-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-02 14:06:20 -06:00
Christoph Hellwig	dec223c92a	blk-cgroup: move struct blkcg to block/blk-cgroup.h There is no real need to expose the blkcg structure to the whole kernel. Move it to the private header an expose a helper to let the writeback code access the cgwb_list member. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220420042723.1010598-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-02 14:06:20 -06:00
Christoph Hellwig	397c9f46ee	blk-cgroup: move blkcg_{pin,unpin}_online out of line Move these two functions out of line as there is no good reason to inline them. Also switch to passing a cgroup_subsys_state instead of doing the conversion in the caller to prepare for making the blkcg structure private to blk-cgroup. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220420042723.1010598-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-02 14:06:20 -06:00
Christoph Hellwig	216889aad3	blk-cgroup: move blk_cgroup_congested out line There is no urgent need to inline this function, so move it out of line. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220420042723.1010598-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-02 14:06:20 -06:00
Christoph Hellwig	db05628435	blk-cgroup: move blkcg_{get,set}_fc_appid out of line No need to have these helpers inline. Also remove the stubs and just use an IS_ENABLED for the get side (the set side already is only built conditionlly). Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220420042723.1010598-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-02 14:06:20 -06:00
Christoph Hellwig	2524a5783e	blk-cgroup: remove __bio_blkcg Remove the unused and deprecated __bio_blkcg helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220420042723.1010598-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-02 14:06:20 -06:00
Ming Lei	9650b453a3	block: ignore RWF_HIPRI hint for sync dio So far bio is marked as REQ_POLLED if RWF_HIPRI/IOCB_HIPRI is passed from userspace sync io interface, then block layer tries to poll until the bio is completed. But the current implementation calls blk_io_schedule() if bio_poll() returns 0, and this way causes io hang or timeout easily. But looks no one reports this kind of issue, which should have been triggered in normal io poll sanity test or blktests block/007 as observed by Changhui, that means it is very likely that no one uses it or no one cares it. Also after io_uring is invented, io poll for sync dio becomes legacy interface. So ignore RWF_HIPRI hint for sync dio. CC: linux-mm@kvack.org Cc: linux-xfs@vger.kernel.org Reported-by: Changhui Zhong <czhong@redhat.com> Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Tested-by: Changhui Zhong <czhong@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220420143110.2679002-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-02 10:07:42 -06:00
Ming Lei	285d5731a0	Revert "block: release rq qos structures for queue without disk" This reverts commit `daaca3522a`. Commit `daaca3522a` ("block: release rq qos structures for queue without disk") is only needed for v5.15~v5.17, and isn't needed for v5.18, so revert it. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220426024936.3321341-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-02 10:06:40 -06:00
Jan Kara	09df6a75ff	bfq: Fix warning in bfqq_request_over_limit() People are occasionally reporting a warning bfqq_request_over_limit() triggering reporting that BFQ's idea of cgroup hierarchy (and its depth) does not match what generic blkcg code thinks. This can actually happen when bfqq gets moved between BFQ groups while bfqq_request_over_limit() is running. Make sure the code is safe against BFQ queue being moved to a different BFQ group. Fixes: `76f1df88bb` ("bfq: Limit number of requests consumed by each cgroup") CC: stable@vger.kernel.org Link: https://lore.kernel.org/all/CAJCQCtTw_2C7ZSz7as5Gvq=OmnDiio=HRkQekqWpKot84sQhFA@mail.gmail.com/ Reported-by: Chris Murphy <lists@colorremedies.com> Reported-by: "yukuai (C)" <yukuai3@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220407140738.9723-1-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-29 06:45:37 -06:00
Tejun Heo	4cddeacad6	Revert "block: inherit request start time from bio for BLK_CGROUP" This reverts commit `0006707723`. It has a couple problems: * bio_issue_time() is stored in bio->bi_issue truncated to 51 bits. This overflows in slightly over 26 days. Setting rq->io_start_time_ns with it means that io duration calculation would yield >26days after 26 days of uptime. This, for example, confuses kyber making it cause high IO latencies. * rq->io_start_time_ns should record the time that the IO is issued to the device so that on-device latency can be measured. However, bio_issue_time() is set before the bio goes through the rq-qos controllers (wbt, iolatency, iocost), so when the bio gets throttled in any of the mechanisms, the measured latencies make no sense - on-device latencies end up higher than request-alloc-to-completion latencies. We'll need a smarter way to avoid calling ktime_get_ns() repeatedly back-to-back. For now, let's revert the commit. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org # v5.16+ Link: https://lore.kernel.org/r/YmmeOLfo5lzc+8yI@slm.duckdns.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-27 13:51:28 -06:00
Tejun Heo	8c936f9ea1	iocost: don't reset the inuse weight of under-weighted debtors When an iocg is in debt, its inuse weight is owned by debt handling and should stay at 1. This invariant was broken when determining the amount of surpluses at the beginning of donation calculation - when an iocg's hierarchical weight is too low, the iocg is excluded from donation calculation and its inuse is reset to its active regardless of its indebtedness, triggering warnings like the following: WARNING: CPU: 5 PID: 0 at block/blk-iocost.c:1416 iocg_kick_waitq+0x392/0x3a0 ... RIP: 0010:iocg_kick_waitq+0x392/0x3a0 Code: 00 00 be ff ff ff ff 48 89 4d a8 e8 98 b2 70 00 48 8b 4d a8 85 c0 0f 85 4a fe ff ff 0f 0b e9 43 fe ff ff 0f 0b e9 4d fe ff ff <0f> 0b e9 50 fe ff ff e8 a2 ae 70 00 66 90 0f 1f 44 00 00 55 48 89 RSP: 0018:ffffc90000200d08 EFLAGS: 00010016 ... <IRQ> ioc_timer_fn+0x2e0/0x1470 call_timer_fn+0xa1/0x2c0 ... As this happens only when an iocg's hierarchical weight is negligible, its impact likely is limited to triggering the warnings. Fix it by skipping resetting inuse of under-weighted debtors. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Rik van Riel <riel@surriel.com> Fixes: `c421a3eb2e` ("blk-iocost: revamp debt handling") Cc: stable@vger.kernel.org # v5.10+ Link: https://lore.kernel.org/r/YmjODd4aif9BzFuO@slm.duckdns.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-27 08:41:11 -06:00
Michal Orzel	e233fe1aa0	block/partitions/ldm: Remove redundant assignments Get rid of the following redundant assignments: - to a variable r_cols from function ldm_parse_cmp3 - to variables r_id1 and r_id2 from functions ldm_parse_dgr3 and ldm_parse_dgr4 - to a variable r_index from function ldm_parse_prt3 that end up in values not being read until the end of function. Reported by clang-tidy [deadcode.DeadStores] Signed-off-by: Michal Orzel <michalorzel.eng@gmail.com> Link: https://lore.kernel.org/r/20220423113811.13335-5-michalorzel.eng@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-23 07:15:26 -06:00
Michal Orzel	87420fa94f	block/partitions/atari: Remove redundant assignment Get rid of redundant assignment to a variable part_fmt from function atari_partition. It is being assigned a value that is never read until the end of function. Reported by clang-tidy [deadcode.DeadStores] Signed-off-by: Michal Orzel <michalorzel.eng@gmail.com> Link: https://lore.kernel.org/r/20220423113811.13335-4-michalorzel.eng@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-23 07:15:26 -06:00
Michal Orzel	834726828b	block/partitions/acorn: Remove redundant assignments Get rid of redundant assignments to a variable slot from function adfspart_check_ADFS. It is being assigned a value that is never read as we are breaking out from the switch case and the function ends. Reported by clang-tidy [deadcode.DeadStores] Signed-off-by: Michal Orzel <michalorzel.eng@gmail.com> Link: https://lore.kernel.org/r/20220423113811.13335-3-michalorzel.eng@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-23 07:15:26 -06:00
Michal Orzel	7ab89db979	block/blk-map: Remove redundant assignment Get rid of redundant assignment to a variable ret from function bio_map_user_iov as it is being assigned a value that is never read. It is being re-assigned in the first instruction after the while loop Reported by clang-tidy [deadcode.DeadStores] Signed-off-by: Michal Orzel <michalorzel.eng@gmail.com> Link: https://lore.kernel.org/r/20220423113811.13335-2-michalorzel.eng@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-23 07:15:26 -06:00
Michal Orzel	3de2e5f28c	block/badblocks: Remove redundant assignments Get rid of redundant assignments to a variable sectors from functions badblocks_check and badblocks_clear. This variable, that is a function parameter, is being assigned a value that is never read until the end of function. Reported by clang-tidy [deadcode.DeadStores] Signed-off-by: Michal Orzel <michalorzel.eng@gmail.com> Link: https://lore.kernel.org/r/20220423113811.13335-1-michalorzel.eng@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-23 07:15:26 -06:00
Christoph Hellwig	9acf381f3e	block: turn bdev->bd_openers into an atomic_t All manipulation of bd_openers is under disk->open_mutex and will remain so for the foreseeable future. But at least one place reads it without the lock (blkdev_get) and there are more to be added. So make sure the compiler does not do turn the increments and decrements into non-atomic sequences by using an atomic_t. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220330052917.2566582-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-18 06:54:09 -06:00
Ming Lei	5f0614a55e	block: change exported IO accounting interface from gendisk to bdev Export IO accounting interfaces in terms of block_device now that gendisk has become more internal to block core. Rename __part_{start,end}_io_acct's first argument from part to bdev. Rename __part_{start,end}_io_acct to bdev_{start,end}_io_acct and export them. Remove disk_{start,end}_io_acct and update caller (zram) to use bdev_{start,end}_io_acct. DM can now be updated to use bdev_{start,end}_io_acct. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Link: https://lore.kernel.org/r/20220418022733.56168-2-snitzer@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-18 06:49:52 -06:00
Christoph Hellwig	44abff2c0b	block: decouple REQ_OP_SECURE_ERASE from REQ_OP_DISCARD Secure erase is a very different operation from discard in that it is a data integrity operation vs hint. Fully split the limits and helper infrastructure to make the separation more clear. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd] Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> [nifs2] Acked-by: Jaegeuk Kim <jaegeuk@kernel.org> [f2fs] Acked-by: Coly Li <colyli@suse.de> [bcache] Acked-by: David Sterba <dsterba@suse.com> [btrfs] Acked-by: Chao Yu <chao@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-27-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-17 19:49:59 -06:00
Christoph Hellwig	7b47ef52d0	block: add a bdev_discard_granularity helper Abstract away implementation details from file systems by providing a block_device based helper to retrieve the discard granularity. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd] Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Acked-by: David Sterba <dsterba@suse.com> [btrfs] Link: https://lore.kernel.org/r/20220415045258.199825-26-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-17 19:49:59 -06:00
Christoph Hellwig	70200574cc	block: remove QUEUE_FLAG_DISCARD Just use a non-zero max_discard_sectors as an indicator for discard support, similar to what is done for write zeroes. The only places where needs special attention is the RAID5 driver, which must clear discard support for security reasons by default, even if the default stacking rules would allow for it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd] Acked-by: Jan Höppner <hoeppner@linux.ibm.com> [s390] Acked-by: Coly Li <colyli@suse.de> [bcache] Acked-by: David Sterba <dsterba@suse.com> [btrfs] Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-25-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-17 19:49:59 -06:00
Christoph Hellwig	e3cc28ea28	block: refactor discard bio size limiting Move all the logic to limit the discard bio size into a common helper so that it is better documented. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Coly Li <colyli@suse.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-23-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-17 19:49:59 -06:00
Christoph Hellwig	5c4b4a5c6f	block: move {bdev,queue_limit}_discard_alignment out of line No need to inline these fairly larger helpers. Also fix the return value to be unsigned, just like the field in struct queue_limits. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220415045258.199825-22-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-17 19:49:59 -06:00
Christoph Hellwig	f0f975a4dd	block: use bdev_discard_alignment in part_discard_alignment_show Use the bdev based alignment helper instead of open coding it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220415045258.199825-21-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-17 19:49:59 -06:00
Christoph Hellwig	4e1462ffe8	block: remove queue_discard_alignment Just use bdev_alignment_offset in disk_discard_alignment_show instead. That helpers is the same except for an always false branch that doesn't matter in this slow path. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220415045258.199825-20-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-17 19:49:59 -06:00
Christoph Hellwig	89098b075c	block: move bdev_alignment_offset and queue_limit_alignment_offset out of line No need to inline these fairly larger helpers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220415045258.199825-19-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-17 19:49:59 -06:00
Christoph Hellwig	640f2a2391	block: use bdev_alignment_offset in disk_alignment_offset_show This does the same as the open coded variant except for an extra branch, and allows to remove queue_alignment_offset entirely. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220415045258.199825-18-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-17 19:49:59 -06:00
Christoph Hellwig	64dcc7c271	block: use bdev_alignment_offset in part_alignment_offset_show Replace the open coded offset calculation with the proper helper. This is an ABI change in that the -1 for a misaligned partition is properly propagated, which can be considered a bug fix and matches what is done on the whole device. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-17-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-17 19:49:59 -06:00
Christoph Hellwig	10f0d2a517	block: add a bdev_nonrot helper Add a helper to check the nonrot flag based on the block_device instead of having to poke into the block layer internal request_queue. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: David Sterba <dsterba@suse.com> [btrfs] Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-12-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-17 19:49:59 -06:00
Jan Kara	075a53b78b	bfq: Make sure bfqg for which we are queueing requests is online Bios queued into BFQ IO scheduler can be associated with a cgroup that was already offlined. This may then cause insertion of this bfq_group into a service tree. But this bfq_group will get freed as soon as last bio associated with it is completed leading to use after free issues for service tree users. Fix the problem by making sure we always operate on online bfq_group. If the bfq_group associated with the bio is not online, we pick the first online parent. CC: stable@vger.kernel.org Fixes: `e21b7a0b98` ("block, bfq: add full hierarchical scheduling and cgroups support") Tested-by: "yukuai (C)" <yukuai3@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220401102752.8599-9-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-17 19:34:32 -06:00
Jan Kara	4e54a2493e	bfq: Get rid of __bio_blkcg() usage BFQ usage of __bio_blkcg() is a relict from the past. Furthermore if bio would not be associated with any blkcg, the usage of __bio_blkcg() in BFQ is prone to races with the task being migrated between cgroups as __bio_blkcg() calls at different places could return different blkcgs. Convert BFQ to the new situation where bio->bi_blkg is initialized in bio_set_dev() and thus practically always valid. This allows us to save blkcg_gq lookup and noticeably simplify the code. CC: stable@vger.kernel.org Fixes: `0fe061b9f0` ("blkcg: fix ref count issue with bio_blkcg() using task_css") Tested-by: "yukuai (C)" <yukuai3@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220401102752.8599-8-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-17 19:34:29 -06:00
Jan Kara	09f8718680	bfq: Track whether bfq_group is still online Track whether bfq_group is still online. We cannot rely on blkcg_gq->online because that gets cleared only after all policies are offlined and we need something that gets updated already under bfqd->lock when we are cleaning up our bfq_group to be able to guarantee that when we see online bfq_group, it will stay online while we are holding bfqd->lock lock. CC: stable@vger.kernel.org Tested-by: "yukuai (C)" <yukuai3@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220401102752.8599-7-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-17 19:34:27 -06:00
Jan Kara	5f550ede5e	bfq: Remove pointless bfq_init_rq() calls We call bfq_init_rq() from request merging functions where requests we get should have already gone through bfq_init_rq() during insert and anyway we want to do anything only if the request is already tracked by BFQ. So replace calls to bfq_init_rq() with RQ_BFQQ() instead to simply skip requests untracked by BFQ. We move bfq_init_rq() call in bfq_insert_request() a bit earlier to cover request merging and thus can transfer FIFO position in case of a merge. CC: stable@vger.kernel.org Tested-by: "yukuai (C)" <yukuai3@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220401102752.8599-6-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-17 19:34:25 -06:00
Jan Kara	fc84e1f941	bfq: Drop pointless unlock-lock pair In bfq_insert_request() we unlock bfqd->lock only to call trace_block_rq_insert() and then lock bfqd->lock again. This is really pointless since tracing is disabled if we really care about performance and even if the tracepoint is enabled, it is a quick call. CC: stable@vger.kernel.org Tested-by: "yukuai (C)" <yukuai3@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220401102752.8599-5-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-17 19:34:22 -06:00
Jan Kara	ea591cd4eb	bfq: Update cgroup information before merging bio When the process is migrated to a different cgroup (or in case of writeback just starts submitting bios associated with a different cgroup) bfq_merge_bio() can operate with stale cgroup information in bic. Thus the bio can be merged to a request from a different cgroup or it can result in merging of bfqqs for different cgroups or bfqqs of already dead cgroups and causing possible use-after-free issues. Fix the problem by updating cgroup information in bfq_merge_bio(). CC: stable@vger.kernel.org Fixes: `e21b7a0b98` ("block, bfq: add full hierarchical scheduling and cgroups support") Tested-by: "yukuai (C)" <yukuai3@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220401102752.8599-4-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-17 19:34:20 -06:00
Jan Kara	3bc5e683c6	bfq: Split shared queues on move between cgroups When bfqq is shared by multiple processes it can happen that one of the processes gets moved to a different cgroup (or just starts submitting IO for different cgroup). In case that happens we need to split the merged bfqq as otherwise we will have IO for multiple cgroups in one bfqq and we will just account IO time to wrong entities etc. Similarly if the bfqq is scheduled to merge with another bfqq but the merge didn't happen yet, cancel the merge as it need not be valid anymore. CC: stable@vger.kernel.org Fixes: `e21b7a0b98` ("block, bfq: add full hierarchical scheduling and cgroups support") Tested-by: "yukuai (C)" <yukuai3@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220401102752.8599-3-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-17 19:34:16 -06:00

1 2 3 4 5 ...

6407 commits