linux-stable/block
Damien Le Moal 9148c0b0cf block: mq-deadline: Do not break sequential write streams to zoned HDDs
commit 015d02f485 upstream.

mq-deadline ensures in-order dispatching of write requests to zoned
block devices using a per-zone lock (a bit). This implies that for any
purely sequential write workload, the drive is exercised most of the
time at a maximum queue depth of one.

However, when such a sequential write workload crosses a zone boundary
(when sequentially writing multiple contiguous zones), zone write
locking may prevent the last write to one zone from being issued (as the
previous write is still being executed) but allow the first write to the
following zone to be issued (as that zone is not yet being written and
not locked). This results in an out-of-order delivery of the sequential
write commands to the device every time a zone boundary is crossed.
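
The reordering described above can be reproduced with a small userspace
model. This is only a sketch under stated assumptions: struct wreq and
next_unlocked_write() are hypothetical names that do not exist in the
kernel, requests are kept in an LBA-sorted array instead of the
scheduler's sorted tree, and the per-zone lock is modeled as one bool per
zone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a write request: start sector and length. */
struct wreq {
	uint64_t pos;      /* first sector of the write */
	uint32_t sectors;  /* number of sectors written */
};

/*
 * Return the index of the first request, in LBA order, whose target zone
 * is not write-locked, or -1 if every pending write targets a locked zone.
 * zone_locked[] must have one entry per zone touched by rq[].
 */
static int next_unlocked_write(const struct wreq *rq, size_t n,
			       uint64_t zone_sectors, const bool *zone_locked)
{
	assert(zone_sectors > 0);

	for (size_t i = 0; i < n; i++) {
		if (!zone_locked[rq[i].pos / zone_sectors])
			return (int)i;
	}
	return -1;
}
```

With 1024-sector zones, pending writes at sectors 896 and 1024, and zone 0
still locked by an in-flight write, this selection returns the write at
sector 1024 ahead of the one at sector 896, which is the out-of-order
delivery at the zone boundary.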

While such behavior does not break the sequential write constraint of
zoned block devices (and does not generate any write error), some zoned
hard disks react badly to seeing these out-of-order writes, resulting in
lower write throughput.

This problem can be addressed by always dispatching the first request
of a stream of sequential write requests, regardless of the zones
targeted by these sequential writes. To do so, the function
deadline_skip_seq_writes() is introduced and used in
deadline_next_request() to select the next write command to issue if the
target device is an HDD (blk_queue_nonrot() being false).
deadline_fifo_request() is modified using the new
deadline_earlier_request() and deadline_is_seq_write() helpers to ignore
requests in the FIFO list that have a preceding request in LBA order
that is sequential.
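
The two checks described above can be illustrated with a simplified
userspace model. It does not reproduce the kernel implementation: the
kernel helpers operate on struct request entries in the scheduler's
sorted tree, whereas this sketch assumes an LBA-sorted array, and
struct wreq, skip_seq_writes() and is_seq_write() are hypothetical names
chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a write request: start sector and length. */
struct wreq {
	uint64_t pos;      /* first sector of the write */
	uint32_t sectors;  /* number of sectors written */
};

/*
 * Starting at index start, walk forward while each request begins exactly
 * where the previous one ended. Return the index of the first request that
 * is not part of that sequential stream, or n if the stream runs to the end.
 */
static size_t skip_seq_writes(const struct wreq *rq, size_t n, size_t start)
{
	assert(start < n);

	uint64_t expected = rq[start].pos;
	size_t i = start;

	while (i < n && rq[i].pos == expected) {
		expected += rq[i].sectors;
		i++;
	}
	return i;
}

/*
 * True if request i directly follows its predecessor in LBA order, i.e. it
 * is not the head of a sequential stream and should not be picked first.
 */
static bool is_seq_write(const struct wreq *rq, size_t i)
{
	return i > 0 && rq[i - 1].pos + rq[i - 1].sectors == rq[i].pos;
}
```

For writes at sectors 0, 8, 16 and 100, each 8 sectors long, the first
three form one stream: skipping from index 0 lands on index 3, and only
indexes 0 and 3 are stream heads.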

With this fix, a sequential write workload executed with the following
fio command:

fio  --name=seq-write --filename=/dev/sda --zonemode=zbd --direct=1 \
     --size=68719476736  --ioengine=libaio --iodepth=32 --rw=write \
     --bs=65536

results in an increase from 225 MB/s to 250 MB/s of the write throughput
of an SMR HDD (11% increase).

Cc: <stable@vger.kernel.org>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20221124021208.242541-3-damien.lemoal@opensource.wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-07 11:15:54 +01:00
partitions block: don't add partitions if GD_SUPPRESS_PART_SCAN is set 2022-09-03 11:29:03 -06:00
Kconfig block: remove "select BLK_RQ_IO_DATA_LEN" from BLK_CGROUP_IOCOST dependency 2022-06-29 08:35:57 -06:00
Kconfig.iosched block: only build the icq tracking code when needed 2021-12-16 10:59:02 -07:00
Makefile blk-cgroup: move blkcg_{get,set}_fc_appid out of line 2022-05-02 14:06:20 -06:00
badblocks.c block/badblocks: Remove redundant assignments 2022-04-23 07:15:26 -06:00
bdev.c block: stop using bdevname in bdev_write_inode 2022-07-14 10:27:56 -06:00
bfq-cgroup.c block, bfq: fix null pointer dereference in bfq_bio_bfqg() 2022-12-02 17:43:01 +01:00
bfq-iosched.c block, bfq: fix uaf for bfqq in bfq_exit_icq_bfqq 2023-01-04 11:26:24 +01:00
bfq-iosched.h block/bfq: Use the new blk_opf_t type 2022-07-14 12:14:30 -06:00
bfq-wf2q.c block: bfq: Fix kernel-doc headers 2022-06-27 06:29:12 -06:00
bio-integrity.c block: pass struct queue_limits to the bio splitting helpers 2022-08-02 21:08:53 -06:00
bio.c bio: safeguard REQ_ALLOC_CACHE bio put 2022-11-10 18:17:26 +01:00
blk-cgroup-fc-appid.c blk-cgroup: move blkcg_{get,set}_fc_appid out of line 2022-05-02 14:06:20 -06:00
blk-cgroup-rwstat.c blk-cgroup: Fix the recursive blkg rwstat 2021-03-05 11:32:15 -07:00
blk-cgroup-rwstat.h block: Use the new blk_opf_t type 2022-07-14 12:14:30 -06:00
blk-cgroup.c blk-iolatency: Fix memory leak on add_disk() failures 2023-01-04 11:26:23 +01:00
blk-cgroup.h blk-cgroup: pass a gendisk to blkcg_init_queue and blkcg_exit_queue 2023-01-04 11:26:22 +01:00
blk-core.c block: make dma_alignment a stacking queue_limit 2022-11-26 09:27:40 +01:00
blk-crypto-fallback.c block: remove superfluous calls to blkcg_bio_issue_init 2022-05-04 18:29:52 -06:00
blk-crypto-internal.h blk-crypto: show crypto capabilities in sysfs 2022-02-28 06:40:23 -07:00
blk-crypto-profile.c blk-crypto: remove blk_crypto_unregister() 2021-11-29 06:38:51 -07:00
blk-crypto-sysfs.c blk-crypto: show crypto capabilities in sysfs 2022-02-28 06:40:23 -07:00
blk-crypto.c blk-crypto: show crypto capabilities in sysfs 2022-02-28 06:40:23 -07:00
blk-flush.c block: Use the new blk_opf_t type 2022-07-14 12:14:30 -06:00
blk-ia-ranges.c block: simplify disk_set_independent_access_ranges 2022-06-29 08:36:46 -06:00
blk-integrity.c blk-crypto: remove blk_crypto_unregister() 2021-11-29 06:38:51 -07:00
blk-ioc.c block: fix default IO priority handling again 2022-06-27 06:29:12 -06:00
blk-iocost.c block: don't allow the same type rq_qos add more than once 2022-07-20 06:44:14 -06:00
blk-iolatency.c block: don't allow the same type rq_qos add more than once 2022-07-20 06:44:14 -06:00
blk-ioprio.c blk-ioprio: Convert from rqos policy to direct call 2022-06-27 06:29:12 -06:00
blk-ioprio.h blk-ioprio: Convert from rqos policy to direct call 2022-06-27 06:29:12 -06:00
blk-lib.c blk-lib: fix blkdev_issue_secure_erase 2022-09-15 00:25:17 -06:00
blk-map.c block: convert to advancing variants of iov_iter_get_pages{,_alloc}() 2022-08-08 22:37:22 -04:00
blk-merge.c block: pass struct queue_limits to the bio splitting helpers 2022-08-02 21:08:53 -06:00
blk-mq-cpumap.c blk-mq: remove the calling of local_memory_node() 2020-10-20 07:08:17 -06:00
blk-mq-debugfs-zoned.c block: move zone related fields to struct gendisk 2022-07-06 06:46:26 -06:00
blk-mq-debugfs.c block: add missing request flags to debugfs code 2022-09-09 05:57:52 -06:00
blk-mq-debugfs.h block: remove per-disk debugfs files in blk_unregister_queue 2022-06-17 07:31:05 -06:00
blk-mq-pci.c
blk-mq-rdma.c
blk-mq-sched.c block: serialize all debugfs operations using q->debugfs_mutex 2022-06-17 07:31:05 -06:00
blk-mq-sched.h block: move blk_mq_sched_assign_ioc to blk-ioc.c 2021-11-29 06:41:29 -07:00
blk-mq-sysfs.c blk-mq: fix possible memleak when register 'hctx' failed 2022-12-31 13:26:45 +01:00
blk-mq-tag.c blk-mq: Drop local variable for reserved tag 2022-07-06 06:33:53 -06:00
blk-mq-tag.h blk-mq: blk_mq_tag_busy is no need to return a value 2022-06-27 06:29:12 -06:00
blk-mq-virtio.c
blk-mq.c blk-mq: avoid double ->queue_rq() because of early timeout 2022-12-31 13:26:42 +01:00
blk-mq.h block: Use the new blk_opf_t type 2022-07-14 12:14:30 -06:00
blk-pm.c scsi: block: pm: Always set request queue runtime active in blk_post_runtime_resume() 2021-12-22 23:38:29 -05:00
blk-pm.h block: Remove unused blk_pm_*() function definitions 2021-02-22 06:33:48 -07:00
blk-rq-qos.c block/rq_qos: Use atomic_try_cmpxchg in atomic_inc_below 2022-07-12 14:38:52 -06:00
blk-rq-qos.h block: don't allow the same type rq_qos add more than once 2022-07-20 06:44:14 -06:00
blk-settings.c block: make blk_set_default_limits() private 2022-12-02 17:43:16 +01:00
blk-stat.c block: make queue stat accounting a reference 2021-12-14 17:23:05 -07:00
blk-stat.h block: make queue stat accounting a reference 2021-12-14 17:23:05 -07:00
blk-sysfs.c block: move ->bio_split to the gendisk 2022-08-02 21:08:49 -06:00
blk-throttle.c blk-throttle: pass a gendisk to blk_throtl_init and blk_throtl_exit 2023-01-04 11:26:22 +01:00
blk-throttle.h blk-throttle: pass a gendisk to blk_throtl_init and blk_throtl_exit 2023-01-04 11:26:22 +01:00
blk-timeout.c block: blk-timeout: delete duplicated word 2020-07-31 16:29:47 -06:00
blk-wbt.c blk-wbt: fix that 'rwb->wc' is always set to 1 in wbt_init() 2022-10-21 12:39:28 +02:00
blk-wbt.h blk-wbt: remove wbt_track stub 2022-03-31 12:58:38 -06:00
blk-zoned.c treewide: Rename enum req_opf into enum req_op 2022-07-14 12:14:30 -06:00
blk.h block: Do not reread partition table on exclusively open device 2023-01-04 11:26:31 +01:00
bounce.c block: change the blk_queue_bounce calling convention 2022-08-02 17:22:54 -06:00
bsg-lib.c blk-mq: Drop blk_mq_ops.timeout 'reserved' arg 2022-07-06 06:33:53 -06:00
bsg.c scsi: core: bsg: Remove usage of the deprecated ida_simple_xxx() API 2022-06-21 21:22:51 -04:00
disk-events.c block: remove genhd.h 2022-02-02 07:49:59 -07:00
elevator.c blk-mq: use quiesced elevator switch when reinitializing queues 2022-10-21 12:39:25 +02:00
elevator.h block: Use the new blk_opf_t type 2022-07-14 12:14:30 -06:00
fops.c new iov_iter flavour - ITER_UBUF 2022-08-08 22:37:15 -04:00
genhd.c block: Do not reread partition table on exclusively open device 2023-01-04 11:26:31 +01:00
holder.c block: remove WARN_ON() from bd_link_disk_holder 2022-06-23 07:48:05 -06:00
ioctl.c block: Do not reread partition table on exclusively open device 2023-01-04 11:26:31 +01:00
ioprio.c block: Fix handling of tasks without ioprio in ioprio_get(2) 2022-06-27 06:29:12 -06:00
kyber-iosched.c block/kyber: Use the new blk_opf_t type 2022-07-14 12:14:30 -06:00
mq-deadline.c block: mq-deadline: Do not break sequential write streams to zoned HDDs 2023-01-07 11:15:54 +01:00
opal_proto.h
sed-opal.c block: sed-opal: kmalloc the cmd/resp buffers 2022-11-26 09:27:29 +01:00
t10-pi.c block: add pi for extended integrity 2022-03-07 12:48:35 -07:00