linux-stable/fs/btrfs
Chris Mason c4b460b5fa btrfs: only subtract from len_to_oe_boundary when it is tracking an extent
commit 09c3717c3a upstream.

bio_ctrl->len_to_oe_boundary is used to make sure we stay inside a zone
as we submit bios for writes.  Every time we add a page to the bio, we
subtract the added bytes from len_to_oe_boundary, and then we submit the
bio if we happen to hit zero.

Most of the time, len_to_oe_boundary gets set to U32_MAX.
submit_extent_page() adds pages into our bio, and the size of the bio
ends up limited by:

- Are we contiguous on disk?
- Does bio_add_page() allow us to stuff more in?
- Is len_to_oe_boundary > 0?
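The three checks in the list above can be modeled in user space as a single predicate. This is a minimal sketch, not btrfs code: struct bio_model, its fields, and can_add_page() are hypothetical names, and max_bio_len is an assumed, simplified stand-in for whatever bio_add_page() is willing to accept.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the state that bounds bio growth. The names do not
 * match struct btrfs_bio_ctrl; max_bio_len is a simplified stand-in for the
 * block layer limits that bio_add_page() enforces.
 */
struct bio_model {
	uint64_t next_disk_bytenr;   /* disk address just past the bio's last byte */
	uint32_t bio_len;            /* bytes already in the bio */
	uint32_t max_bio_len;        /* most bytes the bio may hold (assumed) */
	uint32_t len_to_oe_boundary; /* bytes left before the boundary */
};

/* Returns true if 'len' more bytes at 'disk_bytenr' may join the current bio. */
static bool can_add_page(const struct bio_model *b, uint64_t disk_bytenr,
			 uint32_t len)
{
	assert(b->bio_len <= b->max_bio_len);

	bool contiguous = disk_bytenr == b->next_disk_bytenr; /* contiguous on disk? */
	bool fits = len <= b->max_bio_len - b->bio_len;       /* room left in the bio? */
	bool before_boundary = b->len_to_oe_boundary > 0;     /* counter not exhausted? */

	return contiguous && fits && before_boundary;
}
```

Any one of the three conditions failing ends the current bio in this model.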

The len_to_oe_boundary math starts with U32_MAX, which isn't page or
sector aligned, and subtracts from it until it hits zero.  In the
non-zoned case, the last IO we submit before we hit zero is going to be
unaligned, triggering BUGs.
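The arithmetic can be checked with a small simulation. This sketch assumes 4 KiB pages and sectors and models the pre-fix accounting as described here: each step adds a page, capped to whatever is left in the counter, and subtracts that length until the counter reaches zero. final_chunk_len(), is_aligned() and MODEL_PAGE_SIZE are hypothetical names, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u /* assumed page and sector size */

/*
 * Models the pre-fix accounting: every step adds min(page, counter) bytes to
 * the bio and subtracts them from the counter, stopping once it reaches zero.
 * Returns the length of the final step, i.e. the chunk that ends the bio.
 */
static uint32_t final_chunk_len(uint32_t counter, uint32_t page)
{
	uint32_t last = 0;

	assert(page > 0); /* a zero page size would never make progress */
	while (counter > 0) {
		uint32_t len = page < counter ? page : counter; /* cap to what is left */

		counter -= len;
		last = len;
	}
	return last;
}

static bool is_aligned(uint32_t len, uint32_t align)
{
	return len % align == 0;
}
```

Since U32_MAX = 1048575 * 4096 + 4095, the chunk that brings the counter to zero is 4095 bytes in this model, which is not sector aligned; a page-aligned starting value such as 1 MiB ends on a full 4096-byte chunk instead.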

This is hard to trigger because bio_add_page() isn't going to make a bio
of U32_MAX size unless you give it a perfect set of pages and fully
contiguous extents on disk.  We can hit it pretty reliably while making
large swapfiles during provisioning because the machine is freshly
booted, mostly idle, and the disk is freshly formatted.  It's also
possible to trigger with reads when read_ahead_kb is set to 4GB.
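read_ahead_kb is expressed in KiB, so a 4GB setting (4194304) is 4294967296 bytes, one byte more than U32_MAX; a readahead window that size is, in principle, large enough for a single read bio to approach the U32_MAX counter. The hypothetical helpers below only do that conversion, in 64-bit arithmetic so it cannot wrap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Converts a read_ahead_kb value (KiB) to bytes using 64-bit math. */
static uint64_t readahead_bytes(uint64_t read_ahead_kb)
{
	assert(read_ahead_kb <= UINT64_MAX / 1024u); /* guard the multiplication */
	return read_ahead_kb * 1024u;
}

/* True if a byte count no longer fits in a u32 counter such as len_to_oe_boundary. */
static bool exceeds_u32(uint64_t bytes)
{
	return bytes > UINT32_MAX;
}
```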

The code has been cleaned up and shifted around a few times, but this flaw
has been lurking since the counter was added.  I think the commit
24e6c80822 ("btrfs: simplify main loop in submit_extent_page") ended
up exposing the bug.

The fix used here is to skip doing math on len_to_oe_boundary unless
we've changed it from the default U32_MAX value.  bio_add_page() is the
real limit we want, and there's no reason to do extra math when the block
layer is doing it for us.
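A user-space model of the fix: the counter is only decremented once it has been moved off the U32_MAX default, so in the default case only the bio_add_page() limits decide when the bio is full. struct oe_counter, OE_UNTRACKED and account_added_bytes() are hypothetical names; this is a sketch of the idea described above, not the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define OE_UNTRACKED UINT32_MAX /* default value: no ordered extent is tracked */

struct oe_counter {
	uint32_t len_to_oe_boundary;
};

/*
 * Accounts 'len' bytes that were just added to the bio. Returns true when the
 * bio should be submitted because the tracked boundary was reached. With the
 * U32_MAX default the counter is left alone and never triggers a submit.
 */
static bool account_added_bytes(struct oe_counter *c, uint32_t len)
{
	if (c->len_to_oe_boundary != OE_UNTRACKED) {
		/* assumes callers already capped len to the boundary */
		assert(len <= c->len_to_oe_boundary);
		c->len_to_oe_boundary -= len;
	}
	return c->len_to_oe_boundary == 0;
}
```

In the tracked case the counter still reaches zero exactly at the boundary, which is what forces a new bio at the end of an ordered extent; in the untracked case no amount of added data drives it toward an unaligned remainder.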

Sample reproducer; note that you'll need to change the paths to the bdi
and device:

  SUBVOL=/btrfs/swapvol
  SWAPFILE=$SUBVOL/swapfile
  SZMB=8192

  mkfs.btrfs -f /dev/vdb
  mount /dev/vdb /btrfs

  btrfs subvol create $SUBVOL
  chattr +C $SUBVOL
  dd if=/dev/zero of=$SWAPFILE bs=1M count=$SZMB
  sync

  echo 4 > /proc/sys/vm/drop_caches

  echo 4194304 > /sys/class/bdi/btrfs-2/read_ahead_kb

  while true; do
	  echo 1 > /proc/sys/vm/drop_caches
	  echo 1 > /proc/sys/vm/drop_caches
	  dd of=/dev/zero if=$SWAPFILE bs=4096M count=2 iflag=fullblock
  done

Fixes: 24e6c80822 ("btrfs: simplify main loop in submit_extent_page")
CC: stable@vger.kernel.org # 6.4+
Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-23 17:32:39 +02:00
tests btrfs: replace map_lookup->stripe_len by BTRFS_STRIPE_LEN 2023-04-17 18:01:14 +02:00
Kconfig block: make blkcg_punt_bio_submit optional 2023-04-17 18:01:22 +02:00
Makefile btrfs: send: genericize the backref cache to allow it to be reused 2023-02-13 17:50:35 +01:00
accessors.c btrfs: add eb to btrfs_node_key_ptr_offset 2022-12-05 18:00:58 +01:00
accessors.h btrfs: add stack helpers for a few btrfs items 2022-12-05 18:00:58 +01:00
acl.c fs: port acl to mnt_idmap 2023-01-19 09:24:28 +01:00
acl.h fs: port ->set_acl() to pass mnt_idmap 2023-01-19 09:24:27 +01:00
async-thread.c
async-thread.h
backref.c btrfs: fix backref walking not returning all inode refs 2023-05-09 22:09:11 +02:00
backref.h btrfs: fix backref walking not returning all inode refs 2023-05-09 22:09:11 +02:00
bio.c btrfs: fix file_offset for REQ_BTRFS_ONE_ORDERED bios that get split 2023-07-19 16:35:13 +02:00
bio.h btrfs: introduce a new helper to submit write bio for repair 2023-04-17 18:01:23 +02:00
block-group.c btrfs: fix use-after-free of new block group that became unused 2023-08-23 17:32:35 +02:00
block-group.h btrfs: fix use-after-free of new block group that became unused 2023-08-23 17:32:35 +02:00
block-rsv.c btrfs: account block group tree when calculating global reserve size 2023-08-03 10:26:07 +02:00
block-rsv.h btrfs: simplify variables in btrfs_block_rsv_refill() 2023-04-17 18:01:19 +02:00
btrfs_inode.h btrfs: avoid iterating over all indexes when logging directory 2023-04-17 19:52:19 +02:00
check-integrity.c btrfs: use btrfs_dev_name() helper to handle missing devices better 2022-12-05 18:00:57 +01:00
check-integrity.h
compression.c btrfs: introduce btrfs_bio::fs_info member 2023-04-17 18:01:23 +02:00
compression.h btrfs: move kthread_associate_blkcg out of btrfs_submit_compressed_write 2023-04-17 18:01:22 +02:00
ctree.c btrfs: abort transaction at update_ref_for_cow() when ref count is zero 2023-07-27 08:56:46 +02:00
ctree.h btrfs: fix infinite directory reads 2023-08-23 17:32:38 +02:00
defrag.c btrfs: remove the wait argument to btrfs_start_ordered_extent 2023-02-13 17:50:34 +01:00
defrag.h btrfs: move defrag related prototypes to their own header 2022-12-05 18:00:46 +01:00
delalloc-space.c btrfs: count extents before taking inode's spinlock when reserving metadata 2023-04-17 18:01:19 +02:00
delalloc-space.h btrfs: move delalloc space related prototypes to delalloc-space.h 2022-12-05 18:00:44 +01:00
delayed-inode.c btrfs: fix infinite directory reads 2023-08-23 17:32:38 +02:00
delayed-inode.h btrfs: fix infinite directory reads 2023-08-23 17:32:38 +02:00
delayed-ref.c btrfs: add helper to calculate space for delayed references 2023-04-17 18:01:19 +02:00
delayed-ref.h btrfs: add helper to calculate space for delayed references 2023-04-17 18:01:19 +02:00
dev-replace.c btrfs: use btrfs_dev_name() helper to handle missing devices better 2022-12-05 18:00:57 +01:00
dev-replace.h btrfs: move dev-replace prototypes into dev-replace.h 2022-12-05 18:00:47 +01:00
dir-item.c btrfs: move dir-item prototypes into dir-item.h 2022-12-05 18:00:46 +01:00
dir-item.h btrfs: move dir-item prototypes into dir-item.h 2022-12-05 18:00:46 +01:00
discard.c btrfs: reinterpret async discard iops_limit=0 as no delay 2023-04-21 00:28:23 +02:00
discard.h
disk-io.c btrfs: reject invalid reloc tree root keys with stack dump 2023-08-16 18:32:30 +02:00
disk-io.h btrfs: rename btrfs_clean_tree_block to btrfs_clear_buffer_dirty 2023-02-15 19:38:53 +01:00
export.c btrfs: move super_block specific helpers into super.h 2022-12-05 18:00:47 +01:00
export.h btrfs: simplify generation check in btrfs_get_dentry 2022-12-05 18:00:41 +01:00
extent-io-tree.c btrfs: fix spelling mistakes found using codespell 2023-02-15 19:38:50 +01:00
extent-io-tree.h btrfs: remove the io_failure_record infrastructure 2023-02-15 19:38:51 +01:00
extent-tree.c btrfs: set cache_block_group_error if we find an error 2023-08-16 18:32:30 +02:00
extent-tree.h btrfs: introduce size class to block group allocator 2023-02-13 17:50:34 +01:00
extent_io.c btrfs: only subtract from len_to_oe_boundary when it is tracking an extent 2023-08-23 17:32:39 +02:00
extent_io.h btrfs: fix dirty_metadata_bytes for redirtied buffers 2023-07-19 16:36:56 +02:00
extent_map.c btrfs: fix incorrect splitting in btrfs_drop_extent_map_range 2023-08-23 17:32:38 +02:00
extent_map.h btrfs: remove no longer used btrfs_next_extent_map() 2022-12-05 18:00:56 +01:00
file-item.c btrfs: handle memory allocation failure in btrfs_csum_one_bio 2023-05-17 13:08:28 +02:00
file-item.h btrfs: scrub: introduce helper to find and fill sector info for a scrub_stripe 2023-04-17 18:01:23 +02:00
file.c iov_iter: add iter_iovec() helper 2023-03-30 08:12:29 -06:00
file.h btrfs: use cached state when looking for delalloc ranges with fiemap 2022-12-05 18:00:56 +01:00
free-space-cache.c btrfs: fix space cache inconsistency after error loading it from disk 2023-05-09 22:08:05 +02:00
free-space-cache.h btrfs: convert discard stat defs to enum 2022-12-05 18:00:45 +01:00
free-space-tree.c btrfs: remove BUG_ON()'s in add_new_free_space() 2023-08-11 12:14:25 +02:00
free-space-tree.h btrfs: make clear_cache mount option to rebuild FST without disabling it 2023-05-10 14:51:27 +02:00
fs.c btrfs: sysfs: update fs features directory asynchronously 2023-02-13 17:50:35 +01:00
fs.h btrfs: scrub: remove scrub_parity structure 2023-04-17 18:01:24 +02:00
inode-item.c btrfs: remove obsolete delayed ref throttling logic when truncating items 2023-04-17 18:01:19 +02:00
inode-item.h btrfs: use struct fscrypt_str instead of struct qstr 2022-12-05 18:00:43 +01:00
inode.c btrfs: fix infinite directory reads 2023-08-23 17:32:38 +02:00
ioctl.c btrfs: fix assertion of exclop condition when starting balance 2023-04-28 16:36:27 +02:00
ioctl.h fs: port ->fileattr_set() to pass mnt_idmap 2023-01-19 09:24:27 +01:00
locking.c btrfs: add block-group tree to lockdep classes 2023-07-19 16:36:57 +02:00
locking.h btrfs: locking: use atomic for DREW lock writers 2023-04-17 18:01:17 +02:00
lru_cache.c btrfs: send: cache utimes operations for directories if possible 2023-02-15 19:38:50 +01:00
lru_cache.h btrfs: remove btrfs_lru_cache_is_full() inline function 2023-04-17 18:01:18 +02:00
lzo.c btrfs: move zero filling of compressed read bios into common code 2023-04-17 18:01:17 +02:00
messages.c btrfs: mark btrfs_assertfail() __noreturn 2023-04-17 19:52:19 +02:00
messages.h btrfs: mark btrfs_assertfail() __noreturn 2023-04-17 19:52:19 +02:00
misc.h btrfs: simplify percent calculation helpers, rename div_factor 2022-12-05 18:00:48 +01:00
ordered-data.c btrfs: fold btrfs_clone_ordered_extent into btrfs_split_ordered_extent 2023-04-17 18:01:21 +02:00
ordered-data.h btrfs: sink parameter len to btrfs_split_ordered_extent 2023-04-17 18:01:21 +02:00
orphan.c btrfs: move orphan prototypes into orphan.h 2022-12-05 18:00:47 +01:00
orphan.h btrfs: move orphan prototypes into orphan.h 2022-12-05 18:00:47 +01:00
print-tree.c btrfs: print-tree: parent bytenr must be aligned to sector size 2023-05-09 22:07:40 +02:00
print-tree.h
props.c btrfs: move super_block specific helpers into super.h 2022-12-05 18:00:47 +01:00
props.h btrfs: make module init/exit match their sequence 2022-12-05 18:00:40 +01:00
qgroup.c btrfs: fix race between quota disable and relocation 2023-08-03 10:25:43 +02:00
qgroup.h btrfs: sink gfp_t parameter to btrfs_qgroup_trace_extent 2022-12-05 18:00:43 +01:00
raid56.c btrfs: raid56: always verify the P/Q contents for scrub 2023-07-27 08:56:34 +02:00
raid56.h btrfs: remove unused raid56 functions which were dedicated for scrub 2023-04-17 19:52:18 +02:00
rcu-string.h btrfs: replace strncpy() with strscpy() 2022-12-05 18:00:59 +01:00
ref-verify.c btrfs: move accessor helpers into accessors.h 2022-12-05 18:00:42 +01:00
ref-verify.h
reflink.c btrfs: pass btrfs_inode to btrfs_inode_unlock 2022-12-05 18:00:53 +01:00
reflink.h
relocation.c btrfs: exit gracefully if reloc roots don't match 2023-08-16 18:32:30 +02:00
relocation.h btrfs: move relocation prototypes into relocation.h 2022-12-05 18:00:47 +01:00
root-tree.c btrfs: move orphan prototypes into orphan.h 2022-12-05 18:00:47 +01:00
root-tree.h btrfs: move root tree prototypes to their own header 2022-12-05 18:00:44 +01:00
scrub.c btrfs: fix replace/scrub failure with metadata_uuid 2023-08-23 17:32:39 +02:00
scrub.h btrfs: scrub: remove scrub_bio structure 2023-04-17 18:01:24 +02:00
send.c btrfs: fix uninitialized variable warnings 2023-04-17 19:52:19 +02:00
send.h btrfs: send add define for v2 buffer size 2022-12-05 18:00:41 +01:00
space-info.c btrfs: add helper to calculate space for delayed references 2023-04-17 18:01:19 +02:00
space-info.h btrfs: update documentation for BTRFS_RESERVE_FLUSH_EVICT flush method 2023-04-17 18:01:18 +02:00
subpage.c btrfs: move the printk helpers out of ctree.h 2022-12-05 18:00:41 +01:00
subpage.h
super.c btrfs: properly enable async discard when switching from RO->RW 2023-06-06 19:44:22 +02:00
super.h btrfs: move super_block specific helpers into super.h 2022-12-05 18:00:47 +01:00
sysfs.c btrfs: sysfs: relax bg_reclaim_threshold for debugging purposes 2023-04-17 18:01:18 +02:00
sysfs.h btrfs: sysfs: update fs features directory asynchronously 2023-02-13 17:50:35 +01:00
transaction.c btrfs: check for commit error at btrfs_attach_transaction_barrier() 2023-08-03 10:26:07 +02:00
transaction.h btrfs: move btrfs_abort_transaction to transaction.c 2023-02-13 17:50:33 +01:00
tree-checker.c btrfs: reject invalid reloc tree root keys with stack dump 2023-08-16 18:32:30 +02:00
tree-checker.h btrfs: move struct btrfs_tree_parent_check out of disk-io.h 2022-12-05 18:00:57 +01:00
tree-log.c btrfs: fix an uninitialized variable warning in btrfs_log_inode 2023-05-26 23:24:04 +02:00
tree-log.h btrfs: use a negative value for BTRFS_LOG_FORCE_COMMIT 2023-02-13 17:50:34 +01:00
tree-mod-log.c btrfs: warn on invalid slot in tree mod log rewind 2023-07-19 16:36:56 +02:00
tree-mod-log.h btrfs: fix SPDX comment in tree-mod-log.h 2022-12-05 18:00:48 +01:00
ulist.c btrfs: constify ulist parameter of ulist_next() 2022-12-05 18:00:50 +01:00
ulist.h btrfs: constify ulist parameter of ulist_next() 2022-12-05 18:00:50 +01:00
uuid-tree.c btrfs: move uuid tree prototypes to uuid-tree.h 2022-12-05 18:00:46 +01:00
uuid-tree.h btrfs: move uuid tree prototypes to uuid-tree.h 2022-12-05 18:00:46 +01:00
verity.c fsverity: pass pos and size to ->write_merkle_tree_block 2023-01-01 15:46:48 -08:00
verity.h btrfs: move verity prototypes into verity.h 2022-12-05 18:00:47 +01:00
volumes.c btrfs: fix BUG_ON condition in btrfs_cancel_balance 2023-08-23 17:32:38 +02:00
volumes.h btrfs: fix remaining u32 overflows when left shifting stripe_nr 2023-06-22 17:03:55 +02:00
xattr.c fs: drop unused posix acl handlers 2023-03-06 09:57:12 +01:00
xattr.h
zlib.c btrfs: move zero filling of compressed read bios into common code 2023-04-17 18:01:17 +02:00
zoned.c btrfs: zoned: do not enable async discard 2023-08-03 10:26:07 +02:00
zoned.h btrfs: pass a btrfs_bio to btrfs_use_append 2023-02-15 19:38:55 +01:00
zstd.c btrfs: move zero filling of compressed read bios into common code 2023-04-17 18:01:17 +02:00