Go to file
Chris Mason c4b460b5fa btrfs: only subtract from len_to_oe_boundary when it is tracking an extent
commit 09c3717c3a upstream.

bio_ctrl->len_to_oe_boundary is used to make sure we stay inside a zone
as we submit bios for writes.  Every time we add a page to the bio, we
decrement those bytes from len_to_oe_boundary, and then we submit the
bio if we happen to hit zero.

Most of the time, len_to_oe_boundary gets set to U32_MAX.
submit_extent_page() adds pages into our bio, and the size of the bio
ends up limited by:

- Are we contiguous on disk?
- Does bio_add_page() allow us to stuff more in?
- is len_to_oe_boundary > 0?

The len_to_oe_boundary math starts with U32_MAX, which isn't page or
sector aligned, and subtracts from it until it hits zero.  In the
non-zoned case, the last IO we submit before we hit zero is going to be
unaligned, triggering BUGs.

This is hard to trigger because bio_add_page() isn't going to make a bio
of U32_MAX size unless you give it a perfect set of pages and fully
contiguous extents on disk.  We can hit it pretty reliably while making
large swapfiles during provisioning because the machine is freshly
booted, mostly idle, and the disk is freshly formatted.  It's also
possible to trigger with reads when read_ahead_kb is set to 4GB.

The code has been clean up and shifted around a few times, but this flaw
has been lurking since the counter was added.  I think the commit
24e6c80822 ("btrfs: simplify main loop in submit_extent_page") ended
up exposing the bug.

The fix used here is to skip doing math on len_to_oe_boundary unless
we've changed it from the default U32_MAX value.  bio_add_page() is the
real limit we want, and there's no reason to do extra math when block
layer is doing it for us.

Sample reproducer, note you'll need to change the path to the bdi and
device:

  SUBVOL=/btrfs/swapvol
  SWAPFILE=$SUBVOL/swapfile
  SZMB=8192

  mkfs.btrfs -f /dev/vdb
  mount /dev/vdb /btrfs

  btrfs subvol create $SUBVOL
  chattr +C $SUBVOL
  dd if=/dev/zero of=$SWAPFILE bs=1M count=$SZMB
  sync

  echo 4 > /proc/sys/vm/drop_caches

  echo 4194304 > /sys/class/bdi/btrfs-2/read_ahead_kb

  while true; do
	  echo 1 > /proc/sys/vm/drop_caches
	  echo 1 > /proc/sys/vm/drop_caches
	  dd of=/dev/zero if=$SWAPFILE bs=4096M count=2 iflag=fullblock
  done

Fixes: 24e6c80822 ("btrfs: simplify main loop in submit_extent_page")
CC: stable@vger.kernel.org # 6.4+
Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-23 17:32:39 +02:00
Documentation iommu/amd: Introduce Disable IRTE Caching Support 2023-08-23 17:32:27 +02:00
LICENSES LICENSES: Add the copyleft-next-0.3.1 license 2022-11-08 15:44:01 +01:00
arch powerpc/rtas_flash: allow user copy to flash block cache objects 2023-08-23 17:32:37 +02:00
block blk-cgroup: hold queue_lock when removing blkg->q_node 2023-08-23 17:32:37 +02:00
certs KEYS: Add missing function documentation 2023-04-24 16:15:52 +03:00
crypto crypto: jitter - correct health test during initialization 2023-07-19 16:36:19 +02:00
drivers tty: serial: fsl_lpuart: Clear the error flags by writing 1 for lpuart32 platforms 2023-08-23 17:32:38 +02:00
fs btrfs: only subtract from len_to_oe_boundary when it is tracking an extent 2023-08-23 17:32:39 +02:00
include media: v4l2-mem2mem: add lock to protect parameter num_rdy 2023-08-23 17:32:30 +02:00
init init, x86: Move mem_encrypt_init() into arch_cpu_finalize_init() 2023-08-08 20:04:49 +02:00
io_uring io_uring: correct check for O_TMPFILE 2023-08-16 18:32:19 +02:00
ipc Merge branch 'work.namespace' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2023-02-24 19:20:07 -08:00
kernel ring-buffer: Do not swap cpu_buffer during resize process 2023-08-23 17:32:35 +02:00
lib debugobjects: Recheck debug_objects_enabled before reporting 2023-08-11 12:14:25 +02:00
mm mm: memory-failure: avoid false hwpoison page mapped error info 2023-08-16 18:32:20 +02:00
net Bluetooth: MGMT: Use correct address for memcpy() 2023-08-23 17:32:34 +02:00
rust rust: macros: vtable: fix `HAS_*` redefinition (`gen_const_name`) 2023-08-23 17:32:36 +02:00
samples samples: ftrace: Save required argument registers in sample trampolines 2023-07-23 13:54:09 +02:00
scripts kbuild: rust: avoid creating temporary files 2023-07-27 08:57:06 +02:00
security security: keys: Modify mismatched function name 2023-07-27 08:56:59 +02:00
sound ALSA: hda/realtek: Add quirk for ASUS ROG GZ301V 2023-08-23 17:32:34 +02:00
tools nexthop: Fix infinite nexthop bucket dump when using maximum nexthop ID 2023-08-16 18:32:28 +02:00
usr initramfs: Check negative timestamp to prevent broken cpio archive 2023-04-16 17:37:01 +09:00
virt KVM: Grab a reference to KVM for VM and vCPU stats file descriptors 2023-08-03 10:26:01 +02:00
.clang-format cxl for v6.4 2023-04-30 11:51:51 -07:00
.cocciconfig
.get_maintainer.ignore get_maintainer: add Alan to .get_maintainer.ignore 2022-08-20 15:17:44 -07:00
.gitattributes .gitattributes: use 'dts' diff driver for *.dtso files 2023-02-26 15:28:23 +09:00
.gitignore linux-kselftest-kunit-6.4-rc1 2023-04-24 12:31:32 -07:00
.mailmap mailmap: add entries for Ben Dooks 2023-06-19 13:19:35 -07:00
.rustfmt.toml rust: add `.rustfmt.toml` 2022-09-28 09:02:20 +02:00
COPYING
CREDITS MAINTAINERS: sctp: move Neil to CREDITS 2023-05-12 08:51:32 +01:00
Kbuild Kbuild updates for v6.1 2022-10-10 12:00:45 -07:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS Networking fixes for 6.4-rc8, including fixes from ipsec, bpf, 2023-06-22 17:59:51 -07:00
Makefile Linux 6.4.11 2023-08-16 18:32:31 +02:00
README

README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.