linux-stable/fs/btrfs
Filipe Manana 444d70889d btrfs: send: don't issue unnecessary zero writes for trailing hole
commit 5897710b28 upstream.

If we have a sparse file with a trailing hole (from the last extent's end
to i_size) and then create an extent in the file that ends before the
file's i_size, then when doing an incremental send we will issue a write
full of zeroes for the range that starts immediately after the new extent
ends up to i_size. While this isn't incorrect because the file ends up
with exactly the same data, it unnecessarily results in using extra space
at the destination with one or more extents full of zeroes instead of
having a hole. In same cases this results in using megabytes or even
gigabytes of unnecessary space.

Example, reproducer:

   $ cat test.sh
   #!/bin/bash

   DEV=/dev/sdh
   MNT=/mnt/sdh

   mkfs.btrfs -f $DEV
   mount $DEV $MNT

   # Create 1G sparse file.
   xfs_io -f -c "truncate 1G" $MNT/foobar

   # Create base snapshot.
   btrfs subvolume snapshot -r $MNT $MNT/mysnap1

   # Create send stream (full send) for the base snapshot.
   btrfs send -f /tmp/1.snap $MNT/mysnap1

   # Now write one extent at the beginning of the file and one somewhere
   # in the middle, leaving a gap between the end of this second extent
   # and the file's size.
   xfs_io -c "pwrite -S 0xab 0 128K" \
          -c "pwrite -S 0xcd 512M 128K" \
          $MNT/foobar

   # Now create a second snapshot which is going to be used for an
   # incremental send operation.
   btrfs subvolume snapshot -r $MNT $MNT/mysnap2

   # Create send stream (incremental send) for the second snapshot.
   btrfs send -p $MNT/mysnap1 -f /tmp/2.snap $MNT/mysnap2

   # Now recreate the filesystem by receiving both send streams and
   # verify we get the same content that the original filesystem had
   # and file foobar has only two extents with a size of 128K each.
   umount $MNT
   mkfs.btrfs -f $DEV
   mount $DEV $MNT

   btrfs receive -f /tmp/1.snap $MNT
   btrfs receive -f /tmp/2.snap $MNT

   echo -e "\nFile fiemap in the second snapshot:"
   # Should have:
   #
   # 128K extent at file range [0, 128K[
   # hole at file range [128K, 512M[
   # 128K extent file range [512M, 512M + 128K[
   # hole at file range [512M + 128K, 1G[
   xfs_io -r -c "fiemap -v" $MNT/mysnap2/foobar

   # File should be using 256K of data (two 128K extents).
   echo -e "\nSpace used by the file: $(du -h $MNT/mysnap2/foobar | cut -f 1)"

   umount $MNT

Running the test, we can see with fiemap that we get an extent for the
range [512M, 1G[, while in the source filesystem we have an extent for
the range [512M, 512M + 128K[ and a hole for the rest of the file (the
range [512M + 128K, 1G[):

   $ ./test.sh
   (...)
   File fiemap in the second snapshot:
   /mnt/sdh/mysnap2/foobar:
    EXT: FILE-OFFSET        BLOCK-RANGE        TOTAL FLAGS
      0: [0..255]:          26624..26879         256   0x0
      1: [256..1048575]:    hole             1048320
      2: [1048576..2097151]: 2156544..3205119 1048576   0x1

   Space used by the file: 513M

This happens because once we finish processing an inode, at
finish_inode_if_needed(), we always issue a hole (write operations full
of zeros) if there's a gap between the end of the last processed extent
and the file's size, even if that range is already a hole in the parent
snapshot. Fix this by issuing the hole only if the range is not already
a hole.

After this change, running the test above, we get the expected layout:

   $ ./test.sh
   (...)
   File fiemap in the second snapshot:
   /mnt/sdh/mysnap2/foobar:
    EXT: FILE-OFFSET        BLOCK-RANGE      TOTAL FLAGS
      0: [0..255]:          26624..26879       256   0x0
      1: [256..1048575]:    hole             1048320
      2: [1048576..1048831]: 26880..27135       256   0x1
      3: [1048832..2097151]: hole             1048320

   Space used by the file: 256K

A test case for fstests will follow soon.

CC: stable@vger.kernel.org # 6.1+
Reported-by: Dorai Ashok S A <dash.btrfs@inix.me>
Link: https://lore.kernel.org/linux-btrfs/c0bf7818-9c45-46a8-b3d3-513230d0c86e@inix.me/
Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-06 14:45:10 +00:00
..
tests btrfs: convert btrfs_block_group::needs_free_space to runtime flag 2023-08-23 17:52:28 +02:00
acl.c
async-thread.c
async-thread.h btrfs: remove unused typedefs get_extent_t and btrfs_work_func_t 2022-07-25 17:45:36 +02:00
backref.c btrfs: fix resolving backrefs for inline extent followed by prealloc 2023-01-07 11:11:38 +01:00
backref.h btrfs: ignore fiemap path cache if we have multiple leaves for a data extent 2022-10-11 14:48:07 +02:00
block-group.c btrfs: do not delete unused block group if it may be used soon 2024-02-23 09:12:28 +01:00
block-group.h btrfs: add and use helper to check if block group is used 2024-02-23 09:12:28 +01:00
block-rsv.c btrfs: account block group tree when calculating global reserve size 2023-08-03 10:24:13 +02:00
block-rsv.h btrfs: add KCSAN annotations for unlocked access to block_rsv->full 2022-09-26 12:28:02 +02:00
btrfs_inode.h btrfs: use a runtime flag to indicate an inode is a free space inode 2022-09-26 12:28:07 +02:00
check-integrity.c
check-integrity.h
compression.c fs: fix leaked psi pressure state 2022-11-08 15:57:25 -08:00
compression.h for-5.20-tag 2022-08-03 14:54:52 -07:00
ctree.c btrfs: error out when reallocating block for defrag using a stale transaction 2023-10-25 12:03:11 +02:00
ctree.h btrfs: fix infinite directory reads 2024-01-31 16:17:05 -08:00
delalloc-space.c btrfs: don't reserve space for checksums when writing to nocow files 2024-02-23 09:12:29 +01:00
delalloc-space.h btrfs: add the ability to use NO_FLUSH for data reservations 2022-09-29 17:08:28 +02:00
delayed-inode.c btrfs: fix infinite directory reads 2024-01-31 16:17:05 -08:00
delayed-inode.h btrfs: fix infinite directory reads 2024-01-31 16:17:05 -08:00
delayed-ref.c btrfs: prevent transaction block reserve underflow when starting transaction 2023-10-25 12:03:09 +02:00
delayed-ref.h btrfs: prevent transaction block reserve underflow when starting transaction 2023-10-25 12:03:09 +02:00
dev-replace.c btrfs: dev-replace: properly validate device names 2024-03-06 14:45:10 +00:00
dev-replace.h btrfs: add struct declarations in dev-replace.h 2022-09-26 12:28:07 +02:00
dir-item.c btrfs: use struct fscrypt_str instead of struct qstr 2023-10-10 22:00:36 +02:00
discard.c btrfs: hold block group refcount during async discard 2023-03-10 09:34:06 +01:00
discard.h
disk-io.c btrfs: fix double free of anonymous device after snapshot creation failure 2024-03-06 14:45:10 +00:00
disk-io.h btrfs: fix double free of anonymous device after snapshot creation failure 2024-03-06 14:45:10 +00:00
export.c btrfs: fix type of parameter generation in btrfs_get_dentry 2022-10-24 15:28:58 +02:00
export.h btrfs: fix type of parameter generation in btrfs_get_dentry 2022-10-24 15:28:58 +02:00
extent-io-tree.c btrfs: fix off-by-one in delalloc search during lseek 2023-01-12 12:01:56 +01:00
extent-io-tree.h btrfs: stop tracking failed reads in the I/O tree 2022-09-26 12:28:05 +02:00
extent-tree.c btrfs: zoned: optimize hint byte for zoned allocator 2024-01-31 16:17:10 -08:00
extent_io.c btrfs: don't clear qgroup reserved bit in release_folio 2023-12-20 17:00:26 +01:00
extent_io.h btrfs: move extent io tree unrelated prototypes to their appropriate header 2022-09-26 12:28:04 +02:00
extent_map.c btrfs: fix incorrect splitting in btrfs_drop_extent_map_range 2023-08-23 17:52:31 +02:00
extent_map.h btrfs: get the next extent map during fiemap/lseek more efficiently 2023-04-26 14:28:38 +02:00
file-item.c btrfs: mark the len field in struct btrfs_ordered_sum as unsigned 2024-01-10 17:10:35 +01:00
file.c btrfs: fix qgroup_free_reserved_data int overflow 2024-01-10 17:10:35 +01:00
free-space-cache.c btrfs: zoned: no longer count fresh BG region as zone unusable 2024-01-01 12:39:06 +00:00
free-space-cache.h btrfs: remove use btrfs_remove_free_space_cache instead of variant 2022-09-26 12:27:58 +02:00
free-space-tree.c btrfs: convert btrfs_block_group::needs_free_space to runtime flag 2023-08-23 17:52:28 +02:00
free-space-tree.h btrfs: make clear_cache mount option to rebuild FST without disabling it 2023-05-17 11:53:42 +02:00
inode-item.c btrfs: use struct fscrypt_str instead of struct qstr 2023-10-10 22:00:36 +02:00
inode-item.h btrfs: use struct fscrypt_str instead of struct qstr 2023-10-10 22:00:36 +02:00
inode.c btrfs: don't drop extent_map for free space inode on write error 2024-02-23 09:12:29 +01:00
ioctl.c btrfs: fix double free of anonymous device after snapshot creation failure 2024-03-06 14:45:10 +00:00
Kconfig
locking.c btrfs: add block-group tree to lockdep classes 2023-07-19 16:22:13 +02:00
locking.h btrfs: implement a nowait option for tree searches 2022-09-26 12:46:42 +02:00
lzo.c
Makefile btrfs: move extent state init and alloc functions to their own file 2022-09-26 12:28:03 +02:00
misc.h btrfs: convert the io_failure_tree to a plain rb_tree 2022-09-26 12:28:02 +02:00
ordered-data.c btrfs: fix qgroup_free_reserved_data int overflow 2024-01-10 17:10:35 +01:00
ordered-data.h btrfs: mark the len field in struct btrfs_ordered_sum as unsigned 2024-01-10 17:10:35 +01:00
orphan.c
print-tree.c btrfs: print-tree: parent bytenr must be aligned to sector size 2023-05-17 11:53:42 +02:00
print-tree.h
props.c btrfs: remove the unnecessary result variables 2022-09-26 12:28:00 +02:00
props.h
qgroup.c btrfs: forbid deleting live subvol qgroup 2024-02-23 09:12:29 +01:00
qgroup.h btrfs: fix qgroup_free_reserved_data int overflow 2024-01-10 17:10:35 +01:00
raid56.c btrfs: raid56: avoid double freeing for rbio if full_stripe_write() failed 2022-10-24 15:26:56 +02:00
raid56.h btrfs: properly abstract the parity raid bio handling 2022-09-26 12:27:59 +02:00
rcu-string.h btrfs: replace strncpy() with strscpy() 2023-01-12 12:01:55 +01:00
ref-verify.c btrfs: ref-verify: free ref cache before clearing mount opt 2024-01-31 16:17:07 -08:00
ref-verify.h
reflink.c btrfs: replace delete argument with EXTENT_CLEAR_ALL_BITS 2022-09-26 12:28:05 +02:00
reflink.h
relocation.c btrfs: set page extent mapped after read_folio in relocate_one_page 2023-09-19 12:28:06 +02:00
root-tree.c btrfs: use struct fscrypt_str instead of struct qstr 2023-10-10 22:00:36 +02:00
scrub.c btrfs: scrub: try harder to mark RAID56 block groups read-only 2023-06-21 16:00:52 +02:00
send.c btrfs: send: don't issue unnecessary zero writes for trailing hole 2024-03-06 14:45:10 +00:00
send.h btrfs: send: allow protocol version 3 with CONFIG_BTRFS_DEBUG 2022-10-11 14:46:55 +02:00
space-info.c btrfs: zoned: re-enable metadata over-commit for zoned mode 2023-09-19 12:28:06 +02:00
space-info.h btrfs: move btrfs_init_async_reclaim_work prototype to space-info.h 2022-09-26 12:28:06 +02:00
struct-funcs.c
subpage.c btrfs: convert process_page_range() to use filemap_get_folios_contig() 2022-09-11 20:26:03 -07:00
subpage.h
super.c btrfs: add dmesg output for first mount and last unmount of a filesystem 2023-12-08 08:51:16 +01:00
sysfs.c btrfs: sysfs: validate scrub_speed_max value 2024-01-31 16:16:58 -08:00
sysfs.h
transaction.c btrfs: fix double free of anonymous device after snapshot creation failure 2024-03-06 14:45:10 +00:00
transaction.h
tree-checker.c btrfs: tree-checker: fix inline ref size in error messages 2024-01-31 16:17:07 -08:00
tree-checker.h
tree-defrag.c btrfs: move the auto defrag code to defrag.c 2023-02-22 12:59:40 +01:00
tree-log.c btrfs: initialize start_slot in btrfs_log_prealloc_extents 2023-10-25 12:03:09 +02:00
tree-log.h btrfs: use struct fscrypt_str instead of struct qstr 2023-10-10 22:00:36 +02:00
tree-mod-log.c
tree-mod-log.h
ulist.c
ulist.h
uuid-tree.c
verity.c btrfs: send: add support for fs-verity 2022-09-26 12:27:55 +02:00
volumes.c btrfs: make error messages more clear when getting a chunk map 2023-12-08 08:51:16 +01:00
volumes.h btrfs: add a helper to read the superblock metadata_uuid 2023-09-23 11:11:08 +02:00
xattr.c btrfs: check if root is readonly while setting security xattr 2022-08-22 18:06:30 +02:00
xattr.h
zlib.c btrfs: zlib: zero-initialize zlib workspace 2023-02-14 19:11:40 +01:00
zoned.c btrfs: zoned: no longer count fresh BG region as zone unusable 2024-01-01 12:39:06 +00:00
zoned.h btrfs: zoned: clone zoned device info when cloning a device 2022-11-07 14:35:21 +01:00
zstd.c btrfs: zstd: replace kmap() with kmap_local_page() 2022-07-25 17:45:40 +02:00