linux-stable/fs/btrfs
Filipe Manana 506650dcb3 btrfs: improve the batch insertion of delayed items
When we insert the delayed items of an inode, which correspond to the
directory index keys of a directory (key type BTRFS_DIR_INDEX_KEY), we
do the following:

1) Pick the first delayed item from the rbtree and insert it into the
   fs/subvolume btree, using btrfs_insert_empty_item() for that;

2) Without releasing the path returned by btrfs_insert_empty_item(),
   keep collecting as many consecutive delayed items from the rbtree
   as possible, as long as each one's BTRFS_DIR_INDEX_KEY key is the
   immediate successor of the previously picked item and as long as
   they fit in the available space of the leaf the path points to;

3) Then insert all the collected items into the leaf;

4) Release the reserved metadata space for each collected item and
   release each item (implies deleting from the rbtree);

5) Unlock the path.
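
The collection rule in step 2 can be sketched as follows. This is only a
simplified model, not the kernel code: delayed items are represented as
(dir_index, data_size) tuples, collect_followers() is a hypothetical name,
and the 25 byte per-item header overhead is an assumption made for the
example.

```python
# Assumed per-item header overhead inside a leaf (illustrative value).
ITEM_HEADER_SIZE = 25


def collect_followers(items, leaf_free_space):
    """Model of step 2: pick the items that can join the first item's batch.

    items is a list of (dir_index, data_size) tuples sorted by dir_index.
    items[0] is assumed to be already inserted in the leaf (step 1), and
    leaf_free_space is the space left in that leaf afterwards.
    """
    batch = []
    used = 0
    prev_index = items[0][0]
    for index, data_size in items[1:]:
        # Stop at the first key that is not the immediate successor.
        if index != prev_index + 1:
            break
        need = ITEM_HEADER_SIZE + data_size
        # Stop when the item no longer fits in the leaf's free space.
        if used + need > leaf_free_space:
            break
        batch.append((index, data_size))
        used += need
        prev_index = index
    return batch


delayed = [(2, 100), (3, 100), (4, 100), (6, 100)]
print(collect_followers(delayed, 300))  # [(3, 100), (4, 100)]
print(collect_followers(delayed, 200))  # [(3, 100)]
```

In this model the batch size depends on how full the target leaf happens
to be, which is why the batch sizes varied so much.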

While this is much better than inserting items one by one, it can be
improved in a few aspects:

1) Instead of adding items based on the remaining free space of the
   leaf, collect as many items as can fit in a leaf and bulk insert
   them. This results in fewer, larger batches, reducing the total
   amount of time to insert the delayed items. For example, when adding
   100K files to a directory, we ended up creating 1658 batches with
   highly variable sizes, ranging from 1 item to 118 items, on a
   filesystem with a node/leaf size of 16K. After this change, we end up
   with 839 batches, with the vast majority of them having exactly 120
   items;

2) We do the search for more items to batch, by iterating the rbtree,
   while holding a write lock on the leaf;

3) While still holding the leaf locked, we are releasing the reserved
   metadata for each item and then deleting each item, keeping a write
   lock on the leaf for longer than necessary. Releasing the delayed items
   one by one can take a significant amount of time, because deleting
   them from the rbtree can often be a bit slow when the deletion results
   in rebalancing the rbtree.
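
The batch counts quoted in item 1 can be sanity checked with a little
arithmetic. The sketch below assumes that every batch could hold exactly
120 items, which is a simplification; ceil_div() is a helper introduced
only for this example.

```python
def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


files = 100_000

# Fewest possible batches if every batch held exactly 120 items.
print(ceil_div(files, 120))        # 834

# Average batch size before and after the change.
print(round(files / 1658, 1))      # 60.3
print(round(files / 839, 1))       # 119.2
```

The 839 batches observed after the change are therefore close to the
lower bound of 834 for batches of 120 items.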

So change this to create larger batches, with a total item size up to the
maximum a leaf can support, and to unlock the leaf immediately after
inserting the items, so that the reserved metadata space of each item and
the item itself are released without holding the write lock on the leaf.
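
A minimal sketch of the new behaviour, under stated assumptions: items are
(dir_index, data_size) tuples, the 25 byte item header and the 16K size
limit (which ignores the leaf header) are illustrative values, and
split_into_batches() and insert_batches() are hypothetical names. A
threading.Lock stands in for the write lock on the leaf.

```python
import threading

# Assumed per-item header overhead inside a leaf (illustrative value).
ITEM_HEADER_SIZE = 25


def split_into_batches(items, max_leaf_data_size):
    """Group sorted (dir_index, data_size) items into batches.

    A batch only holds consecutive dir_index values and its total size
    never exceeds max_leaf_data_size, regardless of how full any existing
    leaf is.
    """
    batches = []
    current = []
    used = 0
    for index, data_size in items:
        need = ITEM_HEADER_SIZE + data_size
        contiguous = bool(current) and index == current[-1][0] + 1
        if current and (not contiguous or used + need > max_leaf_data_size):
            batches.append(current)
            current, used = [], 0
        current.append((index, data_size))
        used += need
    if current:
        batches.append(current)
    return batches


def insert_batches(batches, leaf_lock, leaf, released):
    """Insert each batch under the lock, then clean up outside of it."""
    for batch in batches:
        with leaf_lock:
            leaf.extend(batch)  # bulk insert while holding the lock
        # The lock is dropped here, so the slower per-item release work
        # no longer extends the time the leaf stays locked.
        for item in batch:
            assert not leaf_lock.locked()
            released.append(item)


items = [(i, 100) for i in range(2, 1002)]  # 1000 consecutive items
batches = split_into_batches(items, 16 * 1024)
print(len(batches), len(batches[0]), len(batches[-1]))  # 8 131 83

leaf, released = [], []
insert_batches(batches, threading.Lock(), leaf, released)
print(len(leaf), len(released))  # 1000 1000
```

With equally sized items almost every batch reaches the same maximum
size, which matches the pattern described above where the vast majority
of batches have the same number of items.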

The following script that runs fs_mark was used to test this change:

  $ cat test.sh
  #!/bin/bash

  DEV=/dev/nvme0n1
  MNT=/mnt/nvme0n1
  MOUNT_OPTIONS="-o ssd"
  MKFS_OPTIONS="-m single -d single"
  FILES=1000000
  THREADS=16
  FILE_SIZE=0

  echo "performance" | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

  umount $DEV &> /dev/null
  mkfs.btrfs -f $MKFS_OPTIONS $DEV
  mount $MOUNT_OPTIONS $DEV $MNT

  OPTS="-S 0 -L 5 -n $FILES -s $FILE_SIZE -t $THREADS"
  for ((i = 1; i <= $THREADS; i++)); do
      OPTS="$OPTS -d $MNT/d$i"
  done

  fs_mark $OPTS

  umount $MNT

It was run on a machine with 12 cores, 64G of RAM, using an NVMe device and
using a non-debug kernel config (Debian's default config).

Results before this change:

FSUse%        Count         Size    Files/sec         App Overhead
     1     16000000            0      76182.1             72223046
     3     32000000            0      62746.9             80776528
     5     48000000            0      77029.0             93022381
     6     64000000            0      73691.6             95251075
     8     80000000            0      66288.0             85089634

Results after this change:

FSUse%        Count         Size    Files/sec         App Overhead
     1     16000000            0      79049.5 (+3.7%)     69700824
     3     32000000            0      65248.9 (+3.9%)     80583693
     5     48000000            0      77991.4 (+1.2%)     90040908
     6     64000000            0      75096.8 (+1.9%)     89862241
     8     80000000            0      66926.8 (+1.0%)     84429169

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-23 13:19:00 +02:00
..
tests btrfs: fix lock inversion problem when doing qgroup extent tracing 2021-07-22 15:50:07 +02:00
acl.c fs: make helpers idmap mount aware 2021-01-24 14:27:20 +01:00
async-thread.c Btrfs: fix crash during unmount due to race with delayed inode workers 2020-03-23 17:01:51 +01:00
async-thread.h Btrfs: fix crash during unmount due to race with delayed inode workers 2020-03-23 17:01:51 +01:00
backref.c btrfs: pass NULL as trans to btrfs_search_slot if we only want to search 2021-08-23 13:19:00 +02:00
backref.h btrfs: fix lock inversion problem when doing qgroup extent tracing 2021-07-22 15:50:07 +02:00
block-group.c btrfs: rescue: allow ibadroots to skip bad extent tree when reading block group items 2021-08-23 13:19:00 +02:00
block-group.h btrfs: rework chunk allocation to avoid exhaustion of the system chunk array 2021-07-07 17:42:41 +02:00
block-rsv.c btrfs: introduce mount option rescue=ignorebadroots 2020-12-08 15:53:41 +01:00
block-rsv.h btrfs: Remove __ prefix from btrfs_block_rsv_release 2020-03-23 17:01:55 +01:00
btrfs_inode.h btrfs: remove stale comment and logic from btrfs_inode_in_log() 2021-04-19 17:25:16 +02:00
check-integrity.c btrfs: check-integrity: drop kmap/kunmap for block pages 2021-08-23 13:19:00 +02:00
check-integrity.h
compression.c btrfs: compression: drop kmap/kunmap from generic helpers 2021-08-23 13:19:00 +02:00
compression.h btrfs: optimize variables size in btrfs_submit_compressed_write 2021-06-21 15:19:07 +02:00
ctree.c btrfs: continue readahead of siblings even if target node is in memory 2021-08-23 13:19:00 +02:00
ctree.h btrfs: zoned: remove max_zone_append_size logic 2021-08-23 13:18:58 +02:00
delalloc-space.c btrfs: fix typos in comments 2021-06-22 14:11:57 +02:00
delalloc-space.h btrfs: make btrfs_delalloc_reserve_space take btrfs_inode 2020-07-27 12:55:36 +02:00
delayed-inode.c btrfs: improve the batch insertion of delayed items 2021-08-23 13:19:00 +02:00
delayed-inode.h btrfs: make btrfs_delayed_update_inode take btrfs_inode 2020-12-08 15:54:10 +01:00
delayed-ref.c btrfs: fix lock inversion problem when doing qgroup extent tracing 2021-07-22 15:50:07 +02:00
delayed-ref.h btrfs: only let one thread pre-flush delayed refs in commit 2021-02-08 22:58:56 +01:00
dev-replace.c btrfs: fix typos in comments 2021-06-22 14:11:57 +02:00
dev-replace.h btrfs: zoned: mark block groups to copy for device-replace 2021-02-09 02:46:07 +01:00
dir-item.c btrfs: locking: rip out path->leave_spinning 2020-12-08 15:54:02 +01:00
discard.c btrfs: fix typos in comments 2021-06-22 14:11:57 +02:00
discard.h btrfs: cleanup btrfs_discard_update_discardable usage 2020-12-08 15:54:02 +01:00
disk-io.c btrfs: calculate number of eb pages properly in csum_tree_block 2021-07-29 13:01:04 +02:00
disk-io.h btrfs: split alloc_log_tree() 2021-02-09 02:46:07 +01:00
export.c btrfs: locking: rip out path->leave_spinning 2020-12-08 15:54:02 +01:00
export.h
extent-io-tree.h btrfs: use fixed width int type for extent_state::state 2020-12-08 15:54:13 +01:00
extent-tree.c btrfs: pass NULL as trans to btrfs_search_slot if we only want to search 2021-08-23 13:19:00 +02:00
extent_io.c btrfs: zoned: remove max_zone_append_size logic 2021-08-23 13:18:58 +02:00
extent_io.h btrfs: rename PagePrivate2 to PageOrdered inside btrfs 2021-06-21 15:19:09 +02:00
extent_map.c btrfs: fix parameter description of btrfs_add_extent_mapping 2021-02-08 22:58:53 +01:00
extent_map.h
file-item.c btrfs: fix typos in comments 2021-06-22 14:11:57 +02:00
file.c Merge branch 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2021-07-03 11:30:04 -07:00
free-space-cache.c btrfs: don't set the full sync flag when truncation does not touch extents 2021-06-21 15:19:05 +02:00
free-space-cache.h btrfs: zoned: track unusable bytes for zones 2021-02-09 02:46:03 +01:00
free-space-tree.c btrfs: fix possible free space tree corruption with online conversion 2021-01-25 18:44:37 +01:00
free-space-tree.h
inode-item.c btrfs: locking: rip out path->leave_spinning 2020-12-08 15:54:02 +01:00
inode.c btrfs: compression: drop kmap/kunmap from generic helpers 2021-08-23 13:19:00 +02:00
ioctl.c btrfs: fix typos in comments 2021-06-22 14:11:57 +02:00
Kconfig btrfs: disable build on platforms having page size 256K 2021-06-22 14:11:57 +02:00
locking.c btrfs: fix typos in comments 2021-06-22 14:11:57 +02:00
locking.h btrfs: remove the recurse parameter from __btrfs_tree_read_lock 2020-12-08 15:54:09 +01:00
lzo.c btrfs: compression: drop kmap/kunmap from lzo 2021-08-23 13:18:59 +02:00
Makefile btrfs: move the tree mod log code into its own file 2021-04-19 17:25:16 +02:00
misc.h btrfs: rename tree_entry to rb_simple_node and export it 2020-05-25 11:25:19 +02:00
ordered-data.c btrfs: store a block_device in struct btrfs_ordered_extent 2021-07-22 15:50:15 +02:00
ordered-data.h btrfs: store a block_device in struct btrfs_ordered_extent 2021-07-22 15:50:15 +02:00
orphan.c
print-tree.c btrfs: print the actual offset in btrfs_root_name 2021-01-07 17:25:05 +01:00
print-tree.h btrfs: print the actual offset in btrfs_root_name 2021-01-07 17:25:05 +01:00
props.c btrfs: props: change how empty value is interpreted 2021-06-22 14:11:58 +02:00
props.h
qgroup.c btrfs: fix lock inversion problem when doing qgroup extent tracing 2021-07-22 15:50:07 +02:00
qgroup.h btrfs: fix lock inversion problem when doing qgroup extent tracing 2021-07-22 15:50:07 +02:00
raid56.c btrfs: drop from __GFP_HIGHMEM all allocations 2021-08-23 13:18:59 +02:00
raid56.h
rcu-string.h btrfs: rcu-string: Replace zero-length array with flexible-array member 2020-03-23 17:01:53 +01:00
reada.c btrfs: subpage: make readahead work properly 2021-03-16 11:06:21 +01:00
ref-verify.c btrfs: ref-verify: use 'inline void' keyword ordering 2021-03-02 16:55:40 +01:00
ref-verify.h
reflink.c btrfs: reflink: make copy_inline_to_page() to be subpage compatible 2021-06-21 15:19:10 +02:00
reflink.h Btrfs: move all reflink implementation code into its own file 2020-03-23 17:01:54 +01:00
relocation.c btrfs: ensure relocation never runs while we have send operations running 2021-06-22 14:11:58 +02:00
root-tree.c btrfs: qgroup: fix qgroup meta rsv leak for subvolume operations 2020-10-07 12:12:13 +02:00
scrub.c btrfs: fix typos in comments 2021-06-22 14:11:57 +02:00
send.c btrfs: send: fix crash when memory allocations trigger reclaim 2021-06-22 14:11:58 +02:00
send.h btrfs: send: avoid copying file data 2020-10-07 12:13:17 +02:00
space-info.c btrfs: rip out btrfs_space_info::total_bytes_pinned 2021-06-22 14:55:25 +02:00
space-info.h btrfs: rip out btrfs_space_info::total_bytes_pinned 2021-06-22 14:55:25 +02:00
struct-funcs.c btrfs: add special case to setget helpers for 64k pages 2021-08-23 13:18:58 +02:00
subpage.c btrfs: subpage: fix a rare race between metadata endio and eb freeing 2021-06-21 15:19:10 +02:00
subpage.h btrfs: subpage: fix a rare race between metadata endio and eb freeing 2021-06-21 15:19:10 +02:00
super.c btrfs: shorten integrity checker extent data mount option 2021-06-22 14:11:58 +02:00
sysfs.c btrfs: rip out btrfs_space_info::total_bytes_pinned 2021-06-22 14:55:25 +02:00
sysfs.h btrfs: split and refactor btrfs_sysfs_remove_devices_dir 2020-10-07 12:12:21 +02:00
transaction.c btrfs: rework chunk allocation to avoid exhaustion of the system chunk array 2021-07-07 17:42:41 +02:00
transaction.h btrfs: rework chunk allocation to avoid exhaustion of the system chunk array 2021-07-07 17:42:41 +02:00
tree-checker.c btrfs: tree-checker: check for BTRFS_BLOCK_FLAG_FULL_BACKREF being set improperly 2021-04-19 17:25:21 +02:00
tree-checker.h
tree-defrag.c btrfs: locking: remove all the blocking helpers 2020-12-08 15:54:01 +01:00
tree-log.c btrfs: fix lost inode on log replay after mix of fsync, rename and inode eviction 2021-07-28 19:02:30 +02:00
tree-log.h btrfs: make fast fsyncs wait only for writeback 2020-10-07 12:06:56 +02:00
tree-mod-log.c btrfs: fix race when picking most recent mod log operation for an old root 2021-04-20 19:27:17 +02:00
tree-mod-log.h btrfs: add and use helper to get lowest sequence number for the tree mod log 2021-04-19 17:25:17 +02:00
ulist.c
ulist.h
uuid-tree.c btrfs: remove unnecessary casts in printk 2020-12-08 15:53:52 +01:00
volumes.c btrfs: make btrfs_finish_chunk_alloc private to block-group.c 2021-08-23 13:18:59 +02:00
volumes.h btrfs: make btrfs_finish_chunk_alloc private to block-group.c 2021-08-23 13:18:59 +02:00
xattr.c for-5.12-rc1-tag 2021-03-05 12:21:14 -08:00
xattr.h
zlib.c btrfs: compression: drop kmap/kunmap from zlib 2021-08-23 13:18:59 +02:00
zoned.c btrfs: zoned: remove max_zone_append_size logic 2021-08-23 13:18:58 +02:00
zoned.h btrfs: zoned: remove max_zone_append_size logic 2021-08-23 13:18:58 +02:00
zstd.c btrfs: compression: drop kmap/kunmap from zstd 2021-08-23 13:18:59 +02:00