linux-stable/fs/btrfs
Filipe Manana 47876f7cef btrfs: do not block inode logging for so long during transaction commit
Early on during a transaction commit we acquire the tree_log_mutex and
hold it until after we write the super blocks. But before writing the
extent buffers dirtied by the transaction and the super blocks we unblock
the transaction by setting its state to TRANS_STATE_UNBLOCKED and setting
fs_info->running_transaction to NULL.

This means that after that and before writing the super blocks, new
transactions can start. However if any transaction wants to log an inode,
it will block waiting for the transaction commit to write its dirty
extent buffers and the super blocks because the tree_log_mutex is only
released after those operations are complete, and starting a new log
transaction blocks on that mutex (at start_log_trans()).

Writing the dirty extent buffers and the super blocks can take a very
significant amount of time to complete, but we could allow the tasks
wanting to log an inode to proceed with most of their steps:

1) create the log trees
2) log metadata in the trees
3) write their dirty extent buffers

They only need to wait for the previous transaction commit to complete
(write its super blocks) before they attempt to write their super blocks,
otherwise we could end up with a corrupt filesystem after a crash.

So change start_log_trans() to use the root tree's log_mutex to serialize
for the creation of the log root tree instead of using the tree_log_mutex,
and make btrfs_sync_log() acquire the tree_log_mutex before writing the
super blocks. This allows for inode logging to wait much less time when
there is a previous transaction that is still committing, often not having
to wait at all, as by the time when we try to sync the log the previous
transaction already wrote its super blocks.

This patch belongs to a patch set that is comprised of the following
patches:

  btrfs: fix race causing unnecessary inode logging during link and rename
  btrfs: fix race that results in logging old extents during a fast fsync
  btrfs: fix race that causes unnecessary logging of ancestor inodes
  btrfs: fix race that makes inode logging fallback to transaction commit
  btrfs: fix race leading to unnecessary transaction commit when logging inode
  btrfs: do not block inode logging for so long during transaction commit

The following script that uses dbench was used to measure the impact of
the whole patchset:

  $ cat test-dbench.sh
  #!/bin/bash

  DEV=/dev/nvme0n1
  MNT=/mnt/btrfs
  MOUNT_OPTIONS="-o ssd"

  echo "performance" | \
      tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

  mkfs.btrfs -f -m single -d single $DEV
  mount $MOUNT_OPTIONS $DEV $MNT

  dbench -D $MNT -t 300 64

  umount $MNT

The test was run on a machine with 12 cores, 64G of ram, using a NVMe
device and a non-debug kernel configuration (Debian's default).

Before patch set:

 Operation      Count    AvgLat    MaxLat
 ----------------------------------------
 NTCreateX    11277211    0.250    85.340
 Close        8283172     0.002     6.479
 Rename        477515     1.935    86.026
 Unlink       2277936     0.770    87.071
 Deltree          256    15.732    81.379
 Mkdir            128     0.003     0.009
 Qpathinfo    10221180    0.056    44.404
 Qfileinfo    1789967     0.002     4.066
 Qfsinfo      1874399     0.003     9.176
 Sfileinfo     918589     0.061    10.247
 Find         3951758     0.341    54.040
 WriteX       5616547     0.047    85.079
 ReadX        17676028    0.005     9.704
 LockX          36704     0.003     1.800
 UnlockX        36704     0.002     0.687
 Flush         790541    14.115   676.236

Throughput 1179.19 MB/sec  64 clients  64 procs  max_latency=676.240 ms

After patch set:

Operation      Count    AvgLat    MaxLat
 ----------------------------------------
 NTCreateX    12687926    0.171    86.526
 Close        9320780     0.002     8.063
 Rename        537253     1.444    78.576
 Unlink       2561827     0.559    87.228
 Deltree          374    11.499    73.549
 Mkdir            187     0.003     0.005
 Qpathinfo    11500300    0.061    36.801
 Qfileinfo    2017118     0.002     7.189
 Qfsinfo      2108641     0.003     4.825
 Sfileinfo    1033574     0.008     8.065
 Find         4446553     0.408    47.835
 WriteX       6335667     0.045    84.388
 ReadX        19887312    0.003     9.215
 LockX          41312     0.003     1.394
 UnlockX        41312     0.002     1.425
 Flush         889233    13.014   623.259

Throughput 1339.32 MB/sec  64 clients  64 procs  max_latency=623.265 ms

+12.7% throughput, -8.2% max latency

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-09 19:16:07 +01:00
..
tests btrfs: remove recalc_thresholds from free space ops 2020-12-09 19:16:06 +01:00
Kconfig btrfs: switch to iomap for direct IO 2020-10-07 12:06:57 +02:00
Makefile btrfs: remove inode number cache feature 2020-12-09 19:16:05 +01:00
acl.c
async-thread.c Btrfs: fix crash during unmount due to race with delayed inode workers 2020-03-23 17:01:51 +01:00
async-thread.h Btrfs: fix crash during unmount due to race with delayed inode workers 2020-03-23 17:01:51 +01:00
backref.c btrfs: pass root owner to read_tree_block 2020-12-08 15:54:07 +01:00
backref.h btrfs: rename BTRFS_ROOT_REF_COWS to BTRFS_ROOT_SHAREABLE 2020-05-25 11:25:35 +02:00
block-group.c btrfs: implement log-structured superblock for ZONED mode 2020-12-09 19:16:04 +01:00
block-group.h btrfs: load free space cache asynchronously 2020-12-08 15:54:03 +01:00
block-rsv.c btrfs: introduce mount option rescue=ignorebadroots 2020-12-08 15:53:41 +01:00
block-rsv.h btrfs: Remove __ prefix from btrfs_block_rsv_release 2020-03-23 17:01:55 +01:00
btrfs_inode.h btrfs: skip unnecessary searches for xattrs when logging an inode 2020-12-08 15:54:12 +01:00
check-integrity.c btrfs: drop casts of bio bi_sector 2020-12-09 19:16:05 +01:00
check-integrity.h btrfs: remove btrfsic_submit_bh() 2020-03-23 17:01:39 +01:00
compression.c btrfs: drop casts of bio bi_sector 2020-12-09 19:16:05 +01:00
compression.h btrfs: compression: move declarations to header 2020-10-07 12:06:55 +02:00
ctree.c btrfs: simplify return values in setup_nodes_for_search 2020-12-08 15:54:13 +01:00
ctree.h btrfs: do not block inode logging for so long during transaction commit 2020-12-09 19:16:07 +01:00
delalloc-space.c btrfs: add btrfs_reserve_data_bytes and use it 2020-10-07 12:06:52 +02:00
delalloc-space.h btrfs: make btrfs_delalloc_reserve_space take btrfs_inode 2020-07-27 12:55:36 +02:00
delayed-inode.c btrfs: make btrfs_delayed_update_inode take btrfs_inode 2020-12-08 15:54:10 +01:00
delayed-inode.h btrfs: make btrfs_delayed_update_inode take btrfs_inode 2020-12-08 15:54:10 +01:00
delayed-ref.c btrfs: Remove __ prefix from btrfs_block_rsv_release 2020-03-23 17:01:55 +01:00
delayed-ref.h
dev-replace.c btrfs: check and enable ZONED mode 2020-12-09 19:16:03 +01:00
dev-replace.h
dir-item.c btrfs: locking: rip out path->leave_spinning 2020-12-08 15:54:02 +01:00
discard.c btrfs: don't miss async discards after scheduled work override 2020-12-08 15:54:05 +01:00
discard.h btrfs: cleanup btrfs_discard_update_discardable usage 2020-12-08 15:54:02 +01:00
disk-io.c btrfs: remove inode number cache feature 2020-12-09 19:16:05 +01:00
disk-io.h btrfs: move btrfs_find_highest_objectid/btrfs_find_free_objectid to disk-io.c 2020-12-09 19:16:05 +01:00
export.c btrfs: locking: rip out path->leave_spinning 2020-12-08 15:54:02 +01:00
export.h btrfs: export helpers for subvolume name/id resolution 2020-03-23 17:01:42 +01:00
extent-io-tree.h btrfs: use fixed width int type for extent_state::state 2020-12-08 15:54:13 +01:00
extent-tree.c btrfs: set the lockdep class for extent buffers on creation 2020-12-08 15:54:07 +01:00
extent_io.c btrfs: drop casts of bio bi_sector 2020-12-09 19:16:05 +01:00
extent_io.h btrfs: use fixed width int type for extent_state::state 2020-12-08 15:54:13 +01:00
extent_map.c Btrfs: fix race between using extent maps and merging them 2020-02-12 17:16:46 +01:00
extent_map.h btrfs: remove extent_map::bdev 2019-11-18 23:43:44 +01:00
file-item.c btrfs: drop casts of bio bi_sector 2020-12-09 19:16:05 +01:00
file.c btrfs: disable fallocate in ZONED mode 2020-12-09 19:16:04 +01:00
free-space-cache.c btrfs: remove recalc_thresholds from free space ops 2020-12-09 19:16:06 +01:00
free-space-cache.h btrfs: remove recalc_thresholds from free space ops 2020-12-09 19:16:06 +01:00
free-space-tree.c btrfs: locking: rip out path->leave_spinning 2020-12-08 15:54:02 +01:00
free-space-tree.h btrfs: rename btrfs_block_group_cache 2019-11-18 17:51:51 +01:00
inode-item.c btrfs: locking: rip out path->leave_spinning 2020-12-08 15:54:02 +01:00
inode.c btrfs: remove inode number cache feature 2020-12-09 19:16:05 +01:00
ioctl.c btrfs: remove inode number cache feature 2020-12-09 19:16:05 +01:00
locking.c btrfs: remove the recurse parameter from __btrfs_tree_read_lock 2020-12-08 15:54:09 +01:00
locking.h btrfs: remove the recurse parameter from __btrfs_tree_read_lock 2020-12-08 15:54:09 +01:00
lzo.c btrfs: compression: inline free_workspace 2019-11-18 12:46:59 +01:00
misc.h btrfs: rename tree_entry to rb_simple_node and export it 2020-05-25 11:25:19 +02:00
ordered-data.c btrfs: switch cached fs_info::csum_size from u16 to u32 2020-12-08 15:53:59 +01:00
ordered-data.h btrfs: remove unnecessary local variables for checksum size 2020-12-08 15:54:00 +01:00
orphan.c
print-tree.c btrfs: pass root owner to read_tree_block 2020-12-08 15:54:07 +01:00
print-tree.h btrfs: pretty print leaked root name 2020-10-07 12:12:20 +02:00
props.c btrfs: simplify iget helpers 2020-05-25 11:25:37 +02:00
props.h
qgroup.c btrfs: pass root owner to read_tree_block 2020-12-08 15:54:07 +01:00
qgroup.h btrfs: qgroup: export qgroups in sysfs 2020-07-27 12:55:37 +02:00
raid56.c btrfs: drop casts of bio bi_sector 2020-12-09 19:16:05 +01:00
raid56.h
rcu-string.h btrfs: rcu-string: Replace zero-length array with flexible-array member 2020-03-23 17:01:53 +01:00
reada.c btrfs: pass the owner_root and level to alloc_extent_buffer 2020-12-08 15:54:07 +01:00
ref-verify.c btrfs: use btrfs_read_node_slot in walk_down_tree 2020-12-08 15:54:06 +01:00
ref-verify.h
reflink.c btrfs: make btrfs_cont_expand take btrfs_inode 2020-12-08 15:54:12 +01:00
reflink.h Btrfs: move all reflink implementation code into its own file 2020-03-23 17:01:54 +01:00
relocation.c btrfs: remove inode number cache feature 2020-12-09 19:16:05 +01:00
root-tree.c btrfs: qgroup: fix qgroup meta rsv leak for subvolume operations 2020-10-07 12:12:13 +02:00
scrub.c btrfs: implement log-structured superblock for ZONED mode 2020-12-09 19:16:04 +01:00
send.c btrfs: send: use helpers to access root_item::ctransid 2020-12-08 15:53:51 +01:00
send.h btrfs: send: avoid copying file data 2020-10-07 12:13:17 +02:00
space-info.c btrfs: kill the RCU protection for fs_info->space_info 2020-10-07 12:13:19 +02:00
space-info.h btrfs: add btrfs_reserve_data_bytes and use it 2020-10-07 12:06:52 +02:00
struct-funcs.c btrfs: use unaligned helpers for stack and header set/get helpers 2020-10-07 12:13:23 +02:00
super.c btrfs: remove inode number cache feature 2020-12-09 19:16:05 +01:00
sysfs.c btrfs: introduce ZONED feature flag 2020-12-08 15:54:16 +01:00
sysfs.h btrfs: split and refactor btrfs_sysfs_remove_devices_dir 2020-10-07 12:12:21 +02:00
transaction.c btrfs: remove inode number cache feature 2020-12-09 19:16:05 +01:00
transaction.h btrfs: return bool from btrfs_should_end_transaction 2020-12-08 15:54:16 +01:00
tree-checker.c btrfs: tree-checker: annotate all error branches as unlikely 2020-12-08 15:54:15 +01:00
tree-checker.h
tree-defrag.c btrfs: locking: remove all the blocking helpers 2020-12-08 15:54:01 +01:00
tree-log.c btrfs: do not block inode logging for so long during transaction commit 2020-12-09 19:16:07 +01:00
tree-log.h btrfs: make fast fsyncs wait only for writeback 2020-10-07 12:06:56 +02:00
ulist.c
ulist.h
uuid-tree.c btrfs: remove unnecessary casts in printk 2020-12-08 15:53:52 +01:00
volumes.c btrfs: drop casts of bio bi_sector 2020-12-09 19:16:05 +01:00
volumes.h btrfs: get zone information of zoned block devices 2020-12-09 19:15:57 +01:00
xattr.c btrfs: skip unnecessary searches for xattrs when logging an inode 2020-12-08 15:54:12 +01:00
xattr.h
zlib.c btrfs: use larger zlib buffer for s390 hardware compression 2020-01-31 10:30:40 -08:00
zoned.c btrfs: implement log-structured superblock for ZONED mode 2020-12-09 19:16:04 +01:00
zoned.h btrfs: implement log-structured superblock for ZONED mode 2020-12-09 19:16:04 +01:00
zstd.c btrfs: compression: inline free_workspace 2019-11-18 12:46:59 +01:00