linux-stable/fs/btrfs
Wang Xiaoguang 1d57ee9416 btrfs: improve delayed refs iterations
This issue was found when I tried to delete a heavily reflinked file.
While such a file is being deleted, other transaction operations do not
get a chance to make progress. For example, start_transaction() blocks
in wait_current_trans(root) for a long time, sometimes long enough to
trigger soft lockups, and deleting the heavily reflinked file itself
also takes very long, often hundreds of seconds. perf top reports:

PerfTop:    7416 irqs/sec  kernel:99.8%  exact:  0.0% [4000Hz cpu-clock],  (all, 4 CPUs)
---------------------------------------------------------------------------------------
    84.37%  [btrfs]             [k] __btrfs_run_delayed_refs.constprop.80
    11.02%  [kernel]            [k] delay_tsc
     0.79%  [kernel]            [k] _raw_spin_unlock_irq
     0.78%  [kernel]            [k] _raw_spin_unlock_irqrestore
     0.45%  [kernel]            [k] do_raw_spin_lock
     0.18%  [kernel]            [k] __slab_alloc

__btrfs_run_delayed_refs() takes most of the CPU time. After some
debugging, I found that select_delayed_ref() causes this. In our case a
delayed ref head is full of BTRFS_DROP_DELAYED_REF nodes, but
select_delayed_ref() first iterates the node list looking for
BTRFS_ADD_DELAYED_REF nodes. In this case that is a disaster and wastes
a lot of time.
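To see why this hurts, here is a simplified userspace sketch in C of the pre-patch selection strategy described above. The types ref_node and ref_head, the helper drain_cost_linear(), and the MAX_REFS bound are hypothetical stand-ins, not the kernel's btrfs_delayed_ref_node or btrfs_delayed_ref_head. As a simplification, it assumes the refs of a head are run one at a time with a fresh selection before each, so a head holding only DROP refs is rescanned from the start every time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the btrfs types (not kernel code). */
enum ref_action { ADD_DELAYED_REF = 1, DROP_DELAYED_REF = 2 };

struct ref_node {
	enum ref_action action;
	struct ref_node *next;      /* singly linked for brevity */
};

struct ref_head {
	struct ref_node *ref_list;  /* all pending refs, ADD and DROP mixed */
};

/*
 * Pre-patch strategy: prefer an ADD ref, so walk the whole list looking
 * for one, then fall back to the first node. *visited counts nodes examined.
 */
static struct ref_node *select_ref_linear(struct ref_head *head, size_t *visited)
{
	struct ref_node *n;

	*visited = 0;
	for (n = head->ref_list; n != NULL; n = n->next) {
		(*visited)++;
		if (n->action == ADD_DELAYED_REF)
			return n;
	}
	return head->ref_list;      /* NULL when the list is empty */
}

#define MAX_REFS 4096

/*
 * Drain a head holding n DROP refs: select one, "run" it by unlinking it,
 * and repeat. Returns the total number of nodes visited by the selector.
 */
static size_t drain_cost_linear(size_t n)
{
	static struct ref_node pool[MAX_REFS];
	struct ref_head head = { NULL };
	size_t total = 0, visited, i;

	assert(n <= MAX_REFS);
	for (i = 0; i < n; i++) {
		pool[i].action = DROP_DELAYED_REF;
		pool[i].next = (i + 1 < n) ? &pool[i + 1] : NULL;
	}
	if (n > 0)
		head.ref_list = &pool[0];

	while (head.ref_list != NULL) {
		struct ref_node *ref = select_ref_linear(&head, &visited);

		total += visited;
		assert(ref == head.ref_list);  /* no ADD ref, so the fallback fired */
		head.ref_list = ref->next;
	}
	return total;
}
```

For 1000 DROP refs, drain_cost_linear(1000) visits 1000 + 999 + ... + 1 = 500500 nodes, i.e. n(n+1)/2: under this model the cost of draining a head grows quadratically with the number of refs it holds.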

To fix this issue, introduce a new ref_add_list in struct btrfs_delayed_ref_head.
In select_delayed_ref(), if this list is not empty, we can directly use
the nodes on it. With this patch, deleting the same file takes only about
10 to 15 seconds. perf top now reports:

PerfTop:    2734 irqs/sec  kernel:99.5%  exact:  0.0% [4000Hz cpu-clock],  (all, 4 CPUs)
----------------------------------------------------------------------------------------

    20.74%  [kernel]          [k] _raw_spin_unlock_irqrestore
    16.33%  [kernel]          [k] __slab_alloc
     5.41%  [kernel]          [k] lock_acquired
     4.42%  [kernel]          [k] lock_acquire
     4.05%  [kernel]          [k] lock_release
     3.37%  [kernel]          [k] _raw_spin_unlock_irq

This patch also helps for normal files: at the very least we no longer
need to iterate the whole list to find BTRFS_ADD_DELAYED_REF nodes.
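The fix can be sketched the same way. Each node carries a second linkage so that ADD refs also sit on a dedicated ref_add_list, and selection becomes a constant-time check of that list. This is a simplified userspace illustration assuming kernel-style intrusive lists; the list helpers, the container_of macro, and head_init(), head_add_ref(), head_drop_ref() and select_ref() are hypothetical re-implementations, not the actual delayed-ref.c code.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's circular, doubly linked struct list_head. */
struct list_head { struct list_head *prev, *next; };

static void list_init(struct list_head *h) { h->prev = h->next = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Unlink n and re-point it at itself; safe even if n is on no list. */
static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

enum ref_action { ADD_DELAYED_REF = 1, DROP_DELAYED_REF = 2 };

/* Hypothetical, simplified delayed ref node and head (not kernel code). */
struct ref_node {
	enum ref_action action;
	struct list_head list;      /* linkage in ref_head.ref_list (all refs) */
	struct list_head add_list;  /* linkage in ref_head.ref_add_list (ADD only) */
};

struct ref_head {
	struct list_head ref_list;
	struct list_head ref_add_list;
};

static void head_init(struct ref_head *h)
{
	list_init(&h->ref_list);
	list_init(&h->ref_add_list);
}

/* Queue a ref; ADD refs are additionally linked onto ref_add_list. */
static void head_add_ref(struct ref_head *h, struct ref_node *ref,
			 enum ref_action action)
{
	ref->action = action;
	list_init(&ref->add_list);
	list_add_tail(&ref->list, &h->ref_list);
	if (action == ADD_DELAYED_REF)
		list_add_tail(&ref->add_list, &h->ref_add_list);
}

/* O(1): prefer the oldest queued ADD ref, else the oldest ref of any kind. */
static struct ref_node *select_ref(struct ref_head *h)
{
	if (list_empty(&h->ref_list))
		return NULL;
	if (!list_empty(&h->ref_add_list))
		return container_of(h->ref_add_list.next, struct ref_node, add_list);
	return container_of(h->ref_list.next, struct ref_node, list);
}

/* After a ref has been run, unlink it from both lists. */
static void head_drop_ref(struct ref_node *ref)
{
	list_del_init(&ref->list);
	list_del_init(&ref->add_list);
}
```

With this layout a head full of DROP refs no longer costs a scan per selection: select_ref() returns the first node of ref_list immediately. The price is a second list_head per node plus the bookkeeping to unlink ADD refs from ref_add_list whenever they are run or removed.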

Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-30 13:45:21 +01:00
tests btrfs: remove constant parameter to memset_extent_buffer and rename it 2016-11-30 13:45:17 +01:00
acl.c posix_acl: Clear SGID bit when setting file permissions 2016-09-22 10:55:32 +02:00
async-thread.c btrfs: plumb fs_info into btrfs_work 2016-07-26 13:53:15 +02:00
async-thread.h btrfs: plumb fs_info into btrfs_work 2016-07-26 13:53:15 +02:00
backref.c btrfs: convert pr_* to btrfs_* where possible 2016-09-26 19:37:04 +02:00
backref.h
btrfs_inode.h Btrfs: add a flags field to btrfs_fs_info 2016-09-26 17:59:49 +02:00
check-integrity.c btrfs: use bio_for_each_segment_all in __btrfsic_submit_bio 2016-11-30 13:45:20 +01:00
check-integrity.h fs: have submit_bh users pass in op and flags separately 2016-06-07 13:41:38 -06:00
compression.c btrfs: calculate end of bio offset properly 2016-11-30 13:45:20 +01:00
compression.h btrfs: use bio iterators for the decompression handlers 2016-11-30 13:45:19 +01:00
ctree.c btrfs: add optimized version of eb to eb copy 2016-11-30 13:45:17 +01:00
ctree.h btrfs: store and load values of stripes_min/stripes_max in balance status item 2016-11-30 13:45:18 +01:00
dedupe.h btrfs: expand cow_file_range() to support in-band dedup and subpage-blocksize 2016-07-26 13:52:25 +02:00
delayed-inode.c btrfs: increment ctx->pos for every emitted or skipped dirent in readdir 2016-11-30 13:45:19 +01:00
delayed-inode.h btrfs: increment ctx->pos for every emitted or skipped dirent in readdir 2016-11-30 13:45:19 +01:00
delayed-ref.c btrfs: improve delayed refs iterations 2016-11-30 13:45:21 +01:00
delayed-ref.h btrfs: improve delayed refs iterations 2016-11-30 13:45:21 +01:00
dev-replace.c btrfs: convert pr_* to btrfs_* where possible 2016-09-26 19:37:04 +02:00
dev-replace.h btrfs: refactor btrfs_dev_replace_start for reuse 2016-04-28 10:59:13 +02:00
dir-item.c btrfs: unsplit printed strings 2016-09-26 18:08:44 +02:00
disk-io.c btrfs: improve delayed refs iterations 2016-11-30 13:45:21 +01:00
disk-io.h btrfs: change btrfs_csum_final result param type to u8 2016-11-30 13:45:18 +01:00
export.c BTRFS: support NFSv2 export 2015-10-06 06:55:23 -07:00
export.h
extent-tree.c btrfs: improve delayed refs iterations 2016-11-30 13:45:21 +01:00
extent_io.c btrfs: add optimized version of eb to eb copy 2016-11-30 13:45:17 +01:00
extent_io.h btrfs: add optimized version of eb to eb copy 2016-11-30 13:45:17 +01:00
extent_map.c btrfs: Fix slab accounting flags 2016-07-26 13:52:25 +02:00
extent_map.h btrfs: cleanup, stop casting for extent_map->lookup everywhere 2016-01-15 19:22:28 +01:00
file-item.c btrfs: refactor __btrfs_lookup_bio_sums to use bio_for_each_segment_all 2016-11-30 13:45:20 +01:00
file.c Btrfs: abort transaction if fill_holes() fails 2016-11-30 13:45:19 +01:00
free-space-cache.c btrfs: remove redundant check of btrfs_iget return value 2016-11-30 13:45:18 +01:00
free-space-cache.h btrfs: convert pr_* to btrfs_* where possible 2016-09-26 19:37:04 +02:00
free-space-tree.c Merge branch 'fst-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.9 2016-10-12 13:16:00 -07:00
free-space-tree.h Btrfs: implement the free space B-tree 2015-12-17 12:16:47 -08:00
hash.c btrfs: advertise which crc32c implementation is being used at module load 2016-06-06 14:08:28 +02:00
hash.h btrfs: advertise which crc32c implementation is being used at module load 2016-06-06 14:08:28 +02:00
inode-item.c btrfs: rename btrfs_std_error to btrfs_handle_fs_error 2016-04-28 10:36:54 +02:00
inode-map.c btrfs: convert pr_* to btrfs_* where possible 2016-09-26 19:37:04 +02:00
inode-map.h Btrfs: Initialize btrfs_root->highest_objectid when loading tree root and subvolume roots 2016-01-15 19:25:02 +01:00
inode.c btrfs: don't access the bio directly in the direct I/O code 2016-11-30 13:45:20 +01:00
ioctl.c btrfs: return early from failed memory allocations in ioctl handlers 2016-11-30 13:45:18 +01:00
Kconfig
locking.c btrfs: cleanup, remove stray return statements 2016-01-07 14:30:52 +01:00
locking.h
lzo.c btrfs: use bio iterators for the decompression handlers 2016-11-30 13:45:19 +01:00
Makefile Btrfs: add free space tree sanity tests 2015-12-17 12:16:47 -08:00
math.h
ordered-data.c btrfs: unsplit printed strings 2016-09-26 18:08:44 +02:00
ordered-data.h Btrfs: fix race setting block group readonly during device replace 2016-05-30 12:58:21 +01:00
orphan.c
print-tree.c btrfs: convert printk(KERN_* to use pr_* calls 2016-09-26 18:08:44 +02:00
print-tree.h
props.c btrfs: simpilify btrfs_subvol_inherit_props 2016-07-26 13:54:22 +02:00
props.h
qgroup.c btrfs: Export and move leaf/subtree qgroup helpers to qgroup.c 2016-11-30 13:45:21 +01:00
qgroup.h btrfs: Export and move leaf/subtree qgroup helpers to qgroup.c 2016-11-30 13:45:21 +01:00
raid56.c btrfs: don't access the bio directly in the raid5/6 code 2016-11-30 13:45:19 +01:00
raid56.h
rcu-string.h
reada.c btrfs: reada, remove pointless BUG_ON check for fs_info 2016-11-30 13:45:16 +01:00
relocation.c btrfs: qgroup: Fix qgroup data leaking by using subtree tracing 2016-11-30 13:45:21 +01:00
root-tree.c btrfs: unsplit printed strings 2016-09-26 18:08:44 +02:00
scrub.c btrfs: don't abuse REQ_OP_* flags for btrfs_map_block 2016-11-29 14:10:38 +01:00
send.c Merge branch 'for-linus-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs 2016-10-28 10:07:35 -07:00
send.h Btrfs: use linux/sizes.h to represent constants 2016-01-07 14:38:02 +01:00
struct-funcs.c btrfs: fix string and comment grammatical issues and typos 2016-05-25 22:35:14 +02:00
super.c btrfs: remove stale comment from btrfs_statfs 2016-11-30 13:45:15 +01:00
sysfs.c btrfs: convert printk(KERN_* to use pr_* calls 2016-09-26 18:08:44 +02:00
sysfs.h btrfs: sysfs: introduce helper for syncing bits with sysfs files 2016-01-21 18:50:40 +01:00
transaction.c Merge branch 'for-linus-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs 2016-10-11 11:23:06 -07:00
transaction.h btrfs: convert pr_* to btrfs_* where possible 2016-09-26 19:37:04 +02:00
tree-defrag.c Btrfs: fix locking bugs when defragging leaves 2015-12-18 02:51:32 +00:00
tree-log.c btrfs: qgroup: Rename functions to make it follow reserve,trace,account steps 2016-11-30 13:45:21 +01:00
tree-log.h Btrfs: fix lockdep warning on deadlock against an inode's log mutex 2016-08-25 03:58:32 -07:00
ulist.c btrfs: fix string and comment grammatical issues and typos 2016-05-25 22:35:14 +02:00
ulist.h
uuid-tree.c btrfs: unsplit printed strings 2016-09-26 18:08:44 +02:00
volumes.c btrfs: remove constant parameter to memset_extent_buffer and rename it 2016-11-30 13:45:17 +01:00
volumes.h btrfs: don't abuse REQ_OP_* flags for btrfs_map_block 2016-11-29 14:10:38 +01:00
xattr.c fs: Replace current_fs_time() with current_time() 2016-09-27 21:06:22 -04:00
xattr.h btrfs: Switch to generic xattr handlers 2016-05-17 19:17:09 -04:00
zlib.c btrfs: use bio iterators for the decompression handlers 2016-11-30 13:45:19 +01:00