linux-stable/fs
Filipe Manana e2e58d0f8d btrfs: try to unlock parent nodes earlier when inserting a key
When inserting a new key, we release the write lock on the leaf's parent
only after doing the binary search on the leaf. This is because if the
key ends up at slot 0, we will have to update the key at slot 0 of the
parent node. The same reasoning applies to any other upper level nodes
when their slot is 0. We also need to keep the parent locked in case the
leaf does not have enough free space to insert the new key/item, because
in that case we will split the leaf and we will need to add a new key to
the parent due to a new leaf resulting from the split operation.

However, if the leaf has enough space for the new key and the key does not
end up at slot 0 of the leaf, we could release our write lock on the parent
before doing the binary search on the leaf to figure out the destination
slot. That reduces the amount of time other tasks are blocked waiting to
lock the parent, therefore increasing parallelism when there are other
tasks trying to access other leaves accessible through the same parent.
This also applies to other upper nodes besides the immediate parent, when
their slot is 0, since we keep locks on them until we figure out if the
leaf slot is slot 0 or not.

In fact, having the key end up at slot 0 is rare. Typically it only
happens when the key is less than or equal to the smallest, the
"left most", key of the entire btree, during a split attempt when we try
to push to the right sibling leaf or when the caller just wants to update
the item of an existing key. It's also very common that a leaf has enough
space to insert a new key, since after a split we move about half of the
keys from one leaf into the new leaf.

So unlock the parent, and any other upper level nodes, when during a key
insertion we notice the key is greater than the first key in the leaf and
the leaf has enough free space. After unlocking the upper level nodes, do
the binary search using a low boundary of slot 1 and not slot 0, to figure
out the slot where the key will be inserted (or where the key already is
in case it exists and the caller wants to modify its item data).
This extra comparison, with the first key, is cheap and the key is very
likely already in a cache line because it immediately follows the header
of the extent buffer and we have recently read the level field of the
header (which in fact is the last field of the header).
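
The decision described above can be sketched with a toy model. This is only
an illustration under stated assumptions, not the btrfs implementation:
struct toy_leaf, toy_bin_search(), toy_can_unlock_parents_early() and
toy_search_leaf() are hypothetical names, keys are plain integers instead of
btrfs keys, and "unlocking" is reduced to a boolean the caller would act on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy leaf: sorted integer keys plus a free space counter. */
struct toy_leaf {
	const unsigned long *keys;	/* sorted in ascending order */
	int nritems;			/* number of keys in the leaf */
	size_t free_space;		/* bytes still available */
};

/*
 * Binary search over slots [low, nritems). Returns true if the key exists.
 * *slot is set to the matching slot, or to the insertion point otherwise.
 */
static bool toy_bin_search(const struct toy_leaf *leaf, int low,
			   unsigned long key, int *slot)
{
	int high = leaf->nritems;

	assert(low >= 0 && low <= high);
	while (low < high) {
		int mid = low + (high - low) / 2;

		if (leaf->keys[mid] < key) {
			low = mid + 1;
		} else if (leaf->keys[mid] > key) {
			high = mid;
		} else {
			*slot = mid;
			return true;
		}
	}
	*slot = low;
	return false;
}

/*
 * The cheap extra check: the key sorts after the first key (so it can not
 * land at slot 0) and the leaf has room (so no split will be needed).
 */
static bool toy_can_unlock_parents_early(const struct toy_leaf *leaf,
					 unsigned long key, size_t ins_len)
{
	return leaf->nritems > 0 &&
	       leaf->free_space >= ins_len &&
	       key > leaf->keys[0];
}

/*
 * Search the leaf. *parents_unlocked tells the caller whether the upper
 * level locks could have been dropped before the binary search started.
 */
static bool toy_search_leaf(const struct toy_leaf *leaf, unsigned long key,
			    size_t ins_len, bool *parents_unlocked, int *slot)
{
	int low = 0;

	*parents_unlocked = toy_can_unlock_parents_early(leaf, key, ins_len);
	if (*parents_unlocked)
		low = 1;	/* slot 0 is already ruled out */

	return toy_bin_search(leaf, low, key, slot);
}
```

With keys {10, 20, 30, 40} and enough free space, searching for key 25
reports that the parents could be unlocked early and returns slot 2, while
searching for key 5 keeps the parents locked and returns slot 0.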

The following fs_mark test was run on a non-debug kernel (Debian's default
kernel config), with a 12-core Intel CPU, and using an NVMe device:

  $ cat run-fsmark.sh
  #!/bin/bash

  DEV=/dev/nvme0n1
  MNT=/mnt/nvme0n1
  MOUNT_OPTIONS="-o ssd"
  MKFS_OPTIONS="-O no-holes -R free-space-tree"
  FILES=100000
  THREADS=$(nproc --all)
  FILE_SIZE=0

  echo "performance" | \
	tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

  mkfs.btrfs -f $MKFS_OPTIONS $DEV
  mount $MOUNT_OPTIONS $DEV $MNT

  OPTS="-S 0 -L 10 -n $FILES -s $FILE_SIZE -t $THREADS -k"
  for ((i = 1; i <= $THREADS; i++)); do
      OPTS="$OPTS -d $MNT/d$i"
  done

  fs_mark $OPTS

  umount $MNT

Before this change:

FSUse%        Count         Size    Files/sec     App Overhead
     0      1200000            0     165273.6          5958381
     0      2400000            0     190938.3          6284477
     0      3600000            0     181429.1          6044059
     0      4800000            0     173979.2          6223418
     0      6000000            0     139288.0          6384560
     0      7200000            0     163000.4          6520083
     1      8400000            0      57799.2          5388544
     1      9600000            0      66461.6          5552969
     2     10800000            0      49593.5          5163675
     2     12000000            0      57672.1          4889398

After this change:

FSUse%        Count         Size    Files/sec            App Overhead
     0      1200000            0     167987.3 (+1.6%)         6272730
     0      2400000            0     198563.9 (+4.0%)         6048847
     0      3600000            0     197436.6 (+8.8%)         6163637
     0      4800000            0     202880.7 (+16.6%)        6371771
     1      6000000            0     167275.9 (+20.1%)        6556733
     1      7200000            0     204051.2 (+25.2%)        6817091
     1      8400000            0      69622.8 (+20.5%)        5525675
     1      9600000            0      69384.5 (+4.4%)         5700723
     1     10800000            0      61454.1 (+23.9%)        5363754
     3     12000000            0      61908.7 (+7.3%)         5370196
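
The percentages in parentheses are the relative change in Files/sec between
the two runs for the same row. A minimal sketch of that arithmetic, where
pct_change() is a hypothetical helper name used only for illustration:

```c
#include <assert.h>

/*
 * Relative change, in percent, from a "before" throughput to an "after"
 * throughput. Positive values mean the "after" run was faster.
 */
static double pct_change(double before, double after)
{
	assert(before > 0.0);
	return (after - before) / before * 100.0;
}
```

For example, pct_change(165273.6, 167987.3) gives roughly 1.6, which matches
the first row of the table above.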

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07 14:18:23 +01:00
9p netfs, 9p, afs, ceph: Use folios 2021-11-10 21:16:56 +00:00
adfs mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
affs affs: use bdev_nr_sectors instead of open coding it 2021-10-18 14:43:22 -06:00
afs afs: Fix mmap 2021-12-16 09:10:13 -08:00
autofs autofs: fix wait name hash calculation in autofs_wait() 2021-10-20 21:09:02 -04:00
befs isystem: ship and use stdarg.h 2021-08-19 09:02:55 +09:00
bfs mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
btrfs btrfs: try to unlock parent nodes earlier when inserting a key 2022-01-07 14:18:23 +01:00
cachefiles for-5.16/ki_complete-2021-10-29 2021-11-01 10:17:11 -07:00
ceph ceph: fix up non-directory creation in SGID directories 2021-12-01 17:08:27 +01:00
cifs cifs: sanitize multiple delimiters in prepath 2021-12-17 19:16:49 -06:00
coda coda: bump module version to 7.2 2021-11-09 10:02:51 -08:00
configfs configfs: fix a race in configfs_lookup() 2021-08-25 07:58:49 +02:00
cramfs cramfs: use bdev_nr_bytes instead of open coding it 2021-10-18 14:43:22 -06:00
crypto fscrypt: improve a few comments 2021-10-25 19:11:50 -07:00
debugfs debugfs: debugfs_create_file_size(): use IS_ERR to check for error 2021-09-21 09:09:06 +02:00
devpts
dlm fs: dlm: avoid comms shutdown delay in release_lockspace 2021-09-01 11:29:14 -05:00
ecryptfs mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
efivarfs
efs
erofs erofs: fix deadlock when shrink erofs slab 2021-11-23 14:58:16 +08:00
exfat exfat: fix incorrect loading of i_blocks for large files 2021-11-01 07:49:21 +09:00
exportfs
ext2 ext2: fix sleeping in atomic bugs on error 2021-09-22 13:05:23 +02:00
ext4 Only bug fixes and cleanups for ext4 this merge window. Of note are 2021-11-10 17:05:37 -08:00
f2fs Update to zstd-1.4.10 2021-11-13 15:32:30 -08:00
fat for-5.16/inode-sync-2021-10-29 2021-11-01 10:25:27 -07:00
freevxfs
fscache fscache: Remove an unused static variable 2021-10-04 22:13:12 +01:00
fuse fuse: release pipe buf after last use 2021-11-25 14:05:18 +01:00
gfs2 gfs2: gfs2_create_inode rework 2021-12-02 12:41:10 +01:00
hfs Merge branch 'akpm' (patches from Andrew) 2021-11-09 10:11:53 -08:00
hfsplus Merge branch 'akpm' (patches from Andrew) 2021-11-09 10:11:53 -08:00
hostfs hostfs: support splice_write 2021-08-26 22:28:02 +02:00
hpfs treewide: Replace open-coded flex arrays in unions 2021-10-18 12:28:53 -07:00
hugetlbfs mm,hugetlb: remove mlock ulimit for SHM_HUGETLB 2021-11-09 10:02:48 -08:00
iomap iomap: iomap_read_inline_data cleanup 2021-11-24 10:15:47 -08:00
isofs isofs: Fix out of bound access for corrupted isofs image 2021-10-19 12:51:02 +02:00
jbd2 jbd2: add sparse annotations for add_transaction_credits() 2021-08-30 23:36:50 -04:00
jffs2 vfs: add rcu argument to ->get_acl() callback 2021-08-18 22:08:24 +02:00
jfs Just one JFS patch 2021-11-03 09:23:25 -07:00
kernfs Merge 5.15-rc6 into driver-core-next 2021-10-18 09:43:37 +02:00
ksmbd ksmbd: disable SMB2_GLOBAL_CAP_ENCRYPTION for SMB 3.1.1 2021-12-17 19:19:45 -06:00
lockd A slow cycle for nfsd: mainly cleanup, including Neil's patch dropping 2021-11-10 16:45:54 -08:00
minix mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
netfs netfs: fix parameter of cleanup() 2021-12-07 15:47:09 +00:00
nfs NFS client bugfixes for Linux 5.16 2021-11-27 10:33:55 -08:00
nfs_common nfs: Fix kerneldoc warning shown up by W=1 2021-10-04 22:02:17 +01:00
nfsd NFSD: Fix READDIR buffer overflow 2021-12-18 17:11:06 -05:00
nilfs2 Merge branch 'akpm' (patches from Andrew) 2021-11-09 10:11:53 -08:00
nls
notify fanotify: Allow users to request FAN_FS_ERROR events 2021-10-27 12:53:45 +02:00
ntfs fs: ntfs: Limit NTFS_RW to page sizes smaller than 64k 2021-11-27 14:34:41 -08:00
ntfs3 gfs2: Fix mmap + page fault deadlocks 2021-11-02 12:25:03 -07:00
ocfs2 Merge branch 'exit-cleanups-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2021-11-10 16:15:54 -08:00
omfs mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
openpromfs
orangefs orangefs: three fixes from other folks... 2021-11-09 10:34:06 -08:00
overlayfs overlayfs update for 5.16 2021-11-09 10:51:12 -08:00
proc proc/vmcore: fix clearing user buffer by properly using clear_user() 2021-11-20 10:35:55 -08:00
pstore pstore/blk: Use "%lu" to format unsigned long 2021-11-21 09:44:19 -08:00
qnx4 qnx4: work around gcc false positive warning bug 2021-09-21 08:36:48 -07:00
qnx6
quota \n 2021-11-06 16:40:48 -07:00
ramfs Merge branch 'akpm' (patches from Andrew) 2021-11-09 10:11:53 -08:00
reiserfs \n 2021-11-06 16:40:48 -07:00
romfs
smbfs_common cifs: Fix crash on unload of cifs_arc4.ko 2021-12-07 22:38:03 -06:00
squashfs lib: zstd: Add kernel-specific API 2021-11-08 16:55:21 -08:00
sysfs fs/sysfs/dir.c: replace S_IRWXU|S_IRUGO|S_IXUGO with 0755 sysfs_create_dir_ns() 2021-10-05 16:35:05 +02:00
sysv sysv: use BUILD_BUG_ON instead of runtime check 2021-11-09 10:02:52 -08:00
tracefs tracefs: Set all files to the same group ownership as the mount option 2021-12-08 08:06:40 -05:00
ubifs fscrypt: remove fscrypt_operations::max_namelen 2021-09-20 19:32:33 -07:00
udf udf: Fix crash after seekdir 2021-11-09 12:53:58 +01:00
ufs isystem: ship and use stdarg.h 2021-08-19 09:02:55 +09:00
unicode
vboxsf vboxfs: fix broken legacy mount signature checking 2021-09-27 11:26:21 -07:00
verity fs-verity: fix signed integer overflow with i_size near S64_MAX 2021-09-22 10:56:34 -07:00
xfs xfs: remove all COW fork extents when remounting readonly 2021-12-07 10:17:29 -08:00
zonefs zonefs: add MODULE_ALIAS_FS 2021-12-17 16:56:35 +09:00
aio.c aio: Fix incorrect usage of eventfd_signal_allowed() 2021-12-09 10:52:55 -08:00
anon_inodes.c fs: add anon_inode_getfile_secure() similar to anon_inode_getfd_secure() 2021-09-19 22:35:37 -04:00
attr.c fs: handle circular mappings correctly 2021-11-17 09:26:09 +01:00
bad_inode.c vfs: add rcu argument to ->get_acl() callback 2021-08-18 22:08:24 +02:00
binfmt_aout.c binfmt: a.out: Fix bogus semicolon 2021-09-05 10:15:05 -07:00
binfmt_elf.c Merge branch 'akpm' (patches from Andrew) 2021-11-09 10:11:53 -08:00
binfmt_elf_fdpic.c coredump: Limit coredumps to a single thread group 2021-10-08 12:06:02 -05:00
binfmt_flat.c binfmt: remove in-tree usage of MAP_EXECUTABLE 2021-06-29 10:53:50 -07:00
binfmt_misc.c
binfmt_script.c
buffer.c fs: simplify init_page_buffers 2021-10-18 14:43:22 -06:00
char_dev.c
compat_binfmt_elf.c
coredump.c coredump: Limit coredumps to a single thread group 2021-10-08 12:06:02 -05:00
d_path.c d_path: fix Kernel doc validator complaining 2021-11-06 13:30:32 -07:00
dax.c New code for 5.15: 2021-08-31 11:13:35 -07:00
dcache.c
direct-io.c fs: get rid of the res2 iocb->ki_complete argument 2021-10-25 10:36:24 -06:00
drop_caches.c fs: drop_caches: fix skipping over shadow cache inodes 2021-09-03 09:58:10 -07:00
eventfd.c eventfd: Export eventfd_wake_count to modules 2021-09-06 07:20:56 -04:00
eventpoll.c ARM development updates for 5.15: 2021-09-09 13:25:49 -07:00
exec.c Merge branch 'exit-cleanups-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2021-11-10 16:15:54 -08:00
fcntl.c Merge branch 'akpm' (patches from Andrew) 2021-09-03 10:08:28 -07:00
fhandle.c
file.c fget: clarify and improve __fget_files() implementation 2021-12-13 10:55:30 -08:00
file_table.c
filesystems.c fs: simplify get_filesystem_list / get_all_fs_names 2021-08-23 01:25:40 -04:00
fs-writeback.c Various hardening fixes and cleanups for 5.16-rc1 2021-11-01 17:29:10 -07:00
fs_context.c memcg: charge fs_context and legacy_fs_context 2021-09-03 09:58:12 -07:00
fs_parser.c namei: Standardize callers of filename_lookup() 2021-09-07 16:07:47 -04:00
fs_pin.c
fs_struct.c
fs_types.c
fsopen.c
init.c
inode.c fs: Remove FS_THP_SUPPORT 2021-11-17 10:36:35 -05:00
internal.h Merge branch 'akpm' (patches from Andrew) 2021-11-09 10:11:53 -08:00
io-wq.c io-wq: drop wqe lock before creating new worker 2021-12-13 09:04:01 -07:00
io-wq.h io_uring: optimise INIT_WQ_LIST 2021-10-19 05:49:54 -06:00
io_uring.c io_uring: zero iocb->ki_pos for stream file types 2021-12-22 20:34:32 -07:00
ioctl.c New code for 5.15: 2021-08-31 11:06:32 -07:00
Kconfig 4 cifs/smb3 fixes, one for DFS reconnect, and one to begin creating common headers for server and client and the other two to rename the cifs_common directory to smbfs_common to be more consistent ie change use of the name cifs to smb which is more accurate 2021-09-12 10:10:21 -07:00
Kconfig.binfmt binfmt: remove support for em86 (alpha only) 2021-07-25 22:33:03 -07:00
kernel_read_file.c vfs: check fd has read access in kernel_read_file_from_fd() 2021-10-18 20:22:03 -10:00
libfs.c libfs: Support RENAME_EXCHANGE in simple_rename() 2021-11-03 15:43:08 +01:00
locks.c locks: remove changelog comments 2021-10-19 14:11:39 -04:00
Makefile 4 cifs/smb3 fixes, one for DFS reconnect, and one to begin creating common headers for server and client and the other two to rename the cifs_common directory to smbfs_common to be more consistent ie change use of the name cifs to smb which is more accurate 2021-09-12 10:10:21 -07:00
mbcache.c
mount.h
mpage.c
namei.c File locking changes for v5.16 2021-11-01 09:06:53 -07:00
namespace.c fs/mount_setattr: always cleanup mount_kattr 2021-12-30 15:12:13 -08:00
no-block.c
nsfs.c
open.c Merge branch 'akpm' (patches from Andrew) 2021-11-06 14:08:17 -07:00
pipe.c Revert "mm/gup: remove try_get_page(), call try_get_compound_head() directly" 2021-09-07 11:03:45 -07:00
pnode.c
pnode.h
posix_acl.c fs/posix_acl.c: avoid -Wempty-body warning 2021-11-06 13:30:32 -07:00
proc_namespace.c
read_write.c fs: remove leftover comments from mandatory locking removal 2021-10-26 12:20:50 -04:00
readdir.c
remap_range.c fs: remove mandatory file locking support 2021-08-23 06:15:36 -04:00
select.c Revert "memcg: enable accounting for pollfd and select bits arrays" 2021-09-07 11:26:23 -07:00
seq_file.c seq_file: move seq_escape() to a header 2021-11-09 10:02:52 -08:00
signalfd.c signalfd: use wake_up_pollfree() 2021-12-09 10:49:56 -08:00
splice.c
stack.c
stat.c fs: add generic helper for filling statx attribute flags 2021-08-17 11:47:43 +02:00
statfs.c
super.c fs: explicitly unregister per-superblock BDIs 2021-11-06 13:30:34 -07:00
sync.c block: simplify the block device syncing code 2021-10-22 08:36:55 -06:00
timerfd.c timerfd: Provide timerfd_resume() 2021-08-10 17:57:22 +02:00
userfaultfd.c userfaultfd: fix a race between writeprotect and exit_mmap() 2021-10-18 20:22:02 -10:00
utimes.c
xattr.c