linux-stable/fs
Dave Chinner b0dff466c0 xfs: separate read-only variables in struct xfs_mount
Seeing massive CPU usage from xfs_agino_range() on one machine;
instruction level profiles look similar to another machine running
the same workload, only one machine is consuming 10x as much CPU as
the other and going much slower. The only real difference between
the two machines is core count per socket. Both are running
identical 16p/16GB virtual machine configurations.

Machine A:

  25.83%  [k] xfs_agino_range
  12.68%  [k] __xfs_dir3_data_check
   6.95%  [k] xfs_verify_ino
   6.78%  [k] xfs_dir2_data_entry_tag_p
   3.56%  [k] xfs_buf_find
   2.31%  [k] xfs_verify_dir_ino
   2.02%  [k] xfs_dabuf_map.constprop.0
   1.65%  [k] xfs_ag_block_count

And takes around 13 minutes to remove 50 million inodes.

Machine B:

  13.90%  [k] __pv_queued_spin_lock_slowpath
   3.76%  [k] do_raw_spin_lock
   2.83%  [k] xfs_dir3_leaf_check_int
   2.75%  [k] xfs_agino_range
   2.51%  [k] __raw_callee_save___pv_queued_spin_unlock
   2.18%  [k] __xfs_dir3_data_check
   2.02%  [k] xfs_log_commit_cil

And takes around 5m30s to remove 50 million inodes.

Suspect is cacheline contention on m_sectbb_log which is used in one
of the macros in xfs_agino_range. This is a read-only variable but
shares a cacheline with m_active_trans which is a global atomic that
gets bounced all around the machine.

The workload is trying to run hundreds of thousands of transactions
per second and hence cacheline contention will be occurring on this
atomic counter. Hence xfs_agino_range() is likely just an
innocent bystander as the cache coherency protocol fights over the
cacheline between CPU cores and sockets.

On machine A, this rearrangement of the struct xfs_mount
results in the profile changing to:

   9.77%  [kernel]  [k] xfs_agino_range
   6.27%  [kernel]  [k] __xfs_dir3_data_check
   5.31%  [kernel]  [k] __pv_queued_spin_lock_slowpath
   4.54%  [kernel]  [k] xfs_buf_find
   3.79%  [kernel]  [k] do_raw_spin_lock
   3.39%  [kernel]  [k] xfs_verify_ino
   2.73%  [kernel]  [k] __raw_callee_save___pv_queued_spin_unlock

Vastly less CPU usage in xfs_agino_range(), but still 3x that of
machine B, and it still runs substantially slower than it should.

Current rm -rf of 50 million files:

		vanilla		patched
machine A	13m20s		6m42s
machine B	5m30s		5m02s

It's an improvement, hence indicating that separation and further
optimisation of read-only global filesystem data is worthwhile, but
it clearly isn't the underlying issue causing this specific
performance degradation.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-27 08:49:25 -07:00
9p 9p: read only once on O_NONBLOCK 2020-03-27 09:29:56 +00:00
adfs fs/adfs: bigdir: Fix an error code in adfs_fplus_read() 2020-01-25 11:31:59 -05:00
affs affs: fix a memory leak in affs_remount 2019-11-18 14:26:43 +01:00
afs afs: Make record checking use TASK_UNINTERRUPTIBLE when appropriate 2020-04-24 16:33:32 +01:00
autofs LOOKUP_MOUNTPOINT: fold path_mountpointat() into path_lookupat() 2020-03-13 21:08:17 -04:00
befs
bfs
btrfs btrfs: fix gcc-4.8 build warning for struct initializer 2020-04-30 12:17:49 +02:00
cachefiles cachefiles: drop direct usage of ->bmap method. 2020-02-03 08:05:56 -05:00
ceph ceph: fix potential bad pointer deref in async dirops cb's 2020-04-13 19:33:47 +02:00
cifs cifs: fix uninitialised lease_key in open_shroot() 2020-04-22 20:29:11 -05:00
coda
configfs utimes: Clamp the timestamps in notify_change() 2019-12-08 19:10:50 -05:00
cramfs cramfs: switch to use of errofc() et.al. 2020-02-07 14:48:41 -05:00
crypto fscrypt updates for 5.7 2020-03-31 12:58:36 -07:00
debugfs debugfs: remove return value of debugfs_create_u32() 2020-04-17 17:08:50 +02:00
devpts
dlm dlm: use SO_SNDTIMEO_NEW instead of SO_SNDTIMEO_OLD 2019-12-18 18:07:31 +01:00
ecryptfs eCryptfs fixes for 5.6-rc3 2020-02-17 21:08:37 -08:00
efivarfs efi: Use more granular check for availability for variable services 2020-02-23 21:59:42 +01:00
efs
erofs erofs: handle corrupted images whose decompressed size less than it'd be 2020-03-03 23:40:52 +08:00
exfat exfat: truncate atimes to 2s granularity 2020-04-22 20:14:06 +09:00
exportfs race in exportfs_decode_fh() 2019-11-11 09:21:59 -05:00
ext2 ext2: fix empty body warnings when -Wextra is used 2020-03-23 13:01:37 +01:00
ext4 ext4: convert BUG_ON's to WARN_ON's in mballoc.c 2020-04-15 23:58:49 -04:00
f2fs f2fs-for-5.7-rc1 2020-04-07 13:48:26 -07:00
fat fat: fix uninit-memory access for partial initialized inode 2020-03-06 07:06:09 -06:00
freevxfs
fscache proc: convert everything to "struct proc_ops" 2020-02-04 03:05:26 +00:00
fuse fuse: fix stack use after return 2020-02-13 09:16:07 +01:00
gfs2 We've got a lot of patches (39) for this merge window. Most of these patches 2020-03-31 14:16:03 -07:00
hfs hfs/hfsplus: use 64-bit inode timestamps 2019-12-18 18:07:32 +01:00
hfsplus hfsplus: fix crash and filesystem corruption when deleting files 2020-04-10 15:36:20 -07:00
hostfs hostfs: Use kasprintf() instead of fixed buffer formatting 2020-03-29 23:23:00 +02:00
hpfs
hugetlbfs hugetlbfs: Use i_mmap_rwsem to address page fault/truncate race 2020-04-02 09:35:32 -07:00
iomap fibmap: Warn and return an error in case of block > INT_MAX 2020-04-30 07:57:46 -07:00
isofs
jbd2 jbd2: improve comments about freeing data buffers whose page mapping is NULL 2020-03-05 20:25:05 -05:00
jffs2 fs_parse: fold fs_parameter_desc/fs_parameter_spec 2020-02-07 14:48:37 -05:00
jfs Trivial cleanup for jfs 2020-02-05 05:28:20 +00:00
kernfs kernfs: Add option to enable user xattrs 2020-03-16 15:53:47 -04:00
lockd proc: convert everything to "struct proc_ops" 2020-02-04 03:05:26 +00:00
minix
nfs NFS: Fix a race in __nfs_list_for_each_server() 2020-04-30 15:08:26 -04:00
nfs_common
nfsd SUNRPC: Fix backchannel RPC soft lockups 2020-04-17 12:40:31 -04:00
nilfs2
nls
notify fanotify: Fix the checks in fanotify_fsid_equal 2020-03-30 12:40:53 +02:00
ntfs fs/buffer: Make BH_Uptodate_Lock bit_spin_lock a regular spinlock_t 2020-03-28 13:21:08 +01:00
ocfs2 dlmfs_file_write(): fix the bogosity in handling non-zero *ppos 2020-04-23 13:45:27 -04:00
omfs
openpromfs
orangefs orangefs: don't mess with I_DIRTY_TIMES in orangefs_flush 2020-04-08 09:39:11 -04:00
overlayfs ovl: enable xino automatically in more cases 2020-03-27 16:51:02 +01:00
proc Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2020-04-25 12:25:32 -07:00
pstore pstore/ram: Replace zero-length array with flexible-array member 2020-03-09 14:45:40 -07:00
qnx4
qnx6
quota 2020-01-30 15:37:41 -08:00
ramfs fs_parse: fold fs_parameter_desc/fs_parameter_spec 2020-02-07 14:48:37 -05:00
reiserfs reiserfs: clean up several indentation issues 2020-04-07 10:43:44 -07:00
romfs
squashfs
sysfs sysfs: remove redundant __compat_only_sysfs_link_entry_to_kobj fn 2020-04-05 11:34:35 -07:00
sysv
tracefs simple_recursive_removal(): kernel-side rm -rf for ramfs-style filesystems 2019-12-10 22:29:58 -05:00
ubifs This pull request contains fixes for UBI and UBIFS: 2020-04-07 12:40:56 -07:00
udf change email address for Pali Rohár 2020-04-10 15:36:22 -07:00
ufs
unicode .gitignore: add SPDX License Identifier 2020-03-25 11:50:48 +01:00
vboxsf fs: Add VirtualBox guest shared folder (vboxsf) support 2020-02-08 17:34:58 -05:00
verity fs-verity: use u64_to_user_ptr() 2020-01-14 13:28:28 -08:00
xfs xfs: separate read-only variables in struct xfs_mount 2020-05-27 08:49:25 -07:00
zonefs zonfs: Fix handling of read-only zones 2020-03-25 11:28:26 +09:00
aio.c aio: prevent potential eventfd recursion on poll 2020-02-03 17:27:47 -07:00
anon_inodes.c
attr.c utimes: Clamp the timestamps in notify_change() 2019-12-08 19:10:50 -05:00
bad_inode.c
binfmt_aout.c
binfmt_elf.c fs/binfmt_elf.c: don't free interpreter's ELF pheaders on common path 2020-04-07 10:43:44 -07:00
binfmt_elf_fdpic.c y2038: elfcore: Use __kernel_old_timeval for process times 2019-11-15 14:38:29 +01:00
binfmt_em86.c
binfmt_flat.c
binfmt_misc.c
binfmt_script.c
block_dev.c block: remove unused header 2020-04-21 09:51:10 -06:00
buffer.c block-5.7-2020-04-24 2020-04-24 12:44:19 -07:00
char_dev.c chardev: Avoid potential use-after-free in 'chrdev_open()' 2020-01-06 20:10:26 +01:00
compat.c
compat_binfmt_elf.c y2038: elfcore: Use __kernel_old_timeval for process times 2019-11-15 14:38:29 +01:00
coredump.c coredump: fix null pointer dereference on coredump 2020-04-21 11:11:56 -07:00
d_path.c
dax.c dax,iomap: Add helper dax_iomap_zero() to zero a range 2020-04-02 19:15:03 -07:00
dcache.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-12-08 11:08:28 -08:00
dcookies.c
direct-io.c fs/direct-io.c: include fs/internal.h for missing prototype 2020-01-04 13:55:09 -08:00
drop_caches.c fs: avoid softlockups in s_inodes iterators 2019-12-18 00:03:01 -05:00
eventfd.c eventfd: track eventfd_signal() recursion depth 2020-02-03 17:27:38 -07:00
eventpoll.c fs/epoll: make nesting accounting safe for -rt kernel 2020-04-07 10:43:44 -07:00
exec.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2020-04-02 11:22:17 -07:00
fcntl.c fcntl: Distribute switch variables for initialization 2020-03-03 10:55:06 -05:00
fhandle.c
file.c io_uring: make sure openat/openat2 honor rlimit nofile 2020-03-20 08:47:27 -06:00
file_table.c
filesystems.c fs/filesystems.c: downgrade user-reachable WARN_ONCE() to pr_warn_once() 2020-04-10 15:36:22 -07:00
fs-writeback.c memcg: fix a crash in wb_workfn when a device disappears 2020-01-31 10:30:36 -08:00
fs_context.c add prefix to fs_context->log 2020-02-07 14:48:35 -05:00
fs_parser.c fs_parse: remove pr_notice() about each validation 2020-04-02 09:35:26 -07:00
fs_pin.c
fs_struct.c
fs_types.c
fsopen.c add prefix to fs_context->log 2020-02-07 14:48:35 -05:00
inode.c futex: Fix inode life-time issue 2020-03-06 11:06:15 +01:00
internal.h Merge branch 'work.dotdot1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-04-02 12:30:08 -07:00
io-wq.c io_uring: use io-wq manager as backup task if task is exiting 2020-04-03 11:35:57 -06:00
io-wq.h io_uring: use io-wq manager as backup task if task is exiting 2020-04-03 11:35:57 -06:00
io_uring.c io_uring: punt splice async because of inode mutex 2020-05-01 08:50:57 -06:00
ioctl.c fibmap: Warn and return an error in case of block > INT_MAX 2020-04-30 07:57:46 -07:00
Kconfig exfat: add Kconfig and Makefile 2020-03-05 21:00:40 -05:00
Kconfig.binfmt
libfs.c libfs: fix infoleak in simple_attr_read() 2020-03-24 13:27:16 +01:00
locks.c locks: reinstate locks_delete_block optimization 2020-03-18 13:03:38 -07:00
Makefile exfat: add Kconfig and Makefile 2020-03-05 21:00:40 -05:00
mbcache.c
mount.h
mpage.c fs: move guard_bio_eod() after bio_set_op_attrs 2020-01-09 08:16:12 -07:00
namei.c fix a braino in legitimize_path() 2020-04-06 10:38:59 -04:00
namespace.c LOOKUP_MOUNTPOINT: fold path_mountpointat() into path_lookupat() 2020-03-13 21:08:17 -04:00
no-block.c
nsfs.c fs/nsfs.c: Added ns_match 2020-03-12 17:33:11 -07:00
open.c Merge branch 'work.dotdot1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-04-02 12:30:08 -07:00
pipe.c mm: kmem: rename memcg_kmem_(un)charge() into memcg_kmem_(un)charge_page() 2020-04-02 09:35:28 -07:00
pnode.c propagate_one(): mnt_set_mountpoint() needs mount_lock 2020-04-27 10:37:14 -04:00
pnode.h
posix_acl.c fs/posix_acl.c: fix kernel-doc warnings 2020-01-04 13:55:09 -08:00
proc_namespace.c
read_write.c powerpc: Add back __ARCH_WANT_SYS_LLSEEK macro 2020-04-03 00:09:59 +11:00
readdir.c readdir: make user_access_begin() use the real access range 2020-01-23 10:15:28 -08:00
select.c y2038: syscalls: change remaining timeval to __kernel_old_timeval 2019-11-15 14:38:29 +01:00
seq_file.c fs/seq_file.c: seq_read(): add info message about buggy .next functions 2020-04-10 15:36:22 -07:00
signalfd.c
splice.c splice: make do_splice public 2020-03-02 14:04:31 -07:00
stack.c sched/rt, fs: Use CONFIG_PREEMPTION 2019-12-08 14:37:36 +01:00
stat.c fs: make two stat prep helpers available 2020-01-20 17:03:54 -07:00
statfs.c
super.c Fix use after free in get_tree_bdev() 2020-04-28 14:37:40 -07:00
sync.c
timerfd.c timerfd: Make timerfd_settime() time namespace aware 2020-01-14 12:20:53 +01:00
userfaultfd.c userfaultfd: wp: declare _UFFDIO_WRITEPROTECT conditionally 2020-04-07 10:43:40 -07:00
utimes.c utimes: Clamp the timestamps in notify_change() 2019-12-08 19:10:50 -05:00
xattr.c kernfs: Add removed_size out param for simple_xattr_set 2020-03-16 15:53:47 -04:00