linux-stable/fs
Dave Chinner 108a42358a xfs: Lower CIL flush limit for large logs
The current CIL size aggregation limit is 1/8th the log size. This
means for large logs we might be aggregating at least 250MB of dirty objects
in memory before the CIL is flushed to the journal. With CIL shadow
buffers sitting around, this means the CIL is often consuming >500MB
of temporary memory that is all allocated under GFP_NOFS conditions.

Flushing the CIL can take some time to do if there is other IO
ongoing, and can introduce substantial log force latency by itself.
It also pins the memory until the objects are in the AIL and can be
written back and reclaimed by shrinkers. Hence this threshold also
tends to determine the minimum amount of memory XFS can operate in
under heavy modification without triggering the OOM killer.

Modify the CIL space limit to prevent such huge amounts of pinned
metadata from aggregating. We can have 2MB of log IO in flight at
once, so limit aggregation to 16x this size. This threshold was
chosen as it has little impact on performance (on 16-way fsmark) or log
traffic but pins a lot less memory on large logs especially under
heavy memory pressure.  An aggregation limit of 8x had 5-10%
performance degradation and a 50% increase in log throughput for
the same workload, so clearly that was too small for highly
concurrent workloads on large logs.
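
The arithmetic behind the old and new limits can be sketched as below. This is only an illustration under stated assumptions, not the patch itself: the names MAX_LOG_IO_IN_FLIGHT, cil_limit_old() and cil_limit_new() are hypothetical, and taking the minimum of the two limits (so that small logs keep the 1/8th rule) is an assumption drawn from the "for large logs" wording of the subject line.

```c
#include <stdint.h>

/* From the commit text: up to 2MB of log IO can be in flight at once. */
#define MAX_LOG_IO_IN_FLIGHT (2ULL * 1024 * 1024)

/* Old aggregation limit: 1/8th of the log size, in bytes. */
uint64_t cil_limit_old(uint64_t log_size_bytes)
{
	return log_size_bytes / 8;
}

/*
 * New aggregation limit: 16x the in-flight log IO size.
 * Assumption: the old 1/8th rule still applies when it is smaller,
 * so only large logs see a lower limit.
 */
uint64_t cil_limit_new(uint64_t log_size_bytes)
{
	uint64_t cap = 16 * MAX_LOG_IO_IN_FLIGHT;
	uint64_t old = cil_limit_old(log_size_bytes);

	return old < cap ? old : cap;
}
```

Under these assumptions a 2GB log goes from a 256MB limit to 32MB, while a 64MB log stays at 8MB under both rules.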

This was found via trace analysis of AIL behaviour. e.g. insertion
from a single CIL flush:

xfs_ail_insert: old lsn 0/0 new lsn 1/3033090 type XFS_LI_INODE flags IN_AIL

$ grep xfs_ail_insert /mnt/scratch/s.t |grep "new lsn 1/3033090" |wc -l
1721823
$

So there were 1.7 million objects inserted into the AIL from this
CIL checkpoint, the first at 2323.392108, the last at 2325.667566 which
was the end of the trace (i.e. it hadn't finished). Clearly a major
problem.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-27 08:32:54 -07:00
..
9p
adfs fs/adfs: bigdir: Fix an error code in adfs_fplus_read() 2020-01-25 11:31:59 -05:00
affs
afs Merge branch 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-02-08 13:26:41 -08:00
autofs
befs
bfs
btrfs for-5.6-rc2-tag 2020-02-23 09:43:50 -08:00
cachefiles cachefiles: drop direct usage of ->bmap method. 2020-02-03 08:05:56 -05:00
ceph ceph: noacl mount option is effectively ignored 2020-02-11 17:04:40 +01:00
cifs cifs: make sure we do not overflow the max EA buffer size 2020-02-14 11:10:24 -06:00
coda
configfs
cramfs cramfs: switch to use of errofc() et.al. 2020-02-07 14:48:41 -05:00
crypto
debugfs Merge branch 'work.recursive_removal' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-02-05 05:09:46 +00:00
devpts
dlm
ecryptfs eCryptfs fixes for 5.6-rc3 2020-02-17 21:08:37 -08:00
efivarfs
efs
erofs
exportfs
ext2 dax fixes 5.6-rc1 2020-02-11 16:52:08 -08:00
ext4 ext4: potential crash on allocation error in ext4_alloc_flex_bg_array() 2020-02-29 17:48:08 -05:00
f2fs Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-02-08 13:04:49 -08:00
fat Merge branch 'imm.timestamp' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-02-05 05:02:42 +00:00
freevxfs
fscache proc: convert everything to "struct proc_ops" 2020-02-04 03:05:26 +00:00
fuse Merge branch 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-02-08 13:26:41 -08:00
gfs2 Merge branch 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-02-08 13:26:41 -08:00
hfs
hfsplus
hostfs
hpfs
hugetlbfs hugetlbfs: switch to use of invalfc() 2020-02-07 14:48:42 -05:00
iomap
isofs
jbd2 jbd2: fix data races at struct journal_head 2020-02-29 13:40:02 -05:00
jffs2 fs_parse: fold fs_parameter_desc/fs_parameter_spec 2020-02-07 14:48:37 -05:00
jfs Trivial cleanup for jfs 2020-02-05 05:28:20 +00:00
kernfs Merge branch 'imm.timestamp' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-02-05 05:02:42 +00:00
lockd proc: convert everything to "struct proc_ops" 2020-02-04 03:05:26 +00:00
minix
nfs NFSv4: Ensure the delegation cred is pinned when we call delegreturn 2020-02-13 16:23:02 -05:00
nfs_common
nfsd Highlights: 2020-02-07 17:50:21 -08:00
nilfs2
nls
notify
ntfs
ocfs2 treewide: remove redundant IS_ERR() before error code check 2020-02-04 03:05:27 +00:00
omfs
openpromfs
orangefs help_next should increase position index 2020-02-04 15:22:04 -05:00
overlayfs ovl: fix lseek overflow on 32bit 2020-02-03 11:41:53 +01:00
proc Merge branch 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-02-08 13:26:41 -08:00
pstore
qnx4
qnx6
quota \n 2020-01-30 15:37:41 -08:00
ramfs fs_parse: fold fs_parameter_desc/fs_parameter_spec 2020-02-07 14:48:37 -05:00
reiserfs Merge branch 'akpm' (patches from Andrew) 2020-01-31 12:16:36 -08:00
romfs
squashfs
sysfs treewide: remove redundant IS_ERR() before error code check 2020-02-04 03:05:27 +00:00
sysv
tracefs
ubifs Merge branch 'imm.timestamp' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-02-05 05:02:42 +00:00
udf
ufs
unicode kbuild: rename hostprogs-y/always to hostprogs/always-y 2020-02-04 01:53:07 +09:00
vboxsf fs: Add VirtualBox guest shared folder (vboxsf) support 2020-02-08 17:34:58 -05:00
verity
xfs xfs: Lower CIL flush limit for large logs 2020-03-27 08:32:54 -07:00
zonefs zonefs: select FS_IOMAP 2020-02-26 16:58:15 +09:00
aio.c aio: prevent potential eventfd recursion on poll 2020-02-03 17:27:47 -07:00
anon_inodes.c
attr.c
bad_inode.c
binfmt_aout.c
binfmt_elf.c fs/binfmt_elf.c: coredump: allow process with empty address space to coredump 2020-01-31 10:30:41 -08:00
binfmt_elf_fdpic.c
binfmt_em86.c
binfmt_flat.c
binfmt_misc.c
binfmt_script.c
block_dev.c
buffer.c smp: Remove allocation mask from on_each_cpu_cond.*() 2020-01-24 20:40:09 +01:00
char_dev.c
compat.c
compat_binfmt_elf.c
coredump.c pipe: use exclusive waits when reading or writing 2020-02-08 11:39:19 -08:00
d_path.c
dax.c dax: pass NOWAIT flag to iomap_apply 2020-02-05 20:34:32 -08:00
dcache.c
dcookies.c
direct-io.c
drop_caches.c
eventfd.c eventfd: track eventfd_signal() recursion depth 2020-02-03 17:27:38 -07:00
eventpoll.c eventpoll: support non-blocking do_epoll_ctl() calls 2020-01-29 15:45:47 -07:00
exec.c Merge branch 'akpm' (patches from Andrew) 2020-01-31 12:16:36 -08:00
fcntl.c
fhandle.c
file.c threads-v5.6 2020-01-29 19:38:34 -08:00
file_table.c
filesystems.c fs_parser: remove fs_parameter_description name field 2020-02-07 14:48:36 -05:00
fs-writeback.c memcg: fix a crash in wb_workfn when a device disappears 2020-01-31 10:30:36 -08:00
fs_context.c add prefix to fs_context->log 2020-02-07 14:48:35 -05:00
fs_parser.c turn fs_param_is_... into functions 2020-02-07 14:48:38 -05:00
fs_pin.c
fs_struct.c
fs_types.c
fsopen.c add prefix to fs_context->log 2020-02-07 14:48:35 -05:00
inode.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-02-08 13:04:49 -08:00
internal.h for-5.6/io_uring-vfs-2020-01-29 2020-01-29 18:53:37 -08:00
io-wq.c io-wq: remove spin-for-work optimization 2020-02-25 08:57:37 -07:00
io-wq.h io-wq: ensure work->task_pid is cleared on init 2020-02-25 13:23:48 -07:00
io_uring.c io_uring: fix 32-bit compatability with sendmsg/recvmsg 2020-02-27 14:17:49 -07:00
ioctl.c compat-ioctl fix for v5.6 2020-02-08 13:44:41 -08:00
Kconfig fs: New zonefs file system 2020-02-09 15:51:46 -08:00
Kconfig.binfmt
libfs.c
locks.c
Makefile fs: New zonefs file system 2020-02-09 15:51:46 -08:00
mbcache.c
mount.h
mpage.c
namei.c vfs: fix do_last() regression 2020-02-01 10:36:49 -08:00
namespace.c saner copy_mount_options() 2020-02-03 21:23:33 -05:00
no-block.c
nsfs.c Merge branch 'work.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-01-29 11:20:24 -08:00
open.c
pipe.c pipe: make sure to wake up everybody when the last reader/writer closes 2020-02-18 14:34:36 -08:00
pnode.c
pnode.h
posix_acl.c
proc_namespace.c
read_write.c overlayfs update for 5.6 2020-02-04 11:45:21 +00:00
readdir.c
select.c
seq_file.c
signalfd.c
splice.c pipe: use exclusive waits when reading or writing 2020-02-08 11:39:19 -08:00
stack.c
stat.c
statfs.c
super.c
sync.c
timerfd.c
userfaultfd.c
utimes.c
xattr.c