linux-stable/fs
Dave Chinner 0020a190cf xfs: AIL needs asynchronous CIL forcing
The AIL pushing is stalling on log forces when it comes across
pinned items. This is happening on removal workloads where the AIL
is dominated by stale items that are removed from AIL when the
checkpoint that marks the items stale is committed to the journal.
This results is relatively few items in the AIL, but those that are
are often pinned as directories items are being removed from are
still being logged.

As a result, many push cycles through the CIL will first issue a
blocking log force to unpin the items. This can take some time to
complete, with tracing regularly showing push delays of half a
second and sometimes up into the range of several seconds. Sequences
like this aren't uncommon:

....
 399.829437:  xfsaild: last lsn 0x11002dd000 count 101 stuck 101 flushing 0 tout 20
<wanted 20ms, got 270ms delay>
 400.099622:  xfsaild: target 0x11002f3600, prev 0x11002f3600, last lsn 0x0
 400.099623:  xfsaild: first lsn 0x11002f3600
 400.099679:  xfsaild: last lsn 0x1100305000 count 16 stuck 11 flushing 0 tout 50
<wanted 50ms, got 500ms delay>
 400.589348:  xfsaild: target 0x110032e600, prev 0x11002f3600, last lsn 0x0
 400.589349:  xfsaild: first lsn 0x1100305000
 400.589595:  xfsaild: last lsn 0x110032e600 count 156 stuck 101 flushing 30 tout 50
<wanted 50ms, got 460ms delay>
 400.950341:  xfsaild: target 0x1100353000, prev 0x110032e600, last lsn 0x0
 400.950343:  xfsaild: first lsn 0x1100317c00
 400.950436:  xfsaild: last lsn 0x110033d200 count 105 stuck 101 flushing 0 tout 20
<wanted 20ms, got 200ms delay>
 401.142333:  xfsaild: target 0x1100361600, prev 0x1100353000, last lsn 0x0
 401.142334:  xfsaild: first lsn 0x110032e600
 401.142535:  xfsaild: last lsn 0x1100353000 count 122 stuck 101 flushing 8 tout 10
<wanted 10ms, got 10ms delay>
 401.154323:  xfsaild: target 0x1100361600, prev 0x1100361600, last lsn 0x1100353000
 401.154328:  xfsaild: first lsn 0x1100353000
 401.154389:  xfsaild: last lsn 0x1100353000 count 101 stuck 101 flushing 0 tout 20
<wanted 20ms, got 300ms delay>
 401.451525:  xfsaild: target 0x1100361600, prev 0x1100361600, last lsn 0x0
 401.451526:  xfsaild: first lsn 0x1100353000
 401.451804:  xfsaild: last lsn 0x1100377200 count 170 stuck 22 flushing 122 tout 50
<wanted 50ms, got 500ms delay>
 401.933581:  xfsaild: target 0x1100361600, prev 0x1100361600, last lsn 0x0
....

In each of these cases, every AIL pass saw 101 log items stuck on
the AIL (pinned) with very few other items being found. Each pass, a
log force was issued, and delay between last/first is the sleep time
+ the sync log force time.

Some of these 101 items pinned the tail of the log. The tail of the
log does slowly creep forward (first lsn), but the problem is that
the log is actually out of reservation space because it's been
running so many transactions that stale items that never reach the
AIL but consume log space. Hence we have a largely empty AIL, with
long term pins on items that pin the tail of the log that don't get
pushed frequently enough to keep log space available.

The problem is the hundreds of milliseconds that we block in the log
force pushing the CIL out to disk. The AIL should not be stalled
like this - it needs to run and flush items that are at the tail of
the log with minimal latency. What we really need to do is trigger a
log flush, but then not wait for it at all - we've already done our
waiting for stuff to complete when we backed off prior to the log
force being issued.

Even if we remove the XFS_LOG_SYNC from the xfs_log_force() call, we
still do a blocking flush of the CIL and that is what is causing the
issue. Hence we need a new interface for the CIL to trigger an
immediate background push of the CIL to get it moving faster but not
to wait on that to occur. While the CIL is pushing, the AIL can also
be pushing.

We already have an internal interface to do this -
xlog_cil_push_now() - but we need a wrapper for it to be used
externally. xlog_cil_force_seq() can easily be extended to do what
we need as it already implements the synchronous CIL push via
xlog_cil_push_now(). Add the necessary flags and "push current
sequence" semantics to xlog_cil_force_seq() and convert the AIL
pushing to use it.

One of the complexities here is that the CIL push does not guarantee
that the commit record for the CIL checkpoint is written to disk.
The current log force ensures this by submitting the current ACTIVE
iclog that the commit record was written to. We need the CIL to
actually write this commit record to disk for an async push to
ensure that the checkpoint actually makes it to disk and unpins the
pinned items in the checkpoint on completion. Hence we need to pass
down to the CIL push that we are doing an async flush so that it can
switch out the commit_iclog if necessary to get written to disk when
the commit iclog is finally released.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-16 12:09:30 -07:00
..
9p 9p for 5.13-rc1 2021-05-07 11:18:52 -07:00
adfs mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
affs mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
afs afs: Remove redundant assignment to ret 2021-07-21 15:11:22 +01:00
autofs autofs: should_expire() argument is guaranteed to be positive 2021-03-24 14:14:27 -04:00
befs fs/befs: Delete obsolete TODO file 2021-03-30 16:54:49 -07:00
bfs mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
btrfs for-5.14-rc3-tag 2021-07-30 10:50:09 -07:00
cachefiles fscache, cachefiles: Add alternate API to use kiocb for read/write to cache 2021-04-23 10:14:32 +01:00
ceph ceph: don't WARN if we're still opening a session to an MDS 2021-07-20 17:57:33 +02:00
cifs cifs: add missing parsing of backupuid 2021-07-28 17:03:24 -05:00
coda coda: fix reference counting in coda_file_mmap error path 2021-04-23 14:42:39 -07:00
configfs configfs: fix the read and write iterators 2021-07-13 20:56:24 +02:00
cramfs cramfs: use %pD instead of messing with file_dentry()->d_name 2021-01-05 23:02:47 -05:00
crypto fscrypt: fix derivation of SipHash keys on big endian CPUs 2021-06-05 00:52:52 -07:00
debugfs Linux 5.13-rc6 2021-06-14 09:07:45 +02:00
devpts
dlm fs: dlm: invalid buffer access in lookup error 2021-06-11 12:44:47 -05:00
ecryptfs mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
efivarfs efivars: convert to fileattr 2021-04-12 15:04:29 +02:00
efs
erofs erofs: clean up file headers & footers 2021-06-08 00:41:24 +08:00
exfat Description for this pull request: 2021-07-06 11:06:04 -07:00
exportfs exportfs: Add a function to return the raw output from fh_to_dentry() 2020-12-09 09:39:38 -05:00
ext2 fs/ext2: Avoid page_address on pages returned by ext2_get_page 2021-07-16 12:36:51 +02:00
ext4 Ext4 regression and bug fixes for v5.14-rc1 2021-07-09 09:57:27 -07:00
f2fs f2fs: drop dirty node pages when cp is in error status 2021-07-06 22:05:06 -07:00
fat mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
freevxfs
fscache fscache, cachefiles: Add alternate API to use kiocb for read/write to cache 2021-04-23 10:14:32 +01:00
fuse fuse update for 5.14 2021-07-06 11:17:41 -07:00
gfs2 Various minor gfs2 cleanups and fixes 2021-06-29 20:23:08 -07:00
hfs hfs: add lock nesting notation to hfs_find_init 2021-07-15 10:13:49 -07:00
hfsplus hfsplus: report create_date to kstat.btime 2021-07-01 11:06:06 -07:00
hostfs Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2021-05-02 09:14:01 -07:00
hpfs mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
hugetlbfs hugetlbfs: fix mount mode command line processing 2021-07-23 17:43:28 -07:00
iomap iomap: Don't create iomap_page objects in iomap_page_mkwrite_actor 2021-07-15 09:58:06 -07:00
isofs isofs: remove redundant continue statement 2021-06-17 17:11:42 +02:00
jbd2 ext4: inline jbd2_journal_[un]register_shrinker() 2021-07-08 08:37:31 -04:00
jffs2 This pull request contains changes for JFFS2, UBI and UBIFS 2021-05-04 18:08:40 -07:00
jfs JFS fixes for 5.14 2021-07-02 14:25:17 -07:00
kernfs Driver core changes for 5.14-rc1 2021-07-05 13:51:41 -07:00
lockd lockd: Update the NLMv4 SHARE results encoder to use struct xdr_stream 2021-07-06 20:14:44 -04:00
minix mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
netfs netfs: fix test for whether we can skip read when writing beyond EOF 2021-06-21 21:24:07 +01:00
nfs NFS client updates for Linux 5.14 2021-07-09 09:43:57 -07:00
nfs_common nfs_common: fix doc warning 2021-07-06 20:14:41 -04:00
nfsd block-5.14-2021-07-08 2021-07-09 12:05:33 -07:00
nilfs2 Merge branch 'akpm' (patches from Andrew) 2021-07-02 12:08:10 -07:00
nls
notify fanotify: fix copy_event_to_user() fid error clean up 2021-06-14 12:16:37 +02:00
ntfs Merge branch 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2021-07-03 11:30:04 -07:00
ocfs2 ocfs2: issue zeroout to EOF blocks 2021-07-30 10:14:39 -07:00
omfs mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
openpromfs openpromfs: don't do unlock_new_inode() until the new inode is set up 2021-03-12 22:15:22 -05:00
orangefs orangefs: fix orangefs df output. 2021-06-28 08:40:08 -04:00
overlayfs overlayfs update for 5.13 2021-04-30 15:17:08 -07:00
proc Merge branch 'work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2021-07-03 11:41:14 -07:00
pstore for-5.14/drivers-2021-06-29 2021-06-30 12:21:16 -07:00
qnx4
qnx6
quota quota: remove unnecessary oom message 2021-06-22 10:40:52 +02:00
ramfs fs: move ramfs_aops to libfs 2021-06-29 10:53:48 -07:00
reiserfs reiserfs: check directory items on read from disk 2021-07-16 12:36:51 +02:00
romfs
squashfs squashfs: add option to panic on errors 2021-06-29 10:53:46 -07:00
sysfs sysfs: Support zapping of binary attr mmaps 2021-01-12 14:26:31 +01:00
sysv mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
tracefs tracing: Fix various typos in comments 2021-03-23 14:08:18 -04:00
ubifs ubifs: Set/Clear I_LINKABLE under i_lock for whiteout inode 2021-06-22 09:21:39 +02:00
udf \n 2021-07-01 12:06:39 -07:00
ufs mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
unicode .gitignore: prefix local generated files with a slash 2021-05-02 00:43:35 +09:00
vboxsf vboxsf: Add support for the atomic_open directory-inode op 2021-06-23 14:36:52 +02:00
verity fsverity: relax build time dependency on CRYPTO_SHA256 2021-04-22 17:31:32 +10:00
xfs xfs: AIL needs asynchronous CIL forcing 2021-08-16 12:09:30 -07:00
zonefs zonefs: remove redundant null bio check 2021-07-16 13:45:18 +09:00
Kconfig mm: hugetlb: introduce CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON 2021-06-30 20:47:26 -07:00
Kconfig.binfmt binfmt: remove support for em86 (alpha only) 2021-07-25 22:33:03 -07:00
Makefile binfmt: remove support for em86 (alpha only) 2021-07-25 22:33:03 -07:00
aio.c Revert "mremap: don't allow MREMAP_DONTUNMAP on special_mappings and aio" 2021-04-30 11:20:39 -07:00
anon_inodes.c fs: anon_inodes: rephrase to appropriate kernel-doc 2021-01-15 12:17:25 -05:00
attr.c ima: handle idmapped mounts 2021-01-24 14:27:20 +01:00
bad_inode.c fs: make helpers idmap mount aware 2021-01-24 14:27:20 +01:00
binfmt_aout.c binfmt: remove in-tree usage of MAP_EXECUTABLE 2021-06-29 10:53:50 -07:00
binfmt_elf.c Merge branch 'akpm' (patches from Andrew) 2021-06-29 17:29:11 -07:00
binfmt_elf_fdpic.c Merge branch 'akpm' (patches from Andrew) 2021-06-29 17:29:11 -07:00
binfmt_flat.c binfmt: remove in-tree usage of MAP_EXECUTABLE 2021-06-29 10:53:50 -07:00
binfmt_misc.c binfmt_misc: fix possible deadlock in bm_register_write 2021-03-13 11:27:30 -08:00
binfmt_script.c
block_dev.c block-5.14-2021-07-30 2021-07-30 11:08:12 -07:00
buffer.c mm/writeback: move __set_page_dirty() to core mm 2021-06-29 10:53:48 -07:00
char_dev.c
compat_binfmt_elf.c get rid of COMPAT_ELF_EXEC_PAGESIZE 2021-01-06 08:42:51 -05:00
coredump.c Merge branch 'work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2021-07-03 11:41:14 -07:00
d_path.c getcwd(2): clean up error handling 2021-05-18 20:15:58 -04:00
dax.c dax: fix ENOMEM handling in grab_mapping_entry() 2021-06-29 10:53:47 -07:00
dcache.c useful constants: struct qstr for ".." 2021-04-15 22:36:45 -04:00
direct-io.c fs: direct-io: fix missing sdio->boundary 2021-04-09 14:54:23 -07:00
drop_caches.c
eventfd.c eventfd: Export eventfd_ctx_do_read() 2020-11-15 09:49:10 -05:00
eventpoll.c fs/epoll: restore waking from ep_done_scan() 2021-05-06 19:24:13 -07:00
exec.c Merge branch 'akpm' (patches from Andrew) 2021-07-02 12:08:10 -07:00
fcntl.c fcntl: Fix unreachable code in do_fcntl() 2021-07-12 11:09:13 -05:00
fhandle.c switch file_open_root() to struct path 2021-04-07 13:56:43 -04:00
file.c Merge branch 'work.file' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2021-05-03 11:05:28 -07:00
file_table.c
filesystems.c
fs-writeback.c writeback, cgroup: do not reparent dax inodes 2021-07-23 17:43:28 -07:00
fs_context.c fs: add vfs_parse_fs_param_source() helper 2021-07-14 09:19:06 -07:00
fs_parser.c vfs: fs_parser: clean up kernel-doc warnings 2021-04-30 11:20:35 -07:00
fs_pin.c
fs_struct.c
fs_types.c
fsopen.c
init.c init: handle idmapped mounts 2021-01-24 14:27:19 +01:00
inode.c mm: remove nrexceptional from inode: remove BUG_ON 2021-05-05 11:27:20 -07:00
internal.h cgroup1: fix leaked context root causing sporadic NULL deref in LTP 2021-07-21 06:39:20 -10:00
io-wq.c io_uring: explicitly catch any illegal async queue attempt 2021-07-23 16:44:51 -06:00
io-wq.h io_uring: move creds from io-wq work to io_kiocb 2021-06-18 09:22:02 -06:00
io_uring.c io_uring: fix poll requests leaking second poll entries 2021-07-28 07:24:57 -06:00
ioctl.c vfs: add fileattr ops 2021-04-12 15:04:23 +02:00
kernel_read_file.c switch file_open_root() to struct path 2021-04-07 13:56:43 -04:00
libfs.c fs: remove noop_set_page_dirty() 2021-06-29 10:53:48 -07:00
locks.c Additional fixes and clean-ups for NFSD since tags/nfsd-5.13, 2021-05-05 13:44:19 -07:00
mbcache.c
mount.h mount: make {lock,unlock}_mount_hash() static 2021-01-24 14:29:34 +01:00
mpage.c block: rename BIO_MAX_PAGES to BIO_MAX_VECS 2021-03-11 07:47:48 -07:00
namei.c Merge branch 'work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2021-07-03 11:41:14 -07:00
namespace.c mount: Support "nosymfollow" in new mount api 2021-06-01 12:09:27 +02:00
no-block.c
nsfs.c
open.c Merge branch 'work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2021-07-03 11:41:14 -07:00
pipe.c pipe: make pipe writes always wake up readers 2021-07-30 15:42:34 -07:00
pnode.c
pnode.h mount: fix mounting of detached mounts onto targets that reside on shared mounts 2021-03-08 15:18:43 +01:00
posix_acl.c fs: make helpers idmap mount aware 2021-01-24 14:27:20 +01:00
proc_namespace.c fs: introduce MOUNT_ATTR_IDMAP 2021-01-24 14:43:45 +01:00
read_write.c teach sendfile(2) to handle send-to-pipe directly 2021-01-25 23:29:36 -05:00
readdir.c readdir: make sure to verify directory entry for legacy interfaces too 2021-04-17 11:39:49 -07:00
remap_range.c ioctl: handle idmapped mounts 2021-01-24 14:27:19 +01:00
select.c kernel, fs: Introduce and use set_restart_fn() and arch_set_restart_data() 2021-03-16 22:13:10 +01:00
seq_file.c seq_file: disallow extremely large seq buffer allocations 2021-07-19 17:18:48 -07:00
signalfd.c signalfd: Remove SIL_PERF_EVENT fields from signalfd_siginfo 2021-05-18 16:20:54 -05:00
splice.c for-5.12/block-2021-02-17 2021-02-21 11:02:48 -08:00
stack.c
stat.c fs: fix reporting supported extra file attributes for statx() 2021-04-17 23:03:50 -04:00
statfs.c s390,alpha: switch to 64-bit ino_t 2021-02-13 17:17:53 +01:00
super.c block: move bd_mutex to struct gendisk 2021-06-01 07:44:32 -06:00
sync.c
timerfd.c
userfaultfd.c userfaultfd: do not untag user pointers 2021-07-23 17:43:28 -07:00
utimes.c utimes: handle idmapped mounts 2021-01-24 14:27:18 +01:00
xattr.c xattr: fix kernel-doc for mnt_userns and vfs xattr helpers 2021-03-23 11:20:26 +01:00