linux-stable/fs/xfs
Dave Chinner 19f4e7cc81 xfs: Fix CIL throttle hang when CIL space used going backwards
A hang with tasks stuck on the CIL hard throttle was reported and
largely diagnosed by Donald Buczek, who discovered that it was a
result of the CIL context space usage decrementing in committed
transactions once the hard throttle limit had been hit and processes
were already blocked.  This resulted in the CIL push not waking up
those waiters because the CIL context was no longer over the hard
throttle limit.

The surprising aspect of this was the CIL space usage going
backwards regularly enough to trigger this situation. Assumptions
had been made in design that the relogging process would only
increase the size of the objects in the CIL, and so that space would
only increase.

This change and commit message fixes the issue and documents the
result of an audit of the triggers that can cause the CIL space to
go backwards, how large the backwards steps tend to be, the
frequency in which they occur, and what the impact on the CIL
accounting code is.

Even though the CIL ctx->space_used can go backwards, it will only
do so if the log item is already logged to the CIL and contains a
space reservation for it's entire logged state. This is tracked by
the shadow buffer state on the log item. If the item is not
previously logged in the CIL it has no shadow buffer nor log vector,
and hence the entire size of the logged item copied to the log
vector is accounted to the CIL space usage. i.e.  it will always go
up in this case.

If the item has a log vector (i.e. already in the CIL) and the size
decreases, then the existing log vector will be overwritten and the
space usage will go down. This is the only condition where the space
usage reduces, and it can only occur when an item is already tracked
in the CIL. Hence we are safe from CIL space usage underruns as a
result of log items decreasing in size when they are relogged.

Typically this reduction in CIL usage occurs from metadata blocks
being free, such as when a btree block merge occurs or a directory
enter/xattr entry is removed and the da-tree is reduced in size.
This generally results in a reduction in size of around a single
block in the CIL, but also tends to increase the number of log
vectors because the parent and sibling nodes in the tree needs to be
updated when a btree block is removed. If a multi-level merge
occurs, then we see reduction in size of 2+ blocks, but again the
log vector count goes up.

The other vector is inode fork size changes, which only log the
current size of the fork and ignore the previously logged size when
the fork is relogged. Hence if we are removing items from the inode
fork (dir/xattr removal in shortform, extent record removal in
extent form, etc) the relogged size of the inode for can decrease.

No other log items can decrease in size either because they are a
fixed size (e.g. dquots) or they cannot be relogged (e.g. relogging
an intent actually creates a new intent log item and doesn't relog
the old item at all.) Hence the only two vectors for CIL context
size reduction are relogging inode forks and marking buffers active
in the CIL as stale.

Long story short: the majority of the code does the right thing and
handles the reduction in log item size correctly, and only the CIL
hard throttle implementation is problematic and needs fixing. This
patch makes that fix, as well as adds comments in the log item code
that result in items shrinking in size when they are relogged as a
clear reminder that this can and does happen frequently.

The throttle fix is based upon the change Donald proposed, though it
goes further to ensure that once the throttle is activated, it
captures all tasks until the CIL push issues a wakeup, regardless of
whether the CIL space used has gone back under the throttle
threshold.

This ensures that we prevent tasks reducing the CIL slightly under
the throttle threshold and then making more changes that push it
well over the throttle limit. This is acheived by checking if the
throttle wait queue is already active as a condition of throttling.
Hence once we start throttling, we continue to apply the throttle
until the CIL context push wakes everything on the wait queue.

We can use waitqueue_active() for the waitqueue manipulations and
checks as they are all done under the ctx->xc_push_lock. Hence the
waitqueue has external serialisation and we can safely peek inside
the wait queue without holding the internal waitqueue locks.

Many thanks to Donald for his diagnostic and analysis work to
isolate the cause of this hang.

Reported-and-tested-by: Donald Buczek <buczek@molgen.mpg.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-06-21 10:06:14 -07:00
..
libxfs xfs: log stripe roundoff is a property of the log 2021-06-18 08:21:48 -07:00
scrub xfs: initial agnumber -> perag conversions for shrink 2021-06-08 09:13:13 -07:00
Kconfig xfs: fix Kconfig asking about XFS_SUPPORT_V4 when XFS_FS=n 2020-10-16 15:34:28 -07:00
kmem.c
kmem.h xfs: Remove kmem_zalloc_large() 2020-09-15 20:52:41 -07:00
Makefile
mrlock.h
xfs.h
xfs_acl.c xfs: support idmapped mounts 2021-01-24 14:43:46 +01:00
xfs_acl.h fs: make helpers idmap mount aware 2021-01-24 14:27:20 +01:00
xfs_aops.c iomap: remove unused private field from ioend 2021-05-04 08:54:29 -07:00
xfs_aops.h
xfs_attr_inactive.c xfs: Add delay ready attr remove routines 2021-06-01 10:49:47 -07:00
xfs_attr_list.c xfs: rename and simplify xfs_bmap_one_block 2021-04-15 09:35:50 -07:00
xfs_bio_io.c xfs: async blkdev cache flush 2021-06-21 10:05:51 -07:00
xfs_bmap_item.c treewide: Change list_sort to use const pointers 2021-04-08 16:04:22 -07:00
xfs_bmap_item.h
xfs_bmap_util.c xfs: remove unnecessary shifts 2021-06-01 12:53:59 -07:00
xfs_bmap_util.h
xfs_buf.c xfs: remove xfs_blkdev_issue_flush 2021-06-21 10:05:46 -07:00
xfs_buf.h xfs: remove ->b_offset handling for page backed buffers 2021-06-07 11:49:50 +10:00
xfs_buf_item.c xfs: Fix CIL throttle hang when CIL space used going backwards 2021-06-21 10:06:14 -07:00
xfs_buf_item.h xfs: move the buffer retry logic to xfs_buf.c 2020-09-15 20:52:38 -07:00
xfs_buf_item_recover.c xfs: fix finobt btree block recovery ordering 2020-09-30 07:28:52 -07:00
xfs_dir2_readdir.c xfs: remove XFS_IFINLINE 2021-04-15 09:35:51 -07:00
xfs_discard.c xfs: convert allocbt cursors to use perags 2021-06-02 10:48:24 +10:00
xfs_discard.h
xfs_dquot.c xfs: move the XFS_IFEXTENTS check into xfs_iread_extents 2021-04-15 09:35:50 -07:00
xfs_dquot.h xfs: refactor default quota grace period setting code 2020-09-15 20:52:40 -07:00
xfs_dquot_item.c
xfs_dquot_item.h
xfs_dquot_item_recover.c
xfs_error.c xfs: add error injection for per-AG resv failure 2021-03-25 16:47:53 -07:00
xfs_error.h
xfs_export.c
xfs_export.h
xfs_extent_busy.c xfs: pass perags through to the busy extent code 2021-06-02 10:48:24 +10:00
xfs_extent_busy.h xfs: pass perags through to the busy extent code 2021-06-02 10:48:24 +10:00
xfs_extfree_item.c treewide: Change list_sort to use const pointers 2021-04-08 16:04:22 -07:00
xfs_extfree_item.h
xfs_file.c xfs: remove xfs_blkdev_issue_flush 2021-06-21 10:05:46 -07:00
xfs_filestream.c xfs: move xfs_perag_get/put to xfs_ag.[ch] 2021-06-02 10:48:24 +10:00
xfs_filestream.h xfs: move the di_flags field to struct xfs_inode 2021-04-07 14:37:05 -07:00
xfs_fsmap.c xfs: remove agno from btree cursor 2021-06-02 10:48:24 +10:00
xfs_fsmap.h xfs: fix deadlock and streamline xfs_getfsmap performance 2020-10-07 08:40:29 -07:00
xfs_fsops.c xfs: make for_each_perag... a first class citizen 2021-06-02 10:48:24 +10:00
xfs_fsops.h xfs: get rid of xfs_growfs_{data,log}_t 2021-02-03 09:18:50 -08:00
xfs_globals.c xfs: consolidate the eofblocks and cowblocks workers 2021-02-03 09:18:49 -08:00
xfs_health.c xfs: drop IDONTCACHE on inodes when we mark them sick 2021-06-08 09:30:20 -07:00
xfs_icache.c xfs: rename struct xfs_eofblocks to xfs_icwalk 2021-06-08 09:30:20 -07:00
xfs_icache.h xfs: rename struct xfs_eofblocks to xfs_icwalk 2021-06-08 09:30:20 -07:00
xfs_icreate_item.c
xfs_icreate_item.h
xfs_inode.c xfs: clean up incore inode walk functions 2021-06-08 09:26:44 -07:00
xfs_inode.h xfs: get rid of xfs_dir_ialloc() 2021-06-02 10:48:24 +10:00
xfs_inode_item.c xfs: Fix CIL throttle hang when CIL space used going backwards 2021-06-21 10:06:14 -07:00
xfs_inode_item.h xfs: move the buffer retry logic to xfs_buf.c 2020-09-15 20:52:38 -07:00
xfs_inode_item_recover.c xfs: rename struct xfs_legacy_ictimestamp 2021-04-22 18:29:25 -07:00
xfs_ioctl.c xfs: rename struct xfs_eofblocks to xfs_icwalk 2021-06-08 09:30:20 -07:00
xfs_ioctl.h xfs: convert to fileattr 2021-04-12 15:04:29 +02:00
xfs_ioctl32.c xfs: convert to fileattr 2021-04-12 15:04:29 +02:00
xfs_ioctl32.h xfs: convert to fileattr 2021-04-12 15:04:29 +02:00
xfs_iomap.c xfs: remove XFS_IFEXTENTS 2021-04-15 09:35:51 -07:00
xfs_iomap.h
xfs_iops.c xfs: clean up open-coded fs block unit conversions 2021-06-01 12:53:59 -07:00
xfs_iops.h xfs: support idmapped mounts 2021-01-24 14:43:46 +01:00
xfs_itable.c xfs: move the di_crtime field to struct xfs_inode 2021-04-07 14:37:05 -07:00
xfs_itable.h xfs: support idmapped mounts 2021-01-24 14:43:46 +01:00
xfs_iwalk.c xfs: use perag for ialloc btree cursors 2021-06-02 10:48:24 +10:00
xfs_iwalk.h
xfs_linux.h xfs: async blkdev cache flush 2021-06-21 10:05:51 -07:00
xfs_log.c xfs: journal IO cache flush reductions 2021-06-21 10:06:08 -07:00
xfs_log.h xfs: journal IO cache flush reductions 2021-06-21 10:06:08 -07:00
xfs_log_cil.c xfs: Fix CIL throttle hang when CIL space used going backwards 2021-06-21 10:06:14 -07:00
xfs_log_priv.h xfs: journal IO cache flush reductions 2021-06-21 10:06:08 -07:00
xfs_log_recover.c xfs: convert raw ag walks to use for_each_perag 2021-06-02 10:48:24 +10:00
xfs_message.c
xfs_message.h xfs: validate extsz hints against rt extent size when rtinherit is set 2021-05-24 18:01:04 -07:00
xfs_mount.c xfs: move perag structure and setup to libxfs/xfs_ag.[ch] 2021-06-02 10:48:24 +10:00
xfs_mount.h xfs: move perag structure and setup to libxfs/xfs_ag.[ch] 2021-06-02 10:48:24 +10:00
xfs_mru_cache.c xfs: set WQ_SYSFS on all workqueues in debug mode 2021-02-03 09:18:49 -08:00
xfs_mru_cache.h
xfs_ondisk.h xfs: rename struct xfs_legacy_ictimestamp 2021-04-22 18:29:25 -07:00
xfs_pnfs.c xfs: move the di_size field to struct xfs_inode 2021-04-07 14:37:03 -07:00
xfs_pnfs.h
xfs_pwork.c xfs: increase the default parallelism levels of pwork clients 2021-02-03 09:18:49 -08:00
xfs_pwork.h xfs: increase the default parallelism levels of pwork clients 2021-02-03 09:18:49 -08:00
xfs_qm.c xfs: get rid of xfs_dir_ialloc() 2021-06-02 10:48:24 +10:00
xfs_qm.h xfs: move the quotaoff dqrele inode walk into xfs_icache.c 2021-06-03 15:56:02 -07:00
xfs_qm_bhv.c xfs: move the di_projid field to struct xfs_inode 2021-04-07 14:37:03 -07:00
xfs_qm_syscalls.c xfs: move the quotaoff dqrele inode walk into xfs_icache.c 2021-06-03 15:56:02 -07:00
xfs_quota.h xfs: remove xfs_qm_vop_chown_reserve 2021-02-03 09:18:49 -08:00
xfs_quotaops.c xfs: move the di_nblocks field to struct xfs_inode 2021-04-07 14:37:03 -07:00
xfs_refcount_item.c treewide: Change list_sort to use const pointers 2021-04-08 16:04:22 -07:00
xfs_refcount_item.h
xfs_reflink.c xfs: convert refcount btree cursor to use perags 2021-06-02 10:48:24 +10:00
xfs_reflink.h
xfs_rmap_item.c treewide: Change list_sort to use const pointers 2021-04-08 16:04:22 -07:00
xfs_rmap_item.h
xfs_rtalloc.c xfs: move the di_flags field to struct xfs_inode 2021-04-07 14:37:05 -07:00
xfs_rtalloc.h xfs: remove xfs_buf_t typedef 2020-12-16 16:07:34 -08:00
xfs_stats.c xfs: periodically relog deferred intent items 2020-10-07 08:40:28 -07:00
xfs_stats.h xfs: periodically relog deferred intent items 2020-10-07 08:40:28 -07:00
xfs_super.c xfs: remove xfs_blkdev_issue_flush 2021-06-21 10:05:46 -07:00
xfs_super.h xfs: remove xfs_blkdev_issue_flush 2021-06-21 10:05:46 -07:00
xfs_symlink.c xfs: get rid of xfs_dir_ialloc() 2021-06-02 10:48:24 +10:00
xfs_symlink.h xfs: support idmapped mounts 2021-01-24 14:43:46 +01:00
xfs_sysctl.c xfs: restore speculative_cow_prealloc_lifetime sysctl 2021-02-24 10:16:08 -08:00
xfs_sysctl.h xfs: consolidate the eofblocks and cowblocks workers 2021-02-03 09:18:49 -08:00
xfs_sysfs.c
xfs_sysfs.h
xfs_trace.c xfs: move perag structure and setup to libxfs/xfs_ag.[ch] 2021-06-02 10:48:24 +10:00
xfs_trace.h Merge tag 'xfs-delay-ready-attrs-v20.1' of https://github.com/allisonhenderson/xfs_work into xfs-5.14-merge4 2021-06-18 08:13:22 -07:00
xfs_trans.c xfs: update superblock counters correctly for !lazysbcount 2021-04-29 07:44:18 -07:00
xfs_trans.h xfs: remove obsolete AGF counter debugging 2021-04-29 07:44:18 -07:00
xfs_trans_ail.c
xfs_trans_buf.c xfs: remove xfs_buf_t typedef 2020-12-16 16:07:34 -08:00
xfs_trans_dquot.c xfs: shut down the filesystem if we screw up quota reservation 2021-02-03 09:18:49 -08:00
xfs_trans_priv.h
xfs_xattr.c xfs: prevent metadata files from being inactivated 2021-03-25 16:47:50 -07:00