linux-stable/fs/xfs
Darrick J. Wong ecd49f7a36 xfs: fix per-cpu CIL structure aggregation racing with dying cpus
In commit 7c8ade2121 ("xfs: implement percpu cil space used
calculation"), the XFS committed (log) item list code was converted to
use per-cpu lists and space tracking to reduce cpu contention when
multiple threads are modifying different parts of the filesystem and
hence end up contending on the log structures during transaction commit.
Each CPU tracks its own commit items and space usage, and these do not
have to be merged into the main CIL until either someone wants to push
the CIL items, or we run over a soft threshold and switch to slower (but
more accurate) accounting with atomics.

Unfortunately, the for_each_cpu iteration suffers from the same race
with dying cpus that was identified in commit 8b57b11cca
("pcpcntrs: fix dying cpu summation race") -- CPUs are removed from
cpu_online_mask before the CPUHP_XFS_DEAD callback gets called.  As a
result, both CIL percpu structure aggregation functions fail to collect
the items and accounted space usage at the correct point in time.

If we're lucky, the items that are collected from the online cpus exceed
the space given to those cpus, and the log immediately shuts down in
xlog_cil_insert_items due to the (apparent) log reservation overrun.
This happens periodically with generic/650, which exercises cpu hotplug
vs. the filesystem code:

smpboot: CPU 3 is now offline
XFS (sda3): ctx ticket reservation ran out. Need to up reservation
XFS (sda3): ticket reservation summary:
XFS (sda3):   unit res    = 9268 bytes
XFS (sda3):   current res = -40 bytes
XFS (sda3):   original count  = 1
XFS (sda3):   remaining count = 1
XFS (sda3): Filesystem has been shut down due to log error (0x2).

Applying the same sort of fix from 8b57b11cca to the CIL code seems
to make the generic/650 problem go away, but I've been told that tglx
was not happy when he saw:

"...the only thing we actually need to care about is that
percpu_counter_sum() iterates dying CPUs. That's trivial to do, and when
there are no CPUs dying, it has no addition overhead except for a
cpumask_or() operation."

The CPU hotplug code is rather complex and difficult to understand, and I
don't want to try to understand the cpu hotplug locking well enough to
use the cpu_dying mask.  Furthermore, there's a performance improvement that
could be had here.  Attach a private cpu mask to the CIL structure so
that we can track exactly which cpus have accessed the percpu data at
all.  It doesn't matter if the cpu has since gone offline; log item
aggregation will still find the items.  Better yet, we skip cpus that
have not recently logged anything.
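
The private-mask idea above can be sketched in userspace C.  This is a
toy model, not the kernel code: every name in it (toy_cil, toy_pcp,
pcpmask, online_mask, and the helper functions) is hypothetical, plain
integers stand in for cpumasks, and there is no locking or real per-cpu
storage.  It only illustrates why walking a mask of "cpus that touched
the percpu data" still finds a dead cpu's contribution, while walking
the online mask does not.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR_CPUS 4

/* Hypothetical per-cpu bucket: space accounted by one cpu. */
struct toy_pcp {
	int space_used;
};

/* Hypothetical CIL stand-in with a private "touched" mask. */
struct toy_cil {
	struct toy_pcp pcp[TOY_NR_CPUS];
	uint32_t online_mask;	/* stand-in for cpu_online_mask */
	uint32_t pcpmask;	/* cpus that have touched pcp[] since last drain */
};

static void toy_cil_init(struct toy_cil *cil)
{
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		cil->pcp[cpu].space_used = 0;
	cil->online_mask = (1u << TOY_NR_CPUS) - 1;
	cil->pcpmask = 0;
}

/* A cpu records itself in the private mask before using its bucket. */
static void toy_cil_insert(struct toy_cil *cil, int cpu, int bytes)
{
	cil->pcpmask |= 1u << cpu;
	cil->pcp[cpu].space_used += bytes;
}

/* Model the hotplug step where the cpu leaves the online mask first. */
static void toy_cpu_offline(struct toy_cil *cil, int cpu)
{
	cil->online_mask &= ~(1u << cpu);
}

/* Non-destructive sum over whichever mask the caller picks. */
static int toy_sum(const struct toy_cil *cil, uint32_t mask)
{
	int total = 0;

	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		if (mask & (1u << cpu))
			total += cil->pcp[cpu].space_used;
	return total;
}

/* Drain only the cpus in the private mask, then clear the mask. */
static int toy_cil_aggregate(struct toy_cil *cil)
{
	int total = 0;

	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		if (!(cil->pcpmask & (1u << cpu)))
			continue;	/* never logged anything: skip */
		total += cil->pcp[cpu].space_used;
		cil->pcp[cpu].space_used = 0;
	}
	cil->pcpmask = 0;
	return total;
}
```

In this model, if cpu 1 logs 100 bytes, cpu 3 logs 40 bytes, and cpu 3
then goes offline, toy_sum() over online_mask sees only 100 bytes while
toy_cil_aggregate() collects all 140 and visits just the two cpus that
logged anything.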

Worse yet, Ritesh Harjani and Eric Sandeen both reported today that CPU
hot remove racing with an xfs mount can crash if the cpu_dead notifier
tries to access the log but the mount hasn't yet set up the log.

Link: https://lore.kernel.org/linux-xfs/ZOLzgBOuyWHapOyZ@dread.disaster.area/T/
Link: https://lore.kernel.org/lkml/877cuj1mt1.ffs@tglx/
Link: https://lore.kernel.org/lkml/20230414162755.281993820@linutronix.de/
Link: https://lore.kernel.org/linux-xfs/ZOVkjxWZq0YmjrJu@dread.disaster.area/T/
Cc: tglx@linutronix.de
Cc: peterz@infradead.org
Reported-by: ritesh.list@gmail.com
Reported-by: sandeen@sandeen.net
Fixes: af1c2146a5 ("xfs: introduce per-cpu CIL tracking structure")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-09-11 08:39:02 -07:00
libxfs New code for 6.6: 2023-08-30 12:34:12 -07:00
scrub New code for 6.6: 2023-08-30 12:34:12 -07:00
Kconfig xfs: track usage statistics of online fsck 2023-08-10 07:48:07 -07:00
kmem.c
kmem.h
Makefile xfs: move the realtime summary file scrubber to a separate source file 2023-08-10 07:48:09 -07:00
mrlock.h
xfs.h
xfs_acl.c xfs: convert to ctime accessor functions 2023-07-24 10:30:06 +02:00
xfs_acl.h
xfs_aops.c New code for 6.6: 2023-08-30 12:34:12 -07:00
xfs_aops.h
xfs_attr_inactive.c
xfs_attr_item.c
xfs_attr_item.h
xfs_attr_list.c
xfs_bio_io.c
xfs_bmap_item.c
xfs_bmap_item.h
xfs_bmap_util.c xfs: convert to ctime accessor functions 2023-07-24 10:30:06 +02:00
xfs_bmap_util.h
xfs_buf.c New code for 6.6: 2023-08-30 12:34:12 -07:00
xfs_buf.h xfs: allow scanning ranges of the buffer cache for live buffers 2023-08-10 07:48:03 -07:00
xfs_buf_item.c xfs: buffer pins need to hold a buffer reference 2023-06-05 04:05:27 +10:00
xfs_buf_item.h
xfs_buf_item_recover.c
xfs_dahash_test.c
xfs_dahash_test.h
xfs_dir2_readdir.c
xfs_discard.c
xfs_discard.h
xfs_dquot.c xfs: fix dqiterate thinko 2023-08-18 13:42:36 +05:30
xfs_dquot.h
xfs_dquot_item.c
xfs_dquot_item.h
xfs_dquot_item_recover.c
xfs_drain.c
xfs_drain.h
xfs_error.c
xfs_error.h
xfs_export.c
xfs_export.h
xfs_extent_busy.c xfs: don't block in busy flushing when freeing extents 2023-06-29 09:28:24 -07:00
xfs_extent_busy.h xfs: don't block in busy flushing when freeing extents 2023-06-29 09:28:24 -07:00
xfs_extfree_item.c xfs: Remove unneeded semicolon 2023-07-03 09:48:18 -07:00
xfs_extfree_item.h
xfs_file.c mm: remove enum page_entry_size 2023-08-24 16:20:30 -07:00
xfs_filestream.c xfs: fix double xfs_perag_rele() in xfs_filestream_pick_ag() 2023-06-05 14:48:15 +10:00
xfs_filestream.h
xfs_fsmap.c xfs: fix an agbno overflow in __xfs_getfsmap_datadev 2023-09-11 08:39:02 -07:00
xfs_fsmap.h
xfs_fsops.c Minor cleanups for 6.5: 2023-07-09 09:50:42 -07:00
xfs_fsops.h
xfs_globals.c
xfs_health.c
xfs_icache.c xfs: hide xfs_inode_is_allocated in scrub common code 2023-08-10 07:48:12 -07:00
xfs_icache.h xfs: hide xfs_inode_is_allocated in scrub common code 2023-08-10 07:48:12 -07:00
xfs_icreate_item.c
xfs_icreate_item.h
xfs_inode.c xfs: convert to ctime accessor functions 2023-07-24 10:30:06 +02:00
xfs_inode.h xfs: collect errors from inodegc for unlinked inode recovery 2023-06-05 14:48:15 +10:00
xfs_inode_item.c xfs: convert to ctime accessor functions 2023-07-24 10:30:06 +02:00
xfs_inode_item.h xfs: fix AGF vs inode cluster buffer deadlock 2023-06-05 04:08:27 +10:00
xfs_inode_item_recover.c
xfs_ioctl.c
xfs_ioctl.h
xfs_ioctl32.c
xfs_ioctl32.h
xfs_iomap.c xfs: don't allocate into the data fork for an unshare request 2023-05-02 09:14:51 +10:00
xfs_iomap.h
xfs_iops.c xfs: switch to multigrain timestamps 2023-08-11 09:04:57 +02:00
xfs_iops.h
xfs_itable.c xfs: convert to ctime accessor functions 2023-07-24 10:30:06 +02:00
xfs_itable.h
xfs_iunlink_item.c
xfs_iunlink_item.h
xfs_iwalk.c
xfs_iwalk.h
xfs_linux.h xfs: create scaffolding for creating debugfs entries 2023-08-10 07:48:07 -07:00
xfs_log.c xfs: journal geometry is not properly bounds checked 2023-06-29 09:28:24 -07:00
xfs_log.h
xfs_log_cil.c xfs: fix per-cpu CIL structure aggregation racing with dying cpus 2023-09-11 08:39:02 -07:00
xfs_log_priv.h xfs: fix per-cpu CIL structure aggregation racing with dying cpus 2023-09-11 08:39:02 -07:00
xfs_log_recover.c xfs: collect errors from inodegc for unlinked inode recovery 2023-06-05 14:48:15 +10:00
xfs_message.c
xfs_message.h
xfs_mount.c xfs: track usage statistics of online fsck 2023-08-10 07:48:07 -07:00
xfs_mount.h xfs: track usage statistics of online fsck 2023-08-10 07:48:07 -07:00
xfs_mru_cache.c
xfs_mru_cache.h
xfs_notify_failure.c xfs: fix the calculation for "end" and "length" 2023-07-02 09:26:19 -07:00
xfs_ondisk.h xfs: convert flex-array declarations in xfs attr shortform objects 2023-07-17 08:48:56 -07:00
xfs_pnfs.c
xfs_pnfs.h
xfs_pwork.c
xfs_pwork.h
xfs_qm.c
xfs_qm.h
xfs_qm_bhv.c
xfs_qm_syscalls.c
xfs_quota.h
xfs_quotaops.c
xfs_refcount_item.c
xfs_refcount_item.h
xfs_reflink.c xfs: use deferred frees for btree block freeing 2023-06-29 09:28:23 -07:00
xfs_reflink.h
xfs_rmap_item.c
xfs_rmap_item.h
xfs_rtalloc.c
xfs_rtalloc.h
xfs_stats.c
xfs_stats.h
xfs_super.c xfs: fix per-cpu CIL structure aggregation racing with dying cpus 2023-09-11 08:39:02 -07:00
xfs_super.h xfs: create scaffolding for creating debugfs entries 2023-08-10 07:48:07 -07:00
xfs_symlink.c
xfs_symlink.h
xfs_sysctl.c
xfs_sysctl.h
xfs_sysfs.c
xfs_sysfs.h
xfs_trace.c
xfs_trace.h New code for 6.6: 2023-08-30 12:34:12 -07:00
xfs_trans.c xfs: collect errors from inodegc for unlinked inode recovery 2023-06-05 14:48:15 +10:00
xfs_trans.h
xfs_trans_ail.c xfs: don't reverse order of items in bulk AIL insertion 2023-06-29 09:28:23 -07:00
xfs_trans_buf.c
xfs_trans_dquot.c
xfs_trans_priv.h
xfs_xattr.c
xfs_xattr.h