Commit graph

73 commits

Author SHA1 Message Date
Darrick J. Wong
46e0dd8965 xfs: rename xchk_get_inode -> xchk_iget_for_scrubbing
Dave Chinner suggested renaming this function to make more obvious what
it does.  The function returns an incore inode to callers that want to
scrub a metadata structure that hangs off an inode.  If the iget fails
with EINVAL, it will single-step the loading process to distinguish
between actually free inodes or impossible inumbers (ENOENT);
discrepancies between the inobt freemask and the free status in the
inode record (EFSCORRUPTED).  Any other negative errno is returned
unchanged.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-04-11 19:00:21 -07:00
Darrick J. Wong
30f8ee5e7e xfs: ensure that single-owner file blocks are not owned by others
For any file fork mapping that can only have a single owner, make sure
that there are no other rmap owners for that mapping.  This patch
requires the more detailed checking provided by xfs_rmap_count_owners so
that we can know how many rmap records for a given range of space had a
matching owner, how many had a non-matching owner, and how many
conflicted with the records that have a matching owner.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-04-11 19:00:16 -07:00
Darrick J. Wong
7ac14fa2bd xfs: ensure that all metadata and data blocks are not cow staging extents
Make sure that all filesystem metadata blocks and file data blocks are
not also marked as CoW staging extents.  The extra checking added here
was inspired by an actual VM host filesystem corruption incident due to
bugs in the CoW handling of 4.x kernels.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-04-11 19:00:12 -07:00
Darrick J. Wong
69010fe3ac xfs: standardize ondisk to incore conversion for bmap btrees
Fix all xfs_bmbt_disk_get_all callsites to call xfs_bmap_validate_extent
and bubble up corruption reports.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-04-11 19:00:04 -07:00
Darrick J. Wong
466c525d6d xfs: minimize overhead of drain wakeups by using jump labels
To reduce the runtime overhead even further when online fsck isn't
running, use a static branch key to decide if we call wake_up on the
drain.  For compilers that support jump labels, the call to wake_up is
replaced by a nop sled when nobody is waiting for intents to drain.

From my initial microbenchmarking, every transition of the static key
between the on and off states takes about 22000ns to complete; this is
paid entirely by the xfs_scrub process.  When the static key is off
(which it should be when fsck isn't running), the nop sled adds an
overhead of approximately 0.36ns to runtime code.  The post-atomic
lockless waiter check adds about 0.03ns, which is basically free.

For the few compilers that don't support jump labels, runtime code pays
the cost of calling wake_up on an empty waitqueue, which was observed to
be about 30ns.  However, most architectures that have sufficient memory
and CPU capacity to run XFS also support jump labels, so this is not
much of a worry.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-04-11 18:59:59 -07:00
Darrick J. Wong
ecc73f8a58 xfs: update copyright years for scrub/ files
Update the copyright years in the scrub/ source code files.  This isn't
required, but it's helpful to remind myself just how long it's taken to
develop this feature.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-04-11 18:59:57 -07:00
Darrick J. Wong
739a2fe042 xfs: fix author and spdx headers on scrub/ files
Fix the spdx tags to match current practice, and update the author
contact information.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-04-11 18:59:56 -07:00
Dave Chinner
c4d5660afb xfs: active perag reference counting
We need to be able to dynamically remove instantiated AGs from
memory safely, either for shrinking the filesystem or paging AG
state in and out of memory (e.g. supporting millions of AGs). This
means we need to be able to safely exclude operations from accessing
perags while dynamic removal is in progress.

To do this, introduce the concept of active and passive references.
Active references are required for high level operations that make
use of an AG for a given operation (e.g. allocation) and pin the
perag in memory for the duration of the operation that is operating
on the perag (e.g. transaction scope). This means we can fail to get
an active reference to an AG, hence callers of the new active
reference API must be able to handle lookup failure gracefully.

Passive references are used in low level code, where we might need
to access the perag structure for the purposes of completing high
level operations. For example, buffers need to use passive
references because:
- we need to be able to do metadata IO during operations like grow
  and shrink transactions where high level active references to the
  AG have already been blocked
- buffers need to pin the perag until they are reclaimed from
  memory, something that high level code has no direct control over.
- unused cached buffers should not prevent a shrink from being
  started.

Hence we have active references that will form exclusion barriers
for operations to be performed on an AG, and passive references that
will prevent reclaim of the perag until all objects with passive
references have been reclaimed themselves.

This patch introduce xfs_perag_grab()/xfs_perag_rele() as the API
for active AG reference functionality. We also need to convert the
for_each_perag*() iterators to use active references, which will
start the process of converting high level code over to using active
references. Conversion of non-iterator based code to active
references will be done in followup patches.

Note that the implementation using reference counting is really just
a development vehicle for the API to ensure we don't have any leaks
in the callers. Once we need to remove perag structures from memory
dyanmically, we will need a much more robust per-ag state transition
mechanism for preventing new references from being taken while we
wait for existing references to drain before removal from memory can
occur....

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2023-02-13 09:14:42 +11:00
Darrick J. Wong
5eef46358f xfs: teach scrub to flag non-extents format cow forks
CoW forks only exist in memory, which means that they can only ever have
an incore extent tree.  Hence they must always be FMT_EXTENTS, so check
this when we're scrubbing them.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-11-16 15:25:05 -08:00
Darrick J. Wong
3178553701 xfs: check that CoW fork extents are not shared
Ensure that extents in an inode's CoW fork are not marked as shared in
the refcount btree.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-11-16 15:25:04 -08:00
Darrick J. Wong
830ffa09fb xfs: block map scrub should handle incore delalloc reservations
Enhance the block map scrubber to check delayed allocation reservations.
Though there are no physical space allocations to check, we do need to
make sure that the range of file offsets being mapped are correct, and
to bump the lastoff cursor so that key order checking works correctly.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-11-16 15:25:04 -08:00
Darrick J. Wong
6a5777865e xfs: teach scrub to check for adjacent bmaps when rmap larger than bmap
When scrub is checking file fork mappings against rmap records and
the rmap record starts before or ends after the bmap record, check the
adjacent bmap records to make sure that they're adjacent to the one
we're checking.  This helps us to detect cases where the rmaps cover
territory that the bmaps do not.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-11-16 15:25:04 -08:00
Darrick J. Wong
033985b6fe xfs: fix perag loop in xchk_bmap_check_rmaps
sparse complains that we can return an uninitialized error from this
function and that pag could be uninitialized.  We know that there are no
zero-AG filesystems and hence we had to call xchk_bmap_check_ag_rmaps at
least once, so this is not actually possible, but I'm too worn out from
automated complaints from unsophisticated AIs so let's just fix this and
move on to more interesting problems, eh?

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-11-16 15:25:04 -08:00
Darrick J. Wong
732436ef91 xfs: convert XFS_IFORK_PTR to a static inline helper
We're about to make this logic do a bit more, so convert the macro to a
static inline function for better typechecking and fewer shouty macros.
No functional changes here.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-07-09 15:17:21 -07:00
Dave Chinner
08d3e84fee xfs: pass perag to xfs_alloc_read_agf()
xfs_alloc_read_agf() initialises the perag if it hasn't been done
yet, so it makes sense to pass it the perag rather than pull a
reference from the buffer. This allows callers to be per-ag centric
rather than passing mount/agno pairs everywhere.

Whilst modifying the xfs_reflink_find_shared() function definition,
declare it static and remove the extern declaration as it is an
internal function only these days.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2022-07-07 19:07:40 +10:00
Darrick J. Wong
5b7ca8b313 xfs: simplify xfs_rmap_lookup_le call sites
Most callers of xfs_rmap_lookup_le will retrieve the btree record
immediately if the lookup succeeds.  The overlapped version of this
function (xfs_rmap_lookup_le_range) will return the record if the lookup
succeeds, so make the regular version do it too.  Get rid of the useless
len argument, since it's not part of the lookup key.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2022-04-27 10:47:19 -07:00
Chandan Babu R
95f0b95e2b xfs: Define max extent length based on on-disk format definition
The maximum extent length depends on maximum block count that can be stored in
a BMBT record. Hence this commit defines MAXEXTLEN based on
BMBT_BLOCKCOUNT_BITLEN.

While at it, the commit also renames MAXEXTLEN to XFS_MAX_BMBT_EXTLEN.

Suggested-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
2022-04-11 04:11:17 +00:00
Darrick J. Wong
6ca444cfd6 xfs: prepare xfs_btree_cur for dynamic cursor heights
Split out the btree level information into a separate struct and put it
at the end of the cursor structure as a VLA.  Files with huge data forks
(and in the future, the realtime rmap btree) will require the ability to
support many more levels than a per-AG btree cursor, which means that
we're going to create per-btree type cursor caches to conserve memory
for the more common case.

Note that a subsequent patch actually introduces dynamic cursor heights.
This one merely rearranges the structure to prepare for that.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2021-10-19 11:45:14 -07:00
Darrick J. Wong
61e0d0cc51 xfs: fix perag structure refcounting error when scrub fails
The kernel test robot found the following bug when running xfs/355 to
scrub a bmap btree:

XFS: Assertion failed: !sa->pag, file: fs/xfs/scrub/common.c, line: 412
------------[ cut here ]------------
kernel BUG at fs/xfs/xfs_message.c:110!
invalid opcode: 0000 [#1] SMP PTI
CPU: 2 PID: 1415 Comm: xfs_scrub Not tainted 5.14.0-rc4-00021-g48c6615cc557 #1
Hardware name: Hewlett-Packard p6-1451cx/2ADA, BIOS 8.15 02/05/2013
RIP: 0010:assfail+0x23/0x28 [xfs]
RSP: 0018:ffffc9000aacb890 EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffffc9000aacbcc8 RCX: 0000000000000000
RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffffc09e7dcd
RBP: ffffc9000aacbc80 R08: ffff8881fdf17d50 R09: 0000000000000000
R10: 000000000000000a R11: f000000000000000 R12: 0000000000000000
R13: ffff88820c7ed000 R14: 0000000000000001 R15: ffffc9000aacb980
FS:  00007f185b955700(0000) GS:ffff8881fdf00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f7f6ef43000 CR3: 000000020de38002 CR4: 00000000001706e0
Call Trace:
 xchk_ag_read_headers+0xda/0x100 [xfs]
 xchk_ag_init+0x15/0x40 [xfs]
 xchk_btree_check_block_owner+0x76/0x180 [xfs]
 xchk_btree_get_block+0xd0/0x140 [xfs]
 xchk_btree+0x32e/0x440 [xfs]
 xchk_bmap_btree+0xd4/0x140 [xfs]
 xchk_bmap+0x1eb/0x3c0 [xfs]
 xfs_scrub_metadata+0x227/0x4c0 [xfs]
 xfs_ioc_scrub_metadata+0x50/0xc0 [xfs]
 xfs_file_ioctl+0x90c/0xc40 [xfs]
 __x64_sys_ioctl+0x83/0xc0
 do_syscall_64+0x3b/0xc0

The unusual handling of errors while initializing struct xchk_ag is the
root cause here.  Since the beginning of xfs_scrub, the goal of
xchk_ag_read_headers has been to read all three AG header buffers and
attach them both to the xchk_ag structure and the scrub transaction.
Corruption errors on any of the three headers doesn't necessarily
trigger an immediate return to userspace, because xfs_scrub can also
tell us to /fix/ the problem.

In other words, it's possible for the xchk_ag init functions to return
an error code and a partially filled out structure so that scrub can use
however much information it managed to pull.  Before 5.15, it was
sufficient to cancel (or commit) the scrub transaction on the way out of
the scrub code to release the buffers.

Ccommit 48c6615cc5 added a reference to the perag structure to struct
xchk_ag.  Since perag structures are not attached to transactions like
buffers are, this adds the requirement that the perag ref be released
explicitly.  The scrub teardown function xchk_teardown was amended to do
this for the xchk_ag embedded in struct xfs_scrub.

Unfortunately, I forgot that certain parts of the scrub code probe
multiple AGs and therefore handle the initialization and cleanup on
their own.  Specifically, the bmbt scrubber will initialize it long
enough to cross-reference AG metadata for btree blocks and for the
extent mappings in the bmbt.

If one of the AG headers is corrupt, the init function returns with a
live perag structure reference and some of the AG header buffers.  If an
error occurs, the cross referencing will be noted as XCORRUPTion and
skipped, but the main scrub process will move on to the next record.
It is now necessary to release the perag reference before we try to
analyze something from a different AG, or else we'll trip over the
assertion noted above.

Fixes: 48c6615cc5 ("xfs: grab active perag ref when reading AG headers")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
2021-08-20 13:20:33 -07:00
Dave Chinner
ebd9027d08 xfs: convert xfs_sb_version_has checks to use mount features
This is a conversion of the remaining xfs_sb_version_has..(sbp)
checks to use xfs_has_..(mp) feature checks.

This was largely done with a vim replacement macro that did:

:0,$s/xfs_sb_version_has\(.*\)&\(.*\)->m_sb/xfs_has_\1\2/g<CR>

A couple of other variants were also used, and the rest touched up
by hand.

$ size -t fs/xfs/built-in.a
	   text    data     bss     dec     hex filename
before	1127533  311352     484 1439369  15f689 (TOTALS)
after	1125360  311352     484 1437196  15ee0c (TOTALS)

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-19 10:07:14 -07:00
Dave Chinner
38c26bfd90 xfs: replace xfs_sb_version checks with feature flag checks
Convert the xfs_sb_version_hasfoo() to checks against
mp->m_features. Checks of the superblock itself during disk
operations (e.g. in the read/write verifiers and the to/from disk
formatters) are not converted - they operate purely on the
superblock state. Everything else should use the mount features.

Large parts of this conversion were done with sed with commands like
this:

for f in `git grep -l xfs_sb_version_has fs/xfs/*.c`; do
	sed -i -e 's/xfs_sb_version_has\(.*\)(&\(.*\)->m_sb)/xfs_has_\1(\2)/' $f
done

With manual cleanups for things like "xfs_has_extflgbit" and other
little inconsistencies in naming.

The result is ia lot less typing to check features and an XFS binary
size reduced by a bit over 3kB:

$ size -t fs/xfs/built-in.a
	text	   data	    bss	    dec	    hex	filenam
before	1130866  311352     484 1442702  16038e (TOTALS)
after	1127727  311352     484 1439563  15f74b (TOTALS)

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-19 10:07:12 -07:00
Darrick J. Wong
22ece4e836 xfs: mark the record passed into xchk_btree functions as const
xchk_btree calls a user-supplied function to validate each btree record
that it finds.  Those functions are not supposed to change the record
data, so mark the parameter const.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-08-18 18:46:01 -07:00
Darrick J. Wong
159eb69dba xfs: make the record pointer passed to query_range functions const
The query_range functions are supposed to call a caller-supplied
function on each record found in the dataset.  These functions don't
own the memory storing the record, so don't let them change the record.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-08-18 18:46:01 -07:00
Darrick J. Wong
48c6615cc5 xfs: grab active perag ref when reading AG headers
This patch prepares scrub to deal with the possibility of tearing down
entire AGs by changing the order of resource acquisition to match the
rest of the XFS codebase.  In other words, scrub now grabs AG resources
in order of: perag structure, then AGI/AGF/AGFL buffers, then btree
cursors; and releases them in reverse order.

This requires us to distinguish xchk_ag_init callers -- some are
responding to a user request to check AG metadata, in which case we can
return ENOENT to userspace; but other callers have an ondisk reference
to an AG that they're trying to cross-reference.  In this second case,
the lack of an AG means there's ondisk corruption, since ondisk metadata
cannot point into nonexistent space.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
2021-08-09 11:13:17 -07:00
Linus Torvalds
9f7b640f00 New code for 5.14:
- Refactor the buffer cache to use bulk page allocation
 - Convert agnumber-based AG iteration to walk per-AG structures
 - Clean up some unit conversions and other code warts
 - Reduce spinlock contention in the directio fastpath
 - Collapse all the inode cache walks into a single function
 - Remove indirect function calls from the inode cache walk code
 - Dramatically reduce the number of cache flushes sent when writing log
   buffers
 - Preserve inode sickness reports for longer
 - Rename xfs_eofblocks since it controls inode cache walks
 - Refactor the extended attribute code to prepare it for the addition
   of log intent items to make xattrs fully transactional
 - A few fixes to earlier large patchsets
 - Log recovery fixes so that we don't accidentally mark the log clean
   when log intent recovery fails
 - Fix some latent SOB errors
 - Clean up shutdown messages that get logged to dmesg
 - Fix a regression in the online shrink code
 - Fix a UAF in the buffer logging code if the fs goes offline
 - Fix uninitialized error variables
 - Fix a UAF in the CIL when commited log item callbacks race with a
   shutdown
 - Fix a bug where the CIL could hang trying to push part of the log ring
   buffer that hasn't been filled yet
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmDXP38ACgkQ+H93GTRK
 tOsKzw//eHvEgeyBo7ek06GDsUph2kQVR9AJWE7MNMiBFxlmL8R9H225xJK7Qmcr
 YswcyEeDq8cNXbXDA249ueuMb+DxhZPY68hPK5BJ3KsbvL2RZV0lJCbk492l4cgb
 IvBJiG/MDo55km83tdr81AlmFYQM7rSQz5MbVogGxxsnp0ul3VpIrJZba8kPRDQ1
 mZzH2fdlnE9Ozw/CfvjSgT1pySyFpxNeTRucYXUQil1hL1AGTBw7rGGNnccS090y
 u/EawQ4WJ131m8O3+WomUmaGyZFlWvTpHzukKxvrEvZ6AG+HpIhMcbZ5J6nkRTY4
 xxhUBG2qNKIcgPmPwAGmx1cylcsOCNKQgp+fko9tAZjEkgT5cbCpqpjGgjNB0RCf
 pB0PY6idCFl9hmBpVgMWz2AZ9IsDmK54qufmLtzq/zN8cThzt6A95UUR0rGu5Kd8
 CUmmdQTYl0GqlTTszCO2rw1+zRtcasMpBVmeYHDxy00bd1dHLUJ6o8DuXRYTTQti
 J/6CZVVD56jieRb+uvrOq4mhiPR2kynciiu1dXdY5kx79kKom6HMBBvtTl8b9kmh
 smWihfip7BTpz5vFzcwFmMxFwzW3K4LnDZl7qEGqXDEIHOL+pRWazU2yN3JZRGyd
 z4SQMJuER0HTTA0yO09c3/CX9onorhjUIMgQ9U25l1hdyFna0+o=
 =08Q9
 -----END PGP SIGNATURE-----

Merge tag 'xfs-5.14-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs updates from Darrick Wong:
 "Most of the work this cycle has been on refactoring various parts of
  the codebase. The biggest non-cleanup changes are (1) reducing the
  number of cache flushes sent when writing the log; (2) a substantial
  number of log recovery fixes; and (3) I started accepting pull
  requests from contributors if the commits in their branches match
  what's been sent to the list.

  For a week or so I /had/ staged a major cleanup of the logging code
  from Dave Chinner, but it exposed so many lurking bugs in other parts
  of the logging and log recovery code that I decided to defer that
  patchset until we can address those latent bugs.

  Larger cleanups this time include walking the incore inode cache (me)
  and rework of the extended attribute code (Allison) to prepare it for
  adding logged xattr updates (and directory tree parent pointers) in
  future releases.

  Summary:

   - Refactor the buffer cache to use bulk page allocation

   - Convert agnumber-based AG iteration to walk per-AG structures

   - Clean up some unit conversions and other code warts

   - Reduce spinlock contention in the directio fastpath

   - Collapse all the inode cache walks into a single function

   - Remove indirect function calls from the inode cache walk code

   - Dramatically reduce the number of cache flushes sent when writing
     log buffers

   - Preserve inode sickness reports for longer

   - Rename xfs_eofblocks since it controls inode cache walks

   - Refactor the extended attribute code to prepare it for the addition
     of log intent items to make xattrs fully transactional

   - A few fixes to earlier large patchsets

   - Log recovery fixes so that we don't accidentally mark the log clean
     when log intent recovery fails

   - Fix some latent SOB errors

   - Clean up shutdown messages that get logged to dmesg

   - Fix a regression in the online shrink code

   - Fix a UAF in the buffer logging code if the fs goes offline

   - Fix uninitialized error variables

   - Fix a UAF in the CIL when commited log item callbacks race with a
     shutdown

   - Fix a bug where the CIL could hang trying to push part of the log
     ring buffer that hasn't been filled yet"

* tag 'xfs-5.14-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (102 commits)
  xfs: don't wait on future iclogs when pushing the CIL
  xfs: Fix a CIL UAF by getting get rid of the iclog callback lock
  xfs: remove callback dequeue loop from xlog_state_do_iclog_callbacks
  xfs: don't nest icloglock inside ic_callback_lock
  xfs: Initialize error in xfs_attr_remove_iter
  xfs: fix endianness issue in xfs_ag_shrink_space
  xfs: remove dead stale buf unpin handling code
  xfs: hold buffer across unpin and potential shutdown processing
  xfs: force the log offline when log intent item recovery fails
  xfs: fix log intent recovery ENOSPC shutdowns when inactivating inodes
  xfs: shorten the shutdown messages to a single line
  xfs: print name of function causing fs shutdown instead of hex pointer
  xfs: fix type mismatches in the inode reclaim functions
  xfs: separate primary inode selection criteria in xfs_iget_cache_hit
  xfs: refactor the inode recycling code
  xfs: add iclog state trace events
  xfs: xfs_log_force_lsn isn't passed a LSN
  xfs: Fix CIL throttle hang when CIL space used going backwards
  xfs: journal IO cache flush reductions
  xfs: remove need_start_rec parameter from xlog_write()
  ...
2021-07-02 14:30:27 -07:00
Dave Chinner
50f02fe333 xfs: remove agno from btree cursor
Now that everything passes a perag, the agno is not needed anymore.
Convert all the users to use pag->pag_agno instead and remove the
agno from the cursor. This was largely done as an automated search
and replace.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2021-06-02 10:48:24 +10:00
Dave Chinner
a81a06211f xfs: convert refcount btree cursor to use perags
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2021-06-02 10:48:24 +10:00
Dave Chinner
fa9c3c1973 xfs: convert rmap btree cursor to using a perag
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2021-06-02 10:48:24 +10:00
Dave Chinner
be9fb17d88 xfs: add a perag to the btree cursor
Which will eventually completely replace the agno in it.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2021-06-02 10:48:24 +10:00
Dave Chinner
934933c3ee xfs: convert raw ag walks to use for_each_perag
Convert the raw walks to an iterator, pulling the current AG out of
pag->pag_agno instead of the loop iterator variable.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2021-06-02 10:48:24 +10:00
Gustavo A. R. Silva
53004ee78d xfs: Fix fall-through warnings for Clang
In preparation to enable -Wimplicit-fallthrough for Clang, fix
the following warnings by replacing /* fall through */ comments,
and its variants, with the new pseudo-keyword macro fallthrough:

fs/xfs/libxfs/xfs_alloc.c:3167:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/libxfs/xfs_da_btree.c:286:3: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/libxfs/xfs_ag_resv.c:346:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/libxfs/xfs_ag_resv.c:388:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/xfs_bmap_util.c:246:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/xfs_export.c:88:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/xfs_export.c:96:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/xfs_file.c:867:3: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/xfs_ioctl.c:562:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/xfs_ioctl.c:1548:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/xfs_iomap.c:1040:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/xfs_inode.c:852:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/xfs_log.c:2627:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/xfs_trans_buf.c:298:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/scrub/bmap.c:275:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/scrub/btree.c:48:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/scrub/common.c:85:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/scrub/common.c:138:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/scrub/common.c:698:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/scrub/dabtree.c:51:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/scrub/repair.c:951:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/scrub/agheader.c:89:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]

Notice that Clang doesn't recognize /* fall through */ comments as
implicit fall-through markings, so in order to globally enable
-Wimplicit-fallthrough for Clang, these comments need to be
replaced with fallthrough; in the whole codebase.

Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2021-05-26 14:51:26 -05:00
Christoph Hellwig
b2197a36c0 xfs: remove XFS_IFEXTENTS
The in-memory XFS_IFEXTENTS is now only used to check if an inode with
extents still needs the extents to be read into memory before doing
operations that need the extent map.  Add a new xfs_need_iread_extents
helper that returns true for btree format forks that do not have any
entries in the in-memory extent btree, and use that instead of checking
the XFS_IFEXTENTS flag.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-15 09:35:51 -07:00
Christoph Hellwig
862a804aae xfs: move the XFS_IFEXTENTS check into xfs_iread_extents
Move the XFS_IFEXTENTS check from the callers into xfs_iread_extents to
simplify the code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-15 09:35:50 -07:00
Darrick J. Wong
026f57ebe1 xfs: get rid of the ip parameter to xchk_setup_*
Now that the scrub context stores a pointer to the file that was used to
invoke the scrub call, the struct xfs_inode pointer that we passed to
all the setup functions is no longer necessary.  This is only ever used
if the caller wants us to scrub the metadata of the open file.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-04-09 10:27:18 -07:00
Joseph Qi
2e984badbc xfs: remove unneeded return value check for *init_cursor()
Since *init_cursor() can always return a valid cursor, the NULL check
in caller is unneeded. So clean them up.
This also keeps the behavior consistent with other callers.

Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-12-09 09:49:38 -08:00
Darrick J. Wong
33005fd0a5 xfs: refactor file range validation
Refactor all the open-coded validation of file block ranges into a
single helper, and teach the bmap scrubber to check the ranges.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-12-09 09:49:38 -08:00
Darrick J. Wong
18695ad425 xfs: refactor realtime volume extent validation
Refactor all the open-coded validation of realtime device extents into a
single helper.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-12-09 09:49:38 -08:00
Darrick J. Wong
67457eb0d2 xfs: refactor data device extent validation
Refactor all the open-coded validation of non-static data device extents
into a single helper.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-12-09 09:49:38 -08:00
Darrick J. Wong
498fe261f0 xfs: strengthen rmap record flags checking
We always know the correct state of the rmap record flags (attr, bmbt,
unwritten) so check them by direct comparison.

Fixes: d852657ccf ("xfs: cross-reference reverse-mapping btree")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-11-18 09:23:50 -08:00
Darrick J. Wong
5dda3897fd xfs: set the unwritten bit in rmap lookup flags in xchk_bmap_get_rmapextents
When the bmbt scrubber is looking up rmap extents, we need to set the
extent flags from the bmbt record fully.  This will matter once we fix
the rmap btree comparison functions to check those flags correctly.

Fixes: d852657ccf ("xfs: cross-reference reverse-mapping btree")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-11-10 16:47:51 -08:00
Darrick J. Wong
eb0efe5063 xfs: don't eat an EIO/ENOSPC writeback error when scrubbing data fork
The data fork scrubber calls filemap_write_and_wait to flush dirty pages
and delalloc reservations out to disk prior to checking the data fork's
extent mappings.  Unfortunately, this means that scrub can consume the
EIO/ENOSPC errors that would otherwise have stayed around in the address
space until (we hope) the writer application calls fsync to persist data
and collect errors.  The end result is that programs that wrote to a
file might never see the error code and proceed as if nothing were
wrong.

xfs_scrub is not in a position to notify file writers about the
writeback failure, and it's only here to check metadata, not file
contents.  Therefore, if writeback fails, we should stuff the error code
back into the address space so that an fsync by the writer application
can pick that up.

Fixes: 99d9d8d05d ("xfs: scrub inode block mappings")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-07-06 10:46:56 -07:00
Christoph Hellwig
f7e67b20ec xfs: move the fork format fields into struct xfs_ifork
Both the data and attr fork have a format that is stored in the legacy
idinode.  Move it into the xfs_ifork structure instead, where it uses
up padding.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-19 09:40:58 -07:00
Christoph Hellwig
daf83964a3 xfs: move the per-fork nextents fields into struct xfs_ifork
There are there are three extents counters per inode, one for each of
the forks.  Two are in the legacy icdinode and one is directly in
struct xfs_inode.  Switch to a single counter in the xfs_ifork structure
where it uses up padding at the end of the structure.  This simplifies
various bits of code that just wants the number of extents counter and
can now directly dereference it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-19 09:40:58 -07:00
Darrick J. Wong
5fd68bdb5a xfs: clean up xchk_bmap_check_rmaps usage of XFS_IFORK_Q
XFS_IFORK_Q is supposed to be a predicate, not a function returning a
value.  Its usage is in xchk_bmap_check_rmaps is incorrect, but that
function only cares about whether or not the "size" of the data is zero
or not.  Convert that logic to use a proper boolean, and teach the
caller to skip the call entirely if the end result would be that we'd do
nothing anyway.  This avoids a crash later in this series.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[hch: generalized the NULL ifor check]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-05-19 09:40:58 -07:00
Dave Chinner
92219c292a xfs: convert btree cursor inode-private member names
bc_private.b -> bc_ino conversion via script:

$ sed -i 's/bc_private\.b/bc_ino/g' fs/xfs/*[ch] fs/xfs/*/*[ch]

And then revert the change to the bc_ino #define in
fs/xfs/libxfs/xfs_btree.h manually.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: tweak the subject line slightly]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-03-13 10:37:14 -07:00
Dave Chinner
576af73228 xfs: convert btree cursor ag-private member name
bc_private.a -> bc_ag conversion via script:

`sed -i 's/bc_private\.a/bc_ag/g' fs/xfs/*[ch] fs/xfs/*/*[ch]`

And then revert the change to the bc_ag #define in
fs/xfs/libxfs/xfs_btree.h manually.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-03-13 10:37:14 -07:00
Darrick J. Wong
e7ee96dfb8 xfs: remove all *_ITER_ABORT values
Use -ECANCELED to signal "stop iterating" instead of these magical
*_ITER_ABORT values, since it's duplicative.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2019-08-29 21:22:41 -07:00
Darrick J. Wong
519e5869d5 xfs: bmap scrub should only scrub records once
The inode block mapping scrub function does more work for btree format
extent maps than is absolutely necessary -- first it will walk the bmbt
and check all the entries, and then it will load the incore tree and
check every entry in that tree, possibly for a second time.

Simplify the code and decrease check runtime by separating the two
responsibilities.  The bmbt walk will make sure the incore extent
mappings are loaded, check the shape of the bmap btree (via xchk_btree)
and check that every bmbt record has a corresponding incore extent map;
and the incore extent map walk takes all the responsibility for checking
the mapping records and cross referencing them with other AG metadata.

This enables us to clean up some messy parameter handling and reduce
redundant code.  Rename a few functions to make the split of
responsibilities clearer.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-08-26 17:43:15 -07:00
Eric Sandeen
250d4b4c40 xfs: remove unused header files
There are many, many xfs header files which are included but
unneeded (or included twice) in the xfs code, so remove them.

nb: xfs_linux.h includes about 9 headers for everyone, so those
explicit includes get removed by this.  I'm not sure what the
preference is, but if we wanted explicit includes everywhere,
a followup patch could remove those xfs_*.h includes from
xfs_linux.h and move them into the files that need them.
Or it could be left as-is.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:30:43 -07:00
Darrick J. Wong
f8c1d7023e xfs: scrub should flag dir/attr offsets that aren't mappable with xfs_dablk_t
Teach scrub to flag extent maps that exceed the range that can be mapped
with a xfs_dablk_t.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-02-11 16:06:40 -08:00