Merge tag 'xfs-6.8-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs updates from Chandan Babu:
 "New features/functionality:
   - Online repair:
       - Reserve disk space for online repairs
       - Fix misinteraction between the AIL and btree bulkloader because
         of which the bulk load fails to queue a buffer for writeback if
         it happens to be on the AIL list
       - Prevent transaction reservation overflows when reaping blocks
         during online repair
       - Whenever possible, bulkloader now copies multiple records into
         a block
       - Support repairing of
           1. Per-AG free space, inode and refcount btrees
           2. Ondisk inodes
           3. File data and attribute fork mappings
       - Verify the contents of
           1. Inode and data fork of realtime bitmap file
           2. Quota files
   - Introduce MF_MEM_PRE_REMOVE. This will be used to notify tasks
     about a pmem device being removed

  Bug fixes:
   - Fix memory leak of recovered attri intent items
   - Fix UAF during log intent recovery
   - Fix realtime geometry integer overflows
   - Prevent scrub from live locking in xchk_iget
   - Prevent fs shutdown when removing files during low free disk space
   - Prevent transaction reservation overflow when extending an RT
     device
   - Prevent incorrect warning from being printed when extending a
     filesystem
   - Fix an off-by-one error in xreap_agextent_binval
   - Serialize access to perag radix tree during deletion operation
   - Fix perag memory leak during growfs
   - Allow allocation of minlen realtime extent when the maximum sized
     realtime free extent is minlen in size

  Cleanups:
   - Remove duplicate boilerplate code spread across functionality
     associated with different log items
   - Cleanup resblks interfaces
   - Pass defer ops pointer to defer helpers instead of an enum
   - Initialize di_crc in xfs_log_dinode to prevent KMSAN warnings
   - Use static_assert() instead of BUILD_BUG_ON_MSG() to validate size
     of structures and structure member offsets. This is done in order
     to be able to share the code with userspace (a sketch follows this
     message)
   - Move XFS documentation under a new directory specific to XFS
   - Do not invoke deferred ops' ->create_done callback if the deferred
     operation does not have an intent item associated with it
   - Remove duplicate inclusion of header files from scrub/health.c
   - Refactor Realtime code
   - Cleanup attr code"
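
The static_assert() item under Cleanups refers to the standard C11 assertion
that both the kernel and userspace xfsprogs can compile; a minimal
before/after sketch (the struct name, size, and offset here are illustrative,
not the real XFS on-disk layout):

	/* before: BUILD_BUG_ON_MSG() exists only in kernel builds */
	BUILD_BUG_ON_MSG(sizeof(struct xfs_example_rec) != 16,
			"struct xfs_example_rec has wrong size");

	/* after: static_assert() compiles in kernel and userspace alike */
	static_assert(sizeof(struct xfs_example_rec) == 16,
			"struct xfs_example_rec has wrong size");
	static_assert(offsetof(struct xfs_example_rec, er_blockcount) == 8,
			"er_blockcount at wrong offset");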

* tag 'xfs-6.8-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (123 commits)
  xfs: use the op name in trace_xlog_intent_recovery_failed
  xfs: fix a use after free in xfs_defer_finish_recovery
  xfs: turn the XFS_DA_OP_REPLACE checks in xfs_attr_shortform_addname into asserts
  xfs: remove xfs_attr_sf_hdr_t
  xfs: remove struct xfs_attr_shortform
  xfs: use xfs_attr_sf_findname in xfs_attr_shortform_getvalue
  xfs: remove xfs_attr_shortform_lookup
  xfs: simplify xfs_attr_sf_findname
  xfs: move the xfs_attr_sf_lookup tracepoint
  xfs: return if_data from xfs_idata_realloc
  xfs: make if_data a void pointer
  xfs: fold xfs_rtallocate_extent into xfs_bmap_rtalloc
  xfs: simplify and optimize the RT allocation fallback cascade
  xfs: reorder the minlen and prod calculations in xfs_bmap_rtalloc
  xfs: remove XFS_RTMIN/XFS_RTMAX
  xfs: remove rt-wrappers from xfs_format.h
  xfs: factor out a xfs_rtalloc_sumlevel helper
  xfs: tidy up xfs_rtallocate_extent_exact
  xfs: merge the calls to xfs_rtallocate_range in xfs_rtallocate_block
  xfs: reflow the tail end of xfs_rtallocate_extent_block
  ...
Merged by Linus Torvalds on 2024-01-10 08:45:22 -08:00 (commit 12958e9c4c).
146 changed files with 12802 additions and 3018 deletions.

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
@@ -121,8 +121,5 @@ Documentation for filesystem implementations.
    udf
    virtiofs
    vfat
-   xfs-delayed-logging-design
-   xfs-maintainer-entry-profile
-   xfs-self-describing-metadata
-   xfs-online-fsck-design
+   xfs/index
    zonefs

diff --git a/Documentation/filesystems/xfs/index.rst b/Documentation/filesystems/xfs/index.rst
@@ -0,0 +1,14 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============================
+XFS Filesystem Documentation
+============================
+
+.. toctree::
+   :maxdepth: 2
+   :numbered:
+
+   xfs-delayed-logging-design
+   xfs-maintainer-entry-profile
+   xfs-self-describing-metadata
+   xfs-online-fsck-design

diff --git a/Documentation/filesystems/xfs/xfs-online-fsck-design.rst b/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
@@ -962,7 +962,7 @@ disk, but these buffer verifiers cannot provide any consistency checking
 between metadata structures.
 For more information, please see the documentation for
-Documentation/filesystems/xfs-self-describing-metadata.rst
+Documentation/filesystems/xfs/xfs-self-describing-metadata.rst
 
 Reverse Mapping
 ---------------

diff --git a/Documentation/maintainer/maintainer-entry-profile.rst b/Documentation/maintainer/maintainer-entry-profile.rst
@@ -105,4 +105,4 @@ to do something different in the near future.
    ../driver-api/media/maintainer-entry-profile
    ../driver-api/vfio-pci-device-specific-driver-acceptance
    ../nvme/feature-and-quirk-policy
-   ../filesystems/xfs-maintainer-entry-profile
+   ../filesystems/xfs/xfs-maintainer-entry-profile

diff --git a/MAINTAINERS b/MAINTAINERS
@@ -23846,10 +23846,10 @@ S:	Supported
 W:	http://xfs.org/
 C:	irc://irc.oftc.net/xfs
 T:	git git://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git
-P:	Documentation/filesystems/xfs-maintainer-entry-profile.rst
+P:	Documentation/filesystems/xfs/xfs-maintainer-entry-profile.rst
 F:	Documentation/ABI/testing/sysfs-fs-xfs
 F:	Documentation/admin-guide/xfs.rst
-F:	Documentation/filesystems/xfs-*
+F:	Documentation/filesystems/xfs/*
 F:	fs/xfs/
 F:	include/uapi/linux/dqblk_xfs.h
 F:	include/uapi/linux/fsmap.h

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
@@ -326,7 +326,8 @@ void kill_dax(struct dax_device *dax_dev)
 		return;
 
 	if (dax_dev->holder_data != NULL)
-		dax_holder_notify_failure(dax_dev, 0, U64_MAX, 0);
+		dax_holder_notify_failure(dax_dev, 0, U64_MAX,
+				MF_MEM_PRE_REMOVE);
 
 	clear_bit(DAXDEV_ALIVE, &dax_dev->flags);
 	synchronize_srcu(&dax_srcu);
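
dax_holder_notify_failure() fans out to the holder's registered
->notify_failure callback; a hedged sketch of how a holder might consume
the new flag (the example_* helpers are hypothetical, not part of this
merge):

	static int example_notify_failure(struct dax_device *dax_dev,
			u64 offset, u64 len, int mf_flags)
	{
		/*
		 * MF_MEM_PRE_REMOVE distinguishes "pmem device going
		 * away" from a hard media error, so the holder can sync
		 * and unmap instead of shutting down the filesystem.
		 */
		if (mf_flags & MF_MEM_PRE_REMOVE)
			return example_prepare_for_removal(dax_dev, offset, len);
		return example_handle_media_error(dax_dev, offset, len, mf_flags);
	}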

diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
@@ -145,6 +145,7 @@ ifeq ($(CONFIG_XFS_ONLINE_SCRUB),y)
 xfs-y		+= $(addprefix scrub/, \
		   trace.o \
+		   agb_bitmap.o \
		   agheader.o \
		   alloc.o \
		   attr.o \
@@ -175,14 +176,32 @@ xfs-$(CONFIG_XFS_RT) += $(addprefix scrub/, \
		   rtsummary.o \
		   )
 
-xfs-$(CONFIG_XFS_QUOTA)	+= scrub/quota.o
+xfs-$(CONFIG_XFS_QUOTA)	+= $(addprefix scrub/, \
+		   dqiterate.o \
+		   quota.o \
+		   )
 
 # online repair
 ifeq ($(CONFIG_XFS_ONLINE_REPAIR),y)
 xfs-y		+= $(addprefix scrub/, \
		   agheader_repair.o \
+		   alloc_repair.o \
+		   bmap_repair.o \
+		   cow_repair.o \
+		   ialloc_repair.o \
+		   inode_repair.o \
+		   newbt.o \
		   reap.o \
+		   refcount_repair.o \
		   repair.o \
		   )
+
+xfs-$(CONFIG_XFS_RT) += $(addprefix scrub/, \
+		   rtbitmap_repair.o \
+		   )
+
+xfs-$(CONFIG_XFS_QUOTA) += $(addprefix scrub/, \
+		   quota_repair.o \
+		   )
 endif
 endif

diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
@@ -332,6 +332,31 @@ xfs_agino_range(
 	return __xfs_agino_range(mp, xfs_ag_block_count(mp, agno), first, last);
 }
 
+/*
+ * Free perag within the specified AG range, it is only used to free unused
+ * perags under the error handling path.
+ */
+void
+xfs_free_unused_perag_range(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agstart,
+	xfs_agnumber_t		agend)
+{
+	struct xfs_perag	*pag;
+	xfs_agnumber_t		index;
+
+	for (index = agstart; index < agend; index++) {
+		spin_lock(&mp->m_perag_lock);
+		pag = radix_tree_delete(&mp->m_perag_tree, index);
+		spin_unlock(&mp->m_perag_lock);
+		if (!pag)
+			break;
+		xfs_buf_hash_destroy(pag);
+		xfs_defer_drain_free(&pag->pag_intents_drain);
+		kmem_free(pag);
+	}
+}
+
 int
 xfs_initialize_perag(
 	struct xfs_mount	*mp,
@@ -424,19 +449,14 @@ xfs_initialize_perag(
 
 out_remove_pag:
 	xfs_defer_drain_free(&pag->pag_intents_drain);
+	spin_lock(&mp->m_perag_lock);
 	radix_tree_delete(&mp->m_perag_tree, index);
+	spin_unlock(&mp->m_perag_lock);
 out_free_pag:
 	kmem_free(pag);
 out_unwind_new_pags:
 	/* unwind any prior newly initialized pags */
-	for (index = first_initialised; index < agcount; index++) {
-		pag = radix_tree_delete(&mp->m_perag_tree, index);
-		if (!pag)
-			break;
-		xfs_buf_hash_destroy(pag);
-		xfs_defer_drain_free(&pag->pag_intents_drain);
-		kmem_free(pag);
-	}
+	xfs_free_unused_perag_range(mp, first_initialised, agcount);
 	return error;
 }
 
@@ -984,7 +1004,7 @@ xfs_ag_shrink_space(
 	if (err2 != -ENOSPC)
 		goto resv_err;
 
-	err2 = __xfs_free_extent_later(*tpp, args.fsbno, delta, NULL,
+	err2 = xfs_free_extent_later(*tpp, args.fsbno, delta, NULL,
 			XFS_AG_RESV_NONE, true);
 	if (err2)
 		goto resv_err;

diff --git a/fs/xfs/libxfs/xfs_ag.h b/fs/xfs/libxfs/xfs_ag.h
@@ -80,6 +80,16 @@ struct xfs_perag {
 	 */
 	uint16_t	pag_checked;
 	uint16_t	pag_sick;
+
+#ifdef CONFIG_XFS_ONLINE_REPAIR
+	/*
+	 * Alternate btree heights so that online repair won't trip the write
+	 * verifiers while rebuilding the AG btrees.
+	 */
+	uint8_t		pagf_repair_levels[XFS_BTNUM_AGF];
+	uint8_t		pagf_repair_refcount_level;
+#endif
+
 	spinlock_t	pag_state_lock;
 
 	spinlock_t	pagb_lock;	/* lock for pagb_tree */
@@ -133,6 +143,8 @@ __XFS_AG_OPSTATE(prefers_metadata, PREFERS_METADATA)
 __XFS_AG_OPSTATE(allows_inodes, ALLOWS_INODES)
 __XFS_AG_OPSTATE(agfl_needs_reset, AGFL_NEEDS_RESET)
 
+void xfs_free_unused_perag_range(struct xfs_mount *mp, xfs_agnumber_t agstart,
+		xfs_agnumber_t agend);
 int xfs_initialize_perag(struct xfs_mount *mp, xfs_agnumber_t agcount,
 		xfs_rfsblock_t dcount, xfs_agnumber_t *maxagi);
 int xfs_initialize_perag_data(struct xfs_mount *mp, xfs_agnumber_t agno);

diff --git a/fs/xfs/libxfs/xfs_ag_resv.c b/fs/xfs/libxfs/xfs_ag_resv.c
@@ -411,6 +411,8 @@ xfs_ag_resv_free_extent(
 		fallthrough;
 	case XFS_AG_RESV_NONE:
 		xfs_trans_mod_sb(tp, XFS_TRANS_SB_FDBLOCKS, (int64_t)len);
+		fallthrough;
+	case XFS_AG_RESV_IGNORE:
 		return;
 	}

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
@@ -246,11 +246,9 @@ xfs_alloc_btrec_to_irec(
 /* Simple checks for free space records. */
 xfs_failaddr_t
 xfs_alloc_check_irec(
-	struct xfs_btree_cur		*cur,
+	struct xfs_perag		*pag,
 	const struct xfs_alloc_rec_incore *irec)
 {
-	struct xfs_perag		*pag = cur->bc_ag.pag;
-
 	if (irec->ar_blockcount == 0)
 		return __this_address;
 
@@ -299,7 +297,7 @@ xfs_alloc_get_rec(
 		return error;
 
 	xfs_alloc_btrec_to_irec(rec, &irec);
-	fa = xfs_alloc_check_irec(cur, &irec);
+	fa = xfs_alloc_check_irec(cur->bc_ag.pag, &irec);
 	if (fa)
 		return xfs_alloc_complain_bad_rec(cur, fa, &irec);
 
@@ -2514,7 +2512,7 @@ xfs_defer_agfl_block(
 	trace_xfs_agfl_free_defer(mp, agno, 0, agbno, 1);
 
 	xfs_extent_free_get_group(mp, xefi);
-	xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_AGFL_FREE, &xefi->xefi_list);
+	xfs_defer_add(tp, &xefi->xefi_list, &xfs_agfl_free_defer_type);
 	return 0;
 }
 
@@ -2522,14 +2520,15 @@ xfs_defer_agfl_block(
  * Add the extent to the list of extents to be free at transaction end.
  * The list is maintained sorted (by block number).
  */
-int
-__xfs_free_extent_later(
+static int
+xfs_defer_extent_free(
 	struct xfs_trans		*tp,
 	xfs_fsblock_t			bno,
 	xfs_filblks_t			len,
 	const struct xfs_owner_info	*oinfo,
 	enum xfs_ag_resv_type		type,
-	bool				skip_discard)
+	bool				skip_discard,
+	struct xfs_defer_pending	**dfpp)
 {
 	struct xfs_extent_free_item	*xefi;
 	struct xfs_mount		*mp = tp->t_mountp;
@@ -2577,10 +2576,105 @@ __xfs_free_extent_later(
 			XFS_FSB_TO_AGBNO(tp->t_mountp, bno), len);
 
 	xfs_extent_free_get_group(mp, xefi);
-	xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_FREE, &xefi->xefi_list);
+	*dfpp = xfs_defer_add(tp, &xefi->xefi_list, &xfs_extent_free_defer_type);
 	return 0;
 }
 
+int
+xfs_free_extent_later(
+	struct xfs_trans		*tp,
+	xfs_fsblock_t			bno,
+	xfs_filblks_t			len,
+	const struct xfs_owner_info	*oinfo,
+	enum xfs_ag_resv_type		type,
+	bool				skip_discard)
+{
+	struct xfs_defer_pending	*dontcare = NULL;
+
+	return xfs_defer_extent_free(tp, bno, len, oinfo, type, skip_discard,
+			&dontcare);
+}
+
+/*
+ * Set up automatic freeing of unwritten space in the filesystem.
+ *
+ * This function attached a paused deferred extent free item to the
+ * transaction.  Pausing means that the EFI will be logged in the next
+ * transaction commit, but the pending EFI will not be finished until the
+ * pending item is unpaused.
+ *
+ * If the system goes down after the EFI has been persisted to the log but
+ * before the pending item is unpaused, log recovery will find the EFI, fail to
+ * find the EFD, and free the space.
+ *
+ * If the pending item is unpaused, the next transaction commit will log an EFD
+ * without freeing the space.
+ *
+ * Caller must ensure that the tp, fsbno, len, oinfo, and resv flags of the
+ * @args structure are set to the relevant values.
+ */
+int
+xfs_alloc_schedule_autoreap(
+	const struct xfs_alloc_arg	*args,
+	bool				skip_discard,
+	struct xfs_alloc_autoreap	*aarp)
+{
+	int				error;
+
+	error = xfs_defer_extent_free(args->tp, args->fsbno, args->len,
+			&args->oinfo, args->resv, skip_discard, &aarp->dfp);
+	if (error)
+		return error;
+
+	xfs_defer_item_pause(args->tp, aarp->dfp);
+	return 0;
+}
+
+/*
+ * Cancel automatic freeing of unwritten space in the filesystem.
+ *
+ * Earlier, we created a paused deferred extent free item and attached it to
+ * this transaction so that we could automatically roll back a new space
+ * allocation if the system went down.  Now we want to cancel the paused work
+ * item by marking the EFI stale so we don't actually free the space, unpausing
+ * the pending item and logging an EFD.
+ *
+ * The caller generally should have already mapped the space into the ondisk
+ * filesystem.  If the reserved space was partially used, the caller must call
+ * xfs_free_extent_later to create a new EFI to free the unused space.
+ */
+void
+xfs_alloc_cancel_autoreap(
+	struct xfs_trans		*tp,
+	struct xfs_alloc_autoreap	*aarp)
+{
+	struct xfs_defer_pending	*dfp = aarp->dfp;
+	struct xfs_extent_free_item	*xefi;
+
+	if (!dfp)
+		return;
+
+	list_for_each_entry(xefi, &dfp->dfp_work, xefi_list)
+		xefi->xefi_flags |= XFS_EFI_CANCELLED;
+
+	xfs_defer_item_unpause(tp, dfp);
+}
+
+/*
+ * Commit automatic freeing of unwritten space in the filesystem.
+ *
+ * This unpauses an earlier _schedule_autoreap and commits to freeing the
+ * allocated space.  Call this if none of the reserved space was used.
+ */
+void
+xfs_alloc_commit_autoreap(
+	struct xfs_trans		*tp,
+	struct xfs_alloc_autoreap	*aarp)
+{
+	if (aarp->dfp)
+		xfs_defer_item_unpause(tp, aarp->dfp);
+}
+
 #ifdef DEBUG
 /*
  * Check if an AGF has a free extent record whose length is equal to
@@ -3848,7 +3942,7 @@ xfs_alloc_query_range_helper(
 	xfs_failaddr_t			fa;
 
 	xfs_alloc_btrec_to_irec(rec, &irec);
-	fa = xfs_alloc_check_irec(cur, &irec);
+	fa = xfs_alloc_check_irec(cur->bc_ag.pag, &irec);
 	if (fa)
 		return xfs_alloc_complain_bad_rec(cur, fa, &irec);
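
Taken together, the three autoreap entry points above give callers a
crash-safe allocation pattern; a hedged usage sketch (example_map_blocks()
is hypothetical, and error unwinding is trimmed):

	struct xfs_alloc_autoreap autoreap;
	int error;

	/*
	 * After a successful allocation, log a paused EFI so that log
	 * recovery frees the new blocks if we crash before mapping them.
	 */
	error = xfs_alloc_schedule_autoreap(args, true, &autoreap);
	if (error)
		return error;

	if (example_map_blocks(args) == 0)
		xfs_alloc_cancel_autoreap(args->tp, &autoreap);	/* keep blocks */
	else
		xfs_alloc_commit_autoreap(args->tp, &autoreap);	/* free blocks */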

diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
@@ -185,7 +185,7 @@ xfs_alloc_get_rec(
 
 union xfs_btree_rec;
 void xfs_alloc_btrec_to_irec(const union xfs_btree_rec *rec,
 		struct xfs_alloc_rec_incore *irec);
-xfs_failaddr_t xfs_alloc_check_irec(struct xfs_btree_cur *cur,
+xfs_failaddr_t xfs_alloc_check_irec(struct xfs_perag *pag,
 		const struct xfs_alloc_rec_incore *irec);
 
 int xfs_read_agf(struct xfs_perag *pag, struct xfs_trans *tp, int flags,
@@ -231,7 +231,7 @@ xfs_buf_to_agfl_bno(
 	return bp->b_addr;
 }
 
-int __xfs_free_extent_later(struct xfs_trans *tp, xfs_fsblock_t bno,
+int xfs_free_extent_later(struct xfs_trans *tp, xfs_fsblock_t bno,
 		xfs_filblks_t len, const struct xfs_owner_info *oinfo,
 		enum xfs_ag_resv_type type, bool skip_discard);
 
@@ -255,18 +255,18 @@ void xfs_extent_free_get_group(struct xfs_mount *mp,
 #define XFS_EFI_SKIP_DISCARD	(1U << 0) /* don't issue discard */
 #define XFS_EFI_ATTR_FORK	(1U << 1) /* freeing attr fork block */
 #define XFS_EFI_BMBT_BLOCK	(1U << 2) /* freeing bmap btree block */
+#define XFS_EFI_CANCELLED	(1U << 3) /* dont actually free the space */
 
-static inline int
-xfs_free_extent_later(
-	struct xfs_trans		*tp,
-	xfs_fsblock_t			bno,
-	xfs_filblks_t			len,
-	const struct xfs_owner_info	*oinfo,
-	enum xfs_ag_resv_type		type)
-{
-	return __xfs_free_extent_later(tp, bno, len, oinfo, type, false);
-}
+struct xfs_alloc_autoreap {
+	struct xfs_defer_pending	*dfp;
+};
+
+int xfs_alloc_schedule_autoreap(const struct xfs_alloc_arg *args,
+		bool skip_discard, struct xfs_alloc_autoreap *aarp);
+void xfs_alloc_cancel_autoreap(struct xfs_trans *tp,
+		struct xfs_alloc_autoreap *aarp);
+void xfs_alloc_commit_autoreap(struct xfs_trans *tp,
+		struct xfs_alloc_autoreap *aarp);
 
 extern struct kmem_cache	*xfs_extfree_item_cache;

diff --git a/fs/xfs/libxfs/xfs_alloc_btree.c b/fs/xfs/libxfs/xfs_alloc_btree.c
@@ -323,7 +323,18 @@ xfs_allocbt_verify(
 	if (bp->b_ops->magic[0] == cpu_to_be32(XFS_ABTC_MAGIC))
 		btnum = XFS_BTNUM_CNTi;
 	if (pag && xfs_perag_initialised_agf(pag)) {
-		if (level >= pag->pagf_levels[btnum])
+		unsigned int	maxlevel = pag->pagf_levels[btnum];
+
+#ifdef CONFIG_XFS_ONLINE_REPAIR
+		/*
+		 * Online repair could be rewriting the free space btrees, so
+		 * we'll validate against the larger of either tree while this
+		 * is going on.
+		 */
+		maxlevel = max_t(unsigned int, maxlevel,
+				pag->pagf_repair_levels[btnum]);
+#endif
+
+		if (level >= maxlevel)
 			return __this_address;
 	} else if (level >= mp->m_alloc_maxlevels)
 		return __this_address;

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
@@ -862,8 +862,11 @@ xfs_attr_lookup(
 	if (!xfs_inode_hasattr(dp))
 		return -ENOATTR;
 
-	if (dp->i_af.if_format == XFS_DINODE_FMT_LOCAL)
-		return xfs_attr_sf_findname(args, NULL, NULL);
+	if (dp->i_af.if_format == XFS_DINODE_FMT_LOCAL) {
+		if (xfs_attr_sf_findname(args))
+			return -EEXIST;
+		return -ENOATTR;
+	}
 
 	if (xfs_attr_is_leaf(dp)) {
 		error = xfs_attr_leaf_hasname(args, &bp);
@@ -880,11 +883,10 @@ xfs_attr_lookup(
 	return error;
 }
 
-static int
-xfs_attr_intent_init(
+static void
+xfs_attr_defer_add(
 	struct xfs_da_args	*args,
-	unsigned int		op_flags,	/* op flag (set or remove) */
-	struct xfs_attr_intent	**attr)		/* new xfs_attr_intent */
+	unsigned int		op_flags)
 {
 	struct xfs_attr_intent	*new;
@@ -893,66 +895,22 @@ xfs_attr_defer_add(
 	new->xattri_op_flags = op_flags;
 	new->xattri_da_args = args;
 
-	*attr = new;
-	return 0;
-}
+	switch (op_flags) {
+	case XFS_ATTRI_OP_FLAGS_SET:
+		new->xattri_dela_state = xfs_attr_init_add_state(args);
+		break;
+	case XFS_ATTRI_OP_FLAGS_REPLACE:
+		new->xattri_dela_state = xfs_attr_init_replace_state(args);
+		break;
+	case XFS_ATTRI_OP_FLAGS_REMOVE:
+		new->xattri_dela_state = xfs_attr_init_remove_state(args);
+		break;
+	default:
+		ASSERT(0);
+	}
 
-/* Sets an attribute for an inode as a deferred operation */
-static int
-xfs_attr_defer_add(
-	struct xfs_da_args	*args)
-{
-	struct xfs_attr_intent	*new;
-	int			error = 0;
-
-	error = xfs_attr_intent_init(args, XFS_ATTRI_OP_FLAGS_SET, &new);
-	if (error)
-		return error;
-
-	new->xattri_dela_state = xfs_attr_init_add_state(args);
-	xfs_defer_add(args->trans, XFS_DEFER_OPS_TYPE_ATTR, &new->xattri_list);
+	xfs_defer_add(args->trans, &new->xattri_list, &xfs_attr_defer_type);
 	trace_xfs_attr_defer_add(new->xattri_dela_state, args->dp);
-	return 0;
-}
-
-/* Sets an attribute for an inode as a deferred operation */
-static int
-xfs_attr_defer_replace(
-	struct xfs_da_args	*args)
-{
-	struct xfs_attr_intent	*new;
-	int			error = 0;
-
-	error = xfs_attr_intent_init(args, XFS_ATTRI_OP_FLAGS_REPLACE, &new);
-	if (error)
-		return error;
-
-	new->xattri_dela_state = xfs_attr_init_replace_state(args);
-	xfs_defer_add(args->trans, XFS_DEFER_OPS_TYPE_ATTR, &new->xattri_list);
-	trace_xfs_attr_defer_replace(new->xattri_dela_state, args->dp);
-	return 0;
-}
-
-/* Removes an attribute for an inode as a deferred operation */
-static int
-xfs_attr_defer_remove(
-	struct xfs_da_args	*args)
-{
-	struct xfs_attr_intent	*new;
-	int			error;
-
-	error = xfs_attr_intent_init(args, XFS_ATTRI_OP_FLAGS_REMOVE, &new);
-	if (error)
-		return error;
-
-	new->xattri_dela_state = xfs_attr_init_remove_state(args);
-	xfs_defer_add(args->trans, XFS_DEFER_OPS_TYPE_ATTR, &new->xattri_list);
-	trace_xfs_attr_defer_remove(new->xattri_dela_state, args->dp);
-	return 0;
 }
 
 /*
@@ -1038,16 +996,16 @@ xfs_attr_set(
 	error = xfs_attr_lookup(args);
 	switch (error) {
 	case -EEXIST:
-		/* if no value, we are performing a remove operation */
 		if (!args->value) {
-			error = xfs_attr_defer_remove(args);
+			/* if no value, we are performing a remove operation */
+			xfs_attr_defer_add(args, XFS_ATTRI_OP_FLAGS_REMOVE);
 			break;
 		}
+
 		/* Pure create fails if the attr already exists */
 		if (args->attr_flags & XATTR_CREATE)
 			goto out_trans_cancel;
-
-		error = xfs_attr_defer_replace(args);
+		xfs_attr_defer_add(args, XFS_ATTRI_OP_FLAGS_REPLACE);
 		break;
 	case -ENOATTR:
 		/* Can't remove what isn't there. */
@@ -1057,14 +1015,11 @@ xfs_attr_set(
 		/* Pure replace fails if no existing attr to replace. */
 		if (args->attr_flags & XATTR_REPLACE)
 			goto out_trans_cancel;
-
-		error = xfs_attr_defer_add(args);
+		xfs_attr_defer_add(args, XFS_ATTRI_OP_FLAGS_SET);
 		break;
 	default:
 		goto out_trans_cancel;
 	}
-	if (error)
-		goto out_trans_cancel;
 
 	/*
 	 * If this is a synchronous mount, make sure that the
@@ -1097,10 +1052,9 @@ out_trans_cancel:
 
 static inline int xfs_attr_sf_totsize(struct xfs_inode *dp)
 {
-	struct xfs_attr_shortform *sf;
+	struct xfs_attr_sf_hdr *sf = dp->i_af.if_data;
 
-	sf = (struct xfs_attr_shortform *)dp->i_af.if_u1.if_data;
-	return be16_to_cpu(sf->hdr.totsize);
+	return be16_to_cpu(sf->totsize);
 }
 
 /*
@@ -1112,19 +1066,13 @@ xfs_attr_shortform_addname(
 	struct xfs_da_args	*args)
 {
 	int			newsize, forkoff;
-	int			error;
 
 	trace_xfs_attr_sf_addname(args);
 
-	error = xfs_attr_shortform_lookup(args);
-	switch (error) {
-	case -ENOATTR:
-		if (args->op_flags & XFS_DA_OP_REPLACE)
-			return error;
-		break;
-	case -EEXIST:
-		if (!(args->op_flags & XFS_DA_OP_REPLACE))
-			return error;
+	if (xfs_attr_sf_findname(args)) {
+		int		error;
+
+		ASSERT(args->op_flags & XFS_DA_OP_REPLACE);
 
 		error = xfs_attr_sf_removename(args);
 		if (error)
@@ -1137,11 +1085,8 @@ xfs_attr_shortform_addname(
 		 * around.
 		 */
 		args->op_flags &= ~XFS_DA_OP_REPLACE;
-		break;
-	case 0:
-		break;
-	default:
-		return error;
+	} else {
+		ASSERT(!(args->op_flags & XFS_DA_OP_REPLACE));
 	}
 
 	if (args->namelen >= XFS_ATTR_SF_ENTSIZE_MAX ||
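
The net effect at the call sites, condensed from the xfs_attr_set() hunk
above ("found" stands in for the -EEXIST/-ENOATTR lookup result): one
helper taking an ATTRI op code replaces three near-identical wrappers.

	if (!args->value)	/* no value means remove */
		xfs_attr_defer_add(args, XFS_ATTRI_OP_FLAGS_REMOVE);
	else if (found)		/* existing attr: replace */
		xfs_attr_defer_add(args, XFS_ATTRI_OP_FLAGS_REPLACE);
	else			/* new attr: set */
		xfs_attr_defer_add(args, XFS_ATTRI_OP_FLAGS_SET);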

diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
@@ -690,56 +690,32 @@ xfs_attr_shortform_create(
 	ASSERT(ifp->if_bytes == 0);
 	if (ifp->if_format == XFS_DINODE_FMT_EXTENTS)
 		ifp->if_format = XFS_DINODE_FMT_LOCAL;
-	xfs_idata_realloc(dp, sizeof(*hdr), XFS_ATTR_FORK);
-	hdr = (struct xfs_attr_sf_hdr *)ifp->if_u1.if_data;
+
+	hdr = xfs_idata_realloc(dp, sizeof(*hdr), XFS_ATTR_FORK);
 	memset(hdr, 0, sizeof(*hdr));
 	hdr->totsize = cpu_to_be16(sizeof(*hdr));
 	xfs_trans_log_inode(args->trans, dp, XFS_ILOG_CORE | XFS_ILOG_ADATA);
 }
 
 /*
- * Return -EEXIST if attr is found, or -ENOATTR if not
- * args:  args containing attribute name and namelen
- * sfep:  If not null, pointer will be set to the last attr entry found on
-	  -EEXIST.  On -ENOATTR pointer is left at the last entry in the list
- * basep: If not null, pointer is set to the byte offset of the entry in the
- *	  list on -EEXIST.  On -ENOATTR, pointer is left at the byte offset of
- *	  the last entry in the list
+ * Return the entry if the attr in args is found, or NULL if not.
  */
-int
+struct xfs_attr_sf_entry *
 xfs_attr_sf_findname(
-	struct xfs_da_args		*args,
-	struct xfs_attr_sf_entry	**sfep,
-	unsigned int			*basep)
+	struct xfs_da_args		*args)
 {
-	struct xfs_attr_shortform	*sf;
+	struct xfs_attr_sf_hdr		*sf = args->dp->i_af.if_data;
 	struct xfs_attr_sf_entry	*sfe;
-	unsigned int			base = sizeof(struct xfs_attr_sf_hdr);
-	int				size = 0;
-	int				end;
-	int				i;
 
-	sf = (struct xfs_attr_shortform *)args->dp->i_af.if_u1.if_data;
-	sfe = &sf->list[0];
-	end = sf->hdr.count;
-	for (i = 0; i < end; sfe = xfs_attr_sf_nextentry(sfe),
-			base += size, i++) {
-		size = xfs_attr_sf_entsize(sfe);
-		if (!xfs_attr_match(args, sfe->namelen, sfe->nameval,
-				sfe->flags))
-			continue;
-		break;
+	for (sfe = xfs_attr_sf_firstentry(sf);
+	     sfe < xfs_attr_sf_endptr(sf);
+	     sfe = xfs_attr_sf_nextentry(sfe)) {
+		if (xfs_attr_match(args, sfe->namelen, sfe->nameval,
+				sfe->flags))
+			return sfe;
 	}
 
-	if (sfep != NULL)
-		*sfep = sfe;
-
-	if (basep != NULL)
-		*basep = base;
-
-	if (i == end)
-		return -ENOATTR;
-	return -EEXIST;
+	return NULL;
 }
 
 /*
@@ -751,38 +727,31 @@ xfs_attr_shortform_add(
 	struct xfs_da_args		*args,
 	int				forkoff)
 {
-	struct xfs_attr_shortform	*sf;
+	struct xfs_inode		*dp = args->dp;
+	struct xfs_mount		*mp = dp->i_mount;
+	struct xfs_ifork		*ifp = &dp->i_af;
+	struct xfs_attr_sf_hdr		*sf = ifp->if_data;
 	struct xfs_attr_sf_entry	*sfe;
-	int				offset, size;
-	struct xfs_mount		*mp;
-	struct xfs_inode		*dp;
-	struct xfs_ifork		*ifp;
+	int				size;
 
 	trace_xfs_attr_sf_add(args);
 
-	dp = args->dp;
-	mp = dp->i_mount;
 	dp->i_forkoff = forkoff;
 
-	ifp = &dp->i_af;
 	ASSERT(ifp->if_format == XFS_DINODE_FMT_LOCAL);
-	sf = (struct xfs_attr_shortform *)ifp->if_u1.if_data;
-	if (xfs_attr_sf_findname(args, &sfe, NULL) == -EEXIST)
-		ASSERT(0);
-	offset = (char *)sfe - (char *)sf;
+	ASSERT(!xfs_attr_sf_findname(args));
 
 	size = xfs_attr_sf_entsize_byname(args->namelen, args->valuelen);
-	xfs_idata_realloc(dp, size, XFS_ATTR_FORK);
-	sf = (struct xfs_attr_shortform *)ifp->if_u1.if_data;
-	sfe = (struct xfs_attr_sf_entry *)((char *)sf + offset);
+	sf = xfs_idata_realloc(dp, size, XFS_ATTR_FORK);
 
+	sfe = xfs_attr_sf_endptr(sf);
 	sfe->namelen = args->namelen;
 	sfe->valuelen = args->valuelen;
 	sfe->flags = args->attr_filter;
 	memcpy(sfe->nameval, args->name, args->namelen);
 	memcpy(&sfe->nameval[args->namelen], args->value, args->valuelen);
-	sf->hdr.count++;
-	be16_add_cpu(&sf->hdr.totsize, size);
+	sf->count++;
+	be16_add_cpu(&sf->totsize, size);
 	xfs_trans_log_inode(args->trans, dp, XFS_ILOG_CORE | XFS_ILOG_ADATA);
 
 	xfs_sbversion_add_attr2(mp, args->trans);
@@ -811,48 +780,43 @@ int
 xfs_attr_sf_removename(
 	struct xfs_da_args		*args)
 {
-	struct xfs_attr_shortform	*sf;
+	struct xfs_inode		*dp = args->dp;
+	struct xfs_mount		*mp = dp->i_mount;
+	struct xfs_attr_sf_hdr		*sf = dp->i_af.if_data;
 	struct xfs_attr_sf_entry	*sfe;
-	int				size = 0, end, totsize;
-	unsigned int			base;
-	struct xfs_mount		*mp;
-	struct xfs_inode		*dp;
-	int				error;
+	uint16_t			totsize = be16_to_cpu(sf->totsize);
+	void				*next, *end;
+	int				size = 0;
 
 	trace_xfs_attr_sf_remove(args);
 
-	dp = args->dp;
-	mp = dp->i_mount;
-	sf = (struct xfs_attr_shortform *)dp->i_af.if_u1.if_data;
-
-	error = xfs_attr_sf_findname(args, &sfe, &base);
-
-	/*
-	 * If we are recovering an operation, finding nothing to
-	 * remove is not an error - it just means there was nothing
-	 * to clean up.
-	 */
-	if (error == -ENOATTR && (args->op_flags & XFS_DA_OP_RECOVERY))
-		return 0;
-	if (error != -EEXIST)
-		return error;
-	size = xfs_attr_sf_entsize(sfe);
+	sfe = xfs_attr_sf_findname(args);
+	if (!sfe) {
+		/*
+		 * If we are recovering an operation, finding nothing to remove
+		 * is not an error, it just means there was nothing to clean up.
+		 */
+		if (args->op_flags & XFS_DA_OP_RECOVERY)
+			return 0;
+		return -ENOATTR;
+	}
 
 	/*
 	 * Fix up the attribute fork data, covering the hole
 	 */
-	end = base + size;
-	totsize = be16_to_cpu(sf->hdr.totsize);
-	if (end != totsize)
-		memmove(&((char *)sf)[base], &((char *)sf)[end], totsize - end);
-	sf->hdr.count--;
-	be16_add_cpu(&sf->hdr.totsize, -size);
+	size = xfs_attr_sf_entsize(sfe);
+	next = xfs_attr_sf_nextentry(sfe);
+	end = xfs_attr_sf_endptr(sf);
+	if (next < end)
+		memmove(sfe, next, end - next);
+	sf->count--;
+	totsize -= size;
+	sf->totsize = cpu_to_be16(totsize);
 
 	/*
 	 * Fix up the start offset of the attribute fork
 	 */
-	totsize -= size;
-	if (totsize == sizeof(xfs_attr_sf_hdr_t) && xfs_has_attr2(mp) &&
+	if (totsize == sizeof(struct xfs_attr_sf_hdr) && xfs_has_attr2(mp) &&
 	    (dp->i_df.if_format != XFS_DINODE_FMT_BTREE) &&
 	    !(args->op_flags & (XFS_DA_OP_ADDNAME | XFS_DA_OP_REPLACE))) {
 		xfs_attr_fork_remove(dp, args->trans);
@@ -860,7 +824,7 @@ xfs_attr_sf_removename(
 		xfs_idata_realloc(dp, -size, XFS_ATTR_FORK);
 		dp->i_forkoff = xfs_attr_shortform_bytesfit(dp, totsize);
 		ASSERT(dp->i_forkoff);
-		ASSERT(totsize > sizeof(xfs_attr_sf_hdr_t) ||
+		ASSERT(totsize > sizeof(struct xfs_attr_sf_hdr) ||
 				(args->op_flags & XFS_DA_OP_ADDNAME) ||
 				!xfs_has_attr2(mp) ||
 				dp->i_df.if_format == XFS_DINODE_FMT_BTREE);
@@ -873,33 +837,6 @@ xfs_attr_sf_removename(
 	return 0;
 }
 
-/*
- * Look up a name in a shortform attribute list structure.
- */
-/*ARGSUSED*/
-int
-xfs_attr_shortform_lookup(xfs_da_args_t *args)
-{
-	struct xfs_attr_shortform *sf;
-	struct xfs_attr_sf_entry *sfe;
-	int i;
-	struct xfs_ifork *ifp;
-
-	trace_xfs_attr_sf_lookup(args);
-
-	ifp = &args->dp->i_af;
-	ASSERT(ifp->if_format == XFS_DINODE_FMT_LOCAL);
-	sf = (struct xfs_attr_shortform *)ifp->if_u1.if_data;
-	sfe = &sf->list[0];
-	for (i = 0; i < sf->hdr.count;
-			sfe = xfs_attr_sf_nextentry(sfe), i++) {
-		if (xfs_attr_match(args, sfe->namelen, sfe->nameval,
-				sfe->flags))
-			return -EEXIST;
-	}
-	return -ENOATTR;
-}
-
 /*
  * Retrieve the attribute value and length.
  *
@@ -909,23 +846,19 @@ xfs_attr_shortform_lookup(xfs_da_args_t *args)
  */
 int
 xfs_attr_shortform_getvalue(
 	struct xfs_da_args		*args)
 {
-	struct xfs_attr_shortform	*sf;
 	struct xfs_attr_sf_entry	*sfe;
-	int				i;
 
 	ASSERT(args->dp->i_af.if_format == XFS_DINODE_FMT_LOCAL);
-	sf = (struct xfs_attr_shortform *)args->dp->i_af.if_u1.if_data;
-	sfe = &sf->list[0];
-	for (i = 0; i < sf->hdr.count;
-			sfe = xfs_attr_sf_nextentry(sfe), i++) {
-		if (xfs_attr_match(args, sfe->namelen, sfe->nameval,
-				sfe->flags))
-			return xfs_attr_copy_value(args,
-				&sfe->nameval[args->namelen], sfe->valuelen);
-	}
-	return -ENOATTR;
+
+	trace_xfs_attr_sf_lookup(args);
+
+	sfe = xfs_attr_sf_findname(args);
+	if (!sfe)
+		return -ENOATTR;
+	return xfs_attr_copy_value(args, &sfe->nameval[args->namelen],
+			sfe->valuelen);
 }
 
 /* Convert from using the shortform to the leaf format. */
@@ -933,26 +866,23 @@ int
 xfs_attr_shortform_to_leaf(
 	struct xfs_da_args		*args)
 {
-	struct xfs_inode		*dp;
-	struct xfs_attr_shortform	*sf;
+	struct xfs_inode		*dp = args->dp;
+	struct xfs_ifork		*ifp = &dp->i_af;
+	struct xfs_attr_sf_hdr		*sf = ifp->if_data;
 	struct xfs_attr_sf_entry	*sfe;
+	int				size = be16_to_cpu(sf->totsize);
 	struct xfs_da_args		nargs;
 	char				*tmpbuffer;
-	int				error, i, size;
+	int				error, i;
 	xfs_dablk_t			blkno;
 	struct xfs_buf			*bp;
-	struct xfs_ifork		*ifp;
 
 	trace_xfs_attr_sf_to_leaf(args);
 
-	dp = args->dp;
-	ifp = &dp->i_af;
-	sf = (struct xfs_attr_shortform *)ifp->if_u1.if_data;
-	size = be16_to_cpu(sf->hdr.totsize);
 	tmpbuffer = kmem_alloc(size, 0);
 	ASSERT(tmpbuffer != NULL);
-	memcpy(tmpbuffer, ifp->if_u1.if_data, size);
-	sf = (struct xfs_attr_shortform *)tmpbuffer;
+	memcpy(tmpbuffer, ifp->if_data, size);
+	sf = (struct xfs_attr_sf_hdr *)tmpbuffer;
 
 	xfs_idata_realloc(dp, -size, XFS_ATTR_FORK);
 	xfs_bmap_local_to_extents_empty(args->trans, dp, XFS_ATTR_FORK);
@@ -975,8 +905,8 @@ xfs_attr_shortform_to_leaf(
 	nargs.trans = args->trans;
 	nargs.op_flags = XFS_DA_OP_OKNOENT;
 
-	sfe = &sf->list[0];
-	for (i = 0; i < sf->hdr.count; i++) {
+	sfe = xfs_attr_sf_firstentry(sf);
+	for (i = 0; i < sf->count; i++) {
 		nargs.name = sfe->nameval;
 		nargs.namelen = sfe->namelen;
 		nargs.value = &sfe->nameval[nargs.namelen];
@@ -1040,23 +970,16 @@ xfs_attr_shortform_allfit(
 	return xfs_attr_shortform_bytesfit(dp, bytes);
 }
 
-/* Verify the consistency of an inline attribute fork. */
+/* Verify the consistency of a raw inline attribute fork. */
 xfs_failaddr_t
 xfs_attr_shortform_verify(
-	struct xfs_inode		*ip)
+	struct xfs_attr_sf_hdr		*sfp,
+	size_t				size)
 {
-	struct xfs_attr_shortform	*sfp;
-	struct xfs_attr_sf_entry	*sfep;
+	struct xfs_attr_sf_entry	*sfep = xfs_attr_sf_firstentry(sfp);
 	struct xfs_attr_sf_entry	*next_sfep;
 	char				*endp;
-	struct xfs_ifork		*ifp;
 	int				i;
-	int64_t				size;
-
-	ASSERT(ip->i_af.if_format == XFS_DINODE_FMT_LOCAL);
-	ifp = xfs_ifork_ptr(ip, XFS_ATTR_FORK);
-	sfp = (struct xfs_attr_shortform *)ifp->if_u1.if_data;
-	size = ifp->if_bytes;
 
 	/*
 	 * Give up if the attribute is way too short.
@@ -1067,8 +990,7 @@ xfs_attr_shortform_verify(
 	endp = (char *)sfp + size;
 
 	/* Check all reported entries */
-	sfep = &sfp->list[0];
-	for (i = 0; i < sfp->hdr.count; i++) {
+	for (i = 0; i < sfp->count; i++) {
 		/*
 		 * struct xfs_attr_sf_entry has a variable length.
 		 * Check the fixed-offset parts of the structure are
@@ -1244,14 +1166,10 @@ xfs_attr3_leaf_to_node(
 	if (error)
 		goto out;
 
-	/* copy leaf to new buffer, update identifiers */
-	xfs_trans_buf_set_type(args->trans, bp2, XFS_BLFT_ATTR_LEAF_BUF);
-	bp2->b_ops = bp1->b_ops;
-	memcpy(bp2->b_addr, bp1->b_addr, args->geo->blksize);
-	if (xfs_has_crc(mp)) {
-		struct xfs_da3_blkinfo *hdr3 = bp2->b_addr;
-		hdr3->blkno = cpu_to_be64(xfs_buf_daddr(bp2));
-	}
+	/*
+	 * Copy leaf to new buffer and log it.
+	 */
+	xfs_da_buf_copy(bp2, bp1, args->geo->blksize);
 	xfs_trans_log_buf(args->trans, bp2, 0, args->geo->blksize - 1);
 
 	/*

diff --git a/fs/xfs/libxfs/xfs_attr_leaf.h b/fs/xfs/libxfs/xfs_attr_leaf.h
@@ -47,16 +47,14 @@ struct xfs_attr3_icleaf_hdr {
  */
 void	xfs_attr_shortform_create(struct xfs_da_args *args);
 void	xfs_attr_shortform_add(struct xfs_da_args *args, int forkoff);
-int	xfs_attr_shortform_lookup(struct xfs_da_args *args);
 int	xfs_attr_shortform_getvalue(struct xfs_da_args *args);
 int	xfs_attr_shortform_to_leaf(struct xfs_da_args *args);
 int	xfs_attr_sf_removename(struct xfs_da_args *args);
-int	xfs_attr_sf_findname(struct xfs_da_args *args,
-			     struct xfs_attr_sf_entry **sfep,
-			     unsigned int *basep);
+struct xfs_attr_sf_entry *xfs_attr_sf_findname(struct xfs_da_args *args);
 int	xfs_attr_shortform_allfit(struct xfs_buf *bp, struct xfs_inode *dp);
 int	xfs_attr_shortform_bytesfit(struct xfs_inode *dp, int bytes);
-xfs_failaddr_t xfs_attr_shortform_verify(struct xfs_inode *ip);
+xfs_failaddr_t xfs_attr_shortform_verify(struct xfs_attr_sf_hdr *sfp,
+		size_t size);
 void	xfs_attr_fork_remove(struct xfs_inode *ip, struct xfs_trans *tp);

diff --git a/fs/xfs/libxfs/xfs_attr_sf.h b/fs/xfs/libxfs/xfs_attr_sf.h
@@ -6,14 +6,6 @@
 #ifndef __XFS_ATTR_SF_H__
 #define __XFS_ATTR_SF_H__
 
-/*
- * Attribute storage when stored inside the inode.
- *
- * Small attribute lists are packed as tightly as possible so as
- * to fit into the literal area of the inode.
- */
-typedef struct xfs_attr_sf_hdr xfs_attr_sf_hdr_t;
-
 /*
  * We generate this then sort it, attr_list() must return things in hash-order.
  */
@@ -41,11 +33,25 @@ static inline int xfs_attr_sf_entsize(struct xfs_attr_sf_entry *sfep)
 	return struct_size(sfep, nameval, sfep->namelen + sfep->valuelen);
 }
 
-/* next entry in struct */
+/* first entry in the SF attr fork */
+static inline struct xfs_attr_sf_entry *
+xfs_attr_sf_firstentry(struct xfs_attr_sf_hdr *hdr)
+{
+	return (struct xfs_attr_sf_entry *)(hdr + 1);
+}
+
+/* next entry after sfep */
 static inline struct xfs_attr_sf_entry *
 xfs_attr_sf_nextentry(struct xfs_attr_sf_entry *sfep)
 {
	return (void *)sfep + xfs_attr_sf_entsize(sfep);
 }
 
+/* pointer to the space after the last entry, e.g. for adding a new one */
+static inline struct xfs_attr_sf_entry *
+xfs_attr_sf_endptr(struct xfs_attr_sf_hdr *sf)
+{
+	return (void *)sf + be16_to_cpu(sf->totsize);
+}
+
 #endif	/* __XFS_ATTR_SF_H__ */
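
The three inline helpers above compose into a bounded walk of the
shortform fork, which is exactly the shape xfs_attr_sf_findname() takes
in the xfs_attr_leaf.c hunk; a hedged sketch (example_visit() is a
hypothetical consumer, and sf is assumed to come from ifp->if_data):

	struct xfs_attr_sf_hdr *sf = ifp->if_data;
	struct xfs_attr_sf_entry *sfe;

	for (sfe = xfs_attr_sf_firstentry(sf);	/* first byte past header */
	     sfe < xfs_attr_sf_endptr(sf);	/* totsize bounds the walk */
	     sfe = xfs_attr_sf_nextentry(sfe))	/* entries vary in size */
		example_visit(sfe);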

View File

@ -575,7 +575,7 @@ xfs_bmap_btree_to_extents(
xfs_rmap_ino_bmbt_owner(&oinfo, ip->i_ino, whichfork); xfs_rmap_ino_bmbt_owner(&oinfo, ip->i_ino, whichfork);
error = xfs_free_extent_later(cur->bc_tp, cbno, 1, &oinfo, error = xfs_free_extent_later(cur->bc_tp, cbno, 1, &oinfo,
XFS_AG_RESV_NONE); XFS_AG_RESV_NONE, false);
if (error) if (error)
return error; return error;
@ -747,7 +747,7 @@ xfs_bmap_local_to_extents_empty(
ASSERT(ifp->if_nextents == 0); ASSERT(ifp->if_nextents == 0);
xfs_bmap_forkoff_reset(ip, whichfork); xfs_bmap_forkoff_reset(ip, whichfork);
ifp->if_u1.if_root = NULL; ifp->if_data = NULL;
ifp->if_height = 0; ifp->if_height = 0;
ifp->if_format = XFS_DINODE_FMT_EXTENTS; ifp->if_format = XFS_DINODE_FMT_EXTENTS;
xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
@ -832,7 +832,7 @@ xfs_bmap_local_to_extents(
xfs_bmap_local_to_extents_empty(tp, ip, whichfork); xfs_bmap_local_to_extents_empty(tp, ip, whichfork);
flags |= XFS_ILOG_CORE; flags |= XFS_ILOG_CORE;
ifp->if_u1.if_root = NULL; ifp->if_data = NULL;
ifp->if_height = 0; ifp->if_height = 0;
rec.br_startoff = 0; rec.br_startoff = 0;
@ -3044,7 +3044,8 @@ xfs_bmap_extsize_align(
#define XFS_ALLOC_GAP_UNITS 4 #define XFS_ALLOC_GAP_UNITS 4
void /* returns true if ap->blkno was modified */
bool
xfs_bmap_adjacent( xfs_bmap_adjacent(
struct xfs_bmalloca *ap) /* bmap alloc argument struct */ struct xfs_bmalloca *ap) /* bmap alloc argument struct */
{ {
@ -3079,13 +3080,14 @@ xfs_bmap_adjacent(
if (adjust && if (adjust &&
ISVALID(ap->blkno + adjust, ap->prev.br_startblock)) ISVALID(ap->blkno + adjust, ap->prev.br_startblock))
ap->blkno += adjust; ap->blkno += adjust;
return true;
} }
/* /*
* If not at eof, then compare the two neighbor blocks. * If not at eof, then compare the two neighbor blocks.
* Figure out whether either one gives us a good starting point, * Figure out whether either one gives us a good starting point,
* and pick the better one. * and pick the better one.
*/ */
else if (!ap->eof) { if (!ap->eof) {
xfs_fsblock_t gotbno; /* right side block number */ xfs_fsblock_t gotbno; /* right side block number */
xfs_fsblock_t gotdiff=0; /* right side difference */ xfs_fsblock_t gotdiff=0; /* right side difference */
xfs_fsblock_t prevbno; /* left side block number */ xfs_fsblock_t prevbno; /* left side block number */
@ -3165,14 +3167,21 @@ xfs_bmap_adjacent(
* If both valid, pick the better one, else the only good * If both valid, pick the better one, else the only good
* one, else ap->blkno is already set (to 0 or the inode block). * one, else ap->blkno is already set (to 0 or the inode block).
*/ */
if (prevbno != NULLFSBLOCK && gotbno != NULLFSBLOCK) if (prevbno != NULLFSBLOCK && gotbno != NULLFSBLOCK) {
ap->blkno = prevdiff <= gotdiff ? prevbno : gotbno; ap->blkno = prevdiff <= gotdiff ? prevbno : gotbno;
else if (prevbno != NULLFSBLOCK) return true;
}
if (prevbno != NULLFSBLOCK) {
ap->blkno = prevbno; ap->blkno = prevbno;
else if (gotbno != NULLFSBLOCK) return true;
}
if (gotbno != NULLFSBLOCK) {
ap->blkno = gotbno; ap->blkno = gotbno;
return true;
}
} }
#undef ISVALID #undef ISVALID
return false;
} }
int int
@ -3263,11 +3272,14 @@ xfs_bmap_btalloc_select_lengths(
} }
/* Update all inode and quota accounting for the allocation we just did. */ /* Update all inode and quota accounting for the allocation we just did. */
static void void
xfs_bmap_btalloc_accounting( xfs_bmap_alloc_account(
struct xfs_bmalloca *ap, struct xfs_bmalloca *ap)
struct xfs_alloc_arg *args)
{ {
bool isrt = XFS_IS_REALTIME_INODE(ap->ip) &&
(ap->flags & XFS_BMAPI_ATTRFORK);
uint fld;
if (ap->flags & XFS_BMAPI_COWFORK) { if (ap->flags & XFS_BMAPI_COWFORK) {
/* /*
* COW fork blocks are in-core only and thus are treated as * COW fork blocks are in-core only and thus are treated as
@ -3279,7 +3291,7 @@ xfs_bmap_btalloc_accounting(
* yet. * yet.
*/ */
if (ap->wasdel) { if (ap->wasdel) {
xfs_mod_delalloc(ap->ip->i_mount, -(int64_t)args->len); xfs_mod_delalloc(ap->ip->i_mount, -(int64_t)ap->length);
return; return;
} }
@ -3291,22 +3303,25 @@ xfs_bmap_btalloc_accounting(
* This essentially transfers the transaction quota reservation * This essentially transfers the transaction quota reservation
* to that of a delalloc extent. * to that of a delalloc extent.
*/ */
ap->ip->i_delayed_blks += args->len; ap->ip->i_delayed_blks += ap->length;
xfs_trans_mod_dquot_byino(ap->tp, ap->ip, XFS_TRANS_DQ_RES_BLKS, xfs_trans_mod_dquot_byino(ap->tp, ap->ip, isrt ?
-(long)args->len); XFS_TRANS_DQ_RES_RTBLKS : XFS_TRANS_DQ_RES_BLKS,
-(long)ap->length);
return; return;
} }
/* data/attr fork only */ /* data/attr fork only */
ap->ip->i_nblocks += args->len; ap->ip->i_nblocks += ap->length;
xfs_trans_log_inode(ap->tp, ap->ip, XFS_ILOG_CORE); xfs_trans_log_inode(ap->tp, ap->ip, XFS_ILOG_CORE);
if (ap->wasdel) { if (ap->wasdel) {
ap->ip->i_delayed_blks -= args->len; ap->ip->i_delayed_blks -= ap->length;
xfs_mod_delalloc(ap->ip->i_mount, -(int64_t)args->len); xfs_mod_delalloc(ap->ip->i_mount, -(int64_t)ap->length);
fld = isrt ? XFS_TRANS_DQ_DELRTBCOUNT : XFS_TRANS_DQ_DELBCOUNT;
} else {
fld = isrt ? XFS_TRANS_DQ_RTBCOUNT : XFS_TRANS_DQ_BCOUNT;
} }
xfs_trans_mod_dquot_byino(ap->tp, ap->ip,
ap->wasdel ? XFS_TRANS_DQ_DELBCOUNT : XFS_TRANS_DQ_BCOUNT, xfs_trans_mod_dquot_byino(ap->tp, ap->ip, fld, ap->length);
args->len);
} }
static int static int
@ -3380,7 +3395,7 @@ xfs_bmap_process_allocated_extent(
ap->offset = orig_offset; ap->offset = orig_offset;
else if (ap->offset + ap->length < orig_offset + orig_length) else if (ap->offset + ap->length < orig_offset + orig_length)
ap->offset = orig_offset + orig_length - ap->length; ap->offset = orig_offset + orig_length - ap->length;
xfs_bmap_btalloc_accounting(ap, args); xfs_bmap_alloc_account(ap);
} }
#ifdef DEBUG #ifdef DEBUG
@ -5010,7 +5025,6 @@ xfs_bmap_del_extent_real(
xfs_fileoff_t del_endoff; /* first offset past del */ xfs_fileoff_t del_endoff; /* first offset past del */
int do_fx; /* free extent at end of routine */ int do_fx; /* free extent at end of routine */
int error; /* error return value */ int error; /* error return value */
int flags = 0;/* inode logging flags */
struct xfs_bmbt_irec got; /* current extent entry */ struct xfs_bmbt_irec got; /* current extent entry */
xfs_fileoff_t got_endoff; /* first offset past got */ xfs_fileoff_t got_endoff; /* first offset past got */
int i; /* temp state */ int i; /* temp state */
@ -5023,6 +5037,8 @@ xfs_bmap_del_extent_real(
uint32_t state = xfs_bmap_fork_to_state(whichfork); uint32_t state = xfs_bmap_fork_to_state(whichfork);
struct xfs_bmbt_irec old; struct xfs_bmbt_irec old;
*logflagsp = 0;
mp = ip->i_mount; mp = ip->i_mount;
XFS_STATS_INC(mp, xs_del_exlist); XFS_STATS_INC(mp, xs_del_exlist);
@ -5035,7 +5051,6 @@ xfs_bmap_del_extent_real(
ASSERT(got_endoff >= del_endoff); ASSERT(got_endoff >= del_endoff);
ASSERT(!isnullstartblock(got.br_startblock)); ASSERT(!isnullstartblock(got.br_startblock));
qfield = 0; qfield = 0;
error = 0;
/* /*
* If it's the case where the directory code is running with no block * If it's the case where the directory code is running with no block
@ -5051,13 +5066,13 @@ xfs_bmap_del_extent_real(
del->br_startoff > got.br_startoff && del_endoff < got_endoff) del->br_startoff > got.br_startoff && del_endoff < got_endoff)
return -ENOSPC; return -ENOSPC;
flags = XFS_ILOG_CORE; *logflagsp = XFS_ILOG_CORE;
if (whichfork == XFS_DATA_FORK && XFS_IS_REALTIME_INODE(ip)) { if (whichfork == XFS_DATA_FORK && XFS_IS_REALTIME_INODE(ip)) {
if (!(bflags & XFS_BMAPI_REMAP)) { if (!(bflags & XFS_BMAPI_REMAP)) {
error = xfs_rtfree_blocks(tp, del->br_startblock, error = xfs_rtfree_blocks(tp, del->br_startblock,
del->br_blockcount); del->br_blockcount);
if (error) if (error)
goto done; return error;
} }
do_fx = 0; do_fx = 0;
@ -5072,11 +5087,9 @@ xfs_bmap_del_extent_real(
if (cur) { if (cur) {
error = xfs_bmbt_lookup_eq(cur, &got, &i); error = xfs_bmbt_lookup_eq(cur, &got, &i);
if (error) if (error)
goto done; return error;
if (XFS_IS_CORRUPT(mp, i != 1)) { if (XFS_IS_CORRUPT(mp, i != 1))
error = -EFSCORRUPTED; return -EFSCORRUPTED;
goto done;
}
} }
if (got.br_startoff == del->br_startoff) if (got.br_startoff == del->br_startoff)
@ -5093,17 +5106,15 @@ xfs_bmap_del_extent_real(
xfs_iext_prev(ifp, icur); xfs_iext_prev(ifp, icur);
ifp->if_nextents--; ifp->if_nextents--;
flags |= XFS_ILOG_CORE; *logflagsp |= XFS_ILOG_CORE;
if (!cur) { if (!cur) {
flags |= xfs_ilog_fext(whichfork); *logflagsp |= xfs_ilog_fext(whichfork);
break; break;
} }
if ((error = xfs_btree_delete(cur, &i))) if ((error = xfs_btree_delete(cur, &i)))
goto done; return error;
if (XFS_IS_CORRUPT(mp, i != 1)) { if (XFS_IS_CORRUPT(mp, i != 1))
error = -EFSCORRUPTED; return -EFSCORRUPTED;
goto done;
}
break; break;
case BMAP_LEFT_FILLING: case BMAP_LEFT_FILLING:
/* /*
@ -5114,12 +5125,12 @@ xfs_bmap_del_extent_real(
got.br_blockcount -= del->br_blockcount; got.br_blockcount -= del->br_blockcount;
xfs_iext_update_extent(ip, state, icur, &got); xfs_iext_update_extent(ip, state, icur, &got);
if (!cur) { if (!cur) {
flags |= xfs_ilog_fext(whichfork); *logflagsp |= xfs_ilog_fext(whichfork);
break; break;
} }
error = xfs_bmbt_update(cur, &got); error = xfs_bmbt_update(cur, &got);
if (error) if (error)
-			goto done;
+			return error;
 		break;
 	case BMAP_RIGHT_FILLING:
 		/*
@@ -5128,12 +5139,12 @@ xfs_bmap_del_extent_real(
 		got.br_blockcount -= del->br_blockcount;
 		xfs_iext_update_extent(ip, state, icur, &got);
 		if (!cur) {
-			flags |= xfs_ilog_fext(whichfork);
+			*logflagsp |= xfs_ilog_fext(whichfork);
 			break;
 		}
 		error = xfs_bmbt_update(cur, &got);
 		if (error)
-			goto done;
+			return error;
 		break;
 	case 0:
 		/*
@@ -5150,18 +5161,18 @@ xfs_bmap_del_extent_real(
 		new.br_state = got.br_state;
 		new.br_startblock = del_endblock;
-		flags |= XFS_ILOG_CORE;
+		*logflagsp |= XFS_ILOG_CORE;
 		if (cur) {
 			error = xfs_bmbt_update(cur, &got);
 			if (error)
-				goto done;
+				return error;
 			error = xfs_btree_increment(cur, 0, &i);
 			if (error)
-				goto done;
+				return error;
 			cur->bc_rec.b = new;
 			error = xfs_btree_insert(cur, &i);
 			if (error && error != -ENOSPC)
-				goto done;
+				return error;
 			/*
 			 * If get no-space back from btree insert, it tried a
 			 * split, and we have a zero block reservation. Fix up
@@ -5174,33 +5185,28 @@ xfs_bmap_del_extent_real(
 				 */
 				error = xfs_bmbt_lookup_eq(cur, &got, &i);
 				if (error)
-					goto done;
-				if (XFS_IS_CORRUPT(mp, i != 1)) {
-					error = -EFSCORRUPTED;
-					goto done;
-				}
+					return error;
+				if (XFS_IS_CORRUPT(mp, i != 1))
+					return -EFSCORRUPTED;
 				/*
 				 * Update the btree record back
 				 * to the original value.
 				 */
 				error = xfs_bmbt_update(cur, &old);
 				if (error)
-					goto done;
+					return error;
 				/*
 				 * Reset the extent record back
 				 * to the original value.
 				 */
 				xfs_iext_update_extent(ip, state, icur, &old);
-				flags = 0;
-				error = -ENOSPC;
-				goto done;
-			}
-			if (XFS_IS_CORRUPT(mp, i != 1)) {
-				error = -EFSCORRUPTED;
-				goto done;
-			}
+				*logflagsp = 0;
+				return -ENOSPC;
+			}
+			if (XFS_IS_CORRUPT(mp, i != 1))
+				return -EFSCORRUPTED;
 		} else
-			flags |= xfs_ilog_fext(whichfork);
+			*logflagsp |= xfs_ilog_fext(whichfork);
 		ifp->if_nextents++;
 		xfs_iext_next(ifp, icur);
@@ -5218,13 +5224,13 @@ xfs_bmap_del_extent_real(
 	if (xfs_is_reflink_inode(ip) && whichfork == XFS_DATA_FORK) {
 		xfs_refcount_decrease_extent(tp, del);
 	} else {
-		error = __xfs_free_extent_later(tp, del->br_startblock,
+		error = xfs_free_extent_later(tp, del->br_startblock,
 				del->br_blockcount, NULL,
 				XFS_AG_RESV_NONE,
 				((bflags & XFS_BMAPI_NODISCARD) ||
				del->br_state == XFS_EXT_UNWRITTEN));
 		if (error)
-			goto done;
+			return error;
 	}
 }
@@ -5239,9 +5245,7 @@ xfs_bmap_del_extent_real(
 	if (qfield && !(bflags & XFS_BMAPI_REMAP))
 		xfs_trans_mod_dquot_byino(tp, ip, qfield, (long)-nblks);
-done:
-	*logflagsp = flags;
-	return error;
+	return 0;
 }
 /*
@@ -5250,7 +5254,7 @@ done:
  * that value. If not all extents in the block range can be removed then
  * *done is set.
  */
-int						/* error */
+static int
 __xfs_bunmapi(
 	struct xfs_trans	*tp,		/* transaction pointer */
 	struct xfs_inode	*ip,		/* incore inode */
@@ -6102,7 +6106,7 @@ __xfs_bmap_add(
 	bi->bi_bmap = *bmap;
 	xfs_bmap_update_get_group(tp->t_mountp, bi);
-	xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_BMAP, &bi->bi_list);
+	xfs_defer_add(tp, &bi->bi_list, &xfs_bmap_update_defer_type);
 	return 0;
 }
@@ -6179,19 +6183,18 @@ xfs_bmap_finish_one(
 	return error;
 }
-/* Check that an inode's extent does not have invalid flags or bad ranges. */
+/* Check that an extent does not have invalid flags or bad ranges. */
 xfs_failaddr_t
-xfs_bmap_validate_extent(
-	struct xfs_inode	*ip,
+xfs_bmap_validate_extent_raw(
+	struct xfs_mount	*mp,
+	bool			rtfile,
 	int			whichfork,
 	struct xfs_bmbt_irec	*irec)
 {
-	struct xfs_mount	*mp = ip->i_mount;
-
 	if (!xfs_verify_fileext(mp, irec->br_startoff, irec->br_blockcount))
 		return __this_address;
-	if (XFS_IS_REALTIME_INODE(ip) && whichfork == XFS_DATA_FORK) {
+	if (rtfile && whichfork == XFS_DATA_FORK) {
 		if (!xfs_verify_rtbext(mp, irec->br_startblock,
 				irec->br_blockcount))
 			return __this_address;
@@ -6221,3 +6224,53 @@ xfs_bmap_intent_destroy_cache(void)
 	kmem_cache_destroy(xfs_bmap_intent_cache);
 	xfs_bmap_intent_cache = NULL;
 }
+
+/* Check that an inode's extent does not have invalid flags or bad ranges. */
+xfs_failaddr_t
+xfs_bmap_validate_extent(
+	struct xfs_inode	*ip,
+	int			whichfork,
+	struct xfs_bmbt_irec	*irec)
+{
+	return xfs_bmap_validate_extent_raw(ip->i_mount,
+			XFS_IS_REALTIME_INODE(ip), whichfork, irec);
+}
+
+/*
+ * Used in xfs_itruncate_extents().  This is the maximum number of extents
+ * freed from a file in a single transaction.
+ */
+#define XFS_ITRUNC_MAX_EXTENTS	2
+
+/*
+ * Unmap every extent in part of an inode's fork.  We don't do any higher level
+ * invalidation work at all.
+ */
+int
+xfs_bunmapi_range(
+	struct xfs_trans	**tpp,
+	struct xfs_inode	*ip,
+	uint32_t		flags,
+	xfs_fileoff_t		startoff,
+	xfs_fileoff_t		endoff)
+{
+	xfs_filblks_t		unmap_len = endoff - startoff + 1;
+	int			error = 0;
+
+	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
+
+	while (unmap_len > 0) {
+		ASSERT((*tpp)->t_highest_agno == NULLAGNUMBER);
+		error = __xfs_bunmapi(*tpp, ip, startoff, &unmap_len, flags,
+				XFS_ITRUNC_MAX_EXTENTS);
+		if (error)
+			goto out;
+
+		/* free the just unmapped extents */
+		error = xfs_defer_finish(tpp);
+		if (error)
+			goto out;
+	}
+out:
+	return error;
+}
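
The new xfs_bunmapi_range() captures the common unmap-in-a-loop pattern: each pass removes at most XFS_ITRUNC_MAX_EXTENTS extents and finishes the deferred frees before looping again, so the transaction reservation stays bounded. For illustration only, a minimal caller sketch (real callers such as truncate also handle size updates and locking; "example_punch_range", "first" and "last" are hypothetical):

	/*
	 * Sketch: punch out an entire data fork range, assuming the inode is
	 * already joined to the transaction and ILOCK_EXCL is held.
	 */
	STATIC int
	example_punch_range(
		struct xfs_trans	**tpp,
		struct xfs_inode	*ip,
		xfs_fileoff_t		first,
		xfs_fileoff_t		last)
	{
		/* Unmaps [first, last] and finishes the deferred extent frees. */
		return xfs_bunmapi_range(tpp, ip, 0, first, last);
	}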

@@ -116,6 +116,8 @@ static inline int xfs_bmapi_whichfork(uint32_t bmapi_flags)
 	return XFS_DATA_FORK;
 }
+void xfs_bmap_alloc_account(struct xfs_bmalloca *ap);
+
 /*
  * Special values for xfs_bmbt_irec_t br_startblock field.
  */
@@ -190,9 +192,6 @@ int xfs_bmapi_read(struct xfs_inode *ip, xfs_fileoff_t bno,
 int xfs_bmapi_write(struct xfs_trans *tp, struct xfs_inode *ip,
 		xfs_fileoff_t bno, xfs_filblks_t len, uint32_t flags,
 		xfs_extlen_t total, struct xfs_bmbt_irec *mval, int *nmap);
-int __xfs_bunmapi(struct xfs_trans *tp, struct xfs_inode *ip,
-		xfs_fileoff_t bno, xfs_filblks_t *rlen, uint32_t flags,
-		xfs_extnum_t nexts);
 int xfs_bunmapi(struct xfs_trans *tp, struct xfs_inode *ip,
 		xfs_fileoff_t bno, xfs_filblks_t len, uint32_t flags,
 		xfs_extnum_t nexts, int *done);
@@ -263,6 +262,8 @@ static inline uint32_t xfs_bmap_fork_to_state(int whichfork)
 	}
 }
+xfs_failaddr_t xfs_bmap_validate_extent_raw(struct xfs_mount *mp, bool rtfile,
+		int whichfork, struct xfs_bmbt_irec *irec);
 xfs_failaddr_t xfs_bmap_validate_extent(struct xfs_inode *ip, int whichfork,
 		struct xfs_bmbt_irec *irec);
 int xfs_bmap_complain_bad_rec(struct xfs_inode *ip, int whichfork,
@@ -271,6 +272,8 @@ int xfs_bmap_complain_bad_rec(struct xfs_inode *ip, int whichfork,
 int xfs_bmapi_remap(struct xfs_trans *tp, struct xfs_inode *ip,
 		xfs_fileoff_t bno, xfs_filblks_t len, xfs_fsblock_t startblock,
 		uint32_t flags);
+int xfs_bunmapi_range(struct xfs_trans **tpp, struct xfs_inode *ip,
+		uint32_t flags, xfs_fileoff_t startoff, xfs_fileoff_t endoff);
 extern struct kmem_cache *xfs_bmap_intent_cache;
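
The xfs_bmap_validate_extent_raw() split lets a caller validate a mapping without holding an incore inode, which is what online repair needs when it checks mappings decoded straight from ondisk inodes. A minimal sketch under that assumption ("example_mapping_ok" and "is_rt" are illustrative names, not kernel API):

	/*
	 * Sketch: validate a data fork mapping decoded from an ondisk inode.
	 * "is_rt" stands in for however the caller knows this is a realtime
	 * file, since there is no incore inode to ask.
	 */
	STATIC bool
	example_mapping_ok(
		struct xfs_mount	*mp,
		bool			is_rt,
		struct xfs_bmbt_irec	*irec)
	{
		return xfs_bmap_validate_extent_raw(mp, is_rt,
				XFS_DATA_FORK, irec) == NULL;
	}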

@@ -15,6 +15,7 @@
 #include "xfs_trans.h"
 #include "xfs_alloc.h"
 #include "xfs_btree.h"
+#include "xfs_btree_staging.h"
 #include "xfs_bmap_btree.h"
 #include "xfs_bmap.h"
 #include "xfs_error.h"
@@ -272,7 +273,7 @@ xfs_bmbt_free_block(
 	xfs_rmap_ino_bmbt_owner(&oinfo, ip->i_ino, cur->bc_ino.whichfork);
 	error = xfs_free_extent_later(cur->bc_tp, fsbno, 1, &oinfo,
-			XFS_AG_RESV_NONE);
+			XFS_AG_RESV_NONE, false);
 	if (error)
 		return error;
@@ -288,10 +289,7 @@ xfs_bmbt_get_minrecs(
 	int			level)
 {
 	if (level == cur->bc_nlevels - 1) {
-		struct xfs_ifork	*ifp;
-
-		ifp = xfs_ifork_ptr(cur->bc_ino.ip,
-				    cur->bc_ino.whichfork);
+		struct xfs_ifork	*ifp = xfs_btree_ifork_ptr(cur);
 		return xfs_bmbt_maxrecs(cur->bc_mp,
 				ifp->if_broot_bytes, level == 0) / 2;
@@ -306,10 +304,7 @@ xfs_bmbt_get_maxrecs(
 	int			level)
 {
 	if (level == cur->bc_nlevels - 1) {
-		struct xfs_ifork	*ifp;
-
-		ifp = xfs_ifork_ptr(cur->bc_ino.ip,
-				    cur->bc_ino.whichfork);
+		struct xfs_ifork	*ifp = xfs_btree_ifork_ptr(cur);
 		return xfs_bmbt_maxrecs(cur->bc_mp,
 				ifp->if_broot_bytes, level == 0);
@@ -543,23 +538,19 @@ static const struct xfs_btree_ops xfs_bmbt_ops = {
 	.keys_contiguous	= xfs_bmbt_keys_contiguous,
 };
-/*
- * Allocate a new bmap btree cursor.
- */
-struct xfs_btree_cur *				/* new bmap btree cursor */
-xfs_bmbt_init_cursor(
-	struct xfs_mount	*mp,		/* file system mount point */
-	struct xfs_trans	*tp,		/* transaction pointer */
-	struct xfs_inode	*ip,		/* inode owning the btree */
-	int			whichfork)	/* data or attr fork */
+static struct xfs_btree_cur *
+xfs_bmbt_init_common(
+	struct xfs_mount	*mp,
+	struct xfs_trans	*tp,
+	struct xfs_inode	*ip,
+	int			whichfork)
 {
-	struct xfs_ifork	*ifp = xfs_ifork_ptr(ip, whichfork);
 	struct xfs_btree_cur	*cur;
+
 	ASSERT(whichfork != XFS_COW_FORK);
 	cur = xfs_btree_alloc_cursor(mp, tp, XFS_BTNUM_BMAP,
 			mp->m_bm_maxlevels[whichfork], xfs_bmbt_cur_cache);
-	cur->bc_nlevels = be16_to_cpu(ifp->if_broot->bb_level) + 1;
 	cur->bc_statoff = XFS_STATS_CALC_INDEX(xs_bmbt_2);
 	cur->bc_ops = &xfs_bmbt_ops;
@@ -567,10 +558,30 @@ xfs_bmbt_init_cursor(
 	if (xfs_has_crc(mp))
 		cur->bc_flags |= XFS_BTREE_CRC_BLOCKS;
-	cur->bc_ino.forksize = xfs_inode_fork_size(ip, whichfork);
 	cur->bc_ino.ip = ip;
 	cur->bc_ino.allocated = 0;
 	cur->bc_ino.flags = 0;
+
+	return cur;
+}
+
+/*
+ * Allocate a new bmap btree cursor.
+ */
+struct xfs_btree_cur *
+xfs_bmbt_init_cursor(
+	struct xfs_mount	*mp,
+	struct xfs_trans	*tp,
+	struct xfs_inode	*ip,
+	int			whichfork)
+{
+	struct xfs_ifork	*ifp = xfs_ifork_ptr(ip, whichfork);
+	struct xfs_btree_cur	*cur;
+
+	cur = xfs_bmbt_init_common(mp, tp, ip, whichfork);
+
+	cur->bc_nlevels = be16_to_cpu(ifp->if_broot->bb_level) + 1;
+	cur->bc_ino.forksize = xfs_inode_fork_size(ip, whichfork);
 	cur->bc_ino.whichfork = whichfork;
 	return cur;
@@ -587,6 +598,76 @@ xfs_bmbt_block_maxrecs(
 	return blocklen / (sizeof(xfs_bmbt_key_t) + sizeof(xfs_bmbt_ptr_t));
 }
+/*
+ * Allocate a new bmap btree cursor for reloading an inode block mapping data
+ * structure.  Note that callers can use the staged cursor to reload extents
+ * format inode forks if they rebuild the iext tree and commit the staged
+ * cursor immediately.
+ */
+struct xfs_btree_cur *
+xfs_bmbt_stage_cursor(
+	struct xfs_mount	*mp,
+	struct xfs_inode	*ip,
+	struct xbtree_ifakeroot	*ifake)
+{
+	struct xfs_btree_cur	*cur;
+	struct xfs_btree_ops	*ops;
+
+	/* data fork always has larger maxheight */
+	cur = xfs_bmbt_init_common(mp, NULL, ip, XFS_DATA_FORK);
+	cur->bc_nlevels = ifake->if_levels;
+	cur->bc_ino.forksize = ifake->if_fork_size;
+
+	/* Don't let anyone think we're attached to the real fork yet. */
+	cur->bc_ino.whichfork = -1;
+	xfs_btree_stage_ifakeroot(cur, ifake, &ops);
+	ops->update_cursor = NULL;
+	return cur;
+}
+
+/*
+ * Swap in the new inode fork root.  Once we pass this point the newly rebuilt
+ * mappings are in place and we have to kill off any old btree blocks.
+ */
+void
+xfs_bmbt_commit_staged_btree(
+	struct xfs_btree_cur	*cur,
+	struct xfs_trans	*tp,
+	int			whichfork)
+{
+	struct xbtree_ifakeroot	*ifake = cur->bc_ino.ifake;
+	struct xfs_ifork	*ifp;
+	static const short	brootflag[2] = {XFS_ILOG_DBROOT, XFS_ILOG_ABROOT};
+	static const short	extflag[2] = {XFS_ILOG_DEXT, XFS_ILOG_AEXT};
+	int			flags = XFS_ILOG_CORE;
+
+	ASSERT(cur->bc_flags & XFS_BTREE_STAGING);
+	ASSERT(whichfork != XFS_COW_FORK);
+
+	/*
+	 * Free any resources hanging off the real fork, then shallow-copy the
+	 * staging fork's contents into the real fork to transfer everything
+	 * we just built.
+	 */
+	ifp = xfs_ifork_ptr(cur->bc_ino.ip, whichfork);
+	xfs_idestroy_fork(ifp);
+	memcpy(ifp, ifake->if_fork, sizeof(struct xfs_ifork));
+
+	switch (ifp->if_format) {
+	case XFS_DINODE_FMT_EXTENTS:
+		flags |= extflag[whichfork];
+		break;
+	case XFS_DINODE_FMT_BTREE:
+		flags |= brootflag[whichfork];
+		break;
+	default:
+		ASSERT(0);
+		break;
+	}
+	xfs_trans_log_inode(tp, cur->bc_ino.ip, flags);
+	xfs_btree_commit_ifakeroot(cur, tp, whichfork, &xfs_bmbt_ops);
+}
+
 /*
  * Calculate number of records in a bmap btree block.
 */
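
The staged cursor above is the bmbt half of the online-repair rebuild cycle: stage a fake root, bulk-load the replacement tree, then commit the fake root into the real fork. A simplified lifecycle sketch under those assumptions; error unwinding, block claiming and reaping of the old blocks are omitted, and "example_rebuild_datafork" plus its parameters are illustrative:

	/*
	 * Sketch: rebuild a data fork bmbt from sorted records.  The caller
	 * is assumed to have filled in bbl->get_records/claim_block and an
	 * xbtree_ifakeroot, as online repair does.
	 */
	STATIC int
	example_rebuild_datafork(
		struct xfs_mount	*mp,
		struct xfs_trans	*tp,
		struct xfs_inode	*ip,
		struct xbtree_ifakeroot	*ifake,
		struct xfs_btree_bload	*bbl,
		uint64_t		nr_records,
		void			*priv)
	{
		struct xfs_btree_cur	*cur;
		int			error;

		cur = xfs_bmbt_stage_cursor(mp, ip, ifake);

		error = xfs_btree_bload_compute_geometry(cur, bbl, nr_records);
		if (error)
			return error;

		/* Format the new blocks; buffers flush per bbl->max_dirty. */
		error = xfs_btree_bload(cur, bbl, priv);
		if (error)
			return error;

		/* Swap the fake root into the real data fork and log it. */
		xfs_bmbt_commit_staged_btree(cur, tp, XFS_DATA_FORK);
		xfs_btree_del_cursor(cur, 0);
		return 0;
	}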

@@ -11,6 +11,7 @@ struct xfs_btree_block;
 struct xfs_mount;
 struct xfs_inode;
 struct xfs_trans;
+struct xbtree_ifakeroot;
 /*
  * Btree block header size depends on a superblock flag.
@@ -106,6 +107,10 @@ extern int xfs_bmbt_change_owner(struct xfs_trans *tp, struct xfs_inode *ip,
 extern struct xfs_btree_cur *xfs_bmbt_init_cursor(struct xfs_mount *,
 		struct xfs_trans *, struct xfs_inode *, int);
+struct xfs_btree_cur *xfs_bmbt_stage_cursor(struct xfs_mount *mp,
+		struct xfs_inode *ip, struct xbtree_ifakeroot *ifake);
+void xfs_bmbt_commit_staged_btree(struct xfs_btree_cur *cur,
+		struct xfs_trans *tp, int whichfork);
 extern unsigned long long xfs_bmbt_calc_size(struct xfs_mount *mp,
 		unsigned long long len);

@@ -1330,7 +1330,7 @@ xfs_btree_get_buf_block(
  * Read in the buffer at the given ptr and return the buffer and
  * the block pointer within the buffer.
  */
-STATIC int
+int
 xfs_btree_read_buf_block(
 	struct xfs_btree_cur		*cur,
 	const union xfs_btree_ptr	*ptr,
@@ -5212,3 +5212,29 @@ xfs_btree_destroy_cur_caches(void)
 	xfs_rmapbt_destroy_cur_cache();
 	xfs_refcountbt_destroy_cur_cache();
 }
+
+/* Move the btree cursor before the first record. */
+int
+xfs_btree_goto_left_edge(
+	struct xfs_btree_cur	*cur)
+{
+	int			stat = 0;
+	int			error;
+
+	memset(&cur->bc_rec, 0, sizeof(cur->bc_rec));
+	error = xfs_btree_lookup(cur, XFS_LOOKUP_LE, &stat);
+	if (error)
+		return error;
+	if (!stat)
+		return 0;
+
+	error = xfs_btree_decrement(cur, 0, &stat);
+	if (error)
+		return error;
+	if (stat != 0) {
+		ASSERT(0);
+		return -EFSCORRUPTED;
+	}
+
+	return 0;
+}
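
Positioning the cursor before the first record makes a whole-tree walk uniform: every record, including the first, is reached by an increment. A sketch of that pattern ("example_walk_all_records" and the visit callback are illustrative, not kernel API):

	/*
	 * Sketch: visit every record in a btree, left to right, using the
	 * new left-edge helper so the loop needs no special first-record case.
	 */
	STATIC int
	example_walk_all_records(
		struct xfs_btree_cur	*cur,
		void			(*visit)(struct xfs_btree_cur *cur,
						 const union xfs_btree_rec *rec))
	{
		union xfs_btree_rec	*rec;
		int			stat;
		int			error;

		error = xfs_btree_goto_left_edge(cur);
		if (error)
			return error;

		for (;;) {
			error = xfs_btree_increment(cur, 0, &stat);
			if (error || !stat)
				return error;
			error = xfs_btree_get_rec(cur, &rec, &stat);
			if (error)
				return error;
			if (stat)
				visit(cur, rec);
		}
	}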

@@ -700,6 +700,9 @@ void xfs_btree_set_ptr_null(struct xfs_btree_cur *cur,
 int xfs_btree_get_buf_block(struct xfs_btree_cur *cur,
 		const union xfs_btree_ptr *ptr, struct xfs_btree_block **block,
 		struct xfs_buf **bpp);
+int xfs_btree_read_buf_block(struct xfs_btree_cur *cur,
+		const union xfs_btree_ptr *ptr, int flags,
+		struct xfs_btree_block **block, struct xfs_buf **bpp);
 void xfs_btree_set_sibling(struct xfs_btree_cur *cur,
 		struct xfs_btree_block *block, const union xfs_btree_ptr *ptr,
 		int lr);
@@ -735,4 +738,6 @@ xfs_btree_alloc_cursor(
 int __init xfs_btree_init_cur_caches(void);
 void xfs_btree_destroy_cur_caches(void);
+int xfs_btree_goto_left_edge(struct xfs_btree_cur *cur);
+
 #endif	/* __XFS_BTREE_H__ */

@@ -333,20 +333,41 @@ xfs_btree_commit_ifakeroot(
 /*
  * Put a btree block that we're loading onto the ordered list and release it.
  * The btree blocks will be written to disk when bulk loading is finished.
+ * If we reach the dirty buffer threshold, flush them to disk before
+ * continuing.
  */
-static void
+static int
 xfs_btree_bload_drop_buf(
-	struct list_head	*buffers_list,
-	struct xfs_buf		**bpp)
+	struct xfs_btree_bload		*bbl,
+	struct list_head		*buffers_list,
+	struct xfs_buf			**bpp)
 {
-	if (*bpp == NULL)
-		return;
-
-	if (!xfs_buf_delwri_queue(*bpp, buffers_list))
-		ASSERT(0);
-
-	xfs_buf_relse(*bpp);
+	struct xfs_buf			*bp = *bpp;
+	int				error;
+
+	if (!bp)
+		return 0;
+
+	/*
+	 * Mark this buffer XBF_DONE (i.e. uptodate) so that a subsequent
+	 * xfs_buf_read will not pointlessly reread the contents from the disk.
+	 */
+	bp->b_flags |= XBF_DONE;
+
+	xfs_buf_delwri_queue_here(bp, buffers_list);
+	xfs_buf_relse(bp);
 	*bpp = NULL;
+	bbl->nr_dirty++;
+
+	if (!bbl->max_dirty || bbl->nr_dirty < bbl->max_dirty)
+		return 0;
+
+	error = xfs_buf_delwri_submit(buffers_list);
+	if (error)
+		return error;
+
+	bbl->nr_dirty = 0;
+	return 0;
 }
 /*
@@ -384,7 +405,7 @@ xfs_btree_bload_prep_block(
 		ASSERT(*bpp == NULL);
 		/* Allocate a new incore btree root block. */
-		new_size = bbl->iroot_size(cur, nr_this_block, priv);
+		new_size = bbl->iroot_size(cur, level, nr_this_block, priv);
 		ifp->if_broot = kmem_zalloc(new_size, 0);
 		ifp->if_broot_bytes = (int)new_size;
@@ -418,7 +439,10 @@ xfs_btree_bload_prep_block(
 	 */
 	if (*blockp)
 		xfs_btree_set_sibling(cur, *blockp, &new_ptr, XFS_BB_RIGHTSIB);
-	xfs_btree_bload_drop_buf(buffers_list, bpp);
+
+	ret = xfs_btree_bload_drop_buf(bbl, buffers_list, bpp);
+	if (ret)
+		return ret;
 	/* Initialize the new btree block. */
 	xfs_btree_init_block_cur(cur, new_bp, level, nr_this_block);
@@ -436,22 +460,19 @@ STATIC int
 xfs_btree_bload_leaf(
 	struct xfs_btree_cur		*cur,
 	unsigned int			recs_this_block,
-	xfs_btree_bload_get_record_fn	get_record,
+	xfs_btree_bload_get_records_fn	get_records,
 	struct xfs_btree_block		*block,
 	void				*priv)
 {
-	unsigned int			j;
+	unsigned int			j = 1;
 	int				ret;
 	/* Fill the leaf block with records. */
-	for (j = 1; j <= recs_this_block; j++) {
-		union xfs_btree_rec	*block_rec;
-
-		ret = get_record(cur, priv);
-		if (ret)
+	while (j <= recs_this_block) {
+		ret = get_records(cur, j, block, recs_this_block - j + 1, priv);
+		if (ret < 0)
 			return ret;
-		block_rec = xfs_btree_rec_addr(cur, j, block);
-		cur->bc_ops->init_rec_from_cur(cur, block_rec);
+		j += ret;
 	}
 	return 0;
@@ -485,7 +506,12 @@ xfs_btree_bload_node(
 		ASSERT(!xfs_btree_ptr_is_null(cur, child_ptr));
-		ret = xfs_btree_get_buf_block(cur, child_ptr, &child_block,
+		/*
+		 * Read the lower-level block in case the buffer for it has
+		 * been reclaimed.  LRU refs will be set on the block, which is
+		 * desirable if the new btree commits.
+		 */
+		ret = xfs_btree_read_buf_block(cur, child_ptr, 0, &child_block,
 				&child_bp);
 		if (ret)
 			return ret;
@@ -570,7 +596,14 @@ xfs_btree_bload_level_geometry(
 	unsigned int		desired_npb;
 	unsigned int		maxnr;
-	maxnr = cur->bc_ops->get_maxrecs(cur, level);
+	/*
+	 * Compute the absolute maximum number of records that we can store in
+	 * the ondisk block or inode root.
+	 */
+	if (cur->bc_ops->get_dmaxrecs)
+		maxnr = cur->bc_ops->get_dmaxrecs(cur, level);
+	else
+		maxnr = cur->bc_ops->get_maxrecs(cur, level);
 	/*
 	 * Compute the number of blocks we need to fill each block with the
@@ -764,6 +797,7 @@ xfs_btree_bload(
 	cur->bc_nlevels = bbl->btree_height;
 	xfs_btree_set_ptr_null(cur, &child_ptr);
 	xfs_btree_set_ptr_null(cur, &ptr);
+	bbl->nr_dirty = 0;
 	xfs_btree_bload_level_geometry(cur, bbl, level, nr_this_level,
 			&avg_per_block, &blocks, &blocks_with_extra);
@@ -789,7 +823,7 @@ xfs_btree_bload(
 		trace_xfs_btree_bload_block(cur, level, i, blocks, &ptr,
 				nr_this_block);
-		ret = xfs_btree_bload_leaf(cur, nr_this_block, bbl->get_record,
+		ret = xfs_btree_bload_leaf(cur, nr_this_block, bbl->get_records,
 				block, priv);
 		if (ret)
 			goto out;
@@ -802,7 +836,10 @@ xfs_btree_bload(
 		xfs_btree_copy_ptrs(cur, &child_ptr, &ptr, 1);
 	}
 	total_blocks += blocks;
-	xfs_btree_bload_drop_buf(&buffers_list, &bp);
+
+	ret = xfs_btree_bload_drop_buf(bbl, &buffers_list, &bp);
+	if (ret)
+		goto out;
 	/* Populate the internal btree nodes. */
 	for (level = 1; level < cur->bc_nlevels; level++) {
@@ -844,7 +881,11 @@ xfs_btree_bload(
 			xfs_btree_copy_ptrs(cur, &first_ptr, &ptr, 1);
 		}
 		total_blocks += blocks;
-		xfs_btree_bload_drop_buf(&buffers_list, &bp);
+
+		ret = xfs_btree_bload_drop_buf(bbl, &buffers_list, &bp);
+		if (ret)
+			goto out;
+
 		xfs_btree_copy_ptrs(cur, &child_ptr, &first_ptr, 1);
 	}

@@ -37,12 +37,6 @@ struct xbtree_ifakeroot {
 	/* Number of bytes available for this fork in the inode. */
 	unsigned int		if_fork_size;
-
-	/* Fork format. */
-	unsigned int		if_format;
-
-	/* Number of records. */
-	unsigned int		if_extents;
 };
 /* Cursor interactions with fake roots for inode-rooted btrees. */
@@ -53,19 +47,24 @@ void xfs_btree_commit_ifakeroot(struct xfs_btree_cur *cur, struct xfs_trans *tp,
 		int whichfork, const struct xfs_btree_ops *ops);
 /* Bulk loading of staged btrees. */
-typedef int (*xfs_btree_bload_get_record_fn)(struct xfs_btree_cur *cur, void *priv);
+typedef int (*xfs_btree_bload_get_records_fn)(struct xfs_btree_cur *cur,
+		unsigned int idx, struct xfs_btree_block *block,
+		unsigned int nr_wanted, void *priv);
 typedef int (*xfs_btree_bload_claim_block_fn)(struct xfs_btree_cur *cur,
 		union xfs_btree_ptr *ptr, void *priv);
 typedef size_t (*xfs_btree_bload_iroot_size_fn)(struct xfs_btree_cur *cur,
-		unsigned int nr_this_level, void *priv);
+		unsigned int level, unsigned int nr_this_level, void *priv);
 struct xfs_btree_bload {
 	/*
-	 * This function will be called nr_records times to load records into
-	 * the btree.  The function does this by setting the cursor's bc_rec
-	 * field in in-core format.  Records must be returned in sort order.
+	 * This function will be called to load @nr_wanted records into the
+	 * btree.  The implementation does this by setting the cursor's bc_rec
+	 * field in in-core format and using init_rec_from_cur to set the
+	 * records in the btree block.  Records must be returned in sort order.
+	 * The function must return the number of records loaded or the usual
+	 * negative errno.
	 */
-	xfs_btree_bload_get_record_fn	get_record;
+	xfs_btree_bload_get_records_fn	get_records;
 	/*
 	 * This function will be called nr_blocks times to obtain a pointer
@@ -113,6 +112,16 @@ struct xfs_btree_bload {
 	 * height of the new btree.
 	 */
 	unsigned int			btree_height;
+
+	/*
+	 * Flush the new btree block buffer list to disk after this many blocks
+	 * have been formatted.  Zero prohibits writing any buffers until all
+	 * blocks have been formatted.
+	 */
+	uint16_t			max_dirty;
+
+	/* Number of dirty buffers. */
+	uint16_t			nr_dirty;
 };
 int xfs_btree_bload_compute_geometry(struct xfs_btree_cur *cur,
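
Under the new batched contract, a get_records callback copies up to @nr_wanted sorted records into @block starting at slot @idx and returns how many it loaded, which is what lets the bulkloader place multiple records per call. A sketch of one such callback; the context struct and "example_" names are assumptions, only the callback signature and the bc_rec/init_rec_from_cur mechanics come from the code above:

	struct example_load_ctx {
		struct xfs_bmbt_irec	*irecs;		/* sorted records */
		uint64_t		next;		/* next record to emit */
		uint64_t		nr;		/* total record count */
	};

	STATIC int
	example_get_records(
		struct xfs_btree_cur	*cur,
		unsigned int		idx,
		struct xfs_btree_block	*block,
		unsigned int		nr_wanted,
		void			*priv)
	{
		struct example_load_ctx	*elc = priv;
		union xfs_btree_rec	*block_rec;
		unsigned int		loaded;

		for (loaded = 0; loaded < nr_wanted && elc->next < elc->nr;
				loaded++) {
			/* Stage the next record in incore form... */
			cur->bc_rec.b = elc->irecs[elc->next++];

			/* ...and format it directly into the new block. */
			block_rec = xfs_btree_rec_addr(cur, idx + loaded, block);
			cur->bc_ops->init_rec_from_cur(cur, block_rec);
		}
		return loaded;
	}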

@@ -421,6 +421,25 @@ xfs_da3_node_read_mapped(
 	return xfs_da3_node_set_type(tp, *bpp);
 }
+/*
+ * Copy src directory/attr leaf/node buffer to the dst.
+ * For v5 file systems make sure the right blkno is stamped in.
+ */
+void
+xfs_da_buf_copy(
+	struct xfs_buf	*dst,
+	struct xfs_buf	*src,
+	size_t		size)
+{
+	struct xfs_da3_blkinfo	*da3 = dst->b_addr;
+
+	memcpy(dst->b_addr, src->b_addr, size);
+	dst->b_ops = src->b_ops;
+	xfs_trans_buf_copy_type(dst, src);
+	if (xfs_has_crc(dst->b_mount))
+		da3->blkno = cpu_to_be64(xfs_buf_daddr(dst));
+}
+
 /*========================================================================
  * Routines used for growing the Btree.
  *========================================================================*/
@@ -690,12 +709,6 @@ xfs_da3_root_split(
 		btree = icnodehdr.btree;
 		size = (int)((char *)&btree[icnodehdr.count] - (char *)oldroot);
 		level = icnodehdr.level;
-
-		/*
-		 * we are about to copy oldroot to bp, so set up the type
-		 * of bp while we know exactly what it will be.
-		 */
-		xfs_trans_buf_set_type(tp, bp, XFS_BLFT_DA_NODE_BUF);
 	} else {
 		struct xfs_dir3_icleaf_hdr leafhdr;
@@ -707,31 +720,17 @@ xfs_da3_root_split(
 		size = (int)((char *)&leafhdr.ents[leafhdr.count] -
 			(char *)leaf);
 		level = 0;
-
-		/*
-		 * we are about to copy oldroot to bp, so set up the type
-		 * of bp while we know exactly what it will be.
-		 */
-		xfs_trans_buf_set_type(tp, bp, XFS_BLFT_DIR_LEAFN_BUF);
 	}
 	/*
-	 * we can copy most of the information in the node from one block to
-	 * another, but for CRC enabled headers we have to make sure that the
-	 * block specific identifiers are kept intact. We update the buffer
-	 * directly for this.
+	 * Copy old root to new buffer and log it.
	 */
-	memcpy(node, oldroot, size);
-	if (oldroot->hdr.info.magic == cpu_to_be16(XFS_DA3_NODE_MAGIC) ||
-	    oldroot->hdr.info.magic == cpu_to_be16(XFS_DIR3_LEAFN_MAGIC)) {
-		struct xfs_da3_intnode *node3 = (struct xfs_da3_intnode *)node;
-
-		node3->hdr.info.blkno = cpu_to_be64(xfs_buf_daddr(bp));
-	}
+	xfs_da_buf_copy(bp, blk1->bp, size);
 	xfs_trans_log_buf(tp, bp, 0, size - 1);
-	bp->b_ops = blk1->bp->b_ops;
-	xfs_trans_buf_copy_type(bp, blk1->bp);
+	/*
+	 * Update blk1 to point to new buffer.
+	 */
 	blk1->bp = bp;
 	blk1->blkno = blkno;
@@ -1220,21 +1219,14 @@ xfs_da3_root_join(
 	xfs_da_blkinfo_onlychild_validate(bp->b_addr, oldroothdr.level);
 	/*
-	 * This could be copying a leaf back into the root block in the case of
-	 * there only being a single leaf block left in the tree. Hence we have
-	 * to update the b_ops pointer as well to match the buffer type change
-	 * that could occur. For dir3 blocks we also need to update the block
-	 * number in the buffer header.
+	 * Copy child to root buffer and log it.
	 */
-	memcpy(root_blk->bp->b_addr, bp->b_addr, args->geo->blksize);
-	root_blk->bp->b_ops = bp->b_ops;
-	xfs_trans_buf_copy_type(root_blk->bp, bp);
-	if (oldroothdr.magic == XFS_DA3_NODE_MAGIC) {
-		struct xfs_da3_blkinfo *da3 = root_blk->bp->b_addr;
-
-		da3->blkno = cpu_to_be64(xfs_buf_daddr(root_blk->bp));
-	}
+	xfs_da_buf_copy(root_blk->bp, bp, args->geo->blksize);
 	xfs_trans_log_buf(args->trans, root_blk->bp, 0,
 			args->geo->blksize - 1);
+	/*
+	 * Now we can drop the child buffer.
+	 */
 	error = xfs_da_shrink_inode(args, child, bp);
 	return error;
 }
@@ -2317,9 +2309,10 @@ xfs_da3_swap_lastblock(
 	/*
	 * Copy the last block into the dead buffer and log it.
	 */
-	memcpy(dead_buf->b_addr, last_buf->b_addr, args->geo->blksize);
+	xfs_da_buf_copy(dead_buf, last_buf, args->geo->blksize);
 	xfs_trans_log_buf(tp, dead_buf, 0, args->geo->blksize - 1);
 	dead_info = dead_buf->b_addr;
+
 	/*
	 * Get values from the moved block.
	 */

View File

@ -219,6 +219,8 @@ int xfs_da_reada_buf(struct xfs_inode *dp, xfs_dablk_t bno,
const struct xfs_buf_ops *ops); const struct xfs_buf_ops *ops);
int xfs_da_shrink_inode(xfs_da_args_t *args, xfs_dablk_t dead_blkno, int xfs_da_shrink_inode(xfs_da_args_t *args, xfs_dablk_t dead_blkno,
struct xfs_buf *dead_buf); struct xfs_buf *dead_buf);
void xfs_da_buf_copy(struct xfs_buf *dst, struct xfs_buf *src,
size_t size);
uint xfs_da_hashname(const uint8_t *name_string, int name_length); uint xfs_da_hashname(const uint8_t *name_string, int name_length);
enum xfs_dacmp xfs_da_compname(struct xfs_da_args *args, enum xfs_dacmp xfs_da_compname(struct xfs_da_args *args,

@@ -578,20 +578,25 @@ xfs_dir2_block_leaf_p(struct xfs_dir2_block_tail *btp)
 #define XFS_ATTR_LEAF_MAPSIZE	3	/* how many freespace slots */
 /*
- * Entries are packed toward the top as tight as possible.
+ * Attribute storage when stored inside the inode.
+ *
+ * Small attribute lists are packed as tightly as possible so as to fit into the
+ * literal area of the inode.
+ *
+ * These "shortform" attribute forks consist of a single xfs_attr_sf_hdr header
+ * followed by zero or more xfs_attr_sf_entry structures.
 */
-struct xfs_attr_shortform {
-	struct xfs_attr_sf_hdr {	/* constant-structure header block */
-		__be16	totsize;	/* total bytes in shortform list */
-		__u8	count;		/* count of active entries */
-		__u8	padding;
-	} hdr;
-	struct xfs_attr_sf_entry {
-		uint8_t namelen;	/* actual length of name (no NULL) */
-		uint8_t valuelen;	/* actual length of value (no NULL) */
-		uint8_t flags;		/* flags bits (see xfs_attr_leaf.h) */
-		uint8_t nameval[];	/* name & value bytes concatenated */
-	} list[];			/* variable sized array */
+struct xfs_attr_sf_hdr {		/* constant-structure header block */
+	__be16	totsize;		/* total bytes in shortform list */
+	__u8	count;			/* count of active entries */
+	__u8	padding;
+};
+
+struct xfs_attr_sf_entry {
+	__u8	namelen;		/* actual length of name (no NULL) */
+	__u8	valuelen;		/* actual length of value (no NULL) */
+	__u8	flags;			/* flags bits (XFS_ATTR_*) */
+	__u8	nameval[];		/* name & value bytes concatenated */
 };
 typedef struct xfs_attr_leaf_map {	/* RLE map of free bytes */
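
With the nested xfs_attr_shortform wrapper gone, the ondisk shortform layout is simply an xfs_attr_sf_hdr followed by variable-size xfs_attr_sf_entry structures, each advanced by its own name and value lengths. A sketch of walking that layout; the pointer arithmetic here is spelled out for illustration, and the kernel has its own helpers for the same computation:

	/*
	 * Sketch: sum the sizes of all shortform attr entries.  The result
	 * should equal be16_to_cpu(hdr->totsize) on a consistent fork.
	 */
	STATIC unsigned int
	example_sf_attr_bytes(
		struct xfs_attr_sf_hdr	*hdr)
	{
		struct xfs_attr_sf_entry *sfe =
				(struct xfs_attr_sf_entry *)(hdr + 1);
		unsigned int		bytes = sizeof(*hdr);
		unsigned int		i;

		for (i = 0; i < hdr->count; i++) {
			bytes += sizeof(*sfe) + sfe->namelen + sfe->valuelen;

			/* advance past the entry and its name/value bytes */
			sfe = (struct xfs_attr_sf_entry *)((char *)sfe +
					sizeof(*sfe) + sfe->namelen +
					sfe->valuelen);
		}
		return bytes;
	}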

@@ -26,6 +26,7 @@
 #include "xfs_da_format.h"
 #include "xfs_da_btree.h"
 #include "xfs_attr.h"
+#include "xfs_trans_priv.h"
 static struct kmem_cache	*xfs_defer_pending_cache;
@@ -181,16 +182,89 @@ static struct kmem_cache	*xfs_defer_pending_cache;
  * Note that the continuation requested between t2 and t3 is likely to
  * reoccur.
 */
-static const struct xfs_defer_op_type *defer_op_types[] = {
-	[XFS_DEFER_OPS_TYPE_BMAP]	= &xfs_bmap_update_defer_type,
-	[XFS_DEFER_OPS_TYPE_REFCOUNT]	= &xfs_refcount_update_defer_type,
-	[XFS_DEFER_OPS_TYPE_RMAP]	= &xfs_rmap_update_defer_type,
-	[XFS_DEFER_OPS_TYPE_FREE]	= &xfs_extent_free_defer_type,
-	[XFS_DEFER_OPS_TYPE_AGFL_FREE]	= &xfs_agfl_free_defer_type,
-	[XFS_DEFER_OPS_TYPE_ATTR]	= &xfs_attr_defer_type,
-};
+STATIC struct xfs_log_item *
+xfs_defer_barrier_create_intent(
+	struct xfs_trans		*tp,
+	struct list_head		*items,
+	unsigned int			count,
+	bool				sort)
+{
+	return NULL;
+}
+
+STATIC void
+xfs_defer_barrier_abort_intent(
+	struct xfs_log_item		*intent)
+{
+	/* empty */
+}
+
+STATIC struct xfs_log_item *
+xfs_defer_barrier_create_done(
+	struct xfs_trans		*tp,
+	struct xfs_log_item		*intent,
+	unsigned int			count)
+{
+	return NULL;
+}
+
+STATIC int
+xfs_defer_barrier_finish_item(
+	struct xfs_trans		*tp,
+	struct xfs_log_item		*done,
+	struct list_head		*item,
+	struct xfs_btree_cur		**state)
+{
+	ASSERT(0);
+	return -EFSCORRUPTED;
+}
+
+STATIC void
+xfs_defer_barrier_cancel_item(
+	struct list_head		*item)
+{
+	ASSERT(0);
+}
+
+static const struct xfs_defer_op_type xfs_barrier_defer_type = {
+	.max_items	= 1,
+	.create_intent	= xfs_defer_barrier_create_intent,
+	.abort_intent	= xfs_defer_barrier_abort_intent,
+	.create_done	= xfs_defer_barrier_create_done,
+	.finish_item	= xfs_defer_barrier_finish_item,
+	.cancel_item	= xfs_defer_barrier_cancel_item,
+};
+
+/* Create a log intent done item for a log intent item. */
+static inline void
+xfs_defer_create_done(
+	struct xfs_trans		*tp,
+	struct xfs_defer_pending	*dfp)
+{
+	struct xfs_log_item		*lip;
+
+	/* If there is no log intent item, there can be no log done item. */
+	if (!dfp->dfp_intent)
+		return;
+
+	/*
+	 * Mark the transaction dirty, even on error. This ensures the
+	 * transaction is aborted, which:
+	 *
+	 * 1.) releases the log intent item and frees the log done item
+	 * 2.) shuts down the filesystem
+	 */
+	tp->t_flags |= XFS_TRANS_DIRTY;
+	lip = dfp->dfp_ops->create_done(tp, dfp->dfp_intent, dfp->dfp_count);
+	if (!lip)
+		return;
+
+	tp->t_flags |= XFS_TRANS_HAS_INTENT_DONE;
+	xfs_trans_add_item(tp, lip);
+	set_bit(XFS_LI_DIRTY, &lip->li_flags);
+	dfp->dfp_done = lip;
+}
 /*
  * Ensure there's a log intent item associated with this deferred work item if
  * the operation must be restarted on crash.  Returns 1 if there's a log item;
@@ -202,18 +276,21 @@ xfs_defer_create_intent(
 	struct xfs_defer_pending	*dfp,
 	bool				sort)
 {
-	const struct xfs_defer_op_type	*ops = defer_op_types[dfp->dfp_type];
 	struct xfs_log_item		*lip;
 	if (dfp->dfp_intent)
 		return 1;
-	lip = ops->create_intent(tp, &dfp->dfp_work, dfp->dfp_count, sort);
+	lip = dfp->dfp_ops->create_intent(tp, &dfp->dfp_work, dfp->dfp_count,
+			sort);
 	if (!lip)
 		return 0;
 	if (IS_ERR(lip))
 		return PTR_ERR(lip);
+	tp->t_flags |= XFS_TRANS_DIRTY;
+	xfs_trans_add_item(tp, lip);
+	set_bit(XFS_LI_DIRTY, &lip->li_flags);
 	dfp->dfp_intent = lip;
 	return 1;
 }
@@ -245,23 +322,50 @@ xfs_defer_create_intents(
 	return ret;
 }
-STATIC void
+static inline void
 xfs_defer_pending_abort(
+	struct xfs_mount		*mp,
+	struct xfs_defer_pending	*dfp)
+{
+	trace_xfs_defer_pending_abort(mp, dfp);
+
+	if (dfp->dfp_intent && !dfp->dfp_done) {
+		dfp->dfp_ops->abort_intent(dfp->dfp_intent);
+		dfp->dfp_intent = NULL;
+	}
+}
+
+static inline void
+xfs_defer_pending_cancel_work(
+	struct xfs_mount		*mp,
+	struct xfs_defer_pending	*dfp)
+{
+	struct list_head		*pwi;
+	struct list_head		*n;
+
+	trace_xfs_defer_cancel_list(mp, dfp);
+
+	list_del(&dfp->dfp_list);
+	list_for_each_safe(pwi, n, &dfp->dfp_work) {
+		list_del(pwi);
+		dfp->dfp_count--;
+		trace_xfs_defer_cancel_item(mp, dfp, pwi);
+		dfp->dfp_ops->cancel_item(pwi);
+	}
+	ASSERT(dfp->dfp_count == 0);
+	kmem_cache_free(xfs_defer_pending_cache, dfp);
+}
+
+STATIC void
+xfs_defer_pending_abort_list(
 	struct xfs_mount		*mp,
 	struct list_head		*dop_list)
 {
 	struct xfs_defer_pending	*dfp;
-	const struct xfs_defer_op_type	*ops;
 	/* Abort intent items that don't have a done item. */
-	list_for_each_entry(dfp, dop_list, dfp_list) {
-		ops = defer_op_types[dfp->dfp_type];
-		trace_xfs_defer_pending_abort(mp, dfp);
-		if (dfp->dfp_intent && !dfp->dfp_done) {
-			ops->abort_intent(dfp->dfp_intent);
-			dfp->dfp_intent = NULL;
-		}
-	}
+	list_for_each_entry(dfp, dop_list, dfp_list)
+		xfs_defer_pending_abort(mp, dfp);
 }
 /* Abort all the intents that were committed. */
@@ -271,7 +375,7 @@ xfs_defer_trans_abort(
 	struct list_head		*dop_pending)
 {
 	trace_xfs_defer_trans_abort(tp, _RET_IP_);
-	xfs_defer_pending_abort(tp->t_mountp, dop_pending);
+	xfs_defer_pending_abort_list(tp->t_mountp, dop_pending);
 }
 /*
@@ -389,27 +493,31 @@ xfs_defer_cancel_list(
 {
 	struct xfs_defer_pending	*dfp;
 	struct xfs_defer_pending	*pli;
-	struct list_head		*pwi;
-	struct list_head		*n;
-	const struct xfs_defer_op_type	*ops;
 	/*
	 * Free the pending items.  Caller should already have arranged
	 * for the intent items to be released.
	 */
-	list_for_each_entry_safe(dfp, pli, dop_list, dfp_list) {
-		ops = defer_op_types[dfp->dfp_type];
-		trace_xfs_defer_cancel_list(mp, dfp);
-		list_del(&dfp->dfp_list);
-		list_for_each_safe(pwi, n, &dfp->dfp_work) {
-			list_del(pwi);
-			dfp->dfp_count--;
-			trace_xfs_defer_cancel_item(mp, dfp, pwi);
-			ops->cancel_item(pwi);
-		}
-		ASSERT(dfp->dfp_count == 0);
-		kmem_cache_free(xfs_defer_pending_cache, dfp);
-	}
+	list_for_each_entry_safe(dfp, pli, dop_list, dfp_list)
+		xfs_defer_pending_cancel_work(mp, dfp);
+}
+
+static inline void
+xfs_defer_relog_intent(
+	struct xfs_trans		*tp,
+	struct xfs_defer_pending	*dfp)
+{
+	struct xfs_log_item		*lip;
+
+	xfs_defer_create_done(tp, dfp);
+
+	lip = dfp->dfp_ops->relog_intent(tp, dfp->dfp_intent, dfp->dfp_done);
+	if (lip) {
+		xfs_trans_add_item(tp, lip);
+		set_bit(XFS_LI_DIRTY, &lip->li_flags);
+	}
+	dfp->dfp_done = NULL;
+	dfp->dfp_intent = lip;
 }
 /*
@@ -417,7 +525,7 @@ xfs_defer_cancel_list(
  * done item to release the intent item; and then log a new intent item.
  * The caller should provide a fresh transaction and roll it after we're done.
 */
-static int
+static void
 xfs_defer_relog(
 	struct xfs_trans		**tpp,
 	struct list_head		*dfops)
@@ -456,31 +564,28 @@ xfs_defer_relog(
 		trace_xfs_defer_relog_intent((*tpp)->t_mountp, dfp);
 		XFS_STATS_INC((*tpp)->t_mountp, defer_relog);
-		dfp->dfp_intent = xfs_trans_item_relog(dfp->dfp_intent, *tpp);
-	}
-	if ((*tpp)->t_flags & XFS_TRANS_DIRTY)
-		return xfs_defer_trans_roll(tpp);
-	return 0;
+		xfs_defer_relog_intent(*tpp, dfp);
+	}
 }
 /*
  * Log an intent-done item for the first pending intent, and finish the work
  * items.
 */
-static int
+int
 xfs_defer_finish_one(
 	struct xfs_trans		*tp,
 	struct xfs_defer_pending	*dfp)
 {
-	const struct xfs_defer_op_type	*ops = defer_op_types[dfp->dfp_type];
+	const struct xfs_defer_op_type	*ops = dfp->dfp_ops;
 	struct xfs_btree_cur		*state = NULL;
 	struct list_head		*li, *n;
 	int				error;
 	trace_xfs_defer_pending_finish(tp->t_mountp, dfp);
-	dfp->dfp_done = ops->create_done(tp, dfp->dfp_intent, dfp->dfp_count);
+	xfs_defer_create_done(tp, dfp);
 	list_for_each_safe(li, n, &dfp->dfp_work) {
 		list_del(li);
 		dfp->dfp_count--;
@@ -517,6 +622,24 @@ out:
 	return error;
 }
+/* Move all paused deferred work from @tp to @paused_list. */
+static void
+xfs_defer_isolate_paused(
+	struct xfs_trans		*tp,
+	struct list_head		*paused_list)
+{
+	struct xfs_defer_pending	*dfp;
+	struct xfs_defer_pending	*pli;
+
+	list_for_each_entry_safe(dfp, pli, &tp->t_dfops, dfp_list) {
+		if (!(dfp->dfp_flags & XFS_DEFER_PAUSED))
+			continue;
+
+		list_move_tail(&dfp->dfp_list, paused_list);
+		trace_xfs_defer_isolate_paused(tp->t_mountp, dfp);
+	}
+}
+
 /*
  * Finish all the pending work.  This involves logging intent items for
  * any work items that wandered in since the last transaction roll (if
@@ -532,6 +655,7 @@ xfs_defer_finish_noroll(
 	struct xfs_defer_pending	*dfp = NULL;
 	int				error = 0;
 	LIST_HEAD(dop_pending);
+	LIST_HEAD(dop_paused);
 	ASSERT((*tp)->t_flags & XFS_TRANS_PERM_LOG_RES);
@@ -550,6 +674,8 @@ xfs_defer_finish_noroll(
 		 */
 		int has_intents = xfs_defer_create_intents(*tp);
+		xfs_defer_isolate_paused(*tp, &dop_paused);
+
 		list_splice_init(&(*tp)->t_dfops, &dop_pending);
 		if (has_intents < 0) {
@@ -562,22 +688,33 @@ xfs_defer_finish_noroll(
 				goto out_shutdown;
 			/* Relog intent items to keep the log moving. */
-			error = xfs_defer_relog(tp, &dop_pending);
-			if (error)
-				goto out_shutdown;
+			xfs_defer_relog(tp, &dop_pending);
+			xfs_defer_relog(tp, &dop_paused);
+
+			if ((*tp)->t_flags & XFS_TRANS_DIRTY) {
+				error = xfs_defer_trans_roll(tp);
+				if (error)
+					goto out_shutdown;
+			}
 		}
-		dfp = list_first_entry(&dop_pending, struct xfs_defer_pending,
-				       dfp_list);
+		dfp = list_first_entry_or_null(&dop_pending,
+				struct xfs_defer_pending, dfp_list);
+		if (!dfp)
+			break;
 		error = xfs_defer_finish_one(*tp, dfp);
 		if (error && error != -EAGAIN)
 			goto out_shutdown;
 	}
+	/* Requeue the paused items in the outgoing transaction. */
+	list_splice_tail_init(&dop_paused, &(*tp)->t_dfops);
+
 	trace_xfs_defer_finish_done(*tp, _RET_IP_);
 	return 0;
 out_shutdown:
+	list_splice_tail_init(&dop_paused, &dop_pending);
 	xfs_defer_trans_abort(*tp, &dop_pending);
 	xfs_force_shutdown((*tp)->t_mountp, SHUTDOWN_CORRUPT_INCORE);
 	trace_xfs_defer_finish_error(*tp, error);
@@ -590,6 +727,9 @@ int
 xfs_defer_finish(
 	struct xfs_trans	**tp)
 {
+#ifdef DEBUG
+	struct xfs_defer_pending	*dfp;
+#endif
 	int			error;
 	/*
@@ -609,7 +749,10 @@ xfs_defer_finish(
 	}
 	/* Reset LOWMODE now that we've finished all the dfops. */
-	ASSERT(list_empty(&(*tp)->t_dfops));
+#ifdef DEBUG
+	list_for_each_entry(dfp, &(*tp)->t_dfops, dfp_list)
+		ASSERT(dfp->dfp_flags & XFS_DEFER_PAUSED);
+#endif
 	(*tp)->t_flags &= ~XFS_TRANS_LOWMODE;
 	return 0;
 }
@@ -621,48 +764,165 @@ xfs_defer_cancel(
 	struct xfs_mount	*mp = tp->t_mountp;
 	trace_xfs_defer_cancel(tp, _RET_IP_);
+	xfs_defer_trans_abort(tp, &tp->t_dfops);
 	xfs_defer_cancel_list(mp, &tp->t_dfops);
 }
-/* Add an item for later deferred processing. */
-void
-xfs_defer_add(
-	struct xfs_trans		*tp,
-	enum xfs_defer_ops_type		type,
-	struct list_head		*li)
+/*
+ * Return the last pending work item attached to this transaction if it matches
+ * the deferred op type.
+ */
+static inline struct xfs_defer_pending *
+xfs_defer_find_last(
+	struct xfs_trans		*tp,
+	const struct xfs_defer_op_type	*ops)
+{
+	struct xfs_defer_pending	*dfp = NULL;
+
+	/* No dfops at all? */
+	if (list_empty(&tp->t_dfops))
+		return NULL;
+
+	dfp = list_last_entry(&tp->t_dfops, struct xfs_defer_pending,
+			dfp_list);
+
+	/* Wrong type? */
+	if (dfp->dfp_ops != ops)
+		return NULL;
+	return dfp;
+}
+
+/*
+ * Decide if we can add a deferred work item to the last dfops item attached
+ * to the transaction.
+ */
+static inline bool
+xfs_defer_can_append(
+	struct xfs_defer_pending	*dfp,
+	const struct xfs_defer_op_type	*ops)
+{
+	/* Already logged? */
+	if (dfp->dfp_intent)
+		return false;
+
+	/* Paused items cannot absorb more work */
+	if (dfp->dfp_flags & XFS_DEFER_PAUSED)
+		return NULL;
+
+	/* Already full? */
+	if (ops->max_items && dfp->dfp_count >= ops->max_items)
+		return false;
+
+	return true;
+}
+
+/* Create a new pending item at the end of the transaction list. */
+static inline struct xfs_defer_pending *
+xfs_defer_alloc(
+	struct xfs_trans		*tp,
+	const struct xfs_defer_op_type	*ops)
+{
+	struct xfs_defer_pending	*dfp;
+
+	dfp = kmem_cache_zalloc(xfs_defer_pending_cache,
+			GFP_NOFS | __GFP_NOFAIL);
+	dfp->dfp_ops = ops;
+	INIT_LIST_HEAD(&dfp->dfp_work);
+	list_add_tail(&dfp->dfp_list, &tp->t_dfops);
+
+	return dfp;
+}
+
+/* Add an item for later deferred processing. */
+struct xfs_defer_pending *
+xfs_defer_add(
+	struct xfs_trans		*tp,
+	struct list_head		*li,
+	const struct xfs_defer_op_type	*ops)
 {
 	struct xfs_defer_pending	*dfp = NULL;
-	const struct xfs_defer_op_type	*ops = defer_op_types[type];
 	ASSERT(tp->t_flags & XFS_TRANS_PERM_LOG_RES);
-	BUILD_BUG_ON(ARRAY_SIZE(defer_op_types) != XFS_DEFER_OPS_TYPE_MAX);
-	/*
-	 * Add the item to a pending item at the end of the intake list.
-	 * If the last pending item has the same type, reuse it.  Else,
-	 * create a new pending item at the end of the intake list.
-	 */
-	if (!list_empty(&tp->t_dfops)) {
-		dfp = list_last_entry(&tp->t_dfops,
-				struct xfs_defer_pending, dfp_list);
-		if (dfp->dfp_type != type ||
-		    (ops->max_items && dfp->dfp_count >= ops->max_items))
-			dfp = NULL;
-	}
-	if (!dfp) {
-		dfp = kmem_cache_zalloc(xfs_defer_pending_cache,
-				GFP_NOFS | __GFP_NOFAIL);
-		dfp->dfp_type = type;
-		dfp->dfp_intent = NULL;
-		dfp->dfp_done = NULL;
-		dfp->dfp_count = 0;
-		INIT_LIST_HEAD(&dfp->dfp_work);
-		list_add_tail(&dfp->dfp_list, &tp->t_dfops);
-	}
+	dfp = xfs_defer_find_last(tp, ops);
+	if (!dfp || !xfs_defer_can_append(dfp, ops))
+		dfp = xfs_defer_alloc(tp, ops);
-	list_add_tail(li, &dfp->dfp_work);
+	xfs_defer_add_item(dfp, li);
 	trace_xfs_defer_add_item(tp->t_mountp, dfp, li);
-	dfp->dfp_count++;
+	return dfp;
+}
+
+/*
+ * Add a defer ops barrier to force two otherwise adjacent deferred work items
+ * to be tracked separately and have separate log items.
+ */
+void
+xfs_defer_add_barrier(
+	struct xfs_trans		*tp)
+{
+	struct xfs_defer_pending	*dfp;
+
+	ASSERT(tp->t_flags & XFS_TRANS_PERM_LOG_RES);
+
+	/* If the last defer op added was a barrier, we're done. */
+	dfp = xfs_defer_find_last(tp, &xfs_barrier_defer_type);
+	if (dfp)
+		return;
+
+	xfs_defer_alloc(tp, &xfs_barrier_defer_type);
+
+	trace_xfs_defer_add_item(tp->t_mountp, dfp, NULL);
+}
+
+/*
+ * Create a pending deferred work item to replay the recovered intent item
+ * and add it to the list.
+ */
+void
+xfs_defer_start_recovery(
+	struct xfs_log_item		*lip,
+	struct list_head		*r_dfops,
+	const struct xfs_defer_op_type	*ops)
+{
+	struct xfs_defer_pending	*dfp;
+
+	dfp = kmem_cache_zalloc(xfs_defer_pending_cache,
+			GFP_NOFS | __GFP_NOFAIL);
+	dfp->dfp_ops = ops;
+	dfp->dfp_intent = lip;
+	INIT_LIST_HEAD(&dfp->dfp_work);
+	list_add_tail(&dfp->dfp_list, r_dfops);
+}
+
+/*
+ * Cancel a deferred work item created to recover a log intent item.  @dfp
+ * will be freed after this function returns.
+ */
+void
+xfs_defer_cancel_recovery(
+	struct xfs_mount		*mp,
+	struct xfs_defer_pending	*dfp)
+{
+	xfs_defer_pending_abort(mp, dfp);
+	xfs_defer_pending_cancel_work(mp, dfp);
+}
+
+/* Replay the deferred work item created from a recovered log intent item. */
+int
+xfs_defer_finish_recovery(
+	struct xfs_mount		*mp,
+	struct xfs_defer_pending	*dfp,
+	struct list_head		*capture_list)
+{
+	const struct xfs_defer_op_type	*ops = dfp->dfp_ops;
+	int				error;
+
+	/* dfp is freed by recover_work and must not be accessed afterwards */
+	error = ops->recover_work(dfp, capture_list);
+	if (error)
+		trace_xlog_intent_recovery_failed(mp, ops, error);
+	return error;
 }
 /*
@@ -769,7 +1029,7 @@ xfs_defer_ops_capture_abort(
 {
 	unsigned short		i;
-	xfs_defer_pending_abort(mp, &dfc->dfc_dfops);
+	xfs_defer_pending_abort_list(mp, &dfc->dfc_dfops);
 	xfs_defer_cancel_list(mp, &dfc->dfc_dfops);
 	for (i = 0; i < dfc->dfc_held.dr_bufs; i++)
@@ -938,3 +1198,36 @@ xfs_defer_destroy_item_caches(void)
 	xfs_rmap_intent_destroy_cache();
 	xfs_defer_destroy_cache();
 }
+
+/*
+ * Mark a deferred work item so that it will be requeued indefinitely without
+ * being finished.  Caller must ensure there are no data dependencies on this
+ * work item in the meantime.
+ */
+void
+xfs_defer_item_pause(
+	struct xfs_trans		*tp,
+	struct xfs_defer_pending	*dfp)
+{
+	ASSERT(!(dfp->dfp_flags & XFS_DEFER_PAUSED));
+
+	dfp->dfp_flags |= XFS_DEFER_PAUSED;
+
+	trace_xfs_defer_item_pause(tp->t_mountp, dfp);
+}
+
+/*
+ * Release a paused deferred work item so that it will be finished during the
+ * next transaction roll.
+ */
+void
+xfs_defer_item_unpause(
+	struct xfs_trans		*tp,
+	struct xfs_defer_pending	*dfp)
+{
+	ASSERT(dfp->dfp_flags & XFS_DEFER_PAUSED);
+
+	dfp->dfp_flags &= ~XFS_DEFER_PAUSED;
+
+	trace_xfs_defer_item_unpause(tp->t_mountp, dfp);
+}
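
Pausing lets a caller (online repair is the intended user) attach a work item early and carry it across transaction rolls without it being finished; only the unpause before the final commit releases it. A sketch of that flow under those assumptions; "example_defer_with_pause" is illustrative, and struct xfs_bmap_intent's bi_list member is taken from the bmap code above:

	/*
	 * Sketch: hold one deferred bmap work item across rolls, then
	 * release and finish it.  The paused item is requeued in each
	 * outgoing transaction rather than finished.
	 */
	STATIC int
	example_defer_with_pause(
		struct xfs_trans	**tpp,
		struct xfs_bmap_intent	*bi)
	{
		struct xfs_defer_pending *dfp;
		int			error;

		dfp = xfs_defer_add(*tpp, &bi->bi_list,
				&xfs_bmap_update_defer_type);
		xfs_defer_item_pause(*tpp, dfp);

		/* Roll as often as needed; the paused item rides along. */
		error = xfs_defer_finish(tpp);
		if (error)
			return error;

		/* Unpause before the final commit so the work happens. */
		xfs_defer_item_unpause(*tpp, dfp);
		return xfs_defer_finish(tpp);
	}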

@@ -10,19 +10,6 @@ struct xfs_btree_cur;
 struct xfs_defer_op_type;
 struct xfs_defer_capture;
-/*
- * Header for deferred operation list.
- */
-enum xfs_defer_ops_type {
-	XFS_DEFER_OPS_TYPE_BMAP,
-	XFS_DEFER_OPS_TYPE_REFCOUNT,
-	XFS_DEFER_OPS_TYPE_RMAP,
-	XFS_DEFER_OPS_TYPE_FREE,
-	XFS_DEFER_OPS_TYPE_AGFL_FREE,
-	XFS_DEFER_OPS_TYPE_ATTR,
-	XFS_DEFER_OPS_TYPE_MAX,
-};
-
 /*
  * Save a log intent item and a list of extents, so that we can replay
  * whatever action had to happen to the extent list and file the log done
@@ -33,19 +20,35 @@ struct xfs_defer_pending {
 	struct list_head	dfp_work;	/* work items */
 	struct xfs_log_item	*dfp_intent;	/* log intent item */
 	struct xfs_log_item	*dfp_done;	/* log done item */
+	const struct xfs_defer_op_type	*dfp_ops;
 	unsigned int		dfp_count;	/* # extent items */
-	enum xfs_defer_ops_type	dfp_type;
+	unsigned int		dfp_flags;
 };
-void xfs_defer_add(struct xfs_trans *tp, enum xfs_defer_ops_type type,
-		struct list_head *h);
+/*
+ * Create a log intent item for this deferred item, but don't actually finish
+ * the work.  Caller must clear this before the final transaction commit.
+ */
+#define XFS_DEFER_PAUSED	(1U << 0)
+
+#define XFS_DEFER_PENDING_STRINGS \
+	{ XFS_DEFER_PAUSED,	"paused" }
+
+void xfs_defer_item_pause(struct xfs_trans *tp, struct xfs_defer_pending *dfp);
+void xfs_defer_item_unpause(struct xfs_trans *tp, struct xfs_defer_pending *dfp);
+
+struct xfs_defer_pending *xfs_defer_add(struct xfs_trans *tp, struct list_head *h,
+		const struct xfs_defer_op_type *ops);
 int xfs_defer_finish_noroll(struct xfs_trans **tp);
 int xfs_defer_finish(struct xfs_trans **tp);
+int xfs_defer_finish_one(struct xfs_trans *tp, struct xfs_defer_pending *dfp);
 void xfs_defer_cancel(struct xfs_trans *);
 void xfs_defer_move(struct xfs_trans *dtp, struct xfs_trans *stp);
 /* Description of a deferred type. */
 struct xfs_defer_op_type {
+	const char		*name;
+	unsigned int		max_items;
 	struct xfs_log_item *(*create_intent)(struct xfs_trans *tp,
 			struct list_head *items, unsigned int count, bool sort);
 	void (*abort_intent)(struct xfs_log_item *intent);
@@ -56,7 +59,11 @@ struct xfs_defer_op_type {
 	void (*finish_cleanup)(struct xfs_trans *tp,
 			struct xfs_btree_cur *state, int error);
 	void (*cancel_item)(struct list_head *item);
-	unsigned int		max_items;
+	int (*recover_work)(struct xfs_defer_pending *dfp,
+			struct list_head *capture_list);
+	struct xfs_log_item *(*relog_intent)(struct xfs_trans *tp,
			struct xfs_log_item *intent,
			struct xfs_log_item *done_item);
 };
 extern const struct xfs_defer_op_type xfs_bmap_update_defer_type;
@@ -125,7 +132,25 @@ void xfs_defer_ops_capture_abort(struct xfs_mount *mp,
 		struct xfs_defer_capture *d);
 void xfs_defer_resources_rele(struct xfs_defer_resources *dres);
+void xfs_defer_start_recovery(struct xfs_log_item *lip,
+		struct list_head *r_dfops, const struct xfs_defer_op_type *ops);
+void xfs_defer_cancel_recovery(struct xfs_mount *mp,
+		struct xfs_defer_pending *dfp);
+int xfs_defer_finish_recovery(struct xfs_mount *mp,
+		struct xfs_defer_pending *dfp, struct list_head *capture_list);
+
+static inline void
+xfs_defer_add_item(
+	struct xfs_defer_pending	*dfp,
+	struct list_head		*work)
+{
+	list_add_tail(work, &dfp->dfp_work);
+	dfp->dfp_count++;
+}
+
 int __init xfs_defer_init_item_caches(void);
 void xfs_defer_destroy_item_caches(void);
+void xfs_defer_add_barrier(struct xfs_trans *tp);
+
 #endif /* __XFS_DEFER_H__ */
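
Because xfs_defer_add() appends same-typed work to the last pending item, a barrier is how a caller forces two adjacent items of the same type into separate pending items, and hence separate log intents. A short sketch of the intended use ("example_add_separated" and the two intents are illustrative):

	/*
	 * Sketch: without the barrier, the second xfs_defer_add() would
	 * append bi2 to the same pending item as bi1; with it, each gets
	 * its own pending item and its own log intent.
	 */
	STATIC void
	example_add_separated(
		struct xfs_trans	*tp,
		struct xfs_bmap_intent	*bi1,
		struct xfs_bmap_intent	*bi2)
	{
		xfs_defer_add(tp, &bi1->bi_list, &xfs_bmap_update_defer_type);
		xfs_defer_add_barrier(tp);
		xfs_defer_add(tp, &bi2->bi_list, &xfs_bmap_update_defer_type);
	}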

@@ -196,7 +196,7 @@ xfs_dir_isempty(
 		return 1;
 	if (dp->i_disk_size > xfs_inode_data_fork_size(dp))
 		return 0;
-	sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
+	sfp = dp->i_df.if_data;
 	return !sfp->count;
 }

@@ -1089,7 +1089,7 @@ xfs_dir2_sf_to_block(
 	int			newoffset;	/* offset from current entry */
 	unsigned int		offset = geo->data_entry_offset;
 	xfs_dir2_sf_entry_t	*sfep;		/* sf entry pointer */
-	xfs_dir2_sf_hdr_t	*oldsfp;	/* old shortform header  */
+	struct xfs_dir2_sf_hdr	*oldsfp = ifp->if_data;
 	xfs_dir2_sf_hdr_t	*sfp;		/* shortform header  */
 	__be16			*tagp;		/* end of data entry */
 	struct xfs_name		name;
@@ -1099,10 +1099,8 @@ xfs_dir2_sf_to_block(
 	ASSERT(ifp->if_format == XFS_DINODE_FMT_LOCAL);
 	ASSERT(dp->i_disk_size >= offsetof(struct xfs_dir2_sf_hdr, parent));
-	oldsfp = (xfs_dir2_sf_hdr_t *)ifp->if_u1.if_data;
-
 	ASSERT(ifp->if_bytes == dp->i_disk_size);
-	ASSERT(ifp->if_u1.if_data != NULL);
+	ASSERT(oldsfp != NULL);
 	ASSERT(dp->i_disk_size >= xfs_dir2_sf_hdr_size(oldsfp->i8count));
 	ASSERT(dp->i_df.if_nextents == 0);

@@ -175,7 +175,8 @@ extern int xfs_dir2_sf_create(struct xfs_da_args *args, xfs_ino_t pino);
 extern int xfs_dir2_sf_lookup(struct xfs_da_args *args);
 extern int xfs_dir2_sf_removename(struct xfs_da_args *args);
 extern int xfs_dir2_sf_replace(struct xfs_da_args *args);
-extern xfs_failaddr_t xfs_dir2_sf_verify(struct xfs_inode *ip);
+xfs_failaddr_t xfs_dir2_sf_verify(struct xfs_mount *mp,
+		struct xfs_dir2_sf_hdr *sfp, int64_t size);
 int xfs_dir2_sf_entsize(struct xfs_mount *mp,
 		struct xfs_dir2_sf_hdr *hdr, int len);
 void xfs_dir2_sf_put_ino(struct xfs_mount *mp, struct xfs_dir2_sf_hdr *hdr,

@@ -364,25 +364,23 @@ int /* error */
 xfs_dir2_sf_addname(
         xfs_da_args_t *args)            /* operation arguments */
 {
-        xfs_inode_t *dp;                /* incore directory inode */
+        struct xfs_inode *dp = args->dp;
+        struct xfs_dir2_sf_hdr *sfp = dp->i_df.if_data;
         int error;                      /* error return value */
         int incr_isize;                 /* total change in size */
         int new_isize;                  /* size after adding name */
         int objchange;                  /* changing to 8-byte inodes */
         xfs_dir2_data_aoff_t offset = 0; /* offset for new entry */
         int pick;                       /* which algorithm to use */
-        xfs_dir2_sf_hdr_t *sfp;         /* shortform structure */
         xfs_dir2_sf_entry_t *sfep = NULL; /* shortform entry */

         trace_xfs_dir2_sf_addname(args);

         ASSERT(xfs_dir2_sf_lookup(args) == -ENOENT);
-        dp = args->dp;
         ASSERT(dp->i_df.if_format == XFS_DINODE_FMT_LOCAL);
         ASSERT(dp->i_disk_size >= offsetof(struct xfs_dir2_sf_hdr, parent));
         ASSERT(dp->i_df.if_bytes == dp->i_disk_size);
-        ASSERT(dp->i_df.if_u1.if_data != NULL);
-        sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
+        ASSERT(sfp != NULL);
         ASSERT(dp->i_disk_size >= xfs_dir2_sf_hdr_size(sfp->i8count));
         /*
          * Compute entry (and change in) size.
@@ -462,20 +460,17 @@ xfs_dir2_sf_addname_easy(
 {
         struct xfs_inode *dp = args->dp;
         struct xfs_mount *mp = dp->i_mount;
-        int byteoff;                    /* byte offset in sf dir */
-        xfs_dir2_sf_hdr_t *sfp;         /* shortform structure */
+        struct xfs_dir2_sf_hdr *sfp = dp->i_df.if_data;
+        int byteoff = (int)((char *)sfep - (char *)sfp);

-        sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
-        byteoff = (int)((char *)sfep - (char *)sfp);
         /*
          * Grow the in-inode space.
          */
-        xfs_idata_realloc(dp, xfs_dir2_sf_entsize(mp, sfp, args->namelen),
+        sfp = xfs_idata_realloc(dp, xfs_dir2_sf_entsize(mp, sfp, args->namelen),
                         XFS_DATA_FORK);
         /*
          * Need to set up again due to realloc of the inode data.
          */
-        sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
         sfep = (xfs_dir2_sf_entry_t *)((char *)sfp + byteoff);
         /*
          * Fill in the new entry.
@@ -528,11 +523,10 @@ xfs_dir2_sf_addname_hard(
         /*
          * Copy the old directory to the stack buffer.
          */
-        sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
         old_isize = (int)dp->i_disk_size;
         buf = kmem_alloc(old_isize, 0);
         oldsfp = (xfs_dir2_sf_hdr_t *)buf;
-        memcpy(oldsfp, sfp, old_isize);
+        memcpy(oldsfp, dp->i_df.if_data, old_isize);
         /*
          * Loop over the old directory finding the place we're going
          * to insert the new entry.
@@ -556,11 +550,8 @@ xfs_dir2_sf_addname_hard(
          * the data.
          */
         xfs_idata_realloc(dp, -old_isize, XFS_DATA_FORK);
-        xfs_idata_realloc(dp, new_isize, XFS_DATA_FORK);
-        /*
-         * Reset the pointer since the buffer was reallocated.
-         */
-        sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
+        sfp = xfs_idata_realloc(dp, new_isize, XFS_DATA_FORK);
         /*
          * Copy the first part of the directory, including the header.
          */
@@ -610,11 +601,10 @@ xfs_dir2_sf_addname_pick(
         int i;                          /* entry number */
         xfs_dir2_data_aoff_t offset;    /* data block offset */
         xfs_dir2_sf_entry_t *sfep;      /* shortform entry */
-        xfs_dir2_sf_hdr_t *sfp;         /* shortform structure */
+        struct xfs_dir2_sf_hdr *sfp = dp->i_df.if_data;
         int size;                       /* entry's data size */
         int used;                       /* data bytes used */

-        sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
         size = xfs_dir2_data_entsize(mp, args->namelen);
         offset = args->geo->data_first_offset;
         sfep = xfs_dir2_sf_firstentry(sfp);
@@ -673,14 +663,13 @@ xfs_dir2_sf_check(
 {
         struct xfs_inode *dp = args->dp;
         struct xfs_mount *mp = dp->i_mount;
+        struct xfs_dir2_sf_hdr *sfp = dp->i_df.if_data;
         int i;                          /* entry number */
         int i8count;                    /* number of big inode#s */
         xfs_ino_t ino;                  /* entry inode number */
         int offset;                     /* data offset */
         xfs_dir2_sf_entry_t *sfep;      /* shortform dir entry */
-        xfs_dir2_sf_hdr_t *sfp;         /* shortform structure */

-        sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
         offset = args->geo->data_first_offset;
         ino = xfs_dir2_sf_get_parent_ino(sfp);
         i8count = ino > XFS_DIR2_MAX_SHORT_INUM;
@@ -707,11 +696,10 @@ xfs_dir2_sf_check(
 /* Verify the consistency of an inline directory. */
 xfs_failaddr_t
 xfs_dir2_sf_verify(
-        struct xfs_inode *ip)
+        struct xfs_mount *mp,
+        struct xfs_dir2_sf_hdr *sfp,
+        int64_t size)
 {
-        struct xfs_mount *mp = ip->i_mount;
-        struct xfs_ifork *ifp = xfs_ifork_ptr(ip, XFS_DATA_FORK);
-        struct xfs_dir2_sf_hdr *sfp;
         struct xfs_dir2_sf_entry *sfep;
         struct xfs_dir2_sf_entry *next_sfep;
         char *endp;
@@ -719,15 +707,9 @@ xfs_dir2_sf_verify(
         int i;
         int i8count;
         int offset;
-        int64_t size;
         int error;
         uint8_t filetype;

-        ASSERT(ifp->if_format == XFS_DINODE_FMT_LOCAL);
-
-        sfp = (struct xfs_dir2_sf_hdr *)ifp->if_u1.if_data;
-        size = ifp->if_bytes;
-
         /*
          * Give up if the directory is way too short.
          */
@@ -834,15 +816,13 @@ xfs_dir2_sf_create(
         ASSERT(dp->i_df.if_bytes == 0);
         i8count = pino > XFS_DIR2_MAX_SHORT_INUM;
         size = xfs_dir2_sf_hdr_size(i8count);
         /*
-         * Make a buffer for the data.
+         * Make a buffer for the data and fill in the header.
          */
-        xfs_idata_realloc(dp, size, XFS_DATA_FORK);
-        /*
-         * Fill in the header,
-         */
-        sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
+        sfp = xfs_idata_realloc(dp, size, XFS_DATA_FORK);
         sfp->i8count = i8count;
         /*
          * Now can put in the inode number, since i8count is set.
          */
@@ -864,9 +844,9 @@ xfs_dir2_sf_lookup(
 {
         struct xfs_inode *dp = args->dp;
         struct xfs_mount *mp = dp->i_mount;
+        struct xfs_dir2_sf_hdr *sfp = dp->i_df.if_data;
         int i;                          /* entry index */
         xfs_dir2_sf_entry_t *sfep;      /* shortform directory entry */
-        xfs_dir2_sf_hdr_t *sfp;         /* shortform structure */
         enum xfs_dacmp cmp;             /* comparison result */
         xfs_dir2_sf_entry_t *ci_sfep;   /* case-insens. entry */
@@ -877,8 +857,7 @@ xfs_dir2_sf_lookup(
         ASSERT(dp->i_df.if_format == XFS_DINODE_FMT_LOCAL);
         ASSERT(dp->i_disk_size >= offsetof(struct xfs_dir2_sf_hdr, parent));
         ASSERT(dp->i_df.if_bytes == dp->i_disk_size);
-        ASSERT(dp->i_df.if_u1.if_data != NULL);
-        sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
+        ASSERT(sfp != NULL);
         ASSERT(dp->i_disk_size >= xfs_dir2_sf_hdr_size(sfp->i8count));
         /*
          * Special case for .
@@ -940,13 +919,13 @@ xfs_dir2_sf_removename(
 {
         struct xfs_inode *dp = args->dp;
         struct xfs_mount *mp = dp->i_mount;
+        struct xfs_dir2_sf_hdr *sfp = dp->i_df.if_data;
         int byteoff;                    /* offset of removed entry */
         int entsize;                    /* this entry's size */
         int i;                          /* shortform entry index */
         int newsize;                    /* new inode size */
         int oldsize;                    /* old inode size */
         xfs_dir2_sf_entry_t *sfep;      /* shortform directory entry */
-        xfs_dir2_sf_hdr_t *sfp;         /* shortform structure */

         trace_xfs_dir2_sf_removename(args);
@@ -954,8 +933,7 @@ xfs_dir2_sf_removename(
         oldsize = (int)dp->i_disk_size;
         ASSERT(oldsize >= offsetof(struct xfs_dir2_sf_hdr, parent));
         ASSERT(dp->i_df.if_bytes == oldsize);
-        ASSERT(dp->i_df.if_u1.if_data != NULL);
-        sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
+        ASSERT(sfp != NULL);
         ASSERT(oldsize >= xfs_dir2_sf_hdr_size(sfp->i8count));
         /*
          * Loop over the old directory entries.
@@ -992,11 +970,12 @@ xfs_dir2_sf_removename(
          */
         sfp->count--;
         dp->i_disk_size = newsize;
+
         /*
          * Reallocate, making it smaller.
          */
-        xfs_idata_realloc(dp, newsize - oldsize, XFS_DATA_FORK);
-        sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
+        sfp = xfs_idata_realloc(dp, newsize - oldsize, XFS_DATA_FORK);
+
         /*
          * Are we changing inode number size?
          */
@@ -1019,13 +998,12 @@ xfs_dir2_sf_replace_needblock(
         struct xfs_inode *dp,
         xfs_ino_t inum)
 {
+        struct xfs_dir2_sf_hdr *sfp = dp->i_df.if_data;
         int newsize;
-        struct xfs_dir2_sf_hdr *sfp;

         if (dp->i_df.if_format != XFS_DINODE_FMT_LOCAL)
                 return false;

-        sfp = (struct xfs_dir2_sf_hdr *)dp->i_df.if_u1.if_data;
         newsize = dp->i_df.if_bytes + (sfp->count + 1) * XFS_INO64_DIFF;

         return inum > XFS_DIR2_MAX_SHORT_INUM &&
@@ -1041,19 +1019,18 @@ xfs_dir2_sf_replace(
 {
         struct xfs_inode *dp = args->dp;
         struct xfs_mount *mp = dp->i_mount;
+        struct xfs_dir2_sf_hdr *sfp = dp->i_df.if_data;
         int i;                          /* entry index */
         xfs_ino_t ino=0;                /* entry old inode number */
         int i8elevated;                 /* sf_toino8 set i8count=1 */
         xfs_dir2_sf_entry_t *sfep;      /* shortform directory entry */
-        xfs_dir2_sf_hdr_t *sfp;         /* shortform structure */

         trace_xfs_dir2_sf_replace(args);

         ASSERT(dp->i_df.if_format == XFS_DINODE_FMT_LOCAL);
         ASSERT(dp->i_disk_size >= offsetof(struct xfs_dir2_sf_hdr, parent));
         ASSERT(dp->i_df.if_bytes == dp->i_disk_size);
-        ASSERT(dp->i_df.if_u1.if_data != NULL);
-        sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
+        ASSERT(sfp != NULL);
         ASSERT(dp->i_disk_size >= xfs_dir2_sf_hdr_size(sfp->i8count));

         /*
@@ -1076,7 +1053,7 @@ xfs_dir2_sf_replace(
                  */
                 xfs_dir2_sf_toino8(args);
                 i8elevated = 1;
-                sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
+                sfp = dp->i_df.if_data;
         } else
                 i8elevated = 0;
@@ -1157,11 +1134,11 @@ xfs_dir2_sf_toino4(
 {
         struct xfs_inode *dp = args->dp;
         struct xfs_mount *mp = dp->i_mount;
+        struct xfs_dir2_sf_hdr *oldsfp = dp->i_df.if_data;
         char *buf;                      /* old dir's buffer */
         int i;                          /* entry index */
         int newsize;                    /* new inode size */
         xfs_dir2_sf_entry_t *oldsfep;   /* old sf entry */
-        xfs_dir2_sf_hdr_t *oldsfp;      /* old sf directory */
         int oldsize;                    /* old inode size */
         xfs_dir2_sf_entry_t *sfep;      /* new sf entry */
         xfs_dir2_sf_hdr_t *sfp;         /* new sf directory */
@@ -1175,7 +1152,6 @@ xfs_dir2_sf_toino4(
          */
         oldsize = dp->i_df.if_bytes;
         buf = kmem_alloc(oldsize, 0);
-        oldsfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
         ASSERT(oldsfp->i8count == 1);
         memcpy(buf, oldsfp, oldsize);
         /*
@@ -1188,7 +1164,7 @@ xfs_dir2_sf_toino4(
          * Reset our pointers, the data has moved.
          */
         oldsfp = (xfs_dir2_sf_hdr_t *)buf;
-        sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
+        sfp = dp->i_df.if_data;
         /*
          * Fill in the new header.
          */
@@ -1230,11 +1206,11 @@ xfs_dir2_sf_toino8(
 {
         struct xfs_inode *dp = args->dp;
         struct xfs_mount *mp = dp->i_mount;
+        struct xfs_dir2_sf_hdr *oldsfp = dp->i_df.if_data;
         char *buf;                      /* old dir's buffer */
         int i;                          /* entry index */
         int newsize;                    /* new inode size */
         xfs_dir2_sf_entry_t *oldsfep;   /* old sf entry */
-        xfs_dir2_sf_hdr_t *oldsfp;      /* old sf directory */
         int oldsize;                    /* old inode size */
         xfs_dir2_sf_entry_t *sfep;      /* new sf entry */
         xfs_dir2_sf_hdr_t *sfp;         /* new sf directory */
@@ -1248,7 +1224,6 @@ xfs_dir2_sf_toino8(
          */
         oldsize = dp->i_df.if_bytes;
         buf = kmem_alloc(oldsize, 0);
-        oldsfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
         ASSERT(oldsfp->i8count == 0);
         memcpy(buf, oldsfp, oldsize);
         /*
@@ -1261,7 +1236,7 @@ xfs_dir2_sf_toino8(
          * Reset our pointers, the data has moved.
          */
         oldsfp = (xfs_dir2_sf_hdr_t *)buf;
-        sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
+        sfp = dp->i_df.if_data;
         /*
          * Fill in the new header.
          */

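The common thread in the shortform directory conversion above is that
xfs_idata_realloc() now hands back the new fork data pointer, so callers no
longer re-derive it from if_u1.if_data after every resize. A minimal sketch of
the caller-side pattern ("entsize" stands in for whatever size delta the
caller computed; it is illustrative, not a name from the patch):

    /* Before: resize, then refetch the possibly-moved pointer. */
    xfs_idata_realloc(dp, entsize, XFS_DATA_FORK);
    sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;

    /* After: the realloc returns the (possibly moved) data directly. */
    sfp = xfs_idata_realloc(dp, entsize, XFS_DATA_FORK);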

@@ -1008,7 +1008,7 @@ enum xfs_dinode_fmt {
  * Return pointers to the data or attribute forks.
  */
 #define XFS_DFORK_DPTR(dip) \
-        ((char *)dip + xfs_dinode_size(dip->di_version))
+        ((void *)dip + xfs_dinode_size(dip->di_version))
 #define XFS_DFORK_APTR(dip) \
         (XFS_DFORK_DPTR(dip) + XFS_DFORK_BOFF(dip))
 #define XFS_DFORK_PTR(dip,w) \
@@ -1156,20 +1156,6 @@ static inline bool xfs_dinode_has_large_extent_counts(
 #define XFS_DFL_RTEXTSIZE (64 * 1024) /* 64kB */
 #define XFS_MIN_RTEXTSIZE (4 * 1024)  /* 4kB */

-#define XFS_BLOCKSIZE(mp) ((mp)->m_sb.sb_blocksize)
-#define XFS_BLOCKMASK(mp) ((mp)->m_blockmask)
-
-/*
- * RT bit manipulation macros.
- */
-#define XFS_RTMIN(a,b) ((a) < (b) ? (a) : (b))
-#define XFS_RTMAX(a,b) ((a) > (b) ? (a) : (b))
-
-#define XFS_RTLOBIT(w) xfs_lowbit32(w)
-#define XFS_RTHIBIT(w) xfs_highbit32(w)
-
-#define XFS_RTBLOCKLOG(b) xfs_highbit64(b)
-
 /*
  * Dquot and dquot block format definitions
  */
@@ -1272,6 +1258,9 @@ static inline time64_t xfs_dq_bigtime_to_unix(uint32_t ondisk_seconds)
 #define XFS_DQ_GRACE_MIN ((int64_t)0)
 #define XFS_DQ_GRACE_MAX ((int64_t)U32_MAX)

+/* Maximum id value for a quota record */
+#define XFS_DQ_ID_MAX (U32_MAX)
+
 /*
  * This is the main portion of the on-disk representation of quota information
  * for a user. We pad this with some more expansion room to construct the on


@@ -68,6 +68,11 @@ struct xfs_fsop_geom;
 #define XFS_SICK_INO_SYMLINK (1 << 6)  /* symbolic link remote target */
 #define XFS_SICK_INO_PARENT  (1 << 7)  /* parent pointers */

+#define XFS_SICK_INO_BMBTD_ZAPPED   (1 << 8)  /* data fork erased */
+#define XFS_SICK_INO_BMBTA_ZAPPED   (1 << 9)  /* attr fork erased */
+#define XFS_SICK_INO_DIR_ZAPPED     (1 << 10) /* directory erased */
+#define XFS_SICK_INO_SYMLINK_ZAPPED (1 << 11) /* symlink erased */
+
 /* Primary evidence of health problems in a given group. */
 #define XFS_SICK_FS_PRIMARY  (XFS_SICK_FS_COUNTERS | \
                               XFS_SICK_FS_UQUOTA | \
@@ -97,6 +102,11 @@ struct xfs_fsop_geom;
                               XFS_SICK_INO_SYMLINK | \
                               XFS_SICK_INO_PARENT)

+#define XFS_SICK_INO_ZAPPED  (XFS_SICK_INO_BMBTD_ZAPPED | \
+                              XFS_SICK_INO_BMBTA_ZAPPED | \
+                              XFS_SICK_INO_DIR_ZAPPED | \
+                              XFS_SICK_INO_SYMLINK_ZAPPED)
+
 /* These functions must be provided by the xfs implementation. */

 void xfs_fs_mark_sick(struct xfs_mount *mp, unsigned int mask);

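The new *_ZAPPED bits record that online repair had to erase a piece of inode
metadata and that a rebuild is still owed. A hedged sketch of how a caller
could test for that state, assuming the existing xfs_inode_measure_sickness()
helper from the health API and an inode "ip" in scope (the check itself is
illustrative, not a call site from this series):

    unsigned int sick, checked;

    xfs_inode_measure_sickness(ip, &sick, &checked);
    if (sick & XFS_SICK_INO_ZAPPED) {
            /* Some part of this inode was zeroed by repair, e.g. a
             * directory whose data fork was wiped reports
             * XFS_SICK_INO_DIR_ZAPPED. */
    }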

@@ -95,18 +95,28 @@ xfs_inobt_btrec_to_irec(
         irec->ir_free = be64_to_cpu(rec->inobt.ir_free);
 }

+/* Compute the freecount of an incore inode record. */
+uint8_t
+xfs_inobt_rec_freecount(
+        const struct xfs_inobt_rec_incore *irec)
+{
+        uint64_t realfree = irec->ir_free;
+
+        if (xfs_inobt_issparse(irec->ir_holemask))
+                realfree &= xfs_inobt_irec_to_allocmask(irec);
+        return hweight64(realfree);
+}
+
 /* Simple checks for inode records. */
 xfs_failaddr_t
 xfs_inobt_check_irec(
-        struct xfs_btree_cur *cur,
+        struct xfs_perag *pag,
         const struct xfs_inobt_rec_incore *irec)
 {
-        uint64_t realfree;
-
         /* Record has to be properly aligned within the AG. */
-        if (!xfs_verify_agino(cur->bc_ag.pag, irec->ir_startino))
+        if (!xfs_verify_agino(pag, irec->ir_startino))
                 return __this_address;
-        if (!xfs_verify_agino(cur->bc_ag.pag,
+        if (!xfs_verify_agino(pag,
                         irec->ir_startino + XFS_INODES_PER_CHUNK - 1))
                 return __this_address;
         if (irec->ir_count < XFS_INODES_PER_HOLEMASK_BIT ||
@@ -115,12 +125,7 @@ xfs_inobt_check_irec(
         if (irec->ir_freecount > XFS_INODES_PER_CHUNK)
                 return __this_address;

-        /* if there are no holes, return the first available offset */
-        if (!xfs_inobt_issparse(irec->ir_holemask))
-                realfree = irec->ir_free;
-        else
-                realfree = irec->ir_free & xfs_inobt_irec_to_allocmask(irec);
-        if (hweight64(realfree) != irec->ir_freecount)
+        if (xfs_inobt_rec_freecount(irec) != irec->ir_freecount)
                 return __this_address;

         return NULL;
@@ -164,7 +169,7 @@ xfs_inobt_get_rec(
                 return error;

         xfs_inobt_btrec_to_irec(mp, rec, irec);
-        fa = xfs_inobt_check_irec(cur, irec);
+        fa = xfs_inobt_check_irec(cur->bc_ag.pag, irec);
         if (fa)
                 return xfs_inobt_complain_bad_rec(cur, fa, irec);

@@ -1854,7 +1859,7 @@ xfs_difree_inode_chunk(
                 return xfs_free_extent_later(tp,
                                 XFS_AGB_TO_FSB(mp, agno, sagbno),
                                 M_IGEO(mp)->ialloc_blks, &XFS_RMAP_OINFO_INODES,
-                                XFS_AG_RESV_NONE);
+                                XFS_AG_RESV_NONE, false);
         }

         /* holemask is only 16-bits (fits in an unsigned long) */
@@ -1900,7 +1905,8 @@ xfs_difree_inode_chunk(
                 ASSERT(contigblk % mp->m_sb.sb_spino_align == 0);
                 error = xfs_free_extent_later(tp,
                                 XFS_AGB_TO_FSB(mp, agno, agbno), contigblk,
-                                &XFS_RMAP_OINFO_INODES, XFS_AG_RESV_NONE);
+                                &XFS_RMAP_OINFO_INODES, XFS_AG_RESV_NONE,
+                                false);
                 if (error)
                         return error;

@@ -2739,7 +2745,7 @@ xfs_ialloc_count_inodes_rec(
         xfs_failaddr_t fa;

         xfs_inobt_btrec_to_irec(cur->bc_mp, rec, &irec);
-        fa = xfs_inobt_check_irec(cur, &irec);
+        fa = xfs_inobt_check_irec(cur->bc_ag.pag, &irec);
         if (fa)
                 return xfs_inobt_complain_bad_rec(cur, fa, &irec);

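xfs_inobt_rec_freecount() gives scrub and the verifier one shared definition
of "free inodes in this record": for sparse records only the free bits backed
by allocated chunks count. A small worked example with invented values:

    struct xfs_inobt_rec_incore irec = {
            .ir_startino = 128,         /* hypothetical record */
            .ir_holemask = 0,           /* fully allocated chunk */
            .ir_count    = 64,
            .ir_free     = 0xffULL,     /* inodes 0-7 are free */
    };

    /* No holes, so this is just hweight64(ir_free) == 8. */
    ASSERT(xfs_inobt_rec_freecount(&irec) == 8);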

@@ -79,6 +79,7 @@ int xfs_inobt_lookup(struct xfs_btree_cur *cur, xfs_agino_t ino,
  */
 int xfs_inobt_get_rec(struct xfs_btree_cur *cur,
                 xfs_inobt_rec_incore_t *rec, int *stat);
+uint8_t xfs_inobt_rec_freecount(const struct xfs_inobt_rec_incore *irec);

 /*
  * Inode chunk initialisation routine
@@ -93,7 +94,7 @@ union xfs_btree_rec;
 void xfs_inobt_btrec_to_irec(struct xfs_mount *mp,
                 const union xfs_btree_rec *rec,
                 struct xfs_inobt_rec_incore *irec);
-xfs_failaddr_t xfs_inobt_check_irec(struct xfs_btree_cur *cur,
+xfs_failaddr_t xfs_inobt_check_irec(struct xfs_perag *pag,
                 const struct xfs_inobt_rec_incore *irec);
 int xfs_ialloc_has_inodes_at_extent(struct xfs_btree_cur *cur,
                 xfs_agblock_t bno, xfs_extlen_t len,


@@ -161,7 +161,7 @@ __xfs_inobt_free_block(
         xfs_inobt_mod_blockcount(cur, -1);
         fsbno = XFS_DADDR_TO_FSB(cur->bc_mp, xfs_buf_daddr(bp));
         return xfs_free_extent_later(cur->bc_tp, fsbno, 1,
-                        &XFS_RMAP_OINFO_INOBT, resv);
+                        &XFS_RMAP_OINFO_INOBT, resv, false);
 }

 STATIC int


@@ -158,7 +158,7 @@ static void *
 xfs_iext_find_first_leaf(
         struct xfs_ifork *ifp)
 {
-        struct xfs_iext_node *node = ifp->if_u1.if_root;
+        struct xfs_iext_node *node = ifp->if_data;
         int height;

         if (!ifp->if_height)
@@ -176,7 +176,7 @@ static void *
 xfs_iext_find_last_leaf(
         struct xfs_ifork *ifp)
 {
-        struct xfs_iext_node *node = ifp->if_u1.if_root;
+        struct xfs_iext_node *node = ifp->if_data;
         int height, i;

         if (!ifp->if_height)
@@ -306,7 +306,7 @@ xfs_iext_find_level(
         xfs_fileoff_t offset,
         int level)
 {
-        struct xfs_iext_node *node = ifp->if_u1.if_root;
+        struct xfs_iext_node *node = ifp->if_data;
         int height, i;

         if (!ifp->if_height)
@@ -402,12 +402,12 @@ xfs_iext_grow(
         int i;

         if (ifp->if_height == 1) {
-                struct xfs_iext_leaf *prev = ifp->if_u1.if_root;
+                struct xfs_iext_leaf *prev = ifp->if_data;

                 node->keys[0] = xfs_iext_leaf_key(prev, 0);
                 node->ptrs[0] = prev;
         } else {
-                struct xfs_iext_node *prev = ifp->if_u1.if_root;
+                struct xfs_iext_node *prev = ifp->if_data;

                 ASSERT(ifp->if_height > 1);

@@ -418,7 +418,7 @@ xfs_iext_grow(
         for (i = 1; i < KEYS_PER_NODE; i++)
                 node->keys[i] = XFS_IEXT_KEY_INVALID;

-        ifp->if_u1.if_root = node;
+        ifp->if_data = node;
         ifp->if_height++;
 }

@@ -430,7 +430,7 @@ xfs_iext_update_node(
         int level,
         void *ptr)
 {
-        struct xfs_iext_node *node = ifp->if_u1.if_root;
+        struct xfs_iext_node *node = ifp->if_data;
         int height, i;

         for (height = ifp->if_height; height > level; height--) {
@@ -583,11 +583,11 @@ xfs_iext_alloc_root(
 {
         ASSERT(ifp->if_bytes == 0);

-        ifp->if_u1.if_root = kmem_zalloc(sizeof(struct xfs_iext_rec), KM_NOFS);
+        ifp->if_data = kmem_zalloc(sizeof(struct xfs_iext_rec), KM_NOFS);
         ifp->if_height = 1;

         /* now that we have a node step into it */
-        cur->leaf = ifp->if_u1.if_root;
+        cur->leaf = ifp->if_data;
         cur->pos = 0;
 }

@@ -603,9 +603,9 @@ xfs_iext_realloc_root(
         if (new_size / sizeof(struct xfs_iext_rec) == RECS_PER_LEAF)
                 new_size = NODE_SIZE;

-        new = krealloc(ifp->if_u1.if_root, new_size, GFP_NOFS | __GFP_NOFAIL);
+        new = krealloc(ifp->if_data, new_size, GFP_NOFS | __GFP_NOFAIL);
         memset(new + ifp->if_bytes, 0, new_size - ifp->if_bytes);
-        ifp->if_u1.if_root = new;
+        ifp->if_data = new;
         cur->leaf = new;
 }

@@ -622,13 +622,11 @@ static inline void xfs_iext_inc_seq(struct xfs_ifork *ifp)
 }

 void
-xfs_iext_insert(
-        struct xfs_inode *ip,
+xfs_iext_insert_raw(
+        struct xfs_ifork *ifp,
         struct xfs_iext_cursor *cur,
-        struct xfs_bmbt_irec *irec,
-        int state)
+        struct xfs_bmbt_irec *irec)
 {
-        struct xfs_ifork *ifp = xfs_iext_state_to_fork(ip, state);
         xfs_fileoff_t offset = irec->br_startoff;
         struct xfs_iext_leaf *new = NULL;
         int nr_entries, i;
@@ -662,12 +660,23 @@ xfs_iext_insert(
         xfs_iext_set(cur_rec(cur), irec);
         ifp->if_bytes += sizeof(struct xfs_iext_rec);

-        trace_xfs_iext_insert(ip, cur, state, _RET_IP_);
-
         if (new)
                 xfs_iext_insert_node(ifp, xfs_iext_leaf_key(new, 0), new, 2);
 }

+void
+xfs_iext_insert(
+        struct xfs_inode *ip,
+        struct xfs_iext_cursor *cur,
+        struct xfs_bmbt_irec *irec,
+        int state)
+{
+        struct xfs_ifork *ifp = xfs_iext_state_to_fork(ip, state);
+
+        xfs_iext_insert_raw(ifp, cur, irec);
+        trace_xfs_iext_insert(ip, cur, state, _RET_IP_);
+}
+
 static struct xfs_iext_node *
 xfs_iext_rebalance_node(
         struct xfs_iext_node *parent,
@@ -777,8 +786,8 @@ again:
                  * If we are at the root and only one entry is left we can just
                  * free this node and update the root pointer.
                  */
-                ASSERT(node == ifp->if_u1.if_root);
-                ifp->if_u1.if_root = node->ptrs[0];
+                ASSERT(node == ifp->if_data);
+                ifp->if_data = node->ptrs[0];
                 ifp->if_height--;
                 kmem_free(node);
         }
@@ -854,8 +863,8 @@ xfs_iext_free_last_leaf(
         struct xfs_ifork *ifp)
 {
         ifp->if_height--;
-        kmem_free(ifp->if_u1.if_root);
-        ifp->if_u1.if_root = NULL;
+        kmem_free(ifp->if_data);
+        ifp->if_data = NULL;
 }

 void
@@ -872,7 +881,7 @@ xfs_iext_remove(
         trace_xfs_iext_remove(ip, cur, state, _RET_IP_);

         ASSERT(ifp->if_height > 0);
-        ASSERT(ifp->if_u1.if_root != NULL);
+        ASSERT(ifp->if_data != NULL);
         ASSERT(xfs_iext_valid(ifp, cur));

         xfs_iext_inc_seq(ifp);
@@ -1042,9 +1051,9 @@ void
 xfs_iext_destroy(
         struct xfs_ifork *ifp)
 {
-        xfs_iext_destroy_node(ifp->if_u1.if_root, ifp->if_height);
+        xfs_iext_destroy_node(ifp->if_data, ifp->if_height);
         ifp->if_bytes = 0;
         ifp->if_height = 0;
-        ifp->if_u1.if_root = NULL;
+        ifp->if_data = NULL;
 }

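Splitting xfs_iext_insert_raw() out of xfs_iext_insert() lets code that only
holds a bare xfs_ifork (rather than an inode plus fork-state flags) load
extent records without firing the inode tracepoint. A hedged caller-side
sketch, assuming a cursor "icur" and record "irec" have already been set up:

    /* Raw variant: takes just the ifork, no tracepoint. */
    xfs_iext_insert_raw(ifp, &icur, &irec);

    /* Regular callers keep using the traced wrapper. */
    xfs_iext_insert(ip, &icur, &irec, state);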

@@ -50,12 +50,15 @@ xfs_init_local_fork(
                 mem_size++;

         if (size) {
-                ifp->if_u1.if_data = kmem_alloc(mem_size, KM_NOFS);
-                memcpy(ifp->if_u1.if_data, data, size);
+                char *new_data = kmem_alloc(mem_size, KM_NOFS);
+
+                memcpy(new_data, data, size);
                 if (zero_terminate)
-                        ifp->if_u1.if_data[size] = '\0';
+                        new_data[size] = '\0';
+
+                ifp->if_data = new_data;
         } else {
-                ifp->if_u1.if_data = NULL;
+                ifp->if_data = NULL;
         }

         ifp->if_bytes = size;
@@ -125,7 +128,7 @@ xfs_iformat_extents(
         }

         ifp->if_bytes = 0;
-        ifp->if_u1.if_root = NULL;
+        ifp->if_data = NULL;
         ifp->if_height = 0;
         if (size) {
                 dp = (xfs_bmbt_rec_t *) XFS_DFORK_PTR(dip, whichfork);
@@ -212,7 +215,7 @@ xfs_iformat_btree(
                            ifp->if_broot, size);

         ifp->if_bytes = 0;
-        ifp->if_u1.if_root = NULL;
+        ifp->if_data = NULL;
         ifp->if_height = 0;
         return 0;
 }
@@ -276,10 +279,9 @@ static uint16_t
 xfs_dfork_attr_shortform_size(
         struct xfs_dinode *dip)
 {
-        struct xfs_attr_shortform *atp =
-                (struct xfs_attr_shortform *)XFS_DFORK_APTR(dip);
+        struct xfs_attr_sf_hdr *sf = XFS_DFORK_APTR(dip);

-        return be16_to_cpu(atp->hdr.totsize);
+        return be16_to_cpu(sf->totsize);
 }

 void
@@ -493,7 +495,7 @@ xfs_iroot_realloc(
  * byte_diff -- the change in the number of bytes, positive or negative,
  * requested for the if_data array.
  */
-void
+void *
 xfs_idata_realloc(
         struct xfs_inode *ip,
         int64_t byte_diff,
@@ -505,21 +507,18 @@ xfs_idata_realloc(
         ASSERT(new_size >= 0);
         ASSERT(new_size <= xfs_inode_fork_size(ip, whichfork));

-        if (byte_diff == 0)
-                return;
-
-        if (new_size == 0) {
-                kmem_free(ifp->if_u1.if_data);
-                ifp->if_u1.if_data = NULL;
-                ifp->if_bytes = 0;
-                return;
+        if (byte_diff) {
+                ifp->if_data = krealloc(ifp->if_data, new_size,
+                                GFP_NOFS | __GFP_NOFAIL);
+                if (new_size == 0)
+                        ifp->if_data = NULL;
+                ifp->if_bytes = new_size;
         }

-        ifp->if_u1.if_data = krealloc(ifp->if_u1.if_data, new_size,
-                        GFP_NOFS | __GFP_NOFAIL);
-        ifp->if_bytes = new_size;
+        return ifp->if_data;
 }

+/* Free all memory and reset a fork back to its initial state. */
 void
 xfs_idestroy_fork(
         struct xfs_ifork *ifp)
@@ -531,8 +530,8 @@ xfs_idestroy_fork(

         switch (ifp->if_format) {
         case XFS_DINODE_FMT_LOCAL:
-                kmem_free(ifp->if_u1.if_data);
-                ifp->if_u1.if_data = NULL;
+                kmem_free(ifp->if_data);
+                ifp->if_data = NULL;
                 break;
         case XFS_DINODE_FMT_EXTENTS:
         case XFS_DINODE_FMT_BTREE:
@@ -625,9 +624,9 @@ xfs_iflush_fork(
         case XFS_DINODE_FMT_LOCAL:
                 if ((iip->ili_fields & dataflag[whichfork]) &&
                     (ifp->if_bytes > 0)) {
-                        ASSERT(ifp->if_u1.if_data != NULL);
+                        ASSERT(ifp->if_data != NULL);
                         ASSERT(ifp->if_bytes <= xfs_inode_fork_size(ip, whichfork));
-                        memcpy(cp, ifp->if_u1.if_data, ifp->if_bytes);
+                        memcpy(cp, ifp->if_data, ifp->if_bytes);
                 }
                 break;
@@ -702,19 +701,27 @@ xfs_ifork_verify_local_data(
         xfs_failaddr_t fa = NULL;

         switch (VFS_I(ip)->i_mode & S_IFMT) {
-        case S_IFDIR:
-                fa = xfs_dir2_sf_verify(ip);
+        case S_IFDIR: {
+                struct xfs_mount *mp = ip->i_mount;
+                struct xfs_ifork *ifp = xfs_ifork_ptr(ip, XFS_DATA_FORK);
+                struct xfs_dir2_sf_hdr *sfp = ifp->if_data;
+
+                fa = xfs_dir2_sf_verify(mp, sfp, ifp->if_bytes);
                 break;
-        case S_IFLNK:
-                fa = xfs_symlink_shortform_verify(ip);
+        }
+        case S_IFLNK: {
+                struct xfs_ifork *ifp = xfs_ifork_ptr(ip, XFS_DATA_FORK);
+
+                fa = xfs_symlink_shortform_verify(ifp->if_data, ifp->if_bytes);
                 break;
+        }
         default:
                 break;
         }

         if (fa) {
                 xfs_inode_verifier_error(ip, -EFSCORRUPTED, "data fork",
-                                ip->i_df.if_u1.if_data, ip->i_df.if_bytes, fa);
+                                ip->i_df.if_data, ip->i_df.if_bytes, fa);
                 return -EFSCORRUPTED;
         }
@@ -729,14 +736,17 @@ xfs_ifork_verify_local_attr(
         struct xfs_ifork *ifp = &ip->i_af;
         xfs_failaddr_t fa;

-        if (!xfs_inode_has_attr_fork(ip))
+        if (!xfs_inode_has_attr_fork(ip)) {
                 fa = __this_address;
-        else
-                fa = xfs_attr_shortform_verify(ip);
+        } else {
+                struct xfs_ifork *ifp = &ip->i_af;
+
+                ASSERT(ifp->if_format == XFS_DINODE_FMT_LOCAL);
+                fa = xfs_attr_shortform_verify(ifp->if_data, ifp->if_bytes);
+        }

         if (fa) {
                 xfs_inode_verifier_error(ip, -EFSCORRUPTED, "attr fork",
-                                ifp->if_u1.if_data, ifp->if_bytes, fa);
+                                ifp->if_data, ifp->if_bytes, fa);
                 return -EFSCORRUPTED;
         }

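Because the shortform verifiers now take a (header, size) pair instead of an
inode, they can be pointed at any copy of the fork contents. A sketch of
verifying straight out of an ondisk inode buffer; the XFS_DFORK_DPTR() /
XFS_DFORK_DSIZE() source here is an assumption for illustration, not a call
site from this series, and "mp"/"dip" are presumed in scope:

    struct xfs_dir2_sf_hdr *sfp = XFS_DFORK_DPTR(dip);
    xfs_failaddr_t fa;

    /* Validate the ondisk fork region without an incore inode. */
    fa = xfs_dir2_sf_verify(mp, sfp, XFS_DFORK_DSIZE(dip, mp));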

@@ -13,14 +13,12 @@ struct xfs_dinode;
  * File incore extent information, present for each of data & attr forks.
  */
 struct xfs_ifork {
-        int64_t if_bytes;               /* bytes in if_u1 */
+        int64_t if_bytes;               /* bytes in if_data */
         struct xfs_btree_block *if_broot; /* file's incore btree root */
         unsigned int if_seq;            /* fork mod counter */
         int if_height;                  /* height of the extent tree */
-        union {
-                void *if_root;          /* extent tree root */
-                char *if_data;          /* inline file data */
-        } if_u1;
+        void *if_data;                  /* extent tree root or
+                                           inline data */
         xfs_extnum_t if_nextents;       /* # of extents in this fork */
         short if_broot_bytes;           /* bytes allocated for root */
         int8_t if_format;               /* format of this fork */
@@ -170,7 +168,7 @@ int xfs_iformat_attr_fork(struct xfs_inode *, struct xfs_dinode *);
 void xfs_iflush_fork(struct xfs_inode *, struct xfs_dinode *,
                 struct xfs_inode_log_item *, int);
 void xfs_idestroy_fork(struct xfs_ifork *ifp);
-void xfs_idata_realloc(struct xfs_inode *ip, int64_t byte_diff,
+void *xfs_idata_realloc(struct xfs_inode *ip, int64_t byte_diff,
                 int whichfork);
 void xfs_iroot_realloc(struct xfs_inode *, int, int);
 int xfs_iread_extents(struct xfs_trans *, struct xfs_inode *, int);
@@ -180,6 +178,9 @@ void xfs_init_local_fork(struct xfs_inode *ip, int whichfork,
                 const void *data, int64_t size);

 xfs_extnum_t xfs_iext_count(struct xfs_ifork *ifp);
+void xfs_iext_insert_raw(struct xfs_ifork *ifp,
+                struct xfs_iext_cursor *cur,
+                struct xfs_bmbt_irec *irec);
 void xfs_iext_insert(struct xfs_inode *, struct xfs_iext_cursor *cur,
                 struct xfs_bmbt_irec *, int);
 void xfs_iext_remove(struct xfs_inode *, struct xfs_iext_cursor *,

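With the if_u1 union gone, the single void *if_data is interpreted according
to if_format, exactly as the xfs_idestroy_fork() switch earlier in this
series spells out. Sketched:

    switch (ifp->if_format) {
    case XFS_DINODE_FMT_LOCAL:
            /* if_data holds the inline dir/symlink/attr data */
            break;
    case XFS_DINODE_FMT_EXTENTS:
    case XFS_DINODE_FMT_BTREE:
            /* if_data is the root of the incore extent tree */
            break;
    }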

@@ -11,6 +11,7 @@
  * define how recovery should work for that type of log item.
  */
 struct xlog_recover_item;
+struct xfs_defer_op_type;

 /* Sorting hat for log items as they're read in. */
 enum xlog_recover_reorder {
@@ -153,4 +154,11 @@ xlog_recover_resv(const struct xfs_trans_res *r)
         return ret;
 }

+struct xfs_defer_pending;
+
+void xlog_recover_intent_item(struct xlog *log, struct xfs_log_item *lip,
+                xfs_lsn_t lsn, const struct xfs_defer_op_type *ops);
+int xlog_recover_finish_intent(struct xfs_trans *tp,
+                struct xfs_defer_pending *dfp);
+
 #endif /* __XFS_LOG_RECOVER_H__ */


@@ -7,16 +7,16 @@
 #define __XFS_ONDISK_H

 #define XFS_CHECK_STRUCT_SIZE(structname, size) \
-        BUILD_BUG_ON_MSG(sizeof(structname) != (size), "XFS: sizeof(" \
-                #structname ") is wrong, expected " #size)
+        static_assert(sizeof(structname) == (size), \
+                "XFS: sizeof(" #structname ") is wrong, expected " #size)

 #define XFS_CHECK_OFFSET(structname, member, off) \
-        BUILD_BUG_ON_MSG(offsetof(structname, member) != (off), \
+        static_assert(offsetof(structname, member) == (off), \
                 "XFS: offsetof(" #structname ", " #member ") is wrong, " \
                 "expected " #off)

 #define XFS_CHECK_VALUE(value, expected) \
-        BUILD_BUG_ON_MSG((value) != (expected), \
+        static_assert((value) == (expected), \
                 "XFS: value of " #value " is wrong, expected " #expected)

 static inline void __init
@@ -93,13 +93,13 @@ xfs_check_ondisk_structs(void)
         XFS_CHECK_OFFSET(xfs_attr_leaf_name_remote_t, namelen, 8);
         XFS_CHECK_OFFSET(xfs_attr_leaf_name_remote_t, name, 9);
         XFS_CHECK_STRUCT_SIZE(xfs_attr_leafblock_t, 32);
-        XFS_CHECK_STRUCT_SIZE(struct xfs_attr_shortform, 4);
-        XFS_CHECK_OFFSET(struct xfs_attr_shortform, hdr.totsize, 0);
-        XFS_CHECK_OFFSET(struct xfs_attr_shortform, hdr.count, 2);
-        XFS_CHECK_OFFSET(struct xfs_attr_shortform, list[0].namelen, 4);
-        XFS_CHECK_OFFSET(struct xfs_attr_shortform, list[0].valuelen, 5);
-        XFS_CHECK_OFFSET(struct xfs_attr_shortform, list[0].flags, 6);
-        XFS_CHECK_OFFSET(struct xfs_attr_shortform, list[0].nameval, 7);
+        XFS_CHECK_STRUCT_SIZE(struct xfs_attr_sf_hdr, 4);
+        XFS_CHECK_OFFSET(struct xfs_attr_sf_hdr, totsize, 0);
+        XFS_CHECK_OFFSET(struct xfs_attr_sf_hdr, count, 2);
+        XFS_CHECK_OFFSET(struct xfs_attr_sf_entry, namelen, 0);
+        XFS_CHECK_OFFSET(struct xfs_attr_sf_entry, valuelen, 1);
+        XFS_CHECK_OFFSET(struct xfs_attr_sf_entry, flags, 2);
+        XFS_CHECK_OFFSET(struct xfs_attr_sf_entry, nameval, 3);
         XFS_CHECK_STRUCT_SIZE(xfs_da_blkinfo_t, 12);
         XFS_CHECK_STRUCT_SIZE(xfs_da_intnode_t, 16);
         XFS_CHECK_STRUCT_SIZE(xfs_da_node_entry_t, 8);

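Since static_assert() works at file scope in both the kernel and userspace
builds, these checks no longer have to live inside a function that each build
must call. For example, after this change:

    XFS_CHECK_STRUCT_SIZE(struct xfs_attr_sf_hdr, 4);

    /* expands to roughly: */
    static_assert(sizeof(struct xfs_attr_sf_hdr) == (4),
            "XFS: sizeof(struct xfs_attr_sf_hdr) is wrong, expected 4");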

@@ -123,11 +123,9 @@ xfs_refcount_btrec_to_irec(
 /* Simple checks for refcount records. */
 xfs_failaddr_t
 xfs_refcount_check_irec(
-        struct xfs_btree_cur *cur,
+        struct xfs_perag *pag,
         const struct xfs_refcount_irec *irec)
 {
-        struct xfs_perag *pag = cur->bc_ag.pag;
-
         if (irec->rc_blockcount == 0 || irec->rc_blockcount > MAXREFCEXTLEN)
                 return __this_address;

@@ -179,7 +177,7 @@ xfs_refcount_get_rec(
                 return error;

         xfs_refcount_btrec_to_irec(rec, irec);
-        fa = xfs_refcount_check_irec(cur, irec);
+        fa = xfs_refcount_check_irec(cur->bc_ag.pag, irec);
         if (fa)
                 return xfs_refcount_complain_bad_rec(cur, fa, irec);

@@ -1153,7 +1151,7 @@ xfs_refcount_adjust_extents(
                                         tmp.rc_startblock);
                         error = xfs_free_extent_later(cur->bc_tp, fsbno,
                                         tmp.rc_blockcount, NULL,
-                                        XFS_AG_RESV_NONE);
+                                        XFS_AG_RESV_NONE, false);
                         if (error)
                                 goto out_error;
                 }
@@ -1215,7 +1213,7 @@ xfs_refcount_adjust_extents(
                                         ext.rc_startblock);
                         error = xfs_free_extent_later(cur->bc_tp, fsbno,
                                         ext.rc_blockcount, NULL,
-                                        XFS_AG_RESV_NONE);
+                                        XFS_AG_RESV_NONE, false);
                         if (error)
                                 goto out_error;
                 }
@@ -1458,7 +1456,7 @@ __xfs_refcount_add(
         ri->ri_blockcount = blockcount;

         xfs_refcount_update_get_group(tp->t_mountp, ri);
-        xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_REFCOUNT, &ri->ri_list);
+        xfs_defer_add(tp, &ri->ri_list, &xfs_refcount_update_defer_type);
 }

 /*
@@ -1899,7 +1897,7 @@ xfs_refcount_recover_extent(
         INIT_LIST_HEAD(&rr->rr_list);
         xfs_refcount_btrec_to_irec(rec, &rr->rr_rrec);

-        if (xfs_refcount_check_irec(cur, &rr->rr_rrec) != NULL ||
+        if (xfs_refcount_check_irec(cur->bc_ag.pag, &rr->rr_rrec) != NULL ||
             XFS_IS_CORRUPT(cur->bc_mp,
                            rr->rr_rrec.rc_domain != XFS_REFC_DOMAIN_COW)) {
                 kfree(rr);
@@ -1985,7 +1983,7 @@ xfs_refcount_recover_cow_leftovers(
                 /* Free the block. */
                 error = xfs_free_extent_later(tp, fsb,
                                 rr->rr_rrec.rc_blockcount, NULL,
-                                XFS_AG_RESV_NONE);
+                                XFS_AG_RESV_NONE, false);
                 if (error)
                         goto out_trans;

@@ -2033,6 +2031,47 @@ xfs_refcount_has_records(
         return xfs_btree_has_records(cur, &low, &high, NULL, outcome);
 }

+struct xfs_refcount_query_range_info {
+        xfs_refcount_query_range_fn fn;
+        void *priv;
+};
+
+/* Format btree record and pass to our callback. */
+STATIC int
+xfs_refcount_query_range_helper(
+        struct xfs_btree_cur *cur,
+        const union xfs_btree_rec *rec,
+        void *priv)
+{
+        struct xfs_refcount_query_range_info *query = priv;
+        struct xfs_refcount_irec irec;
+        xfs_failaddr_t fa;
+
+        xfs_refcount_btrec_to_irec(rec, &irec);
+        fa = xfs_refcount_check_irec(cur->bc_ag.pag, &irec);
+        if (fa)
+                return xfs_refcount_complain_bad_rec(cur, fa, &irec);
+
+        return query->fn(cur, &irec, query->priv);
+}
+
+/* Find all refcount records between two keys. */
+int
+xfs_refcount_query_range(
+        struct xfs_btree_cur *cur,
+        const struct xfs_refcount_irec *low_rec,
+        const struct xfs_refcount_irec *high_rec,
+        xfs_refcount_query_range_fn fn,
+        void *priv)
+{
+        union xfs_btree_irec low_brec = { .rc = *low_rec };
+        union xfs_btree_irec high_brec = { .rc = *high_rec };
+        struct xfs_refcount_query_range_info query = { .priv = priv, .fn = fn };
+
+        return xfs_btree_query_range(cur, &low_brec, &high_brec,
+                        xfs_refcount_query_range_helper, &query);
+}
+
 int __init
 xfs_refcount_intent_init_cache(void)
 {

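A hedged usage sketch for the new iterator; the callback and its bookkeeping
are invented for illustration, only the iterator API itself comes from this
patch:

    /* Count how many blocks are shared (refcount > 1). */
    STATIC int
    count_shared_helper(
            struct xfs_btree_cur *cur,
            const struct xfs_refcount_irec *rec,
            void *priv)
    {
            unsigned long long *blocks = priv;

            if (rec->rc_refcount > 1)
                    *blocks += rec->rc_blockcount;
            return 0;
    }

    /* ...then, with a refcount btree cursor in hand: */
    struct xfs_refcount_irec low = { .rc_startblock = 0 };
    struct xfs_refcount_irec high = { .rc_startblock = NULLAGBLOCK };
    unsigned long long shared = 0;

    error = xfs_refcount_query_range(cur, &low, &high,
                    count_shared_helper, &shared);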

@@ -117,7 +117,7 @@ extern int xfs_refcount_has_records(struct xfs_btree_cur *cur,
 union xfs_btree_rec;
 extern void xfs_refcount_btrec_to_irec(const union xfs_btree_rec *rec,
                 struct xfs_refcount_irec *irec);
-xfs_failaddr_t xfs_refcount_check_irec(struct xfs_btree_cur *cur,
+xfs_failaddr_t xfs_refcount_check_irec(struct xfs_perag *pag,
                 const struct xfs_refcount_irec *irec);
 extern int xfs_refcount_insert(struct xfs_btree_cur *cur,
                 struct xfs_refcount_irec *irec, int *stat);
@@ -127,4 +127,14 @@ extern struct kmem_cache *xfs_refcount_intent_cache;
 int __init xfs_refcount_intent_init_cache(void);
 void xfs_refcount_intent_destroy_cache(void);

+typedef int (*xfs_refcount_query_range_fn)(
+        struct xfs_btree_cur *cur,
+        const struct xfs_refcount_irec *rec,
+        void *priv);
+
+int xfs_refcount_query_range(struct xfs_btree_cur *cur,
+                const struct xfs_refcount_irec *low_rec,
+                const struct xfs_refcount_irec *high_rec,
+                xfs_refcount_query_range_fn fn, void *priv);
+
 #endif /* __XFS_REFCOUNT_H__ */


@@ -112,7 +112,7 @@ xfs_refcountbt_free_block(
         be32_add_cpu(&agf->agf_refcount_blocks, -1);
         xfs_alloc_log_agf(cur->bc_tp, agbp, XFS_AGF_REFCOUNT_BLOCKS);
         return xfs_free_extent_later(cur->bc_tp, fsbno, 1,
-                        &XFS_RMAP_OINFO_REFC, XFS_AG_RESV_METADATA);
+                        &XFS_RMAP_OINFO_REFC, XFS_AG_RESV_METADATA, false);
 }

 STATIC int
@@ -226,7 +226,18 @@ xfs_refcountbt_verify(
         level = be16_to_cpu(block->bb_level);
         if (pag && xfs_perag_initialised_agf(pag)) {
-                if (level >= pag->pagf_refcount_level)
+                unsigned int maxlevel = pag->pagf_refcount_level;
+
+#ifdef CONFIG_XFS_ONLINE_REPAIR
+                /*
+                 * Online repair could be rewriting the refcount btree, so
+                 * we'll validate against the larger of either tree while this
+                 * is going on.
+                 */
+                maxlevel = max_t(unsigned int, maxlevel,
+                                pag->pagf_repair_refcount_level);
+#endif
+
+                if (level >= maxlevel)
                         return __this_address;
         } else if (level >= mp->m_refc_maxlevels)
                 return __this_address;


@@ -2567,7 +2567,7 @@ __xfs_rmap_add(
         ri->ri_bmap = *bmap;

         xfs_rmap_update_get_group(tp->t_mountp, ri);
-        xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_RMAP, &ri->ri_list);
+        xfs_defer_add(tp, &ri->ri_list, &xfs_rmap_update_defer_type);
 }

 /* Map an extent into a file. */


@@ -184,7 +184,7 @@ xfs_rtfind_back(
          * Calculate first (leftmost) bit number to look at,
          * and mask for all the relevant bits in this word.
          */
-        firstbit = XFS_RTMAX((xfs_srtblock_t)(bit - len + 1), 0);
+        firstbit = max_t(xfs_srtblock_t, bit - len + 1, 0);
         mask = (((xfs_rtword_t)1 << (bit - firstbit + 1)) - 1) <<
                 firstbit;
         /*
@@ -195,7 +195,7 @@ xfs_rtfind_back(
                 /*
                  * Different. Mark where we are and return.
                  */
-                i = bit - XFS_RTHIBIT(wdiff);
+                i = bit - xfs_highbit32(wdiff);
                 *rtx = start - i + 1;
                 return 0;
         }
@@ -233,7 +233,7 @@ xfs_rtfind_back(
                         /*
                          * Different, mark where we are and return.
                          */
-                        i += XFS_NBWORD - 1 - XFS_RTHIBIT(wdiff);
+                        i += XFS_NBWORD - 1 - xfs_highbit32(wdiff);
                         *rtx = start - i + 1;
                         return 0;
                 }
@@ -272,7 +272,7 @@ xfs_rtfind_back(
                         /*
                          * Different, mark where we are and return.
                          */
-                        i += XFS_NBWORD - 1 - XFS_RTHIBIT(wdiff);
+                        i += XFS_NBWORD - 1 - xfs_highbit32(wdiff);
                         *rtx = start - i + 1;
                         return 0;
                 } else
@@ -338,7 +338,7 @@ xfs_rtfind_forw(
          * Calculate last (rightmost) bit number to look at,
          * and mask for all the relevant bits in this word.
          */
-        lastbit = XFS_RTMIN(bit + len, XFS_NBWORD);
+        lastbit = min(bit + len, XFS_NBWORD);
         mask = (((xfs_rtword_t)1 << (lastbit - bit)) - 1) << bit;
         /*
          * Calculate the difference between the value there
@@ -348,7 +348,7 @@ xfs_rtfind_forw(
                 /*
                  * Different. Mark where we are and return.
                  */
-                i = XFS_RTLOBIT(wdiff) - bit;
+                i = xfs_lowbit32(wdiff) - bit;
                 *rtx = start + i - 1;
                 return 0;
         }
@@ -386,7 +386,7 @@ xfs_rtfind_forw(
                         /*
                          * Different, mark where we are and return.
                          */
-                        i += XFS_RTLOBIT(wdiff);
+                        i += xfs_lowbit32(wdiff);
                         *rtx = start + i - 1;
                         return 0;
                 }
@@ -423,7 +423,7 @@ xfs_rtfind_forw(
                         /*
                          * Different, mark where we are and return.
                          */
-                        i += XFS_RTLOBIT(wdiff);
+                        i += xfs_lowbit32(wdiff);
                         *rtx = start + i - 1;
                         return 0;
                 } else
@@ -452,63 +452,9 @@ xfs_trans_log_rtsummary(
 }

 /*
- * Read and/or modify the summary information for a given extent size,
- * bitmap block combination.
- * Keeps track of a current summary block, so we don't keep reading
- * it from the buffer cache.
- *
- * Summary information is returned in *sum if specified.
- * If no delta is specified, returns summary only.
+ * Modify the summary information for a given extent size, bitmap block
+ * combination.
  */
-int
-xfs_rtmodify_summary_int(
-        struct xfs_rtalloc_args *args,
-        int log,                /* log2 of extent size */
-        xfs_fileoff_t bbno,     /* bitmap block number */
-        int delta,              /* change to make to summary info */
-        xfs_suminfo_t *sum)     /* out: summary info for this block */
-{
-        struct xfs_mount *mp = args->mp;
-        int error;
-        xfs_fileoff_t sb;       /* summary fsblock */
-        xfs_rtsumoff_t so;      /* index into the summary file */
-        unsigned int infoword;
-
-        /*
-         * Compute entry number in the summary file.
-         */
-        so = xfs_rtsumoffs(mp, log, bbno);
-        /*
-         * Compute the block number in the summary file.
-         */
-        sb = xfs_rtsumoffs_to_block(mp, so);
-
-        error = xfs_rtsummary_read_buf(args, sb);
-        if (error)
-                return error;
-
-        /*
-         * Point to the summary information, modify/log it, and/or copy it out.
-         */
-        infoword = xfs_rtsumoffs_to_infoword(mp, so);
-        if (delta) {
-                xfs_suminfo_t val = xfs_suminfo_add(args, infoword, delta);
-
-                if (mp->m_rsum_cache) {
-                        if (val == 0 && log + 1 == mp->m_rsum_cache[bbno])
-                                mp->m_rsum_cache[bbno] = log;
-                        if (val != 0 && log >= mp->m_rsum_cache[bbno])
-                                mp->m_rsum_cache[bbno] = log + 1;
-                }
-                xfs_trans_log_rtsummary(args, infoword);
-                if (sum)
-                        *sum = val;
-        } else if (sum) {
-                *sum = xfs_suminfo_get(args, infoword);
-        }
-        return 0;
-}
-
 int
 xfs_rtmodify_summary(
         struct xfs_rtalloc_args *args,
@@ -516,7 +462,49 @@ xfs_rtmodify_summary(
         xfs_fileoff_t bbno,     /* bitmap block number */
         int delta)              /* in/out: summary block number */
 {
-        return xfs_rtmodify_summary_int(args, log, bbno, delta, NULL);
+        struct xfs_mount *mp = args->mp;
+        xfs_rtsumoff_t so = xfs_rtsumoffs(mp, log, bbno);
+        unsigned int infoword;
+        xfs_suminfo_t val;
+        int error;
+
+        error = xfs_rtsummary_read_buf(args, xfs_rtsumoffs_to_block(mp, so));
+        if (error)
+                return error;
+
+        infoword = xfs_rtsumoffs_to_infoword(mp, so);
+        val = xfs_suminfo_add(args, infoword, delta);
+
+        if (mp->m_rsum_cache) {
+                if (val == 0 && log + 1 == mp->m_rsum_cache[bbno])
+                        mp->m_rsum_cache[bbno] = log;
+                if (val != 0 && log >= mp->m_rsum_cache[bbno])
+                        mp->m_rsum_cache[bbno] = log + 1;
+        }
+
+        xfs_trans_log_rtsummary(args, infoword);
+        return 0;
+}
+
+/*
+ * Read and return the summary information for a given extent size, bitmap
+ * block combination.
+ */
+int
+xfs_rtget_summary(
+        struct xfs_rtalloc_args *args,
+        int log,                /* log2 of extent size */
+        xfs_fileoff_t bbno,     /* bitmap block number */
+        xfs_suminfo_t *sum)     /* out: summary info for this block */
+{
+        struct xfs_mount *mp = args->mp;
+        xfs_rtsumoff_t so = xfs_rtsumoffs(mp, log, bbno);
+        int error;
+
+        error = xfs_rtsummary_read_buf(args, xfs_rtsumoffs_to_block(mp, so));
+        if (!error)
+                *sum = xfs_suminfo_get(args, xfs_rtsumoffs_to_infoword(mp, so));
+        return error;
 }

 /* Log rtbitmap block from the word @from to the byte before @next. */
@@ -585,7 +573,7 @@ xfs_rtmodify_range(
         /*
          * Compute first bit not changed and mask of relevant bits.
          */
-        lastbit = XFS_RTMIN(bit + len, XFS_NBWORD);
+        lastbit = min(bit + len, XFS_NBWORD);
         mask = (((xfs_rtword_t)1 << (lastbit - bit)) - 1) << bit;
         /*
          * Set/clear the active bits.
@@ -720,7 +708,7 @@ xfs_rtfree_range(
          */
         if (preblock < start) {
                 error = xfs_rtmodify_summary(args,
-                                XFS_RTBLOCKLOG(start - preblock),
+                                xfs_highbit64(start - preblock),
                                 xfs_rtx_to_rbmblock(mp, preblock), -1);
                 if (error) {
                         return error;
@@ -732,7 +720,7 @@ xfs_rtfree_range(
          */
         if (postblock > end) {
                 error = xfs_rtmodify_summary(args,
-                                XFS_RTBLOCKLOG(postblock - end),
+                                xfs_highbit64(postblock - end),
                                 xfs_rtx_to_rbmblock(mp, end + 1), -1);
                 if (error) {
                         return error;
@@ -743,7 +731,7 @@ xfs_rtfree_range(
          * (new) free extent.
          */
         return xfs_rtmodify_summary(args,
-                        XFS_RTBLOCKLOG(postblock + 1 - preblock),
+                        xfs_highbit64(postblock + 1 - preblock),
                         xfs_rtx_to_rbmblock(mp, preblock), 1);
 }

@@ -799,7 +787,7 @@ xfs_rtcheck_range(
         /*
          * Compute first bit not examined.
          */
-        lastbit = XFS_RTMIN(bit + len, XFS_NBWORD);
+        lastbit = min(bit + len, XFS_NBWORD);
         /*
          * Mask of relevant bits.
          */
@@ -812,7 +800,7 @@ xfs_rtcheck_range(
                 /*
                  * Different, compute first wrong bit and return.
                  */
-                i = XFS_RTLOBIT(wdiff) - bit;
+                i = xfs_lowbit32(wdiff) - bit;
                 *new = start + i;
                 *stat = 0;
                 return 0;
@@ -851,7 +839,7 @@ xfs_rtcheck_range(
                         /*
                          * Different, compute first wrong bit and return.
                          */
-                        i += XFS_RTLOBIT(wdiff);
+                        i += xfs_lowbit32(wdiff);
                         *new = start + i;
                         *stat = 0;
                         return 0;
@@ -889,7 +877,7 @@ xfs_rtcheck_range(
                         /*
                          * Different, compute first wrong bit and return.
                          */
-                        i += XFS_RTLOBIT(wdiff);
+                        i += xfs_lowbit32(wdiff);
                         *new = start + i;
                         *stat = 0;
                         return 0;
@@ -1130,6 +1118,20 @@ xfs_rtbitmap_blockcount(
         return howmany_64(rtextents, NBBY * mp->m_sb.sb_blocksize);
 }

+/*
+ * Compute the maximum level number of the realtime summary file, as defined by
+ * mkfs. The historic use of highbit32 on a 64-bit quantity prohibited correct
+ * use of rt volumes with more than 2^32 extents.
+ */
+uint8_t
+xfs_compute_rextslog(
+        xfs_rtbxlen_t rtextents)
+{
+        if (!rtextents)
+                return 0;
+        return xfs_highbit64(rtextents);
+}
+
 /*
  * Compute the number of rtbitmap words needed to populate every block of a
  * bitmap that is large enough to track the given number of rt extents.

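The old _int helper multiplexed read and modify behind its delta/sum
arguments; it is now two single-purpose functions, so the read-only query no
longer drags transaction-logging logic along. Caller-side sketch, assuming an
initialized xfs_rtalloc_args and sensible log/bbno values:

    xfs_suminfo_t sum;
    int error;

    /* Pure read: how many free extents in this size class? */
    error = xfs_rtget_summary(args, log, bbno, &sum);

    /* Read-modify-log: account one more free extent. */
    error = xfs_rtmodify_summary(args, log, bbno, 1);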

@@ -321,8 +321,8 @@ int xfs_rtfind_forw(struct xfs_rtalloc_args *args, xfs_rtxnum_t start,
                 xfs_rtxnum_t limit, xfs_rtxnum_t *rtblock);
 int xfs_rtmodify_range(struct xfs_rtalloc_args *args, xfs_rtxnum_t start,
                 xfs_rtxlen_t len, int val);
-int xfs_rtmodify_summary_int(struct xfs_rtalloc_args *args, int log,
-                xfs_fileoff_t bbno, int delta, xfs_suminfo_t *sum);
+int xfs_rtget_summary(struct xfs_rtalloc_args *args, int log,
+                xfs_fileoff_t bbno, xfs_suminfo_t *sum);
 int xfs_rtmodify_summary(struct xfs_rtalloc_args *args, int log,
                 xfs_fileoff_t bbno, int delta);
 int xfs_rtfree_range(struct xfs_rtalloc_args *args, xfs_rtxnum_t start,
@@ -351,6 +351,20 @@ xfs_rtfree_extent(
 int xfs_rtfree_blocks(struct xfs_trans *tp, xfs_fsblock_t rtbno,
                 xfs_filblks_t rtlen);

+uint8_t xfs_compute_rextslog(xfs_rtbxlen_t rtextents);
+
+/* Do we support an rt volume having this number of rtextents? */
+static inline bool
+xfs_validate_rtextents(
+        xfs_rtbxlen_t rtextents)
+{
+        /* No runt rt volumes */
+        if (rtextents == 0)
+                return false;
+
+        return true;
+}
+
 xfs_filblks_t xfs_rtbitmap_blockcount(struct xfs_mount *mp, xfs_rtbxlen_t
                 rtextents);
 unsigned long long xfs_rtbitmap_wordcount(struct xfs_mount *mp,
@@ -369,6 +383,8 @@ unsigned long long xfs_rtsummary_wordcount(struct xfs_mount *mp,
 # define xfs_rtsummary_read_buf(a,b) (-ENOSYS)
 # define xfs_rtbuf_cache_relse(a) (0)
 # define xfs_rtalloc_extent_is_free(m,t,s,l,i) (-ENOSYS)
+# define xfs_compute_rextslog(rtx) (0)
+# define xfs_validate_rtextents(rtx) (false)
 static inline xfs_filblks_t
 xfs_rtbitmap_blockcount(struct xfs_mount *mp, xfs_rtbxlen_t rtextents)
 {


@@ -25,6 +25,7 @@
 #include "xfs_da_format.h"
 #include "xfs_health.h"
 #include "xfs_ag.h"
+#include "xfs_rtbitmap.h"
 
 /*
  * Physical superblock buffer manipulations. Shared with libxfs in userspace.
@@ -508,8 +509,9 @@ xfs_validate_sb_common(
 		rbmblocks = howmany_64(sbp->sb_rextents,
 				       NBBY * sbp->sb_blocksize);
 
-		if (sbp->sb_rextents != rexts ||
-		    sbp->sb_rextslog != xfs_highbit32(sbp->sb_rextents) ||
+		if (!xfs_validate_rtextents(rexts) ||
+		    sbp->sb_rextents != rexts ||
+		    sbp->sb_rextslog != xfs_compute_rextslog(rexts) ||
 		    sbp->sb_rbmblocks != rbmblocks) {
 			xfs_notice(mp,
 "realtime geometry sanity check failed");


@@ -139,7 +139,7 @@ bool xfs_symlink_hdr_ok(xfs_ino_t ino, uint32_t offset,
 			uint32_t size, struct xfs_buf *bp);
 void xfs_symlink_local_to_remote(struct xfs_trans *tp, struct xfs_buf *bp,
 				 struct xfs_inode *ip, struct xfs_ifork *ifp);
-xfs_failaddr_t xfs_symlink_shortform_verify(struct xfs_inode *ip);
+xfs_failaddr_t xfs_symlink_shortform_verify(void *sfp, int64_t size);
 
 /* Computed inode geometry for the filesystem. */
 struct xfs_ino_geometry {


@@ -175,7 +175,7 @@ xfs_symlink_local_to_remote(
 
 	if (!xfs_has_crc(mp)) {
 		bp->b_ops = NULL;
-		memcpy(bp->b_addr, ifp->if_u1.if_data, ifp->if_bytes);
+		memcpy(bp->b_addr, ifp->if_data, ifp->if_bytes);
 		xfs_trans_log_buf(tp, bp, 0, ifp->if_bytes - 1);
 		return;
 	}
@@ -191,7 +191,7 @@ xfs_symlink_local_to_remote(
 	buf = bp->b_addr;
 	buf += xfs_symlink_hdr_set(mp, ip->i_ino, 0, ifp->if_bytes, bp);
-	memcpy(buf, ifp->if_u1.if_data, ifp->if_bytes);
+	memcpy(buf, ifp->if_data, ifp->if_bytes);
 	xfs_trans_log_buf(tp, bp, 0, sizeof(struct xfs_dsymlink_hdr) +
 					ifp->if_bytes - 1);
 }
@@ -202,15 +202,11 @@ xfs_symlink_local_to_remote(
  */
 xfs_failaddr_t
 xfs_symlink_shortform_verify(
-	struct xfs_inode	*ip)
+	void			*sfp,
+	int64_t			size)
 {
-	struct xfs_ifork	*ifp = xfs_ifork_ptr(ip, XFS_DATA_FORK);
-	char			*sfp = (char *)ifp->if_u1.if_data;
-	int			size = ifp->if_bytes;
 	char			*endp = sfp + size;
 
-	ASSERT(ifp->if_format == XFS_DINODE_FMT_LOCAL);
-
 	/*
 	 * Zero length symlinks should never occur in memory as they are
 	 * never allowed to exist on disk.


@@ -51,7 +51,6 @@ typedef void *	xfs_failaddr_t;
 #define	NULLRFSBLOCK	((xfs_rfsblock_t)-1)
 #define	NULLRTBLOCK	((xfs_rtblock_t)-1)
 #define	NULLFILEOFF	((xfs_fileoff_t)-1)
-#define	NULLRTEXTNO	((xfs_rtxnum_t)-1)
 
 #define NULLAGBLOCK	((xfs_agblock_t)-1)
 #define NULLAGNUMBER	((xfs_agnumber_t)-1)
@@ -208,6 +207,13 @@ enum xfs_ag_resv_type {
 	XFS_AG_RESV_AGFL,
 	XFS_AG_RESV_METADATA,
 	XFS_AG_RESV_RMAPBT,
+
+	/*
+	 * Don't increase fdblocks when freeing extent. This is a pony for
+	 * the bnobt repair functions to re-free the free space without
+	 * altering fdblocks. If you think you need this you're wrong.
+	 */
+	XFS_AG_RESV_IGNORE,
 };
 
 /* Results of scanning a btree keyspace to check occupancy. */

fs/xfs/scrub/agb_bitmap.c (new file, 103 lines)

@@ -0,0 +1,103 @@
// SPDX-License-Identifier: GPL-2.0-or-later
/*
* Copyright (C) 2018-2023 Oracle. All Rights Reserved.
* Author: Darrick J. Wong <djwong@kernel.org>
*/
#include "xfs.h"
#include "xfs_shared.h"
#include "xfs_bit.h"
#include "xfs_format.h"
#include "xfs_trans_resv.h"
#include "xfs_mount.h"
#include "xfs_btree.h"
#include "bitmap.h"
#include "scrub/agb_bitmap.h"
/*
* Record all btree blocks seen while iterating all records of a btree.
*
* We know that the btree query_all function starts at the left edge and walks
* towards the right edge of the tree. Therefore, we know that we can walk up
* the btree cursor towards the root; if the pointer for a given level points
* to the first record/key in that block, we haven't seen this block before;
* and therefore we need to remember that we saw this block in the btree.
*
* So if our btree is:
*
* 4
* / | \
* 1 2 3
*
* Pretend for this example that each leaf block has 100 btree records. For
* the first btree record, we'll observe that bc_levels[0].ptr == 1, so we
* record that we saw block 1. Then we observe that bc_levels[1].ptr == 1, so
* we record block 4. The list is [1, 4].
*
* For the second btree record, we see that bc_levels[0].ptr == 2, so we exit
* the loop. The list remains [1, 4].
*
* For the 101st btree record, we've moved onto leaf block 2. Now
* bc_levels[0].ptr == 1 again, so we record that we saw block 2. We see that
* bc_levels[1].ptr == 2, so we exit the loop. The list is now [1, 4, 2].
*
* For the 102nd record, bc_levels[0].ptr == 2, so we continue.
*
* For the 201st record, we've moved on to leaf block 3.
* bc_levels[0].ptr == 1, so we add 3 to the list. Now it is [1, 4, 2, 3].
*
* For the 300th record we just exit, with the list being [1, 4, 2, 3].
*/
/* Mark a btree block to the agblock bitmap. */
STATIC int
xagb_bitmap_visit_btblock(
struct xfs_btree_cur *cur,
int level,
void *priv)
{
struct xagb_bitmap *bitmap = priv;
struct xfs_buf *bp;
xfs_fsblock_t fsbno;
xfs_agblock_t agbno;
xfs_btree_get_block(cur, level, &bp);
if (!bp)
return 0;
fsbno = XFS_DADDR_TO_FSB(cur->bc_mp, xfs_buf_daddr(bp));
agbno = XFS_FSB_TO_AGBNO(cur->bc_mp, fsbno);
return xagb_bitmap_set(bitmap, agbno, 1);
}
/* Mark all (per-AG) btree blocks in the agblock bitmap. */
int
xagb_bitmap_set_btblocks(
struct xagb_bitmap *bitmap,
struct xfs_btree_cur *cur)
{
return xfs_btree_visit_blocks(cur, xagb_bitmap_visit_btblock,
XFS_BTREE_VISIT_ALL, bitmap);
}
/*
* Record all the buffers pointed to by the btree cursor. Callers already
* engaged in a btree walk should call this function to capture the list of
* blocks going from the leaf towards the root.
*/
int
xagb_bitmap_set_btcur_path(
struct xagb_bitmap *bitmap,
struct xfs_btree_cur *cur)
{
int i;
int error;
for (i = 0; i < cur->bc_nlevels && cur->bc_levels[i].ptr == 1; i++) {
error = xagb_bitmap_visit_btblock(cur, i, bitmap);
if (error)
return error;
}
return 0;
}
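The visiting rule described in the block comment at the top of this file can be modeled outside the kernel. The toy program below is illustration only (the tree shape and record counts come from that comment, and the ptr arithmetic is simplified); it applies the rule "walk up while bc_levels[i].ptr == 1, recording one block per level" and prints the blocks in the order [1, 4, 2, 3]:

#include <stdio.h>

/*
 * Toy model of xagb_bitmap_set_btcur_path(): a 2-level btree with three
 * 100-record leaves (1, 2, 3) under one root (4).  For each record we
 * compute bc_levels[].ptr and record a block the first time its ptr is 1,
 * stopping the upward walk at the first level whose ptr is not 1.
 */
int main(void)
{
	int blocks[2][4] = { { 1, 2, 3 }, { 4 } };	/* [level][index] */

	for (int rec = 0; rec < 300; rec++) {
		int ptr[2];

		ptr[0] = rec % 100 + 1;		/* position within the leaf */
		ptr[1] = rec / 100 + 1;		/* position within the root */

		for (int level = 0; level < 2 && ptr[level] == 1; level++)
			printf("record %d: saw block %d\n", rec + 1,
			       blocks[level][level == 0 ? rec / 100 : 0]);
	}
	return 0;
}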

fs/xfs/scrub/agb_bitmap.h (new file, 68 lines)

@@ -0,0 +1,68 @@
// SPDX-License-Identifier: GPL-2.0-or-later
/*
* Copyright (C) 2018-2023 Oracle. All Rights Reserved.
* Author: Darrick J. Wong <djwong@kernel.org>
*/
#ifndef __XFS_SCRUB_AGB_BITMAP_H__
#define __XFS_SCRUB_AGB_BITMAP_H__
/* Bitmaps, but type-checked for xfs_agblock_t */
struct xagb_bitmap {
struct xbitmap32 agbitmap;
};
static inline void xagb_bitmap_init(struct xagb_bitmap *bitmap)
{
xbitmap32_init(&bitmap->agbitmap);
}
static inline void xagb_bitmap_destroy(struct xagb_bitmap *bitmap)
{
xbitmap32_destroy(&bitmap->agbitmap);
}
static inline int xagb_bitmap_clear(struct xagb_bitmap *bitmap,
xfs_agblock_t start, xfs_extlen_t len)
{
return xbitmap32_clear(&bitmap->agbitmap, start, len);
}
static inline int xagb_bitmap_set(struct xagb_bitmap *bitmap,
xfs_agblock_t start, xfs_extlen_t len)
{
return xbitmap32_set(&bitmap->agbitmap, start, len);
}
static inline bool xagb_bitmap_test(struct xagb_bitmap *bitmap,
xfs_agblock_t start, xfs_extlen_t *len)
{
return xbitmap32_test(&bitmap->agbitmap, start, len);
}
static inline int xagb_bitmap_disunion(struct xagb_bitmap *bitmap,
struct xagb_bitmap *sub)
{
return xbitmap32_disunion(&bitmap->agbitmap, &sub->agbitmap);
}
static inline uint32_t xagb_bitmap_hweight(struct xagb_bitmap *bitmap)
{
return xbitmap32_hweight(&bitmap->agbitmap);
}
static inline bool xagb_bitmap_empty(struct xagb_bitmap *bitmap)
{
return xbitmap32_empty(&bitmap->agbitmap);
}
static inline int xagb_bitmap_walk(struct xagb_bitmap *bitmap,
xbitmap32_walk_fn fn, void *priv)
{
return xbitmap32_walk(&bitmap->agbitmap, fn, priv);
}
int xagb_bitmap_set_btblocks(struct xagb_bitmap *bitmap,
struct xfs_btree_cur *cur);
int xagb_bitmap_set_btcur_path(struct xagb_bitmap *bitmap,
struct xfs_btree_cur *cur);
#endif /* __XFS_SCRUB_AGB_BITMAP_H__ */


@@ -26,6 +26,7 @@
 #include "scrub/trace.h"
 #include "scrub/repair.h"
 #include "scrub/bitmap.h"
+#include "scrub/agb_bitmap.h"
 #include "scrub/reap.h"
 
 /* Superblock */
@@ -72,7 +73,7 @@ xrep_superblock(
 	/* Write this to disk. */
 	xfs_trans_buf_set_type(sc->tp, bp, XFS_BLFT_SB_BUF);
 	xfs_trans_log_buf(sc->tp, bp, 0, BBTOB(bp->b_length) - 1);
-	return error;
+	return 0;
 }
 
 /* AGF */
@@ -341,7 +342,7 @@ xrep_agf_commit_new(
 	pag->pagf_refcount_level = be32_to_cpu(agf->agf_refcount_level);
 	set_bit(XFS_AGSTATE_AGF_INIT, &pag->pag_opstate);
 
-	return 0;
+	return xrep_roll_ag_trans(sc);
 }
 
 /* Repair the AGF. v5 filesystems only. */
@@ -494,12 +495,11 @@ xrep_agfl_walk_rmap(
 /* Strike out the blocks that are cross-linked according to the rmapbt. */
 STATIC int
 xrep_agfl_check_extent(
-	uint64_t		start,
-	uint64_t		len,
+	uint32_t		agbno,
+	uint32_t		len,
 	void			*priv)
 {
 	struct xrep_agfl	*ra = priv;
-	xfs_agblock_t		agbno = start;
 	xfs_agblock_t		last_agbno = agbno + len - 1;
 	int			error;
 
@@ -647,8 +647,8 @@ struct xrep_agfl_fill {
 /* Fill the AGFL with whatever blocks are in this extent. */
 static int
 xrep_agfl_fill(
-	uint64_t		start,
-	uint64_t		len,
+	uint32_t		start,
+	uint32_t		len,
 	void			*priv)
 {
 	struct xrep_agfl_fill	*af = priv;
@@ -789,6 +789,9 @@ xrep_agfl(
 	/* Dump any AGFL overflow. */
 	error = xrep_reap_agblocks(sc, &agfl_extents, &XFS_RMAP_OINFO_AG,
 			XFS_AG_RESV_AGFL);
+	if (error)
+		goto err;
+
 err:
 	xagb_bitmap_destroy(&agfl_extents);
 	return error;
@@ -962,7 +965,7 @@ xrep_agi_commit_new(
 	pag->pagi_freecount = be32_to_cpu(agi->agi_freecount);
 	set_bit(XFS_AGSTATE_AGI_INIT, &pag->pag_opstate);
 
-	return 0;
+	return xrep_roll_ag_trans(sc);
 }
 
 /* Repair the AGI. */


@@ -9,13 +9,16 @@
 #include "xfs_format.h"
 #include "xfs_trans_resv.h"
 #include "xfs_mount.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
 #include "xfs_btree.h"
 #include "xfs_alloc.h"
 #include "xfs_rmap.h"
+#include "xfs_ag.h"
 #include "scrub/scrub.h"
 #include "scrub/common.h"
 #include "scrub/btree.h"
-#include "xfs_ag.h"
+#include "scrub/repair.h"
 
 /*
  * Set us up to scrub free space btrees.
@@ -24,10 +27,19 @@ int
 xchk_setup_ag_allocbt(
 	struct xfs_scrub	*sc)
 {
+	int			error;
+
 	if (xchk_need_intent_drain(sc))
 		xchk_fsgates_enable(sc, XCHK_FSGATES_DRAIN);
 
-	return xchk_setup_ag_btree(sc, false);
+	error = xchk_setup_ag_btree(sc, false);
+	if (error)
+		return error;
+
+	if (xchk_could_repair(sc))
+		return xrep_setup_ag_allocbt(sc);
+
+	return 0;
 }
 
 /* Free space btree scrubber. */
@@ -127,7 +139,7 @@ xchk_allocbt_rec(
 	struct xchk_alloc	*ca = bs->private;
 
 	xfs_alloc_btrec_to_irec(rec, &irec);
-	if (xfs_alloc_check_irec(bs->cur, &irec) != NULL) {
+	if (xfs_alloc_check_irec(bs->cur->bc_ag.pag, &irec) != NULL) {
 		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
 		return 0;
 	}
@@ -138,33 +150,29 @@ xchk_allocbt_rec(
 	return 0;
 }
 
-/* Scrub the freespace btrees for some AG. */
-STATIC int
+/* Scrub one of the freespace btrees for some AG. */
+int
 xchk_allocbt(
-	struct xfs_scrub	*sc,
-	xfs_btnum_t		which)
+	struct xfs_scrub	*sc)
 {
 	struct xchk_alloc	ca = { };
 	struct xfs_btree_cur	*cur;
 
-	cur = which == XFS_BTNUM_BNO ? sc->sa.bno_cur : sc->sa.cnt_cur;
+	switch (sc->sm->sm_type) {
+	case XFS_SCRUB_TYPE_BNOBT:
+		cur = sc->sa.bno_cur;
+		break;
+	case XFS_SCRUB_TYPE_CNTBT:
+		cur = sc->sa.cnt_cur;
+		break;
+	default:
+		ASSERT(0);
+		return -EIO;
+	}
+
 	return xchk_btree(sc, cur, xchk_allocbt_rec, &XFS_RMAP_OINFO_AG, &ca);
 }
 
-int
-xchk_bnobt(
-	struct xfs_scrub	*sc)
-{
-	return xchk_allocbt(sc, XFS_BTNUM_BNO);
-}
-
-int
-xchk_cntbt(
-	struct xfs_scrub	*sc)
-{
-	return xchk_allocbt(sc, XFS_BTNUM_CNT);
-}
-
 /* xref check that the extent is not free */
 void
 xchk_xref_is_used_space(

fs/xfs/scrub/alloc_repair.c (new file, 934 lines)

@@ -0,0 +1,934 @@
// SPDX-License-Identifier: GPL-2.0-or-later
/*
* Copyright (C) 2018-2023 Oracle. All Rights Reserved.
* Author: Darrick J. Wong <djwong@kernel.org>
*/
#include "xfs.h"
#include "xfs_fs.h"
#include "xfs_shared.h"
#include "xfs_format.h"
#include "xfs_trans_resv.h"
#include "xfs_mount.h"
#include "xfs_defer.h"
#include "xfs_btree.h"
#include "xfs_btree_staging.h"
#include "xfs_bit.h"
#include "xfs_log_format.h"
#include "xfs_trans.h"
#include "xfs_sb.h"
#include "xfs_alloc.h"
#include "xfs_alloc_btree.h"
#include "xfs_rmap.h"
#include "xfs_rmap_btree.h"
#include "xfs_inode.h"
#include "xfs_refcount.h"
#include "xfs_extent_busy.h"
#include "xfs_health.h"
#include "xfs_bmap.h"
#include "xfs_ialloc.h"
#include "xfs_ag.h"
#include "scrub/xfs_scrub.h"
#include "scrub/scrub.h"
#include "scrub/common.h"
#include "scrub/btree.h"
#include "scrub/trace.h"
#include "scrub/repair.h"
#include "scrub/bitmap.h"
#include "scrub/agb_bitmap.h"
#include "scrub/xfile.h"
#include "scrub/xfarray.h"
#include "scrub/newbt.h"
#include "scrub/reap.h"
/*
* Free Space Btree Repair
* =======================
*
* The reverse mappings are supposed to record all space usage for the entire
* AG. Therefore, we can recreate the free extent records in an AG by looking
* for gaps in the physical extents recorded in the rmapbt. These records are
* staged in @free_records. Identifying the gaps is more difficult on a
* reflink filesystem because rmap records are allowed to overlap.
*
* Because the final step of building a new index is to free the space used by
* the old index, repair needs to find that space. Unfortunately, all
* structures that live in the free space (bnobt, cntbt, rmapbt, agfl) share
* the same rmapbt owner code (OWN_AG), so this is not straightforward.
*
* The scan of the reverse mapping information records the space used by OWN_AG
* in @old_allocbt_blocks, which (at this stage) is somewhat misnamed. While
* walking the rmapbt records, we create a second bitmap @not_allocbt_blocks to
* record all visited rmap btree blocks and all blocks owned by the AGFL.
*
* After that, the definition of old_allocbt_blocks shifts. This
* expression identifies possible former bnobt/cntbt blocks:
*
* (OWN_AG blocks) & ~(rmapbt blocks | agfl blocks);
*
* Substituting from above definitions, that becomes:
*
* old_allocbt_blocks & ~not_allocbt_blocks
*
* The OWN_AG bitmap itself isn't needed after this point, so what we really do
* instead is:
*
* old_allocbt_blocks &= ~not_allocbt_blocks;
*
* After this point, @old_allocbt_blocks is a bitmap of alleged former
* bnobt/cntbt blocks. The xagb_bitmap_disunion operation modifies its first
* parameter in place to avoid copying records around.
*
* Next, some of the space described by @free_records is diverted to the newbt
* reservation and used to format new btree blocks. The remaining records are
* written to the new btree indices. We reconstruct both bnobt and cntbt at
* the same time since we've already done all the work.
*
* We use the prefix 'xrep_abt' here because we regenerate both free space
* allocation btrees at the same time.
*/
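The bitmap algebra above reduces to ordinary set arithmetic. A one-word toy model, with made-up masks in which bit n stands for AG block n:

#include <stdint.h>
#include <stdio.h>

/*
 * Toy model of the bitmap algebra described above, using one machine word
 * instead of an interval tree.
 */
int main(void)
{
	uint32_t own_ag        = 0x00ff;  /* every OWN_AG block from the rmapbt */
	uint32_t rmapbt_blocks = 0x0030;  /* blocks visited during the rmapbt walk */
	uint32_t agfl_blocks   = 0x00c0;  /* blocks listed in the AGFL */

	/* old_allocbt_blocks &= ~not_allocbt_blocks */
	uint32_t not_allocbt = rmapbt_blocks | agfl_blocks;
	uint32_t old_allocbt = own_ag & ~not_allocbt;

	printf("candidate old bnobt/cntbt blocks: 0x%04x\n", old_allocbt);
	return 0;
}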
struct xrep_abt {
/* Blocks owned by the rmapbt or the agfl. */
struct xagb_bitmap not_allocbt_blocks;
/* All OWN_AG blocks. */
struct xagb_bitmap old_allocbt_blocks;
/*
* New bnobt information. All btree block reservations are added to
* the reservation list in new_bnobt.
*/
struct xrep_newbt new_bnobt;
/* new cntbt information */
struct xrep_newbt new_cntbt;
/* Free space extents. */
struct xfarray *free_records;
struct xfs_scrub *sc;
/* Number of non-null records in @free_records. */
uint64_t nr_real_records;
/* get_records()'s position in the free space record array. */
xfarray_idx_t array_cur;
/*
* Next block we anticipate seeing in the rmap records. If the next
* rmap record is greater than next_agbno, we have found unused space.
*/
xfs_agblock_t next_agbno;
/* Number of free blocks in this AG. */
xfs_agblock_t nr_blocks;
/* Longest free extent we found in the AG. */
xfs_agblock_t longest;
};
/* Set up to repair AG free space btrees. */
int
xrep_setup_ag_allocbt(
struct xfs_scrub *sc)
{
unsigned int busy_gen;
/*
* Make sure the busy extent list is clear because we can't put extents
* on there twice.
*/
busy_gen = READ_ONCE(sc->sa.pag->pagb_gen);
if (xfs_extent_busy_list_empty(sc->sa.pag))
return 0;
return xfs_extent_busy_flush(sc->tp, sc->sa.pag, busy_gen, 0);
}
/* Check for any obvious conflicts in the free extent. */
STATIC int
xrep_abt_check_free_ext(
struct xfs_scrub *sc,
const struct xfs_alloc_rec_incore *rec)
{
enum xbtree_recpacking outcome;
int error;
if (xfs_alloc_check_irec(sc->sa.pag, rec) != NULL)
return -EFSCORRUPTED;
/* Must not be an inode chunk. */
error = xfs_ialloc_has_inodes_at_extent(sc->sa.ino_cur,
rec->ar_startblock, rec->ar_blockcount, &outcome);
if (error)
return error;
if (outcome != XBTREE_RECPACKING_EMPTY)
return -EFSCORRUPTED;
/* Must not be shared or CoW staging. */
if (sc->sa.refc_cur) {
error = xfs_refcount_has_records(sc->sa.refc_cur,
XFS_REFC_DOMAIN_SHARED, rec->ar_startblock,
rec->ar_blockcount, &outcome);
if (error)
return error;
if (outcome != XBTREE_RECPACKING_EMPTY)
return -EFSCORRUPTED;
error = xfs_refcount_has_records(sc->sa.refc_cur,
XFS_REFC_DOMAIN_COW, rec->ar_startblock,
rec->ar_blockcount, &outcome);
if (error)
return error;
if (outcome != XBTREE_RECPACKING_EMPTY)
return -EFSCORRUPTED;
}
return 0;
}
/*
* Stash a free space record for all the space since the last bno we found
* all the way up to @end.
*/
static int
xrep_abt_stash(
struct xrep_abt *ra,
xfs_agblock_t end)
{
struct xfs_alloc_rec_incore arec = {
.ar_startblock = ra->next_agbno,
.ar_blockcount = end - ra->next_agbno,
};
struct xfs_scrub *sc = ra->sc;
int error = 0;
if (xchk_should_terminate(sc, &error))
return error;
error = xrep_abt_check_free_ext(ra->sc, &arec);
if (error)
return error;
trace_xrep_abt_found(sc->mp, sc->sa.pag->pag_agno, &arec);
error = xfarray_append(ra->free_records, &arec);
if (error)
return error;
ra->nr_blocks += arec.ar_blockcount;
return 0;
}
/* Record extents that aren't in use from gaps in the rmap records. */
STATIC int
xrep_abt_walk_rmap(
struct xfs_btree_cur *cur,
const struct xfs_rmap_irec *rec,
void *priv)
{
struct xrep_abt *ra = priv;
int error;
/* Record all the OWN_AG blocks... */
if (rec->rm_owner == XFS_RMAP_OWN_AG) {
error = xagb_bitmap_set(&ra->old_allocbt_blocks,
rec->rm_startblock, rec->rm_blockcount);
if (error)
return error;
}
/* ...and all the rmapbt blocks... */
error = xagb_bitmap_set_btcur_path(&ra->not_allocbt_blocks, cur);
if (error)
return error;
/* ...and all the free space. */
if (rec->rm_startblock > ra->next_agbno) {
error = xrep_abt_stash(ra, rec->rm_startblock);
if (error)
return error;
}
/*
* rmap records can overlap on reflink filesystems, so project
* next_agbno as far out into the AG space as we currently know about.
*/
ra->next_agbno = max_t(xfs_agblock_t, ra->next_agbno,
rec->rm_startblock + rec->rm_blockcount);
return 0;
}
/* Collect an AGFL block for the not-to-release list. */
static int
xrep_abt_walk_agfl(
struct xfs_mount *mp,
xfs_agblock_t agbno,
void *priv)
{
struct xrep_abt *ra = priv;
return xagb_bitmap_set(&ra->not_allocbt_blocks, agbno, 1);
}
/*
* Compare two free space extents by block number. We want to sort in order of
* increasing block number.
*/
static int
xrep_bnobt_extent_cmp(
const void *a,
const void *b)
{
const struct xfs_alloc_rec_incore *ap = a;
const struct xfs_alloc_rec_incore *bp = b;
if (ap->ar_startblock > bp->ar_startblock)
return 1;
else if (ap->ar_startblock < bp->ar_startblock)
return -1;
return 0;
}
/*
* Re-sort the free extents by block number so that we can put the records into
* the bnobt in the correct order. Make sure the records do not overlap in
* physical space.
*/
STATIC int
xrep_bnobt_sort_records(
struct xrep_abt *ra)
{
struct xfs_alloc_rec_incore arec;
xfarray_idx_t cur = XFARRAY_CURSOR_INIT;
xfs_agblock_t next_agbno = 0;
int error;
error = xfarray_sort(ra->free_records, xrep_bnobt_extent_cmp, 0);
if (error)
return error;
while ((error = xfarray_iter(ra->free_records, &cur, &arec)) == 1) {
if (arec.ar_startblock < next_agbno)
return -EFSCORRUPTED;
next_agbno = arec.ar_startblock + arec.ar_blockcount;
}
return error;
}
/*
* Compare two free space extents by length and then block number. We want
* to sort first in order of increasing length and then in order of increasing
* block number.
*/
static int
xrep_cntbt_extent_cmp(
const void *a,
const void *b)
{
const struct xfs_alloc_rec_incore *ap = a;
const struct xfs_alloc_rec_incore *bp = b;
if (ap->ar_blockcount > bp->ar_blockcount)
return 1;
else if (ap->ar_blockcount < bp->ar_blockcount)
return -1;
return xrep_bnobt_extent_cmp(a, b);
}
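The same comparator shape works with any three-way-compare sort routine, which is how xfarray_sort() consumes it. A hypothetical userspace analogue using qsort(3) and a simplified record type in place of struct xfs_alloc_rec_incore:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Simplified stand-in for struct xfs_alloc_rec_incore. */
struct rec {
	uint32_t	startblock;
	uint32_t	blockcount;
};

/* Sort by length, then by block number, mirroring xrep_cntbt_extent_cmp(). */
static int cntbt_cmp(const void *a, const void *b)
{
	const struct rec *ap = a, *bp = b;

	if (ap->blockcount != bp->blockcount)
		return ap->blockcount > bp->blockcount ? 1 : -1;
	if (ap->startblock != bp->startblock)
		return ap->startblock > bp->startblock ? 1 : -1;
	return 0;
}

int main(void)
{
	struct rec recs[] = {
		{ .startblock = 90, .blockcount = 4 },
		{ .startblock = 10, .blockcount = 4 },
		{ .startblock = 50, .blockcount = 2 },
	};

	qsort(recs, 3, sizeof(recs[0]), cntbt_cmp);
	for (int i = 0; i < 3; i++)
		printf("start %u len %u\n", recs[i].startblock,
		       recs[i].blockcount);
	return 0;
}

Sorting by length first is what the cntbt wants; the block-number tiebreak keeps equal-length runs in a deterministic order.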
/*
* Sort the free extents by length so that we can put the records into the
* cntbt in the correct order. Don't let userspace kill us if we're resorting
* after allocating btree blocks.
*/
STATIC int
xrep_cntbt_sort_records(
struct xrep_abt *ra,
bool is_resort)
{
return xfarray_sort(ra->free_records, xrep_cntbt_extent_cmp,
is_resort ? 0 : XFARRAY_SORT_KILLABLE);
}
/*
* Iterate all reverse mappings to find (1) the gaps between rmap records (all
* unowned space), (2) the OWN_AG extents (which encompass the free space
* btrees, the rmapbt, and the agfl), (3) the rmapbt blocks, and (4) the AGFL
* blocks. The free space is (1) + (2) - (3) - (4).
*/
STATIC int
xrep_abt_find_freespace(
struct xrep_abt *ra)
{
struct xfs_scrub *sc = ra->sc;
struct xfs_mount *mp = sc->mp;
struct xfs_agf *agf = sc->sa.agf_bp->b_addr;
struct xfs_buf *agfl_bp;
xfs_agblock_t agend;
int error;
xagb_bitmap_init(&ra->not_allocbt_blocks);
xrep_ag_btcur_init(sc, &sc->sa);
/*
* Iterate all the reverse mappings to find gaps in the physical
* mappings, all the OWN_AG blocks, and all the rmapbt extents.
*/
error = xfs_rmap_query_all(sc->sa.rmap_cur, xrep_abt_walk_rmap, ra);
if (error)
goto err;
/* Insert a record for space between the last rmap and EOAG. */
agend = be32_to_cpu(agf->agf_length);
if (ra->next_agbno < agend) {
error = xrep_abt_stash(ra, agend);
if (error)
goto err;
}
/* Collect all the AGFL blocks. */
error = xfs_alloc_read_agfl(sc->sa.pag, sc->tp, &agfl_bp);
if (error)
goto err;
error = xfs_agfl_walk(mp, agf, agfl_bp, xrep_abt_walk_agfl, ra);
if (error)
goto err_agfl;
/* Compute the old bnobt/cntbt blocks. */
error = xagb_bitmap_disunion(&ra->old_allocbt_blocks,
&ra->not_allocbt_blocks);
if (error)
goto err_agfl;
ra->nr_real_records = xfarray_length(ra->free_records);
err_agfl:
xfs_trans_brelse(sc->tp, agfl_bp);
err:
xchk_ag_btcur_free(&sc->sa);
xagb_bitmap_destroy(&ra->not_allocbt_blocks);
return error;
}
/*
* We're going to use the observed free space records to reserve blocks for the
* new free space btrees, so we play an iterative game where we try to converge
* on the number of blocks we need:
*
* 1. Estimate how many blocks we'll need to store the records.
* 2. If the first free record has more blocks than we need, we're done.
* We will have to re-sort the records prior to building the cntbt.
* 3. If that record has exactly the number of blocks we need, null out the
* record. We're done.
* 4. Otherwise, we still need more blocks. Null out the record, subtract its
* length from the number of blocks we need, and go back to step 1.
*
* Fortunately, we don't have to do any transaction work to play this game, so
* we don't have to tear down the staging cursors.
*/
STATIC int
xrep_abt_reserve_space(
struct xrep_abt *ra,
struct xfs_btree_cur *bno_cur,
struct xfs_btree_cur *cnt_cur,
bool *needs_resort)
{
struct xfs_scrub *sc = ra->sc;
xfarray_idx_t record_nr;
unsigned int allocated = 0;
int error = 0;
record_nr = xfarray_length(ra->free_records) - 1;
do {
struct xfs_alloc_rec_incore arec;
uint64_t required;
unsigned int desired;
unsigned int len;
/* Compute how many blocks we'll need. */
error = xfs_btree_bload_compute_geometry(cnt_cur,
&ra->new_cntbt.bload, ra->nr_real_records);
if (error)
break;
error = xfs_btree_bload_compute_geometry(bno_cur,
&ra->new_bnobt.bload, ra->nr_real_records);
if (error)
break;
/* How many btree blocks do we need to store all records? */
required = ra->new_bnobt.bload.nr_blocks +
ra->new_cntbt.bload.nr_blocks;
ASSERT(required < INT_MAX);
/* If we've reserved enough blocks, we're done. */
if (allocated >= required)
break;
desired = required - allocated;
/* We need space but there's none left; bye! */
if (ra->nr_real_records == 0) {
error = -ENOSPC;
break;
}
/* Grab the first record from the list. */
error = xfarray_load(ra->free_records, record_nr, &arec);
if (error)
break;
ASSERT(arec.ar_blockcount <= UINT_MAX);
len = min_t(unsigned int, arec.ar_blockcount, desired);
trace_xrep_newbt_alloc_ag_blocks(sc->mp, sc->sa.pag->pag_agno,
arec.ar_startblock, len, XFS_RMAP_OWN_AG);
error = xrep_newbt_add_extent(&ra->new_bnobt, sc->sa.pag,
arec.ar_startblock, len);
if (error)
break;
allocated += len;
ra->nr_blocks -= len;
if (arec.ar_blockcount > desired) {
/*
* Record has more space than we need. The number of
* free records doesn't change, so shrink the free
* record, inform the caller that the records are no
* longer sorted by length, and exit.
*/
arec.ar_startblock += desired;
arec.ar_blockcount -= desired;
error = xfarray_store(ra->free_records, record_nr,
&arec);
if (error)
break;
*needs_resort = true;
return 0;
}
/*
* We're going to use up the entire record, so unset it and
* move on to the next one. This changes the number of free
* records (but doesn't break the sorting order), so we must
* go around the loop once more to re-run _bload_init.
*/
error = xfarray_unset(ra->free_records, record_nr);
if (error)
break;
ra->nr_real_records--;
record_nr--;
} while (1);
return error;
}
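The convergence loop above can be simulated with a toy cost model. In this sketch, need_blocks() is a made-up stand-in for xfs_btree_bload_compute_geometry(), charging one block per 100 records for each of the two new btrees, and records are consumed from the long end of the length-sorted array just as the kernel does:

#include <stdio.h>

/* Pretend each btree needs one block per 100 records; there are two btrees. */
static unsigned int need_blocks(unsigned int nrecs)
{
	return 2 * ((nrecs + 99) / 100);
}

int main(void)
{
	unsigned int free_lens[] = { 1, 1, 2, 5 };	/* record lengths, longest last */
	unsigned int nrecs = 350, allocated = 0;
	int idx = 3;

	for (;;) {
		unsigned int required = need_blocks(nrecs);

		if (allocated >= required)
			break;
		/* Consume the longest remaining record. */
		allocated += free_lens[idx--];
		nrecs--;
		printf("allocated %u of %u, %u records left\n",
		       allocated, required, nrecs);
	}
	printf("converged: %u blocks reserved\n", allocated);
	return 0;
}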
STATIC int
xrep_abt_dispose_one(
struct xrep_abt *ra,
struct xrep_newbt_resv *resv)
{
struct xfs_scrub *sc = ra->sc;
struct xfs_perag *pag = sc->sa.pag;
xfs_agblock_t free_agbno = resv->agbno + resv->used;
xfs_extlen_t free_aglen = resv->len - resv->used;
int error;
ASSERT(pag == resv->pag);
/* Add a deferred rmap for each extent we used. */
if (resv->used > 0)
xfs_rmap_alloc_extent(sc->tp, pag->pag_agno, resv->agbno,
resv->used, XFS_RMAP_OWN_AG);
/*
* For each reserved btree block we didn't use, add it to the free
* space btree. We didn't touch fdblocks when we reserved them, so
* we don't touch it now.
*/
if (free_aglen == 0)
return 0;
trace_xrep_newbt_free_blocks(sc->mp, resv->pag->pag_agno, free_agbno,
free_aglen, ra->new_bnobt.oinfo.oi_owner);
error = __xfs_free_extent(sc->tp, resv->pag, free_agbno, free_aglen,
&ra->new_bnobt.oinfo, XFS_AG_RESV_IGNORE, true);
if (error)
return error;
return xrep_defer_finish(sc);
}
/*
* Deal with all the space we reserved. Blocks that were allocated for the
* free space btrees need to have a (deferred) rmap added for the OWN_AG
* allocation, and blocks that didn't get used can be freed via the usual
* (deferred) means.
*/
STATIC void
xrep_abt_dispose_reservations(
struct xrep_abt *ra,
int error)
{
struct xrep_newbt_resv *resv, *n;
if (error)
goto junkit;
list_for_each_entry_safe(resv, n, &ra->new_bnobt.resv_list, list) {
error = xrep_abt_dispose_one(ra, resv);
if (error)
goto junkit;
}
junkit:
list_for_each_entry_safe(resv, n, &ra->new_bnobt.resv_list, list) {
xfs_perag_put(resv->pag);
list_del(&resv->list);
kfree(resv);
}
xrep_newbt_cancel(&ra->new_bnobt);
xrep_newbt_cancel(&ra->new_cntbt);
}
/* Retrieve free space data for bulk load. */
STATIC int
xrep_abt_get_records(
struct xfs_btree_cur *cur,
unsigned int idx,
struct xfs_btree_block *block,
unsigned int nr_wanted,
void *priv)
{
struct xfs_alloc_rec_incore *arec = &cur->bc_rec.a;
struct xrep_abt *ra = priv;
union xfs_btree_rec *block_rec;
unsigned int loaded;
int error;
for (loaded = 0; loaded < nr_wanted; loaded++, idx++) {
error = xfarray_load_next(ra->free_records, &ra->array_cur,
arec);
if (error)
return error;
ra->longest = max(ra->longest, arec->ar_blockcount);
block_rec = xfs_btree_rec_addr(cur, idx, block);
cur->bc_ops->init_rec_from_cur(cur, block_rec);
}
return loaded;
}
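This function implements the bulk loader's get_records contract: fill at most nr_wanted record slots in the block and report how many were actually loaded. A minimal userspace analogue of that contract, with a plain array index standing in for the xfarray cursor:

#include <stdio.h>

/*
 * Toy bulk-load callback: copy up to nr_wanted records from a flat array
 * into a "block", returning how many were actually loaded, the way
 * xrep_abt_get_records() does with its xfarray cursor.
 */
static int get_records(const int *src, int *cursor, int navail,
		       int *block, int nr_wanted)
{
	int loaded;

	for (loaded = 0; loaded < nr_wanted && *cursor < navail; loaded++)
		block[loaded] = src[(*cursor)++];
	return loaded;
}

int main(void)
{
	int records[] = { 3, 1, 4, 1, 5, 9, 2, 6 };
	int block[4], cursor = 0, got;

	while ((got = get_records(records, &cursor, 8, block, 4)) > 0)
		printf("filled a block with %d records\n", got);
	return 0;
}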
/* Feed one of the new btree blocks to the bulk loader. */
STATIC int
xrep_abt_claim_block(
struct xfs_btree_cur *cur,
union xfs_btree_ptr *ptr,
void *priv)
{
struct xrep_abt *ra = priv;
return xrep_newbt_claim_block(cur, &ra->new_bnobt, ptr);
}
/*
* Reset the AGF counters to reflect the free space btrees that we just
* rebuilt, then reinitialize the per-AG data.
*/
STATIC int
xrep_abt_reset_counters(
struct xrep_abt *ra)
{
struct xfs_scrub *sc = ra->sc;
struct xfs_perag *pag = sc->sa.pag;
struct xfs_agf *agf = sc->sa.agf_bp->b_addr;
unsigned int freesp_btreeblks = 0;
/*
* Compute the contribution to agf_btreeblks for the new free space
* btrees. This is the computed btree size minus anything we didn't
* use.
*/
freesp_btreeblks += ra->new_bnobt.bload.nr_blocks - 1;
freesp_btreeblks += ra->new_cntbt.bload.nr_blocks - 1;
freesp_btreeblks -= xrep_newbt_unused_blocks(&ra->new_bnobt);
freesp_btreeblks -= xrep_newbt_unused_blocks(&ra->new_cntbt);
/*
* The AGF header contains extra information related to the free space
* btrees, so we must update those fields here.
*/
agf->agf_btreeblks = cpu_to_be32(freesp_btreeblks +
(be32_to_cpu(agf->agf_rmap_blocks) - 1));
agf->agf_freeblks = cpu_to_be32(ra->nr_blocks);
agf->agf_longest = cpu_to_be32(ra->longest);
xfs_alloc_log_agf(sc->tp, sc->sa.agf_bp, XFS_AGF_BTREEBLKS |
XFS_AGF_LONGEST |
XFS_AGF_FREEBLKS);
/*
* After we commit the new btree to disk, it is possible that the
* process to reap the old btree blocks will race with the AIL trying
* to checkpoint the old btree blocks into the filesystem. If the new
* tree is shorter than the old one, the allocbt write verifier will
* fail and the AIL will shut down the filesystem.
*
* To avoid this, save the old incore btree height values as the alt
* height values before re-initializing the perag info from the updated
* AGF to capture all the new values.
*/
pag->pagf_repair_levels[XFS_BTNUM_BNOi] = pag->pagf_levels[XFS_BTNUM_BNOi];
pag->pagf_repair_levels[XFS_BTNUM_CNTi] = pag->pagf_levels[XFS_BTNUM_CNTi];
/* Reinitialize with the values we just logged. */
return xrep_reinit_pagf(sc);
}
/*
* Use the collected free space information to stage new free space btrees.
* If this is successful we'll return with the new btree root
* information logged to the repair transaction but not yet committed.
*/
STATIC int
xrep_abt_build_new_trees(
struct xrep_abt *ra)
{
struct xfs_scrub *sc = ra->sc;
struct xfs_btree_cur *bno_cur;
struct xfs_btree_cur *cnt_cur;
struct xfs_perag *pag = sc->sa.pag;
bool needs_resort = false;
int error;
/*
* Sort the free extents by length so that we can set up the free space
* btrees in as few extents as possible. This reduces the amount of
* deferred rmap / free work we have to do at the end.
*/
error = xrep_cntbt_sort_records(ra, false);
if (error)
return error;
/*
* Prepare to construct the new btree by reserving disk space for the
* new btree and setting up all the accounting information we'll need
* to root the new btree while it's under construction and before we
* attach it to the AG header.
*/
xrep_newbt_init_bare(&ra->new_bnobt, sc);
xrep_newbt_init_bare(&ra->new_cntbt, sc);
ra->new_bnobt.bload.get_records = xrep_abt_get_records;
ra->new_cntbt.bload.get_records = xrep_abt_get_records;
ra->new_bnobt.bload.claim_block = xrep_abt_claim_block;
ra->new_cntbt.bload.claim_block = xrep_abt_claim_block;
/* Allocate cursors for the staged btrees. */
bno_cur = xfs_allocbt_stage_cursor(sc->mp, &ra->new_bnobt.afake,
pag, XFS_BTNUM_BNO);
cnt_cur = xfs_allocbt_stage_cursor(sc->mp, &ra->new_cntbt.afake,
pag, XFS_BTNUM_CNT);
/* Last chance to abort before we start committing fixes. */
if (xchk_should_terminate(sc, &error))
goto err_cur;
/* Reserve the space we'll need for the new btrees. */
error = xrep_abt_reserve_space(ra, bno_cur, cnt_cur, &needs_resort);
if (error)
goto err_cur;
/*
* If we need to re-sort the free extents by length, do so so that we
* can put the records into the cntbt in the correct order.
*/
if (needs_resort) {
error = xrep_cntbt_sort_records(ra, needs_resort);
if (error)
goto err_cur;
}
/*
* Due to btree slack factors, it's possible for a new btree to be one
* level taller than the old btree. Update the alternate incore btree
* height so that we don't trip the verifiers when writing the new
* btree blocks to disk.
*/
pag->pagf_repair_levels[XFS_BTNUM_BNOi] =
ra->new_bnobt.bload.btree_height;
pag->pagf_repair_levels[XFS_BTNUM_CNTi] =
ra->new_cntbt.bload.btree_height;
/* Load the free space by length tree. */
ra->array_cur = XFARRAY_CURSOR_INIT;
ra->longest = 0;
error = xfs_btree_bload(cnt_cur, &ra->new_cntbt.bload, ra);
if (error)
goto err_levels;
error = xrep_bnobt_sort_records(ra);
if (error)
return error;
/* Load the free space by block number tree. */
ra->array_cur = XFARRAY_CURSOR_INIT;
error = xfs_btree_bload(bno_cur, &ra->new_bnobt.bload, ra);
if (error)
goto err_levels;
/*
* Install the new btrees in the AG header. After this point the old
* btrees are no longer accessible and the new trees are live.
*/
xfs_allocbt_commit_staged_btree(bno_cur, sc->tp, sc->sa.agf_bp);
xfs_btree_del_cursor(bno_cur, 0);
xfs_allocbt_commit_staged_btree(cnt_cur, sc->tp, sc->sa.agf_bp);
xfs_btree_del_cursor(cnt_cur, 0);
/* Reset the AGF counters now that we've changed the btree shape. */
error = xrep_abt_reset_counters(ra);
if (error)
goto err_newbt;
/* Dispose of any unused blocks and the accounting information. */
xrep_abt_dispose_reservations(ra, error);
return xrep_roll_ag_trans(sc);
err_levels:
pag->pagf_repair_levels[XFS_BTNUM_BNOi] = 0;
pag->pagf_repair_levels[XFS_BTNUM_CNTi] = 0;
err_cur:
xfs_btree_del_cursor(cnt_cur, error);
xfs_btree_del_cursor(bno_cur, error);
err_newbt:
xrep_abt_dispose_reservations(ra, error);
return error;
}
/*
* Now that we've logged the roots of the new btrees, invalidate all of the
* old blocks and free them.
*/
STATIC int
xrep_abt_remove_old_trees(
struct xrep_abt *ra)
{
struct xfs_perag *pag = ra->sc->sa.pag;
int error;
/* Free the old btree blocks if they're not in use. */
error = xrep_reap_agblocks(ra->sc, &ra->old_allocbt_blocks,
&XFS_RMAP_OINFO_AG, XFS_AG_RESV_IGNORE);
if (error)
return error;
/*
* Now that we've zapped all the old allocbt blocks we can turn off
* the alternate height mechanism.
*/
pag->pagf_repair_levels[XFS_BTNUM_BNOi] = 0;
pag->pagf_repair_levels[XFS_BTNUM_CNTi] = 0;
return 0;
}
/* Repair the freespace btrees for some AG. */
int
xrep_allocbt(
struct xfs_scrub *sc)
{
struct xrep_abt *ra;
struct xfs_mount *mp = sc->mp;
char *descr;
int error;
/* We require the rmapbt to rebuild anything. */
if (!xfs_has_rmapbt(mp))
return -EOPNOTSUPP;
ra = kzalloc(sizeof(struct xrep_abt), XCHK_GFP_FLAGS);
if (!ra)
return -ENOMEM;
ra->sc = sc;
/* We rebuild both data structures. */
sc->sick_mask = XFS_SICK_AG_BNOBT | XFS_SICK_AG_CNTBT;
/*
* Make sure the busy extent list is clear because we can't put extents
* on there twice. In theory we cleared this before we started, but
* let's not risk the filesystem.
*/
if (!xfs_extent_busy_list_empty(sc->sa.pag)) {
error = -EDEADLOCK;
goto out_ra;
}
/* Set up enough storage to handle maximally fragmented free space. */
descr = xchk_xfile_ag_descr(sc, "free space records");
error = xfarray_create(descr, mp->m_sb.sb_agblocks / 2,
sizeof(struct xfs_alloc_rec_incore),
&ra->free_records);
kfree(descr);
if (error)
goto out_ra;
/* Collect the free space data and find the old btree blocks. */
xagb_bitmap_init(&ra->old_allocbt_blocks);
error = xrep_abt_find_freespace(ra);
if (error)
goto out_bitmap;
/* Rebuild the free space information. */
error = xrep_abt_build_new_trees(ra);
if (error)
goto out_bitmap;
/* Kill the old trees. */
error = xrep_abt_remove_old_trees(ra);
if (error)
goto out_bitmap;
out_bitmap:
xagb_bitmap_destroy(&ra->old_allocbt_blocks);
xfarray_destroy(ra->free_records);
out_ra:
kfree(ra);
return error;
}
/* Make sure both btrees are ok after we've rebuilt them. */
int
xrep_revalidate_allocbt(
struct xfs_scrub *sc)
{
__u32 old_type = sc->sm->sm_type;
int error;
/*
* We must update sm_type temporarily so that the tree-to-tree cross
* reference checks will work in the correct direction, and also so
* that tracing will report correctly if there are more errors.
*/
sc->sm->sm_type = XFS_SCRUB_TYPE_BNOBT;
error = xchk_allocbt(sc);
if (error)
goto out;
sc->sm->sm_type = XFS_SCRUB_TYPE_CNTBT;
error = xchk_allocbt(sc);
out:
sc->sm->sm_type = old_type;
return error;
}


@@ -527,28 +527,23 @@ xchk_xattr_check_sf(
 	struct xfs_scrub		*sc)
 {
 	struct xchk_xattr_buf		*ab = sc->buf;
-	struct xfs_attr_shortform	*sf;
-	struct xfs_attr_sf_entry	*sfe;
+	struct xfs_ifork		*ifp = &sc->ip->i_af;
+	struct xfs_attr_sf_hdr		*sf = ifp->if_data;
+	struct xfs_attr_sf_entry	*sfe = xfs_attr_sf_firstentry(sf);
 	struct xfs_attr_sf_entry	*next;
-	struct xfs_ifork		*ifp;
-	unsigned char			*end;
+	unsigned char			*end = ifp->if_data + ifp->if_bytes;
 	int				i;
 	int				error = 0;
 
-	ifp = xfs_ifork_ptr(sc->ip, XFS_ATTR_FORK);
-
 	bitmap_zero(ab->usedmap, ifp->if_bytes);
-	sf = (struct xfs_attr_shortform *)sc->ip->i_af.if_u1.if_data;
-	end = (unsigned char *)ifp->if_u1.if_data + ifp->if_bytes;
-	xchk_xattr_set_map(sc, ab->usedmap, 0, sizeof(sf->hdr));
-
-	sfe = &sf->list[0];
+	xchk_xattr_set_map(sc, ab->usedmap, 0, sizeof(*sf));
 	if ((unsigned char *)sfe > end) {
 		xchk_fblock_set_corrupt(sc, XFS_ATTR_FORK, 0);
 		return 0;
 	}
 
-	for (i = 0; i < sf->hdr.count; i++) {
+	for (i = 0; i < sf->count; i++) {
 		unsigned char *name = sfe->nameval;
 		unsigned char *value = &sfe->nameval[sfe->namelen];


@@ -16,7 +16,9 @@
 
 #include <linux/interval_tree_generic.h>
 
-struct xbitmap_node {
+/* u64 bitmap */
+
+struct xbitmap64_node {
 	struct rb_node	bn_rbnode;
 
 	/* First set bit of this interval and subtree. */
@@ -39,72 +41,72 @@ struct xbitmap_node {
  * forward-declare them anyway for clarity.
  */
 static inline void
-xbitmap_tree_insert(struct xbitmap_node *node, struct rb_root_cached *root);
+xbitmap64_tree_insert(struct xbitmap64_node *node, struct rb_root_cached *root);
 
 static inline void
-xbitmap_tree_remove(struct xbitmap_node *node, struct rb_root_cached *root);
+xbitmap64_tree_remove(struct xbitmap64_node *node, struct rb_root_cached *root);
 
-static inline struct xbitmap_node *
-xbitmap_tree_iter_first(struct rb_root_cached *root, uint64_t start,
+static inline struct xbitmap64_node *
+xbitmap64_tree_iter_first(struct rb_root_cached *root, uint64_t start,
 			uint64_t last);
 
-static inline struct xbitmap_node *
-xbitmap_tree_iter_next(struct xbitmap_node *node, uint64_t start,
+static inline struct xbitmap64_node *
+xbitmap64_tree_iter_next(struct xbitmap64_node *node, uint64_t start,
 			uint64_t last);
 
-INTERVAL_TREE_DEFINE(struct xbitmap_node, bn_rbnode, uint64_t,
-		__bn_subtree_last, START, LAST, static inline, xbitmap_tree)
+INTERVAL_TREE_DEFINE(struct xbitmap64_node, bn_rbnode, uint64_t,
+		__bn_subtree_last, START, LAST, static inline, xbitmap64_tree)
 
 /* Iterate each interval of a bitmap. Do not change the bitmap. */
-#define for_each_xbitmap_extent(bn, bitmap) \
+#define for_each_xbitmap64_extent(bn, bitmap) \
 	for ((bn) = rb_entry_safe(rb_first(&(bitmap)->xb_root.rb_root), \
-			struct xbitmap_node, bn_rbnode); \
+			struct xbitmap64_node, bn_rbnode); \
 	     (bn) != NULL; \
 	     (bn) = rb_entry_safe(rb_next(&(bn)->bn_rbnode), \
-			struct xbitmap_node, bn_rbnode))
+			struct xbitmap64_node, bn_rbnode))
 
 /* Clear a range of this bitmap. */
 int
-xbitmap_clear(
-	struct xbitmap		*bitmap,
+xbitmap64_clear(
+	struct xbitmap64	*bitmap,
 	uint64_t		start,
 	uint64_t		len)
 {
-	struct xbitmap_node	*bn;
-	struct xbitmap_node	*new_bn;
+	struct xbitmap64_node	*bn;
+	struct xbitmap64_node	*new_bn;
 	uint64_t		last = start + len - 1;
 
-	while ((bn = xbitmap_tree_iter_first(&bitmap->xb_root, start, last))) {
+	while ((bn = xbitmap64_tree_iter_first(&bitmap->xb_root, start, last))) {
 		if (bn->bn_start < start && bn->bn_last > last) {
 			uint64_t	old_last = bn->bn_last;
 
 			/* overlaps with the entire clearing range */
-			xbitmap_tree_remove(bn, &bitmap->xb_root);
+			xbitmap64_tree_remove(bn, &bitmap->xb_root);
 			bn->bn_last = start - 1;
-			xbitmap_tree_insert(bn, &bitmap->xb_root);
+			xbitmap64_tree_insert(bn, &bitmap->xb_root);
 
 			/* add an extent */
-			new_bn = kmalloc(sizeof(struct xbitmap_node),
+			new_bn = kmalloc(sizeof(struct xbitmap64_node),
 					XCHK_GFP_FLAGS);
 			if (!new_bn)
 				return -ENOMEM;
 			new_bn->bn_start = last + 1;
 			new_bn->bn_last = old_last;
-			xbitmap_tree_insert(new_bn, &bitmap->xb_root);
+			xbitmap64_tree_insert(new_bn, &bitmap->xb_root);
 		} else if (bn->bn_start < start) {
 			/* overlaps with the left side of the clearing range */
-			xbitmap_tree_remove(bn, &bitmap->xb_root);
+			xbitmap64_tree_remove(bn, &bitmap->xb_root);
 			bn->bn_last = start - 1;
-			xbitmap_tree_insert(bn, &bitmap->xb_root);
+			xbitmap64_tree_insert(bn, &bitmap->xb_root);
 		} else if (bn->bn_last > last) {
 			/* overlaps with the right side of the clearing range */
-			xbitmap_tree_remove(bn, &bitmap->xb_root);
+			xbitmap64_tree_remove(bn, &bitmap->xb_root);
 			bn->bn_start = last + 1;
-			xbitmap_tree_insert(bn, &bitmap->xb_root);
+			xbitmap64_tree_insert(bn, &bitmap->xb_root);
 			break;
 		} else {
 			/* in the middle of the clearing range */
-			xbitmap_tree_remove(bn, &bitmap->xb_root);
+			xbitmap64_tree_remove(bn, &bitmap->xb_root);
 			kfree(bn);
 		}
 	}
@@ -114,59 +116,59 @@ xbitmap_clear(
 /* Set a range of this bitmap. */
 int
-xbitmap_set(
-	struct xbitmap		*bitmap,
+xbitmap64_set(
+	struct xbitmap64	*bitmap,
 	uint64_t		start,
 	uint64_t		len)
 {
-	struct xbitmap_node	*left;
-	struct xbitmap_node	*right;
+	struct xbitmap64_node	*left;
+	struct xbitmap64_node	*right;
 	uint64_t		last = start + len - 1;
 	int			error;
 
 	/* Is this whole range already set? */
-	left = xbitmap_tree_iter_first(&bitmap->xb_root, start, last);
+	left = xbitmap64_tree_iter_first(&bitmap->xb_root, start, last);
 	if (left && left->bn_start <= start && left->bn_last >= last)
 		return 0;
 
 	/* Clear out everything in the range we want to set. */
-	error = xbitmap_clear(bitmap, start, len);
+	error = xbitmap64_clear(bitmap, start, len);
 	if (error)
 		return error;
 
 	/* Do we have a left-adjacent extent? */
-	left = xbitmap_tree_iter_first(&bitmap->xb_root, start - 1, start - 1);
+	left = xbitmap64_tree_iter_first(&bitmap->xb_root, start - 1, start - 1);
 	ASSERT(!left || left->bn_last + 1 == start);
 
 	/* Do we have a right-adjacent extent? */
-	right = xbitmap_tree_iter_first(&bitmap->xb_root, last + 1, last + 1);
+	right = xbitmap64_tree_iter_first(&bitmap->xb_root, last + 1, last + 1);
 	ASSERT(!right || right->bn_start == last + 1);
 
 	if (left && right) {
 		/* combine left and right adjacent extent */
-		xbitmap_tree_remove(left, &bitmap->xb_root);
-		xbitmap_tree_remove(right, &bitmap->xb_root);
+		xbitmap64_tree_remove(left, &bitmap->xb_root);
+		xbitmap64_tree_remove(right, &bitmap->xb_root);
 		left->bn_last = right->bn_last;
-		xbitmap_tree_insert(left, &bitmap->xb_root);
+		xbitmap64_tree_insert(left, &bitmap->xb_root);
 		kfree(right);
 	} else if (left) {
 		/* combine with left extent */
-		xbitmap_tree_remove(left, &bitmap->xb_root);
+		xbitmap64_tree_remove(left, &bitmap->xb_root);
 		left->bn_last = last;
-		xbitmap_tree_insert(left, &bitmap->xb_root);
+		xbitmap64_tree_insert(left, &bitmap->xb_root);
 	} else if (right) {
 		/* combine with right extent */
-		xbitmap_tree_remove(right, &bitmap->xb_root);
+		xbitmap64_tree_remove(right, &bitmap->xb_root);
 		right->bn_start = start;
-		xbitmap_tree_insert(right, &bitmap->xb_root);
	} else {
 		/* add an extent */
-		left = kmalloc(sizeof(struct xbitmap_node), XCHK_GFP_FLAGS);
+		left = kmalloc(sizeof(struct xbitmap64_node), XCHK_GFP_FLAGS);
 		if (!left)
 			return -ENOMEM;
 		left->bn_start = start;
 		left->bn_last = last;
-		xbitmap_tree_insert(left, &bitmap->xb_root);
+		xbitmap64_tree_insert(left, &bitmap->xb_root);
 	}
 
 	return 0;
@@ -174,21 +176,21 @@ xbitmap_set(
 /* Free everything related to this bitmap. */
 void
-xbitmap_destroy(
-	struct xbitmap		*bitmap)
+xbitmap64_destroy(
+	struct xbitmap64	*bitmap)
 {
-	struct xbitmap_node	*bn;
+	struct xbitmap64_node	*bn;
 
-	while ((bn = xbitmap_tree_iter_first(&bitmap->xb_root, 0, -1ULL))) {
-		xbitmap_tree_remove(bn, &bitmap->xb_root);
+	while ((bn = xbitmap64_tree_iter_first(&bitmap->xb_root, 0, -1ULL))) {
+		xbitmap64_tree_remove(bn, &bitmap->xb_root);
 		kfree(bn);
 	}
 }
 
 /* Set up a per-AG block bitmap. */
 void
-xbitmap_init(
-	struct xbitmap		*bitmap)
+xbitmap64_init(
+	struct xbitmap64	*bitmap)
 {
 	bitmap->xb_root = RB_ROOT_CACHED;
 }
@@ -208,18 +210,18 @@ xbitmap_init(
  * This is the logical equivalent of bitmap &= ~sub.
  */
 int
-xbitmap_disunion(
-	struct xbitmap		*bitmap,
-	struct xbitmap		*sub)
+xbitmap64_disunion(
+	struct xbitmap64	*bitmap,
+	struct xbitmap64	*sub)
 {
-	struct xbitmap_node	*bn;
+	struct xbitmap64_node	*bn;
 	int			error;
 
-	if (xbitmap_empty(bitmap) || xbitmap_empty(sub))
+	if (xbitmap64_empty(bitmap) || xbitmap64_empty(sub))
 		return 0;
 
-	for_each_xbitmap_extent(bn, sub) {
-		error = xbitmap_clear(bitmap, bn->bn_start,
+	for_each_xbitmap64_extent(bn, sub) {
+		error = xbitmap64_clear(bitmap, bn->bn_start,
 				bn->bn_last - bn->bn_start + 1);
 		if (error)
 			return error;
@@ -228,104 +230,15 @@ xbitmap_disunion(
 	return 0;
 }
 
-/*
- * Record all btree blocks seen while iterating all records of a btree.
- *
- * We know that the btree query_all function starts at the left edge and walks
- * towards the right edge of the tree. Therefore, we know that we can walk up
- * the btree cursor towards the root; if the pointer for a given level points
- * to the first record/key in that block, we haven't seen this block before;
- * and therefore we need to remember that we saw this block in the btree.
- *
- * So if our btree is:
- *
- *    4
- *  / | \
- * 1  2  3
- *
- * Pretend for this example that each leaf block has 100 btree records. For
- * the first btree record, we'll observe that bc_levels[0].ptr == 1, so we
- * record that we saw block 1. Then we observe that bc_levels[1].ptr == 1, so
- * we record block 4. The list is [1, 4].
- *
- * For the second btree record, we see that bc_levels[0].ptr == 2, so we exit
- * the loop. The list remains [1, 4].
- *
- * For the 101st btree record, we've moved onto leaf block 2. Now
- * bc_levels[0].ptr == 1 again, so we record that we saw block 2. We see that
- * bc_levels[1].ptr == 2, so we exit the loop. The list is now [1, 4, 2].
- *
- * For the 102nd record, bc_levels[0].ptr == 2, so we continue.
- *
- * For the 201st record, we've moved on to leaf block 3.
- * bc_levels[0].ptr == 1, so we add 3 to the list. Now it is [1, 4, 2, 3].
- *
- * For the 300th record we just exit, with the list being [1, 4, 2, 3].
- */
-
-/* Mark a btree block to the agblock bitmap. */
-STATIC int
-xagb_bitmap_visit_btblock(
-	struct xfs_btree_cur	*cur,
-	int			level,
-	void			*priv)
-{
-	struct xagb_bitmap	*bitmap = priv;
-	struct xfs_buf		*bp;
-	xfs_fsblock_t		fsbno;
-	xfs_agblock_t		agbno;
-
-	xfs_btree_get_block(cur, level, &bp);
-	if (!bp)
-		return 0;
-
-	fsbno = XFS_DADDR_TO_FSB(cur->bc_mp, xfs_buf_daddr(bp));
-	agbno = XFS_FSB_TO_AGBNO(cur->bc_mp, fsbno);
-
-	return xagb_bitmap_set(bitmap, agbno, 1);
-}
-
-/* Mark all (per-AG) btree blocks in the agblock bitmap. */
-int
-xagb_bitmap_set_btblocks(
-	struct xagb_bitmap	*bitmap,
-	struct xfs_btree_cur	*cur)
-{
-	return xfs_btree_visit_blocks(cur, xagb_bitmap_visit_btblock,
-			XFS_BTREE_VISIT_ALL, bitmap);
-}
-
-/*
- * Record all the buffers pointed to by the btree cursor. Callers already
- * engaged in a btree walk should call this function to capture the list of
- * blocks going from the leaf towards the root.
- */
-int
-xagb_bitmap_set_btcur_path(
-	struct xagb_bitmap	*bitmap,
-	struct xfs_btree_cur	*cur)
-{
-	int			i;
-	int			error;
-
-	for (i = 0; i < cur->bc_nlevels && cur->bc_levels[i].ptr == 1; i++) {
-		error = xagb_bitmap_visit_btblock(cur, i, bitmap);
-		if (error)
-			return error;
-	}
-
-	return 0;
-}
-
 /* How many bits are set in this bitmap? */
 uint64_t
-xbitmap_hweight(
-	struct xbitmap		*bitmap)
+xbitmap64_hweight(
+	struct xbitmap64	*bitmap)
 {
-	struct xbitmap_node	*bn;
+	struct xbitmap64_node	*bn;
 	uint64_t		ret = 0;
 
-	for_each_xbitmap_extent(bn, bitmap)
+	for_each_xbitmap64_extent(bn, bitmap)
 		ret += bn->bn_last - bn->bn_start + 1;
 
 	return ret;
@@ -333,15 +246,15 @@ xbitmap_hweight(
 /* Call a function for every run of set bits in this bitmap. */
 int
-xbitmap_walk(
-	struct xbitmap		*bitmap,
-	xbitmap_walk_fn		fn,
+xbitmap64_walk(
+	struct xbitmap64	*bitmap,
+	xbitmap64_walk_fn	fn,
 	void			*priv)
 {
-	struct xbitmap_node	*bn;
+	struct xbitmap64_node	*bn;
 	int			error = 0;
 
-	for_each_xbitmap_extent(bn, bitmap) {
+	for_each_xbitmap64_extent(bn, bitmap) {
 		error = fn(bn->bn_start, bn->bn_last - bn->bn_start + 1, priv);
 		if (error)
 			break;
@@ -352,23 +265,297 @@ xbitmap_walk(
 /* Does this bitmap have no bits set at all? */
 bool
-xbitmap_empty(
-	struct xbitmap		*bitmap)
+xbitmap64_empty(
+	struct xbitmap64	*bitmap)
 {
 	return bitmap->xb_root.rb_root.rb_node == NULL;
 }
 
 /* Is the start of the range set or clear? And for how long? */
 bool
-xbitmap_test(
-	struct xbitmap		*bitmap,
+xbitmap64_test(
+	struct xbitmap64	*bitmap,
 	uint64_t		start,
 	uint64_t		*len)
 {
-	struct xbitmap_node	*bn;
+	struct xbitmap64_node	*bn;
 	uint64_t		last = start + *len - 1;
 
-	bn = xbitmap_tree_iter_first(&bitmap->xb_root, start, last);
+	bn = xbitmap64_tree_iter_first(&bitmap->xb_root, start, last);
if (!bn)
return false;
if (bn->bn_start <= start) {
if (bn->bn_last < last)
*len = bn->bn_last - start + 1;
return true;
}
*len = bn->bn_start - start;
return false;
}
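Before the u32 copy of this interface below, it may help to see the test semantics ("is the start of the range set, and for how long?") in a toy single-word model; this is illustration only, not the interval-tree implementation:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Toy single-word model of the xbitmap64 set/clear/test semantics. */
static bool bit_test(uint64_t map, uint64_t start, uint64_t *len)
{
	uint64_t i;

	if (!(map & (1ULL << start))) {
		/* Clear: report how many clear bits precede the next set one. */
		for (i = start; i < 64 && !(map & (1ULL << i)); i++)
			;
		*len = i - start;
		return false;
	}
	/* Set: report the length of the set run, capped at the request. */
	for (i = start; i < start + *len && (map & (1ULL << i)); i++)
		;
	*len = i - start;
	return true;
}

int main(void)
{
	uint64_t map = 0, len;

	map |= ((1ULL << 10) - 1) << 20;	/* set bits 20..29 */
	map &= ~(((1ULL << 2) - 1) << 24);	/* clear bits 24..25 */

	len = 16;
	printf("set at 20? %d, run %llu\n", bit_test(map, 20, &len),
	       (unsigned long long)len);	/* set, run of 4 (20..23) */
	len = 16;
	printf("set at 24? %d, run %llu\n", bit_test(map, 24, &len),
	       (unsigned long long)len);	/* clear, 2 clear bits before 26 */
	return 0;
}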
/* u32 bitmap */
struct xbitmap32_node {
struct rb_node bn_rbnode;
/* First set bit of this interval and subtree. */
uint32_t bn_start;
/* Last set bit of this interval. */
uint32_t bn_last;
/* Last set bit of this subtree. Do not touch this. */
uint32_t __bn_subtree_last;
};
/* Define our own interval tree type with uint32_t parameters. */
/*
* These functions are defined by the INTERVAL_TREE_DEFINE macro, but we'll
* forward-declare them anyway for clarity.
*/
static inline void
xbitmap32_tree_insert(struct xbitmap32_node *node, struct rb_root_cached *root);
static inline void
xbitmap32_tree_remove(struct xbitmap32_node *node, struct rb_root_cached *root);
static inline struct xbitmap32_node *
xbitmap32_tree_iter_first(struct rb_root_cached *root, uint32_t start,
uint32_t last);
static inline struct xbitmap32_node *
xbitmap32_tree_iter_next(struct xbitmap32_node *node, uint32_t start,
uint32_t last);
INTERVAL_TREE_DEFINE(struct xbitmap32_node, bn_rbnode, uint32_t,
__bn_subtree_last, START, LAST, static inline, xbitmap32_tree)
/* Iterate each interval of a bitmap. Do not change the bitmap. */
#define for_each_xbitmap32_extent(bn, bitmap) \
for ((bn) = rb_entry_safe(rb_first(&(bitmap)->xb_root.rb_root), \
struct xbitmap32_node, bn_rbnode); \
(bn) != NULL; \
(bn) = rb_entry_safe(rb_next(&(bn)->bn_rbnode), \
struct xbitmap32_node, bn_rbnode))
/* Clear a range of this bitmap. */
int
xbitmap32_clear(
struct xbitmap32 *bitmap,
uint32_t start,
uint32_t len)
{
struct xbitmap32_node *bn;
struct xbitmap32_node *new_bn;
uint32_t last = start + len - 1;
while ((bn = xbitmap32_tree_iter_first(&bitmap->xb_root, start, last))) {
if (bn->bn_start < start && bn->bn_last > last) {
uint32_t old_last = bn->bn_last;
/* overlaps with the entire clearing range */
xbitmap32_tree_remove(bn, &bitmap->xb_root);
bn->bn_last = start - 1;
xbitmap32_tree_insert(bn, &bitmap->xb_root);
/* add an extent */
new_bn = kmalloc(sizeof(struct xbitmap32_node),
XCHK_GFP_FLAGS);
if (!new_bn)
return -ENOMEM;
new_bn->bn_start = last + 1;
new_bn->bn_last = old_last;
xbitmap32_tree_insert(new_bn, &bitmap->xb_root);
} else if (bn->bn_start < start) {
/* overlaps with the left side of the clearing range */
xbitmap32_tree_remove(bn, &bitmap->xb_root);
bn->bn_last = start - 1;
xbitmap32_tree_insert(bn, &bitmap->xb_root);
} else if (bn->bn_last > last) {
/* overlaps with the right side of the clearing range */
xbitmap32_tree_remove(bn, &bitmap->xb_root);
bn->bn_start = last + 1;
xbitmap32_tree_insert(bn, &bitmap->xb_root);
break;
} else {
/* in the middle of the clearing range */
xbitmap32_tree_remove(bn, &bitmap->xb_root);
kfree(bn);
}
}
return 0;
}
/* Set a range of this bitmap. */
int
xbitmap32_set(
struct xbitmap32 *bitmap,
uint32_t start,
uint32_t len)
{
struct xbitmap32_node *left;
struct xbitmap32_node *right;
uint32_t last = start + len - 1;
int error;
/* Is this whole range already set? */
left = xbitmap32_tree_iter_first(&bitmap->xb_root, start, last);
if (left && left->bn_start <= start && left->bn_last >= last)
return 0;
/* Clear out everything in the range we want to set. */
error = xbitmap32_clear(bitmap, start, len);
if (error)
return error;
/* Do we have a left-adjacent extent? */
left = xbitmap32_tree_iter_first(&bitmap->xb_root, start - 1, start - 1);
ASSERT(!left || left->bn_last + 1 == start);
/* Do we have a right-adjacent extent? */
right = xbitmap32_tree_iter_first(&bitmap->xb_root, last + 1, last + 1);
ASSERT(!right || right->bn_start == last + 1);
if (left && right) {
/* combine left and right adjacent extent */
xbitmap32_tree_remove(left, &bitmap->xb_root);
xbitmap32_tree_remove(right, &bitmap->xb_root);
left->bn_last = right->bn_last;
xbitmap32_tree_insert(left, &bitmap->xb_root);
kfree(right);
} else if (left) {
/* combine with left extent */
xbitmap32_tree_remove(left, &bitmap->xb_root);
left->bn_last = last;
xbitmap32_tree_insert(left, &bitmap->xb_root);
} else if (right) {
/* combine with right extent */
xbitmap32_tree_remove(right, &bitmap->xb_root);
right->bn_start = start;
xbitmap32_tree_insert(right, &bitmap->xb_root);
} else {
/* add an extent */
left = kmalloc(sizeof(struct xbitmap32_node), XCHK_GFP_FLAGS);
if (!left)
return -ENOMEM;
left->bn_start = start;
left->bn_last = last;
xbitmap32_tree_insert(left, &bitmap->xb_root);
}
return 0;
}
/* Free everything related to this bitmap. */
void
xbitmap32_destroy(
struct xbitmap32 *bitmap)
{
struct xbitmap32_node *bn;
while ((bn = xbitmap32_tree_iter_first(&bitmap->xb_root, 0, -1U))) {
xbitmap32_tree_remove(bn, &bitmap->xb_root);
kfree(bn);
}
}
/* Set up a per-AG block bitmap. */
void
xbitmap32_init(
struct xbitmap32 *bitmap)
{
bitmap->xb_root = RB_ROOT_CACHED;
}
/*
* Remove all the blocks mentioned in @sub from the extents in @bitmap.
*
* The intent is that callers will iterate the rmapbt for all of its records
* for a given owner to generate @bitmap; and iterate all the blocks of the
* metadata structures that are not being rebuilt and have the same rmapbt
* owner to generate @sub. This routine subtracts all the extents
* mentioned in sub from all the extents linked in @bitmap, which leaves
* @bitmap as the list of blocks that are not accounted for, which we assume
* are the dead blocks of the old metadata structure. The blocks mentioned in
* @bitmap can be reaped.
*
* This is the logical equivalent of bitmap &= ~sub.
*/
int
xbitmap32_disunion(
struct xbitmap32 *bitmap,
struct xbitmap32 *sub)
{
struct xbitmap32_node *bn;
int error;
if (xbitmap32_empty(bitmap) || xbitmap32_empty(sub))
return 0;
for_each_xbitmap32_extent(bn, sub) {
error = xbitmap32_clear(bitmap, bn->bn_start,
bn->bn_last - bn->bn_start + 1);
if (error)
return error;
}
return 0;
}
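/*
* Usage sketch (illustrative only, not part of the patch; the variable
* names are hypothetical): compute the reapable leftovers of a dead
* structure, per the disunion comment above.
*
* struct xbitmap32 owned, keep;
*
* xbitmap32_init(&owned);
* xbitmap32_init(&keep);
* (xbitmap32_set() runs from the rmapbt into owned, live trees into keep)
* error = xbitmap32_disunion(&owned, &keep); (owned &= ~keep)
* (reap every extent still set in owned, e.g. via xbitmap32_walk())
* xbitmap32_destroy(&keep);
* xbitmap32_destroy(&owned);
*/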
/* How many bits are set in this bitmap? */
uint32_t
xbitmap32_hweight(
struct xbitmap32 *bitmap)
{
struct xbitmap32_node *bn;
uint32_t ret = 0;
for_each_xbitmap32_extent(bn, bitmap)
ret += bn->bn_last - bn->bn_start + 1;
return ret;
}
/* Call a function for every run of set bits in this bitmap. */
int
xbitmap32_walk(
struct xbitmap32 *bitmap,
xbitmap32_walk_fn fn,
void *priv)
{
struct xbitmap32_node *bn;
int error = 0;
for_each_xbitmap32_extent(bn, bitmap) {
error = fn(bn->bn_start, bn->bn_last - bn->bn_start + 1, priv);
if (error)
break;
}
return error;
}
/* Does this bitmap have no bits set at all? */
bool
xbitmap32_empty(
struct xbitmap32 *bitmap)
{
return bitmap->xb_root.rb_root.rb_node == NULL;
}
/* Is the start of the range set or clear? And for how long? */
bool
xbitmap32_test(
struct xbitmap32 *bitmap,
uint32_t start,
uint32_t *len)
{
struct xbitmap32_node *bn;
uint32_t last = start + *len - 1;
bn = xbitmap32_tree_iter_first(&bitmap->xb_root, start, last);
if (!bn)
return false;
if (bn->bn_start <= start) {

fs/xfs/scrub/bitmap.h

@ -6,17 +6,19 @@
#ifndef __XFS_SCRUB_BITMAP_H__
#define __XFS_SCRUB_BITMAP_H__
/* u64 bitmap */
struct xbitmap64 {
struct rb_root_cached xb_root;
};
void xbitmap64_init(struct xbitmap64 *bitmap);
void xbitmap64_destroy(struct xbitmap64 *bitmap);
int xbitmap64_clear(struct xbitmap64 *bitmap, uint64_t start, uint64_t len);
int xbitmap64_set(struct xbitmap64 *bitmap, uint64_t start, uint64_t len);
int xbitmap64_disunion(struct xbitmap64 *bitmap, struct xbitmap64 *sub);
uint64_t xbitmap64_hweight(struct xbitmap64 *bitmap);
/*
* Return codes for the bitmap iterator functions are 0 to continue iterating,
@ -25,84 +27,39 @@ uint64_t xbitmap_hweight(struct xbitmap *bitmap);
* iteration, because neither bitmap iterator ever generates that error code on
* its own. Callers must not modify the bitmap while walking it.
*/
typedef int (*xbitmap64_walk_fn)(uint64_t start, uint64_t len, void *priv);
int xbitmap64_walk(struct xbitmap64 *bitmap, xbitmap64_walk_fn fn,
void *priv);
bool xbitmap64_empty(struct xbitmap64 *bitmap);
bool xbitmap64_test(struct xbitmap64 *bitmap, uint64_t start, uint64_t *len);
/* u32 bitmap */
struct xbitmap32 {
struct rb_root_cached xb_root;
};
void xbitmap32_init(struct xbitmap32 *bitmap);
void xbitmap32_destroy(struct xbitmap32 *bitmap);
int xbitmap32_clear(struct xbitmap32 *bitmap, uint32_t start, uint32_t len);
int xbitmap32_set(struct xbitmap32 *bitmap, uint32_t start, uint32_t len);
int xbitmap32_disunion(struct xbitmap32 *bitmap, struct xbitmap32 *sub);
uint32_t xbitmap32_hweight(struct xbitmap32 *bitmap);
/*
* Return codes for the bitmap iterator functions are 0 to continue iterating,
* and non-zero to stop iterating. Any non-zero value will be passed up to the
* iteration caller. The special value -ECANCELED can be used to stop
* iteration, because neither bitmap iterator ever generates that error code on
* its own. Callers must not modify the bitmap while walking it.
*/
typedef int (*xbitmap32_walk_fn)(uint32_t start, uint32_t len, void *priv);
int xbitmap32_walk(struct xbitmap32 *bitmap, xbitmap32_walk_fn fn,
void *priv);
bool xbitmap32_empty(struct xbitmap32 *bitmap);
bool xbitmap32_test(struct xbitmap32 *bitmap, uint32_t start, uint32_t *len);
#endif /* __XFS_SCRUB_BITMAP_H__ */

fs/xfs/scrub/bmap.c

@ -19,9 +19,11 @@
#include "xfs_bmap_btree.h" #include "xfs_bmap_btree.h"
#include "xfs_rmap.h" #include "xfs_rmap.h"
#include "xfs_rmap_btree.h" #include "xfs_rmap_btree.h"
#include "xfs_health.h"
#include "scrub/scrub.h" #include "scrub/scrub.h"
#include "scrub/common.h" #include "scrub/common.h"
#include "scrub/btree.h" #include "scrub/btree.h"
#include "scrub/health.h"
#include "xfs_ag.h" #include "xfs_ag.h"
/* Set us up with an inode's bmap. */ /* Set us up with an inode's bmap. */
@ -48,9 +50,18 @@ xchk_setup_inode_bmap(
if (S_ISREG(VFS_I(sc->ip)->i_mode) &&
sc->sm->sm_type != XFS_SCRUB_TYPE_BMBTA) {
struct address_space *mapping = VFS_I(sc->ip)->i_mapping;
bool is_repair = xchk_could_repair(sc);
xchk_ilock(sc, XFS_MMAPLOCK_EXCL);
/* Break all our leases, we're going to mess with things. */
if (is_repair) {
error = xfs_break_layouts(VFS_I(sc->ip),
&sc->ilock_flags, BREAK_WRITE);
if (error)
goto out;
}
inode_dio_wait(VFS_I(sc->ip));
/*
@ -71,6 +82,15 @@ xchk_setup_inode_bmap(
error = filemap_fdatawait_keep_errors(mapping);
if (error && (error != -ENOSPC && error != -EIO))
goto out;
/* Drop the page cache if we're repairing block mappings. */
if (is_repair) {
error = invalidate_inode_pages2(
VFS_I(sc->ip)->i_mapping);
if (error)
goto out;
}
}
/* Got the inode, lock it and we're ready to go. */
@ -78,6 +98,10 @@ xchk_setup_inode_bmap(
if (error)
goto out;
error = xchk_ino_dqattach(sc);
if (error)
goto out;
xchk_ilock(sc, XFS_ILOCK_EXCL);
out:
/* scrub teardown will unlock and release the inode */
@ -632,6 +656,82 @@ xchk_bmap_check_ag_rmaps(
return error;
}
/*
* Decide if we want to scan the reverse mappings to determine if the attr
* fork /really/ has zero space mappings.
*/
STATIC bool
xchk_bmap_check_empty_attrfork(
struct xfs_inode *ip)
{
struct xfs_ifork *ifp = &ip->i_af;
/*
* If the dinode repair found a bad attr fork, it will reset the fork
* to extents format with zero records and wait for this scrubber
* to reconstruct the block mappings. If the fork is not in this
* state, then the fork cannot have been zapped.
*/
if (ifp->if_format != XFS_DINODE_FMT_EXTENTS || ifp->if_nextents != 0)
return false;
/*
* Files can have an attr fork in EXTENTS format with zero records for
* several reasons:
*
* a) an attr set created a fork but ran out of space
* b) attr replace deleted an old attr but failed during the set step
* c) the data fork was in btree format when all attrs were deleted, so
* the fork was left in place
* d) the inode repair code zapped the fork
*
* Only in case (d) do we want to scan the rmapbt to see if we need to
* rebuild the attr fork. The fork zap code clears all DAC permission
* bits and zeroes the uid and gid, so avoid the scan if any of those
* three conditions are not met.
*/
if ((VFS_I(ip)->i_mode & 0777) != 0)
return false;
if (!uid_eq(VFS_I(ip)->i_uid, GLOBAL_ROOT_UID))
return false;
if (!gid_eq(VFS_I(ip)->i_gid, GLOBAL_ROOT_GID))
return false;
return true;
}
/*
* Decide if we want to scan the reverse mappings to determine if the data
* fork /really/ has zero space mappings.
*/
STATIC bool
xchk_bmap_check_empty_datafork(
struct xfs_inode *ip)
{
struct xfs_ifork *ifp = &ip->i_df;
/* Don't support realtime rmap checks yet. */
if (XFS_IS_REALTIME_INODE(ip))
return false;
/*
* If the dinode repair found a bad data fork, it will reset the fork
* to extents format with zero records and wait for this scrubber
* to reconstruct the block mappings. If the fork is not in this
* state, then the fork cannot have been zapped.
*/
if (ifp->if_format != XFS_DINODE_FMT_EXTENTS || ifp->if_nextents != 0)
return false;
/*
* If we encounter an empty data fork along with evidence that the fork
* might not really be empty, we need to scan the reverse mappings to
* decide if we're going to rebuild the fork. Data forks with nonzero
* file size are scanned.
*/
return i_size_read(VFS_I(ip)) != 0;
}
/*
* Decide if we want to walk every rmap btree in the fs to make sure that each
* rmap for this file fork has corresponding bmbt entries.
@ -641,7 +741,6 @@ xchk_bmap_want_check_rmaps(
struct xchk_bmap_info *info)
{
struct xfs_scrub *sc = info->sc;
if (!xfs_has_rmapbt(sc->mp))
return false;
@ -650,28 +749,10 @@
if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
return false;
if (info->whichfork == XFS_ATTR_FORK)
return xchk_bmap_check_empty_attrfork(sc->ip);
return xchk_bmap_check_empty_datafork(sc->ip);
}
/* Make sure each rmap has a corresponding bmbt entry. */
@ -939,7 +1020,20 @@ int
xchk_bmap_data(
struct xfs_scrub *sc)
{
int error;
if (xchk_file_looks_zapped(sc, XFS_SICK_INO_BMBTD_ZAPPED)) {
xchk_ino_set_corrupt(sc, sc->ip->i_ino);
return 0;
}
error = xchk_bmap(sc, XFS_DATA_FORK);
if (error)
return error;
/* If the data fork is clean, it is clearly not zapped. */
xchk_mark_healthy_if_clean(sc, XFS_SICK_INO_BMBTD_ZAPPED);
return 0;
}
/* Scrub an inode's attr fork. */
@ -947,7 +1041,27 @@ int
xchk_bmap_attr(
struct xfs_scrub *sc)
{
int error;
/*
* If the attr fork has been zapped, it's possible that forkoff was
* reset to zero and hence sc->ip->i_afp is NULL. We don't want the
* NULL ifp check in xchk_bmap to conclude that the attr fork is ok,
* so short circuit that logic by setting the corruption flag and
* returning immediately.
*/
if (xchk_file_looks_zapped(sc, XFS_SICK_INO_BMBTA_ZAPPED)) {
xchk_ino_set_corrupt(sc, sc->ip->i_ino);
return 0;
}
error = xchk_bmap(sc, XFS_ATTR_FORK);
if (error)
return error;
/* If the attr fork is clean, it is clearly not zapped. */
xchk_mark_healthy_if_clean(sc, XFS_SICK_INO_BMBTA_ZAPPED);
return 0;
}
/* Scrub an inode's CoW fork. */

fs/xfs/scrub/bmap_repair.c

@ -0,0 +1,867 @@
// SPDX-License-Identifier: GPL-2.0-or-later
/*
* Copyright (C) 2018-2023 Oracle. All Rights Reserved.
* Author: Darrick J. Wong <djwong@kernel.org>
*/
#include "xfs.h"
#include "xfs_fs.h"
#include "xfs_shared.h"
#include "xfs_format.h"
#include "xfs_trans_resv.h"
#include "xfs_mount.h"
#include "xfs_defer.h"
#include "xfs_btree.h"
#include "xfs_btree_staging.h"
#include "xfs_bit.h"
#include "xfs_log_format.h"
#include "xfs_trans.h"
#include "xfs_sb.h"
#include "xfs_inode.h"
#include "xfs_inode_fork.h"
#include "xfs_alloc.h"
#include "xfs_rtalloc.h"
#include "xfs_bmap.h"
#include "xfs_bmap_util.h"
#include "xfs_bmap_btree.h"
#include "xfs_rmap.h"
#include "xfs_rmap_btree.h"
#include "xfs_refcount.h"
#include "xfs_quota.h"
#include "xfs_ialloc.h"
#include "xfs_ag.h"
#include "xfs_reflink.h"
#include "scrub/xfs_scrub.h"
#include "scrub/scrub.h"
#include "scrub/common.h"
#include "scrub/btree.h"
#include "scrub/trace.h"
#include "scrub/repair.h"
#include "scrub/bitmap.h"
#include "scrub/fsb_bitmap.h"
#include "scrub/xfile.h"
#include "scrub/xfarray.h"
#include "scrub/newbt.h"
#include "scrub/reap.h"
/*
* Inode Fork Block Mapping (BMBT) Repair
* ======================================
*
* Gather all the rmap records for the inode and fork we're fixing, reset the
* incore fork, then recreate the btree.
*/
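/*
* Illustrative call graph (assembled from the functions below; a reading
* aid, not authoritative documentation):
*
* xrep_bmap()
* -> xrep_bmap_find_mappings() gather rmaps and delalloc extents
* -> xrep_bmap_build_new_fork() sort records, stage and load new fork
* -> xrep_bmap_remove_old_tree() invalidate and reap old bmbt blocks
*/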
enum reflink_scan_state {
RLS_IRRELEVANT = -1, /* not applicable to this file */
RLS_UNKNOWN, /* shared extent scans required */
RLS_SET_IFLAG, /* iflag must be set */
};
struct xrep_bmap {
/* Old bmbt blocks */
struct xfsb_bitmap old_bmbt_blocks;
/* New fork. */
struct xrep_newbt new_bmapbt;
/* List of new bmap records. */
struct xfarray *bmap_records;
struct xfs_scrub *sc;
/* How many blocks did we find allocated to this file? */
xfs_rfsblock_t nblocks;
/* How many bmbt blocks did we find for this fork? */
xfs_rfsblock_t old_bmbt_block_count;
/* get_records()'s position in the bmap record array. */
xfarray_idx_t array_cur;
/* How many real (non-hole, non-delalloc) mappings do we have? */
uint64_t real_mappings;
/* Which fork are we fixing? */
int whichfork;
/* Should the REFLINK flag be set when the repair is over? */
enum reflink_scan_state reflink_scan;
/* Do we allow unwritten extents? */
bool allow_unwritten;
};
/* Is this space extent shared? Flag the inode if it is. */
STATIC int
xrep_bmap_discover_shared(
struct xrep_bmap *rb,
xfs_fsblock_t startblock,
xfs_filblks_t blockcount)
{
struct xfs_scrub *sc = rb->sc;
xfs_agblock_t agbno;
xfs_agblock_t fbno;
xfs_extlen_t flen;
int error;
agbno = XFS_FSB_TO_AGBNO(sc->mp, startblock);
error = xfs_refcount_find_shared(sc->sa.refc_cur, agbno, blockcount,
&fbno, &flen, false);
if (error)
return error;
if (fbno != NULLAGBLOCK)
rb->reflink_scan = RLS_SET_IFLAG;
return 0;
}
/* Remember this reverse-mapping as a series of bmap records. */
STATIC int
xrep_bmap_from_rmap(
struct xrep_bmap *rb,
xfs_fileoff_t startoff,
xfs_fsblock_t startblock,
xfs_filblks_t blockcount,
bool unwritten)
{
struct xfs_bmbt_irec irec = {
.br_startoff = startoff,
.br_startblock = startblock,
.br_state = unwritten ? XFS_EXT_UNWRITTEN : XFS_EXT_NORM,
};
struct xfs_bmbt_rec rbe;
struct xfs_scrub *sc = rb->sc;
int error = 0;
/*
* If we're repairing the data fork of a non-reflinked regular file on
* a reflink filesystem, we need to figure out if this space extent is
* shared.
*/
if (rb->reflink_scan == RLS_UNKNOWN && !unwritten) {
error = xrep_bmap_discover_shared(rb, startblock, blockcount);
if (error)
return error;
}
do {
xfs_failaddr_t fa;
irec.br_blockcount = min_t(xfs_filblks_t, blockcount,
XFS_MAX_BMBT_EXTLEN);
fa = xfs_bmap_validate_extent(sc->ip, rb->whichfork, &irec);
if (fa)
return -EFSCORRUPTED;
xfs_bmbt_disk_set_all(&rbe, &irec);
trace_xrep_bmap_found(sc->ip, rb->whichfork, &irec);
if (xchk_should_terminate(sc, &error))
return error;
error = xfarray_append(rb->bmap_records, &rbe);
if (error)
return error;
rb->real_mappings++;
irec.br_startblock += irec.br_blockcount;
irec.br_startoff += irec.br_blockcount;
blockcount -= irec.br_blockcount;
} while (blockcount > 0);
return 0;
}
/* Check for any obvious errors or conflicts in the file mapping. */
STATIC int
xrep_bmap_check_fork_rmap(
struct xrep_bmap *rb,
struct xfs_btree_cur *cur,
const struct xfs_rmap_irec *rec)
{
struct xfs_scrub *sc = rb->sc;
enum xbtree_recpacking outcome;
int error;
/*
* Data extents for rt files are never stored on the data device, but
* everything else (xattrs, bmbt blocks) can be.
*/
if (XFS_IS_REALTIME_INODE(sc->ip) &&
!(rec->rm_flags & (XFS_RMAP_ATTR_FORK | XFS_RMAP_BMBT_BLOCK)))
return -EFSCORRUPTED;
/* Check that this is within the AG. */
if (!xfs_verify_agbext(cur->bc_ag.pag, rec->rm_startblock,
rec->rm_blockcount))
return -EFSCORRUPTED;
/* Check the file offset range. */
if (!(rec->rm_flags & XFS_RMAP_BMBT_BLOCK) &&
!xfs_verify_fileext(sc->mp, rec->rm_offset, rec->rm_blockcount))
return -EFSCORRUPTED;
/* No contradictory flags. */
if ((rec->rm_flags & (XFS_RMAP_ATTR_FORK | XFS_RMAP_BMBT_BLOCK)) &&
(rec->rm_flags & XFS_RMAP_UNWRITTEN))
return -EFSCORRUPTED;
/* Make sure this isn't free space. */
error = xfs_alloc_has_records(sc->sa.bno_cur, rec->rm_startblock,
rec->rm_blockcount, &outcome);
if (error)
return error;
if (outcome != XBTREE_RECPACKING_EMPTY)
return -EFSCORRUPTED;
/* Must not be an inode chunk. */
error = xfs_ialloc_has_inodes_at_extent(sc->sa.ino_cur,
rec->rm_startblock, rec->rm_blockcount, &outcome);
if (error)
return error;
if (outcome != XBTREE_RECPACKING_EMPTY)
return -EFSCORRUPTED;
return 0;
}
/* Record extents that belong to this inode's fork. */
STATIC int
xrep_bmap_walk_rmap(
struct xfs_btree_cur *cur,
const struct xfs_rmap_irec *rec,
void *priv)
{
struct xrep_bmap *rb = priv;
struct xfs_mount *mp = cur->bc_mp;
xfs_fsblock_t fsbno;
int error = 0;
if (xchk_should_terminate(rb->sc, &error))
return error;
if (rec->rm_owner != rb->sc->ip->i_ino)
return 0;
error = xrep_bmap_check_fork_rmap(rb, cur, rec);
if (error)
return error;
/*
* Record all blocks allocated to this file even if the extent isn't
* for the fork we're rebuilding so that we can reset di_nblocks later.
*/
rb->nblocks += rec->rm_blockcount;
/* If this rmap isn't for the fork we want, we're done. */
if (rb->whichfork == XFS_DATA_FORK &&
(rec->rm_flags & XFS_RMAP_ATTR_FORK))
return 0;
if (rb->whichfork == XFS_ATTR_FORK &&
!(rec->rm_flags & XFS_RMAP_ATTR_FORK))
return 0;
/* Reject unwritten extents if we don't allow those. */
if ((rec->rm_flags & XFS_RMAP_UNWRITTEN) && !rb->allow_unwritten)
return -EFSCORRUPTED;
fsbno = XFS_AGB_TO_FSB(mp, cur->bc_ag.pag->pag_agno,
rec->rm_startblock);
if (rec->rm_flags & XFS_RMAP_BMBT_BLOCK) {
rb->old_bmbt_block_count += rec->rm_blockcount;
return xfsb_bitmap_set(&rb->old_bmbt_blocks, fsbno,
rec->rm_blockcount);
}
return xrep_bmap_from_rmap(rb, rec->rm_offset, fsbno,
rec->rm_blockcount,
rec->rm_flags & XFS_RMAP_UNWRITTEN);
}
/*
* Compare two block mapping records. We want to sort in order of increasing
* file offset.
*/
static int
xrep_bmap_extent_cmp(
const void *a,
const void *b)
{
const struct xfs_bmbt_rec *ba = a;
const struct xfs_bmbt_rec *bb = b;
xfs_fileoff_t ao = xfs_bmbt_disk_get_startoff(ba);
xfs_fileoff_t bo = xfs_bmbt_disk_get_startoff(bb);
if (ao > bo)
return 1;
else if (ao < bo)
return -1;
return 0;
}
/*
* Sort the bmap extents by fork offset or else the records will be in the
* wrong order. Ensure there are no overlaps in the file offset ranges.
*/
STATIC int
xrep_bmap_sort_records(
struct xrep_bmap *rb)
{
struct xfs_bmbt_irec irec;
xfs_fileoff_t next_off = 0;
xfarray_idx_t array_cur;
int error;
error = xfarray_sort(rb->bmap_records, xrep_bmap_extent_cmp,
XFARRAY_SORT_KILLABLE);
if (error)
return error;
foreach_xfarray_idx(rb->bmap_records, array_cur) {
struct xfs_bmbt_rec rec;
if (xchk_should_terminate(rb->sc, &error))
return error;
error = xfarray_load(rb->bmap_records, array_cur, &rec);
if (error)
return error;
xfs_bmbt_disk_get_all(&rec, &irec);
if (irec.br_startoff < next_off)
return -EFSCORRUPTED;
next_off = irec.br_startoff + irec.br_blockcount;
}
return 0;
}
/* Scan one AG for reverse mappings that we can turn into extent maps. */
STATIC int
xrep_bmap_scan_ag(
struct xrep_bmap *rb,
struct xfs_perag *pag)
{
struct xfs_scrub *sc = rb->sc;
int error;
error = xrep_ag_init(sc, pag, &sc->sa);
if (error)
return error;
error = xfs_rmap_query_all(sc->sa.rmap_cur, xrep_bmap_walk_rmap, rb);
xchk_ag_free(sc, &sc->sa);
return error;
}
/* Find the delalloc extents from the old incore extent tree. */
STATIC int
xrep_bmap_find_delalloc(
struct xrep_bmap *rb)
{
struct xfs_bmbt_irec irec;
struct xfs_iext_cursor icur;
struct xfs_bmbt_rec rbe;
struct xfs_inode *ip = rb->sc->ip;
struct xfs_ifork *ifp = xfs_ifork_ptr(ip, rb->whichfork);
int error = 0;
/*
* Skip this scan if we don't expect to find delayed allocation
* reservations in this fork.
*/
if (rb->whichfork == XFS_ATTR_FORK || ip->i_delayed_blks == 0)
return 0;
for_each_xfs_iext(ifp, &icur, &irec) {
if (!isnullstartblock(irec.br_startblock))
continue;
xfs_bmbt_disk_set_all(&rbe, &irec);
trace_xrep_bmap_found(ip, rb->whichfork, &irec);
if (xchk_should_terminate(rb->sc, &error))
return error;
error = xfarray_append(rb->bmap_records, &rbe);
if (error)
return error;
}
return 0;
}
/*
* Collect block mappings for this fork of this inode and decide if we have
* enough space to rebuild. Caller is responsible for cleaning up the list if
* anything goes wrong.
*/
STATIC int
xrep_bmap_find_mappings(
struct xrep_bmap *rb)
{
struct xfs_scrub *sc = rb->sc;
struct xfs_perag *pag;
xfs_agnumber_t agno;
int error = 0;
/* Iterate the rmaps for extents. */
for_each_perag(sc->mp, agno, pag) {
error = xrep_bmap_scan_ag(rb, pag);
if (error) {
xfs_perag_rele(pag);
return error;
}
}
return xrep_bmap_find_delalloc(rb);
}
/* Retrieve real extent mappings for bulk loading the bmap btree. */
STATIC int
xrep_bmap_get_records(
struct xfs_btree_cur *cur,
unsigned int idx,
struct xfs_btree_block *block,
unsigned int nr_wanted,
void *priv)
{
struct xfs_bmbt_rec rec;
struct xfs_bmbt_irec *irec = &cur->bc_rec.b;
struct xrep_bmap *rb = priv;
union xfs_btree_rec *block_rec;
unsigned int loaded;
int error;
for (loaded = 0; loaded < nr_wanted; loaded++, idx++) {
do {
error = xfarray_load(rb->bmap_records, rb->array_cur++,
&rec);
if (error)
return error;
xfs_bmbt_disk_get_all(&rec, irec);
} while (isnullstartblock(irec->br_startblock));
block_rec = xfs_btree_rec_addr(cur, idx, block);
cur->bc_ops->init_rec_from_cur(cur, block_rec);
}
return loaded;
}
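/*
* Note (an assumption about the bulk loader contract, not stated in this
* patch): the loader calls this helper to fill each new leaf block with up
* to nr_wanted records. Delalloc mappings are skipped because only real
* extents go into the ondisk bmbt; xrep_bmap_extents_load() re-inserts
* them into the incore extent tree.
*/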
/* Feed one of the new btree blocks to the bulk loader. */
STATIC int
xrep_bmap_claim_block(
struct xfs_btree_cur *cur,
union xfs_btree_ptr *ptr,
void *priv)
{
struct xrep_bmap *rb = priv;
return xrep_newbt_claim_block(cur, &rb->new_bmapbt, ptr);
}
/* Figure out how much space we need to create the incore btree root block. */
STATIC size_t
xrep_bmap_iroot_size(
struct xfs_btree_cur *cur,
unsigned int level,
unsigned int nr_this_level,
void *priv)
{
ASSERT(level > 0);
return XFS_BMAP_BROOT_SPACE_CALC(cur->bc_mp, nr_this_level);
}
/* Update the inode counters. */
STATIC int
xrep_bmap_reset_counters(
struct xrep_bmap *rb)
{
struct xfs_scrub *sc = rb->sc;
struct xbtree_ifakeroot *ifake = &rb->new_bmapbt.ifake;
int64_t delta;
if (rb->reflink_scan == RLS_SET_IFLAG)
sc->ip->i_diflags2 |= XFS_DIFLAG2_REFLINK;
/*
* Update the inode block counts to reflect the extents we found in the
* rmapbt.
*/
delta = ifake->if_blocks - rb->old_bmbt_block_count;
sc->ip->i_nblocks = rb->nblocks + delta;
xfs_trans_log_inode(sc->tp, sc->ip, XFS_ILOG_CORE);
/*
* Adjust the quota counts by the difference in size between the old
* and new bmbt.
*/
xfs_trans_mod_dquot_byino(sc->tp, sc->ip, XFS_TRANS_DQ_BCOUNT, delta);
return 0;
}
/*
* Create a new iext tree and load it with block mappings. If the inode is
* in extents format, that's all we need to do to commit the new mappings.
* If it is in btree format, this takes care of preloading the incore tree.
*/
STATIC int
xrep_bmap_extents_load(
struct xrep_bmap *rb)
{
struct xfs_iext_cursor icur;
struct xfs_bmbt_irec irec;
struct xfs_ifork *ifp = rb->new_bmapbt.ifake.if_fork;
xfarray_idx_t array_cur;
int error;
ASSERT(ifp->if_bytes == 0);
/* Add all the mappings (incl. delalloc) to the incore extent tree. */
xfs_iext_first(ifp, &icur);
foreach_xfarray_idx(rb->bmap_records, array_cur) {
struct xfs_bmbt_rec rec;
error = xfarray_load(rb->bmap_records, array_cur, &rec);
if (error)
return error;
xfs_bmbt_disk_get_all(&rec, &irec);
xfs_iext_insert_raw(ifp, &icur, &irec);
if (!isnullstartblock(irec.br_startblock))
ifp->if_nextents++;
xfs_iext_next(ifp, &icur);
}
return xrep_ino_ensure_extent_count(rb->sc, rb->whichfork,
ifp->if_nextents);
}
/*
* Reserve new btree blocks, bulk load the bmap records into the ondisk btree,
* and load the incore extent tree.
*/
STATIC int
xrep_bmap_btree_load(
struct xrep_bmap *rb,
struct xfs_btree_cur *bmap_cur)
{
struct xfs_scrub *sc = rb->sc;
int error;
/* Compute how many blocks we'll need. */
error = xfs_btree_bload_compute_geometry(bmap_cur,
&rb->new_bmapbt.bload, rb->real_mappings);
if (error)
return error;
/* Last chance to abort before we start committing fixes. */
if (xchk_should_terminate(sc, &error))
return error;
/*
* Guess how many blocks we're going to need to rebuild an entire bmap
* from the number of extents we found, and pump up our transaction to
* have sufficient block reservation. We're allowed to exceed file
* quota to repair inconsistent metadata.
*/
error = xfs_trans_reserve_more_inode(sc->tp, sc->ip,
rb->new_bmapbt.bload.nr_blocks, 0, true);
if (error)
return error;
/* Reserve the space we'll need for the new btree. */
error = xrep_newbt_alloc_blocks(&rb->new_bmapbt,
rb->new_bmapbt.bload.nr_blocks);
if (error)
return error;
/* Add all observed bmap records. */
rb->array_cur = XFARRAY_CURSOR_INIT;
error = xfs_btree_bload(bmap_cur, &rb->new_bmapbt.bload, rb);
if (error)
return error;
/*
* Load the new bmap records into the new incore extent tree to
* preserve delalloc reservations for regular files. The directory
* code loads the extent tree during xfs_dir_open and assumes
* thereafter that it remains loaded, so we must not violate that
* assumption.
*/
return xrep_bmap_extents_load(rb);
}
/*
* Use the collected bmap information to stage a new bmap fork. If this is
* successful we'll return with the new fork information logged to the repair
* transaction but not yet committed. The caller must ensure that the inode
* is joined to the transaction; the inode will be joined to a clean
* transaction when the function returns.
*/
STATIC int
xrep_bmap_build_new_fork(
struct xrep_bmap *rb)
{
struct xfs_owner_info oinfo;
struct xfs_scrub *sc = rb->sc;
struct xfs_btree_cur *bmap_cur;
struct xbtree_ifakeroot *ifake = &rb->new_bmapbt.ifake;
int error;
error = xrep_bmap_sort_records(rb);
if (error)
return error;
/*
* Prepare to construct the new fork by initializing the new btree
* structure and creating a fake ifork in the ifakeroot structure.
*/
xfs_rmap_ino_bmbt_owner(&oinfo, sc->ip->i_ino, rb->whichfork);
error = xrep_newbt_init_inode(&rb->new_bmapbt, sc, rb->whichfork,
&oinfo);
if (error)
return error;
rb->new_bmapbt.bload.get_records = xrep_bmap_get_records;
rb->new_bmapbt.bload.claim_block = xrep_bmap_claim_block;
rb->new_bmapbt.bload.iroot_size = xrep_bmap_iroot_size;
bmap_cur = xfs_bmbt_stage_cursor(sc->mp, sc->ip, ifake);
/*
* Figure out the size and format of the new fork, then fill it with
* all the bmap records we've found. Join the inode to the transaction
* so that we can roll the transaction while holding the inode locked.
*/
if (rb->real_mappings <= XFS_IFORK_MAXEXT(sc->ip, rb->whichfork)) {
ifake->if_fork->if_format = XFS_DINODE_FMT_EXTENTS;
error = xrep_bmap_extents_load(rb);
} else {
ifake->if_fork->if_format = XFS_DINODE_FMT_BTREE;
error = xrep_bmap_btree_load(rb, bmap_cur);
}
if (error)
goto err_cur;
/*
* Install the new fork in the inode. After this point the old mapping
* data are no longer accessible and the new tree is live. We delete
* the cursor immediately after committing the staged root because the
* staged fork might be in extents format.
*/
xfs_bmbt_commit_staged_btree(bmap_cur, sc->tp, rb->whichfork);
xfs_btree_del_cursor(bmap_cur, 0);
/* Reset the inode counters now that we've changed the fork. */
error = xrep_bmap_reset_counters(rb);
if (error)
goto err_newbt;
/* Dispose of any unused blocks and the accounting information. */
error = xrep_newbt_commit(&rb->new_bmapbt);
if (error)
return error;
return xrep_roll_trans(sc);
err_cur:
if (bmap_cur)
xfs_btree_del_cursor(bmap_cur, error);
err_newbt:
xrep_newbt_cancel(&rb->new_bmapbt);
return error;
}
/*
* Now that we've logged the new inode btree, invalidate all of the old blocks
* and free them, if there were any.
*/
STATIC int
xrep_bmap_remove_old_tree(
struct xrep_bmap *rb)
{
struct xfs_scrub *sc = rb->sc;
struct xfs_owner_info oinfo;
/* Free the old bmbt blocks if they're not in use. */
xfs_rmap_ino_bmbt_owner(&oinfo, sc->ip->i_ino, rb->whichfork);
return xrep_reap_fsblocks(sc, &rb->old_bmbt_blocks, &oinfo);
}
/* Check for garbage inputs. Returns -ECANCELED if there's nothing to do. */
STATIC int
xrep_bmap_check_inputs(
struct xfs_scrub *sc,
int whichfork)
{
struct xfs_ifork *ifp = xfs_ifork_ptr(sc->ip, whichfork);
ASSERT(whichfork == XFS_DATA_FORK || whichfork == XFS_ATTR_FORK);
if (!xfs_has_rmapbt(sc->mp))
return -EOPNOTSUPP;
/* No fork means nothing to rebuild. */
if (!ifp)
return -ECANCELED;
/*
* We only know how to repair extent mappings, which is to say that we
* only support extents and btree fork format. Repairs to a local
* format fork require a higher level repair function, so we do not
* have any work to do here.
*/
switch (ifp->if_format) {
case XFS_DINODE_FMT_DEV:
case XFS_DINODE_FMT_LOCAL:
case XFS_DINODE_FMT_UUID:
return -ECANCELED;
case XFS_DINODE_FMT_EXTENTS:
case XFS_DINODE_FMT_BTREE:
break;
default:
return -EFSCORRUPTED;
}
if (whichfork == XFS_ATTR_FORK)
return 0;
/* Only files, symlinks, and directories get to have data forks. */
switch (VFS_I(sc->ip)->i_mode & S_IFMT) {
case S_IFREG:
case S_IFDIR:
case S_IFLNK:
/* ok */
break;
default:
return -EINVAL;
}
/* Don't know how to rebuild realtime data forks. */
if (XFS_IS_REALTIME_INODE(sc->ip))
return -EOPNOTSUPP;
return 0;
}
/* Set up the initial state of the reflink scan. */
static inline enum reflink_scan_state
xrep_bmap_init_reflink_scan(
struct xfs_scrub *sc,
int whichfork)
{
/* cannot share on non-reflink filesystem */
if (!xfs_has_reflink(sc->mp))
return RLS_IRRELEVANT;
/* preserve flag if it's already set */
if (xfs_is_reflink_inode(sc->ip))
return RLS_SET_IFLAG;
/* can only share regular files */
if (!S_ISREG(VFS_I(sc->ip)->i_mode))
return RLS_IRRELEVANT;
/* cannot share attr fork extents */
if (whichfork != XFS_DATA_FORK)
return RLS_IRRELEVANT;
/* cannot share realtime extents */
if (XFS_IS_REALTIME_INODE(sc->ip))
return RLS_IRRELEVANT;
return RLS_UNKNOWN;
}
/* Repair an inode fork. */
int
xrep_bmap(
struct xfs_scrub *sc,
int whichfork,
bool allow_unwritten)
{
struct xrep_bmap *rb;
char *descr;
unsigned int max_bmbt_recs;
bool large_extcount;
int error = 0;
error = xrep_bmap_check_inputs(sc, whichfork);
if (error == -ECANCELED)
return 0;
if (error)
return error;
rb = kzalloc(sizeof(struct xrep_bmap), XCHK_GFP_FLAGS);
if (!rb)
return -ENOMEM;
rb->sc = sc;
rb->whichfork = whichfork;
rb->reflink_scan = xrep_bmap_init_reflink_scan(sc, whichfork);
rb->allow_unwritten = allow_unwritten;
/* Set up enough storage to handle the max records for this fork. */
large_extcount = xfs_has_large_extent_counts(sc->mp);
max_bmbt_recs = xfs_iext_max_nextents(large_extcount, whichfork);
descr = xchk_xfile_ino_descr(sc, "%s fork mapping records",
whichfork == XFS_DATA_FORK ? "data" : "attr");
error = xfarray_create(descr, max_bmbt_recs,
sizeof(struct xfs_bmbt_rec), &rb->bmap_records);
kfree(descr);
if (error)
goto out_rb;
/* Collect all reverse mappings for this fork's extents. */
xfsb_bitmap_init(&rb->old_bmbt_blocks);
error = xrep_bmap_find_mappings(rb);
if (error)
goto out_bitmap;
xfs_trans_ijoin(sc->tp, sc->ip, 0);
/* Rebuild the bmap information. */
error = xrep_bmap_build_new_fork(rb);
if (error)
goto out_bitmap;
/* Kill the old tree. */
error = xrep_bmap_remove_old_tree(rb);
if (error)
goto out_bitmap;
out_bitmap:
xfsb_bitmap_destroy(&rb->old_bmbt_blocks);
xfarray_destroy(rb->bmap_records);
out_rb:
kfree(rb);
return error;
}
/* Repair an inode's data fork. */
int
xrep_bmap_data(
struct xfs_scrub *sc)
{
return xrep_bmap(sc, XFS_DATA_FORK, true);
}
/* Repair an inode's attr fork. */
int
xrep_bmap_attr(
struct xfs_scrub *sc)
{
return xrep_bmap(sc, XFS_ATTR_FORK, false);
}

fs/xfs/scrub/common.c

@ -25,6 +25,7 @@
#include "xfs_trans_priv.h" #include "xfs_trans_priv.h"
#include "xfs_da_format.h" #include "xfs_da_format.h"
#include "xfs_da_btree.h" #include "xfs_da_btree.h"
#include "xfs_dir2_priv.h"
#include "xfs_attr.h" #include "xfs_attr.h"
#include "xfs_reflink.h" #include "xfs_reflink.h"
#include "xfs_ag.h" #include "xfs_ag.h"
@ -604,6 +605,7 @@ xchk_ag_free(
struct xchk_ag *sa) struct xchk_ag *sa)
{ {
xchk_ag_btcur_free(sa); xchk_ag_btcur_free(sa);
xrep_reset_perag_resv(sc);
if (sa->agf_bp) { if (sa->agf_bp) {
xfs_trans_brelse(sc->tp, sa->agf_bp); xfs_trans_brelse(sc->tp, sa->agf_bp);
sa->agf_bp = NULL; sa->agf_bp = NULL;
@ -733,6 +735,8 @@ xchk_iget(
xfs_ino_t inum, xfs_ino_t inum,
struct xfs_inode **ipp) struct xfs_inode **ipp)
{ {
ASSERT(sc->tp != NULL);
return xfs_iget(sc->mp, sc->tp, inum, XFS_IGET_UNTRUSTED, 0, ipp); return xfs_iget(sc->mp, sc->tp, inum, XFS_IGET_UNTRUSTED, 0, ipp);
} }
@ -816,6 +820,26 @@ again:
return 0;
}
#ifdef CONFIG_XFS_QUOTA
/*
* Try to attach dquots to this inode if we think we might want to repair it.
* Callers must not hold any ILOCKs. If the dquots are broken and cannot be
* attached, a quotacheck will be scheduled.
*/
int
xchk_ino_dqattach(
struct xfs_scrub *sc)
{
ASSERT(sc->tp != NULL);
ASSERT(sc->ip != NULL);
if (!xchk_could_repair(sc))
return 0;
return xrep_ino_dqattach(sc);
}
#endif
/* Install an inode that we opened by handle for scrubbing. */
int
xchk_install_handle_inode(
@ -882,8 +906,8 @@ xchk_iget_for_scrubbing(
if (!xfs_verify_ino(sc->mp, sc->sm->sm_ino))
return -ENOENT;
/* Try a safe untrusted iget. */
error = xchk_iget_safe(sc, sc->sm->sm_ino, &ip);
if (!error)
return xchk_install_handle_inode(sc, ip);
if (error == -ENOENT)
@ -1027,6 +1051,11 @@ xchk_setup_inode_contents(
error = xchk_trans_alloc(sc, resblks);
if (error)
goto out;
error = xchk_ino_dqattach(sc);
if (error)
goto out;
xchk_ilock(sc, XFS_ILOCK_EXCL);
out:
/* scrub teardown will unlock and release the inode for us */
@ -1132,6 +1161,7 @@ xchk_metadata_inode_subtype(
unsigned int scrub_type)
{
__u32 smtype = sc->sm->sm_type;
unsigned int sick_mask = sc->sick_mask;
int error;
sc->sm->sm_type = scrub_type;
@ -1149,6 +1179,7 @@ xchk_metadata_inode_subtype(
break;
}
sc->sick_mask = sick_mask;
sc->sm->sm_type = smtype;
return error;
}

fs/xfs/scrub/common.h

@ -103,9 +103,15 @@ xchk_setup_rtsummary(struct xfs_scrub *sc)
}
#endif
#ifdef CONFIG_XFS_QUOTA
int xchk_ino_dqattach(struct xfs_scrub *sc);
int xchk_setup_quota(struct xfs_scrub *sc);
#else
static inline int
xchk_ino_dqattach(struct xfs_scrub *sc)
{
return 0;
}
static inline int
xchk_setup_quota(struct xfs_scrub *sc)
{
return -ENOENT;
@ -151,12 +157,37 @@ void xchk_iunlock(struct xfs_scrub *sc, unsigned int ilock_flags);
void xchk_buffer_recheck(struct xfs_scrub *sc, struct xfs_buf *bp);
/*
* Grab the inode at @inum. The caller must have created a scrub transaction
* so that we can confirm the inumber by walking the inobt and not deadlock on
* a loop in the inobt.
*/
int xchk_iget(struct xfs_scrub *sc, xfs_ino_t inum, struct xfs_inode **ipp);
int xchk_iget_agi(struct xfs_scrub *sc, xfs_ino_t inum,
struct xfs_buf **agi_bpp, struct xfs_inode **ipp);
void xchk_irele(struct xfs_scrub *sc, struct xfs_inode *ip);
int xchk_install_handle_inode(struct xfs_scrub *sc, struct xfs_inode *ip);
/*
* Safe version of (untrusted) xchk_iget that uses an empty transaction to
* avoid deadlocking on loops in the inobt. This should only be used in a
* scrub or repair setup routine, and only prior to grabbing a transaction.
*/
static inline int
xchk_iget_safe(struct xfs_scrub *sc, xfs_ino_t inum, struct xfs_inode **ipp)
{
int error;
ASSERT(sc->tp == NULL);
error = xchk_trans_alloc(sc, 0);
if (error)
return error;
error = xchk_iget(sc, inum, ipp);
xchk_trans_cancel(sc);
return error;
}
/*
* Don't bother cross-referencing if we already found corruption or cross
* referencing discrepancies.
@ -167,6 +198,8 @@ static inline bool xchk_skip_xref(struct xfs_scrub_metadata *sm)
XFS_SCRUB_OFLAG_XCORRUPT);
}
bool xchk_dir_looks_zapped(struct xfs_inode *dp);
#ifdef CONFIG_XFS_ONLINE_REPAIR
/* Decide if a repair is required. */
static inline bool xchk_needs_repair(const struct xfs_scrub_metadata *sm)
@ -175,8 +208,21 @@ static inline bool xchk_needs_repair(const struct xfs_scrub_metadata *sm)
XFS_SCRUB_OFLAG_XCORRUPT |
XFS_SCRUB_OFLAG_PREEN);
}
/*
* "Should we prepare for a repair?"
*
* Return true if the caller permits us to repair metadata and we're not
* setting up for a post-repair evaluation.
*/
static inline bool xchk_could_repair(const struct xfs_scrub *sc)
{
return (sc->sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR) &&
!(sc->flags & XREP_ALREADY_FIXED);
}
#else
# define xchk_needs_repair(sc) (false)
# define xchk_could_repair(sc) (false)
#endif /* CONFIG_XFS_ONLINE_REPAIR */
int xchk_metadata_inode_forks(struct xfs_scrub *sc);
@ -188,6 +234,16 @@ int xchk_metadata_inode_forks(struct xfs_scrub *sc);
#define xchk_xfile_descr(sc, fmt, ...) \
kasprintf(XCHK_GFP_FLAGS, "XFS (%s): " fmt, \
(sc)->mp->m_super->s_id, ##__VA_ARGS__)
#define xchk_xfile_ag_descr(sc, fmt, ...) \
kasprintf(XCHK_GFP_FLAGS, "XFS (%s): AG 0x%x " fmt, \
(sc)->mp->m_super->s_id, \
(sc)->sa.pag ? (sc)->sa.pag->pag_agno : (sc)->sm->sm_agno, \
##__VA_ARGS__)
#define xchk_xfile_ino_descr(sc, fmt, ...) \
kasprintf(XCHK_GFP_FLAGS, "XFS (%s): inode 0x%llx " fmt, \
(sc)->mp->m_super->s_id, \
(sc)->ip ? (sc)->ip->i_ino : (sc)->sm->sm_ino, \
##__VA_ARGS__)
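/*
* Example (illustrative): for inode 0x85 on a filesystem whose superblock
* id is "sda1", xchk_xfile_ino_descr(sc, "%s fork mapping records", "data")
* yields "XFS (sda1): inode 0x85 data fork mapping records".
*/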
/*
* Setting up a hook to wait for intents to drain is costly -- we have to take

fs/xfs/scrub/cow_repair.c

@ -0,0 +1,614 @@
// SPDX-License-Identifier: GPL-2.0-or-later
/*
* Copyright (C) 2022-2023 Oracle. All Rights Reserved.
* Author: Darrick J. Wong <djwong@kernel.org>
*/
#include "xfs.h"
#include "xfs_fs.h"
#include "xfs_shared.h"
#include "xfs_format.h"
#include "xfs_trans_resv.h"
#include "xfs_mount.h"
#include "xfs_defer.h"
#include "xfs_btree.h"
#include "xfs_log_format.h"
#include "xfs_trans.h"
#include "xfs_inode.h"
#include "xfs_inode_fork.h"
#include "xfs_alloc.h"
#include "xfs_bmap.h"
#include "xfs_rmap.h"
#include "xfs_refcount.h"
#include "xfs_quota.h"
#include "xfs_ialloc.h"
#include "xfs_ag.h"
#include "xfs_error.h"
#include "xfs_errortag.h"
#include "xfs_icache.h"
#include "xfs_refcount_btree.h"
#include "scrub/xfs_scrub.h"
#include "scrub/scrub.h"
#include "scrub/common.h"
#include "scrub/trace.h"
#include "scrub/repair.h"
#include "scrub/bitmap.h"
#include "scrub/off_bitmap.h"
#include "scrub/fsb_bitmap.h"
#include "scrub/reap.h"
/*
* CoW Fork Mapping Repair
* =======================
*
* Although CoW staging extents are owned by incore CoW inode forks, on disk
* they are owned by the refcount btree. The ondisk metadata does not record
* any ownership information, which limits what we can do to repair the
* mappings in the CoW fork. At most, we can replace ifork mappings that lack
* an entry in the refcount btree or are described by a reverse mapping record
* whose owner is not OWN_COW.
*
* Replacing extents is also tricky -- we can't touch written CoW fork extents
* since they are undergoing writeback, and delalloc extents do not require
* repair since they only exist incore. Hence the most we can do is find the
* bad parts of unwritten mappings, allocate a replacement set of blocks, and
* replace the incore mapping. We use the regular reaping process to unmap
* or free the discarded blocks, as appropriate.
*/
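/*
* Illustrative call graph (assembled from the functions below; a reading
* aid, not authoritative documentation):
*
* xrep_bmap_cow()
* -> xrep_cow_find_bad() mark shared, missing, or cross-linked areas
* -> xrep_cow_replace() allocate fresh blocks and swap the mappings
* -> reap the old staging extents recorded in old_cowfork_fsblocks
*/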
struct xrep_cow {
struct xfs_scrub *sc;
/* Bitmap of file offset ranges that need replacing. */
struct xoff_bitmap bad_fileoffs;
/* Bitmap of fsblocks that were removed from the CoW fork. */
struct xfsb_bitmap old_cowfork_fsblocks;
/* CoW fork mappings used to scan for bad CoW staging extents. */
struct xfs_bmbt_irec irec;
/* refcount btree block number of irec.br_startblock */
unsigned int irec_startbno;
/* refcount btree block number of the next refcount record we expect */
unsigned int next_bno;
};
/* CoW staging extent. */
struct xrep_cow_extent {
xfs_fsblock_t fsbno;
xfs_extlen_t len;
};
/*
* Mark the part of the file range that corresponds to the given physical
* space. Caller must ensure that the physical range is within xc->irec.
*/
STATIC int
xrep_cow_mark_file_range(
struct xrep_cow *xc,
xfs_fsblock_t startblock,
xfs_filblks_t blockcount)
{
xfs_fileoff_t startoff;
startoff = xc->irec.br_startoff +
(startblock - xc->irec.br_startblock);
trace_xrep_cow_mark_file_range(xc->sc->ip, startblock, startoff,
blockcount);
return xoff_bitmap_set(&xc->bad_fileoffs, startoff, blockcount);
}
/*
* Trim @src to fit within the CoW fork mapping being examined, and put the
* result in @dst.
*/
static inline void
xrep_cow_trim_refcount(
struct xrep_cow *xc,
struct xfs_refcount_irec *dst,
const struct xfs_refcount_irec *src)
{
unsigned int adj;
memcpy(dst, src, sizeof(*dst));
if (dst->rc_startblock < xc->irec_startbno) {
adj = xc->irec_startbno - dst->rc_startblock;
dst->rc_blockcount -= adj;
dst->rc_startblock += adj;
}
if (dst->rc_startblock + dst->rc_blockcount >
xc->irec_startbno + xc->irec.br_blockcount) {
adj = (dst->rc_startblock + dst->rc_blockcount) -
(xc->irec_startbno + xc->irec.br_blockcount);
dst->rc_blockcount -= adj;
}
}
/* Mark any shared CoW staging extents. */
STATIC int
xrep_cow_mark_shared_staging(
struct xfs_btree_cur *cur,
const struct xfs_refcount_irec *rec,
void *priv)
{
struct xrep_cow *xc = priv;
struct xfs_refcount_irec rrec;
xfs_fsblock_t fsbno;
if (!xfs_refcount_check_domain(rec) ||
rec->rc_domain != XFS_REFC_DOMAIN_SHARED)
return -EFSCORRUPTED;
xrep_cow_trim_refcount(xc, &rrec, rec);
fsbno = XFS_AGB_TO_FSB(xc->sc->mp, cur->bc_ag.pag->pag_agno,
rrec.rc_startblock);
return xrep_cow_mark_file_range(xc, fsbno, rrec.rc_blockcount);
}
/*
* Mark any portion of the CoW fork file offset range where there is not a CoW
* staging extent record in the refcountbt, and keep a record of where we did
* find correct refcountbt records. Staging records are always cleaned out at
* mount time, so any two inodes trying to map the same staging area would have
* already taken the fs down due to refcount btree verifier errors. Hence this
* inode should be the sole creator of the staging extent records ondisk.
*/
STATIC int
xrep_cow_mark_missing_staging(
struct xfs_btree_cur *cur,
const struct xfs_refcount_irec *rec,
void *priv)
{
struct xrep_cow *xc = priv;
struct xfs_refcount_irec rrec;
int error;
if (!xfs_refcount_check_domain(rec) ||
rec->rc_domain != XFS_REFC_DOMAIN_COW)
return -EFSCORRUPTED;
xrep_cow_trim_refcount(xc, &rrec, rec);
if (xc->next_bno >= rrec.rc_startblock)
goto next;
error = xrep_cow_mark_file_range(xc,
XFS_AGB_TO_FSB(xc->sc->mp, cur->bc_ag.pag->pag_agno,
xc->next_bno),
rrec.rc_startblock - xc->next_bno);
if (error)
return error;
next:
xc->next_bno = rrec.rc_startblock + rrec.rc_blockcount;
return 0;
}
/*
* Mark any area that does not correspond to a CoW staging rmap. These are
* cross-linked areas that must be avoided.
*/
STATIC int
xrep_cow_mark_missing_staging_rmap(
struct xfs_btree_cur *cur,
const struct xfs_rmap_irec *rec,
void *priv)
{
struct xrep_cow *xc = priv;
xfs_fsblock_t fsbno;
xfs_agblock_t rec_bno;
xfs_extlen_t rec_len;
unsigned int adj;
if (rec->rm_owner == XFS_RMAP_OWN_COW)
return 0;
rec_bno = rec->rm_startblock;
rec_len = rec->rm_blockcount;
if (rec_bno < xc->irec_startbno) {
adj = xc->irec_startbno - rec_bno;
rec_len -= adj;
rec_bno += adj;
}
if (rec_bno + rec_len > xc->irec_startbno + xc->irec.br_blockcount) {
adj = (rec_bno + rec_len) -
(xc->irec_startbno + xc->irec.br_blockcount);
rec_len -= adj;
}
fsbno = XFS_AGB_TO_FSB(xc->sc->mp, cur->bc_ag.pag->pag_agno, rec_bno);
return xrep_cow_mark_file_range(xc, fsbno, rec_len);
}
/*
* Find any part of the CoW fork mapping that isn't a single-owner CoW staging
* extent and mark the corresponding part of the file range in the bitmap.
*/
STATIC int
xrep_cow_find_bad(
struct xrep_cow *xc)
{
struct xfs_refcount_irec rc_low = { 0 };
struct xfs_refcount_irec rc_high = { 0 };
struct xfs_rmap_irec rm_low = { 0 };
struct xfs_rmap_irec rm_high = { 0 };
struct xfs_perag *pag;
struct xfs_scrub *sc = xc->sc;
xfs_agnumber_t agno;
int error;
agno = XFS_FSB_TO_AGNO(sc->mp, xc->irec.br_startblock);
xc->irec_startbno = XFS_FSB_TO_AGBNO(sc->mp, xc->irec.br_startblock);
pag = xfs_perag_get(sc->mp, agno);
if (!pag)
return -EFSCORRUPTED;
error = xrep_ag_init(sc, pag, &sc->sa);
if (error)
goto out_pag;
/* Mark any CoW fork extents that are shared. */
rc_low.rc_startblock = xc->irec_startbno;
rc_high.rc_startblock = xc->irec_startbno + xc->irec.br_blockcount - 1;
rc_low.rc_domain = rc_high.rc_domain = XFS_REFC_DOMAIN_SHARED;
error = xfs_refcount_query_range(sc->sa.refc_cur, &rc_low, &rc_high,
xrep_cow_mark_shared_staging, xc);
if (error)
goto out_sa;
/* Make sure there are CoW staging extents for the whole mapping. */
rc_low.rc_startblock = xc->irec_startbno;
rc_high.rc_startblock = xc->irec_startbno + xc->irec.br_blockcount - 1;
rc_low.rc_domain = rc_high.rc_domain = XFS_REFC_DOMAIN_COW;
xc->next_bno = xc->irec_startbno;
error = xfs_refcount_query_range(sc->sa.refc_cur, &rc_low, &rc_high,
xrep_cow_mark_missing_staging, xc);
if (error)
goto out_sa;
if (xc->next_bno < xc->irec_startbno + xc->irec.br_blockcount) {
error = xrep_cow_mark_file_range(xc,
XFS_AGB_TO_FSB(sc->mp, pag->pag_agno,
xc->next_bno),
xc->irec_startbno + xc->irec.br_blockcount -
xc->next_bno);
if (error)
goto out_sa;
}
/* Mark any area that has an rmap that isn't a COW staging extent. */
rm_low.rm_startblock = xc->irec_startbno;
memset(&rm_high, 0xFF, sizeof(rm_high));
rm_high.rm_startblock = xc->irec_startbno + xc->irec.br_blockcount - 1;
error = xfs_rmap_query_range(sc->sa.rmap_cur, &rm_low, &rm_high,
xrep_cow_mark_missing_staging_rmap, xc);
if (error)
goto out_sa;
/*
* If userspace is forcing us to rebuild the CoW fork or someone turned
* on the debugging knob, replace everything in the CoW fork.
*/
if ((sc->sm->sm_flags & XFS_SCRUB_IFLAG_FORCE_REBUILD) ||
XFS_TEST_ERROR(false, sc->mp, XFS_ERRTAG_FORCE_SCRUB_REPAIR)) {
error = xrep_cow_mark_file_range(xc, xc->irec.br_startblock,
xc->irec.br_blockcount);
if (error)
return error;
}
out_sa:
xchk_ag_free(sc, &sc->sa);
out_pag:
xfs_perag_put(pag);
return error;
}
/*
* Allocate a replacement CoW staging extent of up to the given number of
* blocks, and fill out the mapping.
*/
STATIC int
xrep_cow_alloc(
struct xfs_scrub *sc,
xfs_extlen_t maxlen,
struct xrep_cow_extent *repl)
{
struct xfs_alloc_arg args = {
.tp = sc->tp,
.mp = sc->mp,
.oinfo = XFS_RMAP_OINFO_SKIP_UPDATE,
.minlen = 1,
.maxlen = maxlen,
.prod = 1,
.resv = XFS_AG_RESV_NONE,
.datatype = XFS_ALLOC_USERDATA,
};
int error;
error = xfs_trans_reserve_more(sc->tp, maxlen, 0);
if (error)
return error;
error = xfs_alloc_vextent_start_ag(&args,
XFS_INO_TO_FSB(sc->mp, sc->ip->i_ino));
if (error)
return error;
if (args.fsbno == NULLFSBLOCK)
return -ENOSPC;
xfs_refcount_alloc_cow_extent(sc->tp, args.fsbno, args.len);
repl->fsbno = args.fsbno;
repl->len = args.len;
return 0;
}
/*
* Look up the current CoW fork mapping so that we only allocate enough to
* replace a single mapping. If we don't find a mapping that covers the start
* of the file range, or we find a delalloc or written extent, something is
* seriously wrong, since we didn't drop the ILOCK.
*/
static inline int
xrep_cow_find_mapping(
struct xrep_cow *xc,
struct xfs_iext_cursor *icur,
xfs_fileoff_t startoff,
struct xfs_bmbt_irec *got)
{
struct xfs_inode *ip = xc->sc->ip;
struct xfs_ifork *ifp = xfs_ifork_ptr(ip, XFS_COW_FORK);
if (!xfs_iext_lookup_extent(ip, ifp, startoff, icur, got))
goto bad;
if (got->br_startoff > startoff)
goto bad;
if (got->br_blockcount == 0)
goto bad;
if (isnullstartblock(got->br_startblock))
goto bad;
if (xfs_bmap_is_written_extent(got))
goto bad;
return 0;
bad:
ASSERT(0);
return -EFSCORRUPTED;
}
#define REPLACE_LEFT_SIDE (1U << 0)
#define REPLACE_RIGHT_SIDE (1U << 1)
/*
* Given a CoW fork mapping @got and a replacement mapping @repl, remap the
* beginning of @got with the space described by @repl.
*/
static inline void
xrep_cow_replace_mapping(
struct xfs_inode *ip,
struct xfs_iext_cursor *icur,
const struct xfs_bmbt_irec *got,
const struct xrep_cow_extent *repl)
{
struct xfs_bmbt_irec new = *got; /* struct copy */
ASSERT(repl->len > 0);
ASSERT(!isnullstartblock(got->br_startblock));
trace_xrep_cow_replace_mapping(ip, got, repl->fsbno, repl->len);
if (got->br_blockcount == repl->len) {
/*
* The new extent is a complete replacement for the existing
* extent. Update the COW fork record.
*/
new.br_startblock = repl->fsbno;
xfs_iext_update_extent(ip, BMAP_COWFORK, icur, &new);
return;
}
/*
* The new extent can replace the beginning of the COW fork record.
* Move the left side of @got upwards, then insert the new record.
*/
new.br_startoff += repl->len;
new.br_startblock += repl->len;
new.br_blockcount -= repl->len;
xfs_iext_update_extent(ip, BMAP_COWFORK, icur, &new);
new.br_startoff = got->br_startoff;
new.br_startblock = repl->fsbno;
new.br_blockcount = repl->len;
xfs_iext_insert(ip, icur, &new, BMAP_COWFORK);
}
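/*
 * A worked example of the partial-replacement case above (values are
 * illustrative only): if @got is { br_startoff 100, br_startblock 800,
 * br_blockcount 8 } and @repl is { fsbno 900, len 3 }, the updated record
 * becomes { 103, 803, 5 } and the inserted record becomes { 100, 900, 3 },
 * so offsets 100-102 now point at the replacement space while 103-107
 * keep their old blocks.
 */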
/*
* Replace the unwritten CoW staging extent backing the given file range with a
* new space extent that isn't as problematic.
*/
STATIC int
xrep_cow_replace_range(
struct xrep_cow *xc,
xfs_fileoff_t startoff,
xfs_extlen_t *blockcount)
{
struct xfs_iext_cursor icur;
struct xrep_cow_extent repl;
struct xfs_bmbt_irec got;
struct xfs_scrub *sc = xc->sc;
xfs_fileoff_t nextoff;
xfs_extlen_t alloc_len;
int error;
/*
* Put the existing CoW fork mapping in @got.  If @got ends before the
* requested range, truncate the range so we only replace one extent mapping
* at a time.
*/
error = xrep_cow_find_mapping(xc, &icur, startoff, &got);
if (error)
return error;
nextoff = min(startoff + *blockcount,
got.br_startoff + got.br_blockcount);
/*
* Allocate a replacement extent. If we don't fill all the blocks,
* shorten the quantity that will be deleted in this step.
*/
alloc_len = min_t(xfs_fileoff_t, XFS_MAX_BMBT_EXTLEN,
nextoff - startoff);
error = xrep_cow_alloc(sc, alloc_len, &repl);
if (error)
return error;
/*
* Replace the old mapping with the new one, and commit the metadata
* changes made so far.
*/
xrep_cow_replace_mapping(sc->ip, &icur, &got, &repl);
xfs_inode_set_cowblocks_tag(sc->ip);
error = xfs_defer_finish(&sc->tp);
if (error)
return error;
/* Note the old CoW staging extents; we'll reap them all later. */
error = xfsb_bitmap_set(&xc->old_cowfork_fsblocks, got.br_startblock,
repl.len);
if (error)
return error;
*blockcount = repl.len;
return 0;
}
/*
* Replace a bad part of an unwritten CoW staging extent with a fresh delalloc
* reservation.
*/
STATIC int
xrep_cow_replace(
uint64_t startoff,
uint64_t blockcount,
void *priv)
{
struct xrep_cow *xc = priv;
int error = 0;
while (blockcount > 0) {
xfs_extlen_t len = min_t(xfs_filblks_t, blockcount,
XFS_MAX_BMBT_EXTLEN);
error = xrep_cow_replace_range(xc, startoff, &len);
if (error)
break;
blockcount -= len;
startoff += len;
}
return error;
}
/*
* Repair an inode's CoW fork. The CoW fork is an in-core structure, so
* there's no btree to rebuild.  Instead, we replace any mappings that are
* cross-linked or lack ondisk CoW fork records in the refcount btree.
*/
int
xrep_bmap_cow(
struct xfs_scrub *sc)
{
struct xrep_cow *xc;
struct xfs_iext_cursor icur;
struct xfs_ifork *ifp = xfs_ifork_ptr(sc->ip, XFS_COW_FORK);
int error;
if (!xfs_has_rmapbt(sc->mp) || !xfs_has_reflink(sc->mp))
return -EOPNOTSUPP;
if (!ifp)
return 0;
/* realtime files aren't supported yet */
if (XFS_IS_REALTIME_INODE(sc->ip))
return -EOPNOTSUPP;
/*
* If we're somehow not in extents format, then reinitialize it to
* an empty extent mapping fork and exit.
*/
if (ifp->if_format != XFS_DINODE_FMT_EXTENTS) {
ifp->if_format = XFS_DINODE_FMT_EXTENTS;
ifp->if_nextents = 0;
return 0;
}
xc = kzalloc(sizeof(struct xrep_cow), XCHK_GFP_FLAGS);
if (!xc)
return -ENOMEM;
xfs_trans_ijoin(sc->tp, sc->ip, 0);
xc->sc = sc;
xoff_bitmap_init(&xc->bad_fileoffs);
xfsb_bitmap_init(&xc->old_cowfork_fsblocks);
for_each_xfs_iext(ifp, &icur, &xc->irec) {
if (xchk_should_terminate(sc, &error))
goto out_bitmap;
/*
* delalloc reservations only exist incore, so there is no
* ondisk metadata that we can examine. Hence we leave them
* alone.
*/
if (isnullstartblock(xc->irec.br_startblock))
continue;
/*
* COW fork extents are only in the written state if writeback
* is actively writing to disk. We cannot restart the write
* at a different disk address since we've already issued the
* IO, so we leave these alone and hope for the best.
*/
if (xfs_bmap_is_written_extent(&xc->irec))
continue;
error = xrep_cow_find_bad(xc);
if (error)
goto out_bitmap;
}
/* Replace any bad unwritten mappings with fresh reservations. */
error = xoff_bitmap_walk(&xc->bad_fileoffs, xrep_cow_replace, xc);
if (error)
goto out_bitmap;
/*
* Reap as many of the old CoW blocks as we can. They are owned ondisk
* by the refcount btree, not the inode, so it is correct to treat them
* like inode metadata.
*/
error = xrep_reap_fsblocks(sc, &xc->old_cowfork_fsblocks,
&XFS_RMAP_OINFO_COW);
if (error)
goto out_bitmap;
out_bitmap:
xfsb_bitmap_destroy(&xc->old_cowfork_fsblocks);
xoff_bitmap_destroy(&xc->bad_fileoffs);
kmem_free(xc);
return error;
}

fs/xfs/scrub/dir.c

@@ -15,10 +15,12 @@
 #include "xfs_icache.h"
 #include "xfs_dir2.h"
 #include "xfs_dir2_priv.h"
+#include "xfs_health.h"
 #include "scrub/scrub.h"
 #include "scrub/common.h"
 #include "scrub/dabtree.h"
 #include "scrub/readdir.h"
+#include "scrub/health.h"
 
 /* Set us up to scrub directories. */
 int
@@ -760,6 +762,11 @@ xchk_directory(
 	if (!S_ISDIR(VFS_I(sc->ip)->i_mode))
 		return -ENOENT;
 
+	if (xchk_file_looks_zapped(sc, XFS_SICK_INO_DIR_ZAPPED)) {
+		xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, 0);
+		return 0;
+	}
+
 	/* Plausible size? */
 	if (sc->ip->i_disk_size < xfs_dir2_sf_hdr_size(0)) {
 		xchk_ino_set_corrupt(sc, sc->ip->i_ino);
@@ -784,7 +791,36 @@ xchk_directory(
 	/* Look up every name in this directory by hash. */
 	error = xchk_dir_walk(sc, sc->ip, xchk_dir_actor, NULL);
-	if (error == -ECANCELED)
-		error = 0;
-	return error;
+	if (error && error != -ECANCELED)
+		return error;
+
+	/* If the dir is clean, it is clearly not zapped. */
+	xchk_mark_healthy_if_clean(sc, XFS_SICK_INO_DIR_ZAPPED);
+	return 0;
+}
+
+/*
+ * Decide if this directory has been zapped to satisfy the inode and ifork
+ * verifiers.  Checking and repairing should be postponed until the directory
+ * is fixed.
+ */
+bool
+xchk_dir_looks_zapped(
+	struct xfs_inode	*dp)
+{
+	/* Repair zapped this dir's data fork a short time ago */
+	if (xfs_ifork_zapped(dp, XFS_DATA_FORK))
+		return true;
+
+	/*
+	 * If the dinode repair found a bad data fork, it will reset the fork
+	 * to extents format with zero records and wait for the bmapbtd
+	 * scrubber to reconstruct the block mappings.  Directories always
+	 * contain some content, so this is a clear sign of a zapped directory.
+	 * The state checked by xfs_ifork_zapped is not persisted, so this is
+	 * the secondary strategy if repairs are interrupted by a crash or an
+	 * unmount.
+	 */
+	return dp->i_df.if_format == XFS_DINODE_FMT_EXTENTS &&
+	       dp->i_df.if_nextents == 0;
 }

fs/xfs/scrub/dqiterate.c (new file)

@@ -0,0 +1,211 @@
// SPDX-License-Identifier: GPL-2.0-or-later
/*
* Copyright (C) 2023 Oracle. All Rights Reserved.
* Author: Darrick J. Wong <djwong@kernel.org>
*/
#include "xfs.h"
#include "xfs_fs.h"
#include "xfs_shared.h"
#include "xfs_bit.h"
#include "xfs_format.h"
#include "xfs_trans_resv.h"
#include "xfs_mount.h"
#include "xfs_log_format.h"
#include "xfs_trans.h"
#include "xfs_inode.h"
#include "xfs_quota.h"
#include "xfs_qm.h"
#include "xfs_bmap.h"
#include "scrub/scrub.h"
#include "scrub/common.h"
#include "scrub/quota.h"
#include "scrub/trace.h"
/* Initialize a dquot iteration cursor. */
void
xchk_dqiter_init(
struct xchk_dqiter *cursor,
struct xfs_scrub *sc,
xfs_dqtype_t dqtype)
{
cursor->sc = sc;
cursor->bmap.br_startoff = NULLFILEOFF;
cursor->dqtype = dqtype & XFS_DQTYPE_REC_MASK;
cursor->quota_ip = xfs_quota_inode(sc->mp, cursor->dqtype);
cursor->id = 0;
}
/*
* Ensure that the cached data fork mapping for the dqiter cursor is fresh and
* covers the dquot pointed to by the scan cursor.
*/
STATIC int
xchk_dquot_iter_revalidate_bmap(
struct xchk_dqiter *cursor)
{
struct xfs_quotainfo *qi = cursor->sc->mp->m_quotainfo;
struct xfs_ifork *ifp = xfs_ifork_ptr(cursor->quota_ip,
XFS_DATA_FORK);
xfs_fileoff_t fileoff;
xfs_dqid_t this_id = cursor->id;
int nmaps = 1;
int error;
fileoff = this_id / qi->qi_dqperchunk;
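/*
 * The quota file packs qi_dqperchunk dquot records into each fs block,
 * so the block offset for this id is the id divided by that count.
 */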
/*
* If we have a mapping for cursor->id and it's still fresh, there's
* no need to reread the bmbt.
*/
if (cursor->bmap.br_startoff != NULLFILEOFF &&
cursor->if_seq == ifp->if_seq &&
cursor->bmap.br_startoff + cursor->bmap.br_blockcount > fileoff)
return 0;
/* Look up the data fork mapping for the dquot id of interest. */
error = xfs_bmapi_read(cursor->quota_ip, fileoff,
XFS_MAX_FILEOFF - fileoff, &cursor->bmap, &nmaps, 0);
if (error)
return error;
if (!nmaps) {
ASSERT(nmaps > 0);
return -EFSCORRUPTED;
}
if (cursor->bmap.br_startoff > fileoff) {
ASSERT(cursor->bmap.br_startoff == fileoff);
return -EFSCORRUPTED;
}
cursor->if_seq = ifp->if_seq;
trace_xchk_dquot_iter_revalidate_bmap(cursor, cursor->id);
return 0;
}
/* Advance the dqiter cursor to the next non-sparse region of the quota file. */
STATIC int
xchk_dquot_iter_advance_bmap(
struct xchk_dqiter *cursor,
uint64_t *next_ondisk_id)
{
struct xfs_quotainfo *qi = cursor->sc->mp->m_quotainfo;
struct xfs_ifork *ifp = xfs_ifork_ptr(cursor->quota_ip,
XFS_DATA_FORK);
xfs_fileoff_t fileoff;
uint64_t next_id;
int nmaps = 1;
int error;
/* Find the dquot id for the next non-hole mapping. */
do {
fileoff = cursor->bmap.br_startoff + cursor->bmap.br_blockcount;
if (fileoff > XFS_DQ_ID_MAX / qi->qi_dqperchunk) {
/* The hole goes beyond the max dquot id, we're done */
*next_ondisk_id = -1ULL;
return 0;
}
error = xfs_bmapi_read(cursor->quota_ip, fileoff,
XFS_MAX_FILEOFF - fileoff, &cursor->bmap,
&nmaps, 0);
if (error)
return error;
if (!nmaps) {
/* Must have reached the end of the mappings. */
*next_ondisk_id = -1ULL;
return 0;
}
if (cursor->bmap.br_startoff > fileoff) {
ASSERT(cursor->bmap.br_startoff == fileoff);
return -EFSCORRUPTED;
}
} while (!xfs_bmap_is_real_extent(&cursor->bmap));
next_id = cursor->bmap.br_startoff * qi->qi_dqperchunk;
if (next_id > XFS_DQ_ID_MAX) {
/* The next allocated block starts beyond the max dquot id; we're done */
*next_ondisk_id = -1ULL;
return 0;
}
/* Propose jumping forward to the dquot in the next allocated block. */
*next_ondisk_id = next_id;
cursor->if_seq = ifp->if_seq;
trace_xchk_dquot_iter_advance_bmap(cursor, *next_ondisk_id);
return 0;
}
/*
* Find the id of the next highest incore dquot. Normally this will correspond
* exactly with the quota file block mappings, but repair might have erased a
* mapping because it was crosslinked; in that case, we need to re-allocate the
* space so that we can reset q_blkno.
*/
STATIC void
xchk_dquot_iter_advance_incore(
struct xchk_dqiter *cursor,
uint64_t *next_incore_id)
{
struct xfs_quotainfo *qi = cursor->sc->mp->m_quotainfo;
struct radix_tree_root *tree = xfs_dquot_tree(qi, cursor->dqtype);
struct xfs_dquot *dq;
unsigned int nr_found;
*next_incore_id = -1ULL;
mutex_lock(&qi->qi_tree_lock);
nr_found = radix_tree_gang_lookup(tree, (void **)&dq, cursor->id, 1);
if (nr_found)
*next_incore_id = dq->q_id;
mutex_unlock(&qi->qi_tree_lock);
trace_xchk_dquot_iter_advance_incore(cursor, *next_incore_id);
}
/*
* Walk all incore dquots of this filesystem.  The caller must initialize
* @cursor with xchk_dqiter_init before the first call, and must not hold the
* quota file ILOCK.  Returns 1 and a valid *@dqpp; 0 and *@dqpp == NULL when
* there are no more dquots to iterate; or a negative errno.
*/
int
xchk_dquot_iter(
struct xchk_dqiter *cursor,
struct xfs_dquot **dqpp)
{
struct xfs_mount *mp = cursor->sc->mp;
struct xfs_dquot *dq = NULL;
uint64_t next_ondisk, next_incore = -1ULL;
unsigned int lock_mode;
int error = 0;
if (cursor->id > XFS_DQ_ID_MAX)
return 0;
next_ondisk = cursor->id;
/* Revalidate and/or advance the cursor. */
lock_mode = xfs_ilock_data_map_shared(cursor->quota_ip);
error = xchk_dquot_iter_revalidate_bmap(cursor);
if (!error && !xfs_bmap_is_real_extent(&cursor->bmap))
error = xchk_dquot_iter_advance_bmap(cursor, &next_ondisk);
xfs_iunlock(cursor->quota_ip, lock_mode);
if (error)
return error;
if (next_ondisk > cursor->id)
xchk_dquot_iter_advance_incore(cursor, &next_incore);
/* Pick the next dquot in the sequence and return it. */
cursor->id = min(next_ondisk, next_incore);
if (cursor->id > XFS_DQ_ID_MAX)
return 0;
trace_xchk_dquot_iter(cursor, cursor->id);
error = xfs_qm_dqget(mp, cursor->id, cursor->dqtype, false, &dq);
if (error)
return error;
cursor->id = dq->q_id + 1;
*dqpp = dq;
return 1;
}
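/*
 * Editorial sketch (not part of this patch): how a caller might drive the
 * iterator above.  xchk_example_count_dquots is a hypothetical name used
 * only for illustration.
 */
STATIC int
xchk_example_count_dquots(
	struct xfs_scrub	*sc,
	uint64_t		*count)
{
	struct xchk_dqiter	cursor = { };
	struct xfs_dquot	*dq;
	int			error;

	*count = 0;
	xchk_dqiter_init(&cursor, sc, XFS_DQTYPE_USER);
	while ((error = xchk_dquot_iter(&cursor, &dq)) == 1) {
		/* examine the dquot here, then release our reference */
		(*count)++;
		xfs_qm_dqput(dq);
	}
	return error;
}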

fs/xfs/scrub/fsb_bitmap.h (new file)

@@ -0,0 +1,37 @@
// SPDX-License-Identifier: GPL-2.0-or-later
/*
* Copyright (C) 2018-2023 Oracle. All Rights Reserved.
* Author: Darrick J. Wong <djwong@kernel.org>
*/
#ifndef __XFS_SCRUB_FSB_BITMAP_H__
#define __XFS_SCRUB_FSB_BITMAP_H__
/* Bitmaps, but type-checked for xfs_fsblock_t */
struct xfsb_bitmap {
struct xbitmap64 fsbitmap;
};
static inline void xfsb_bitmap_init(struct xfsb_bitmap *bitmap)
{
xbitmap64_init(&bitmap->fsbitmap);
}
static inline void xfsb_bitmap_destroy(struct xfsb_bitmap *bitmap)
{
xbitmap64_destroy(&bitmap->fsbitmap);
}
static inline int xfsb_bitmap_set(struct xfsb_bitmap *bitmap,
xfs_fsblock_t start, xfs_filblks_t len)
{
return xbitmap64_set(&bitmap->fsbitmap, start, len);
}
static inline int xfsb_bitmap_walk(struct xfsb_bitmap *bitmap,
xbitmap64_walk_fn fn, void *priv)
{
return xbitmap64_walk(&bitmap->fsbitmap, fn, priv);
}
#endif /* __XFS_SCRUB_FSB_BITMAP_H__ */
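/*
 * Editorial note: a minimal usage sketch of the wrapper above (reap_fn and
 * priv are hypothetical).  The point of the one-member struct is that
 * handing an AG-block bitmap to code expecting xfs_fsblock_t keys now
 * fails to compile instead of silently mixing address spaces:
 *
 *	struct xfsb_bitmap	fsbs;
 *
 *	xfsb_bitmap_init(&fsbs);
 *	error = xfsb_bitmap_set(&fsbs, fsbno, len);
 *	if (!error)
 *		error = xfsb_bitmap_walk(&fsbs, reap_fn, priv);
 *	xfsb_bitmap_destroy(&fsbs);
 */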

fs/xfs/scrub/health.c

@@ -10,8 +10,6 @@
 #include "xfs_trans_resv.h"
 #include "xfs_mount.h"
 #include "xfs_btree.h"
-#include "xfs_trans_resv.h"
-#include "xfs_mount.h"
 #include "xfs_ag.h"
 #include "xfs_health.h"
 #include "scrub/scrub.h"
@@ -117,6 +115,38 @@ xchk_health_mask_for_scrub_type(
 	return type_to_health_flag[scrub_type].sick_mask;
 }
 
+/*
+ * If the scrub state is clean, add @mask to the scrub sick mask to clear
+ * additional sick flags from the metadata object's sick state.
+ */
+void
+xchk_mark_healthy_if_clean(
+	struct xfs_scrub	*sc,
+	unsigned int		mask)
+{
+	if (!(sc->sm->sm_flags & (XFS_SCRUB_OFLAG_CORRUPT |
+				  XFS_SCRUB_OFLAG_XCORRUPT)))
+		sc->sick_mask |= mask;
+}
+
+/*
+ * If we're scrubbing a piece of file metadata for the first time, does it look
+ * like it has been zapped?  Skip the check if we just repaired the metadata
+ * and are revalidating it.
+ */
+bool
+xchk_file_looks_zapped(
+	struct xfs_scrub	*sc,
+	unsigned int		mask)
+{
+	ASSERT((mask & ~XFS_SICK_INO_ZAPPED) == 0);
+
+	if (sc->flags & XREP_ALREADY_FIXED)
+		return false;
+
+	return xfs_inode_has_sickness(sc->ip, mask);
+}
+
 /*
  * Update filesystem health assessments based on what we found and did.
  *

fs/xfs/scrub/health.h

@@ -10,5 +10,7 @@ unsigned int xchk_health_mask_for_scrub_type(__u32 scrub_type);
 void xchk_update_health(struct xfs_scrub *sc);
 bool xchk_ag_btree_healthy_enough(struct xfs_scrub *sc, struct xfs_perag *pag,
 		xfs_btnum_t btnum);
+void xchk_mark_healthy_if_clean(struct xfs_scrub *sc, unsigned int mask);
+bool xchk_file_looks_zapped(struct xfs_scrub *sc, unsigned int mask);
 
 #endif /* __XFS_SCRUB_HEALTH_H__ */

fs/xfs/scrub/ialloc.c

@@ -585,7 +585,7 @@ xchk_iallocbt_rec(
 	uint16_t		holemask;
 
 	xfs_inobt_btrec_to_irec(mp, rec, &irec);
-	if (xfs_inobt_check_irec(bs->cur, &irec) != NULL) {
+	if (xfs_inobt_check_irec(bs->cur->bc_ag.pag, &irec) != NULL) {
 		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
 		return 0;
 	}
@@ -708,11 +708,10 @@ xchk_iallocbt_xref_rmap_inodes(
 		xchk_btree_xref_set_corrupt(sc, sc->sa.rmap_cur, 0);
 }
 
-/* Scrub the inode btrees for some AG. */
-STATIC int
+/* Scrub one of the inode btrees for some AG. */
+int
 xchk_iallocbt(
-	struct xfs_scrub	*sc,
-	xfs_btnum_t		which)
+	struct xfs_scrub	*sc)
 {
 	struct xfs_btree_cur	*cur;
 	struct xchk_iallocbt	iabt = {
@@ -720,9 +719,23 @@ xchk_iallocbt(
 		.next_startino	= NULLAGINO,
 		.next_cluster_ino = NULLAGINO,
 	};
+	xfs_btnum_t		which;
 	int			error;
 
-	cur = which == XFS_BTNUM_INO ? sc->sa.ino_cur : sc->sa.fino_cur;
+	switch (sc->sm->sm_type) {
+	case XFS_SCRUB_TYPE_INOBT:
+		cur = sc->sa.ino_cur;
+		which = XFS_BTNUM_INO;
+		break;
+	case XFS_SCRUB_TYPE_FINOBT:
+		cur = sc->sa.fino_cur;
+		which = XFS_BTNUM_FINO;
+		break;
+	default:
+		ASSERT(0);
+		return -EIO;
+	}
+
 	error = xchk_btree(sc, cur, xchk_iallocbt_rec, &XFS_RMAP_OINFO_INOBT,
 			&iabt);
 	if (error)
@@ -743,20 +756,6 @@ xchk_iallocbt(
 	return error;
 }
 
-int
-xchk_inobt(
-	struct xfs_scrub	*sc)
-{
-	return xchk_iallocbt(sc, XFS_BTNUM_INO);
-}
-
-int
-xchk_finobt(
-	struct xfs_scrub	*sc)
-{
-	return xchk_iallocbt(sc, XFS_BTNUM_FINO);
-}
-
 /* See if an inode btree has (or doesn't have) an inode chunk record. */
 static inline void
 xchk_xref_inode_check(

fs/xfs/scrub/ialloc_repair.c (new file)

@@ -0,0 +1,884 @@
// SPDX-License-Identifier: GPL-2.0-or-later
/*
* Copyright (C) 2018-2023 Oracle. All Rights Reserved.
* Author: Darrick J. Wong <djwong@kernel.org>
*/
#include "xfs.h"
#include "xfs_fs.h"
#include "xfs_shared.h"
#include "xfs_format.h"
#include "xfs_trans_resv.h"
#include "xfs_mount.h"
#include "xfs_defer.h"
#include "xfs_btree.h"
#include "xfs_btree_staging.h"
#include "xfs_bit.h"
#include "xfs_log_format.h"
#include "xfs_trans.h"
#include "xfs_sb.h"
#include "xfs_inode.h"
#include "xfs_alloc.h"
#include "xfs_ialloc.h"
#include "xfs_ialloc_btree.h"
#include "xfs_icache.h"
#include "xfs_rmap.h"
#include "xfs_rmap_btree.h"
#include "xfs_log.h"
#include "xfs_trans_priv.h"
#include "xfs_error.h"
#include "xfs_health.h"
#include "xfs_ag.h"
#include "scrub/xfs_scrub.h"
#include "scrub/scrub.h"
#include "scrub/common.h"
#include "scrub/btree.h"
#include "scrub/trace.h"
#include "scrub/repair.h"
#include "scrub/bitmap.h"
#include "scrub/agb_bitmap.h"
#include "scrub/xfile.h"
#include "scrub/xfarray.h"
#include "scrub/newbt.h"
#include "scrub/reap.h"
/*
* Inode Btree Repair
* ==================
*
* A quick refresher of inode btrees on a v5 filesystem:
*
* - Inode records are read into memory in units of 'inode clusters'.  The
*   number of inodes that fit in a cluster buffer is the smallest number of
*   inodes that can be allocated or freed.  Clusters are never smaller than
*   one fs block
* though they can span multiple blocks. The size (in fs blocks) is
* computed with xfs_icluster_size_fsb(). The fs block alignment of a
* cluster is computed with xfs_ialloc_cluster_alignment().
*
* - Each inode btree record can describe a single 'inode chunk'. The chunk
* size is defined to be 64 inodes. If sparse inodes are enabled, every
* inobt record must be aligned to the chunk size; if not, every record must
* be aligned to the start of a cluster. It is possible to construct an XFS
* geometry where one inobt record maps to multiple inode clusters; it is
* also possible to construct a geometry where multiple inobt records map to
* different parts of one inode cluster.
*
* - If sparse inodes are not enabled, the smallest unit of allocation for
* inode records is enough to contain one inode chunk's worth of inodes.
*
* - If sparse inodes are enabled, the holemask field will be active. Each
* bit of the holemask represents 4 potential inodes; if set, the
* corresponding space does *not* contain inodes and must be left alone.
* Clusters cannot be smaller than 4 inodes. The smallest unit of allocation
* of inode records is one inode cluster.
*
* So what's the rebuild algorithm?
*
* Iterate the reverse mapping records looking for OWN_INODES and OWN_INOBT
* records. The OWN_INOBT records are the old inode btree blocks and will be
* cleared out after we've rebuilt the tree. Each possible inode cluster
* within an OWN_INODES record will be read in; for each possible inobt record
* associated with that cluster, compute the freemask calculated from the
* i_mode data in the inode chunk. For sparse inodes the holemask will be
* calculated by creating the properly aligned inobt record and punching out
* any chunk that's missing. Inode allocations and frees grab the AGI first,
* so repair protects itself from concurrent access by locking the AGI.
*
* Once we've reconstructed all the inode records, we can create new inode
* btree roots and reload the btrees. We rebuild both inode trees at the same
* time because they have the same rmap owner and it would be more complex to
* figure out if the other tree isn't in need of a rebuild and which OWN_INOBT
* blocks it owns. We have all the data we need to build both, so dump
* everything and start over.
*
* We use the prefix 'xrep_ibt' because we rebuild both inode btrees at once.
*/
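/*
 * A worked example of the sparse case (geometry is illustrative only):
 * with 512-byte inodes and 8KiB cluster buffers, a cluster holds 16
 * inodes and a 64-inode chunk spans four clusters.  If only the first
 * cluster of a chunk is actually allocated, the reconstructed record
 * gets ir_count = 16 and ir_holemask = 0xFFF0: each holemask bit stands
 * for 4 inodes, and only the low four bits (inodes 0-15) are backed by
 * real disk space.
 */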
struct xrep_ibt {
/* Record under construction. */
struct xfs_inobt_rec_incore rie;
/* new inobt information */
struct xrep_newbt new_inobt;
/* new finobt information */
struct xrep_newbt new_finobt;
/* Old inode btree blocks we found in the rmap. */
struct xagb_bitmap old_iallocbt_blocks;
/* Reconstructed inode records. */
struct xfarray *inode_records;
struct xfs_scrub *sc;
/* Number of inodes assigned disk space. */
unsigned int icount;
/* Number of inodes in use. */
unsigned int iused;
/* Number of finobt records needed. */
unsigned int finobt_recs;
/* get_records()'s position in the inode record array. */
xfarray_idx_t array_cur;
};
/*
* Is this inode in use? If the inode is in memory we can tell from i_mode,
* otherwise we have to check di_mode in the on-disk buffer. We only care
* that the high (i.e. non-permission) bits of the mode are zero. This should
* be safe because repair keeps all AG headers locked until the end, and any
* process trying to perform an inode allocation/free must lock the AGI.
*
* @cluster_ag_base is the inode offset of the cluster within the AG.
* @cluster_bp is the cluster buffer.
* @cluster_index is the inode offset within the inode cluster.
*/
STATIC int
xrep_ibt_check_ifree(
struct xrep_ibt *ri,
xfs_agino_t cluster_ag_base,
struct xfs_buf *cluster_bp,
unsigned int cluster_index,
bool *inuse)
{
struct xfs_scrub *sc = ri->sc;
struct xfs_mount *mp = sc->mp;
struct xfs_dinode *dip;
xfs_ino_t fsino;
xfs_agino_t agino;
xfs_agnumber_t agno = ri->sc->sa.pag->pag_agno;
unsigned int cluster_buf_base;
unsigned int offset;
int error;
agino = cluster_ag_base + cluster_index;
fsino = XFS_AGINO_TO_INO(mp, agno, agino);
/* Inode uncached or half assembled, read disk buffer */
cluster_buf_base = XFS_INO_TO_OFFSET(mp, cluster_ag_base);
offset = (cluster_buf_base + cluster_index) * mp->m_sb.sb_inodesize;
if (offset >= BBTOB(cluster_bp->b_length))
return -EFSCORRUPTED;
dip = xfs_buf_offset(cluster_bp, offset);
if (be16_to_cpu(dip->di_magic) != XFS_DINODE_MAGIC)
return -EFSCORRUPTED;
if (dip->di_version >= 3 && be64_to_cpu(dip->di_ino) != fsino)
return -EFSCORRUPTED;
/* Will the in-core inode tell us if it's in use? */
error = xchk_inode_is_allocated(sc, agino, inuse);
if (!error)
return 0;
*inuse = dip->di_mode != 0;
return 0;
}
/* Stash the accumulated inobt record for rebuilding. */
STATIC int
xrep_ibt_stash(
struct xrep_ibt *ri)
{
int error = 0;
if (xchk_should_terminate(ri->sc, &error))
return error;
ri->rie.ir_freecount = xfs_inobt_rec_freecount(&ri->rie);
if (xfs_inobt_check_irec(ri->sc->sa.pag, &ri->rie) != NULL)
return -EFSCORRUPTED;
if (ri->rie.ir_freecount > 0)
ri->finobt_recs++;
trace_xrep_ibt_found(ri->sc->mp, ri->sc->sa.pag->pag_agno, &ri->rie);
error = xfarray_append(ri->inode_records, &ri->rie);
if (error)
return error;
ri->rie.ir_startino = NULLAGINO;
return 0;
}
/*
* Given an extent of inodes and an inode cluster buffer, calculate the
* location of the corresponding inobt record (creating it if necessary),
* then update the parts of the holemask and freemask of that record that
* correspond to the inode extent we were given.
*
* @cluster_ir_startino is the AG inode number of an inobt record that we're
* proposing to create for this inode cluster. If sparse inodes are enabled,
* we must round down to a chunk boundary to find the actual sparse record.
* @cluster_bp is the buffer of the inode cluster.
* @nr_inodes is the number of inodes to check from the cluster.
*/
STATIC int
xrep_ibt_cluster_record(
struct xrep_ibt *ri,
xfs_agino_t cluster_ir_startino,
struct xfs_buf *cluster_bp,
unsigned int nr_inodes)
{
struct xfs_scrub *sc = ri->sc;
struct xfs_mount *mp = sc->mp;
xfs_agino_t ir_startino;
unsigned int cluster_base;
unsigned int cluster_index;
int error = 0;
ir_startino = cluster_ir_startino;
if (xfs_has_sparseinodes(mp))
ir_startino = rounddown(ir_startino, XFS_INODES_PER_CHUNK);
cluster_base = cluster_ir_startino - ir_startino;
/*
* If the accumulated inobt record doesn't map this cluster, add it to
* the list and reset it.
*/
if (ri->rie.ir_startino != NULLAGINO &&
ri->rie.ir_startino + XFS_INODES_PER_CHUNK <= ir_startino) {
error = xrep_ibt_stash(ri);
if (error)
return error;
}
if (ri->rie.ir_startino == NULLAGINO) {
ri->rie.ir_startino = ir_startino;
ri->rie.ir_free = XFS_INOBT_ALL_FREE;
ri->rie.ir_holemask = 0xFFFF;
ri->rie.ir_count = 0;
}
/* Record the whole cluster. */
ri->icount += nr_inodes;
ri->rie.ir_count += nr_inodes;
ri->rie.ir_holemask &= ~xfs_inobt_maskn(
cluster_base / XFS_INODES_PER_HOLEMASK_BIT,
nr_inodes / XFS_INODES_PER_HOLEMASK_BIT);
/* Which inodes within this cluster are free? */
for (cluster_index = 0; cluster_index < nr_inodes; cluster_index++) {
bool inuse = false;
error = xrep_ibt_check_ifree(ri, cluster_ir_startino,
cluster_bp, cluster_index, &inuse);
if (error)
return error;
if (!inuse)
continue;
ri->iused++;
ri->rie.ir_free &= ~XFS_INOBT_MASK(cluster_base +
cluster_index);
}
return 0;
}
/*
* For each inode cluster covering the physical extent recorded by the rmapbt,
* we must calculate the properly aligned startino of that cluster, then
* iterate each cluster to fill in used and filled masks appropriately. We
* then use the (startino, used, filled) information to construct the
* appropriate inode records.
*/
STATIC int
xrep_ibt_process_cluster(
struct xrep_ibt *ri,
xfs_agblock_t cluster_bno)
{
struct xfs_imap imap;
struct xfs_buf *cluster_bp;
struct xfs_scrub *sc = ri->sc;
struct xfs_mount *mp = sc->mp;
struct xfs_ino_geometry *igeo = M_IGEO(mp);
xfs_agino_t cluster_ag_base;
xfs_agino_t irec_index;
unsigned int nr_inodes;
int error;
nr_inodes = min_t(unsigned int, igeo->inodes_per_cluster,
XFS_INODES_PER_CHUNK);
/*
* Grab the inode cluster buffer. This is safe to do with a broken
* inobt because imap_to_bp directly maps the buffer without touching
* either inode btree.
*/
imap.im_blkno = XFS_AGB_TO_DADDR(mp, sc->sa.pag->pag_agno, cluster_bno);
imap.im_len = XFS_FSB_TO_BB(mp, igeo->blocks_per_cluster);
imap.im_boffset = 0;
error = xfs_imap_to_bp(mp, sc->tp, &imap, &cluster_bp);
if (error)
return error;
/*
* Record the contents of each possible inobt record mapping this
* cluster.
*/
cluster_ag_base = XFS_AGB_TO_AGINO(mp, cluster_bno);
for (irec_index = 0;
irec_index < igeo->inodes_per_cluster;
irec_index += XFS_INODES_PER_CHUNK) {
error = xrep_ibt_cluster_record(ri,
cluster_ag_base + irec_index, cluster_bp,
nr_inodes);
if (error)
break;
}
xfs_trans_brelse(sc->tp, cluster_bp);
return error;
}
/* Check for any obvious conflicts in the inode chunk extent. */
STATIC int
xrep_ibt_check_inode_ext(
struct xfs_scrub *sc,
xfs_agblock_t agbno,
xfs_extlen_t len)
{
struct xfs_mount *mp = sc->mp;
struct xfs_ino_geometry *igeo = M_IGEO(mp);
xfs_agino_t agino;
enum xbtree_recpacking outcome;
int error;
/* Inode records must be within the AG. */
if (!xfs_verify_agbext(sc->sa.pag, agbno, len))
return -EFSCORRUPTED;
/* The entire record must align to the inode cluster size. */
if (!IS_ALIGNED(agbno, igeo->blocks_per_cluster) ||
!IS_ALIGNED(agbno + len, igeo->blocks_per_cluster))
return -EFSCORRUPTED;
/*
* The entire record must also adhere to the inode cluster alignment
* size if sparse inodes are not enabled.
*/
if (!xfs_has_sparseinodes(mp) &&
(!IS_ALIGNED(agbno, igeo->cluster_align) ||
!IS_ALIGNED(agbno + len, igeo->cluster_align)))
return -EFSCORRUPTED;
/*
* On a sparse inode fs, this cluster could be part of a sparse chunk.
* Sparse clusters must be aligned to sparse chunk alignment.
*/
if (xfs_has_sparseinodes(mp) &&
(!IS_ALIGNED(agbno, mp->m_sb.sb_spino_align) ||
!IS_ALIGNED(agbno + len, mp->m_sb.sb_spino_align)))
return -EFSCORRUPTED;
/* Make sure the entire range of blocks are valid AG inodes. */
agino = XFS_AGB_TO_AGINO(mp, agbno);
if (!xfs_verify_agino(sc->sa.pag, agino))
return -EFSCORRUPTED;
agino = XFS_AGB_TO_AGINO(mp, agbno + len) - 1;
if (!xfs_verify_agino(sc->sa.pag, agino))
return -EFSCORRUPTED;
/* Make sure this isn't free space. */
error = xfs_alloc_has_records(sc->sa.bno_cur, agbno, len, &outcome);
if (error)
return error;
if (outcome != XBTREE_RECPACKING_EMPTY)
return -EFSCORRUPTED;
return 0;
}
/* Found a fragment of the old inode btrees; dispose of them later. */
STATIC int
xrep_ibt_record_old_btree_blocks(
struct xrep_ibt *ri,
const struct xfs_rmap_irec *rec)
{
if (!xfs_verify_agbext(ri->sc->sa.pag, rec->rm_startblock,
rec->rm_blockcount))
return -EFSCORRUPTED;
return xagb_bitmap_set(&ri->old_iallocbt_blocks, rec->rm_startblock,
rec->rm_blockcount);
}
/* Record extents that belong to inode cluster blocks. */
STATIC int
xrep_ibt_record_inode_blocks(
struct xrep_ibt *ri,
const struct xfs_rmap_irec *rec)
{
struct xfs_mount *mp = ri->sc->mp;
struct xfs_ino_geometry *igeo = M_IGEO(mp);
xfs_agblock_t cluster_base;
int error;
error = xrep_ibt_check_inode_ext(ri->sc, rec->rm_startblock,
rec->rm_blockcount);
if (error)
return error;
trace_xrep_ibt_walk_rmap(mp, ri->sc->sa.pag->pag_agno,
rec->rm_startblock, rec->rm_blockcount, rec->rm_owner,
rec->rm_offset, rec->rm_flags);
/*
* Record the free/hole masks for each inode cluster that could be
* mapped by this rmap record.
*/
for (cluster_base = 0;
cluster_base < rec->rm_blockcount;
cluster_base += igeo->blocks_per_cluster) {
error = xrep_ibt_process_cluster(ri,
rec->rm_startblock + cluster_base);
if (error)
return error;
}
return 0;
}
STATIC int
xrep_ibt_walk_rmap(
struct xfs_btree_cur *cur,
const struct xfs_rmap_irec *rec,
void *priv)
{
struct xrep_ibt *ri = priv;
int error = 0;
if (xchk_should_terminate(ri->sc, &error))
return error;
switch (rec->rm_owner) {
case XFS_RMAP_OWN_INOBT:
return xrep_ibt_record_old_btree_blocks(ri, rec);
case XFS_RMAP_OWN_INODES:
return xrep_ibt_record_inode_blocks(ri, rec);
}
return 0;
}
/*
* Iterate all reverse mappings to find the inodes (OWN_INODES) and the inode
* btrees (OWN_INOBT). Figure out if we have enough free space to reconstruct
* the inode btrees. The caller must clean up the lists if anything goes
* wrong.
*/
STATIC int
xrep_ibt_find_inodes(
struct xrep_ibt *ri)
{
struct xfs_scrub *sc = ri->sc;
int error;
ri->rie.ir_startino = NULLAGINO;
/* Collect all reverse mappings for inode blocks. */
xrep_ag_btcur_init(sc, &sc->sa);
error = xfs_rmap_query_all(sc->sa.rmap_cur, xrep_ibt_walk_rmap, ri);
xchk_ag_btcur_free(&sc->sa);
if (error)
return error;
/* If we have a record ready to go, add it to the array. */
if (ri->rie.ir_startino != NULLAGINO)
return xrep_ibt_stash(ri);
return 0;
}
/* Update the AGI counters. */
STATIC int
xrep_ibt_reset_counters(
struct xrep_ibt *ri)
{
struct xfs_scrub *sc = ri->sc;
struct xfs_agi *agi = sc->sa.agi_bp->b_addr;
unsigned int freecount = ri->icount - ri->iused;
/* Trigger inode count recalculation */
xfs_force_summary_recalc(sc->mp);
/*
* The AGI header contains extra information related to the inode
* btrees, so we must update those fields here.
*/
agi->agi_count = cpu_to_be32(ri->icount);
agi->agi_freecount = cpu_to_be32(freecount);
xfs_ialloc_log_agi(sc->tp, sc->sa.agi_bp,
XFS_AGI_COUNT | XFS_AGI_FREECOUNT);
/* Reinitialize with the values we just logged. */
return xrep_reinit_pagi(sc);
}
/* Retrieve finobt data for bulk load. */
STATIC int
xrep_fibt_get_records(
struct xfs_btree_cur *cur,
unsigned int idx,
struct xfs_btree_block *block,
unsigned int nr_wanted,
void *priv)
{
struct xfs_inobt_rec_incore *irec = &cur->bc_rec.i;
struct xrep_ibt *ri = priv;
union xfs_btree_rec *block_rec;
unsigned int loaded;
int error;
for (loaded = 0; loaded < nr_wanted; loaded++, idx++) {
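/*
 * Skip inobt records with no free inodes; the finobt only indexes
 * chunks that still have inodes available for allocation.
 */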
do {
error = xfarray_load(ri->inode_records,
ri->array_cur++, irec);
} while (error == 0 && xfs_inobt_rec_freecount(irec) == 0);
if (error)
return error;
block_rec = xfs_btree_rec_addr(cur, idx, block);
cur->bc_ops->init_rec_from_cur(cur, block_rec);
}
return loaded;
}
/* Retrieve inobt data for bulk load. */
STATIC int
xrep_ibt_get_records(
struct xfs_btree_cur *cur,
unsigned int idx,
struct xfs_btree_block *block,
unsigned int nr_wanted,
void *priv)
{
struct xfs_inobt_rec_incore *irec = &cur->bc_rec.i;
struct xrep_ibt *ri = priv;
union xfs_btree_rec *block_rec;
unsigned int loaded;
int error;
for (loaded = 0; loaded < nr_wanted; loaded++, idx++) {
error = xfarray_load(ri->inode_records, ri->array_cur++, irec);
if (error)
return error;
block_rec = xfs_btree_rec_addr(cur, idx, block);
cur->bc_ops->init_rec_from_cur(cur, block_rec);
}
return loaded;
}
/* Feed one of the new inobt blocks to the bulk loader. */
STATIC int
xrep_ibt_claim_block(
struct xfs_btree_cur *cur,
union xfs_btree_ptr *ptr,
void *priv)
{
struct xrep_ibt *ri = priv;
return xrep_newbt_claim_block(cur, &ri->new_inobt, ptr);
}
/* Feed one of the new finobt blocks to the bulk loader. */
STATIC int
xrep_fibt_claim_block(
struct xfs_btree_cur *cur,
union xfs_btree_ptr *ptr,
void *priv)
{
struct xrep_ibt *ri = priv;
return xrep_newbt_claim_block(cur, &ri->new_finobt, ptr);
}
/* Make sure the records do not overlap in inumber address space. */
STATIC int
xrep_ibt_check_overlap(
struct xrep_ibt *ri)
{
struct xfs_inobt_rec_incore irec;
xfarray_idx_t cur;
xfs_agino_t next_agino = 0;
int error = 0;
foreach_xfarray_idx(ri->inode_records, cur) {
if (xchk_should_terminate(ri->sc, &error))
return error;
error = xfarray_load(ri->inode_records, cur, &irec);
if (error)
return error;
if (irec.ir_startino < next_agino)
return -EFSCORRUPTED;
next_agino = irec.ir_startino + XFS_INODES_PER_CHUNK;
}
return error;
}
/* Build new inode btrees and dispose of the old one. */
STATIC int
xrep_ibt_build_new_trees(
struct xrep_ibt *ri)
{
struct xfs_scrub *sc = ri->sc;
struct xfs_btree_cur *ino_cur;
struct xfs_btree_cur *fino_cur = NULL;
xfs_fsblock_t fsbno;
bool need_finobt;
int error;
need_finobt = xfs_has_finobt(sc->mp);
/*
* Create new btrees for staging all the inobt records we collected
* earlier. The records were collected in order of increasing agino,
* so we do not have to sort them. Ensure there are no overlapping
* records.
*/
error = xrep_ibt_check_overlap(ri);
if (error)
return error;
/*
* The new inode btrees will not be rooted in the AGI until we've
* successfully rebuilt the tree.
*
* Start by setting up the inobt staging cursor.
*/
fsbno = XFS_AGB_TO_FSB(sc->mp, sc->sa.pag->pag_agno,
XFS_IBT_BLOCK(sc->mp));
xrep_newbt_init_ag(&ri->new_inobt, sc, &XFS_RMAP_OINFO_INOBT, fsbno,
XFS_AG_RESV_NONE);
ri->new_inobt.bload.claim_block = xrep_ibt_claim_block;
ri->new_inobt.bload.get_records = xrep_ibt_get_records;
ino_cur = xfs_inobt_stage_cursor(sc->sa.pag, &ri->new_inobt.afake,
XFS_BTNUM_INO);
error = xfs_btree_bload_compute_geometry(ino_cur, &ri->new_inobt.bload,
xfarray_length(ri->inode_records));
if (error)
goto err_inocur;
/* Set up finobt staging cursor. */
if (need_finobt) {
enum xfs_ag_resv_type resv = XFS_AG_RESV_METADATA;
if (sc->mp->m_finobt_nores)
resv = XFS_AG_RESV_NONE;
fsbno = XFS_AGB_TO_FSB(sc->mp, sc->sa.pag->pag_agno,
XFS_FIBT_BLOCK(sc->mp));
xrep_newbt_init_ag(&ri->new_finobt, sc, &XFS_RMAP_OINFO_INOBT,
fsbno, resv);
ri->new_finobt.bload.claim_block = xrep_fibt_claim_block;
ri->new_finobt.bload.get_records = xrep_fibt_get_records;
fino_cur = xfs_inobt_stage_cursor(sc->sa.pag,
&ri->new_finobt.afake, XFS_BTNUM_FINO);
error = xfs_btree_bload_compute_geometry(fino_cur,
&ri->new_finobt.bload, ri->finobt_recs);
if (error)
goto err_finocur;
}
/* Last chance to abort before we start committing fixes. */
if (xchk_should_terminate(sc, &error))
goto err_finocur;
/* Reserve all the space we need to build the new btrees. */
error = xrep_newbt_alloc_blocks(&ri->new_inobt,
ri->new_inobt.bload.nr_blocks);
if (error)
goto err_finocur;
if (need_finobt) {
error = xrep_newbt_alloc_blocks(&ri->new_finobt,
ri->new_finobt.bload.nr_blocks);
if (error)
goto err_finocur;
}
/* Add all inobt records. */
ri->array_cur = XFARRAY_CURSOR_INIT;
error = xfs_btree_bload(ino_cur, &ri->new_inobt.bload, ri);
if (error)
goto err_finocur;
/* Add all finobt records. */
if (need_finobt) {
ri->array_cur = XFARRAY_CURSOR_INIT;
error = xfs_btree_bload(fino_cur, &ri->new_finobt.bload, ri);
if (error)
goto err_finocur;
}
/*
* Install the new btrees in the AG header. After this point the old
* btrees are no longer accessible and the new trees are live.
*/
xfs_inobt_commit_staged_btree(ino_cur, sc->tp, sc->sa.agi_bp);
xfs_btree_del_cursor(ino_cur, 0);
if (fino_cur) {
xfs_inobt_commit_staged_btree(fino_cur, sc->tp, sc->sa.agi_bp);
xfs_btree_del_cursor(fino_cur, 0);
}
/* Reset the AGI counters now that we've changed the inode roots. */
error = xrep_ibt_reset_counters(ri);
if (error)
goto err_finobt;
/* Free unused blocks and bitmap. */
if (need_finobt) {
error = xrep_newbt_commit(&ri->new_finobt);
if (error)
goto err_inobt;
}
error = xrep_newbt_commit(&ri->new_inobt);
if (error)
return error;
return xrep_roll_ag_trans(sc);
err_finocur:
if (need_finobt)
xfs_btree_del_cursor(fino_cur, error);
err_inocur:
xfs_btree_del_cursor(ino_cur, error);
err_finobt:
if (need_finobt)
xrep_newbt_cancel(&ri->new_finobt);
err_inobt:
xrep_newbt_cancel(&ri->new_inobt);
return error;
}
/*
* Now that we've logged the roots of the new btrees, invalidate all of the
* old blocks and free them.
*/
STATIC int
xrep_ibt_remove_old_trees(
struct xrep_ibt *ri)
{
struct xfs_scrub *sc = ri->sc;
int error;
/*
* Free the old inode btree blocks if they're not in use. It's ok to
* reap with XFS_AG_RESV_NONE even if the finobt had a per-AG
* reservation because we reset the reservation before releasing the
* AGI and AGF header buffer locks.
*/
error = xrep_reap_agblocks(sc, &ri->old_iallocbt_blocks,
&XFS_RMAP_OINFO_INOBT, XFS_AG_RESV_NONE);
if (error)
return error;
/*
* If the finobt is enabled and has a per-AG reservation, make sure we
* reinitialize the per-AG reservations.
*/
if (xfs_has_finobt(sc->mp) && !sc->mp->m_finobt_nores)
sc->flags |= XREP_RESET_PERAG_RESV;
return 0;
}
/* Repair both inode btrees. */
int
xrep_iallocbt(
struct xfs_scrub *sc)
{
struct xrep_ibt *ri;
struct xfs_mount *mp = sc->mp;
char *descr;
xfs_agino_t first_agino, last_agino;
int error = 0;
/* We require the rmapbt to rebuild anything. */
if (!xfs_has_rmapbt(mp))
return -EOPNOTSUPP;
ri = kzalloc(sizeof(struct xrep_ibt), XCHK_GFP_FLAGS);
if (!ri)
return -ENOMEM;
ri->sc = sc;
/* We rebuild both inode btrees. */
sc->sick_mask = XFS_SICK_AG_INOBT | XFS_SICK_AG_FINOBT;
/* Set up enough storage to handle an AG with nothing but inodes. */
xfs_agino_range(mp, sc->sa.pag->pag_agno, &first_agino, &last_agino);
last_agino /= XFS_INODES_PER_CHUNK;
descr = xchk_xfile_ag_descr(sc, "inode index records");
error = xfarray_create(descr, last_agino,
sizeof(struct xfs_inobt_rec_incore),
&ri->inode_records);
kfree(descr);
if (error)
goto out_ri;
/* Collect the inode data and find the old btree blocks. */
xagb_bitmap_init(&ri->old_iallocbt_blocks);
error = xrep_ibt_find_inodes(ri);
if (error)
goto out_bitmap;
/* Rebuild the inode indexes. */
error = xrep_ibt_build_new_trees(ri);
if (error)
goto out_bitmap;
/* Kill the old tree. */
error = xrep_ibt_remove_old_trees(ri);
if (error)
goto out_bitmap;
out_bitmap:
xagb_bitmap_destroy(&ri->old_iallocbt_blocks);
xfarray_destroy(ri->inode_records);
out_ri:
kfree(ri);
return error;
}
/* Make sure both btrees are ok after we've rebuilt them. */
int
xrep_revalidate_iallocbt(
struct xfs_scrub *sc)
{
__u32 old_type = sc->sm->sm_type;
int error;
/*
* We must update sm_type temporarily so that the tree-to-tree cross
* reference checks will work in the correct direction, and also so
* that tracing will report correctly if there are more errors.
*/
sc->sm->sm_type = XFS_SCRUB_TYPE_INOBT;
error = xchk_iallocbt(sc);
if (error)
goto out;
if (xfs_has_finobt(sc->mp)) {
sc->sm->sm_type = XFS_SCRUB_TYPE_FINOBT;
error = xchk_iallocbt(sc);
}
out:
sc->sm->sm_type = old_type;
return error;
}

fs/xfs/scrub/inode.c

@@ -25,6 +25,7 @@
 #include "scrub/common.h"
 #include "scrub/btree.h"
 #include "scrub/trace.h"
+#include "scrub/repair.h"
 
 /* Prepare the attached inode for scrubbing. */
 static inline int
@@ -39,6 +40,10 @@ xchk_prepare_iscrub(
 	if (error)
 		return error;
 
+	error = xchk_ino_dqattach(sc);
+	if (error)
+		return error;
+
 	xchk_ilock(sc, XFS_ILOCK_EXCL);
 	return 0;
 }
@@ -95,8 +100,8 @@ xchk_setup_inode(
 	if (!xfs_verify_ino(sc->mp, sc->sm->sm_ino))
 		return -ENOENT;
 
-	/* Try a regular untrusted iget. */
-	error = xchk_iget(sc, sc->sm->sm_ino, &ip);
+	/* Try a safe untrusted iget. */
+	error = xchk_iget_safe(sc, sc->sm->sm_ino, &ip);
 	if (!error)
 		return xchk_install_handle_iscrub(sc, ip);
 	if (error == -ENOENT)
@@ -181,8 +186,11 @@ xchk_setup_inode(
 	 * saying the inode is allocated and the icache being unable to load
 	 * the inode until we can flag the corruption in xchk_inode.  The
 	 * scrub function has to note the corruption, since we're not really
-	 * supposed to do that from the setup function.
+	 * supposed to do that from the setup function.  Save the mapping to
+	 * make repairs to the ondisk inode buffer.
 	 */
+	if (xchk_could_repair(sc))
+		xrep_setup_inode(sc, &imap);
+
 	return 0;
 
 out_cancel:
@@ -338,6 +346,10 @@ xchk_inode_flags2(
 	if (xfs_dinode_has_bigtime(dip) && !xfs_has_bigtime(mp))
 		goto bad;
 
+	/* no large extent counts without the filesystem feature */
+	if ((flags2 & XFS_DIFLAG2_NREXT64) && !xfs_has_large_extent_counts(mp))
+		goto bad;
+
 	return;
 bad:
 	xchk_ino_set_corrupt(sc, ino);
@@ -548,7 +560,7 @@ xchk_dinode(
 	}
 
 	/* di_forkoff */
-	if (XFS_DFORK_APTR(dip) >= (char *)dip + mp->m_sb.sb_inodesize)
+	if (XFS_DFORK_BOFF(dip) >= mp->m_sb.sb_inodesize)
 		xchk_ino_set_corrupt(sc, ino);
 	if (naextents != 0 && dip->di_forkoff == 0)
 		xchk_ino_set_corrupt(sc, ino);

fs/xfs/scrub/inode_repair.c (new file, 1525 lines; diff too large to show here)

fs/xfs/scrub/newbt.c (new file)

@@ -0,0 +1,559 @@
// SPDX-License-Identifier: GPL-2.0-or-later
/*
* Copyright (C) 2022-2023 Oracle. All Rights Reserved.
* Author: Darrick J. Wong <djwong@kernel.org>
*/
#include "xfs.h"
#include "xfs_fs.h"
#include "xfs_shared.h"
#include "xfs_format.h"
#include "xfs_trans_resv.h"
#include "xfs_mount.h"
#include "xfs_btree.h"
#include "xfs_btree_staging.h"
#include "xfs_log_format.h"
#include "xfs_trans.h"
#include "xfs_sb.h"
#include "xfs_inode.h"
#include "xfs_alloc.h"
#include "xfs_rmap.h"
#include "xfs_ag.h"
#include "xfs_defer.h"
#include "scrub/scrub.h"
#include "scrub/common.h"
#include "scrub/trace.h"
#include "scrub/repair.h"
#include "scrub/newbt.h"
/*
* Estimate proper slack values for a btree that's being reloaded.
*
* Under most circumstances, we'll take whatever default loading value the
* btree bulk loading code calculates for us. However, there are some
* exceptions to this rule:
*
* (0) If someone turned one of the debug knobs.
* (1) If this is a per-AG btree and the AG has less than 10% space free.
* (2) If this is an inode btree and the FS has less than 10% space free.
* In any of these cases, format the new btree blocks almost completely full to
* minimize space usage.
*/
static void
xrep_newbt_estimate_slack(
struct xrep_newbt *xnr)
{
struct xfs_scrub *sc = xnr->sc;
struct xfs_btree_bload *bload = &xnr->bload;
uint64_t free;
uint64_t sz;
/*
* The xfs_globals values are set to -1 (i.e. take the bload defaults)
* unless someone has set them otherwise, so we just pull the values
* here.
*/
bload->leaf_slack = xfs_globals.bload_leaf_slack;
bload->node_slack = xfs_globals.bload_node_slack;
if (sc->ops->type == ST_PERAG) {
free = sc->sa.pag->pagf_freeblks;
sz = xfs_ag_block_count(sc->mp, sc->sa.pag->pag_agno);
} else {
free = percpu_counter_sum(&sc->mp->m_fdblocks);
sz = sc->mp->m_sb.sb_dblocks;
}
/* No further changes if there's more than 10% free space left. */
if (free >= div_u64(sz, 10))
return;
/*
* We're low on space; load the btrees as tightly as possible. Leave
* a couple of open slots in each btree block so that we don't end up
* splitting the btrees like crazy after a mount.
*/
if (bload->leaf_slack < 0)
bload->leaf_slack = 2;
if (bload->node_slack < 0)
bload->node_slack = 2;
}
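/*
 * A worked instance of the threshold above (numbers are illustrative): on
 * a 1 TiB filesystem with 50 GiB free, free < sz / 10, so any slack value
 * still set to -1 becomes 2 and each new btree block is formatted nearly
 * full, with only two spare record slots left for future insertions.
 */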
/* Initialize accounting resources for staging a new AG btree. */
void
xrep_newbt_init_ag(
struct xrep_newbt *xnr,
struct xfs_scrub *sc,
const struct xfs_owner_info *oinfo,
xfs_fsblock_t alloc_hint,
enum xfs_ag_resv_type resv)
{
memset(xnr, 0, sizeof(struct xrep_newbt));
xnr->sc = sc;
xnr->oinfo = *oinfo; /* structure copy */
xnr->alloc_hint = alloc_hint;
xnr->resv = resv;
INIT_LIST_HEAD(&xnr->resv_list);
xnr->bload.max_dirty = XFS_B_TO_FSBT(sc->mp, 256U << 10); /* 256K */
xrep_newbt_estimate_slack(xnr);
}
/* Initialize accounting resources for staging a new inode fork btree. */
int
xrep_newbt_init_inode(
struct xrep_newbt *xnr,
struct xfs_scrub *sc,
int whichfork,
const struct xfs_owner_info *oinfo)
{
struct xfs_ifork *ifp;
ifp = kmem_cache_zalloc(xfs_ifork_cache, XCHK_GFP_FLAGS);
if (!ifp)
return -ENOMEM;
xrep_newbt_init_ag(xnr, sc, oinfo,
XFS_INO_TO_FSB(sc->mp, sc->ip->i_ino),
XFS_AG_RESV_NONE);
xnr->ifake.if_fork = ifp;
xnr->ifake.if_fork_size = xfs_inode_fork_size(sc->ip, whichfork);
return 0;
}
/*
* Initialize accounting resources for staging a new btree. Callers are
* expected to add their own reservations (and clean them up) manually.
*/
void
xrep_newbt_init_bare(
struct xrep_newbt *xnr,
struct xfs_scrub *sc)
{
xrep_newbt_init_ag(xnr, sc, &XFS_RMAP_OINFO_ANY_OWNER, NULLFSBLOCK,
XFS_AG_RESV_NONE);
}
/*
* Designate specific blocks to be used to build our new btree. @pag must be
* a passive reference.
*/
STATIC int
xrep_newbt_add_blocks(
struct xrep_newbt *xnr,
struct xfs_perag *pag,
const struct xfs_alloc_arg *args)
{
struct xfs_mount *mp = xnr->sc->mp;
struct xrep_newbt_resv *resv;
int error;
resv = kmalloc(sizeof(struct xrep_newbt_resv), XCHK_GFP_FLAGS);
if (!resv)
return -ENOMEM;
INIT_LIST_HEAD(&resv->list);
resv->agbno = XFS_FSB_TO_AGBNO(mp, args->fsbno);
resv->len = args->len;
resv->used = 0;
resv->pag = xfs_perag_hold(pag);
if (args->tp) {
ASSERT(xnr->oinfo.oi_offset == 0);
error = xfs_alloc_schedule_autoreap(args, true, &resv->autoreap);
if (error)
goto out_pag;
}
list_add_tail(&resv->list, &xnr->resv_list);
return 0;
out_pag:
xfs_perag_put(resv->pag);
kfree(resv);
return error;
}
/*
* Add an extent to the new btree reservation pool. Callers are required to
* reap this reservation manually if the repair is cancelled. @pag must be a
* passive reference.
*/
int
xrep_newbt_add_extent(
struct xrep_newbt *xnr,
struct xfs_perag *pag,
xfs_agblock_t agbno,
xfs_extlen_t len)
{
struct xfs_mount *mp = xnr->sc->mp;
struct xfs_alloc_arg args = {
.tp = NULL, /* no autoreap */
.oinfo = xnr->oinfo,
.fsbno = XFS_AGB_TO_FSB(mp, pag->pag_agno, agbno),
.len = len,
.resv = xnr->resv,
};
return xrep_newbt_add_blocks(xnr, pag, &args);
}
/* Don't let our allocation hint take us beyond this AG */
static inline void
xrep_newbt_validate_ag_alloc_hint(
struct xrep_newbt *xnr)
{
struct xfs_scrub *sc = xnr->sc;
xfs_agnumber_t agno = XFS_FSB_TO_AGNO(sc->mp, xnr->alloc_hint);
if (agno == sc->sa.pag->pag_agno &&
xfs_verify_fsbno(sc->mp, xnr->alloc_hint))
return;
xnr->alloc_hint = XFS_AGB_TO_FSB(sc->mp, sc->sa.pag->pag_agno,
XFS_AGFL_BLOCK(sc->mp) + 1);
}
/* Allocate disk space for a new per-AG btree. */
STATIC int
xrep_newbt_alloc_ag_blocks(
struct xrep_newbt *xnr,
uint64_t nr_blocks)
{
struct xfs_scrub *sc = xnr->sc;
struct xfs_mount *mp = sc->mp;
int error = 0;
ASSERT(sc->sa.pag != NULL);
while (nr_blocks > 0) {
struct xfs_alloc_arg args = {
.tp = sc->tp,
.mp = mp,
.oinfo = xnr->oinfo,
.minlen = 1,
.maxlen = nr_blocks,
.prod = 1,
.resv = xnr->resv,
};
xfs_agnumber_t agno;
xrep_newbt_validate_ag_alloc_hint(xnr);
error = xfs_alloc_vextent_near_bno(&args, xnr->alloc_hint);
if (error)
return error;
if (args.fsbno == NULLFSBLOCK)
return -ENOSPC;
agno = XFS_FSB_TO_AGNO(mp, args.fsbno);
trace_xrep_newbt_alloc_ag_blocks(mp, agno,
XFS_FSB_TO_AGBNO(mp, args.fsbno), args.len,
xnr->oinfo.oi_owner);
if (agno != sc->sa.pag->pag_agno) {
ASSERT(agno == sc->sa.pag->pag_agno);
return -EFSCORRUPTED;
}
error = xrep_newbt_add_blocks(xnr, sc->sa.pag, &args);
if (error)
return error;
nr_blocks -= args.len;
xnr->alloc_hint = args.fsbno + args.len;
error = xrep_defer_finish(sc);
if (error)
return error;
}
return 0;
}
/* Don't let our allocation hint take us beyond EOFS */
static inline void
xrep_newbt_validate_file_alloc_hint(
struct xrep_newbt *xnr)
{
struct xfs_scrub *sc = xnr->sc;
if (xfs_verify_fsbno(sc->mp, xnr->alloc_hint))
return;
xnr->alloc_hint = XFS_AGB_TO_FSB(sc->mp, 0, XFS_AGFL_BLOCK(sc->mp) + 1);
}
/* Allocate disk space for our new file-based btree. */
STATIC int
xrep_newbt_alloc_file_blocks(
struct xrep_newbt *xnr,
uint64_t nr_blocks)
{
struct xfs_scrub *sc = xnr->sc;
struct xfs_mount *mp = sc->mp;
int error = 0;
while (nr_blocks > 0) {
struct xfs_alloc_arg args = {
.tp = sc->tp,
.mp = mp,
.oinfo = xnr->oinfo,
.minlen = 1,
.maxlen = nr_blocks,
.prod = 1,
.resv = xnr->resv,
};
struct xfs_perag *pag;
xfs_agnumber_t agno;
xrep_newbt_validate_file_alloc_hint(xnr);
error = xfs_alloc_vextent_start_ag(&args, xnr->alloc_hint);
if (error)
return error;
if (args.fsbno == NULLFSBLOCK)
return -ENOSPC;
agno = XFS_FSB_TO_AGNO(mp, args.fsbno);
trace_xrep_newbt_alloc_file_blocks(mp, agno,
XFS_FSB_TO_AGBNO(mp, args.fsbno), args.len,
xnr->oinfo.oi_owner);
pag = xfs_perag_get(mp, agno);
if (!pag) {
ASSERT(0);
return -EFSCORRUPTED;
}
error = xrep_newbt_add_blocks(xnr, pag, &args);
xfs_perag_put(pag);
if (error)
return error;
nr_blocks -= args.len;
xnr->alloc_hint = args.fsbno + args.len;
error = xrep_defer_finish(sc);
if (error)
return error;
}
return 0;
}
/* Allocate disk space for our new btree. */
int
xrep_newbt_alloc_blocks(
struct xrep_newbt *xnr,
uint64_t nr_blocks)
{
if (xnr->sc->ip)
return xrep_newbt_alloc_file_blocks(xnr, nr_blocks);
return xrep_newbt_alloc_ag_blocks(xnr, nr_blocks);
}
/*
* Free the unused part of a space extent that was reserved for a new ondisk
* structure. Returns the number of EFIs logged or a negative errno.
*/
STATIC int
xrep_newbt_free_extent(
struct xrep_newbt *xnr,
struct xrep_newbt_resv *resv,
bool btree_committed)
{
struct xfs_scrub *sc = xnr->sc;
xfs_agblock_t free_agbno = resv->agbno;
xfs_extlen_t free_aglen = resv->len;
xfs_fsblock_t fsbno;
int error;
if (!btree_committed || resv->used == 0) {
/*
* If we're not committing a new btree or we didn't use the
* space reservation, let the existing EFI free the entire
* space extent.
*/
trace_xrep_newbt_free_blocks(sc->mp, resv->pag->pag_agno,
free_agbno, free_aglen, xnr->oinfo.oi_owner);
xfs_alloc_commit_autoreap(sc->tp, &resv->autoreap);
return 1;
}
/*
* We used space and committed the btree. Cancel the autoreap, remove
* the written blocks from the reservation, and possibly log a new EFI
* to free any unused reservation space.
*/
xfs_alloc_cancel_autoreap(sc->tp, &resv->autoreap);
free_agbno += resv->used;
free_aglen -= resv->used;
if (free_aglen == 0)
return 0;
trace_xrep_newbt_free_blocks(sc->mp, resv->pag->pag_agno, free_agbno,
free_aglen, xnr->oinfo.oi_owner);
ASSERT(xnr->resv != XFS_AG_RESV_AGFL);
ASSERT(xnr->resv != XFS_AG_RESV_IGNORE);
/*
* Use EFIs to free the reservations. This reduces the chance
* that we leak blocks if the system goes down.
*/
fsbno = XFS_AGB_TO_FSB(sc->mp, resv->pag->pag_agno, free_agbno);
error = xfs_free_extent_later(sc->tp, fsbno, free_aglen, &xnr->oinfo,
xnr->resv, true);
if (error)
return error;
return 1;
}
/* Free all the accounting info and disk space we reserved for a new btree. */
STATIC int
xrep_newbt_free(
struct xrep_newbt *xnr,
bool btree_committed)
{
struct xfs_scrub *sc = xnr->sc;
struct xrep_newbt_resv *resv, *n;
unsigned int freed = 0;
int error = 0;
/*
* If the filesystem already went down, we can't free the blocks. Skip
* ahead to freeing the incore metadata because we can't fix anything.
*/
if (xfs_is_shutdown(sc->mp))
goto junkit;
list_for_each_entry_safe(resv, n, &xnr->resv_list, list) {
int ret;
ret = xrep_newbt_free_extent(xnr, resv, btree_committed);
list_del(&resv->list);
xfs_perag_put(resv->pag);
kfree(resv);
if (ret < 0) {
error = ret;
goto junkit;
}
freed += ret;
if (freed >= XREP_MAX_ITRUNCATE_EFIS) {
error = xrep_defer_finish(sc);
if (error)
goto junkit;
freed = 0;
}
}
if (freed)
error = xrep_defer_finish(sc);
junkit:
/*
* If we still have reservations attached to @xnr, cleanup must have
* failed and the filesystem is about to go down. Clean up the incore
* reservations and try to commit to freeing the space we used.
*/
list_for_each_entry_safe(resv, n, &xnr->resv_list, list) {
xfs_alloc_commit_autoreap(sc->tp, &resv->autoreap);
list_del(&resv->list);
xfs_perag_put(resv->pag);
kfree(resv);
}
if (sc->ip) {
kmem_cache_free(xfs_ifork_cache, xnr->ifake.if_fork);
xnr->ifake.if_fork = NULL;
}
return error;
}
/*
* Free all the accounting info and unused disk space allocations after
* committing a new btree.
*/
int
xrep_newbt_commit(
struct xrep_newbt *xnr)
{
return xrep_newbt_free(xnr, true);
}
/*
* Free all the accounting info and all of the disk space we reserved for a new
* btree that we're not going to commit. We want to try to roll things back
* cleanly for things like ENOSPC midway through allocation.
*/
void
xrep_newbt_cancel(
struct xrep_newbt *xnr)
{
xrep_newbt_free(xnr, false);
}
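/*
 * Editorial summary of the lifecycle implemented above, in the shape the
 * btree rebuilders in this series use it (error handling elided;
 * my_claim_block and my_get_records are hypothetical per-caller hooks):
 *
 *	xrep_newbt_init_ag(&xnr, sc, oinfo, alloc_hint, resv);
 *	xnr.bload.claim_block = my_claim_block;	// wraps xrep_newbt_claim_block
 *	xnr.bload.get_records = my_get_records;
 *	error = xfs_btree_bload_compute_geometry(cur, &xnr.bload, nrecs);
 *	error = xrep_newbt_alloc_blocks(&xnr, xnr.bload.nr_blocks);
 *	error = xfs_btree_bload(cur, &xnr.bload, priv);
 *	// commit the staged root into the AG header, then:
 *	error = xrep_newbt_commit(&xnr);	// or xrep_newbt_cancel()
 */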
/* Feed one of the reserved btree blocks to the bulk loader. */
int
xrep_newbt_claim_block(
struct xfs_btree_cur *cur,
struct xrep_newbt *xnr,
union xfs_btree_ptr *ptr)
{
struct xrep_newbt_resv *resv;
struct xfs_mount *mp = cur->bc_mp;
xfs_agblock_t agbno;
/*
* The first item in the list should always have a free block unless
* we're completely out.
*/
resv = list_first_entry(&xnr->resv_list, struct xrep_newbt_resv, list);
if (resv->used == resv->len)
return -ENOSPC;
/*
* Peel off a block from the start of the reservation. We allocate
* blocks in order to place blocks on disk in increasing record or key
* order. The block reservations tend to end up on the list in
* decreasing order, which hopefully results in leaf blocks ending up
* together.
*/
agbno = resv->agbno + resv->used;
resv->used++;
/* If we used all the blocks in this reservation, move it to the end. */
if (resv->used == resv->len)
list_move_tail(&resv->list, &xnr->resv_list);
trace_xrep_newbt_claim_block(mp, resv->pag->pag_agno, agbno, 1,
xnr->oinfo.oi_owner);
if (cur->bc_flags & XFS_BTREE_LONG_PTRS)
ptr->l = cpu_to_be64(XFS_AGB_TO_FSB(mp, resv->pag->pag_agno,
agbno));
else
ptr->s = cpu_to_be32(agbno);
/* Relog all the EFIs. */
return xrep_defer_finish(xnr->sc);
}
/* How many reserved blocks are unused? */
unsigned int
xrep_newbt_unused_blocks(
struct xrep_newbt *xnr)
{
struct xrep_newbt_resv *resv;
unsigned int unused = 0;
list_for_each_entry(resv, &xnr->resv_list, list)
unused += resv->len - resv->used;
return unused;
}
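/*
 * Illustrative lifecycle sketch (not part of the patch) showing how the
 * btree rebuilders are expected to drive this code; error handling is
 * elided and "priv" stands in for a repairer's private state:
 *
 *	struct xrep_newbt xnr;
 *
 *	xrep_newbt_init_ag(&xnr, sc, oinfo, alloc_hint, XFS_AG_RESV_METADATA);
 *	xnr.bload.get_records = ...;	// feeds records to the bulk loader
 *	xnr.bload.claim_block = ...;	// wraps xrep_newbt_claim_block()
 *	error = xrep_newbt_alloc_blocks(&xnr, xnr.bload.nr_blocks);
 *	error = xfs_btree_bload(cur, &xnr.bload, priv);
 *	if (error)
 *		xrep_newbt_cancel(&xnr);	// give back everything reserved
 *	else
 *		error = xrep_newbt_commit(&xnr);	// free only leftovers
 */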

fs/xfs/scrub/newbt.h (new file)
@@ -0,0 +1,68 @@
// SPDX-License-Identifier: GPL-2.0-or-later
/*
* Copyright (C) 2022-2023 Oracle. All Rights Reserved.
* Author: Darrick J. Wong <djwong@kernel.org>
*/
#ifndef __XFS_SCRUB_NEWBT_H__
#define __XFS_SCRUB_NEWBT_H__
struct xrep_newbt_resv {
/* Link to list of extents that we've reserved. */
struct list_head list;
struct xfs_perag *pag;
/* Auto-freeing this reservation if we don't commit. */
struct xfs_alloc_autoreap autoreap;
/* AG block of the extent we reserved. */
xfs_agblock_t agbno;
/* Length of the reservation. */
xfs_extlen_t len;
/* How much of this reservation has been used. */
xfs_extlen_t used;
};
struct xrep_newbt {
struct xfs_scrub *sc;
/* List of extents that we've reserved. */
struct list_head resv_list;
/* Fake root for new btree. */
union {
struct xbtree_afakeroot afake;
struct xbtree_ifakeroot ifake;
};
/* rmap owner of these blocks */
struct xfs_owner_info oinfo;
/* btree geometry for the bulk loader */
struct xfs_btree_bload bload;
/* Allocation hint */
xfs_fsblock_t alloc_hint;
/* per-ag reservation type */
enum xfs_ag_resv_type resv;
};
void xrep_newbt_init_bare(struct xrep_newbt *xnr, struct xfs_scrub *sc);
void xrep_newbt_init_ag(struct xrep_newbt *xnr, struct xfs_scrub *sc,
const struct xfs_owner_info *oinfo, xfs_fsblock_t alloc_hint,
enum xfs_ag_resv_type resv);
int xrep_newbt_init_inode(struct xrep_newbt *xnr, struct xfs_scrub *sc,
int whichfork, const struct xfs_owner_info *oinfo);
int xrep_newbt_alloc_blocks(struct xrep_newbt *xnr, uint64_t nr_blocks);
int xrep_newbt_add_extent(struct xrep_newbt *xnr, struct xfs_perag *pag,
xfs_agblock_t agbno, xfs_extlen_t len);
void xrep_newbt_cancel(struct xrep_newbt *xnr);
int xrep_newbt_commit(struct xrep_newbt *xnr);
int xrep_newbt_claim_block(struct xfs_btree_cur *cur, struct xrep_newbt *xnr,
union xfs_btree_ptr *ptr);
unsigned int xrep_newbt_unused_blocks(struct xrep_newbt *xnr);
#endif /* __XFS_SCRUB_NEWBT_H__ */

fs/xfs/scrub/off_bitmap.h (new file)
@@ -0,0 +1,37 @@
// SPDX-License-Identifier: GPL-2.0-or-later
/*
* Copyright (C) 2022-2023 Oracle. All Rights Reserved.
* Author: Darrick J. Wong <djwong@kernel.org>
*/
#ifndef __XFS_SCRUB_OFF_BITMAP_H__
#define __XFS_SCRUB_OFF_BITMAP_H__
/* Bitmaps, but type-checked for xfs_fileoff_t */
struct xoff_bitmap {
struct xbitmap64 offbitmap;
};
static inline void xoff_bitmap_init(struct xoff_bitmap *bitmap)
{
xbitmap64_init(&bitmap->offbitmap);
}
static inline void xoff_bitmap_destroy(struct xoff_bitmap *bitmap)
{
xbitmap64_destroy(&bitmap->offbitmap);
}
static inline int xoff_bitmap_set(struct xoff_bitmap *bitmap,
xfs_fileoff_t off, xfs_filblks_t len)
{
return xbitmap64_set(&bitmap->offbitmap, off, len);
}
static inline int xoff_bitmap_walk(struct xoff_bitmap *bitmap,
xbitmap64_walk_fn fn, void *priv)
{
return xbitmap64_walk(&bitmap->offbitmap, fn, priv);
}
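/*
 * Illustrative usage sketch (assumed caller context, not part of the
 * patch): record a file extent and then walk everything we recorded.
 * my_walk_fn is a hypothetical xbitmap64_walk_fn callback.
 *
 *	struct xoff_bitmap bitmap;
 *	int error;
 *
 *	xoff_bitmap_init(&bitmap);
 *	error = xoff_bitmap_set(&bitmap, irec.br_startoff,
 *			irec.br_blockcount);
 *	if (!error)
 *		error = xoff_bitmap_walk(&bitmap, my_walk_fn, priv);
 *	xoff_bitmap_destroy(&bitmap);
 */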
#endif /* __XFS_SCRUB_OFF_BITMAP_H__ */

fs/xfs/scrub/parent.c
@@ -156,6 +156,16 @@ xchk_parent_validate(
goto out_rele;
}
/*
* We cannot yet validate this parent pointer if the directory looks as
* though it has been zapped by the inode record repair code.
*/
if (xchk_dir_looks_zapped(dp)) {
error = -EBUSY;
xchk_set_incomplete(sc);
goto out_unlock;
}
/* Look for a directory entry in the parent pointing to the child. */
error = xchk_dir_walk(sc, dp, xchk_parent_actor, &spc);
if (!xchk_fblock_xref_process_error(sc, XFS_DATA_FORK, 0, &error))
@@ -217,6 +227,13 @@ xchk_parent(
*/
error = xchk_parent_validate(sc, parent_ino);
} while (error == -EAGAIN);
if (error == -EBUSY) {
/*
* We could not scan a directory, so we marked the check
* incomplete. No further error return is necessary.
*/
return 0;
}
return error;
}

fs/xfs/scrub/quota.c
@@ -6,6 +6,7 @@
#include "xfs.h"
#include "xfs_fs.h"
#include "xfs_shared.h"
#include "xfs_bit.h"
#include "xfs_format.h"
#include "xfs_trans_resv.h"
#include "xfs_mount.h"
@@ -17,9 +18,10 @@
#include "xfs_bmap.h"
#include "scrub/scrub.h"
#include "scrub/common.h"
#include "scrub/quota.h"
/* Convert a scrub type code to a DQ flag, or return 0 if error. */
xfs_dqtype_t
xchk_quota_to_dqtype(
struct xfs_scrub *sc)
{
@@ -75,14 +77,70 @@ struct xchk_quota_info {
xfs_dqid_t last_id;
};
/* There's a written block backing this dquot, right? */
STATIC int
xchk_quota_item_bmap(
struct xfs_scrub *sc,
struct xfs_dquot *dq,
xfs_fileoff_t offset)
{
struct xfs_bmbt_irec irec;
struct xfs_mount *mp = sc->mp;
int nmaps = 1;
int error;
if (!xfs_verify_fileoff(mp, offset)) {
xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, offset);
return 0;
}
if (dq->q_fileoffset != offset) {
xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, offset);
return 0;
}
error = xfs_bmapi_read(sc->ip, offset, 1, &irec, &nmaps, 0);
if (error)
return error;
if (nmaps != 1) {
xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, offset);
return 0;
}
if (!xfs_verify_fsbno(mp, irec.br_startblock))
xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, offset);
if (XFS_FSB_TO_DADDR(mp, irec.br_startblock) != dq->q_blkno)
xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, offset);
if (!xfs_bmap_is_written_extent(&irec))
xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, offset);
return 0;
}
/* Complain if a quota timer is incorrectly set. */
static inline void
xchk_quota_item_timer(
struct xfs_scrub *sc,
xfs_fileoff_t offset,
const struct xfs_dquot_res *res)
{
if ((res->softlimit && res->count > res->softlimit) ||
(res->hardlimit && res->count > res->hardlimit)) {
if (!res->timer)
xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, offset);
} else {
if (res->timer)
xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, offset);
}
}
/* Scrub the fields in an individual quota item. */
STATIC int
xchk_quota_item(
struct xchk_quota_info *sqi,
struct xfs_dquot *dq)
{
struct xfs_scrub *sc = sqi->sc;
struct xfs_mount *mp = sc->mp;
struct xfs_quotainfo *qi = mp->m_quotainfo;
@@ -93,6 +151,17 @@ xchk_quota_item(
if (xchk_should_terminate(sc, &error))
return error;
/*
* We want to validate the bmap record for the storage backing this
* dquot, so we need to lock the dquot and the quota file. For quota
* operations, the locking order is first the ILOCK and then the dquot.
* However, dqiterate gave us a locked dquot, so drop the dquot lock to
* get the ILOCK.
*/
xfs_dqunlock(dq);
xchk_ilock(sc, XFS_ILOCK_SHARED);
xfs_dqlock(dq);
/*
* Except for the root dquot, the actual dquot we got must either have
* the same or higher id as we saw before.
@@ -103,6 +172,11 @@ xchk_quota_item(
sqi->last_id = dq->q_id;
error = xchk_quota_item_bmap(sc, dq, offset);
xchk_iunlock(sc, XFS_ILOCK_SHARED);
if (!xchk_fblock_process_error(sc, XFS_DATA_FORK, offset, &error))
return error;
/*
* Warn if the hard limits are larger than the fs.
* Administrators can do this, though in production this seems
@@ -166,6 +240,10 @@ xchk_quota_item(
dq->q_rtb.count > dq->q_rtb.hardlimit)
xchk_fblock_set_warning(sc, XFS_DATA_FORK, offset);
xchk_quota_item_timer(sc, offset, &dq->q_blk);
xchk_quota_item_timer(sc, offset, &dq->q_ino);
xchk_quota_item_timer(sc, offset, &dq->q_rtb);
out:
if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
return -ECANCELED;
@@ -191,7 +269,7 @@ xchk_quota_data_fork(
return error;
/* Check for data fork problems that apply only to quota files. */
max_dqid_off = XFS_DQ_ID_MAX / qi->qi_dqperchunk;
ifp = xfs_ifork_ptr(sc->ip, XFS_DATA_FORK);
for_each_xfs_iext(ifp, &icur, &irec) {
if (xchk_should_terminate(sc, &error))
@@ -218,9 +296,11 @@ int
xchk_quota(
struct xfs_scrub *sc)
{
struct xchk_dqiter cursor = { };
struct xchk_quota_info sqi = { .sc = sc };
struct xfs_mount *mp = sc->mp;
struct xfs_quotainfo *qi = mp->m_quotainfo;
struct xfs_dquot *dq;
xfs_dqtype_t dqtype;
int error = 0;
@@ -239,10 +319,15 @@ xchk_quota(
* functions.
*/
xchk_iunlock(sc, sc->ilock_flags);
/* Now look for things that the quota verifiers won't complain about. */
xchk_dqiter_init(&cursor, sc, dqtype);
while ((error = xchk_dquot_iter(&cursor, &dq)) == 1) {
error = xchk_quota_item(&sqi, dq);
xfs_qm_dqput(dq);
if (error)
break;
}
if (error == -ECANCELED)
error = 0;
if (!xchk_fblock_process_error(sc, XFS_DATA_FORK,

fs/xfs/scrub/quota.h (new file)
@@ -0,0 +1,36 @@
// SPDX-License-Identifier: GPL-2.0-or-later
/*
* Copyright (C) 2018-2023 Oracle. All Rights Reserved.
* Author: Darrick J. Wong <djwong@kernel.org>
*/
#ifndef __XFS_SCRUB_QUOTA_H__
#define __XFS_SCRUB_QUOTA_H__
xfs_dqtype_t xchk_quota_to_dqtype(struct xfs_scrub *sc);
/* dquot iteration code */
struct xchk_dqiter {
struct xfs_scrub *sc;
/* Quota file that we're walking. */
struct xfs_inode *quota_ip;
/* Cached data fork mapping for the dquot. */
struct xfs_bmbt_irec bmap;
/* The next dquot to scan. */
uint64_t id;
/* Quota type (user/group/project). */
xfs_dqtype_t dqtype;
/* Data fork sequence number to detect stale mappings. */
unsigned int if_seq;
};
void xchk_dqiter_init(struct xchk_dqiter *cursor, struct xfs_scrub *sc,
xfs_dqtype_t dqtype);
int xchk_dquot_iter(struct xchk_dqiter *cursor, struct xfs_dquot **dqpp);
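/*
 * Illustrative usage sketch, mirroring the loops in xchk_quota() and
 * xrep_quota_problems(): xchk_dquot_iter() returns 1 and a referenced
 * dquot while there are more dquots, 0 at the end of the quota file,
 * or a negative errno.  examine() is a hypothetical helper.
 *
 *	struct xchk_dqiter cursor = { };
 *	struct xfs_dquot *dq;
 *	int error;
 *
 *	xchk_dqiter_init(&cursor, sc, dqtype);
 *	while ((error = xchk_dquot_iter(&cursor, &dq)) == 1) {
 *		error = examine(sc, dq);
 *		xfs_qm_dqput(dq);
 *		if (error)
 *			break;
 *	}
 */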
#endif /* __XFS_SCRUB_QUOTA_H__ */

fs/xfs/scrub/quota_repair.c (new file)
@@ -0,0 +1,575 @@
// SPDX-License-Identifier: GPL-2.0-or-later
/*
* Copyright (C) 2018-2023 Oracle. All Rights Reserved.
* Author: Darrick J. Wong <djwong@kernel.org>
*/
#include "xfs.h"
#include "xfs_fs.h"
#include "xfs_shared.h"
#include "xfs_format.h"
#include "xfs_trans_resv.h"
#include "xfs_mount.h"
#include "xfs_defer.h"
#include "xfs_btree.h"
#include "xfs_bit.h"
#include "xfs_format.h"
#include "xfs_log_format.h"
#include "xfs_trans.h"
#include "xfs_sb.h"
#include "xfs_inode.h"
#include "xfs_inode_fork.h"
#include "xfs_alloc.h"
#include "xfs_bmap.h"
#include "xfs_quota.h"
#include "xfs_qm.h"
#include "xfs_dquot.h"
#include "xfs_dquot_item.h"
#include "xfs_reflink.h"
#include "xfs_bmap_btree.h"
#include "xfs_trans_space.h"
#include "scrub/xfs_scrub.h"
#include "scrub/scrub.h"
#include "scrub/common.h"
#include "scrub/quota.h"
#include "scrub/trace.h"
#include "scrub/repair.h"
/*
* Quota Repair
* ============
*
* Quota repairs are fairly simplistic; we fix everything that the dquot
* verifiers complain about, cap any counters or limits that make no sense,
* and schedule a quotacheck if we had to fix anything. We also repair any
* data fork extent records that don't apply to metadata files.
*/
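/*
 * Illustrative outline (a sketch, not part of the patch) of the repair
 * flow that xrep_quota() implements below:
 *
 *	xrep_quota(sc)
 *	  xrep_quota_data_fork(sc, dqtype)	// fix mappings and dquot bufs
 *	  xrep_defer_finish(sc)			// finish deferred work
 *	  xfs_trans_roll(&sc->tp)		// unjoin the quota inode
 *	  xchk_iunlock(sc, ...)			// drop ILOCK for dquot work
 *	  xrep_quota_problems(sc, dqtype)	// fix individual dquots
 *	  xrep_trans_commit(sc)
 */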
struct xrep_quota_info {
struct xfs_scrub *sc;
bool need_quotacheck;
};
/*
* Allocate a new block into a sparse hole in the quota file backing this
* dquot, initialize the block, and commit the whole mess.
*/
STATIC int
xrep_quota_item_fill_bmap_hole(
struct xfs_scrub *sc,
struct xfs_dquot *dq,
struct xfs_bmbt_irec *irec)
{
struct xfs_buf *bp;
struct xfs_mount *mp = sc->mp;
int nmaps = 1;
int error;
xfs_trans_ijoin(sc->tp, sc->ip, 0);
/* Map a block into the file. */
error = xfs_trans_reserve_more(sc->tp, XFS_QM_DQALLOC_SPACE_RES(mp),
0);
if (error)
return error;
error = xfs_bmapi_write(sc->tp, sc->ip, dq->q_fileoffset,
XFS_DQUOT_CLUSTER_SIZE_FSB, XFS_BMAPI_METADATA, 0,
irec, &nmaps);
if (error)
return error;
if (nmaps != 1)
return -ENOSPC;
dq->q_blkno = XFS_FSB_TO_DADDR(mp, irec->br_startblock);
trace_xrep_dquot_item_fill_bmap_hole(sc->mp, dq->q_type, dq->q_id);
/* Initialize the new block. */
error = xfs_trans_get_buf(sc->tp, mp->m_ddev_targp, dq->q_blkno,
mp->m_quotainfo->qi_dqchunklen, 0, &bp);
if (error)
return error;
bp->b_ops = &xfs_dquot_buf_ops;
xfs_qm_init_dquot_blk(sc->tp, dq->q_id, dq->q_type, bp);
xfs_buf_set_ref(bp, XFS_DQUOT_REF);
/*
* Finish the mapping transactions and roll one more time to
* disconnect sc->ip from sc->tp.
*/
error = xrep_defer_finish(sc);
if (error)
return error;
return xfs_trans_roll(&sc->tp);
}
/* Make sure there's a written block backing this dquot */
STATIC int
xrep_quota_item_bmap(
struct xfs_scrub *sc,
struct xfs_dquot *dq,
bool *dirty)
{
struct xfs_bmbt_irec irec;
struct xfs_mount *mp = sc->mp;
struct xfs_quotainfo *qi = mp->m_quotainfo;
xfs_fileoff_t offset = dq->q_id / qi->qi_dqperchunk;
int nmaps = 1;
int error;
/* The computed file offset should always be valid. */
if (!xfs_verify_fileoff(mp, offset)) {
ASSERT(xfs_verify_fileoff(mp, offset));
return -EFSCORRUPTED;
}
dq->q_fileoffset = offset;
error = xfs_bmapi_read(sc->ip, offset, 1, &irec, &nmaps, 0);
if (error)
return error;
if (nmaps < 1 || !xfs_bmap_is_real_extent(&irec)) {
/* Hole/delalloc extent; allocate a real block. */
error = xrep_quota_item_fill_bmap_hole(sc, dq, &irec);
if (error)
return error;
} else if (irec.br_state != XFS_EXT_NORM) {
/* Unwritten extent, which we already took care of? */
ASSERT(irec.br_state == XFS_EXT_NORM);
return -EFSCORRUPTED;
} else if (dq->q_blkno != XFS_FSB_TO_DADDR(mp, irec.br_startblock)) {
/*
* If the cached daddr is incorrect, repair probably punched a
* hole out of the quota file and filled it back in with a new
* block. Update the block mapping in the dquot.
*/
dq->q_blkno = XFS_FSB_TO_DADDR(mp, irec.br_startblock);
}
*dirty = true;
return 0;
}
/* Reset quota timers if incorrectly set. */
static inline void
xrep_quota_item_timer(
struct xfs_scrub *sc,
const struct xfs_dquot_res *res,
bool *dirty)
{
if ((res->softlimit && res->count > res->softlimit) ||
(res->hardlimit && res->count > res->hardlimit)) {
if (!res->timer)
*dirty = true;
} else {
if (res->timer)
*dirty = true;
}
}
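/*
 * Illustrative sketch (not part of the patch): the timer rule applied
 * above and in xchk_quota_item_timer() can be stated as one predicate.
 * A timer must be running if and only if usage exceeds an enforced
 * soft or hard limit.
 */
static inline bool
xrep_quota_res_timer_ok(const struct xfs_dquot_res *res)
{
bool over = (res->softlimit && res->count > res->softlimit) ||
(res->hardlimit && res->count > res->hardlimit);

return over == (res->timer != 0);
}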
/* Scrub the fields in an individual quota item. */
STATIC int
xrep_quota_item(
struct xrep_quota_info *rqi,
struct xfs_dquot *dq)
{
struct xfs_scrub *sc = rqi->sc;
struct xfs_mount *mp = sc->mp;
xfs_ino_t fs_icount;
bool dirty = false;
int error = 0;
/* Last chance to abort before we start committing fixes. */
if (xchk_should_terminate(sc, &error))
return error;
/*
* We might need to fix holes in the bmap record for the storage
* backing this dquot, so we need to lock the dquot and the quota file.
* dqiterate gave us a locked dquot, so drop the dquot lock to get the
* ILOCK_EXCL.
*/
xfs_dqunlock(dq);
xchk_ilock(sc, XFS_ILOCK_EXCL);
xfs_dqlock(dq);
error = xrep_quota_item_bmap(sc, dq, &dirty);
xchk_iunlock(sc, XFS_ILOCK_EXCL);
if (error)
return error;
/* Check the limits. */
if (dq->q_blk.softlimit > dq->q_blk.hardlimit) {
dq->q_blk.softlimit = dq->q_blk.hardlimit;
dirty = true;
}
if (dq->q_ino.softlimit > dq->q_ino.hardlimit) {
dq->q_ino.softlimit = dq->q_ino.hardlimit;
dirty = true;
}
if (dq->q_rtb.softlimit > dq->q_rtb.hardlimit) {
dq->q_rtb.softlimit = dq->q_rtb.hardlimit;
dirty = true;
}
/*
* Check that usage doesn't exceed physical limits. However, on
* a reflink filesystem we're allowed to exceed physical space
* if there are no quota limits. We don't know what the real number
* is, but we can make quotacheck find out for us.
*/
if (!xfs_has_reflink(mp) && dq->q_blk.count > mp->m_sb.sb_dblocks) {
dq->q_blk.reserved -= dq->q_blk.count;
dq->q_blk.reserved += mp->m_sb.sb_dblocks;
dq->q_blk.count = mp->m_sb.sb_dblocks;
rqi->need_quotacheck = true;
dirty = true;
}
fs_icount = percpu_counter_sum(&mp->m_icount);
if (dq->q_ino.count > fs_icount) {
dq->q_ino.reserved -= dq->q_ino.count;
dq->q_ino.reserved += fs_icount;
dq->q_ino.count = fs_icount;
rqi->need_quotacheck = true;
dirty = true;
}
if (dq->q_rtb.count > mp->m_sb.sb_rblocks) {
dq->q_rtb.reserved -= dq->q_rtb.count;
dq->q_rtb.reserved += mp->m_sb.sb_rblocks;
dq->q_rtb.count = mp->m_sb.sb_rblocks;
rqi->need_quotacheck = true;
dirty = true;
}
xrep_quota_item_timer(sc, &dq->q_blk, &dirty);
xrep_quota_item_timer(sc, &dq->q_ino, &dirty);
xrep_quota_item_timer(sc, &dq->q_rtb, &dirty);
if (!dirty)
return 0;
trace_xrep_dquot_item(sc->mp, dq->q_type, dq->q_id);
dq->q_flags |= XFS_DQFLAG_DIRTY;
xfs_trans_dqjoin(sc->tp, dq);
if (dq->q_id) {
xfs_qm_adjust_dqlimits(dq);
xfs_qm_adjust_dqtimers(dq);
}
xfs_trans_log_dquot(sc->tp, dq);
error = xfs_trans_roll(&sc->tp);
xfs_dqlock(dq);
return error;
}
/* Fix a quota timer so that we can pass the verifier. */
STATIC void
xrep_quota_fix_timer(
struct xfs_mount *mp,
const struct xfs_disk_dquot *ddq,
__be64 softlimit,
__be64 countnow,
__be32 *timer,
time64_t timelimit)
{
uint64_t soft = be64_to_cpu(softlimit);
uint64_t count = be64_to_cpu(countnow);
time64_t new_timer;
uint32_t t;
if (!soft || count <= soft || *timer != 0)
return;
new_timer = xfs_dquot_set_timeout(mp,
ktime_get_real_seconds() + timelimit);
if (ddq->d_type & XFS_DQTYPE_BIGTIME)
t = xfs_dq_unix_to_bigtime(new_timer);
else
t = new_timer;
*timer = cpu_to_be32(t);
}
/* Fix anything the verifiers complain about. */
STATIC int
xrep_quota_block(
struct xfs_scrub *sc,
xfs_daddr_t daddr,
xfs_dqtype_t dqtype,
xfs_dqid_t id)
{
struct xfs_dqblk *dqblk;
struct xfs_disk_dquot *ddq;
struct xfs_quotainfo *qi = sc->mp->m_quotainfo;
struct xfs_def_quota *defq = xfs_get_defquota(qi, dqtype);
struct xfs_buf *bp = NULL;
enum xfs_blft buftype = 0;
int i;
int error;
error = xfs_trans_read_buf(sc->mp, sc->tp, sc->mp->m_ddev_targp, daddr,
qi->qi_dqchunklen, 0, &bp, &xfs_dquot_buf_ops);
switch (error) {
case -EFSBADCRC:
case -EFSCORRUPTED:
/* Failed verifier, retry read with no ops. */
error = xfs_trans_read_buf(sc->mp, sc->tp,
sc->mp->m_ddev_targp, daddr, qi->qi_dqchunklen,
0, &bp, NULL);
if (error)
return error;
break;
case 0:
dqblk = bp->b_addr;
ddq = &dqblk[0].dd_diskdq;
/*
* If there's nothing that would impede a dqiterate, we're
* done.
*/
if ((ddq->d_type & XFS_DQTYPE_REC_MASK) != dqtype ||
id == be32_to_cpu(ddq->d_id)) {
xfs_trans_brelse(sc->tp, bp);
return 0;
}
break;
default:
return error;
}
/* Something's wrong with the block, fix the whole thing. */
dqblk = bp->b_addr;
bp->b_ops = &xfs_dquot_buf_ops;
for (i = 0; i < qi->qi_dqperchunk; i++, dqblk++) {
ddq = &dqblk->dd_diskdq;
trace_xrep_disk_dquot(sc->mp, dqtype, id + i);
ddq->d_magic = cpu_to_be16(XFS_DQUOT_MAGIC);
ddq->d_version = XFS_DQUOT_VERSION;
ddq->d_type = dqtype;
ddq->d_id = cpu_to_be32(id + i);
if (xfs_has_bigtime(sc->mp) && ddq->d_id)
ddq->d_type |= XFS_DQTYPE_BIGTIME;
xrep_quota_fix_timer(sc->mp, ddq, ddq->d_blk_softlimit,
ddq->d_bcount, &ddq->d_btimer,
defq->blk.time);
xrep_quota_fix_timer(sc->mp, ddq, ddq->d_ino_softlimit,
ddq->d_icount, &ddq->d_itimer,
defq->ino.time);
xrep_quota_fix_timer(sc->mp, ddq, ddq->d_rtb_softlimit,
ddq->d_rtbcount, &ddq->d_rtbtimer,
defq->rtb.time);
/* We only support v5 filesystems so always set these. */
uuid_copy(&dqblk->dd_uuid, &sc->mp->m_sb.sb_meta_uuid);
xfs_update_cksum((char *)dqblk, sizeof(struct xfs_dqblk),
XFS_DQUOT_CRC_OFF);
dqblk->dd_lsn = 0;
}
switch (dqtype) {
case XFS_DQTYPE_USER:
buftype = XFS_BLFT_UDQUOT_BUF;
break;
case XFS_DQTYPE_GROUP:
buftype = XFS_BLFT_GDQUOT_BUF;
break;
case XFS_DQTYPE_PROJ:
buftype = XFS_BLFT_PDQUOT_BUF;
break;
}
xfs_trans_buf_set_type(sc->tp, bp, buftype);
xfs_trans_log_buf(sc->tp, bp, 0, BBTOB(bp->b_length) - 1);
return xrep_roll_trans(sc);
}
/*
* Repair a quota file's data fork. The function returns with the inode
* joined.
*/
STATIC int
xrep_quota_data_fork(
struct xfs_scrub *sc,
xfs_dqtype_t dqtype)
{
struct xfs_bmbt_irec irec = { 0 };
struct xfs_iext_cursor icur;
struct xfs_quotainfo *qi = sc->mp->m_quotainfo;
struct xfs_ifork *ifp;
xfs_fileoff_t max_dqid_off;
xfs_fileoff_t off;
xfs_fsblock_t fsbno;
bool truncate = false;
bool joined = false;
int error = 0;
error = xrep_metadata_inode_forks(sc);
if (error)
goto out;
/* Check for data fork problems that apply only to quota files. */
max_dqid_off = XFS_DQ_ID_MAX / qi->qi_dqperchunk;
ifp = xfs_ifork_ptr(sc->ip, XFS_DATA_FORK);
for_each_xfs_iext(ifp, &icur, &irec) {
if (isnullstartblock(irec.br_startblock)) {
error = -EFSCORRUPTED;
goto out;
}
if (irec.br_startoff > max_dqid_off ||
irec.br_startoff + irec.br_blockcount - 1 > max_dqid_off) {
truncate = true;
break;
}
/* Convert unwritten extents to real ones. */
if (irec.br_state == XFS_EXT_UNWRITTEN) {
struct xfs_bmbt_irec nrec;
int nmap = 1;
if (!joined) {
xfs_trans_ijoin(sc->tp, sc->ip, 0);
joined = true;
}
error = xfs_bmapi_write(sc->tp, sc->ip,
irec.br_startoff, irec.br_blockcount,
XFS_BMAPI_CONVERT, 0, &nrec, &nmap);
if (error)
goto out;
if (nmap != 1) {
error = -ENOSPC;
goto out;
}
ASSERT(nrec.br_startoff == irec.br_startoff);
ASSERT(nrec.br_blockcount == irec.br_blockcount);
error = xfs_defer_finish(&sc->tp);
if (error)
goto out;
}
}
if (!joined) {
xfs_trans_ijoin(sc->tp, sc->ip, 0);
joined = true;
}
if (truncate) {
/* Erase everything after the block containing the max dquot */
error = xfs_bunmapi_range(&sc->tp, sc->ip, 0,
max_dqid_off * sc->mp->m_sb.sb_blocksize,
XFS_MAX_FILEOFF);
if (error)
goto out;
/* Remove all CoW reservations. */
error = xfs_reflink_cancel_cow_blocks(sc->ip, &sc->tp, 0,
XFS_MAX_FILEOFF, true);
if (error)
goto out;
sc->ip->i_diflags2 &= ~XFS_DIFLAG2_REFLINK;
/*
* Always re-log the inode so that our permanent transaction
* can keep on rolling it forward in the log.
*/
xfs_trans_log_inode(sc->tp, sc->ip, XFS_ILOG_CORE);
}
/* Now go fix anything that fails the verifiers. */
for_each_xfs_iext(ifp, &icur, &irec) {
for (fsbno = irec.br_startblock, off = irec.br_startoff;
fsbno < irec.br_startblock + irec.br_blockcount;
fsbno += XFS_DQUOT_CLUSTER_SIZE_FSB,
off += XFS_DQUOT_CLUSTER_SIZE_FSB) {
error = xrep_quota_block(sc,
XFS_FSB_TO_DADDR(sc->mp, fsbno),
dqtype, off * qi->qi_dqperchunk);
if (error)
goto out;
}
}
out:
return error;
}
/*
* Go fix anything in the quota items that we could have been mad about. Now
* that we've checked the quota inode data fork we have to drop ILOCK_EXCL to
* use the regular dquot functions.
*/
STATIC int
xrep_quota_problems(
struct xfs_scrub *sc,
xfs_dqtype_t dqtype)
{
struct xchk_dqiter cursor = { };
struct xrep_quota_info rqi = { .sc = sc };
struct xfs_dquot *dq;
int error;
xchk_dqiter_init(&cursor, sc, dqtype);
while ((error = xchk_dquot_iter(&cursor, &dq)) == 1) {
error = xrep_quota_item(&rqi, dq);
xfs_qm_dqput(dq);
if (error)
break;
}
if (error)
return error;
/* Make a quotacheck happen. */
if (rqi.need_quotacheck)
xrep_force_quotacheck(sc, dqtype);
return 0;
}
/* Repair all of a quota type's items. */
int
xrep_quota(
struct xfs_scrub *sc)
{
xfs_dqtype_t dqtype;
int error;
dqtype = xchk_quota_to_dqtype(sc);
/*
* Re-take the ILOCK so that we can fix any problems that we found
* with the data fork mappings, or with the dquot bufs themselves.
*/
if (!(sc->ilock_flags & XFS_ILOCK_EXCL))
xchk_ilock(sc, XFS_ILOCK_EXCL);
error = xrep_quota_data_fork(sc, dqtype);
if (error)
return error;
/*
* Finish deferred items and roll the transaction to unjoin the quota
* inode from transaction so that we can unlock the quota inode; we
* play only with dquots from now on.
*/
error = xrep_defer_finish(sc);
if (error)
return error;
error = xfs_trans_roll(&sc->tp);
if (error)
return error;
xchk_iunlock(sc, sc->ilock_flags);
/* Fix anything the dquot verifiers don't complain about. */
error = xrep_quota_problems(sc, dqtype);
if (error)
return error;
return xrep_trans_commit(sc);
}

fs/xfs/scrub/readdir.c
@@ -36,16 +36,14 @@ xchk_dir_walk_sf(
struct xfs_mount *mp = dp->i_mount;
struct xfs_da_geometry *geo = mp->m_dir_geo;
struct xfs_dir2_sf_entry *sfep;
struct xfs_dir2_sf_hdr *sfp = dp->i_df.if_data;
xfs_ino_t ino;
xfs_dir2_dataptr_t dapos;
unsigned int i;
int error;
ASSERT(dp->i_df.if_bytes == dp->i_disk_size);
ASSERT(sfp != NULL);
/* dot entry */
dapos = xfs_dir2_db_off_to_dataptr(geo, geo->datablk,
fs/xfs/scrub/reap.c
@@ -20,6 +20,7 @@
#include "xfs_ialloc_btree.h"
#include "xfs_rmap.h"
#include "xfs_rmap_btree.h"
#include "xfs_refcount.h"
#include "xfs_refcount_btree.h"
#include "xfs_extent_busy.h"
#include "xfs_ag.h"
@@ -31,11 +32,14 @@
#include "xfs_da_btree.h"
#include "xfs_attr.h"
#include "xfs_attr_remote.h"
#include "xfs_defer.h"
#include "scrub/scrub.h"
#include "scrub/common.h"
#include "scrub/trace.h"
#include "scrub/repair.h"
#include "scrub/bitmap.h"
#include "scrub/agb_bitmap.h"
#include "scrub/fsb_bitmap.h"
#include "scrub/reap.h"
/*
@@ -73,10 +77,10 @@
* with only the same rmap owner but the block is not owned by something with
* the same rmap owner, the block will be freed.
*
* The caller is responsible for locking the AG headers/inode for the entire
* rebuild operation so that nothing else can sneak in and change the incore
* state while we're not looking. We must also invalidate any buffers
* associated with @bitmap.
*/
/* Information about reaping extents after a repair. */
@@ -247,7 +251,7 @@ xreap_agextent_binval(
max_fsbs = min_t(xfs_agblock_t, agbno_next - bno,
xfs_attr3_rmt_blocks(mp, XFS_XATTR_SIZE_MAX));
for (fsbcount = 1; fsbcount <= max_fsbs; fsbcount++) {
struct xfs_buf *bp = NULL;
xfs_daddr_t daddr;
int error;
@@ -377,6 +381,17 @@ xreap_agextent_iter(
trace_xreap_dispose_unmap_extent(sc->sa.pag, agbno, *aglenp);
rs->force_roll = true;
if (rs->oinfo == &XFS_RMAP_OINFO_COW) {
/*
* If we're unmapping CoW staging extents, remove the
* records from the refcountbt, which will remove the
* rmap record as well.
*/
xfs_refcount_free_cow_extent(sc->tp, fsbno, *aglenp);
return 0;
}
return xfs_rmap_free(sc->tp, sc->sa.agf_bp, sc->sa.pag, agbno,
*aglenp, rs->oinfo);
}
@@ -395,6 +410,26 @@
return 0;
}
/*
* If we're getting rid of CoW staging extents, use deferred work items
* to remove the refcountbt records (which removes the rmap records)
* and free the extent. We're not worried about the system going down
* here because log recovery walks the refcount btree to clean out the
* CoW staging extents.
*/
if (rs->oinfo == &XFS_RMAP_OINFO_COW) {
ASSERT(rs->resv == XFS_AG_RESV_NONE);
xfs_refcount_free_cow_extent(sc->tp, fsbno, *aglenp);
error = xfs_free_extent_later(sc->tp, fsbno, *aglenp, NULL,
rs->resv, true);
if (error)
return error;
rs->force_roll = true;
return 0;
}
/* Put blocks back on the AGFL one at a time. */
if (rs->resv == XFS_AG_RESV_AGFL) {
ASSERT(*aglenp == 1);
@@ -409,13 +444,17 @@
/*
* Use deferred frees to get rid of the old btree blocks to try to
* minimize the window in which we could crash and lose the old blocks.
* Add a defer ops barrier every other extent to avoid stressing the
* system with large EFIs.
*/
error = xfs_free_extent_later(sc->tp, fsbno, *aglenp, rs->oinfo,
rs->resv, true);
if (error)
return error;
rs->deferred++;
if (rs->deferred % 2 == 0)
xfs_defer_add_barrier(sc->tp);
return 0;
}
@@ -425,13 +464,12 @@
*/
STATIC int
xreap_agmeta_extent(
uint32_t agbno,
uint32_t len,
void *priv)
{
struct xreap_state *rs = priv;
struct xfs_scrub *sc = rs->sc;
xfs_agblock_t agbno_next = agbno + len;
int error = 0;
@@ -496,3 +534,115 @@ xrep_reap_agblocks(
return 0;
}
/*
* Break a file metadata extent into sub-extents by fate (crosslinked, not
* crosslinked), and dispose of each sub-extent separately. The extent must
* not cross an AG boundary.
*/
STATIC int
xreap_fsmeta_extent(
uint64_t fsbno,
uint64_t len,
void *priv)
{
struct xreap_state *rs = priv;
struct xfs_scrub *sc = rs->sc;
xfs_agnumber_t agno = XFS_FSB_TO_AGNO(sc->mp, fsbno);
xfs_agblock_t agbno = XFS_FSB_TO_AGBNO(sc->mp, fsbno);
xfs_agblock_t agbno_next = agbno + len;
int error = 0;
ASSERT(len <= XFS_MAX_BMBT_EXTLEN);
ASSERT(sc->ip != NULL);
ASSERT(!sc->sa.pag);
/*
* We're reaping blocks after repairing file metadata, which means that
* we have to init the xchk_ag structure ourselves.
*/
sc->sa.pag = xfs_perag_get(sc->mp, agno);
if (!sc->sa.pag)
return -EFSCORRUPTED;
error = xfs_alloc_read_agf(sc->sa.pag, sc->tp, 0, &sc->sa.agf_bp);
if (error)
goto out_pag;
while (agbno < agbno_next) {
xfs_extlen_t aglen;
bool crosslinked;
error = xreap_agextent_select(rs, agbno, agbno_next,
&crosslinked, &aglen);
if (error)
goto out_agf;
error = xreap_agextent_iter(rs, agbno, &aglen, crosslinked);
if (error)
goto out_agf;
if (xreap_want_defer_finish(rs)) {
/*
* Holds the AGF buffer across the deferred chain
* processing.
*/
error = xrep_defer_finish(sc);
if (error)
goto out_agf;
xreap_defer_finish_reset(rs);
} else if (xreap_want_roll(rs)) {
/*
* Hold the AGF buffer across the transaction roll so
* that we don't have to reattach it to the scrub
* context.
*/
xfs_trans_bhold(sc->tp, sc->sa.agf_bp);
error = xfs_trans_roll_inode(&sc->tp, sc->ip);
xfs_trans_bjoin(sc->tp, sc->sa.agf_bp);
if (error)
goto out_agf;
xreap_reset(rs);
}
agbno += aglen;
}
out_agf:
xfs_trans_brelse(sc->tp, sc->sa.agf_bp);
sc->sa.agf_bp = NULL;
out_pag:
xfs_perag_put(sc->sa.pag);
sc->sa.pag = NULL;
return error;
}
/*
* Dispose of every block of every fs metadata extent in the bitmap.
* Do not use this to dispose of the mappings in an ondisk inode fork.
*/
int
xrep_reap_fsblocks(
struct xfs_scrub *sc,
struct xfsb_bitmap *bitmap,
const struct xfs_owner_info *oinfo)
{
struct xreap_state rs = {
.sc = sc,
.oinfo = oinfo,
.resv = XFS_AG_RESV_NONE,
};
int error;
ASSERT(xfs_has_rmapbt(sc->mp));
ASSERT(sc->ip != NULL);
error = xfsb_bitmap_walk(bitmap, xreap_fsmeta_extent, &rs);
if (error)
return error;
if (xreap_dirty(&rs))
return xrep_defer_finish(sc);
return 0;
}

fs/xfs/scrub/reap.h
@@ -6,7 +6,12 @@
#ifndef __XFS_SCRUB_REAP_H__
#define __XFS_SCRUB_REAP_H__
struct xagb_bitmap;
struct xfsb_bitmap;
int xrep_reap_agblocks(struct xfs_scrub *sc, struct xagb_bitmap *bitmap,
const struct xfs_owner_info *oinfo, enum xfs_ag_resv_type type);
int xrep_reap_fsblocks(struct xfs_scrub *sc, struct xfsb_bitmap *bitmap,
const struct xfs_owner_info *oinfo);
#endif /* __XFS_SCRUB_REAP_H__ */

fs/xfs/scrub/refcount.c
@@ -441,7 +441,7 @@ xchk_refcountbt_rec(
struct xchk_refcbt_records *rrc = bs->private;
xfs_refcount_btrec_to_irec(rec, &irec);
if (xfs_refcount_check_irec(bs->cur->bc_ag.pag, &irec) != NULL) {
xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
return 0;
}

fs/xfs/scrub/refcount_repair.c (new file)
@@ -0,0 +1,794 @@
// SPDX-License-Identifier: GPL-2.0-or-later
/*
* Copyright (C) 2018-2023 Oracle. All Rights Reserved.
* Author: Darrick J. Wong <djwong@kernel.org>
*/
#include "xfs.h"
#include "xfs_fs.h"
#include "xfs_shared.h"
#include "xfs_format.h"
#include "xfs_trans_resv.h"
#include "xfs_mount.h"
#include "xfs_defer.h"
#include "xfs_btree.h"
#include "xfs_btree_staging.h"
#include "xfs_inode.h"
#include "xfs_bit.h"
#include "xfs_log_format.h"
#include "xfs_trans.h"
#include "xfs_sb.h"
#include "xfs_alloc.h"
#include "xfs_ialloc.h"
#include "xfs_rmap.h"
#include "xfs_rmap_btree.h"
#include "xfs_refcount.h"
#include "xfs_refcount_btree.h"
#include "xfs_error.h"
#include "xfs_ag.h"
#include "scrub/xfs_scrub.h"
#include "scrub/scrub.h"
#include "scrub/common.h"
#include "scrub/btree.h"
#include "scrub/trace.h"
#include "scrub/repair.h"
#include "scrub/bitmap.h"
#include "scrub/agb_bitmap.h"
#include "scrub/xfile.h"
#include "scrub/xfarray.h"
#include "scrub/newbt.h"
#include "scrub/reap.h"
/*
* Rebuilding the Reference Count Btree
* ====================================
*
* This algorithm is "borrowed" from xfs_repair. Imagine the rmap
* entries as rectangles representing extents of physical blocks, and
* that the rectangles can be laid down to allow them to overlap each
* other; then we know that we must emit a refcnt btree entry wherever
* the amount of overlap changes, i.e. the emission stimulus is
* level-triggered:
*
* - ---
* -- ----- ---- --- ------
* -- ---- ----------- ---- ---------
* -------------------------------- -----------
* ^ ^ ^^ ^^ ^ ^^ ^^^ ^^^^ ^ ^^ ^ ^ ^
* 2 1 23 21 3 43 234 2123 1 01 2 3 0
*
* For our purposes, a rmap is a tuple (startblock, len, fileoff, owner).
*
* Note that in the actual refcnt btree we don't store the refcount < 2
* cases because the bnobt tells us which blocks are free; single-use
* blocks aren't recorded in the bnobt or the refcntbt. If the rmapbt
* supports storing multiple entries covering a given block we could
* theoretically dispense with the refcntbt and simply count rmaps, but
* that's inefficient in the (hot) write path, so we'll take the cost of
* the extra tree to save time. Also there's no guarantee that rmap
* will be enabled.
*
* Given an array of rmaps sorted by physical block number, a starting
* physical block (sp), a bag to hold rmaps that cover sp, and the next
* physical block where the level changes (np), we can reconstruct the
* refcount btree as follows:
*
* While there are still unprocessed rmaps in the array,
* - Set sp to the physical block (pblk) of the next unprocessed rmap.
* - Add to the bag all rmaps in the array where startblock == sp.
* - Set np to the physical block where the bag size will change. This
* is the minimum of (the pblk of the next unprocessed rmap) and
* (startblock + len of each rmap in the bag).
* - Record the bag size as old_bag_size.
*
* - While the bag isn't empty,
* - Remove from the bag all rmaps where startblock + len == np.
* - Add to the bag all rmaps in the array where startblock == np.
* - If the bag size isn't old_bag_size, store the refcount entry
* (sp, np - sp, bag_size) in the refcnt btree.
* - If the bag is empty, break out of the inner loop.
* - Set old_bag_size to the bag size
* - Set sp = np.
* - Set np to the physical block where the bag size will change.
* This is the minimum of (the pblk of the next unprocessed rmap)
* and (startblock + len of each rmap in the bag).
*
* Like all the other repairers, we make a list of all the refcount
* records we need, then reinitialize the refcount btree root and
* insert all the records.
*/
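/*
 * Worked example (an illustrative sketch, not from the patch): take two
 * rmaps r1 = (startblock 10, len 6) and r2 = (startblock 12, len 8),
 * so r1 covers [10, 16) and r2 covers [12, 20):
 *
 *	sp = 10, bag = {r1}, np = 12		(r2 starts at 12)
 *	at 12: bag = {r1, r2}, size 1 -> 2	refcount 1 over [10, 12)
 *	at 16: r1 ends, size 2 -> 1		emit (12, 4, refcount 2)
 *	at 20: r2 ends, size 1 -> 0		refcount 1 over [16, 20)
 *
 * Only the (12, 4, 2) record is stored, because regions with refcount
 * below 2 are left to the bnobt as described above.
 */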
/* The only parts of the rmap that we care about for computing refcounts. */
struct xrep_refc_rmap {
xfs_agblock_t startblock;
xfs_extlen_t blockcount;
} __packed;
struct xrep_refc {
/* refcount extents */
struct xfarray *refcount_records;
/* new refcountbt information */
struct xrep_newbt new_btree;
/* old refcountbt blocks */
struct xagb_bitmap old_refcountbt_blocks;
struct xfs_scrub *sc;
/* get_records()'s position in the refcount record array. */
xfarray_idx_t array_cur;
/* # of refcountbt blocks */
xfs_extlen_t btblocks;
};
/* Check for any obvious conflicts with this shared/CoW staging extent. */
STATIC int
xrep_refc_check_ext(
struct xfs_scrub *sc,
const struct xfs_refcount_irec *rec)
{
enum xbtree_recpacking outcome;
int error;
if (xfs_refcount_check_irec(sc->sa.pag, rec) != NULL)
return -EFSCORRUPTED;
/* Make sure this isn't free space. */
error = xfs_alloc_has_records(sc->sa.bno_cur, rec->rc_startblock,
rec->rc_blockcount, &outcome);
if (error)
return error;
if (outcome != XBTREE_RECPACKING_EMPTY)
return -EFSCORRUPTED;
/* Must not be an inode chunk. */
error = xfs_ialloc_has_inodes_at_extent(sc->sa.ino_cur,
rec->rc_startblock, rec->rc_blockcount, &outcome);
if (error)
return error;
if (outcome != XBTREE_RECPACKING_EMPTY)
return -EFSCORRUPTED;
return 0;
}
/* Record a reference count extent. */
STATIC int
xrep_refc_stash(
struct xrep_refc *rr,
enum xfs_refc_domain domain,
xfs_agblock_t agbno,
xfs_extlen_t len,
uint64_t refcount)
{
struct xfs_refcount_irec irec = {
.rc_startblock = agbno,
.rc_blockcount = len,
.rc_domain = domain,
};
struct xfs_scrub *sc = rr->sc;
int error = 0;
if (xchk_should_terminate(sc, &error))
return error;
irec.rc_refcount = min_t(uint64_t, MAXREFCOUNT, refcount);
error = xrep_refc_check_ext(rr->sc, &irec);
if (error)
return error;
trace_xrep_refc_found(sc->sa.pag, &irec);
return xfarray_append(rr->refcount_records, &irec);
}
/* Record a CoW staging extent. */
STATIC int
xrep_refc_stash_cow(
struct xrep_refc *rr,
xfs_agblock_t agbno,
xfs_extlen_t len)
{
return xrep_refc_stash(rr, XFS_REFC_DOMAIN_COW, agbno, len, 1);
}
/* Decide if an rmap could describe a shared extent. */
static inline bool
xrep_refc_rmap_shareable(
struct xfs_mount *mp,
const struct xfs_rmap_irec *rmap)
{
/* AG metadata are never shareable */
if (XFS_RMAP_NON_INODE_OWNER(rmap->rm_owner))
return false;
/* Metadata in files are never shareable */
if (xfs_internal_inum(mp, rmap->rm_owner))
return false;
/* Metadata and unwritten file blocks are not shareable. */
if (rmap->rm_flags & (XFS_RMAP_ATTR_FORK | XFS_RMAP_BMBT_BLOCK |
XFS_RMAP_UNWRITTEN))
return false;
return true;
}
/*
* Walk along the reverse mapping records until we find one that could describe
* a shared extent.
*/
STATIC int
xrep_refc_walk_rmaps(
struct xrep_refc *rr,
struct xrep_refc_rmap *rrm,
bool *have_rec)
{
struct xfs_rmap_irec rmap;
struct xfs_btree_cur *cur = rr->sc->sa.rmap_cur;
struct xfs_mount *mp = cur->bc_mp;
int have_gt;
int error = 0;
*have_rec = false;
/*
* Loop through the remaining rmaps. Remember CoW staging
* extents and the refcountbt blocks from the old tree for later
* disposal. We can only share written data fork extents, so
* keep looping until we find an rmap for one.
*/
do {
if (xchk_should_terminate(rr->sc, &error))
return error;
error = xfs_btree_increment(cur, 0, &have_gt);
if (error)
return error;
if (!have_gt)
return 0;
error = xfs_rmap_get_rec(cur, &rmap, &have_gt);
if (error)
return error;
if (XFS_IS_CORRUPT(mp, !have_gt))
return -EFSCORRUPTED;
if (rmap.rm_owner == XFS_RMAP_OWN_COW) {
error = xrep_refc_stash_cow(rr, rmap.rm_startblock,
rmap.rm_blockcount);
if (error)
return error;
} else if (rmap.rm_owner == XFS_RMAP_OWN_REFC) {
/* refcountbt block, dump it when we're done. */
rr->btblocks += rmap.rm_blockcount;
error = xagb_bitmap_set(&rr->old_refcountbt_blocks,
rmap.rm_startblock, rmap.rm_blockcount);
if (error)
return error;
}
} while (!xrep_refc_rmap_shareable(mp, &rmap));
rrm->startblock = rmap.rm_startblock;
rrm->blockcount = rmap.rm_blockcount;
*have_rec = true;
return 0;
}
static inline uint32_t
xrep_refc_encode_startblock(
const struct xfs_refcount_irec *irec)
{
uint32_t start;
start = irec->rc_startblock & ~XFS_REFC_COWFLAG;
if (irec->rc_domain == XFS_REFC_DOMAIN_COW)
start |= XFS_REFC_COWFLAG;
return start;
}
/* Sort in the same order as the ondisk records. */
static int
xrep_refc_extent_cmp(
const void *a,
const void *b)
{
const struct xfs_refcount_irec *ap = a;
const struct xfs_refcount_irec *bp = b;
uint32_t sa, sb;
sa = xrep_refc_encode_startblock(ap);
sb = xrep_refc_encode_startblock(bp);
if (sa > sb)
return 1;
if (sa < sb)
return -1;
return 0;
}
/*
* Sort the refcount extents by startblock or else the btree records will be in
* the wrong order. Make sure the records do not overlap in physical space.
*/
STATIC int
xrep_refc_sort_records(
struct xrep_refc *rr)
{
struct xfs_refcount_irec irec;
xfarray_idx_t cur;
enum xfs_refc_domain dom = XFS_REFC_DOMAIN_SHARED;
xfs_agblock_t next_agbno = 0;
int error;
error = xfarray_sort(rr->refcount_records, xrep_refc_extent_cmp,
XFARRAY_SORT_KILLABLE);
if (error)
return error;
foreach_xfarray_idx(rr->refcount_records, cur) {
if (xchk_should_terminate(rr->sc, &error))
return error;
error = xfarray_load(rr->refcount_records, cur, &irec);
if (error)
return error;
if (dom == XFS_REFC_DOMAIN_SHARED &&
irec.rc_domain == XFS_REFC_DOMAIN_COW) {
dom = irec.rc_domain;
next_agbno = 0;
}
if (dom != irec.rc_domain)
return -EFSCORRUPTED;
if (irec.rc_startblock < next_agbno)
return -EFSCORRUPTED;
next_agbno = irec.rc_startblock + irec.rc_blockcount;
}
return error;
}
#define RRM_NEXT(r) ((r).startblock + (r).blockcount)
/*
* Find the next block where the refcount changes, given the next rmap we
* looked at and the ones we're already tracking.
*/
static inline int
xrep_refc_next_edge(
struct xfarray *rmap_bag,
struct xrep_refc_rmap *next_rrm,
bool next_valid,
xfs_agblock_t *nbnop)
{
struct xrep_refc_rmap rrm;
xfarray_idx_t array_cur = XFARRAY_CURSOR_INIT;
xfs_agblock_t nbno = NULLAGBLOCK;
int error;
if (next_valid)
nbno = next_rrm->startblock;
while ((error = xfarray_iter(rmap_bag, &array_cur, &rrm)) == 1)
nbno = min_t(xfs_agblock_t, nbno, RRM_NEXT(rrm));
if (error)
return error;
/*
* We should have found /something/ because either next_rrm is the next
* interesting rmap to look at after emitting this refcount extent, or
* there are other rmaps in rmap_bag contributing to the current
* sharing count. But if something is seriously wrong, bail out.
*/
if (nbno == NULLAGBLOCK)
return -EFSCORRUPTED;
*nbnop = nbno;
return 0;
}
/*
* Walk forward through the rmap btree to collect all rmaps starting at
* @bno in @rmap_bag. These represent the file(s) that share ownership of
* the current block. Upon return, the rmap cursor points to the last record
* satisfying the startblock constraint.
*/
static int
xrep_refc_push_rmaps_at(
struct xrep_refc *rr,
struct xfarray *rmap_bag,
xfs_agblock_t bno,
struct xrep_refc_rmap *rrm,
bool *have,
uint64_t *stack_sz)
{
struct xfs_scrub *sc = rr->sc;
int have_gt;
int error;
while (*have && rrm->startblock == bno) {
error = xfarray_store_anywhere(rmap_bag, rrm);
if (error)
return error;
(*stack_sz)++;
error = xrep_refc_walk_rmaps(rr, rrm, have);
if (error)
return error;
}
error = xfs_btree_decrement(sc->sa.rmap_cur, 0, &have_gt);
if (error)
return error;
if (XFS_IS_CORRUPT(sc->mp, !have_gt))
return -EFSCORRUPTED;
return 0;
}
/* Iterate all the rmap records to generate reference count data. */
STATIC int
xrep_refc_find_refcounts(
struct xrep_refc *rr)
{
struct xrep_refc_rmap rrm;
struct xfs_scrub *sc = rr->sc;
struct xfarray *rmap_bag;
char *descr;
uint64_t old_stack_sz;
uint64_t stack_sz = 0;
xfs_agblock_t sbno;
xfs_agblock_t cbno;
xfs_agblock_t nbno;
bool have;
int error;
xrep_ag_btcur_init(sc, &sc->sa);
/*
* Set up a sparse array to store all the rmap records that we're
* tracking to generate a reference count record. If this exceeds
* MAXREFCOUNT, we clamp rc_refcount.
*/
descr = xchk_xfile_ag_descr(sc, "rmap record bag");
error = xfarray_create(descr, 0, sizeof(struct xrep_refc_rmap),
&rmap_bag);
kfree(descr);
if (error)
goto out_cur;
/* Start the rmapbt cursor to the left of all records. */
error = xfs_btree_goto_left_edge(sc->sa.rmap_cur);
if (error)
goto out_bag;
/* Process reverse mappings into refcount data. */
while (xfs_btree_has_more_records(sc->sa.rmap_cur)) {
/* Push all rmaps with pblk == sbno onto the stack */
error = xrep_refc_walk_rmaps(rr, &rrm, &have);
if (error)
goto out_bag;
if (!have)
break;
sbno = cbno = rrm.startblock;
error = xrep_refc_push_rmaps_at(rr, rmap_bag, sbno,
&rrm, &have, &stack_sz);
if (error)
goto out_bag;
/* Set nbno to the bno of the next refcount change */
error = xrep_refc_next_edge(rmap_bag, &rrm, have, &nbno);
if (error)
goto out_bag;
ASSERT(nbno > sbno);
old_stack_sz = stack_sz;
/* While stack isn't empty... */
while (stack_sz) {
xfarray_idx_t array_cur = XFARRAY_CURSOR_INIT;
/* Pop all rmaps that end at nbno */
while ((error = xfarray_iter(rmap_bag, &array_cur,
&rrm)) == 1) {
if (RRM_NEXT(rrm) != nbno)
continue;
error = xfarray_unset(rmap_bag, array_cur - 1);
if (error)
goto out_bag;
stack_sz--;
}
if (error)
goto out_bag;
/* Push array items that start at nbno */
error = xrep_refc_walk_rmaps(rr, &rrm, &have);
if (error)
goto out_bag;
if (have) {
error = xrep_refc_push_rmaps_at(rr, rmap_bag,
nbno, &rrm, &have, &stack_sz);
if (error)
goto out_bag;
}
/* Emit refcount if necessary */
ASSERT(nbno > cbno);
if (stack_sz != old_stack_sz) {
if (old_stack_sz > 1) {
error = xrep_refc_stash(rr,
XFS_REFC_DOMAIN_SHARED,
cbno, nbno - cbno,
old_stack_sz);
if (error)
goto out_bag;
}
cbno = nbno;
}
/* Stack empty, go find the next rmap */
if (stack_sz == 0)
break;
old_stack_sz = stack_sz;
sbno = nbno;
/* Set nbno to the bno of the next refcount change */
error = xrep_refc_next_edge(rmap_bag, &rrm, have,
&nbno);
if (error)
goto out_bag;
ASSERT(nbno > sbno);
}
}
ASSERT(stack_sz == 0);
out_bag:
xfarray_destroy(rmap_bag);
out_cur:
xchk_ag_btcur_free(&sc->sa);
return error;
}
#undef RRM_NEXT
/* Retrieve refcountbt data for bulk load. */
STATIC int
xrep_refc_get_records(
struct xfs_btree_cur *cur,
unsigned int idx,
struct xfs_btree_block *block,
unsigned int nr_wanted,
void *priv)
{
struct xfs_refcount_irec *irec = &cur->bc_rec.rc;
struct xrep_refc *rr = priv;
union xfs_btree_rec *block_rec;
unsigned int loaded;
int error;
for (loaded = 0; loaded < nr_wanted; loaded++, idx++) {
error = xfarray_load(rr->refcount_records, rr->array_cur++,
irec);
if (error)
return error;
block_rec = xfs_btree_rec_addr(cur, idx, block);
cur->bc_ops->init_rec_from_cur(cur, block_rec);
}
return loaded;
}
/* Feed one of the new btree blocks to the bulk loader. */
STATIC int
xrep_refc_claim_block(
struct xfs_btree_cur *cur,
union xfs_btree_ptr *ptr,
void *priv)
{
struct xrep_refc *rr = priv;
return xrep_newbt_claim_block(cur, &rr->new_btree, ptr);
}
/* Update the AGF counters. */
STATIC int
xrep_refc_reset_counters(
struct xrep_refc *rr)
{
struct xfs_scrub *sc = rr->sc;
struct xfs_perag *pag = sc->sa.pag;
/*
* After we commit the new btree to disk, it is possible that the
* process to reap the old btree blocks will race with the AIL trying
* to checkpoint the old btree blocks into the filesystem. If the new
* tree is shorter than the old one, the refcountbt write verifier will
* fail and the AIL will shut down the filesystem.
*
* To avoid this, save the old incore btree height values as the alt
* height values before re-initializing the perag info from the updated
* AGF to capture all the new values.
*/
pag->pagf_repair_refcount_level = pag->pagf_refcount_level;
/* Reinitialize with the values we just logged. */
return xrep_reinit_pagf(sc);
}
/*
* Use the collected refcount information to stage a new refcount btree. If
* this is successful we'll return with the new btree root information logged
* to the repair transaction but not yet committed.
*/
STATIC int
xrep_refc_build_new_tree(
struct xrep_refc *rr)
{
struct xfs_scrub *sc = rr->sc;
struct xfs_btree_cur *refc_cur;
struct xfs_perag *pag = sc->sa.pag;
xfs_fsblock_t fsbno;
int error;
error = xrep_refc_sort_records(rr);
if (error)
return error;
/*
* Prepare to construct the new btree by reserving disk space for the
* new btree and setting up all the accounting information we'll need
* to root the new btree while it's under construction and before we
* attach it to the AG header.
*/
fsbno = XFS_AGB_TO_FSB(sc->mp, pag->pag_agno, xfs_refc_block(sc->mp));
xrep_newbt_init_ag(&rr->new_btree, sc, &XFS_RMAP_OINFO_REFC, fsbno,
XFS_AG_RESV_METADATA);
rr->new_btree.bload.get_records = xrep_refc_get_records;
rr->new_btree.bload.claim_block = xrep_refc_claim_block;
/* Compute how many blocks we'll need. */
refc_cur = xfs_refcountbt_stage_cursor(sc->mp, &rr->new_btree.afake,
pag);
error = xfs_btree_bload_compute_geometry(refc_cur,
&rr->new_btree.bload,
xfarray_length(rr->refcount_records));
if (error)
goto err_cur;
/* Last chance to abort before we start committing fixes. */
if (xchk_should_terminate(sc, &error))
goto err_cur;
/* Reserve the space we'll need for the new btree. */
error = xrep_newbt_alloc_blocks(&rr->new_btree,
rr->new_btree.bload.nr_blocks);
if (error)
goto err_cur;
/*
* Due to btree slack factors, it's possible for a new btree to be one
* level taller than the old btree. Update the incore btree height so
* that we don't trip the verifiers when writing the new btree blocks
* to disk.
*/
pag->pagf_repair_refcount_level = rr->new_btree.bload.btree_height;
/* Add all observed refcount records. */
rr->array_cur = XFARRAY_CURSOR_INIT;
error = xfs_btree_bload(refc_cur, &rr->new_btree.bload, rr);
if (error)
goto err_level;
/*
* Install the new btree in the AG header. After this point the old
* btree is no longer accessible and the new tree is live.
*/
xfs_refcountbt_commit_staged_btree(refc_cur, sc->tp, sc->sa.agf_bp);
xfs_btree_del_cursor(refc_cur, 0);
/* Reset the AGF counters now that we've changed the btree shape. */
error = xrep_refc_reset_counters(rr);
if (error)
goto err_newbt;
/* Dispose of any unused blocks and the accounting information. */
error = xrep_newbt_commit(&rr->new_btree);
if (error)
return error;
return xrep_roll_ag_trans(sc);
err_level:
pag->pagf_repair_refcount_level = 0;
err_cur:
xfs_btree_del_cursor(refc_cur, error);
err_newbt:
xrep_newbt_cancel(&rr->new_btree);
return error;
}
/*
* Now that we've logged the roots of the new btrees, invalidate all of the
* old blocks and free them.
*/
STATIC int
xrep_refc_remove_old_tree(
struct xrep_refc *rr)
{
struct xfs_scrub *sc = rr->sc;
struct xfs_perag *pag = sc->sa.pag;
int error;
/* Free the old refcountbt blocks if they're not in use. */
error = xrep_reap_agblocks(sc, &rr->old_refcountbt_blocks,
&XFS_RMAP_OINFO_REFC, XFS_AG_RESV_METADATA);
if (error)
return error;
/*
* Now that we've zapped all the old refcountbt blocks we can turn off
* the alternate height mechanism and reset the per-AG space
* reservations.
*/
pag->pagf_repair_refcount_level = 0;
sc->flags |= XREP_RESET_PERAG_RESV;
return 0;
}
/* Rebuild the refcount btree. */
int
xrep_refcountbt(
struct xfs_scrub *sc)
{
struct xrep_refc *rr;
struct xfs_mount *mp = sc->mp;
char *descr;
int error;
/* We require the rmapbt to rebuild anything. */
if (!xfs_has_rmapbt(mp))
return -EOPNOTSUPP;
rr = kzalloc(sizeof(struct xrep_refc), XCHK_GFP_FLAGS);
if (!rr)
return -ENOMEM;
rr->sc = sc;
/* Set up enough storage to handle one refcount record per block. */
descr = xchk_xfile_ag_descr(sc, "reference count records");
error = xfarray_create(descr, mp->m_sb.sb_agblocks,
sizeof(struct xfs_refcount_irec),
&rr->refcount_records);
kfree(descr);
if (error)
goto out_rr;
/* Collect all reference counts. */
xagb_bitmap_init(&rr->old_refcountbt_blocks);
error = xrep_refc_find_refcounts(rr);
if (error)
goto out_bitmap;
/* Rebuild the refcount information. */
error = xrep_refc_build_new_tree(rr);
if (error)
goto out_bitmap;
/* Kill the old tree. */
error = xrep_refc_remove_old_tree(rr);
if (error)
goto out_bitmap;
out_bitmap:
xagb_bitmap_destroy(&rr->old_refcountbt_blocks);
xfarray_destroy(rr->refcount_records);
out_rr:
kfree(rr);
return error;
}

fs/xfs/scrub/repair.c
@@ -27,6 +27,9 @@
#include "xfs_quota.h"
#include "xfs_qm.h"
#include "xfs_defer.h"
#include "xfs_errortag.h"
#include "xfs_error.h"
#include "xfs_reflink.h"
#include "scrub/scrub.h"
#include "scrub/common.h"
#include "scrub/trace.h"
@@ -176,6 +179,16 @@ xrep_roll_ag_trans(
return 0;
}
/* Roll the scrub transaction, holding the primary metadata locked. */
int
xrep_roll_trans(
struct xfs_scrub *sc)
{
if (!sc->ip)
return xrep_roll_ag_trans(sc);
return xfs_trans_roll_inode(&sc->tp, sc->ip);
}
/* Finish all deferred work attached to the repair transaction. */
int
xrep_defer_finish(
@@ -673,6 +686,7 @@ xrep_find_ag_btree_roots(
return error;
}
#ifdef CONFIG_XFS_QUOTA
/* Force a quotacheck the next time we mount. */
void
xrep_force_quotacheck(
@@ -699,10 +713,10 @@
*
* This function ensures that the appropriate dquots are attached to an inode.
* We cannot allow the dquot code to allocate an on-disk dquot block here
* because we're already in transaction context. The on-disk dquot should
* already exist anyway. If the quota code signals corruption or missing quota
* information, schedule quotacheck, which will repair corruptions in the quota
* metadata.
*/
int
xrep_ino_dqattach(
@@ -710,7 +724,10 @@ xrep_ino_dqattach(
{
int error;
ASSERT(sc->tp != NULL);
ASSERT(sc->ip != NULL);
error = xfs_qm_dqattach(sc->ip);
switch (error) {
case -EFSBADCRC:
case -EFSCORRUPTED:
@@ -734,3 +751,367 @@ xrep_ino_dqattach(
return error;
}
#endif /* CONFIG_XFS_QUOTA */
/*
* Ensure that the inode being repaired is ready to handle a certain number of
* extents, or return EFSCORRUPTED. Caller must hold the ILOCK of the inode
* being repaired and have joined it to the scrub transaction.
*/
int
xrep_ino_ensure_extent_count(
struct xfs_scrub *sc,
int whichfork,
xfs_extnum_t nextents)
{
xfs_extnum_t max_extents;
bool inode_has_nrext64;
inode_has_nrext64 = xfs_inode_has_large_extent_counts(sc->ip);
max_extents = xfs_iext_max_nextents(inode_has_nrext64, whichfork);
if (nextents <= max_extents)
return 0;
if (inode_has_nrext64)
return -EFSCORRUPTED;
if (!xfs_has_large_extent_counts(sc->mp))
return -EFSCORRUPTED;
max_extents = xfs_iext_max_nextents(true, whichfork);
if (nextents > max_extents)
return -EFSCORRUPTED;
sc->ip->i_diflags2 |= XFS_DIFLAG2_NREXT64;
xfs_trans_log_inode(sc->tp, sc->ip, XFS_ILOG_CORE);
return 0;
}
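
/*
 * Illustrative note (not from the patch): with the default 32-bit extent
 * counters, xfs_iext_max_nextents() caps the data fork at 2^31 - 1 extents
 * and the attr fork at 2^15 - 1.  An inode flagged XFS_DIFLAG2_NREXT64
 * raises those limits to 2^48 - 1 and 2^32 - 1, which is why the helper
 * above retries the computation with the large-counter format before
 * concluding that the extent count is unrepresentable.
 */
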
/*
* Initialize all the btree cursors for an AG repair except for the btree that
* we're rebuilding.
*/
void
xrep_ag_btcur_init(
struct xfs_scrub *sc,
struct xchk_ag *sa)
{
struct xfs_mount *mp = sc->mp;
/* Set up a bnobt cursor for cross-referencing. */
if (sc->sm->sm_type != XFS_SCRUB_TYPE_BNOBT &&
sc->sm->sm_type != XFS_SCRUB_TYPE_CNTBT) {
sa->bno_cur = xfs_allocbt_init_cursor(mp, sc->tp, sa->agf_bp,
sc->sa.pag, XFS_BTNUM_BNO);
sa->cnt_cur = xfs_allocbt_init_cursor(mp, sc->tp, sa->agf_bp,
sc->sa.pag, XFS_BTNUM_CNT);
}
/* Set up a inobt cursor for cross-referencing. */
if (sc->sm->sm_type != XFS_SCRUB_TYPE_INOBT &&
sc->sm->sm_type != XFS_SCRUB_TYPE_FINOBT) {
sa->ino_cur = xfs_inobt_init_cursor(sc->sa.pag, sc->tp,
sa->agi_bp, XFS_BTNUM_INO);
if (xfs_has_finobt(mp))
sa->fino_cur = xfs_inobt_init_cursor(sc->sa.pag,
sc->tp, sa->agi_bp, XFS_BTNUM_FINO);
}
/* Set up a rmapbt cursor for cross-referencing. */
if (sc->sm->sm_type != XFS_SCRUB_TYPE_RMAPBT &&
xfs_has_rmapbt(mp))
sa->rmap_cur = xfs_rmapbt_init_cursor(mp, sc->tp, sa->agf_bp,
sc->sa.pag);
/* Set up a refcountbt cursor for cross-referencing. */
if (sc->sm->sm_type != XFS_SCRUB_TYPE_REFCNTBT &&
xfs_has_reflink(mp))
sa->refc_cur = xfs_refcountbt_init_cursor(mp, sc->tp,
sa->agf_bp, sc->sa.pag);
}
/*
* Reinitialize the in-core AG state after a repair by rereading the AGF
* buffer. We had better get the same AGF buffer as the one that's attached
* to the scrub context.
*/
int
xrep_reinit_pagf(
struct xfs_scrub *sc)
{
struct xfs_perag *pag = sc->sa.pag;
struct xfs_buf *bp;
int error;
ASSERT(pag);
ASSERT(xfs_perag_initialised_agf(pag));
clear_bit(XFS_AGSTATE_AGF_INIT, &pag->pag_opstate);
error = xfs_alloc_read_agf(pag, sc->tp, 0, &bp);
if (error)
return error;
if (bp != sc->sa.agf_bp) {
ASSERT(bp == sc->sa.agf_bp);
return -EFSCORRUPTED;
}
return 0;
}
/*
* Reinitialize the in-core AG state after a repair by rereading the AGI
* buffer. We had better get the same AGI buffer as the one that's attached
* to the scrub context.
*/
int
xrep_reinit_pagi(
struct xfs_scrub *sc)
{
struct xfs_perag *pag = sc->sa.pag;
struct xfs_buf *bp;
int error;
ASSERT(pag);
ASSERT(xfs_perag_initialised_agi(pag));
clear_bit(XFS_AGSTATE_AGI_INIT, &pag->pag_opstate);
error = xfs_ialloc_read_agi(pag, sc->tp, &bp);
if (error)
return error;
if (bp != sc->sa.agi_bp) {
ASSERT(bp == sc->sa.agi_bp);
return -EFSCORRUPTED;
}
return 0;
}
/*
* Given an active reference to a perag structure, load AG headers and cursors.
* This should only be called to scan an AG while repairing file-based metadata.
*/
int
xrep_ag_init(
struct xfs_scrub *sc,
struct xfs_perag *pag,
struct xchk_ag *sa)
{
int error;
ASSERT(!sa->pag);
error = xfs_ialloc_read_agi(pag, sc->tp, &sa->agi_bp);
if (error)
return error;
error = xfs_alloc_read_agf(pag, sc->tp, 0, &sa->agf_bp);
if (error)
return error;
/* Grab our own passive reference from the caller's ref. */
sa->pag = xfs_perag_hold(pag);
xrep_ag_btcur_init(sc, sa);
return 0;
}
/* Reinitialize the per-AG block reservation for the AG we just fixed. */
int
xrep_reset_perag_resv(
struct xfs_scrub *sc)
{
int error;
if (!(sc->flags & XREP_RESET_PERAG_RESV))
return 0;
ASSERT(sc->sa.pag != NULL);
ASSERT(sc->ops->type == ST_PERAG);
ASSERT(sc->tp);
sc->flags &= ~XREP_RESET_PERAG_RESV;
error = xfs_ag_resv_free(sc->sa.pag);
if (error)
goto out;
error = xfs_ag_resv_init(sc->sa.pag, sc->tp);
if (error == -ENOSPC) {
xfs_err(sc->mp,
"Insufficient free space to reset per-AG reservation for AG %u after repair.",
sc->sa.pag->pag_agno);
error = 0;
}
out:
return error;
}
/* Decide if we are going to call the repair function for a scrub type. */
bool
xrep_will_attempt(
struct xfs_scrub *sc)
{
/* Userspace asked us to rebuild the structure regardless. */
if (sc->sm->sm_flags & XFS_SCRUB_IFLAG_FORCE_REBUILD)
return true;
/* Let debug users force us into the repair routines. */
if (XFS_TEST_ERROR(false, sc->mp, XFS_ERRTAG_FORCE_SCRUB_REPAIR))
return true;
/* Metadata is corrupt or failed cross-referencing. */
if (xchk_needs_repair(sc->sm))
return true;
return false;
}
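
/*
 * Illustrative note (not from the patch): on debug kernels, the error tag
 * tested above can be armed from userspace, e.g. by writing 1 to what
 * should be /sys/fs/xfs/<device>/errortag/force_repair, after which every
 * scrub request takes the repair path even if nothing is corrupt.
 */
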
/* Try to fix some part of a metadata inode by calling another scrubber. */
STATIC int
xrep_metadata_inode_subtype(
struct xfs_scrub *sc,
unsigned int scrub_type)
{
__u32 smtype = sc->sm->sm_type;
__u32 smflags = sc->sm->sm_flags;
unsigned int sick_mask = sc->sick_mask;
int error;
/*
* Let's see if the inode needs repair. We're going to open-code calls
* to the scrub and repair functions so that we can hang on to the
* resources that we already acquired instead of using the standard
* setup/teardown routines.
*/
sc->sm->sm_flags &= ~XFS_SCRUB_FLAGS_OUT;
sc->sm->sm_type = scrub_type;
switch (scrub_type) {
case XFS_SCRUB_TYPE_INODE:
error = xchk_inode(sc);
break;
case XFS_SCRUB_TYPE_BMBTD:
error = xchk_bmap_data(sc);
break;
case XFS_SCRUB_TYPE_BMBTA:
error = xchk_bmap_attr(sc);
break;
default:
ASSERT(0);
error = -EFSCORRUPTED;
}
if (error)
goto out;
if (!xrep_will_attempt(sc))
goto out;
/*
* Repair some part of the inode. This will potentially join the inode
* to the transaction.
*/
switch (scrub_type) {
case XFS_SCRUB_TYPE_INODE:
error = xrep_inode(sc);
break;
case XFS_SCRUB_TYPE_BMBTD:
error = xrep_bmap(sc, XFS_DATA_FORK, false);
break;
case XFS_SCRUB_TYPE_BMBTA:
error = xrep_bmap(sc, XFS_ATTR_FORK, false);
break;
}
if (error)
goto out;
/*
* Finish all deferred intent items and then roll the transaction so
* that the inode will not be joined to the transaction when we exit
* the function.
*/
error = xfs_defer_finish(&sc->tp);
if (error)
goto out;
error = xfs_trans_roll(&sc->tp);
if (error)
goto out;
/*
* Clear the corruption flags and re-check the metadata that we just
* repaired.
*/
sc->sm->sm_flags &= ~XFS_SCRUB_FLAGS_OUT;
switch (scrub_type) {
case XFS_SCRUB_TYPE_INODE:
error = xchk_inode(sc);
break;
case XFS_SCRUB_TYPE_BMBTD:
error = xchk_bmap_data(sc);
break;
case XFS_SCRUB_TYPE_BMBTA:
error = xchk_bmap_attr(sc);
break;
}
if (error)
goto out;
/* If corruption persists, the repair has failed. */
if (xchk_needs_repair(sc->sm)) {
error = -EFSCORRUPTED;
goto out;
}
out:
sc->sick_mask = sick_mask;
sc->sm->sm_type = smtype;
sc->sm->sm_flags = smflags;
return error;
}
/*
* Repair the ondisk forks of a metadata inode. The caller must ensure that
* sc->ip points to the metadata inode and the ILOCK is held on that inode.
* The inode must not be joined to the transaction before the call, and will
* not be afterwards.
*/
int
xrep_metadata_inode_forks(
struct xfs_scrub *sc)
{
bool dirty = false;
int error;
/* Repair the inode record and the data fork. */
error = xrep_metadata_inode_subtype(sc, XFS_SCRUB_TYPE_INODE);
if (error)
return error;
error = xrep_metadata_inode_subtype(sc, XFS_SCRUB_TYPE_BMBTD);
if (error)
return error;
/* Make sure the attr fork looks ok before we delete it. */
error = xrep_metadata_inode_subtype(sc, XFS_SCRUB_TYPE_BMBTA);
if (error)
return error;
/* Clear the reflink flag since metadata never shares. */
if (xfs_is_reflink_inode(sc->ip)) {
dirty = true;
xfs_trans_ijoin(sc->tp, sc->ip, 0);
error = xfs_reflink_clear_inode_flag(sc->ip, &sc->tp);
if (error)
return error;
}
/*
* If we modified the inode, roll the transaction but don't rejoin the
* inode to the new transaction because xrep_bmap_data can do that.
*/
if (dirty) {
error = xfs_trans_roll(&sc->tp);
if (error)
return error;
dirty = false;
}
return 0;
}
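
Callers of xrep_metadata_inode_forks() are expected to follow the contract spelled out in the comment above it; here is a hypothetical caller sketch, modeled on the rtbitmap repair function that appears later in this diff:

/* Hypothetical sketch: sc->ip is ILOCKed and unjoined on entry and exit. */
STATIC int
xrep_example_metadata_inode(
	struct xfs_scrub	*sc)
{
	int			error;

	error = xrep_metadata_inode_forks(sc);	/* inode core + both forks */
	if (error)
		return error;

	/* Rejoin explicitly before logging further changes to this inode. */
	xfs_trans_ijoin(sc->tp, sc->ip, 0);
	return 0;
}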

fs/xfs/scrub/repair.h

@@ -28,15 +28,28 @@ static inline int xrep_notsupported(struct xfs_scrub *sc)

/* Repair helpers */
int xrep_attempt(struct xfs_scrub *sc, struct xchk_stats_run *run);
+bool xrep_will_attempt(struct xfs_scrub *sc);
void xrep_failure(struct xfs_mount *mp);
int xrep_roll_ag_trans(struct xfs_scrub *sc);
+int xrep_roll_trans(struct xfs_scrub *sc);
int xrep_defer_finish(struct xfs_scrub *sc);
bool xrep_ag_has_space(struct xfs_perag *pag, xfs_extlen_t nr_blocks,
		enum xfs_ag_resv_type type);
xfs_extlen_t xrep_calc_ag_resblks(struct xfs_scrub *sc);

+static inline int
+xrep_trans_commit(
+	struct xfs_scrub	*sc)
+{
+	int error = xfs_trans_commit(sc->tp);
+
+	sc->tp = NULL;
+	return error;
+}
+
struct xbitmap;
struct xagb_bitmap;
+struct xfsb_bitmap;

int xrep_fix_freelist(struct xfs_scrub *sc, bool can_shrink);
@@ -57,8 +70,35 @@ struct xrep_find_ag_btree {
int xrep_find_ag_btree_roots(struct xfs_scrub *sc, struct xfs_buf *agf_bp,
		struct xrep_find_ag_btree *btree_info, struct xfs_buf *agfl_bp);

+#ifdef CONFIG_XFS_QUOTA
void xrep_force_quotacheck(struct xfs_scrub *sc, xfs_dqtype_t type);
int xrep_ino_dqattach(struct xfs_scrub *sc);
+#else
+# define xrep_force_quotacheck(sc, type)	((void)0)
+# define xrep_ino_dqattach(sc)			(0)
+#endif /* CONFIG_XFS_QUOTA */
+
+int xrep_ino_ensure_extent_count(struct xfs_scrub *sc, int whichfork,
+		xfs_extnum_t nextents);
+int xrep_reset_perag_resv(struct xfs_scrub *sc);
+int xrep_bmap(struct xfs_scrub *sc, int whichfork, bool allow_unwritten);
+int xrep_metadata_inode_forks(struct xfs_scrub *sc);
+
+/* Repair setup functions */
+int xrep_setup_ag_allocbt(struct xfs_scrub *sc);
+
+struct xfs_imap;
+int xrep_setup_inode(struct xfs_scrub *sc, const struct xfs_imap *imap);
+
+void xrep_ag_btcur_init(struct xfs_scrub *sc, struct xchk_ag *sa);
+int xrep_ag_init(struct xfs_scrub *sc, struct xfs_perag *pag,
+		struct xchk_ag *sa);
+
+/* Metadata revalidators */
+int xrep_revalidate_allocbt(struct xfs_scrub *sc);
+int xrep_revalidate_iallocbt(struct xfs_scrub *sc);

/* Metadata repairers */
@@ -67,9 +107,34 @@ int xrep_superblock(struct xfs_scrub *sc);
int xrep_agf(struct xfs_scrub *sc);
int xrep_agfl(struct xfs_scrub *sc);
int xrep_agi(struct xfs_scrub *sc);
+int xrep_allocbt(struct xfs_scrub *sc);
+int xrep_iallocbt(struct xfs_scrub *sc);
+int xrep_refcountbt(struct xfs_scrub *sc);
+int xrep_inode(struct xfs_scrub *sc);
+int xrep_bmap_data(struct xfs_scrub *sc);
+int xrep_bmap_attr(struct xfs_scrub *sc);
+int xrep_bmap_cow(struct xfs_scrub *sc);
+
+#ifdef CONFIG_XFS_RT
+int xrep_rtbitmap(struct xfs_scrub *sc);
+#else
+# define xrep_rtbitmap			xrep_notsupported
+#endif /* CONFIG_XFS_RT */
+
+#ifdef CONFIG_XFS_QUOTA
+int xrep_quota(struct xfs_scrub *sc);
+#else
+# define xrep_quota			xrep_notsupported
+#endif /* CONFIG_XFS_QUOTA */
+
+int xrep_reinit_pagf(struct xfs_scrub *sc);
+int xrep_reinit_pagi(struct xfs_scrub *sc);

#else

+#define xrep_ino_dqattach(sc)	(0)
+#define xrep_will_attempt(sc)	(false)
+
static inline int
xrep_attempt(
	struct xfs_scrub	*sc,
@@ -87,11 +152,45 @@ xrep_calc_ag_resblks(
	return 0;
}

+static inline int
+xrep_reset_perag_resv(
+	struct xfs_scrub	*sc)
+{
+	if (!(sc->flags & XREP_RESET_PERAG_RESV))
+		return 0;
+
+	ASSERT(0);
+	return -EOPNOTSUPP;
+}
+
+/* repair setup functions for no-repair */
+static inline int
+xrep_setup_nothing(
+	struct xfs_scrub	*sc)
+{
+	return 0;
+}
+#define xrep_setup_ag_allocbt		xrep_setup_nothing
+
+#define xrep_setup_inode(sc, imap)	((void)0)
+
+#define xrep_revalidate_allocbt		(NULL)
+#define xrep_revalidate_iallocbt	(NULL)
+
#define xrep_probe			xrep_notsupported
#define xrep_superblock			xrep_notsupported
#define xrep_agf			xrep_notsupported
#define xrep_agfl			xrep_notsupported
#define xrep_agi			xrep_notsupported
+#define xrep_allocbt			xrep_notsupported
+#define xrep_iallocbt			xrep_notsupported
+#define xrep_refcountbt			xrep_notsupported
+#define xrep_inode			xrep_notsupported
+#define xrep_bmap_data			xrep_notsupported
+#define xrep_bmap_attr			xrep_notsupported
+#define xrep_bmap_cow			xrep_notsupported
+#define xrep_rtbitmap			xrep_notsupported
+#define xrep_quota			xrep_notsupported

#endif /* CONFIG_XFS_ONLINE_REPAIR */
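
One detail worth noting in the header above: the new xrep_trans_commit() helper clears sc->tp after committing, so the generic scrub teardown path can test the pointer without committing or cancelling the same transaction twice. A hypothetical sketch of that interaction:

STATIC int
xrep_example_finish(
	struct xfs_scrub	*sc)
{
	/* Commits sc->tp and sets it to NULL. */
	int			error = xrep_trans_commit(sc);

	/* Generic teardown does roughly this, and now safely skips it: */
	if (sc->tp) {
		xfs_trans_cancel(sc->tp);
		sc->tp = NULL;
	}
	return error;
}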

fs/xfs/scrub/rmap.c

@@ -24,6 +24,7 @@
#include "scrub/common.h"
#include "scrub/btree.h"
#include "scrub/bitmap.h"
+#include "scrub/agb_bitmap.h"

/*
 * Set us up to scrub reverse mapping btrees.

fs/xfs/scrub/rtbitmap.c

@@ -14,17 +14,33 @@
#include "xfs_rtbitmap.h"
#include "xfs_inode.h"
#include "xfs_bmap.h"
+#include "xfs_bit.h"
#include "scrub/scrub.h"
#include "scrub/common.h"
+#include "scrub/repair.h"
+#include "scrub/rtbitmap.h"

/* Set us up with the realtime metadata locked. */
int
xchk_setup_rtbitmap(
	struct xfs_scrub	*sc)
{
+	struct xfs_mount	*mp = sc->mp;
+	struct xchk_rtbitmap	*rtb;
	int			error;

-	error = xchk_trans_alloc(sc, 0);
+	rtb = kzalloc(sizeof(struct xchk_rtbitmap), XCHK_GFP_FLAGS);
+	if (!rtb)
+		return -ENOMEM;
+	sc->buf = rtb;
+
+	if (xchk_could_repair(sc)) {
+		error = xrep_setup_rtbitmap(sc, rtb);
+		if (error)
+			return error;
+	}
+
+	error = xchk_trans_alloc(sc, rtb->resblks);
	if (error)
		return error;
@@ -32,7 +48,22 @@ xchk_setup_rtbitmap(
	if (error)
		return error;

+	error = xchk_ino_dqattach(sc);
+	if (error)
+		return error;
+
	xchk_ilock(sc, XFS_ILOCK_EXCL | XFS_ILOCK_RTBITMAP);
+
+	/*
+	 * Now that we've locked the rtbitmap, we can't race with growfsrt
+	 * trying to expand the bitmap or change the size of the rt volume.
+	 * Hence it is safe to compute and check the geometry values.
+	 */
+	if (mp->m_sb.sb_rblocks) {
+		rtb->rextents = xfs_rtb_to_rtx(mp, mp->m_sb.sb_rblocks);
+		rtb->rextslog = xfs_compute_rextslog(rtb->rextents);
+		rtb->rbmblocks = xfs_rtbitmap_blockcount(mp, rtb->rextents);
+	}
+
	return 0;
}
@@ -63,21 +94,30 @@ STATIC int
xchk_rtbitmap_check_extents(
	struct xfs_scrub	*sc)
{
-	struct xfs_mount	*mp = sc->mp;
	struct xfs_bmbt_irec	map;
-	xfs_rtblock_t		off;
-	int			nmap;
+	struct xfs_iext_cursor	icur;
+	struct xfs_mount	*mp = sc->mp;
+	struct xfs_inode	*ip = sc->ip;
+	xfs_fileoff_t		off = 0;
+	xfs_fileoff_t		endoff;
	int			error = 0;

-	for (off = 0; off < mp->m_sb.sb_rbmblocks;) {
+	/* Mappings may not cross or lie beyond EOF. */
+	endoff = XFS_B_TO_FSB(mp, ip->i_disk_size);
+	if (xfs_iext_lookup_extent(ip, &ip->i_df, endoff, &icur, &map)) {
+		xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, endoff);
+		return 0;
+	}
+
+	while (off < endoff) {
+		int		nmap = 1;
+
		if (xchk_should_terminate(sc, &error) ||
		    (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT))
			break;

		/* Make sure we have a written extent. */
-		nmap = 1;
-		error = xfs_bmapi_read(mp->m_rbmip, off,
-				mp->m_sb.sb_rbmblocks - off, &map, &nmap,
+		error = xfs_bmapi_read(ip, off, endoff - off, &map, &nmap,
				XFS_DATA_FORK);
		if (!xchk_fblock_process_error(sc, XFS_DATA_FORK, off, &error))
			break;
@@ -98,12 +138,48 @@ int
xchk_rtbitmap(
	struct xfs_scrub	*sc)
{
+	struct xfs_mount	*mp = sc->mp;
+	struct xchk_rtbitmap	*rtb = sc->buf;
	int			error;

-	/* Is the size of the rtbitmap correct? */
-	if (sc->mp->m_rbmip->i_disk_size !=
-	    XFS_FSB_TO_B(sc->mp, sc->mp->m_sb.sb_rbmblocks)) {
-		xchk_ino_set_corrupt(sc, sc->mp->m_rbmip->i_ino);
+	/* Is sb_rextents correct? */
+	if (mp->m_sb.sb_rextents != rtb->rextents) {
+		xchk_ino_set_corrupt(sc, mp->m_rbmip->i_ino);
+		return 0;
+	}
+
+	/* Is sb_rextslog correct? */
+	if (mp->m_sb.sb_rextslog != rtb->rextslog) {
+		xchk_ino_set_corrupt(sc, mp->m_rbmip->i_ino);
+		return 0;
+	}
+
+	/*
+	 * Is sb_rbmblocks large enough to handle the current rt volume? In no
+	 * case can we exceed 4bn bitmap blocks since the super field is a u32.
+	 */
+	if (rtb->rbmblocks > U32_MAX) {
+		xchk_ino_set_corrupt(sc, mp->m_rbmip->i_ino);
+		return 0;
+	}
+	if (mp->m_sb.sb_rbmblocks != rtb->rbmblocks) {
+		xchk_ino_set_corrupt(sc, mp->m_rbmip->i_ino);
+		return 0;
+	}
+
+	/* The bitmap file length must be aligned to an fsblock. */
+	if (mp->m_rbmip->i_disk_size & mp->m_blockmask) {
+		xchk_ino_set_corrupt(sc, mp->m_rbmip->i_ino);
+		return 0;
+	}
+
+	/*
+	 * Is the bitmap file itself large enough to handle the rt volume?
+	 * growfsrt expands the bitmap file before updating sb_rextents, so the
+	 * file can be larger than sb_rbmblocks.
+	 */
+	if (mp->m_rbmip->i_disk_size < XFS_FSB_TO_B(mp, rtb->rbmblocks)) {
+		xchk_ino_set_corrupt(sc, mp->m_rbmip->i_ino);
		return 0;
	}
@@ -116,12 +192,11 @@ xchk_rtbitmap(
	if (error || (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT))
		return error;

-	error = xfs_rtalloc_query_all(sc->mp, sc->tp, xchk_rtbitmap_rec, sc);
+	error = xfs_rtalloc_query_all(mp, sc->tp, xchk_rtbitmap_rec, sc);
	if (!xchk_fblock_process_error(sc, XFS_DATA_FORK, 0, &error))
-		goto out;
+		return error;

-out:
-	return error;
+	return 0;
}

/* xref check that the extent is not free in the rtbitmap */

fs/xfs/scrub/rtbitmap.h (new file, 22 lines)

@@ -0,0 +1,22 @@
// SPDX-License-Identifier: GPL-2.0-or-later
/*
* Copyright (C) 2023 Oracle. All Rights Reserved.
* Author: Darrick J. Wong <djwong@kernel.org>
*/
#ifndef __XFS_SCRUB_RTBITMAP_H__
#define __XFS_SCRUB_RTBITMAP_H__
struct xchk_rtbitmap {
uint64_t rextents;
uint64_t rbmblocks;
unsigned int rextslog;
unsigned int resblks;
};
#ifdef CONFIG_XFS_ONLINE_REPAIR
int xrep_setup_rtbitmap(struct xfs_scrub *sc, struct xchk_rtbitmap *rtb);
#else
# define xrep_setup_rtbitmap(sc, rtb) (0)
#endif /* CONFIG_XFS_ONLINE_REPAIR */
#endif /* __XFS_SCRUB_RTBITMAP_H__ */

fs/xfs/scrub/rtbitmap_repair.c (new file, 202 lines)

@@ -0,0 +1,202 @@
// SPDX-License-Identifier: GPL-2.0-or-later
/*
* Copyright (C) 2020-2023 Oracle. All Rights Reserved.
* Author: Darrick J. Wong <djwong@kernel.org>
*/
#include "xfs.h"
#include "xfs_fs.h"
#include "xfs_shared.h"
#include "xfs_format.h"
#include "xfs_trans_resv.h"
#include "xfs_mount.h"
#include "xfs_btree.h"
#include "xfs_log_format.h"
#include "xfs_trans.h"
#include "xfs_inode.h"
#include "xfs_bit.h"
#include "xfs_bmap.h"
#include "xfs_bmap_btree.h"
#include "scrub/scrub.h"
#include "scrub/common.h"
#include "scrub/trace.h"
#include "scrub/repair.h"
#include "scrub/xfile.h"
#include "scrub/rtbitmap.h"
/* Set up to repair the realtime bitmap file metadata. */
int
xrep_setup_rtbitmap(
struct xfs_scrub *sc,
struct xchk_rtbitmap *rtb)
{
struct xfs_mount *mp = sc->mp;
unsigned long long blocks = 0;
/*
* Reserve enough blocks to write out a completely new bmbt for a
* maximally fragmented bitmap file. We do not hold the rtbitmap
* ILOCK yet, so this is entirely speculative.
*/
blocks = xfs_bmbt_calc_size(mp, mp->m_sb.sb_rbmblocks);
if (blocks > UINT_MAX)
return -EOPNOTSUPP;
rtb->resblks += blocks;
return 0;
}
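
/*
 * Illustrative note (not from the patch): xfs_bmbt_calc_size() returns the
 * worst-case number of bmbt blocks for the given record count, in essence
 * summing ceil(nrecs / minrecs^level) across the levels of the tree.  With
 * a hypothetical minrecs of 125 and sb_rbmblocks = 1000, that would be
 * ceil(1000/125) + ceil(8/125) = 8 + 1 = 9 blocks of reservation.
 */
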
/*
* Make sure that the given range of the data fork of the realtime file is
* mapped to written blocks. The caller must ensure that the inode is joined
* to the transaction.
*/
STATIC int
xrep_rtbitmap_data_mappings(
struct xfs_scrub *sc,
xfs_filblks_t len)
{
struct xfs_bmbt_irec map;
xfs_fileoff_t off = 0;
int error;
ASSERT(sc->ip != NULL);
while (off < len) {
int nmaps = 1;
/*
* If we have a real extent mapping this block then we're
* in ok shape.
*/
error = xfs_bmapi_read(sc->ip, off, len - off, &map, &nmaps,
XFS_DATA_FORK);
if (error)
return error;
if (nmaps == 0) {
ASSERT(nmaps != 0);
return -EFSCORRUPTED;
}
/*
* Written extents are ok. Holes are not filled because we
* do not know the freespace information.
*/
if (xfs_bmap_is_written_extent(&map) ||
map.br_startblock == HOLESTARTBLOCK) {
off = map.br_startoff + map.br_blockcount;
continue;
}
/*
* If we find a delalloc reservation then something is very
* very wrong. Bail out.
*/
if (map.br_startblock == DELAYSTARTBLOCK)
return -EFSCORRUPTED;
/* Make sure we're really converting an unwritten extent. */
if (map.br_state != XFS_EXT_UNWRITTEN) {
ASSERT(map.br_state == XFS_EXT_UNWRITTEN);
return -EFSCORRUPTED;
}
/* Make sure this block has a real zeroed extent mapped. */
nmaps = 1;
error = xfs_bmapi_write(sc->tp, sc->ip, map.br_startoff,
map.br_blockcount,
XFS_BMAPI_CONVERT | XFS_BMAPI_ZERO,
0, &map, &nmaps);
if (error)
return error;
if (nmaps != 1)
return -EFSCORRUPTED;
/* Commit new extent and all deferred work. */
error = xrep_defer_finish(sc);
if (error)
return error;
off = map.br_startoff + map.br_blockcount;
}
return 0;
}
/* Fix broken rt volume geometry. */
STATIC int
xrep_rtbitmap_geometry(
struct xfs_scrub *sc,
struct xchk_rtbitmap *rtb)
{
struct xfs_mount *mp = sc->mp;
struct xfs_trans *tp = sc->tp;
/* Superblock fields */
if (mp->m_sb.sb_rextents != rtb->rextents)
xfs_trans_mod_sb(sc->tp, XFS_TRANS_SB_REXTENTS,
rtb->rextents - mp->m_sb.sb_rextents);
if (mp->m_sb.sb_rbmblocks != rtb->rbmblocks)
xfs_trans_mod_sb(tp, XFS_TRANS_SB_RBMBLOCKS,
rtb->rbmblocks - mp->m_sb.sb_rbmblocks);
if (mp->m_sb.sb_rextslog != rtb->rextslog)
xfs_trans_mod_sb(tp, XFS_TRANS_SB_REXTSLOG,
rtb->rextslog - mp->m_sb.sb_rextslog);
/* Fix broken isize */
sc->ip->i_disk_size = roundup_64(sc->ip->i_disk_size,
mp->m_sb.sb_blocksize);
if (sc->ip->i_disk_size < XFS_FSB_TO_B(mp, rtb->rbmblocks))
sc->ip->i_disk_size = XFS_FSB_TO_B(mp, rtb->rbmblocks);
xfs_trans_log_inode(sc->tp, sc->ip, XFS_ILOG_CORE);
return xrep_roll_trans(sc);
}
/* Repair the realtime bitmap file metadata. */
int
xrep_rtbitmap(
struct xfs_scrub *sc)
{
struct xchk_rtbitmap *rtb = sc->buf;
struct xfs_mount *mp = sc->mp;
unsigned long long blocks = 0;
int error;
/* Impossibly large rtbitmap means we can't touch the filesystem. */
if (rtb->rbmblocks > U32_MAX)
return 0;
/*
* If the size of the rt bitmap file is larger than what we reserved,
* figure out if we need to adjust the block reservation in the
* transaction.
*/
blocks = xfs_bmbt_calc_size(mp, rtb->rbmblocks);
if (blocks > UINT_MAX)
return -EOPNOTSUPP;
if (blocks > rtb->resblks) {
error = xfs_trans_reserve_more(sc->tp, blocks, 0);
if (error)
return error;
rtb->resblks += blocks;
}
/* Fix inode core and forks. */
error = xrep_metadata_inode_forks(sc);
if (error)
return error;
xfs_trans_ijoin(sc->tp, sc->ip, 0);
/* Ensure no unwritten extents. */
error = xrep_rtbitmap_data_mappings(sc, rtb->rbmblocks);
if (error)
return error;
/* Fix inconsistent bitmap geometry */
return xrep_rtbitmap_geometry(sc, rtb);
}

fs/xfs/scrub/rtsummary.c

@@ -31,6 +31,18 @@
 * (potentially large) amount of data in pageable memory.
 */

+struct xchk_rtsummary {
+	struct xfs_rtalloc_args	args;
+
+	uint64_t		rextents;
+	uint64_t		rbmblocks;
+	uint64_t		rsumsize;
+	unsigned int		rsumlevels;
+
+	/* Memory buffer for the summary comparison. */
+	union xfs_suminfo_raw	words[];
+};
+
/* Set us up to check the rtsummary file. */
int
xchk_setup_rtsummary(
@@ -38,8 +50,15 @@ xchk_setup_rtsummary(
{
	struct xfs_mount	*mp = sc->mp;
	char			*descr;
+	struct xchk_rtsummary	*rts;
	int			error;

+	rts = kvzalloc(struct_size(rts, words, mp->m_blockwsize),
+			XCHK_GFP_FLAGS);
+	if (!rts)
+		return -ENOMEM;
+	sc->buf = rts;
+
	/*
	 * Create an xfile to construct a new rtsummary file. The xfile allows
	 * us to avoid pinning kernel memory for this purpose.
@@ -54,15 +73,14 @@ xchk_setup_rtsummary(
	if (error)
		return error;

-	/* Allocate a memory buffer for the summary comparison. */
-	sc->buf = kvmalloc(mp->m_sb.sb_blocksize, XCHK_GFP_FLAGS);
-	if (!sc->buf)
-		return -ENOMEM;
-
	error = xchk_install_live_inode(sc, mp->m_rsumip);
	if (error)
		return error;

+	error = xchk_ino_dqattach(sc);
+	if (error)
+		return error;
+
	/*
	 * Locking order requires us to take the rtbitmap first. We must be
	 * careful to unlock it ourselves when we are done with the rtbitmap
@@ -71,13 +89,29 @@
	 */
	xfs_ilock(mp->m_rbmip, XFS_ILOCK_SHARED | XFS_ILOCK_RTBITMAP);
	xchk_ilock(sc, XFS_ILOCK_EXCL | XFS_ILOCK_RTSUM);

+	/*
+	 * Now that we've locked the rtbitmap and rtsummary, we can't race with
+	 * growfsrt trying to expand the summary or change the size of the rt
+	 * volume. Hence it is safe to compute and check the geometry values.
+	 */
+	if (mp->m_sb.sb_rblocks) {
+		xfs_filblks_t	rsumblocks;
+		int		rextslog;
+
+		rts->rextents = xfs_rtb_to_rtx(mp, mp->m_sb.sb_rblocks);
+		rextslog = xfs_compute_rextslog(rts->rextents);
+		rts->rsumlevels = rextslog + 1;
+		rts->rbmblocks = xfs_rtbitmap_blockcount(mp, rts->rextents);
+		rsumblocks = xfs_rtsummary_blockcount(mp, rts->rsumlevels,
+				rts->rbmblocks);
+		rts->rsumsize = XFS_FSB_TO_B(mp, rsumblocks);
+	}
+
	return 0;
}

/* Helper functions to record suminfo words in an xfile. */

-typedef unsigned int xchk_rtsumoff_t;
-
static inline int
xfsum_load(
	struct xfs_scrub	*sc,
@@ -143,7 +177,7 @@ xchk_rtsum_record_free(
	/* Compute the relevant location in the rtsum file. */
	rbmoff = xfs_rtx_to_rbmblock(mp, rec->ar_startext);
-	lenlog = XFS_RTBLOCKLOG(rec->ar_extcount);
+	lenlog = xfs_highbit64(rec->ar_extcount);
	offs = xfs_rtsumoffs(mp, lenlog, rbmoff);

	rtbno = xfs_rtx_to_rtb(mp, rec->ar_startext);
@@ -188,19 +222,29 @@ STATIC int
xchk_rtsum_compare(
	struct xfs_scrub	*sc)
{
-	struct xfs_rtalloc_args args = {
-		.mp		= sc->mp,
-		.tp		= sc->tp,
-	};
-	struct xfs_mount	*mp = sc->mp;
	struct xfs_bmbt_irec	map;
-	xfs_fileoff_t		off;
-	xchk_rtsumoff_t		sumoff = 0;
-	int			nmap;
+	struct xfs_iext_cursor	icur;
+	struct xfs_mount	*mp = sc->mp;
+	struct xfs_inode	*ip = sc->ip;
+	struct xchk_rtsummary	*rts = sc->buf;
+	xfs_fileoff_t		off = 0;
+	xfs_fileoff_t		endoff;
+	xfs_rtsumoff_t		sumoff = 0;
+	int			error = 0;

-	for (off = 0; off < XFS_B_TO_FSB(mp, mp->m_rsumsize); off++) {
-		union xfs_suminfo_raw	*ondisk_info;
-		int			error = 0;
+	rts->args.mp = sc->mp;
+	rts->args.tp = sc->tp;
+
+	/* Mappings may not cross or lie beyond EOF. */
+	endoff = XFS_B_TO_FSB(mp, ip->i_disk_size);
+	if (xfs_iext_lookup_extent(ip, &ip->i_df, endoff, &icur, &map)) {
+		xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, endoff);
+		return 0;
+	}
+
+	while (off < endoff) {
+		int		nmap = 1;

		if (xchk_should_terminate(sc, &error))
			return error;
@@ -208,8 +252,7 @@ xchk_rtsum_compare(
			return 0;

		/* Make sure we have a written extent. */
-		nmap = 1;
-		error = xfs_bmapi_read(mp->m_rsumip, off, 1, &map, &nmap,
+		error = xfs_bmapi_read(ip, off, endoff - off, &map, &nmap,
				XFS_DATA_FORK);
		if (!xchk_fblock_process_error(sc, XFS_DATA_FORK, off, &error))
			return error;
@@ -219,24 +262,33 @@ xchk_rtsum_compare(
			return 0;
		}

+		off += map.br_blockcount;
+	}
+
+	for (off = 0; off < endoff; off++) {
+		union xfs_suminfo_raw	*ondisk_info;
+
		/* Read a block's worth of ondisk rtsummary file. */
-		error = xfs_rtsummary_read_buf(&args, off);
+		error = xfs_rtsummary_read_buf(&rts->args, off);
		if (!xchk_fblock_process_error(sc, XFS_DATA_FORK, off, &error))
			return error;

		/* Read a block's worth of computed rtsummary file. */
-		error = xfsum_copyout(sc, sumoff, sc->buf, mp->m_blockwsize);
+		error = xfsum_copyout(sc, sumoff, rts->words, mp->m_blockwsize);
		if (error) {
-			xfs_rtbuf_cache_relse(&args);
+			xfs_rtbuf_cache_relse(&rts->args);
			return error;
		}

-		ondisk_info = xfs_rsumblock_infoptr(&args, 0);
-		if (memcmp(ondisk_info, sc->buf,
-				mp->m_blockwsize << XFS_WORDLOG) != 0)
+		ondisk_info = xfs_rsumblock_infoptr(&rts->args, 0);
+		if (memcmp(ondisk_info, rts->words,
+				mp->m_blockwsize << XFS_WORDLOG) != 0) {
			xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, off);
+			xfs_rtbuf_cache_relse(&rts->args);
+			return error;
+		}

-		xfs_rtbuf_cache_relse(&args);
+		xfs_rtbuf_cache_relse(&rts->args);
		sumoff += mp->m_blockwsize;
	}
@@ -249,8 +301,43 @@ xchk_rtsummary(
	struct xfs_scrub	*sc)
{
	struct xfs_mount	*mp = sc->mp;
+	struct xchk_rtsummary	*rts = sc->buf;
	int			error = 0;

+	/* Is sb_rextents correct? */
+	if (mp->m_sb.sb_rextents != rts->rextents) {
+		xchk_ino_set_corrupt(sc, mp->m_rbmip->i_ino);
+		goto out_rbm;
+	}
+
+	/* Is m_rsumlevels correct? */
+	if (mp->m_rsumlevels != rts->rsumlevels) {
+		xchk_ino_set_corrupt(sc, mp->m_rsumip->i_ino);
+		goto out_rbm;
+	}
+
+	/* Is m_rsumsize correct? */
+	if (mp->m_rsumsize != rts->rsumsize) {
+		xchk_ino_set_corrupt(sc, mp->m_rsumip->i_ino);
+		goto out_rbm;
+	}
+
+	/* The summary file length must be aligned to an fsblock. */
+	if (mp->m_rsumip->i_disk_size & mp->m_blockmask) {
+		xchk_ino_set_corrupt(sc, mp->m_rsumip->i_ino);
+		goto out_rbm;
+	}
+
+	/*
+	 * Is the summary file itself large enough to handle the rt volume?
+	 * growfsrt expands the summary file before updating sb_rextents, so
+	 * the file can be larger than rsumsize.
+	 */
+	if (mp->m_rsumip->i_disk_size < rts->rsumsize) {
+		xchk_ino_set_corrupt(sc, mp->m_rsumip->i_ino);
+		goto out_rbm;
+	}
+
	/* Invoke the fork scrubber. */
	error = xchk_metadata_inode_forks(sc);
	if (error || (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT))

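To make the geometry recomputation above concrete, here is a standalone sketch. It is not kernel code; it assumes, as in the kernel sources, that xfs_compute_rextslog() is the position of the highest set bit of the rt extent count, and it uses a made-up extent count:

#include <stdint.h>
#include <stdio.h>

/* Position of the highest set bit, mirroring the kernel's xfs_highbit64(). */
static unsigned int
highbit64(uint64_t v)
{
	unsigned int h = 0;

	while (v >>= 1)
		h++;
	return h;
}

int
main(void)
{
	uint64_t	rextents = 1000000;	/* hypothetical rt extent count */
	unsigned int	rextslog = rextents ? highbit64(rextents) : 0;

	/* 2^19 <= 1000000 < 2^20, so this prints rextslog=19 rsumlevels=20 */
	printf("rextslog=%u rsumlevels=%u\n", rextslog, rextslog + 1);
	return 0;
}
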
fs/xfs/scrub/scrub.c

@@ -14,8 +14,6 @@
#include "xfs_inode.h"
#include "xfs_quota.h"
#include "xfs_qm.h"
-#include "xfs_errortag.h"
-#include "xfs_error.h"
#include "xfs_scrub.h"
#include "scrub/scrub.h"
#include "scrub/common.h"
@@ -238,27 +236,31 @@ static const struct xchk_meta_ops meta_scrub_ops[] = {
	[XFS_SCRUB_TYPE_BNOBT] = {	/* bnobt */
		.type	= ST_PERAG,
		.setup	= xchk_setup_ag_allocbt,
-		.scrub	= xchk_bnobt,
-		.repair	= xrep_notsupported,
+		.scrub	= xchk_allocbt,
+		.repair	= xrep_allocbt,
+		.repair_eval = xrep_revalidate_allocbt,
	},
	[XFS_SCRUB_TYPE_CNTBT] = {	/* cntbt */
		.type	= ST_PERAG,
		.setup	= xchk_setup_ag_allocbt,
-		.scrub	= xchk_cntbt,
-		.repair	= xrep_notsupported,
+		.scrub	= xchk_allocbt,
+		.repair	= xrep_allocbt,
+		.repair_eval = xrep_revalidate_allocbt,
	},
	[XFS_SCRUB_TYPE_INOBT] = {	/* inobt */
		.type	= ST_PERAG,
		.setup	= xchk_setup_ag_iallocbt,
-		.scrub	= xchk_inobt,
-		.repair	= xrep_notsupported,
+		.scrub	= xchk_iallocbt,
+		.repair	= xrep_iallocbt,
+		.repair_eval = xrep_revalidate_iallocbt,
	},
	[XFS_SCRUB_TYPE_FINOBT] = {	/* finobt */
		.type	= ST_PERAG,
		.setup	= xchk_setup_ag_iallocbt,
-		.scrub	= xchk_finobt,
+		.scrub	= xchk_iallocbt,
		.has	= xfs_has_finobt,
-		.repair	= xrep_notsupported,
+		.repair	= xrep_iallocbt,
+		.repair_eval = xrep_revalidate_iallocbt,
	},
	[XFS_SCRUB_TYPE_RMAPBT] = {	/* rmapbt */
		.type	= ST_PERAG,
@@ -272,31 +274,31 @@ static const struct xchk_meta_ops meta_scrub_ops[] = {
		.setup	= xchk_setup_ag_refcountbt,
		.scrub	= xchk_refcountbt,
		.has	= xfs_has_reflink,
-		.repair	= xrep_notsupported,
+		.repair	= xrep_refcountbt,
	},
	[XFS_SCRUB_TYPE_INODE] = {	/* inode record */
		.type	= ST_INODE,
		.setup	= xchk_setup_inode,
		.scrub	= xchk_inode,
-		.repair	= xrep_notsupported,
+		.repair	= xrep_inode,
	},
	[XFS_SCRUB_TYPE_BMBTD] = {	/* inode data fork */
		.type	= ST_INODE,
		.setup	= xchk_setup_inode_bmap,
		.scrub	= xchk_bmap_data,
-		.repair	= xrep_notsupported,
+		.repair	= xrep_bmap_data,
	},
	[XFS_SCRUB_TYPE_BMBTA] = {	/* inode attr fork */
		.type	= ST_INODE,
		.setup	= xchk_setup_inode_bmap,
		.scrub	= xchk_bmap_attr,
-		.repair	= xrep_notsupported,
+		.repair	= xrep_bmap_attr,
	},
	[XFS_SCRUB_TYPE_BMBTC] = {	/* inode CoW fork */
		.type	= ST_INODE,
		.setup	= xchk_setup_inode_bmap,
		.scrub	= xchk_bmap_cow,
-		.repair	= xrep_notsupported,
+		.repair	= xrep_bmap_cow,
	},
	[XFS_SCRUB_TYPE_DIR] = {	/* directory */
		.type	= ST_INODE,
@@ -326,33 +328,31 @@ static const struct xchk_meta_ops meta_scrub_ops[] = {
		.type	= ST_FS,
		.setup	= xchk_setup_rtbitmap,
		.scrub	= xchk_rtbitmap,
-		.has	= xfs_has_realtime,
-		.repair	= xrep_notsupported,
+		.repair	= xrep_rtbitmap,
	},
	[XFS_SCRUB_TYPE_RTSUM] = {	/* realtime summary */
		.type	= ST_FS,
		.setup	= xchk_setup_rtsummary,
		.scrub	= xchk_rtsummary,
-		.has	= xfs_has_realtime,
		.repair	= xrep_notsupported,
	},
	[XFS_SCRUB_TYPE_UQUOTA] = {	/* user quota */
		.type	= ST_FS,
		.setup	= xchk_setup_quota,
		.scrub	= xchk_quota,
-		.repair	= xrep_notsupported,
+		.repair	= xrep_quota,
	},
	[XFS_SCRUB_TYPE_GQUOTA] = {	/* group quota */
		.type	= ST_FS,
		.setup	= xchk_setup_quota,
		.scrub	= xchk_quota,
-		.repair	= xrep_notsupported,
+		.repair	= xrep_quota,
	},
	[XFS_SCRUB_TYPE_PQUOTA] = {	/* project quota */
		.type	= ST_FS,
		.setup	= xchk_setup_quota,
		.scrub	= xchk_quota,
-		.repair	= xrep_notsupported,
+		.repair	= xrep_quota,
	},
	[XFS_SCRUB_TYPE_FSCOUNTERS] = {	/* fs summary counters */
		.type	= ST_FS,
@@ -531,7 +531,10 @@ retry_op:
	/* Scrub for errors. */
	check_start = xchk_stats_now();
-	error = sc->ops->scrub(sc);
+	if ((sc->flags & XREP_ALREADY_FIXED) && sc->ops->repair_eval != NULL)
+		error = sc->ops->repair_eval(sc);
+	else
+		error = sc->ops->scrub(sc);
	run.scrub_ns += xchk_stats_elapsed_ns(check_start);
	if (error == -EDEADLOCK && !(sc->flags & XCHK_TRY_HARDER))
		goto try_harder;
@@ -542,23 +545,12 @@
	xchk_update_health(sc);

-	if ((sc->sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR) &&
-	    !(sc->flags & XREP_ALREADY_FIXED)) {
-		bool needs_fix = xchk_needs_repair(sc->sm);
-
-		/* Userspace asked us to rebuild the structure regardless. */
-		if (sc->sm->sm_flags & XFS_SCRUB_IFLAG_FORCE_REBUILD)
-			needs_fix = true;
-
-		/* Let debug users force us into the repair routines. */
-		if (XFS_TEST_ERROR(needs_fix, mp, XFS_ERRTAG_FORCE_SCRUB_REPAIR))
-			needs_fix = true;
-
+	if (xchk_could_repair(sc)) {
		/*
		 * If userspace asked for a repair but it wasn't necessary,
		 * report that back to userspace.
		 */
-		if (!needs_fix) {
+		if (!xrep_will_attempt(sc)) {
			sc->sm->sm_flags |= XFS_SCRUB_OFLAG_NO_REPAIR_NEEDED;
			goto out_nofix;
		}

fs/xfs/scrub/scrub.h

@@ -35,6 +35,14 @@ struct xchk_meta_ops {
	/* Repair or optimize the metadata. */
	int		(*repair)(struct xfs_scrub *);

+	/*
+	 * Re-scrub the metadata we repaired, in case there's extra work that
+	 * we need to do to check our repair work. If this is NULL, we'll use
+	 * the ->scrub function pointer, assuming that the regular scrub is
+	 * sufficient.
+	 */
+	int		(*repair_eval)(struct xfs_scrub *sc);
+
	/* Decide if we even have this piece of metadata. */
	bool		(*has)(struct xfs_mount *);
@@ -113,6 +121,7 @@ struct xfs_scrub {
#define XCHK_HAVE_FREEZE_PROT	(1U << 1)  /* do we have freeze protection? */
#define XCHK_FSGATES_DRAIN	(1U << 2)  /* defer ops draining enabled */
#define XCHK_NEED_DRAIN		(1U << 3)  /* scrub needs to drain defer ops */
+#define XREP_RESET_PERAG_RESV	(1U << 30) /* must reset AG space reservation */
#define XREP_ALREADY_FIXED	(1U << 31) /* checking our repair work */

/*
@@ -129,10 +138,8 @@ int xchk_superblock(struct xfs_scrub *sc);
int xchk_agf(struct xfs_scrub *sc);
int xchk_agfl(struct xfs_scrub *sc);
int xchk_agi(struct xfs_scrub *sc);
-int xchk_bnobt(struct xfs_scrub *sc);
-int xchk_cntbt(struct xfs_scrub *sc);
-int xchk_inobt(struct xfs_scrub *sc);
-int xchk_finobt(struct xfs_scrub *sc);
+int xchk_allocbt(struct xfs_scrub *sc);
+int xchk_iallocbt(struct xfs_scrub *sc);
int xchk_rmapbt(struct xfs_scrub *sc);
int xchk_refcountbt(struct xfs_scrub *sc);
int xchk_inode(struct xfs_scrub *sc);

(Some files were not shown because too many files have changed in this diff.)