Merge tag 'xfs-6.2-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull XFS updates from Darrick Wong:
 "The highlight of this is a batch of fixes for the online metadata
  checking code as we start the loooong march towards merging online
  repair. I aim to merge that in time for the 2023 LTS.

  There are also a large number of data corruption and race condition
  fixes in this patchset. Most notably fixed are write() calls to
  unwritten extents racing with writeback, which required some late(r
  than I prefer) code changes to iomap to support the necessary
  revalidations. I don't really like iomap changes going in past -rc4,
  but Dave and I have been working on it long enough that I chose to
  push it for 6.2 anyway.

  There are also a number of other subtle problems fixed, including the
  log racing with inode writeback to write inodes with incorrect link
  count to disk; file data mapping corruptions as a result of incorrect
  lock cycling when attaching dquots; refcount metadata corruption if
  one actually manages to share a block 2^32 times; and the log
  clobbering cow staging extents if they were formerly metadata blocks.

  Summary:

   - Fix a race condition w.r.t. percpu inode free counters

   - Fix a broken error return in xfs_remove

   - Print FS UUID at mount/unmount time

   - Numerous fixes to the online fsck code

   - Fix inode locking inconsistency problems when dealing with realtime
     metadata files

   - Actually merge pull requests so that we capture the cover letter
     contents

   - Fix a race between rebuilding VFS inode state and the AIL flushing
     inodes that could cause corrupt inodes to be written to the
     filesystem

   - Fix a data corruption problem resulting from a write() to an
     unwritten extent racing with writeback started on behalf of memory
     reclaim changing the extent state

   - Add debugging knobs so that we can test iomap invalidation

   - Fix the blockdev pagecache contents being stale after unmounting
     the filesystem, leading to spurious xfs_db errors and corrupt
     metadumps

   - Fix a file mapping corruption bug due to ilock cycling when
     attaching dquots to a file during delalloc reservation

   - Fix a refcount btree corruption problem due to the refcount
     adjustment code not handling MAXREFCOUNT correctly, resulting in
     unnecessary record splits

   - Fix COW staging extent allocations not being classified as USERDATA,
     which results in filestreams being ignored and possible data
     corruption if the allocation was filled from the AGFL and the block
     buffer is still being tracked in the AIL

   - Fix new duplicated includes

   - Fix a race between the dquot shrinker and dquot freeing that could
     cause a UAF"

* tag 'xfs-6.2-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (50 commits)
  xfs: dquot shrinker doesn't check for XFS_DQFLAG_FREEING
  xfs: Remove duplicated include in xfs_iomap.c
  xfs: invalidate xfs_bufs when allocating cow extents
  xfs: get rid of assert from xfs_btree_islastblock
  xfs: estimate post-merge refcounts correctly
  xfs: hoist refcount record merge predicates
  xfs: fix super block buf log item UAF during force shutdown
  xfs: wait iclog complete before tearing down AIL
  xfs: attach dquots to inode before reading data/cow fork mappings
  xfs: shut up -Wuninitialized in xfsaild_push
  xfs: use memcpy, not strncpy, to format the attr prefix during listxattr
  xfs: invalidate block device page cache during unmount
  xfs: add debug knob to slow down write for fun
  xfs: add debug knob to slow down writeback for fun
  xfs: drop write error injection is unfixable, remove it
  xfs: use iomap_valid method to detect stale cached iomaps
  iomap: write iomap validity checks
  xfs: xfs_bmap_punch_delalloc_range() should take a byte range
  iomap: buffered write failure should not truncate the page cache
  xfs,iomap: move delalloc punching to iomap
  ...
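
The stale-mapping race described in the pull message above is closed by revalidating the cached mapping after the folio lock is taken: the diff below adds a ->iomap_valid page op and an IOMAP_F_STALE flag to iomap, and XFS feeds the check with a per-fork sequence number via xfs_iomap_inode_sequence(). As a rough, self-contained model of that sequence-counter idea (plain userspace C; every toy_* name is invented for the sketch and is not the kernel API):

/*
 * Toy model of "validate the cached mapping after locking the page".
 * A real filesystem bumps a sequence counter whenever the extent map
 * changes and stashes a snapshot of it in the mapping it hands to iomap.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct toy_inode {
	uint32_t map_seq;		/* bumped on every extent map change */
};

struct toy_map {
	uint32_t seq_at_lookup;		/* snapshot taken when the map was built */
	bool stale;
};

/* Build a mapping and remember the sequence number it was derived from. */
static void toy_map_begin(struct toy_inode *ip, struct toy_map *map)
{
	map->seq_at_lookup = ip->map_seq;
	map->stale = false;
}

/* Racing writeback or reclaim changing extent state bumps the counter. */
static void toy_extent_changed(struct toy_inode *ip)
{
	ip->map_seq++;
}

/*
 * Called with the folio locked: if the counter moved since the mapping was
 * built, mark the mapping stale so the caller redoes the lookup instead of
 * acting on out-of-date extent state (e.g. zeroing, or failing to zero).
 */
static bool toy_map_valid(struct toy_inode *ip, struct toy_map *map)
{
	if (map->seq_at_lookup != ip->map_seq) {
		map->stale = true;
		return false;
	}
	return true;
}

int main(void)
{
	struct toy_inode ip = { .map_seq = 1 };
	struct toy_map map;

	toy_map_begin(&ip, &map);
	toy_extent_changed(&ip);	/* e.g. writeback completion converts the extent */
	printf("mapping still valid? %s\n",
	       toy_map_valid(&ip, &map) ? "yes" : "no, remap and retry");
	return 0;
}

The important property is that the check runs only after the folio lock is held, so any extent-state change that completed before the lock was taken is guaranteed to be observed.
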
Linus Torvalds 2022-12-14 10:11:51 -08:00
commit 87be949912
49 changed files with 1317 additions and 313 deletions

View File

@ -584,7 +584,7 @@ static int iomap_write_begin_inline(const struct iomap_iter *iter,
return iomap_read_inline_data(iter, folio);
}
static int iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
size_t len, struct folio **foliop)
{
const struct iomap_page_ops *page_ops = iter->iomap.page_ops;
@ -618,6 +618,27 @@ static int iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
status = (iter->flags & IOMAP_NOWAIT) ? -EAGAIN : -ENOMEM;
goto out_no_page;
}
/*
* Now we have a locked folio, before we do anything with it we need to
* check that the iomap we have cached is not stale. The inode extent
* mapping can change due to concurrent IO in flight (e.g.
* IOMAP_UNWRITTEN state can change and memory reclaim could have
* reclaimed a previously partially written page at this index after IO
* completion before this write reaches this file offset) and hence we
* could do the wrong thing here (zero a page range incorrectly or fail
* to zero) and corrupt data.
*/
if (page_ops && page_ops->iomap_valid) {
bool iomap_valid = page_ops->iomap_valid(iter->inode,
&iter->iomap);
if (!iomap_valid) {
iter->iomap.flags |= IOMAP_F_STALE;
status = 0;
goto out_unlock;
}
}
if (pos + len > folio_pos(folio) + folio_size(folio))
len = folio_pos(folio) + folio_size(folio) - pos;
@ -773,6 +794,8 @@ again:
status = iomap_write_begin(iter, pos, bytes, &folio);
if (unlikely(status))
break;
if (iter->iomap.flags & IOMAP_F_STALE)
break;
page = folio_file_page(folio, pos >> PAGE_SHIFT);
if (mapping_writably_mapped(mapping))
@ -832,6 +855,231 @@ iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *i,
}
EXPORT_SYMBOL_GPL(iomap_file_buffered_write);
/*
* Scan the data range passed to us for dirty page cache folios. If we find a
* dirty folio, punch out the preceding range and update the offset from which
* the next punch will start from.
*
* We can punch out storage reservations under clean pages because they either
* contain data that has been written back - in which case the delalloc punch
* over that range is a no-op - or they have been read faults in which case they
* contain zeroes and we can remove the delalloc backing range and any new
* writes to those pages will do the normal hole filling operation...
*
* This makes the logic simple: we only need to keep the delalloc extents
* over the dirty ranges of the page cache.
*
* This function uses [start_byte, end_byte) intervals (i.e. open ended) to
* simplify range iterations.
*/
static int iomap_write_delalloc_scan(struct inode *inode,
loff_t *punch_start_byte, loff_t start_byte, loff_t end_byte,
int (*punch)(struct inode *inode, loff_t offset, loff_t length))
{
while (start_byte < end_byte) {
struct folio *folio;
/* grab locked page */
folio = filemap_lock_folio(inode->i_mapping,
start_byte >> PAGE_SHIFT);
if (!folio) {
start_byte = ALIGN_DOWN(start_byte, PAGE_SIZE) +
PAGE_SIZE;
continue;
}
/* if dirty, punch up to offset */
if (folio_test_dirty(folio)) {
if (start_byte > *punch_start_byte) {
int error;
error = punch(inode, *punch_start_byte,
start_byte - *punch_start_byte);
if (error) {
folio_unlock(folio);
folio_put(folio);
return error;
}
}
/*
* Make sure the next punch start is correctly bound to
* the end of this data range, not the end of the folio.
*/
*punch_start_byte = min_t(loff_t, end_byte,
folio_next_index(folio) << PAGE_SHIFT);
}
/* move offset to start of next folio in range */
start_byte = folio_next_index(folio) << PAGE_SHIFT;
folio_unlock(folio);
folio_put(folio);
}
return 0;
}
/*
* Punch out all the delalloc blocks in the range given except for those that
* have dirty data still pending in the page cache - those are going to be
* written and so must still retain the delalloc backing for writeback.
*
* As we are scanning the page cache for data, we don't need to reimplement the
* wheel - mapping_seek_hole_data() does exactly what we need to identify the
* start and end of data ranges correctly even for sub-folio block sizes. This
* byte range based iteration is especially convenient because it means we
* don't have to care about variable size folios, nor where the start or end of
* the data range lies within a folio, if they lie within the same folio or even
* if there are multiple discontiguous data ranges within the folio.
*
* It should be noted that mapping_seek_hole_data() is not aware of EOF, and so
* can return data ranges that exist in the cache beyond EOF. e.g. a page fault
* spanning EOF will initialise the post-EOF data to zeroes and mark it up to
* date. A write page fault can then mark it dirty. If we then fail a write()
* beyond EOF into that up to date cached range, we allocate a delalloc block
* beyond EOF and then have to punch it out. Because the range is up to date,
* mapping_seek_hole_data() will return it, and we will skip the punch because
* the folio is dirty. This is incorrect - we always need to punch out delalloc
* beyond EOF in this case as writeback will never write back and convert that
* delalloc block beyond EOF. Hence we limit the cached data scan range to EOF,
* resulting in always punching out the range from the EOF to the end of the
* range the iomap spans.
*
* Intervals are of the form [start_byte, end_byte) (i.e. open ended) because it
* matches the intervals returned by mapping_seek_hole_data(). i.e. SEEK_DATA
* returns the start of a data range (start_byte), and SEEK_HOLE(start_byte)
* returns the end of the data range (data_end). Using closed intervals would
* require sprinkling this code with magic "+ 1" and "- 1" arithmetic and expose
* the code to subtle off-by-one bugs....
*/
static int iomap_write_delalloc_release(struct inode *inode,
loff_t start_byte, loff_t end_byte,
int (*punch)(struct inode *inode, loff_t pos, loff_t length))
{
loff_t punch_start_byte = start_byte;
loff_t scan_end_byte = min(i_size_read(inode), end_byte);
int error = 0;
/*
* Lock the mapping to avoid races with page faults re-instantiating
* folios and dirtying them via ->page_mkwrite whilst we walk the
* cache and perform delalloc extent removal. Failing to do this can
* leave dirty pages with no space reservation in the cache.
*/
filemap_invalidate_lock(inode->i_mapping);
while (start_byte < scan_end_byte) {
loff_t data_end;
start_byte = mapping_seek_hole_data(inode->i_mapping,
start_byte, scan_end_byte, SEEK_DATA);
/*
* If there is no more data to scan, all that is left is to
* punch out the remaining range.
*/
if (start_byte == -ENXIO || start_byte == scan_end_byte)
break;
if (start_byte < 0) {
error = start_byte;
goto out_unlock;
}
WARN_ON_ONCE(start_byte < punch_start_byte);
WARN_ON_ONCE(start_byte > scan_end_byte);
/*
* We find the end of this contiguous cached data range by
* seeking from start_byte to the beginning of the next hole.
*/
data_end = mapping_seek_hole_data(inode->i_mapping, start_byte,
scan_end_byte, SEEK_HOLE);
if (data_end < 0) {
error = data_end;
goto out_unlock;
}
WARN_ON_ONCE(data_end <= start_byte);
WARN_ON_ONCE(data_end > scan_end_byte);
error = iomap_write_delalloc_scan(inode, &punch_start_byte,
start_byte, data_end, punch);
if (error)
goto out_unlock;
/* The next data search starts at the end of this one. */
start_byte = data_end;
}
if (punch_start_byte < end_byte)
error = punch(inode, punch_start_byte,
end_byte - punch_start_byte);
out_unlock:
filemap_invalidate_unlock(inode->i_mapping);
return error;
}
/*
* When a short write occurs, the filesystem may need to remove reserved space
* that was allocated in ->iomap_begin from its ->iomap_end method. For
* filesystems that use delayed allocation, we need to punch out delalloc
* extents from the range that are not dirty in the page cache. As the write can
* race with page faults, there can be dirty pages over the delalloc extent
* outside the range of a short write but still within the delalloc extent
* allocated for this iomap.
*
* This function uses [start_byte, end_byte) intervals (i.e. open ended) to
* simplify range iterations.
*
* The punch() callback *must* only punch delalloc extents in the range passed
* to it. It must skip over all other types of extents in the range and leave
* them completely unchanged. It must do this punch atomically with respect to
* other extent modifications.
*
* The punch() callback may be called with a folio locked to prevent writeback
* extent allocation racing at the edge of the range we are currently punching.
* The locked folio may or may not cover the range being punched, so it is not
* safe for the punch() callback to lock folios itself.
*
* Lock order is:
*
* inode->i_rwsem (shared or exclusive)
* inode->i_mapping->invalidate_lock (exclusive)
* folio_lock()
* ->punch
* internal filesystem allocation lock
*/
int iomap_file_buffered_write_punch_delalloc(struct inode *inode,
struct iomap *iomap, loff_t pos, loff_t length,
ssize_t written,
int (*punch)(struct inode *inode, loff_t pos, loff_t length))
{
loff_t start_byte;
loff_t end_byte;
int blocksize = i_blocksize(inode);
if (iomap->type != IOMAP_DELALLOC)
return 0;
/* If we didn't reserve the blocks, we're not allowed to punch them. */
if (!(iomap->flags & IOMAP_F_NEW))
return 0;
/*
* start_byte refers to the first unused block after a short write. If
* nothing was written, round offset down to point at the first block in
* the range.
*/
if (unlikely(!written))
start_byte = round_down(pos, blocksize);
else
start_byte = round_up(pos + written, blocksize);
end_byte = round_up(pos + length, blocksize);
/* Nothing to do if we've written the entire delalloc extent */
if (start_byte >= end_byte)
return 0;
return iomap_write_delalloc_release(inode, start_byte, end_byte,
punch);
}
EXPORT_SYMBOL_GPL(iomap_file_buffered_write_punch_delalloc);
static loff_t iomap_unshare_iter(struct iomap_iter *iter)
{
struct iomap *iomap = &iter->iomap;
@ -856,6 +1104,8 @@ static loff_t iomap_unshare_iter(struct iomap_iter *iter)
status = iomap_write_begin(iter, pos, bytes, &folio);
if (unlikely(status))
return status;
if (iter->iomap.flags & IOMAP_F_STALE)
break;
status = iomap_write_end(iter, pos, bytes, bytes, folio);
if (WARN_ON_ONCE(status == 0))
@ -911,6 +1161,8 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
status = iomap_write_begin(iter, pos, bytes, &folio);
if (status)
return status;
if (iter->iomap.flags & IOMAP_F_STALE)
break;
offset = offset_in_folio(folio, pos);
if (bytes > folio_size(folio) - offset)
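
As a compact illustration of the punch-out policy documented above for iomap_write_delalloc_scan()/iomap_write_delalloc_release(): walk the failed-write range using half-open [start, end) intervals and remove the delalloc reservation only over ranges whose pages are clean, keeping it under dirty pages that writeback still has to convert. This is a self-contained userspace sketch; PAGE_SZ, dirty[] and the toy_* names are invented, and the real code walks locked folios under the mapping's invalidate_lock and skips holes with mapping_seek_hole_data().

/* Toy model of punching delalloc only over clean ranges of the page cache. */
#include <stdbool.h>
#include <stdio.h>

#define PAGE_SZ		4096L
#define NR_PAGES	8

static bool dirty[NR_PAGES];	/* toy page cache dirty bits */

static void toy_punch(long start, long len)
{
	printf("punch delalloc [%ld, %ld)\n", start, start + len);
}

static void toy_release(long start_byte, long end_byte)
{
	long punch_start = start_byte;
	long pos;

	for (pos = start_byte; pos < end_byte; pos += PAGE_SZ) {
		if (!dirty[pos / PAGE_SZ])
			continue;	/* clean page: keep growing the punch range */
		/* dirty page: punch everything before it, then skip over it */
		if (pos > punch_start)
			toy_punch(punch_start, pos - punch_start);
		punch_start = pos + PAGE_SZ;
	}
	if (punch_start < end_byte)
		toy_punch(punch_start, end_byte - punch_start);
}

int main(void)
{
	dirty[2] = dirty[5] = true;	/* pages with data still waiting for writeback */
	toy_release(0, NR_PAGES * PAGE_SZ);
	return 0;
}
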

View File

@ -7,12 +7,28 @@
#include <linux/iomap.h>
#include "trace.h"
/*
* Advance to the next range we need to map.
*
* If the iomap is marked IOMAP_F_STALE, it means the existing map was not fully
* processed - it was aborted because the extent the iomap spanned may have been
* changed during the operation. In this case, the iteration behaviour is to
* remap the unprocessed range of the iter, and that means we may need to remap
* even when we've made no progress (i.e. iter->processed = 0). Hence the
* "finished iterating" case needs to distinguish between
* (processed = 0) meaning we are done and (processed = 0 && stale) meaning we
* need to remap the entire remaining range.
*/
static inline int iomap_iter_advance(struct iomap_iter *iter)
{
bool stale = iter->iomap.flags & IOMAP_F_STALE;
/* handle the previous iteration (if any) */
if (iter->iomap.length) {
if (iter->processed <= 0)
if (iter->processed < 0)
return iter->processed;
if (!iter->processed && !stale)
return 0;
if (WARN_ON_ONCE(iter->processed > iomap_length(iter)))
return -EIO;
iter->pos += iter->processed;
@ -33,6 +49,7 @@ static inline void iomap_iter_done(struct iomap_iter *iter)
WARN_ON_ONCE(iter->iomap.offset > iter->pos);
WARN_ON_ONCE(iter->iomap.length == 0);
WARN_ON_ONCE(iter->iomap.offset + iter->iomap.length <= iter->pos);
WARN_ON_ONCE(iter->iomap.flags & IOMAP_F_STALE);
trace_iomap_iter_dstmap(iter->inode, &iter->iomap);
if (iter->srcmap.type != IOMAP_HOLE)
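
Restating the termination rule from the comment above: zero progress now ends the walk only when the mapping was not marked stale; zero progress on a stale mapping means "redo the lookup for the same range". A toy decision table (invented names, not the kernel function):

/* Toy version of the iterator's "what next?" decision. */
#include <stdbool.h>
#include <stdio.h>

enum toy_next { TOY_ERROR, TOY_DONE, TOY_REMAP, TOY_ADVANCE };

static enum toy_next toy_advance(long processed, bool stale)
{
	if (processed < 0)
		return TOY_ERROR;	/* propagate the error to the caller */
	if (processed == 0 && !stale)
		return TOY_DONE;	/* nothing left to do */
	if (processed == 0 && stale)
		return TOY_REMAP;	/* map the same range again */
	return TOY_ADVANCE;		/* move the position forward and continue */
}

int main(void)
{
	printf("error=%d done=%d remap=%d advance=%d\n",
	       toy_advance(-5, false), toy_advance(0, false),
	       toy_advance(0, true), toy_advance(4096, false));
	return 0;
}
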

View File

@ -4058,7 +4058,7 @@ xfs_bmap_alloc_userdata(
* the busy list.
*/
bma->datatype = XFS_ALLOC_NOBUSY;
if (whichfork == XFS_DATA_FORK) {
if (whichfork == XFS_DATA_FORK || whichfork == XFS_COW_FORK) {
bma->datatype |= XFS_ALLOC_USERDATA;
if (bma->offset == 0)
bma->datatype |= XFS_ALLOC_INITIAL_USER_DATA;
@ -4551,7 +4551,8 @@ xfs_bmapi_convert_delalloc(
* the extent. Just return the real extent at this offset.
*/
if (!isnullstartblock(bma.got.br_startblock)) {
xfs_bmbt_to_iomap(ip, iomap, &bma.got, 0, flags);
xfs_bmbt_to_iomap(ip, iomap, &bma.got, 0, flags,
xfs_iomap_inode_sequence(ip, flags));
*seq = READ_ONCE(ifp->if_seq);
goto out_trans_cancel;
}
@ -4599,7 +4600,8 @@ xfs_bmapi_convert_delalloc(
XFS_STATS_INC(mp, xs_xstrat_quick);
ASSERT(!isnullstartblock(bma.got.br_startblock));
xfs_bmbt_to_iomap(ip, iomap, &bma.got, 0, flags);
xfs_bmbt_to_iomap(ip, iomap, &bma.got, 0, flags,
xfs_iomap_inode_sequence(ip, flags));
*seq = READ_ONCE(ifp->if_seq);
if (whichfork == XFS_COW_FORK)

View File

@ -556,7 +556,6 @@ xfs_btree_islastblock(
struct xfs_buf *bp;
block = xfs_btree_get_block(cur, level, &bp);
ASSERT(block && xfs_btree_check_block(cur, block, level, bp) == 0);
if (cur->bc_flags & XFS_BTREE_LONG_PTRS)
return block->bb_u.l.bb_rightsib == cpu_to_be64(NULLFSBLOCK);

View File

@ -40,13 +40,12 @@
#define XFS_ERRTAG_REFCOUNT_FINISH_ONE 25
#define XFS_ERRTAG_BMAP_FINISH_ONE 26
#define XFS_ERRTAG_AG_RESV_CRITICAL 27
/*
* DEBUG mode instrumentation to test and/or trigger delayed allocation
* block killing in the event of failed writes. When enabled, all
* buffered writes are silently dropped and handled as if they failed.
* All delalloc blocks in the range of the write (including pre-existing
* delalloc blocks!) are tossed as part of the write failure error
* handling sequence.
* Drop-writes support removed because write error handling cannot trash
* pre-existing delalloc extents in any useful way anymore. We retain the
* definition so that we can reject it as an invalid value in
* xfs_errortag_valid().
*/
#define XFS_ERRTAG_DROP_WRITES 28
#define XFS_ERRTAG_LOG_BAD_CRC 29
@ -62,7 +61,9 @@
#define XFS_ERRTAG_LARP 39
#define XFS_ERRTAG_DA_LEAF_SPLIT 40
#define XFS_ERRTAG_ATTR_LEAF_TO_NODE 41
#define XFS_ERRTAG_MAX 42
#define XFS_ERRTAG_WB_DELAY_MS 42
#define XFS_ERRTAG_WRITE_DELAY_MS 43
#define XFS_ERRTAG_MAX 44
/*
* Random factors for above tags, 1 means always, 2 means 1/2 time, etc.
@ -95,7 +96,6 @@
#define XFS_RANDOM_REFCOUNT_FINISH_ONE 1
#define XFS_RANDOM_BMAP_FINISH_ONE 1
#define XFS_RANDOM_AG_RESV_CRITICAL 4
#define XFS_RANDOM_DROP_WRITES 1
#define XFS_RANDOM_LOG_BAD_CRC 1
#define XFS_RANDOM_LOG_ITEM_PIN 1
#define XFS_RANDOM_BUF_LRU_REF 2
@ -109,5 +109,7 @@
#define XFS_RANDOM_LARP 1
#define XFS_RANDOM_DA_LEAF_SPLIT 1
#define XFS_RANDOM_ATTR_LEAF_TO_NODE 1
#define XFS_RANDOM_WB_DELAY_MS 3000
#define XFS_RANDOM_WRITE_DELAY_MS 3000
#endif /* __XFS_ERRORTAG_H_ */

View File

@ -815,11 +815,136 @@ out_error:
/* Is this extent valid? */
static inline bool
xfs_refc_valid(
struct xfs_refcount_irec *rc)
const struct xfs_refcount_irec *rc)
{
return rc->rc_startblock != NULLAGBLOCK;
}
static inline xfs_nlink_t
xfs_refc_merge_refcount(
const struct xfs_refcount_irec *irec,
enum xfs_refc_adjust_op adjust)
{
/* Once a record hits MAXREFCOUNT, it is pinned there forever */
if (irec->rc_refcount == MAXREFCOUNT)
return MAXREFCOUNT;
return irec->rc_refcount + adjust;
}
static inline bool
xfs_refc_want_merge_center(
const struct xfs_refcount_irec *left,
const struct xfs_refcount_irec *cleft,
const struct xfs_refcount_irec *cright,
const struct xfs_refcount_irec *right,
bool cleft_is_cright,
enum xfs_refc_adjust_op adjust,
unsigned long long *ulenp)
{
unsigned long long ulen = left->rc_blockcount;
xfs_nlink_t new_refcount;
/*
* To merge with a center record, both shoulder records must be
* adjacent to the record we want to adjust. This is only true if
* find_left and find_right made all four records valid.
*/
if (!xfs_refc_valid(left) || !xfs_refc_valid(right) ||
!xfs_refc_valid(cleft) || !xfs_refc_valid(cright))
return false;
/* There must only be one record for the entire range. */
if (!cleft_is_cright)
return false;
/* The shoulder record refcounts must match the new refcount. */
new_refcount = xfs_refc_merge_refcount(cleft, adjust);
if (left->rc_refcount != new_refcount)
return false;
if (right->rc_refcount != new_refcount)
return false;
/*
* The new record cannot exceed the max length. ulen is a ULL as the
* individual record block counts can be up to (u32 - 1) in length
* hence we need to catch u32 addition overflows here.
*/
ulen += cleft->rc_blockcount + right->rc_blockcount;
if (ulen >= MAXREFCEXTLEN)
return false;
*ulenp = ulen;
return true;
}
static inline bool
xfs_refc_want_merge_left(
const struct xfs_refcount_irec *left,
const struct xfs_refcount_irec *cleft,
enum xfs_refc_adjust_op adjust)
{
unsigned long long ulen = left->rc_blockcount;
xfs_nlink_t new_refcount;
/*
* For a left merge, the left shoulder record must be adjacent to the
* start of the range. If this is true, find_left made left and cleft
* contain valid contents.
*/
if (!xfs_refc_valid(left) || !xfs_refc_valid(cleft))
return false;
/* Left shoulder record refcount must match the new refcount. */
new_refcount = xfs_refc_merge_refcount(cleft, adjust);
if (left->rc_refcount != new_refcount)
return false;
/*
* The new record cannot exceed the max length. ulen is a ULL as the
* individual record block counts can be up to (u32 - 1) in length
* hence we need to catch u32 addition overflows here.
*/
ulen += cleft->rc_blockcount;
if (ulen >= MAXREFCEXTLEN)
return false;
return true;
}
static inline bool
xfs_refc_want_merge_right(
const struct xfs_refcount_irec *cright,
const struct xfs_refcount_irec *right,
enum xfs_refc_adjust_op adjust)
{
unsigned long long ulen = right->rc_blockcount;
xfs_nlink_t new_refcount;
/*
* For a right merge, the right shoulder record must be adjacent to the
* end of the range. If this is true, find_right made cright and right
* contain valid contents.
*/
if (!xfs_refc_valid(right) || !xfs_refc_valid(cright))
return false;
/* Right shoulder record refcount must match the new refcount. */
new_refcount = xfs_refc_merge_refcount(cright, adjust);
if (right->rc_refcount != new_refcount)
return false;
/*
* The new record cannot exceed the max length. ulen is a ULL as the
* individual record block counts can be up to (u32 - 1) in length
* hence we need to catch u32 addition overflows here.
*/
ulen += cright->rc_blockcount;
if (ulen >= MAXREFCEXTLEN)
return false;
return true;
}
/*
* Try to merge with any extents on the boundaries of the adjustment range.
*/
@ -861,23 +986,15 @@ xfs_refcount_merge_extents(
(cleft.rc_blockcount == cright.rc_blockcount);
/* Try to merge left, cleft, and right. cleft must == cright. */
ulen = (unsigned long long)left.rc_blockcount + cleft.rc_blockcount +
right.rc_blockcount;
if (xfs_refc_valid(&left) && xfs_refc_valid(&right) &&
xfs_refc_valid(&cleft) && xfs_refc_valid(&cright) && cequal &&
left.rc_refcount == cleft.rc_refcount + adjust &&
right.rc_refcount == cleft.rc_refcount + adjust &&
ulen < MAXREFCEXTLEN) {
if (xfs_refc_want_merge_center(&left, &cleft, &cright, &right, cequal,
adjust, &ulen)) {
*shape_changed = true;
return xfs_refcount_merge_center_extents(cur, &left, &cleft,
&right, ulen, aglen);
}
/* Try to merge left and cleft. */
ulen = (unsigned long long)left.rc_blockcount + cleft.rc_blockcount;
if (xfs_refc_valid(&left) && xfs_refc_valid(&cleft) &&
left.rc_refcount == cleft.rc_refcount + adjust &&
ulen < MAXREFCEXTLEN) {
if (xfs_refc_want_merge_left(&left, &cleft, adjust)) {
*shape_changed = true;
error = xfs_refcount_merge_left_extent(cur, &left, &cleft,
agbno, aglen);
@ -893,10 +1010,7 @@ xfs_refcount_merge_extents(
}
/* Try to merge cright and right. */
ulen = (unsigned long long)right.rc_blockcount + cright.rc_blockcount;
if (xfs_refc_valid(&right) && xfs_refc_valid(&cright) &&
right.rc_refcount == cright.rc_refcount + adjust &&
ulen < MAXREFCEXTLEN) {
if (xfs_refc_want_merge_right(&cright, &right, adjust)) {
*shape_changed = true;
return xfs_refcount_merge_right_extent(cur, &right, &cright,
aglen);
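
One detail worth calling out from the merge predicates above: candidate extent lengths are summed in an unsigned long long before being compared against the maximum extent length, because each record's block count can be close to the top of the u32 range and a 32-bit sum can silently wrap and look acceptably small. A short standalone demonstration (the constants are illustrative, not the on-disk XFS limits):

/* Why the refcount merge checks accumulate block counts in 64 bits. */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint32_t left_len = UINT32_MAX - 10;	/* a near-maximal record */
	uint32_t cleft_len = 100;

	uint32_t wrapped = left_len + cleft_len;	/* 32-bit sum wraps to 89 */
	unsigned long long safe = (unsigned long long)left_len + cleft_len;

	printf("32-bit sum: %" PRIu32 " (wrapped, would wrongly pass a length check)\n",
	       wrapped);
	printf("64-bit sum: %llu (correctly rejected if it exceeds the limit)\n",
	       safe);
	return 0;
}
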

View File

@ -972,7 +972,9 @@ xfs_log_sb(
*/
if (xfs_has_lazysbcount(mp)) {
mp->m_sb.sb_icount = percpu_counter_sum(&mp->m_icount);
mp->m_sb.sb_ifree = percpu_counter_sum(&mp->m_ifree);
mp->m_sb.sb_ifree = min_t(uint64_t,
percpu_counter_sum(&mp->m_ifree),
mp->m_sb.sb_icount);
mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
}
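
For context on the min_t() clamp above: with lazy superblock counters, the values logged here are sums of percpu counters taken without quiescing inode allocation and freeing, so a snapshot can transiently show more free inodes than allocated inodes, and logging that would produce a superblock that looks corrupt. A minimal stand-in for the clamp (plain C, not the kernel types):

/* Clamp a racy free-inode snapshot so it never exceeds the inode count. */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* percpu sums sampled at slightly different points in time */
	uint64_t icount = 100;	/* summed first */
	uint64_t ifree = 103;	/* summed later, after it ran ahead of icount */

	uint64_t sb_ifree = ifree < icount ? ifree : icount;

	printf("logged icount=%llu ifree=%llu\n",
	       (unsigned long long)icount, (unsigned long long)sb_ifree);
	return 0;
}
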

View File

@ -609,9 +609,16 @@ out:
/* AGFL */
struct xchk_agfl_info {
unsigned int sz_entries;
/* Number of AGFL entries that the AGF claims are in use. */
unsigned int agflcount;
/* Number of AGFL entries that we found. */
unsigned int nr_entries;
/* Buffer to hold AGFL entries for extent checking. */
xfs_agblock_t *entries;
struct xfs_buf *agfl_bp;
struct xfs_scrub *sc;
};
@ -641,10 +648,10 @@ xchk_agfl_block(
struct xfs_scrub *sc = sai->sc;
if (xfs_verify_agbno(sc->sa.pag, agbno) &&
sai->nr_entries < sai->sz_entries)
sai->nr_entries < sai->agflcount)
sai->entries[sai->nr_entries++] = agbno;
else
xchk_block_set_corrupt(sc, sc->sa.agfl_bp);
xchk_block_set_corrupt(sc, sai->agfl_bp);
xchk_agfl_block_xref(sc, agbno);
@ -696,19 +703,26 @@ int
xchk_agfl(
struct xfs_scrub *sc)
{
struct xchk_agfl_info sai;
struct xchk_agfl_info sai = {
.sc = sc,
};
struct xfs_agf *agf;
xfs_agnumber_t agno = sc->sm->sm_agno;
unsigned int agflcount;
unsigned int i;
int error;
/* Lock the AGF and AGI so that nobody can touch this AG. */
error = xchk_ag_read_headers(sc, agno, &sc->sa);
if (!xchk_process_error(sc, agno, XFS_AGFL_BLOCK(sc->mp), &error))
goto out;
return error;
if (!sc->sa.agf_bp)
return -EFSCORRUPTED;
xchk_buffer_recheck(sc, sc->sa.agfl_bp);
/* Try to read the AGFL, and verify its structure if we get it. */
error = xfs_alloc_read_agfl(sc->sa.pag, sc->tp, &sai.agfl_bp);
if (!xchk_process_error(sc, agno, XFS_AGFL_BLOCK(sc->mp), &error))
return error;
xchk_buffer_recheck(sc, sai.agfl_bp);
xchk_agfl_xref(sc);
@ -717,24 +731,21 @@ xchk_agfl(
/* Allocate buffer to ensure uniqueness of AGFL entries. */
agf = sc->sa.agf_bp->b_addr;
agflcount = be32_to_cpu(agf->agf_flcount);
if (agflcount > xfs_agfl_size(sc->mp)) {
sai.agflcount = be32_to_cpu(agf->agf_flcount);
if (sai.agflcount > xfs_agfl_size(sc->mp)) {
xchk_block_set_corrupt(sc, sc->sa.agf_bp);
goto out;
}
memset(&sai, 0, sizeof(sai));
sai.sc = sc;
sai.sz_entries = agflcount;
sai.entries = kmem_zalloc(sizeof(xfs_agblock_t) * agflcount,
KM_MAYFAIL);
sai.entries = kvcalloc(sai.agflcount, sizeof(xfs_agblock_t),
XCHK_GFP_FLAGS);
if (!sai.entries) {
error = -ENOMEM;
goto out;
}
/* Check the blocks in the AGFL. */
error = xfs_agfl_walk(sc->mp, sc->sa.agf_bp->b_addr,
sc->sa.agfl_bp, xchk_agfl_block, &sai);
error = xfs_agfl_walk(sc->mp, sc->sa.agf_bp->b_addr, sai.agfl_bp,
xchk_agfl_block, &sai);
if (error == -ECANCELED) {
error = 0;
goto out_free;
@ -742,7 +753,7 @@ xchk_agfl(
if (error)
goto out_free;
if (agflcount != sai.nr_entries) {
if (sai.agflcount != sai.nr_entries) {
xchk_block_set_corrupt(sc, sc->sa.agf_bp);
goto out_free;
}
@ -758,7 +769,7 @@ xchk_agfl(
}
out_free:
kmem_free(sai.entries);
kvfree(sai.entries);
out:
return error;
}

View File

@ -442,12 +442,18 @@ out_revert:
/* AGFL */
struct xrep_agfl {
/* Bitmap of alleged AGFL blocks that we're not going to add. */
struct xbitmap crossed;
/* Bitmap of other OWN_AG metadata blocks. */
struct xbitmap agmetablocks;
/* Bitmap of free space. */
struct xbitmap *freesp;
/* rmapbt cursor for finding crosslinked blocks */
struct xfs_btree_cur *rmap_cur;
struct xfs_scrub *sc;
};
@ -477,6 +483,41 @@ xrep_agfl_walk_rmap(
return xbitmap_set_btcur_path(&ra->agmetablocks, cur);
}
/* Strike out the blocks that are cross-linked according to the rmapbt. */
STATIC int
xrep_agfl_check_extent(
struct xrep_agfl *ra,
uint64_t start,
uint64_t len)
{
xfs_agblock_t agbno = XFS_FSB_TO_AGBNO(ra->sc->mp, start);
xfs_agblock_t last_agbno = agbno + len - 1;
int error;
ASSERT(XFS_FSB_TO_AGNO(ra->sc->mp, start) == ra->sc->sa.pag->pag_agno);
while (agbno <= last_agbno) {
bool other_owners;
error = xfs_rmap_has_other_keys(ra->rmap_cur, agbno, 1,
&XFS_RMAP_OINFO_AG, &other_owners);
if (error)
return error;
if (other_owners) {
error = xbitmap_set(&ra->crossed, agbno, 1);
if (error)
return error;
}
if (xchk_should_terminate(ra->sc, &error))
return error;
agbno++;
}
return 0;
}
/*
* Map out all the non-AGFL OWN_AG space in this AG so that we can deduce
* which blocks belong to the AGFL.
@ -496,44 +537,58 @@ xrep_agfl_collect_blocks(
struct xrep_agfl ra;
struct xfs_mount *mp = sc->mp;
struct xfs_btree_cur *cur;
struct xbitmap_range *br, *n;
int error;
ra.sc = sc;
ra.freesp = agfl_extents;
xbitmap_init(&ra.agmetablocks);
xbitmap_init(&ra.crossed);
/* Find all space used by the free space btrees & rmapbt. */
cur = xfs_rmapbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.pag);
error = xfs_rmap_query_all(cur, xrep_agfl_walk_rmap, &ra);
if (error)
goto err;
xfs_btree_del_cursor(cur, error);
if (error)
goto out_bmp;
/* Find all blocks currently being used by the bnobt. */
cur = xfs_allocbt_init_cursor(mp, sc->tp, agf_bp,
sc->sa.pag, XFS_BTNUM_BNO);
error = xbitmap_set_btblocks(&ra.agmetablocks, cur);
if (error)
goto err;
xfs_btree_del_cursor(cur, error);
if (error)
goto out_bmp;
/* Find all blocks currently being used by the cntbt. */
cur = xfs_allocbt_init_cursor(mp, sc->tp, agf_bp,
sc->sa.pag, XFS_BTNUM_CNT);
error = xbitmap_set_btblocks(&ra.agmetablocks, cur);
if (error)
goto err;
xfs_btree_del_cursor(cur, error);
if (error)
goto out_bmp;
/*
* Drop the freesp meta blocks that are in use by btrees.
* The remaining blocks /should/ be AGFL blocks.
*/
error = xbitmap_disunion(agfl_extents, &ra.agmetablocks);
xbitmap_destroy(&ra.agmetablocks);
if (error)
return error;
goto out_bmp;
/* Strike out the blocks that are cross-linked. */
ra.rmap_cur = xfs_rmapbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.pag);
for_each_xbitmap_extent(br, n, agfl_extents) {
error = xrep_agfl_check_extent(&ra, br->start, br->len);
if (error)
break;
}
xfs_btree_del_cursor(ra.rmap_cur, error);
if (error)
goto out_bmp;
error = xbitmap_disunion(agfl_extents, &ra.crossed);
if (error)
goto out_bmp;
/*
* Calculate the new AGFL size. If we found more blocks than fit in
@ -541,11 +596,10 @@ xrep_agfl_collect_blocks(
*/
*flcount = min_t(uint64_t, xbitmap_hweight(agfl_extents),
xfs_agfl_size(mp));
return 0;
err:
out_bmp:
xbitmap_destroy(&ra.crossed);
xbitmap_destroy(&ra.agmetablocks);
xfs_btree_del_cursor(cur, error);
return error;
}
@ -631,7 +685,7 @@ xrep_agfl_init_header(
if (br->len)
break;
list_del(&br->list);
kmem_free(br);
kfree(br);
}
/* Write new AGFL to disk. */
@ -697,7 +751,6 @@ xrep_agfl(
* freespace overflow to the freespace btrees.
*/
sc->sa.agf_bp = agf_bp;
sc->sa.agfl_bp = agfl_bp;
error = xrep_roll_ag_trans(sc);
if (error)
goto err;

View File

@ -49,7 +49,7 @@ xchk_setup_xattr_buf(
if (ab) {
if (sz <= ab->sz)
return 0;
kmem_free(ab);
kvfree(ab);
sc->buf = NULL;
}
@ -79,7 +79,8 @@ xchk_setup_xattr(
* without the inode lock held, which means we can sleep.
*/
if (sc->flags & XCHK_TRY_HARDER) {
error = xchk_setup_xattr_buf(sc, XATTR_SIZE_MAX, GFP_KERNEL);
error = xchk_setup_xattr_buf(sc, XATTR_SIZE_MAX,
XCHK_GFP_FLAGS);
if (error)
return error;
}
@ -138,8 +139,7 @@ xchk_xattr_listent(
* doesn't work, we overload the seen_enough variable to convey
* the error message back to the main scrub function.
*/
error = xchk_setup_xattr_buf(sx->sc, valuelen,
GFP_KERNEL | __GFP_RETRY_MAYFAIL);
error = xchk_setup_xattr_buf(sx->sc, valuelen, XCHK_GFP_FLAGS);
if (error == -ENOMEM)
error = -EDEADLOCK;
if (error) {
@ -324,8 +324,7 @@ xchk_xattr_block(
return 0;
/* Allocate memory for block usage checking. */
error = xchk_setup_xattr_buf(ds->sc, 0,
GFP_KERNEL | __GFP_RETRY_MAYFAIL);
error = xchk_setup_xattr_buf(ds->sc, 0, XCHK_GFP_FLAGS);
if (error == -ENOMEM)
return -EDEADLOCK;
if (error)

View File

@ -10,6 +10,7 @@
#include "xfs_trans_resv.h"
#include "xfs_mount.h"
#include "xfs_btree.h"
#include "scrub/scrub.h"
#include "scrub/bitmap.h"
/*
@ -25,7 +26,7 @@ xbitmap_set(
{
struct xbitmap_range *bmr;
bmr = kmem_alloc(sizeof(struct xbitmap_range), KM_MAYFAIL);
bmr = kmalloc(sizeof(struct xbitmap_range), XCHK_GFP_FLAGS);
if (!bmr)
return -ENOMEM;
@ -47,7 +48,7 @@ xbitmap_destroy(
for_each_xbitmap_extent(bmr, n, bitmap) {
list_del(&bmr->list);
kmem_free(bmr);
kfree(bmr);
}
}
@ -174,15 +175,15 @@ xbitmap_disunion(
/* Total overlap, just delete ex. */
lp = lp->next;
list_del(&br->list);
kmem_free(br);
kfree(br);
break;
case 0:
/*
* Deleting from the middle: add the new right extent
* and then shrink the left extent.
*/
new_br = kmem_alloc(sizeof(struct xbitmap_range),
KM_MAYFAIL);
new_br = kmalloc(sizeof(struct xbitmap_range),
XCHK_GFP_FLAGS);
if (!new_br) {
error = -ENOMEM;
goto out;

View File

@ -90,6 +90,7 @@ out:
struct xchk_bmap_info {
struct xfs_scrub *sc;
struct xfs_iext_cursor icur;
xfs_fileoff_t lastoff;
bool is_rt;
bool is_shared;
@ -146,6 +147,48 @@ xchk_bmap_get_rmap(
return has_rmap;
}
static inline bool
xchk_bmap_has_prev(
struct xchk_bmap_info *info,
struct xfs_bmbt_irec *irec)
{
struct xfs_bmbt_irec got;
struct xfs_ifork *ifp;
ifp = xfs_ifork_ptr(info->sc->ip, info->whichfork);
if (!xfs_iext_peek_prev_extent(ifp, &info->icur, &got))
return false;
if (got.br_startoff + got.br_blockcount != irec->br_startoff)
return false;
if (got.br_startblock + got.br_blockcount != irec->br_startblock)
return false;
if (got.br_state != irec->br_state)
return false;
return true;
}
static inline bool
xchk_bmap_has_next(
struct xchk_bmap_info *info,
struct xfs_bmbt_irec *irec)
{
struct xfs_bmbt_irec got;
struct xfs_ifork *ifp;
ifp = xfs_ifork_ptr(info->sc->ip, info->whichfork);
if (!xfs_iext_peek_next_extent(ifp, &info->icur, &got))
return false;
if (irec->br_startoff + irec->br_blockcount != got.br_startoff)
return false;
if (irec->br_startblock + irec->br_blockcount != got.br_startblock)
return false;
if (got.br_state != irec->br_state)
return false;
return true;
}
/* Make sure that we have rmapbt records for this extent. */
STATIC void
xchk_bmap_xref_rmap(
@ -214,6 +257,34 @@ xchk_bmap_xref_rmap(
if (rmap.rm_flags & XFS_RMAP_BMBT_BLOCK)
xchk_fblock_xref_set_corrupt(info->sc, info->whichfork,
irec->br_startoff);
/*
* If the rmap starts before this bmbt record, make sure there's a bmbt
* record for the previous offset that is contiguous with this mapping.
* Skip this for CoW fork extents because the refcount btree (and not
* the inode) is the ondisk owner for those extents.
*/
if (info->whichfork != XFS_COW_FORK && rmap.rm_startblock < agbno &&
!xchk_bmap_has_prev(info, irec)) {
xchk_fblock_xref_set_corrupt(info->sc, info->whichfork,
irec->br_startoff);
return;
}
/*
* If the rmap ends after this bmbt record, make sure there's a bmbt
* record for the next offset that is contiguous with this mapping.
* Skip this for CoW fork extents because the refcount btree (and not
* the inode) is the ondisk owner for those extents.
*/
rmap_end = (unsigned long long)rmap.rm_startblock + rmap.rm_blockcount;
if (info->whichfork != XFS_COW_FORK &&
rmap_end > agbno + irec->br_blockcount &&
!xchk_bmap_has_next(info, irec)) {
xchk_fblock_xref_set_corrupt(info->sc, info->whichfork,
irec->br_startoff);
return;
}
}
/* Cross-reference a single rtdev extent record. */
@ -264,6 +335,8 @@ xchk_bmap_iextent_xref(
case XFS_COW_FORK:
xchk_xref_is_cow_staging(info->sc, agbno,
irec->br_blockcount);
xchk_xref_is_not_shared(info->sc, agbno,
irec->br_blockcount);
break;
}
@ -297,14 +370,13 @@ xchk_bmap_dirattr_extent(
}
/* Scrub a single extent record. */
STATIC int
STATIC void
xchk_bmap_iextent(
struct xfs_inode *ip,
struct xchk_bmap_info *info,
struct xfs_bmbt_irec *irec)
{
struct xfs_mount *mp = info->sc->mp;
int error = 0;
/*
* Check for out-of-order extents. This record could have come
@ -325,14 +397,6 @@ xchk_bmap_iextent(
xchk_fblock_set_corrupt(info->sc, info->whichfork,
irec->br_startoff);
/*
* Check for delalloc extents. We never iterate the ones in the
* in-core extent scan, and we should never see these in the bmbt.
*/
if (isnullstartblock(irec->br_startblock))
xchk_fblock_set_corrupt(info->sc, info->whichfork,
irec->br_startoff);
/* Make sure the extent points to a valid place. */
if (irec->br_blockcount > XFS_MAX_BMBT_EXTLEN)
xchk_fblock_set_corrupt(info->sc, info->whichfork,
@ -353,15 +417,12 @@ xchk_bmap_iextent(
irec->br_startoff);
if (info->sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
return 0;
return;
if (info->is_rt)
xchk_bmap_rt_iextent_xref(ip, info, irec);
else
xchk_bmap_iextent_xref(ip, info, irec);
info->lastoff = irec->br_startoff + irec->br_blockcount;
return error;
}
/* Scrub a bmbt record. */
@ -599,14 +660,41 @@ xchk_bmap_check_rmaps(
for_each_perag(sc->mp, agno, pag) {
error = xchk_bmap_check_ag_rmaps(sc, whichfork, pag);
if (error)
break;
if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
break;
if (error ||
(sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)) {
xfs_perag_put(pag);
return error;
}
}
if (pag)
xfs_perag_put(pag);
return error;
return 0;
}
/* Scrub a delalloc reservation from the incore extent map tree. */
STATIC void
xchk_bmap_iextent_delalloc(
struct xfs_inode *ip,
struct xchk_bmap_info *info,
struct xfs_bmbt_irec *irec)
{
struct xfs_mount *mp = info->sc->mp;
/*
* Check for out-of-order extents. This record could have come
* from the incore list, for which there is no ordering check.
*/
if (irec->br_startoff < info->lastoff)
xchk_fblock_set_corrupt(info->sc, info->whichfork,
irec->br_startoff);
if (!xfs_verify_fileext(mp, irec->br_startoff, irec->br_blockcount))
xchk_fblock_set_corrupt(info->sc, info->whichfork,
irec->br_startoff);
/* Make sure the extent points to a valid place. */
if (irec->br_blockcount > XFS_MAX_BMBT_EXTLEN)
xchk_fblock_set_corrupt(info->sc, info->whichfork,
irec->br_startoff);
}
/*
@ -626,7 +714,6 @@ xchk_bmap(
struct xfs_inode *ip = sc->ip;
struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork);
xfs_fileoff_t endoff;
struct xfs_iext_cursor icur;
int error = 0;
/* Non-existent forks can be ignored. */
@ -661,6 +748,8 @@ xchk_bmap(
case XFS_DINODE_FMT_DEV:
case XFS_DINODE_FMT_LOCAL:
/* No mappings to check. */
if (whichfork == XFS_COW_FORK)
xchk_fblock_set_corrupt(sc, whichfork, 0);
goto out;
case XFS_DINODE_FMT_EXTENTS:
break;
@ -690,20 +779,22 @@ xchk_bmap(
/* Scrub extent records. */
info.lastoff = 0;
ifp = xfs_ifork_ptr(ip, whichfork);
for_each_xfs_iext(ifp, &icur, &irec) {
for_each_xfs_iext(ifp, &info.icur, &irec) {
if (xchk_should_terminate(sc, &error) ||
(sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT))
goto out;
if (isnullstartblock(irec.br_startblock))
continue;
if (irec.br_startoff >= endoff) {
xchk_fblock_set_corrupt(sc, whichfork,
irec.br_startoff);
goto out;
}
error = xchk_bmap_iextent(ip, &info, &irec);
if (error)
goto out;
if (isnullstartblock(irec.br_startblock))
xchk_bmap_iextent_delalloc(ip, &info, &irec);
else
xchk_bmap_iextent(ip, &info, &irec);
info.lastoff = irec.br_startoff + irec.br_blockcount;
}
error = xchk_bmap_check_rmaps(sc, whichfork);

View File

@ -408,7 +408,6 @@ xchk_btree_check_owner(
struct xfs_buf *bp)
{
struct xfs_btree_cur *cur = bs->cur;
struct check_owner *co;
/*
* In theory, xfs_btree_get_block should only give us a null buffer
@ -431,10 +430,13 @@ xchk_btree_check_owner(
* later scanning.
*/
if (cur->bc_btnum == XFS_BTNUM_BNO || cur->bc_btnum == XFS_BTNUM_RMAP) {
co = kmem_alloc(sizeof(struct check_owner),
KM_MAYFAIL);
struct check_owner *co;
co = kmalloc(sizeof(struct check_owner), XCHK_GFP_FLAGS);
if (!co)
return -ENOMEM;
INIT_LIST_HEAD(&co->list);
co->level = level;
co->daddr = xfs_buf_daddr(bp);
list_add_tail(&co->list, &bs->to_check);
@ -649,7 +651,7 @@ xchk_btree(
xchk_btree_set_corrupt(sc, cur, 0);
return 0;
}
bs = kmem_zalloc(cur_sz, KM_NOFS | KM_MAYFAIL);
bs = kzalloc(cur_sz, XCHK_GFP_FLAGS);
if (!bs)
return -ENOMEM;
bs->cur = cur;
@ -740,9 +742,9 @@ out:
error = xchk_btree_check_block_owner(bs, co->level,
co->daddr);
list_del(&co->list);
kmem_free(co);
kfree(co);
}
kmem_free(bs);
kfree(bs);
return error;
}

View File

@ -424,10 +424,6 @@ xchk_ag_read_headers(
if (error && want_ag_read_header_failure(sc, XFS_SCRUB_TYPE_AGF))
return error;
error = xfs_alloc_read_agfl(sa->pag, sc->tp, &sa->agfl_bp);
if (error && want_ag_read_header_failure(sc, XFS_SCRUB_TYPE_AGFL))
return error;
return 0;
}
@ -515,10 +511,6 @@ xchk_ag_free(
struct xchk_ag *sa)
{
xchk_ag_btcur_free(sa);
if (sa->agfl_bp) {
xfs_trans_brelse(sc->tp, sa->agfl_bp);
sa->agfl_bp = NULL;
}
if (sa->agf_bp) {
xfs_trans_brelse(sc->tp, sa->agf_bp);
sa->agf_bp = NULL;
@ -789,6 +781,33 @@ xchk_buffer_recheck(
trace_xchk_block_error(sc, xfs_buf_daddr(bp), fa);
}
static inline int
xchk_metadata_inode_subtype(
struct xfs_scrub *sc,
unsigned int scrub_type)
{
__u32 smtype = sc->sm->sm_type;
int error;
sc->sm->sm_type = scrub_type;
switch (scrub_type) {
case XFS_SCRUB_TYPE_INODE:
error = xchk_inode(sc);
break;
case XFS_SCRUB_TYPE_BMBTD:
error = xchk_bmap_data(sc);
break;
default:
ASSERT(0);
error = -EFSCORRUPTED;
break;
}
sc->sm->sm_type = smtype;
return error;
}
/*
* Scrub the attr/data forks of a metadata inode. The metadata inode must be
* pointed to by sc->ip and the ILOCK must be held.
@ -797,13 +816,17 @@ int
xchk_metadata_inode_forks(
struct xfs_scrub *sc)
{
__u32 smtype;
bool shared;
int error;
if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
return 0;
/* Check the inode record. */
error = xchk_metadata_inode_subtype(sc, XFS_SCRUB_TYPE_INODE);
if (error || (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT))
return error;
/* Metadata inodes don't live on the rt device. */
if (sc->ip->i_diflags & XFS_DIFLAG_REALTIME) {
xchk_ino_set_corrupt(sc, sc->ip->i_ino);
@ -823,10 +846,7 @@ xchk_metadata_inode_forks(
}
/* Invoke the data fork scrubber. */
smtype = sc->sm->sm_type;
sc->sm->sm_type = XFS_SCRUB_TYPE_BMBTD;
error = xchk_bmap_data(sc);
sc->sm->sm_type = smtype;
error = xchk_metadata_inode_subtype(sc, XFS_SCRUB_TYPE_BMBTD);
if (error || (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT))
return error;
@ -841,7 +861,7 @@ xchk_metadata_inode_forks(
xchk_ino_set_corrupt(sc, sc->ip->i_ino);
}
return error;
return 0;
}
/*

View File

@ -25,7 +25,7 @@ xchk_should_terminate(
if (fatal_signal_pending(current)) {
if (*error == 0)
*error = -EAGAIN;
*error = -EINTR;
return true;
}
return false;

View File

@ -486,7 +486,7 @@ xchk_da_btree(
return 0;
/* Set up initial da state. */
ds = kmem_zalloc(sizeof(struct xchk_da_btree), KM_NOFS | KM_MAYFAIL);
ds = kzalloc(sizeof(struct xchk_da_btree), XCHK_GFP_FLAGS);
if (!ds)
return -ENOMEM;
ds->dargs.dp = sc->ip;
@ -591,6 +591,6 @@ out:
out_state:
xfs_da_state_free(ds->state);
kmem_free(ds);
kfree(ds);
return error;
}

View File

@ -666,7 +666,12 @@ xchk_directory_blocks(
struct xfs_scrub *sc)
{
struct xfs_bmbt_irec got;
struct xfs_da_args args;
struct xfs_da_args args = {
.dp = sc->ip,
.whichfork = XFS_DATA_FORK,
.geo = sc->mp->m_dir_geo,
.trans = sc->tp,
};
struct xfs_ifork *ifp = xfs_ifork_ptr(sc->ip, XFS_DATA_FORK);
struct xfs_mount *mp = sc->mp;
xfs_fileoff_t leaf_lblk;
@ -689,9 +694,6 @@ xchk_directory_blocks(
free_lblk = XFS_B_TO_FSB(mp, XFS_DIR2_FREE_OFFSET);
/* Is this a block dir? */
args.dp = sc->ip;
args.geo = mp->m_dir_geo;
args.trans = sc->tp;
error = xfs_dir2_isblock(&args, &is_block);
if (!xchk_fblock_process_error(sc, XFS_DATA_FORK, lblk, &error))
goto out;

View File

@ -14,6 +14,8 @@
#include "xfs_health.h"
#include "xfs_btree.h"
#include "xfs_ag.h"
#include "xfs_rtalloc.h"
#include "xfs_inode.h"
#include "scrub/scrub.h"
#include "scrub/common.h"
#include "scrub/trace.h"
@ -43,6 +45,16 @@
* our tolerance for mismatch between expected and actual counter values.
*/
struct xchk_fscounters {
struct xfs_scrub *sc;
uint64_t icount;
uint64_t ifree;
uint64_t fdblocks;
uint64_t frextents;
unsigned long long icount_min;
unsigned long long icount_max;
};
/*
* Since the expected value computation is lockless but only browses incore
* values, the percpu counters should be fairly close to each other. However,
@ -116,10 +128,11 @@ xchk_setup_fscounters(
struct xchk_fscounters *fsc;
int error;
sc->buf = kmem_zalloc(sizeof(struct xchk_fscounters), 0);
sc->buf = kzalloc(sizeof(struct xchk_fscounters), XCHK_GFP_FLAGS);
if (!sc->buf)
return -ENOMEM;
fsc = sc->buf;
fsc->sc = sc;
xfs_icount_range(sc->mp, &fsc->icount_min, &fsc->icount_max);
@ -138,6 +151,18 @@ xchk_setup_fscounters(
return xchk_trans_alloc(sc, 0);
}
/*
* Part 1: Collecting filesystem summary counts. For each AG, we add its
* summary counts (total inodes, free inodes, free data blocks) to an incore
* copy of the overall filesystem summary counts.
*
* To avoid false corruption reports in part 2, any failure in this part must
* set the INCOMPLETE flag even when a negative errno is returned. This care
* must be taken with certain errno values (i.e. EFSBADCRC, EFSCORRUPTED,
* ECANCELED) that are absorbed into a scrub state flag update by
* xchk_*_process_error.
*/
/* Count free space btree blocks manually for pre-lazysbcount filesystems. */
static int
xchk_fscount_btreeblks(
@ -225,8 +250,10 @@ retry:
}
if (pag)
xfs_perag_put(pag);
if (error)
if (error) {
xchk_set_incomplete(sc);
return error;
}
/*
* The global incore space reservation is taken from the incore
@ -267,6 +294,64 @@ retry:
return 0;
}
#ifdef CONFIG_XFS_RT
STATIC int
xchk_fscount_add_frextent(
struct xfs_mount *mp,
struct xfs_trans *tp,
const struct xfs_rtalloc_rec *rec,
void *priv)
{
struct xchk_fscounters *fsc = priv;
int error = 0;
fsc->frextents += rec->ar_extcount;
xchk_should_terminate(fsc->sc, &error);
return error;
}
/* Calculate the number of free realtime extents from the realtime bitmap. */
STATIC int
xchk_fscount_count_frextents(
struct xfs_scrub *sc,
struct xchk_fscounters *fsc)
{
struct xfs_mount *mp = sc->mp;
int error;
fsc->frextents = 0;
if (!xfs_has_realtime(mp))
return 0;
xfs_ilock(sc->mp->m_rbmip, XFS_ILOCK_SHARED | XFS_ILOCK_RTBITMAP);
error = xfs_rtalloc_query_all(sc->mp, sc->tp,
xchk_fscount_add_frextent, fsc);
if (error) {
xchk_set_incomplete(sc);
goto out_unlock;
}
out_unlock:
xfs_iunlock(sc->mp->m_rbmip, XFS_ILOCK_SHARED | XFS_ILOCK_RTBITMAP);
return error;
}
#else
STATIC int
xchk_fscount_count_frextents(
struct xfs_scrub *sc,
struct xchk_fscounters *fsc)
{
fsc->frextents = 0;
return 0;
}
#endif /* CONFIG_XFS_RT */
/*
* Part 2: Comparing filesystem summary counters. All we have to do here is
* sum the percpu counters and compare them to what we've observed.
*/
/*
* Is the @counter reasonably close to the @expected value?
*
@ -333,16 +418,17 @@ xchk_fscounters(
{
struct xfs_mount *mp = sc->mp;
struct xchk_fscounters *fsc = sc->buf;
int64_t icount, ifree, fdblocks;
int64_t icount, ifree, fdblocks, frextents;
int error;
/* Snapshot the percpu counters. */
icount = percpu_counter_sum(&mp->m_icount);
ifree = percpu_counter_sum(&mp->m_ifree);
fdblocks = percpu_counter_sum(&mp->m_fdblocks);
frextents = percpu_counter_sum(&mp->m_frextents);
/* No negative values, please! */
if (icount < 0 || ifree < 0 || fdblocks < 0)
if (icount < 0 || ifree < 0 || fdblocks < 0 || frextents < 0)
xchk_set_corrupt(sc);
/* See if icount is obviously wrong. */
@ -353,6 +439,10 @@ xchk_fscounters(
if (fdblocks > mp->m_sb.sb_dblocks)
xchk_set_corrupt(sc);
/* See if frextents is obviously wrong. */
if (frextents > mp->m_sb.sb_rextents)
xchk_set_corrupt(sc);
/*
* If ifree exceeds icount by more than the minimum variance then
* something's probably wrong with the counters.
@ -367,6 +457,13 @@ xchk_fscounters(
if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_INCOMPLETE)
return 0;
/* Count the free extents counter for rt volumes. */
error = xchk_fscount_count_frextents(sc, fsc);
if (!xchk_process_error(sc, 0, XFS_SB_BLOCK(mp), &error))
return error;
if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_INCOMPLETE)
return 0;
/* Compare the in-core counters with whatever we counted. */
if (!xchk_fscount_within_range(sc, icount, &mp->m_icount, fsc->icount))
xchk_set_corrupt(sc);
@ -378,5 +475,9 @@ xchk_fscounters(
fsc->fdblocks))
xchk_set_corrupt(sc);
if (!xchk_fscount_within_range(sc, frextents, &mp->m_frextents,
fsc->frextents))
xchk_set_corrupt(sc);
return 0;
}

View File

@ -365,7 +365,7 @@ xchk_dinode(
* pagecache can't cache all the blocks in this file due to
* overly large offsets, flag the inode for admin review.
*/
if (isize >= mp->m_super->s_maxbytes)
if (isize > mp->m_super->s_maxbytes)
xchk_ino_set_warning(sc, ino);
/* di_nblocks */

View File

@ -14,6 +14,7 @@
#include "xfs_inode.h"
#include "xfs_quota.h"
#include "xfs_qm.h"
#include "xfs_bmap.h"
#include "scrub/scrub.h"
#include "scrub/common.h"
@ -84,7 +85,7 @@ xchk_quota_item(
int error = 0;
if (xchk_should_terminate(sc, &error))
return -ECANCELED;
return error;
/*
* Except for the root dquot, the actual dquot we got must either have
@ -189,11 +190,12 @@ xchk_quota_data_fork(
for_each_xfs_iext(ifp, &icur, &irec) {
if (xchk_should_terminate(sc, &error))
break;
/*
* delalloc extents or blocks mapped above the highest
* delalloc/unwritten extents or blocks mapped above the highest
* quota id shouldn't happen.
*/
if (isnullstartblock(irec.br_startblock) ||
if (!xfs_bmap_is_written_extent(&irec) ||
irec.br_startoff > max_dqid_off ||
irec.br_startoff + irec.br_blockcount - 1 > max_dqid_off) {
xchk_fblock_set_corrupt(sc, XFS_DATA_FORK,

View File

@ -127,8 +127,8 @@ xchk_refcountbt_rmap_check(
* is healthy each rmap_irec we see will be in agbno order
* so we don't need insertion sort here.
*/
frag = kmem_alloc(sizeof(struct xchk_refcnt_frag),
KM_MAYFAIL);
frag = kmalloc(sizeof(struct xchk_refcnt_frag),
XCHK_GFP_FLAGS);
if (!frag)
return -ENOMEM;
memcpy(&frag->rm, rec, sizeof(frag->rm));
@ -215,7 +215,7 @@ xchk_refcountbt_process_rmap_fragments(
continue;
}
list_del(&frag->list);
kmem_free(frag);
kfree(frag);
nr++;
}
@ -257,11 +257,11 @@ done:
/* Delete fragments and work list. */
list_for_each_entry_safe(frag, n, &worklist, list) {
list_del(&frag->list);
kmem_free(frag);
kfree(frag);
}
list_for_each_entry_safe(frag, n, &refchk->fragments, list) {
list_del(&frag->list);
kmem_free(frag);
kfree(frag);
}
}
@ -306,7 +306,7 @@ xchk_refcountbt_xref_rmap(
out_free:
list_for_each_entry_safe(frag, n, &refchk.fragments, list) {
list_del(&frag->list);
kmem_free(frag);
kfree(frag);
}
}

View File

@ -61,7 +61,6 @@ xrep_attempt(
sc->flags |= XREP_ALREADY_FIXED;
return -EAGAIN;
case -EDEADLOCK:
case -EAGAIN:
/* Tell the caller to try again having grabbed all the locks. */
if (!(sc->flags & XCHK_TRY_HARDER)) {
sc->flags |= XCHK_TRY_HARDER;
@ -70,10 +69,15 @@ xrep_attempt(
/*
* We tried harder but still couldn't grab all the resources
* we needed to fix it. The corruption has not been fixed,
* so report back to userspace.
* so exit to userspace with the scan's output flags unchanged.
*/
return -EFSCORRUPTED;
return 0;
default:
/*
* EAGAIN tells the caller to re-scrub, so we cannot return
* that here.
*/
ASSERT(error != -EAGAIN);
return error;
}
}
@ -121,32 +125,40 @@ xrep_roll_ag_trans(
{
int error;
/* Keep the AG header buffers locked so we can keep going. */
if (sc->sa.agi_bp)
/*
* Keep the AG header buffers locked while we roll the transaction.
* Ensure that both AG buffers are dirty and held when we roll the
* transaction so that they move forward in the log without losing the
* bli (and hence the bli type) when the transaction commits.
*
* Normal code would never hold clean buffers across a roll, but repair
* needs both buffers to maintain a total lock on the AG.
*/
if (sc->sa.agi_bp) {
xfs_ialloc_log_agi(sc->tp, sc->sa.agi_bp, XFS_AGI_MAGICNUM);
xfs_trans_bhold(sc->tp, sc->sa.agi_bp);
if (sc->sa.agf_bp)
}
if (sc->sa.agf_bp) {
xfs_alloc_log_agf(sc->tp, sc->sa.agf_bp, XFS_AGF_MAGICNUM);
xfs_trans_bhold(sc->tp, sc->sa.agf_bp);
if (sc->sa.agfl_bp)
xfs_trans_bhold(sc->tp, sc->sa.agfl_bp);
}
/*
* Roll the transaction. We still own the buffer and the buffer lock
* regardless of whether or not the roll succeeds. If the roll fails,
* the buffers will be released during teardown on our way out of the
* kernel. If it succeeds, we join them to the new transaction and
* move on.
* Roll the transaction. We still hold the AG header buffers locked
* regardless of whether or not that succeeds. On failure, the buffers
* will be released during teardown on our way out of the kernel. If
* successful, join the buffers to the new transaction and move on.
*/
error = xfs_trans_roll(&sc->tp);
if (error)
return error;
/* Join AG headers to the new transaction. */
/* Join the AG headers to the new transaction. */
if (sc->sa.agi_bp)
xfs_trans_bjoin(sc->tp, sc->sa.agi_bp);
if (sc->sa.agf_bp)
xfs_trans_bjoin(sc->tp, sc->sa.agf_bp);
if (sc->sa.agfl_bp)
xfs_trans_bjoin(sc->tp, sc->sa.agfl_bp);
return 0;
}
@ -498,6 +510,7 @@ xrep_put_freelist(
struct xfs_scrub *sc,
xfs_agblock_t agbno)
{
struct xfs_buf *agfl_bp;
int error;
/* Make sure there's space on the freelist. */
@ -516,8 +529,12 @@ xrep_put_freelist(
return error;
/* Put the block on the AGFL. */
error = xfs_alloc_read_agfl(sc->sa.pag, sc->tp, &agfl_bp);
if (error)
return error;
error = xfs_alloc_put_freelist(sc->sa.pag, sc->tp, sc->sa.agf_bp,
sc->sa.agfl_bp, agbno, 0);
agfl_bp, agbno, 0);
if (error)
return error;
xfs_extent_busy_insert(sc->tp, sc->sa.pag, agbno, 1,

View File

@ -174,7 +174,7 @@ xchk_teardown(
if (sc->flags & XCHK_REAPING_DISABLED)
xchk_start_reaping(sc);
if (sc->buf) {
kmem_free(sc->buf);
kvfree(sc->buf);
sc->buf = NULL;
}
return error;
@ -467,7 +467,7 @@ xfs_scrub_metadata(
xfs_warn_mount(mp, XFS_OPSTATE_WARNED_SCRUB,
"EXPERIMENTAL online scrub feature in use. Use at your own risk!");
sc = kmem_zalloc(sizeof(struct xfs_scrub), KM_NOFS | KM_MAYFAIL);
sc = kzalloc(sizeof(struct xfs_scrub), XCHK_GFP_FLAGS);
if (!sc) {
error = -ENOMEM;
goto out;
@ -557,7 +557,7 @@ out_nofix:
out_teardown:
error = xchk_teardown(sc, error);
out_sc:
kmem_free(sc);
kfree(sc);
out:
trace_xchk_done(XFS_I(file_inode(file)), sm, error);
if (error == -EFSCORRUPTED || error == -EFSBADCRC) {

View File

@ -8,6 +8,15 @@
struct xfs_scrub;
/*
* Standard flags for allocating memory within scrub. NOFS context is
* configured by the process allocation scope. Scrub and repair must be able
* to back out gracefully if there isn't enough memory. Force-cast to avoid
* complaints from static checkers.
*/
#define XCHK_GFP_FLAGS ((__force gfp_t)(GFP_KERNEL | __GFP_NOWARN | \
__GFP_RETRY_MAYFAIL))
/* Type info and names for the scrub types. */
enum xchk_type {
ST_NONE = 1, /* disabled */
@ -39,7 +48,6 @@ struct xchk_ag {
/* AG btree roots */
struct xfs_buf *agf_bp;
struct xfs_buf *agfl_bp;
struct xfs_buf *agi_bp;
/* AG btrees */
@ -161,12 +169,4 @@ void xchk_xref_is_used_rt_space(struct xfs_scrub *sc, xfs_rtblock_t rtbno,
# define xchk_xref_is_used_rt_space(sc, rtbno, len) do { } while (0)
#endif
struct xchk_fscounters {
uint64_t icount;
uint64_t ifree;
uint64_t fdblocks;
unsigned long long icount_min;
unsigned long long icount_max;
};
#endif /* __XFS_SCRUB_SCRUB_H__ */


@ -21,7 +21,7 @@ xchk_setup_symlink(
struct xfs_scrub *sc)
{
/* Allocate the buffer without the inode lock held. */
sc->buf = kvzalloc(XFS_SYMLINK_MAXLEN + 1, GFP_KERNEL);
sc->buf = kvzalloc(XFS_SYMLINK_MAXLEN + 1, XCHK_GFP_FLAGS);
if (!sc->buf)
return -ENOMEM;


@ -17,6 +17,8 @@
#include "xfs_bmap.h"
#include "xfs_bmap_util.h"
#include "xfs_reflink.h"
#include "xfs_errortag.h"
#include "xfs_error.h"
struct xfs_writepage_ctx {
struct iomap_writepage_ctx ctx;
@ -114,9 +116,8 @@ xfs_end_ioend(
if (unlikely(error)) {
if (ioend->io_flags & IOMAP_F_SHARED) {
xfs_reflink_cancel_cow_range(ip, offset, size, true);
xfs_bmap_punch_delalloc_range(ip,
XFS_B_TO_FSBT(mp, offset),
XFS_B_TO_FSB(mp, size));
xfs_bmap_punch_delalloc_range(ip, offset,
offset + size);
}
goto done;
}
@ -218,11 +219,17 @@ xfs_imap_valid(
* checked (and found nothing at this offset) could have added
* overlapping blocks.
*/
if (XFS_WPC(wpc)->data_seq != READ_ONCE(ip->i_df.if_seq))
if (XFS_WPC(wpc)->data_seq != READ_ONCE(ip->i_df.if_seq)) {
trace_xfs_wb_data_iomap_invalid(ip, &wpc->iomap,
XFS_WPC(wpc)->data_seq, XFS_DATA_FORK);
return false;
}
if (xfs_inode_has_cow_data(ip) &&
XFS_WPC(wpc)->cow_seq != READ_ONCE(ip->i_cowfp->if_seq))
XFS_WPC(wpc)->cow_seq != READ_ONCE(ip->i_cowfp->if_seq)) {
trace_xfs_wb_cow_iomap_invalid(ip, &wpc->iomap,
XFS_WPC(wpc)->cow_seq, XFS_COW_FORK);
return false;
}
return true;
}
@ -286,6 +293,8 @@ xfs_map_blocks(
if (xfs_is_shutdown(mp))
return -EIO;
XFS_ERRORTAG_DELAY(mp, XFS_ERRTAG_WB_DELAY_MS);
/*
* COW fork blocks can overlap data fork blocks even if the blocks
* aren't shared. COW I/O always takes precedence, so we must always
@ -373,7 +382,7 @@ retry:
isnullstartblock(imap.br_startblock))
goto allocate_blocks;
xfs_bmbt_to_iomap(ip, &wpc->iomap, &imap, 0, 0);
xfs_bmbt_to_iomap(ip, &wpc->iomap, &imap, 0, 0, XFS_WPC(wpc)->data_seq);
trace_xfs_map_blocks_found(ip, offset, count, whichfork, &imap);
return 0;
allocate_blocks:
@ -455,12 +464,8 @@ xfs_discard_folio(
struct folio *folio,
loff_t pos)
{
struct inode *inode = folio->mapping->host;
struct xfs_inode *ip = XFS_I(inode);
struct xfs_inode *ip = XFS_I(folio->mapping->host);
struct xfs_mount *mp = ip->i_mount;
size_t offset = offset_in_folio(folio, pos);
xfs_fileoff_t start_fsb = XFS_B_TO_FSBT(mp, pos);
xfs_fileoff_t pageoff_fsb = XFS_B_TO_FSBT(mp, offset);
int error;
if (xfs_is_shutdown(mp))
@ -470,8 +475,9 @@ xfs_discard_folio(
"page discard on page "PTR_FMT", inode 0x%llx, pos %llu.",
folio, ip->i_ino, pos);
error = xfs_bmap_punch_delalloc_range(ip, start_fsb,
i_blocks_per_folio(inode, folio) - pageoff_fsb);
error = xfs_bmap_punch_delalloc_range(ip, pos,
round_up(pos, folio_size(folio)));
if (error && !xfs_is_shutdown(mp))
xfs_alert(mp, "page discard unable to remove delalloc mapping.");
}


@ -590,11 +590,13 @@ out_unlock_iolock:
int
xfs_bmap_punch_delalloc_range(
struct xfs_inode *ip,
xfs_fileoff_t start_fsb,
xfs_fileoff_t length)
xfs_off_t start_byte,
xfs_off_t end_byte)
{
struct xfs_mount *mp = ip->i_mount;
struct xfs_ifork *ifp = &ip->i_df;
xfs_fileoff_t end_fsb = start_fsb + length;
xfs_fileoff_t start_fsb = XFS_B_TO_FSBT(mp, start_byte);
xfs_fileoff_t end_fsb = XFS_B_TO_FSB(mp, end_byte);
struct xfs_bmbt_irec got, del;
struct xfs_iext_cursor icur;
int error = 0;
@ -607,7 +609,7 @@ xfs_bmap_punch_delalloc_range(
while (got.br_startoff + got.br_blockcount > start_fsb) {
del = got;
xfs_trim_extent(&del, start_fsb, length);
xfs_trim_extent(&del, start_fsb, end_fsb - start_fsb);
/*
* A delete can push the cursor forward. Step back to the


@ -31,7 +31,7 @@ xfs_bmap_rtalloc(struct xfs_bmalloca *ap)
#endif /* CONFIG_XFS_RT */
int xfs_bmap_punch_delalloc_range(struct xfs_inode *ip,
xfs_fileoff_t start_fsb, xfs_fileoff_t length);
xfs_off_t start_byte, xfs_off_t end_byte);
struct kgetbmap {
__s64 bmv_offset; /* file offset of segment in blocks */


@ -1945,6 +1945,7 @@ xfs_free_buftarg(
list_lru_destroy(&btp->bt_lru);
blkdev_issue_flush(btp->bt_bdev);
invalidate_bdev(btp->bt_bdev);
fs_put_dax(btp->bt_daxdev, btp->bt_mount);
kmem_free(btp);


@ -1018,6 +1018,8 @@ xfs_buf_item_relse(
trace_xfs_buf_item_relse(bp, _RET_IP_);
ASSERT(!test_bit(XFS_LI_IN_AIL, &bip->bli_item.li_flags));
if (atomic_read(&bip->bli_refcount))
return;
bp->b_log_item = NULL;
xfs_buf_rele(bp);
xfs_buf_item_free(bip);


@ -46,7 +46,7 @@ static unsigned int xfs_errortag_random_default[] = {
XFS_RANDOM_REFCOUNT_FINISH_ONE,
XFS_RANDOM_BMAP_FINISH_ONE,
XFS_RANDOM_AG_RESV_CRITICAL,
XFS_RANDOM_DROP_WRITES,
0, /* XFS_RANDOM_DROP_WRITES has been removed */
XFS_RANDOM_LOG_BAD_CRC,
XFS_RANDOM_LOG_ITEM_PIN,
XFS_RANDOM_BUF_LRU_REF,
@ -60,6 +60,8 @@ static unsigned int xfs_errortag_random_default[] = {
XFS_RANDOM_LARP,
XFS_RANDOM_DA_LEAF_SPLIT,
XFS_RANDOM_ATTR_LEAF_TO_NODE,
XFS_RANDOM_WB_DELAY_MS,
XFS_RANDOM_WRITE_DELAY_MS,
};
struct xfs_errortag_attr {
@ -162,7 +164,6 @@ XFS_ERRORTAG_ATTR_RW(refcount_continue_update, XFS_ERRTAG_REFCOUNT_CONTINUE_UPDA
XFS_ERRORTAG_ATTR_RW(refcount_finish_one, XFS_ERRTAG_REFCOUNT_FINISH_ONE);
XFS_ERRORTAG_ATTR_RW(bmap_finish_one, XFS_ERRTAG_BMAP_FINISH_ONE);
XFS_ERRORTAG_ATTR_RW(ag_resv_critical, XFS_ERRTAG_AG_RESV_CRITICAL);
XFS_ERRORTAG_ATTR_RW(drop_writes, XFS_ERRTAG_DROP_WRITES);
XFS_ERRORTAG_ATTR_RW(log_bad_crc, XFS_ERRTAG_LOG_BAD_CRC);
XFS_ERRORTAG_ATTR_RW(log_item_pin, XFS_ERRTAG_LOG_ITEM_PIN);
XFS_ERRORTAG_ATTR_RW(buf_lru_ref, XFS_ERRTAG_BUF_LRU_REF);
@ -176,6 +177,8 @@ XFS_ERRORTAG_ATTR_RW(ag_resv_fail, XFS_ERRTAG_AG_RESV_FAIL);
XFS_ERRORTAG_ATTR_RW(larp, XFS_ERRTAG_LARP);
XFS_ERRORTAG_ATTR_RW(da_leaf_split, XFS_ERRTAG_DA_LEAF_SPLIT);
XFS_ERRORTAG_ATTR_RW(attr_leaf_to_node, XFS_ERRTAG_ATTR_LEAF_TO_NODE);
XFS_ERRORTAG_ATTR_RW(wb_delay_ms, XFS_ERRTAG_WB_DELAY_MS);
XFS_ERRORTAG_ATTR_RW(write_delay_ms, XFS_ERRTAG_WRITE_DELAY_MS);
static struct attribute *xfs_errortag_attrs[] = {
XFS_ERRORTAG_ATTR_LIST(noerror),
@ -206,7 +209,6 @@ static struct attribute *xfs_errortag_attrs[] = {
XFS_ERRORTAG_ATTR_LIST(refcount_finish_one),
XFS_ERRORTAG_ATTR_LIST(bmap_finish_one),
XFS_ERRORTAG_ATTR_LIST(ag_resv_critical),
XFS_ERRORTAG_ATTR_LIST(drop_writes),
XFS_ERRORTAG_ATTR_LIST(log_bad_crc),
XFS_ERRORTAG_ATTR_LIST(log_item_pin),
XFS_ERRORTAG_ATTR_LIST(buf_lru_ref),
@ -220,6 +222,8 @@ static struct attribute *xfs_errortag_attrs[] = {
XFS_ERRORTAG_ATTR_LIST(larp),
XFS_ERRORTAG_ATTR_LIST(da_leaf_split),
XFS_ERRORTAG_ATTR_LIST(attr_leaf_to_node),
XFS_ERRORTAG_ATTR_LIST(wb_delay_ms),
XFS_ERRORTAG_ATTR_LIST(write_delay_ms),
NULL,
};
ATTRIBUTE_GROUPS(xfs_errortag);
@ -256,6 +260,32 @@ xfs_errortag_del(
kmem_free(mp->m_errortag);
}
static bool
xfs_errortag_valid(
unsigned int error_tag)
{
if (error_tag >= XFS_ERRTAG_MAX)
return false;
/* Error out removed injection types */
if (error_tag == XFS_ERRTAG_DROP_WRITES)
return false;
return true;
}
bool
xfs_errortag_enabled(
struct xfs_mount *mp,
unsigned int tag)
{
if (!mp->m_errortag)
return false;
if (!xfs_errortag_valid(tag))
return false;
return mp->m_errortag[tag] != 0;
}
bool
xfs_errortag_test(
struct xfs_mount *mp,
@ -277,7 +307,9 @@ xfs_errortag_test(
if (!mp->m_errortag)
return false;
ASSERT(error_tag < XFS_ERRTAG_MAX);
if (!xfs_errortag_valid(error_tag))
return false;
randfactor = mp->m_errortag[error_tag];
if (!randfactor || get_random_u32_below(randfactor))
return false;
@ -293,7 +325,7 @@ xfs_errortag_get(
struct xfs_mount *mp,
unsigned int error_tag)
{
if (error_tag >= XFS_ERRTAG_MAX)
if (!xfs_errortag_valid(error_tag))
return -EINVAL;
return mp->m_errortag[error_tag];
@ -305,7 +337,7 @@ xfs_errortag_set(
unsigned int error_tag,
unsigned int tag_value)
{
if (error_tag >= XFS_ERRTAG_MAX)
if (!xfs_errortag_valid(error_tag))
return -EINVAL;
mp->m_errortag[error_tag] = tag_value;
@ -319,7 +351,7 @@ xfs_errortag_add(
{
BUILD_BUG_ON(ARRAY_SIZE(xfs_errortag_random_default) != XFS_ERRTAG_MAX);
if (error_tag >= XFS_ERRTAG_MAX)
if (!xfs_errortag_valid(error_tag))
return -EINVAL;
return xfs_errortag_set(mp, error_tag,


@ -45,6 +45,18 @@ extern bool xfs_errortag_test(struct xfs_mount *mp, const char *expression,
const char *file, int line, unsigned int error_tag);
#define XFS_TEST_ERROR(expr, mp, tag) \
((expr) || xfs_errortag_test((mp), #expr, __FILE__, __LINE__, (tag)))
bool xfs_errortag_enabled(struct xfs_mount *mp, unsigned int tag);
#define XFS_ERRORTAG_DELAY(mp, tag) \
do { \
might_sleep(); \
if (!xfs_errortag_enabled((mp), (tag))) \
break; \
xfs_warn_ratelimited((mp), \
"Injecting %ums delay at file %s, line %d, on filesystem \"%s\"", \
(mp)->m_errortag[(tag)], __FILE__, __LINE__, \
(mp)->m_super->s_id); \
mdelay((mp)->m_errortag[(tag)]); \
} while (0)
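For orientation, the sketch below shows how a sleepable path would take one of these knobs; it is illustrative only. The real call sites in this series are XFS_ERRTAG_WB_DELAY_MS in xfs_map_blocks() (shown earlier) and XFS_ERRTAG_WRITE_DELAY_MS in xfs_iomap_valid() (shown later).

/*
 * Illustrative only: an injection point in a sleepable code path.  The knob
 * value is a delay in milliseconds; when error injection is compiled out,
 * the macro degrades to ((void)0) as in the #else stubs below.
 */
static void
example_delay_injection(
	struct xfs_mount	*mp)
{
	XFS_ERRORTAG_DELAY(mp, XFS_ERRTAG_WB_DELAY_MS);
}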
extern int xfs_errortag_get(struct xfs_mount *mp, unsigned int error_tag);
extern int xfs_errortag_set(struct xfs_mount *mp, unsigned int error_tag,
@ -55,6 +67,7 @@ extern int xfs_errortag_clearall(struct xfs_mount *mp);
#define xfs_errortag_init(mp) (0)
#define xfs_errortag_del(mp)
#define XFS_TEST_ERROR(expr, mp, tag) (expr)
#define XFS_ERRORTAG_DELAY(mp, tag) ((void)0)
#define xfs_errortag_set(mp, tag, val) (ENOSYS)
#define xfs_errortag_add(mp, tag) (ENOSYS)
#define xfs_errortag_clearall(mp) (ENOSYS)


@ -1325,7 +1325,7 @@ __xfs_filemap_fault(
if (write_fault) {
xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
ret = iomap_page_mkwrite(vmf,
&xfs_buffered_write_iomap_ops);
&xfs_page_mkwrite_iomap_ops);
xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
} else {
ret = filemap_fault(vmf);


@ -524,7 +524,7 @@ xfs_getfsmap_rtdev_rtbitmap_query(
struct xfs_mount *mp = tp->t_mountp;
int error;
xfs_ilock(mp->m_rbmip, XFS_ILOCK_SHARED);
xfs_ilock(mp->m_rbmip, XFS_ILOCK_SHARED | XFS_ILOCK_RTBITMAP);
/*
* Set up query parameters to return free rtextents covering the range
@ -551,7 +551,7 @@ xfs_getfsmap_rtdev_rtbitmap_query(
if (error)
goto err;
err:
xfs_iunlock(mp->m_rbmip, XFS_ILOCK_SHARED);
xfs_iunlock(mp->m_rbmip, XFS_ILOCK_SHARED | XFS_ILOCK_RTBITMAP);
return error;
}


@ -342,6 +342,9 @@ xfs_iget_recycle(
trace_xfs_iget_recycle(ip);
if (!xfs_ilock_nowait(ip, XFS_ILOCK_EXCL))
return -EAGAIN;
/*
* We need to make it look like the inode is being reclaimed to prevent
* the actual reclaim workers from stomping over us while we recycle
@ -355,6 +358,7 @@ xfs_iget_recycle(
ASSERT(!rwsem_is_locked(&inode->i_rwsem));
error = xfs_reinit_inode(mp, inode);
xfs_iunlock(ip, XFS_ILOCK_EXCL);
if (error) {
/*
* Re-initializing the inode failed, and we are in deep
@ -518,6 +522,8 @@ xfs_iget_cache_hit(
if (ip->i_flags & XFS_IRECLAIMABLE) {
/* Drops i_flags_lock and RCU read lock. */
error = xfs_iget_recycle(pag, ip);
if (error == -EAGAIN)
goto out_skip;
if (error)
return error;
} else {


@ -2479,7 +2479,7 @@ xfs_remove(
error = xfs_dir_replace(tp, ip, &xfs_name_dotdot,
tp->t_mountp->m_sb.sb_rootino, 0);
if (error)
return error;
goto out_trans_cancel;
}
} else {
/*


@ -48,13 +48,53 @@ xfs_alert_fsblock_zero(
return -EFSCORRUPTED;
}
u64
xfs_iomap_inode_sequence(
struct xfs_inode *ip,
u16 iomap_flags)
{
u64 cookie = 0;
if (iomap_flags & IOMAP_F_XATTR)
return READ_ONCE(ip->i_af.if_seq);
if ((iomap_flags & IOMAP_F_SHARED) && ip->i_cowfp)
cookie = (u64)READ_ONCE(ip->i_cowfp->if_seq) << 32;
return cookie | READ_ONCE(ip->i_df.if_seq);
}
/*
* Check that the iomap passed to us is still valid for the given offset and
* length.
*/
static bool
xfs_iomap_valid(
struct inode *inode,
const struct iomap *iomap)
{
struct xfs_inode *ip = XFS_I(inode);
if (iomap->validity_cookie !=
xfs_iomap_inode_sequence(ip, iomap->flags)) {
trace_xfs_iomap_invalid(ip, iomap);
return false;
}
XFS_ERRORTAG_DELAY(ip->i_mount, XFS_ERRTAG_WRITE_DELAY_MS);
return true;
}
const struct iomap_page_ops xfs_iomap_page_ops = {
.iomap_valid = xfs_iomap_valid,
};
int
xfs_bmbt_to_iomap(
struct xfs_inode *ip,
struct iomap *iomap,
struct xfs_bmbt_irec *imap,
unsigned int mapping_flags,
u16 iomap_flags)
u16 iomap_flags,
u64 sequence_cookie)
{
struct xfs_mount *mp = ip->i_mount;
struct xfs_buftarg *target = xfs_inode_buftarg(ip);
@ -91,6 +131,9 @@ xfs_bmbt_to_iomap(
if (xfs_ipincount(ip) &&
(ip->i_itemp->ili_fsync_fields & ~XFS_ILOG_TIMESTAMP))
iomap->flags |= IOMAP_F_DIRTY;
iomap->validity_cookie = sequence_cookie;
iomap->page_ops = &xfs_iomap_page_ops;
return 0;
}
@ -195,7 +238,8 @@ xfs_iomap_write_direct(
xfs_fileoff_t offset_fsb,
xfs_fileoff_t count_fsb,
unsigned int flags,
struct xfs_bmbt_irec *imap)
struct xfs_bmbt_irec *imap,
u64 *seq)
{
struct xfs_mount *mp = ip->i_mount;
struct xfs_trans *tp;
@ -285,6 +329,7 @@ xfs_iomap_write_direct(
error = xfs_alert_fsblock_zero(ip, imap);
out_unlock:
*seq = xfs_iomap_inode_sequence(ip, 0);
xfs_iunlock(ip, XFS_ILOCK_EXCL);
return error;
@ -743,6 +788,7 @@ xfs_direct_write_iomap_begin(
bool shared = false;
u16 iomap_flags = 0;
unsigned int lockmode = XFS_ILOCK_SHARED;
u64 seq;
ASSERT(flags & (IOMAP_WRITE | IOMAP_ZERO));
@ -811,9 +857,10 @@ xfs_direct_write_iomap_begin(
goto out_unlock;
}
seq = xfs_iomap_inode_sequence(ip, iomap_flags);
xfs_iunlock(ip, lockmode);
trace_xfs_iomap_found(ip, offset, length, XFS_DATA_FORK, &imap);
return xfs_bmbt_to_iomap(ip, iomap, &imap, flags, iomap_flags);
return xfs_bmbt_to_iomap(ip, iomap, &imap, flags, iomap_flags, seq);
allocate_blocks:
error = -EAGAIN;
@ -839,24 +886,26 @@ allocate_blocks:
xfs_iunlock(ip, lockmode);
error = xfs_iomap_write_direct(ip, offset_fsb, end_fsb - offset_fsb,
flags, &imap);
flags, &imap, &seq);
if (error)
return error;
trace_xfs_iomap_alloc(ip, offset, length, XFS_DATA_FORK, &imap);
return xfs_bmbt_to_iomap(ip, iomap, &imap, flags,
iomap_flags | IOMAP_F_NEW);
iomap_flags | IOMAP_F_NEW, seq);
out_found_cow:
xfs_iunlock(ip, lockmode);
length = XFS_FSB_TO_B(mp, cmap.br_startoff + cmap.br_blockcount);
trace_xfs_iomap_found(ip, offset, length - offset, XFS_COW_FORK, &cmap);
if (imap.br_startblock != HOLESTARTBLOCK) {
error = xfs_bmbt_to_iomap(ip, srcmap, &imap, flags, 0);
seq = xfs_iomap_inode_sequence(ip, 0);
error = xfs_bmbt_to_iomap(ip, srcmap, &imap, flags, 0, seq);
if (error)
return error;
goto out_unlock;
}
return xfs_bmbt_to_iomap(ip, iomap, &cmap, flags, IOMAP_F_SHARED);
seq = xfs_iomap_inode_sequence(ip, IOMAP_F_SHARED);
xfs_iunlock(ip, lockmode);
return xfs_bmbt_to_iomap(ip, iomap, &cmap, flags, IOMAP_F_SHARED, seq);
out_unlock:
if (lockmode)
@ -915,6 +964,7 @@ xfs_buffered_write_iomap_begin(
int allocfork = XFS_DATA_FORK;
int error = 0;
unsigned int lockmode = XFS_ILOCK_EXCL;
u64 seq;
if (xfs_is_shutdown(mp))
return -EIO;
@ -926,6 +976,10 @@ xfs_buffered_write_iomap_begin(
ASSERT(!XFS_IS_REALTIME_INODE(ip));
error = xfs_qm_dqattach(ip);
if (error)
return error;
error = xfs_ilock_for_iomap(ip, flags, &lockmode);
if (error)
return error;
@ -1029,10 +1083,6 @@ xfs_buffered_write_iomap_begin(
allocfork = XFS_COW_FORK;
}
error = xfs_qm_dqattach_locked(ip, false);
if (error)
goto out_unlock;
if (eof && offset + count > XFS_ISIZE(ip)) {
/*
* Determine the initial size of the preallocation.
@ -1094,32 +1144,47 @@ retry:
* Flag newly allocated delalloc blocks with IOMAP_F_NEW so we punch
* them out if the write happens to fail.
*/
seq = xfs_iomap_inode_sequence(ip, IOMAP_F_NEW);
xfs_iunlock(ip, XFS_ILOCK_EXCL);
trace_xfs_iomap_alloc(ip, offset, count, allocfork, &imap);
return xfs_bmbt_to_iomap(ip, iomap, &imap, flags, IOMAP_F_NEW);
return xfs_bmbt_to_iomap(ip, iomap, &imap, flags, IOMAP_F_NEW, seq);
found_imap:
seq = xfs_iomap_inode_sequence(ip, 0);
xfs_iunlock(ip, XFS_ILOCK_EXCL);
return xfs_bmbt_to_iomap(ip, iomap, &imap, flags, 0);
return xfs_bmbt_to_iomap(ip, iomap, &imap, flags, 0, seq);
found_cow:
xfs_iunlock(ip, XFS_ILOCK_EXCL);
seq = xfs_iomap_inode_sequence(ip, 0);
if (imap.br_startoff <= offset_fsb) {
error = xfs_bmbt_to_iomap(ip, srcmap, &imap, flags, 0);
error = xfs_bmbt_to_iomap(ip, srcmap, &imap, flags, 0, seq);
if (error)
return error;
goto out_unlock;
seq = xfs_iomap_inode_sequence(ip, IOMAP_F_SHARED);
xfs_iunlock(ip, XFS_ILOCK_EXCL);
return xfs_bmbt_to_iomap(ip, iomap, &cmap, flags,
IOMAP_F_SHARED);
IOMAP_F_SHARED, seq);
}
xfs_trim_extent(&cmap, offset_fsb, imap.br_startoff - offset_fsb);
return xfs_bmbt_to_iomap(ip, iomap, &cmap, flags, 0);
xfs_iunlock(ip, XFS_ILOCK_EXCL);
return xfs_bmbt_to_iomap(ip, iomap, &cmap, flags, 0, seq);
out_unlock:
xfs_iunlock(ip, XFS_ILOCK_EXCL);
return error;
}
static int
xfs_buffered_write_delalloc_punch(
struct inode *inode,
loff_t offset,
loff_t length)
{
return xfs_bmap_punch_delalloc_range(XFS_I(inode), offset,
offset + length);
}
static int
xfs_buffered_write_iomap_end(
struct inode *inode,
@ -1129,56 +1194,17 @@ xfs_buffered_write_iomap_end(
unsigned flags,
struct iomap *iomap)
{
struct xfs_inode *ip = XFS_I(inode);
struct xfs_mount *mp = ip->i_mount;
xfs_fileoff_t start_fsb;
xfs_fileoff_t end_fsb;
int error = 0;
if (iomap->type != IOMAP_DELALLOC)
return 0;
struct xfs_mount *mp = XFS_M(inode->i_sb);
int error;
/*
* Behave as if the write failed if drop writes is enabled. Set the NEW
* flag to force delalloc cleanup.
*/
if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_DROP_WRITES)) {
iomap->flags |= IOMAP_F_NEW;
written = 0;
error = iomap_file_buffered_write_punch_delalloc(inode, iomap, offset,
length, written, &xfs_buffered_write_delalloc_punch);
if (error && !xfs_is_shutdown(mp)) {
xfs_alert(mp, "%s: unable to clean up ino 0x%llx",
__func__, XFS_I(inode)->i_ino);
return error;
}
/*
* start_fsb refers to the first unused block after a short write. If
* nothing was written, round offset down to point at the first block in
* the range.
*/
if (unlikely(!written))
start_fsb = XFS_B_TO_FSBT(mp, offset);
else
start_fsb = XFS_B_TO_FSB(mp, offset + written);
end_fsb = XFS_B_TO_FSB(mp, offset + length);
/*
* Trim delalloc blocks if they were allocated by this write and we
* didn't manage to write the whole range.
*
* We don't need to care about racing delalloc as we hold i_mutex
* across the reserve/allocate/unreserve calls. If there are delalloc
* blocks in the range, they are ours.
*/
if ((iomap->flags & IOMAP_F_NEW) && start_fsb < end_fsb) {
truncate_pagecache_range(VFS_I(ip), XFS_FSB_TO_B(mp, start_fsb),
XFS_FSB_TO_B(mp, end_fsb) - 1);
error = xfs_bmap_punch_delalloc_range(ip, start_fsb,
end_fsb - start_fsb);
if (error && !xfs_is_shutdown(mp)) {
xfs_alert(mp, "%s: unable to clean up ino %lld",
__func__, ip->i_ino);
return error;
}
}
return 0;
}
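To make the shape of the new error path concrete, here is a deliberately simplified sketch of the contract between the generic helper and the filesystem punch callback. It is not the real iomap implementation (which is more careful, for example about ranges still covered by dirty page cache); only the callback signature comes from this series.

/*
 * Conceptual sketch only -- NOT iomap_file_buffered_write_punch_delalloc().
 * For a freshly reserved (IOMAP_F_NEW) delalloc mapping, everything beyond
 * the bytes actually written is handed back to the filesystem's punch
 * callback so the stale reservation goes away.
 */
static int
sketch_punch_short_write(
	struct inode	*inode,
	struct iomap	*iomap,
	loff_t		pos,
	loff_t		length,
	ssize_t		written,
	int		(*punch)(struct inode *inode, loff_t pos, loff_t length))
{
	loff_t		start = pos + written;
	loff_t		end = pos + length;

	if (iomap->type != IOMAP_DELALLOC || !(iomap->flags & IOMAP_F_NEW))
		return 0;
	if (start >= end)
		return 0;
	return punch(inode, start, end - start);
}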
@ -1187,6 +1213,15 @@ const struct iomap_ops xfs_buffered_write_iomap_ops = {
.iomap_end = xfs_buffered_write_iomap_end,
};
/*
* iomap_page_mkwrite() will never fail in a way that requires delalloc extents
* that it allocated to be revoked. Hence we do not need an .iomap_end method
* for this operation.
*/
const struct iomap_ops xfs_page_mkwrite_iomap_ops = {
.iomap_begin = xfs_buffered_write_iomap_begin,
};
static int
xfs_read_iomap_begin(
struct inode *inode,
@ -1204,6 +1239,7 @@ xfs_read_iomap_begin(
int nimaps = 1, error = 0;
bool shared = false;
unsigned int lockmode = XFS_ILOCK_SHARED;
u64 seq;
ASSERT(!(flags & (IOMAP_WRITE | IOMAP_ZERO)));
@ -1217,13 +1253,14 @@ xfs_read_iomap_begin(
&nimaps, 0);
if (!error && ((flags & IOMAP_REPORT) || IS_DAX(inode)))
error = xfs_reflink_trim_around_shared(ip, &imap, &shared);
seq = xfs_iomap_inode_sequence(ip, shared ? IOMAP_F_SHARED : 0);
xfs_iunlock(ip, lockmode);
if (error)
return error;
trace_xfs_iomap_found(ip, offset, length, XFS_DATA_FORK, &imap);
return xfs_bmbt_to_iomap(ip, iomap, &imap, flags,
shared ? IOMAP_F_SHARED : 0);
shared ? IOMAP_F_SHARED : 0, seq);
}
const struct iomap_ops xfs_read_iomap_ops = {
@ -1248,6 +1285,7 @@ xfs_seek_iomap_begin(
struct xfs_bmbt_irec imap, cmap;
int error = 0;
unsigned lockmode;
u64 seq;
if (xfs_is_shutdown(mp))
return -EIO;
@ -1282,8 +1320,9 @@ xfs_seek_iomap_begin(
if (data_fsb < cow_fsb + cmap.br_blockcount)
end_fsb = min(end_fsb, data_fsb);
xfs_trim_extent(&cmap, offset_fsb, end_fsb);
seq = xfs_iomap_inode_sequence(ip, IOMAP_F_SHARED);
error = xfs_bmbt_to_iomap(ip, iomap, &cmap, flags,
IOMAP_F_SHARED);
IOMAP_F_SHARED, seq);
/*
* This is a COW extent, so we must probe the page cache
* because there could be dirty page cache being backed
@ -1304,8 +1343,9 @@ xfs_seek_iomap_begin(
imap.br_startblock = HOLESTARTBLOCK;
imap.br_state = XFS_EXT_NORM;
done:
seq = xfs_iomap_inode_sequence(ip, 0);
xfs_trim_extent(&imap, offset_fsb, end_fsb);
error = xfs_bmbt_to_iomap(ip, iomap, &imap, flags, 0);
error = xfs_bmbt_to_iomap(ip, iomap, &imap, flags, 0, seq);
out_unlock:
xfs_iunlock(ip, lockmode);
return error;
@ -1331,6 +1371,7 @@ xfs_xattr_iomap_begin(
struct xfs_bmbt_irec imap;
int nimaps = 1, error = 0;
unsigned lockmode;
int seq;
if (xfs_is_shutdown(mp))
return -EIO;
@ -1347,12 +1388,14 @@ xfs_xattr_iomap_begin(
error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb, &imap,
&nimaps, XFS_BMAPI_ATTRFORK);
out_unlock:
seq = xfs_iomap_inode_sequence(ip, IOMAP_F_XATTR);
xfs_iunlock(ip, lockmode);
if (error)
return error;
ASSERT(nimaps);
return xfs_bmbt_to_iomap(ip, iomap, &imap, flags, 0);
return xfs_bmbt_to_iomap(ip, iomap, &imap, flags, IOMAP_F_XATTR, seq);
}
const struct iomap_ops xfs_xattr_iomap_ops = {


@ -13,14 +13,15 @@ struct xfs_bmbt_irec;
int xfs_iomap_write_direct(struct xfs_inode *ip, xfs_fileoff_t offset_fsb,
xfs_fileoff_t count_fsb, unsigned int flags,
struct xfs_bmbt_irec *imap);
struct xfs_bmbt_irec *imap, u64 *sequence);
int xfs_iomap_write_unwritten(struct xfs_inode *, xfs_off_t, xfs_off_t, bool);
xfs_fileoff_t xfs_iomap_eof_align_last_fsb(struct xfs_inode *ip,
xfs_fileoff_t end_fsb);
u64 xfs_iomap_inode_sequence(struct xfs_inode *ip, u16 iomap_flags);
int xfs_bmbt_to_iomap(struct xfs_inode *ip, struct iomap *iomap,
struct xfs_bmbt_irec *imap, unsigned int mapping_flags,
u16 iomap_flags);
u16 iomap_flags, u64 sequence_cookie);
int xfs_zero_range(struct xfs_inode *ip, loff_t pos, loff_t len,
bool *did_zero);
@ -47,6 +48,7 @@ xfs_aligned_fsb_count(
}
extern const struct iomap_ops xfs_buffered_write_iomap_ops;
extern const struct iomap_ops xfs_page_mkwrite_iomap_ops;
extern const struct iomap_ops xfs_direct_write_iomap_ops;
extern const struct iomap_ops xfs_read_iomap_ops;
extern const struct iomap_ops xfs_seek_iomap_ops;


@ -644,12 +644,14 @@ xfs_log_mount(
int min_logfsbs;
if (!xfs_has_norecovery(mp)) {
xfs_notice(mp, "Mounting V%d Filesystem",
XFS_SB_VERSION_NUM(&mp->m_sb));
xfs_notice(mp, "Mounting V%d Filesystem %pU",
XFS_SB_VERSION_NUM(&mp->m_sb),
&mp->m_sb.sb_uuid);
} else {
xfs_notice(mp,
"Mounting V%d filesystem in no-recovery mode. Filesystem will be inconsistent.",
XFS_SB_VERSION_NUM(&mp->m_sb));
"Mounting V%d filesystem %pU in no-recovery mode. Filesystem will be inconsistent.",
XFS_SB_VERSION_NUM(&mp->m_sb),
&mp->m_sb.sb_uuid);
ASSERT(xfs_is_readonly(mp));
}
@ -886,6 +888,23 @@ xlog_force_iclog(
return xlog_state_release_iclog(iclog->ic_log, iclog, NULL);
}
/*
* Cycle all the iclogbuf locks to make sure all log IO completion
* is done before we tear down these buffers.
*/
static void
xlog_wait_iclog_completion(struct xlog *log)
{
int i;
struct xlog_in_core *iclog = log->l_iclog;
for (i = 0; i < log->l_iclog_bufs; i++) {
down(&iclog->ic_sema);
up(&iclog->ic_sema);
iclog = iclog->ic_next;
}
}
/*
* Wait for the iclog and all prior iclogs to be written to disk as required by the
* log force state machine. Waiting on ic_force_wait ensures iclog completions
@ -1111,6 +1130,14 @@ xfs_log_unmount(
{
xfs_log_clean(mp);
/*
* If shutdown has come from iclog IO context, the log
* cleaning will have been skipped and so we need to wait
* for the iclog to complete shutdown processing before we
* tear anything down.
*/
xlog_wait_iclog_completion(mp->m_log);
xfs_buftarg_drain(mp->m_ddev_targp);
xfs_trans_ail_destroy(mp);
@ -2113,17 +2140,6 @@ xlog_dealloc_log(
xlog_in_core_t *iclog, *next_iclog;
int i;
/*
* Cycle all the iclogbuf locks to make sure all log IO completion
* is done before we tear down these buffers.
*/
iclog = log->l_iclog;
for (i = 0; i < log->l_iclog_bufs; i++) {
down(&iclog->ic_sema);
up(&iclog->ic_sema);
iclog = iclog->ic_next;
}
/*
* Destroy the CIL after waiting for iclog IO completion because an
* iclog EIO error will try to shut down the log, which accesses the


@ -538,6 +538,20 @@ xfs_check_summary_counts(
return 0;
}
static void
xfs_unmount_check(
struct xfs_mount *mp)
{
if (xfs_is_shutdown(mp))
return;
if (percpu_counter_sum(&mp->m_ifree) >
percpu_counter_sum(&mp->m_icount)) {
xfs_alert(mp, "ifree/icount mismatch at unmount");
xfs_fs_mark_sick(mp, XFS_SICK_FS_COUNTERS);
}
}
/*
* Flush and reclaim dirty inodes in preparation for unmount. Inodes and
* internal inode structures can be sitting in the CIL and AIL at this point,
@ -1077,6 +1091,7 @@ xfs_unmountfs(
if (error)
xfs_warn(mp, "Unable to free reserved block pool. "
"Freespace may not be correct on next mount.");
xfs_unmount_check(mp);
xfs_log_unmount(mp);
xfs_da_unmount(mp);


@ -125,6 +125,7 @@ xfs_fs_map_blocks(
int nimaps = 1;
uint lock_flags;
int error = 0;
u64 seq;
if (xfs_is_shutdown(mp))
return -EIO;
@ -176,6 +177,7 @@ xfs_fs_map_blocks(
lock_flags = xfs_ilock_data_map_shared(ip);
error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb,
&imap, &nimaps, bmapi_flags);
seq = xfs_iomap_inode_sequence(ip, 0);
ASSERT(!nimaps || imap.br_startblock != DELAYSTARTBLOCK);
@ -189,7 +191,7 @@ xfs_fs_map_blocks(
xfs_iunlock(ip, lock_flags);
error = xfs_iomap_write_direct(ip, offset_fsb,
end_fsb - offset_fsb, 0, &imap);
end_fsb - offset_fsb, 0, &imap, &seq);
if (error)
goto out_unlock;
@ -209,7 +211,7 @@ xfs_fs_map_blocks(
}
xfs_iunlock(ip, XFS_IOLOCK_EXCL);
error = xfs_bmbt_to_iomap(ip, iomap, &imap, 0, 0);
error = xfs_bmbt_to_iomap(ip, iomap, &imap, 0, 0, seq);
*device_generation = mp->m_generation;
return error;
out_unlock:


@ -422,6 +422,14 @@ xfs_qm_dquot_isolate(
if (!xfs_dqlock_nowait(dqp))
goto out_miss_busy;
/*
* If something else is freeing this dquot and hasn't yet removed it
* from the LRU, leave it for the freeing task to complete the freeing
* process rather than risk it being freed from under us here.
*/
if (dqp->q_flags & XFS_DQFLAG_FREEING)
goto out_miss_unlock;
/*
* This dquot has acquired a reference in the meantime; remove it from
* the freelist and try again.
@ -441,10 +449,8 @@ xfs_qm_dquot_isolate(
* skip it so there is time for the IO to complete before we try to
* reclaim it again on the next LRU pass.
*/
if (!xfs_dqflock_nowait(dqp)) {
xfs_dqunlock(dqp);
goto out_miss_busy;
}
if (!xfs_dqflock_nowait(dqp))
goto out_miss_unlock;
if (XFS_DQ_IS_DIRTY(dqp)) {
struct xfs_buf *bp = NULL;
@ -478,6 +484,8 @@ xfs_qm_dquot_isolate(
XFS_STATS_INC(dqp->q_mount, xs_qm_dqreclaims);
return LRU_REMOVED;
out_miss_unlock:
xfs_dqunlock(dqp);
out_miss_busy:
trace_xfs_dqreclaim_busy(dqp);
XFS_STATS_INC(dqp->q_mount, xs_qm_dqreclaim_misses);


@ -1311,10 +1311,10 @@ xfs_rtalloc_reinit_frextents(
uint64_t val = 0;
int error;
xfs_ilock(mp->m_rbmip, XFS_ILOCK_EXCL);
xfs_ilock(mp->m_rbmip, XFS_ILOCK_SHARED | XFS_ILOCK_RTBITMAP);
error = xfs_rtalloc_query_all(mp, NULL, xfs_rtalloc_count_frextent,
&val);
xfs_iunlock(mp->m_rbmip, XFS_ILOCK_EXCL);
xfs_iunlock(mp->m_rbmip, XFS_ILOCK_SHARED | XFS_ILOCK_RTBITMAP);
if (error)
return error;
@ -1325,6 +1325,41 @@ xfs_rtalloc_reinit_frextents(
return 0;
}
/*
* Read in the bmbt of an rt metadata inode so that we never have to load its
* extent mappings at runtime. This enables the use of shared ILOCKs for
* rtbitmap scans. Use an empty transaction to avoid deadlocking on loops in
* the bmbt.
*/
static inline int
xfs_rtmount_iread_extents(
struct xfs_inode *ip,
unsigned int lock_class)
{
struct xfs_trans *tp;
int error;
error = xfs_trans_alloc_empty(ip->i_mount, &tp);
if (error)
return error;
xfs_ilock(ip, XFS_ILOCK_EXCL | lock_class);
error = xfs_iread_extents(tp, ip, XFS_DATA_FORK);
if (error)
goto out_unlock;
if (xfs_inode_has_attr_fork(ip)) {
error = xfs_iread_extents(tp, ip, XFS_ATTR_FORK);
if (error)
goto out_unlock;
}
out_unlock:
xfs_iunlock(ip, XFS_ILOCK_EXCL | lock_class);
xfs_trans_cancel(tp);
return error;
}
/*
* Get the bitmap and summary inodes and the summary cache into the mount
* structure at mount time.
@ -1342,14 +1377,27 @@ xfs_rtmount_inodes(
return error;
ASSERT(mp->m_rbmip != NULL);
error = xfs_rtmount_iread_extents(mp->m_rbmip, XFS_ILOCK_RTBITMAP);
if (error)
goto out_rele_bitmap;
error = xfs_iget(mp, NULL, sbp->sb_rsumino, 0, 0, &mp->m_rsumip);
if (error) {
xfs_irele(mp->m_rbmip);
return error;
}
if (error)
goto out_rele_bitmap;
ASSERT(mp->m_rsumip != NULL);
error = xfs_rtmount_iread_extents(mp->m_rsumip, XFS_ILOCK_RTSUM);
if (error)
goto out_rele_summary;
xfs_alloc_rsum_cache(mp, sbp->sb_rbmblocks);
return 0;
out_rele_summary:
xfs_irele(mp->m_rsumip);
out_rele_bitmap:
xfs_irele(mp->m_rbmip);
return error;
}
void


@ -1110,7 +1110,7 @@ xfs_fs_put_super(
if (!sb->s_fs_info)
return;
xfs_notice(mp, "Unmounting Filesystem");
xfs_notice(mp, "Unmounting Filesystem %pU", &mp->m_sb.sb_uuid);
xfs_filestream_unmount(mp);
xfs_unmountfs(mp);


@ -34,6 +34,8 @@
#include "xfs_ag.h"
#include "xfs_ag_resv.h"
#include "xfs_error.h"
#include <linux/iomap.h>
#include "xfs_iomap.h"
/*
* We include this last to have the helpers above available for the trace


@ -3352,6 +3352,92 @@ DEFINE_EVENT(xfs_inode_irec_class, name, \
TP_PROTO(struct xfs_inode *ip, struct xfs_bmbt_irec *irec), \
TP_ARGS(ip, irec))
/* inode iomap invalidation events */
DECLARE_EVENT_CLASS(xfs_wb_invalid_class,
TP_PROTO(struct xfs_inode *ip, const struct iomap *iomap, unsigned int wpcseq, int whichfork),
TP_ARGS(ip, iomap, wpcseq, whichfork),
TP_STRUCT__entry(
__field(dev_t, dev)
__field(xfs_ino_t, ino)
__field(u64, addr)
__field(loff_t, pos)
__field(u64, len)
__field(u16, type)
__field(u16, flags)
__field(u32, wpcseq)
__field(u32, forkseq)
),
TP_fast_assign(
__entry->dev = VFS_I(ip)->i_sb->s_dev;
__entry->ino = ip->i_ino;
__entry->addr = iomap->addr;
__entry->pos = iomap->offset;
__entry->len = iomap->length;
__entry->type = iomap->type;
__entry->flags = iomap->flags;
__entry->wpcseq = wpcseq;
__entry->forkseq = READ_ONCE(xfs_ifork_ptr(ip, whichfork)->if_seq);
),
TP_printk("dev %d:%d ino 0x%llx pos 0x%llx addr 0x%llx bytecount 0x%llx type 0x%x flags 0x%x wpcseq 0x%x forkseq 0x%x",
MAJOR(__entry->dev), MINOR(__entry->dev),
__entry->ino,
__entry->pos,
__entry->addr,
__entry->len,
__entry->type,
__entry->flags,
__entry->wpcseq,
__entry->forkseq)
);
#define DEFINE_WB_INVALID_EVENT(name) \
DEFINE_EVENT(xfs_wb_invalid_class, name, \
TP_PROTO(struct xfs_inode *ip, const struct iomap *iomap, unsigned int wpcseq, int whichfork), \
TP_ARGS(ip, iomap, wpcseq, whichfork))
DEFINE_WB_INVALID_EVENT(xfs_wb_cow_iomap_invalid);
DEFINE_WB_INVALID_EVENT(xfs_wb_data_iomap_invalid);
DECLARE_EVENT_CLASS(xfs_iomap_invalid_class,
TP_PROTO(struct xfs_inode *ip, const struct iomap *iomap),
TP_ARGS(ip, iomap),
TP_STRUCT__entry(
__field(dev_t, dev)
__field(xfs_ino_t, ino)
__field(u64, addr)
__field(loff_t, pos)
__field(u64, len)
__field(u64, validity_cookie)
__field(u64, inodeseq)
__field(u16, type)
__field(u16, flags)
),
TP_fast_assign(
__entry->dev = VFS_I(ip)->i_sb->s_dev;
__entry->ino = ip->i_ino;
__entry->addr = iomap->addr;
__entry->pos = iomap->offset;
__entry->len = iomap->length;
__entry->validity_cookie = iomap->validity_cookie;
__entry->type = iomap->type;
__entry->flags = iomap->flags;
__entry->inodeseq = xfs_iomap_inode_sequence(ip, iomap->flags);
),
TP_printk("dev %d:%d ino 0x%llx pos 0x%llx addr 0x%llx bytecount 0x%llx type 0x%x flags 0x%x validity_cookie 0x%llx inodeseq 0x%llx",
MAJOR(__entry->dev), MINOR(__entry->dev),
__entry->ino,
__entry->pos,
__entry->addr,
__entry->len,
__entry->type,
__entry->flags,
__entry->validity_cookie,
__entry->inodeseq)
);
#define DEFINE_IOMAP_INVALID_EVENT(name) \
DEFINE_EVENT(xfs_iomap_invalid_class, name, \
TP_PROTO(struct xfs_inode *ip, const struct iomap *iomap), \
TP_ARGS(ip, iomap))
DEFINE_IOMAP_INVALID_EVENT(xfs_iomap_invalid);
/* refcount/reflink tracepoint definitions */
/* reflink tracepoints */


@ -422,7 +422,7 @@ xfsaild_push(
struct xfs_ail_cursor cur;
struct xfs_log_item *lip;
xfs_lsn_t lsn;
xfs_lsn_t target;
xfs_lsn_t target = NULLCOMMITLSN;
long tout;
int stuck = 0;
int flushing = 0;
@ -472,6 +472,8 @@ xfsaild_push(
XFS_STATS_INC(mp, xs_push_ail);
ASSERT(target != NULLCOMMITLSN);
lsn = lip->li_lsn;
while ((XFS_LSN_CMP(lip->li_lsn, target) <= 0)) {
int lock_result;


@ -210,7 +210,7 @@ __xfs_xattr_put_listent(
return;
}
offset = context->buffer + context->count;
strncpy(offset, prefix, prefix_len);
memcpy(offset, prefix, prefix_len);
offset += prefix_len;
strncpy(offset, (char *)name, namelen); /* real name */
offset += namelen;


@ -49,26 +49,35 @@ struct vm_fault;
*
* IOMAP_F_BUFFER_HEAD indicates that the file system requires the use of
* buffer heads for this mapping.
*
* IOMAP_F_XATTR indicates that the iomap is for an extended attribute extent
* rather than a file data extent.
*/
#define IOMAP_F_NEW 0x01
#define IOMAP_F_DIRTY 0x02
#define IOMAP_F_SHARED 0x04
#define IOMAP_F_MERGED 0x08
#define IOMAP_F_BUFFER_HEAD 0x10
#define IOMAP_F_ZONE_APPEND 0x20
#define IOMAP_F_NEW (1U << 0)
#define IOMAP_F_DIRTY (1U << 1)
#define IOMAP_F_SHARED (1U << 2)
#define IOMAP_F_MERGED (1U << 3)
#define IOMAP_F_BUFFER_HEAD (1U << 4)
#define IOMAP_F_ZONE_APPEND (1U << 5)
#define IOMAP_F_XATTR (1U << 6)
/*
* Flags set by the core iomap code during operations:
*
* IOMAP_F_SIZE_CHANGED indicates to the iomap_end method that the file size
* has changed as the result of this write operation.
*
* IOMAP_F_STALE indicates that the iomap is not valid any longer and the file
* range it covers needs to be remapped by the high level before the operation
* can proceed.
*/
#define IOMAP_F_SIZE_CHANGED 0x100
#define IOMAP_F_SIZE_CHANGED (1U << 8)
#define IOMAP_F_STALE (1U << 9)
/*
* Flags from 0x1000 up are for file system specific usage:
*/
#define IOMAP_F_PRIVATE 0x1000
#define IOMAP_F_PRIVATE (1U << 12)
/*
@ -89,6 +98,7 @@ struct iomap {
void *inline_data;
void *private; /* filesystem private */
const struct iomap_page_ops *page_ops;
u64 validity_cookie; /* used with .iomap_valid() */
};
static inline sector_t iomap_sector(const struct iomap *iomap, loff_t pos)
@ -128,6 +138,23 @@ struct iomap_page_ops {
int (*page_prepare)(struct inode *inode, loff_t pos, unsigned len);
void (*page_done)(struct inode *inode, loff_t pos, unsigned copied,
struct page *page);
/*
* Check that the cached iomap still maps correctly to the filesystem's
* internal extent map. FS internal extent maps can change while iomap
* is iterating a cached iomap, so this hook allows iomap to detect that
* the iomap needs to be refreshed during a long running write
* operation.
*
* The filesystem can store internal state (e.g. a sequence number) in
* iomap->validity_cookie when the iomap is first mapped to be able to
* detect changes between mapping time and whenever .iomap_valid() is
* called.
*
* This is called with the folio over the specified file position held
* locked by the iomap code.
*/
bool (*iomap_valid)(struct inode *inode, const struct iomap *iomap);
};
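For a filesystem other than XFS, the hook might be wired up roughly as below; everything named foo_* and the i_map_seq counter are hypothetical, and the XFS implementation earlier in this series is the authoritative example.

/*
 * Hypothetical filesystem glue for ->iomap_valid: stash a per-inode sequence
 * number in the cookie at mapping time, then compare it again under the
 * locked folio.  Names (foo_inode, i_map_seq) are illustrative only.
 */
struct foo_inode {
	struct inode	vfs_inode;
	u32		i_map_seq;	/* bumped whenever the extent map changes */
};

static bool
foo_iomap_valid(struct inode *inode, const struct iomap *iomap)
{
	struct foo_inode *fi = container_of(inode, struct foo_inode, vfs_inode);

	return iomap->validity_cookie == READ_ONCE(fi->i_map_seq);
}

static const struct iomap_page_ops foo_iomap_page_ops = {
	.iomap_valid	= foo_iomap_valid,
};

/* At mapping time (->iomap_begin), while the extent map is stable: */
static void
foo_fill_iomap_validity(struct foo_inode *fi, struct iomap *iomap)
{
	iomap->validity_cookie = READ_ONCE(fi->i_map_seq);
	iomap->page_ops = &foo_iomap_page_ops;
}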
/*
@ -226,6 +253,10 @@ static inline const struct iomap *iomap_iter_srcmap(const struct iomap_iter *i)
ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
const struct iomap_ops *ops);
int iomap_file_buffered_write_punch_delalloc(struct inode *inode,
struct iomap *iomap, loff_t pos, loff_t length, ssize_t written,
int (*punch)(struct inode *inode, loff_t pos, loff_t length));
int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops);
void iomap_readahead(struct readahead_control *, const struct iomap_ops *ops);
bool iomap_is_partially_uptodate(struct folio *, size_t from, size_t count);