linux-stable

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2024-08-20 14:16:02 +00:00

Author	SHA1	Message	Date
Dave Chinner	9aebe805a5	xfs: convert growfs AG header init to use buffer lists We currently write all new AG headers synchronously, which can be slow for large grow operations. All we really need to do is ensure all the headers are on disk before we run the growfs transaction, so convert this to a buffer list and a delayed write operation. We block waiting for the delayed write buffer submission to complete, so this will fulfill the requirement to have all the buffers written correctly before proceeding. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-05-15 18:12:51 -07:00
Dave Chinner	cce77bcf48	xfs: factor out AG header initialisation from growfs core The intialisation of new AG headers is mostly common with the userspace mkfs code and growfs in the kernel, so start factoring it out so we can move it to libxfs and use it in both places. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-05-15 18:12:51 -07:00
Dave Chinner	879de98ead	xfs: one-shot cached buffers For the new growfs work, we want to ensure that we serialise secondary superblock updates with other operations (e.g. scrub) correctly, but we don't want to cache the buffers for long term reuse. We need cached buffers for serialisation, however. To solve this, introduce a "oneshot" buffer which will be marshalled through the cache but then released once the last current reference goes away. If the buffer is already cached, then we ignore the "one-shot" behaviour and leave the buffer in the state it was prior to the one-shot command being run. This means we don't perturb either the working set or existing cached buffer state by a one-shot operation. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-05-15 18:12:51 -07:00
Darrick J. Wong	84d42ea6b6	xfs: implement the metadata repair ioctl flag Plumb in the pieces necessary to make the "scrub" subfunction of the scrub ioctl actually work. This means that we make the IFLAG_REPAIR flag to the scrub ioctl actually do something, and we add an errortag knob so that xfstests can force the kernel to rebuild a metadata structure even if there's nothing wrong with it. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2018-05-15 18:12:50 -07:00
Darrick J. Wong	718fa74b15	xfs: create tracepoints for online repair These tracepoints will be used to debug the online repair routines. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2018-05-15 18:12:50 -07:00
Darrick J. Wong	7644bd988d	xfs: teach xfs_bmapi_remap to accept some bmapi flags Teach xfs_bmapi_remap how to map in unwritten extent and to skip rmap updates. This enables us to rebuild real and unwritten extents from the rmapbt. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>	2018-05-15 18:12:50 -07:00
Darrick J. Wong	7cf199ba5a	xfs: make xfs_bmapi_remapi work with attribute forks Add a new flags argument to xfs_bmapi_remapi so that we can pass BMAPI flags into the function. This enables us to pass in BMAPI_ATTRFORK so that we can remap things into the attribute fork. Eventually the online repair code will use this to rebuild attribute forks, so make it non-static. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>	2018-05-15 18:12:50 -07:00
Darrick J. Wong	9f3a080ef1	xfs: hoist xfs_scrub_agfl_walk to libxfs as xfs_agfl_walk This function is basically a generic AGFL block iterator, so promote it to libxfs ahead of online repair wanting to use it. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>	2018-05-15 18:12:50 -07:00
Darrick J. Wong	ddd10c2fe2	xfs: avoid ABBA deadlock when scrubbing parent pointers In normal operation, the XFS convention is to take an inode's iolock and then allocate a transaction. However, when scrubbing parent inodes this is inverted -- we allocated the transaction to do the scrub, and now we're trying to grab the parent's iolock. This can lead to ABBA deadlocks: some thread grabbed the parent's iolock and is waiting for space for a transaction while our parent scrubber is sitting on a transaction trying to get the parent's iolock. Therefore, convert all iolock attempts to use trylock; if that fails, they can use the existing mechanisms to back off and try again. The ABBA deadlock didn't happen with a non-repair scrub because the transactions don't reserve any space, but repair scrubs require reservation in order to update metadata. However, any other concurrent metadata update (e.g. directory create in the parent) could also induce this deadlock with the parent scrubber. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>	2018-05-15 18:12:50 -07:00
Darrick J. Wong	517b32b7fa	xfs: scrub the data fork of the realtime inodes The realtime bitmap and summary inodes live on the metadata device, so we can scrub their data forks with the regular scrubbers. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>	2018-05-15 18:12:50 -07:00
Darrick J. Wong	87d9d609c2	xfs: quota scrub should use bmapbtd scrubber Replace the quota scrubber's open-coded data fork scrubber with a redirected call to the bmapbtd scrubber. This strengthens the quota scrub to include all the cross-referencing that it does. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>	2018-05-15 18:12:50 -07:00
Darrick J. Wong	8bc763c24d	xfs: don't continue scrub if already corrupt If we've already decided that something is corrupt, we might as well abort all the loops and exit as quickly as possible. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>	2018-05-15 18:12:50 -07:00
Darrick J. Wong	eac69e1676	xfs: refactor quota limits initialization Replace all the if (!error) weirdness with helper functions that follow our regular coding practices, and factor out the ternary expression soup. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2018-05-15 17:57:05 -07:00
Darrick J. Wong	689e11c84b	xfs: superblock scrub should use short-lived buffers Secondary superblocks are rarely used, so create a helper to read a given non-primary AG's superblock and ensure that it won't stick around hogging memory. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>	2018-05-15 17:57:05 -07:00
Darrick J. Wong	8389f3ffa2	xfs: skip scrub xref if corruption already noted Don't bother looking for cross-referencing problems if the metadata is already corrupt or we've already found a cross-referencing problem. Since we added a helper function for flags testing, convert existing users to use it. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>	2018-05-15 17:57:05 -07:00
Dave Chinner	c9fbd7bbc2	xfs: clear sb->s_fs_info on mount failure We recently had an oops reported on a 4.14 kernel in xfs_reclaim_inodes_count() where sb->s_fs_info pointed to garbage and so the m_perag_tree lookup walked into lala land. Essentially, the machine was under memory pressure when the mount was being run, xfs_fs_fill_super() failed after allocating the xfs_mount and attaching it to sb->s_fs_info. It then cleaned up and freed the xfs_mount, but the sb->s_fs_info field still pointed to the freed memory. Hence when the superblock shrinker then ran it fell off the bad pointer. With the superblock shrinker problem fixed at teh VFS level, this stale s_fs_info pointer is still a problem - we use it unconditionally in ->put_super when the superblock is being torn down, and hence we can still trip over it after a ->fill_super call failure. Hence we need to clear s_fs_info if xfs-fs_fill_super() fails, and we need to check if it's valid in the places it can potentially be dereferenced after a ->fill_super failure. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-05-15 17:57:05 -07:00
Dave Chinner	dae5cd8118	xfs: add mount delay debug option Similar to log_recovery_delay, this delay occurs between the VFS superblock being initialised and the xfs_mount being fully initialised. It also poisons the per-ag radix tree node so that it can be used for triggering shrinker races during mount such as the following: <run memory pressure workload in background> $ cat dirty-mount.sh #! /bin/bash umount -f /dev/pmem0 mkfs.xfs -f /dev/pmem0 mount /dev/pmem0 /mnt/test rm -f /mnt/test/foo xfs_io -fxc "pwrite 0 4k" -c fsync -c "shutdown" /mnt/test/foo umount /dev/pmem0 # let's crash it now! echo 30 > /sys/fs/xfs/debug/mount_delay mount /dev/pmem0 /mnt/test echo 0 > /sys/fs/xfs/debug/mount_delay umount /dev/pmem0 $ sudo ./dirty-mount.sh ..... [ 60.378118] CPU: 3 PID: 3577 Comm: fs_mark Tainted: G D W 4.16.0-rc5-dgc #440 [ 60.378120] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 [ 60.378124] RIP: 0010:radix_tree_next_chunk+0x76/0x320 [ 60.378127] RSP: 0018:ffffc9000276f4f8 EFLAGS: 00010282 [ 60.383670] RAX: a5a5a5a5a5a5a5a4 RBX: 0000000000000010 RCX: 000000000000001a [ 60.385277] RDX: 0000000000000000 RSI: ffffc9000276f540 RDI: 0000000000000000 [ 60.386554] RBP: 0000000000000000 R08: 0000000000000000 R09: a5a5a5a5a5a5a5a5 [ 60.388194] R10: 0000000000000006 R11: 0000000000000001 R12: ffffc9000276f598 [ 60.389288] R13: 0000000000000040 R14: 0000000000000228 R15: ffff880816cd6458 [ 60.390827] FS: 00007f5c124b9740(0000) GS:ffff88083fc00000(0000) knlGS:0000000000000000 [ 60.392253] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 60.393423] CR2: 00007f5c11bba0b8 CR3: 000000035580e001 CR4: 00000000000606e0 [ 60.394519] Call Trace: [ 60.395252] radix_tree_gang_lookup_tag+0xc4/0x130 [ 60.395948] xfs_perag_get_tag+0x37/0xf0 [ 60.396522] xfs_reclaim_inodes_count+0x32/0x40 [ 60.397178] xfs_fs_nr_cached_objects+0x11/0x20 [ 60.397837] super_cache_count+0x35/0xc0 [ 60.399159] shrink_slab.part.66+0xb1/0x370 [ 60.400194] shrink_node+0x7e/0x1a0 [ 60.401058] try_to_free_pages+0x199/0x470 [ 60.402081] __alloc_pages_slowpath+0x3a1/0xd20 [ 60.403729] __alloc_pages_nodemask+0x1c3/0x200 [ 60.404941] cache_grow_begin+0x20b/0x2e0 [ 60.406164] fallback_alloc+0x160/0x200 [ 60.407088] kmem_cache_alloc+0x111/0x4e0 [ 60.408038] ? xfs_buf_rele+0x61/0x430 [ 60.408925] kmem_zone_alloc+0x61/0xe0 [ 60.409965] xfs_inode_alloc+0x24/0x1d0 ..... Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-05-15 17:57:05 -07:00
Brian Foster	4e529339af	xfs: factor out nodiscard helpers The changes to skip discards of speculative preallocation and unwritten extents introduced several new wrapper functions through the bunmapi -> extent free codepath to reduce churn in all of the associated callers. In several cases, these wrappers simply toggle a single flag to skip or not skip discards for the resulting blocks. The explicit _nodiscard() wrappers for such an isolated set of callers is a bit overkill. Kill off these wrappers and replace with the calls to the underlying functions in the contexts that need to control discard behavior. Retain the wrappers that preserve the original calling conventions to serve the original purpose of reducing code churn. This is a refactoring patch and does not change behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-05-15 17:57:05 -07:00
Darrick J. Wong	67482129cd	iomap: add a swapfile activation function Add a new iomap_swapfile_activate function so that filesystems can activate swap files without having to use the obsolete and slow bmap function. This enables XFS to support fallocate'd swap files and swap files on realtime devices. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Jan Kara <jack@suse.cz>	2018-05-15 17:57:05 -07:00
Darrick J. Wong	d6b636ebb1	xfs: halt auto-reclamation activities while rebuilding rmap Rebuilding the reverse-mapping tree requires us to quiesce all inodes in the filesystem, so we must stop background reclamation of post-EOF and CoW prealloc blocks. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2018-05-15 17:57:05 -07:00
Darrick J. Wong	95eb308caa	xfs: add BMAPI_NORMAP flag to perform block remapping without updating rmapbt Add a new flag, XFS_BMAPI_NORMAP, which will perform file block remapping without updating the rmapbt. This will be used by the repair code to reconstruct bmbts from the rmapbt, in which case we don't want the rmapbt update. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2018-05-15 17:57:05 -07:00
Darrick J. Wong	08daa3ccf5	xfs: add repair helpers for the reference count btree Add a couple of functions to the refcount btree and generic btree code that will be used to repair the refcountbt. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2018-05-15 17:57:05 -07:00
Darrick J. Wong	4d4f86b49f	xfs: add repair helpers for the reverse mapping btree Add a couple of functions to the reverse mapping btree that will be used to repair the rmapbt. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2018-05-15 17:57:05 -07:00
Darrick J. Wong	7f8f1313d9	xfs: expose various functions to repair code Expose various helpers that the repair code will want to use. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2018-05-15 17:57:05 -07:00
Darrick J. Wong	14861c4740	xfs: add helpers to calculate btree size Add a bunch of helper functions that calculate the sizes of various btrees. These will be used to repair btrees and btree headers. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2018-05-15 17:57:05 -07:00
Darrick J. Wong	9d9c90286a	xfs: refactor scrub transaction allocation function Since the transaction allocation helper is about to become more complex, move it to common.c and remove the redundant parameters. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>	2018-05-15 17:57:05 -07:00
Darrick J. Wong	08a3a692ef	xfs: btree scrub should check minrecs Strengthen the btree block header checks to detect the number of records being less than the btree type's minimum record count. Certain blocks are allowed to violate this constraint -- specifically any btree block at the top of the tree can have fewer than minrecs records. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>	2018-05-15 17:57:05 -07:00
Darrick J. Wong	631fc955bd	xfs: clean up scrub usage of KM_NOFS All scrub code runs in transaction context, which means that memory allocations are automatically run in PF_MEMALLOC_NOFS context. It's therefore unnecessary to pass in KM_NOFS to allocation routines, so clean them all out. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>	2018-05-15 17:57:05 -07:00
Darrick J. Wong	eb41c93fef	xfs: avoid ilock games in the quota scrubber Refactor the quota scrubber to take the quotaofflock and grab the quota inode in the setup function so that we can treat quota in the same "scrub in the context of this inode" (i.e. sc->ip) manner as we treat any other inode. We do have to drop the quota inode's ILOCK_EXCL to use dqiterate, but since dquots have their own individual locks the ILOCK wasn't helping us anyway. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>	2018-05-15 17:57:05 -07:00
Darrick J. Wong	554ba96540	xfs: refactor dquot iteration Create a helper function to iterate all the dquots of a given type in the system, and refactor the dquot scrub to use it. This will get more use in the quota repair code. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2018-05-15 17:56:59 -07:00
Darrick J. Wong	28b9060bd8	xfs: rename on-disk dquot counter zap functions The function 'xfs_qm_dqiterate' doesn't iterate dquots at all, it iterates all dquot blocks of a quota inode and clears the counters. Therefore, change the name to something more descriptive so that we can introduce a real dquot iterator later. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2018-05-10 08:56:48 -07:00
Darrick J. Wong	30ab2dcf2c	xfs: replace XFS_QMOPT_DQALLOC with a simple boolean DQALLOC is only ever used with xfs_qm_dqget*, and the only flag that the _dqget family of functions cares about is DQALLOC. Therefore, change it to a boolean 'can alloc?' flag for the dqget interfaces where that makes sense. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>	2018-05-10 08:56:48 -07:00
Darrick J. Wong	114e73ccfa	xfs: remove direct calls to _qm_dqread The quota initialization code needs an "uncached" variant of _dqget to read in default quota limits and timers before the dquot cache is fully set up. We've already split up _dqget into its component pieces so create a fourth variant to address this need, and make dqread internal to xfs_dquot.c again. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2018-05-10 08:56:48 -07:00
Darrick J. Wong	d63192c890	xfs: refactor xfs_qm_dqtobp and xfs_qm_dqalloc Separate the disk dquot read and allocation functionality into two helper functions, then refactor dqread to call them directly. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>	2018-05-10 08:56:48 -07:00
Darrick J. Wong	617cd5c12c	xfs: refactor incore dquot initialization functions Create two incore dquot initialization functions that will help us to disentangle dqget and dqread. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2018-05-10 08:56:48 -07:00
Darrick J. Wong	0fcef1270f	xfs: fetch dquots directly during quotacheck Quotacheck only runs during mount, which means that there are no other processes in the system that could be doing chown or chproj. Therefore there's no potential for racing to attach dquots to the inode so we can drop all the ILOCK and race detection bits from quotacheck. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2018-05-10 08:56:48 -07:00
Darrick J. Wong	4882c19d2a	xfs: split out dqget for inodes from regular dqget There are two uses of dqget here -- one is to return the dquot for a given type and id, and the other is to return the dquot for a given type and inode. Those are two separate things, so split them into two smaller functions. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2018-05-10 08:56:48 -07:00
Darrick J. Wong	c14cfccabe	xfs: remove unnecessary xfs_qm_dqattach parameter The flags argument is always zero, get rid of it. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2018-05-10 08:56:47 -07:00
Darrick J. Wong	d7103eeb00	xfs: delegate dqget input checks to helper function Move the dqget input checks to a separate function in preparation for splitting up the dqget functionality. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2018-05-10 08:56:47 -07:00
Darrick J. Wong	cc2047c4d0	xfs: refactor dquot cache handling Delegate the dquot cache handling (radix tree lookup and insertion) to separate helper functions so that we can continue to simplify the body of dqget. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2018-05-10 08:56:47 -07:00
Darrick J. Wong	2e330e76e0	xfs: refactor XFS_QMOPT_DQNEXT out of existence There's only one caller of DQNEXT and its semantics can be moved into a separate function, so create the function and get rid of the flag. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2018-05-10 08:56:47 -07:00
Darrick J. Wong	609001bca4	xfs: don't spray logs when dquot flush/purge fail When dquot flush or purge fail there's no need to spam the logs, we've already logged the IO error or fs shutdown that caused the flush failures. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>	2018-05-10 08:56:47 -07:00
Darrick J. Wong	7b6b50f55c	xfs: release new dquot buffer on defer_finish error In commit `efa092f3d4` "[XFS] Fixes a bug in the quota code when allocating a new dquot record", we allocate a new dquot block, grab a buffer to initialize it, and return the locked initialized dquot buffer to the caller for further in-core dquot initialization. Unfortunately, if the _bmap_finish errored out, _qm_dqalloc would also error out without bothering to free the (locked) buffer. Leaking a locked buffer caused hangs in generic/388 when quotas are enabled. Furthermore, the _bmap_finish -> _defer_finish conversion in `310a75a3c6` ("xfs: change xfs_bmap_{finish,cancel,init,free} -> xfs_defer_*") failed to observe that the buffer was held going into _defer_finish and therefore failed to notice that the buffer lock is /not/ maintained afterwards. Now that we can bjoin a buffer to a defer_ops, use this mechanism to ensure that the buffer stays locked across the _defer_finish. Release the holds and locks on the buffer as appropriate if we have to error out. There is a subtlety here for the caller in that the buffer emerges locked and held to the transaction, so if the _trans_commit fails we have to release the buffer explicitly. This fixes the unmount hang in generic/388 when quotas are enabled. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>	2018-05-10 08:56:47 -07:00
Brian Foster	84ca484ecf	xfs: don't discard on free of unwritten extents Unwritten extents by definition have not been written to until they are converted to normal written extents. If unwritten extents are freed from a file, it is therefore guaranteed that the blocks have not been written to since allocation (note that zero range punches and reallocates blocks). To cut down on online discards generated from workloads that make use of preallocation, skip discards of extents if they are in the unwritten state when the extent is freed. Note that this optimization does not apply to log recovery, during which all freed extents are discarded if online discard is enabled. Also note that it may be possible for a filesystem crash to occur after write completion of an unwritten extent but before unwritten conversion such that the extent remains unwritten after log recovery. Since this pseudo-inconsistency may already be possible after a crash (consider writing to recently allocated blocks where the allocation transaction is lost after a crash), this change shouldn't introduce any fundamental limitations that don't already exist. In short, on storage stacks where discards are important, it's good practice to run an occasional fstrim even with online discard enabled in the filesystem, particularly after a crash. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-05-10 08:56:47 -07:00
Brian Foster	13b86fc337	xfs: skip online discard during eofblocks trims We've had reports of online discard operations being sent from XFS on write-only workloads. These discards occur as a result of eofblocks trims that can occur after a large file copy completes. These discards are slightly confusing for users who might be paying close attention to online discards (i.e., vdo) due to performance sensitivity. They also happen to be spurious because freed post-eof blocks by definition have not been written to during the current allocation cycle. Update xfs_free_eofblocks() to skip discards that are purely attributed to eofblocks trims. This cuts down the number of spurious discards that may occur on write-only workloads due to normal preallocation activity. Note that discards of post-eof extents can still occur from other codepaths that do not isolate handling of post-eof blocks from those within eof. For example, file unlinks and truncates may still cause discards for any file blocks affected by the operation. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-05-10 08:56:47 -07:00
Brian Foster	fcb762f5de	xfs: add bmapi nodiscard flag Freed extents are unconditionally discarded when online discard is enabled. Define XFS_BMAPI_NODISCARD to allow callers to bypass discards when unnecessary. For example, this will be useful for eofblocks trimming. This patch does not change behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-05-10 08:56:46 -07:00
Dave Chinner	e6631f8554	xfs: get rid of the log item descriptor It's just a connector between a transaction and a log item. There's a 1:1 relationship between a log item descriptor and a log item, and a 1:1 relationship between a log item descriptor and a transaction. Both relationships are created and terminated at the same time, so why do we even have the descriptor? Replace it with a specific list_head in the log item and a new log item dirtied flag to replace the XFS_LID_DIRTY flag. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> [darrick: fix up deferred agfl intent finish_item use of LID_DIRTY] Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-05-10 08:56:46 -07:00
Dave Chinner	1a2ebf835a	xfs: add some more debug checks to buffer log item reuse Just to make sure the item isn't associated with another transaction when we try to reuse it. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-05-10 08:56:46 -07:00
Dave Chinner	844e5e74c1	xfs: fix double ijoin in xfs_reflink_clear_inode_flag() xfs_reflink_clear_inode_flag double-joins an inode to a transaction, which is not allowed. Fix that and document that the caller must have already joined it. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> [darrick: edit out trace for nonexistent ASSERT] Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-05-10 08:56:46 -07:00
Dave Chinner	c5295c6aad	xfs: fix double ijoin in xfs_reflink_cancel_cow_range xfs_reflink_cancel_cow_range joins an inode twice to the same transaction. This is not allowed, so fix it and document that the callers of xfs_reflink_cancel_cow_blocks() must have already joined the inode to the permanent transaction passed in. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> [darrick: edited the commit log to remove trace for nonexistent ASSERT] Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-05-10 08:56:46 -07:00

1 2 3 4 5 ...

53545 commits