linux-stable/fs/ext4
Ritesh Harjani 07b5b8e1ac ext4: mballoc: introduce pcpu seqcnt for freeing PA to improve ENOSPC handling
There can be a race in ext4_mb_discard_group_preallocations(), where
the 1st thread iterates through the group's bb_prealloc_list, removes
all the PAs and adds them to the function's local list head.
Now if the 2nd thread comes in to discard the group preallocations,
it will see that group->bb_prealloc_list is empty and will return 0.

Consider a case where we have only a small number of groups
(e.g. just group 0): this may even return an -ENOSPC error from
ext4_mb_new_blocks() (the path from which
ext4_mb_discard_group_preallocations() is called).
But that is wrong, since the 2nd thread should have waited for the
1st thread to release all the PAs and should then have retried the
allocation, because the 1st thread was going to discard those PAs
anyway.
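To make the interleaving concrete, here is a single-threaded C model that replays the two threads' steps in order rather than running real threads. The struct layouts and the helpers detach_all(), release_all() and naive_discard() are hypothetical simplifications; the real ext4 code keeps PAs on a struct list_head and does considerably more work.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal types, not the ext4 structures. */
struct pa {
	struct pa *next;
	int len;                    /* blocks held by this preallocation */
};

struct group {
	struct pa *prealloc_list;   /* stands in for bb_prealloc_list */
};

/* Thread 1, step 1: move every PA off the group list onto a local list.
 * The PAs are now invisible to other threads but not yet released. */
static struct pa *detach_all(struct group *grp)
{
	struct pa *local;

	assert(grp != NULL);
	local = grp->prealloc_list;
	grp->prealloc_list = NULL;
	return local;
}

/* Thread 1, step 2: walk the local list and count the blocks it would
 * hand back to the group. */
static int release_all(const struct pa *local)
{
	int freed = 0;

	for (; local != NULL; local = local->next)
		freed += local->len;
	return freed;
}

/* Thread 2 without any retry logic: an empty list reads as "nothing to
 * discard", even while thread 1 still holds the PAs on its local list. */
static int naive_discard(struct group *grp)
{
	if (grp->prealloc_list == NULL)
		return 0;
	return release_all(detach_all(grp));
}
```

Replaying "thread 1 detaches, thread 2 checks, thread 1 releases" shows thread 2 reporting 0 freed blocks although 12 blocks are about to come back, which is exactly the window that can turn into a spurious -ENOSPC.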

The algorithm using this percpu seq counter works as follows:
1. We sample the percpu discard_pa_seq counter before trying for block
   allocation in ext4_mb_new_blocks().
2. We increment this percpu discard_pa_seq counter when we either allocate
   or free these blocks, i.e. while marking those blocks as used/free in
   mb_mark_used()/mb_free_blocks().
3. We also increment this percpu seq counter when we successfully identify
   that the bb_prealloc_list is not empty and hence proceed to discard
   those PAs inside ext4_mb_discard_group_preallocations().
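A user-space C model of the three steps follows. The array standing in for the per-CPU counter, NR_CPUS_MODEL, and the seq_* and model_* helpers are hypothetical; in the kernel the increments happen inside mb_mark_used(), mb_free_blocks() and ext4_mb_discard_group_preallocations(), as listed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: an array indexed by CPU number stands in for the
 * per-CPU discard_pa_seq variable. */
#define NR_CPUS_MODEL 4

static uint64_t discard_pa_seq[NR_CPUS_MODEL];

/* Bump the counter that belongs to the CPU doing the work. */
static void seq_bump(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	discard_pa_seq[cpu]++;
}

/* Step 2: hypothetical hooks for marking blocks used or free. */
static void model_mark_used(int cpu)   { seq_bump(cpu); }
static void model_free_blocks(int cpu) { seq_bump(cpu); }

/* Step 3: bump only once the prealloc list was found non-empty. */
static bool model_discard_group_pas(int cpu, bool list_empty)
{
	if (list_empty)
		return false;   /* nothing to discard, counter untouched */
	seq_bump(cpu);
	return true;
}

/* Step 1: sample the local CPU's counter before trying to allocate. */
static uint64_t seq_sample_local(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	return discard_pa_seq[cpu];
}
```

Every event that could change what an allocator sees moves some counter forward, so an unchanged counter means none of those events happened in between.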

Now, to make sure that the regular fast path of block allocation is not
affected, as a small optimization we only sample the percpu seq counter
of the current cpu. Only when the block allocation fails and no freed
blocks were found do we sample the percpu seq counter of all cpus, using
ext4_get_discard_pa_seq_sum(). This happens after making sure that all
the PAs on grp->bb_prealloc_list have been freed, or that the list is
empty.
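The trade-off between the cheap local sample and the full sum can be modeled in C as below. In the kernel, per-CPU data is read with accessors such as this_cpu_ptr() and per_cpu(); the plain array, NR_CPUS_MODEL and model_seq_sum() here are hypothetical stand-ins, and model_seq_sum() only mirrors the idea behind ext4_get_discard_pa_seq_sum(), not its code.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_MODEL 4   /* hypothetical CPU count for the model */

static uint64_t discard_pa_seq[NR_CPUS_MODEL];

static void seq_bump(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	discard_pa_seq[cpu]++;
}

/* Fast path: a single read of this CPU's slot; nothing shared is touched. */
static uint64_t seq_sample_local(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	return discard_pa_seq[cpu];
}

/* Slow path: add up every CPU's counter. Only worth paying for after an
 * allocation has failed with nothing freed. */
static uint64_t model_seq_sum(void)
{
	uint64_t sum = 0;

	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		sum += discard_pa_seq[cpu];
	return sum;
}
```

A discard that happens on another CPU leaves the local sample unchanged but moves the sum, which is why the all-CPU read is needed before concluding that nothing has changed.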

One could well ask why we don't simply check grp->bb_free to see
whether there are any free blocks left to allocate. Two concerns
were discussed:

1. If for some reason the blocks available in the group are not
   appropriate for the allocation logic (e.g.
   EXT4_MB_HINT_GOAL_ONLY, although this is not yet implemented), then
   the retry logic may result in infinite looping, since grp->bb_free is
   non-zero.

2. Also, before preallocation was clubbed with block allocation under
   the same ext4_lock_group(), there were a lot of races in which
   grp->bb_free could not be relied upon.

Due to the above, this patch uses the discard_pa_seq logic to decide
whether we should retry the block allocation. Say there are n threads
trying for block allocation and none of them could allocate or discard
any of the blocks; then all n threads will fail the block allocation
and return an -ENOSPC error (since the seq counter for all of them will
match, as no block allocation/discard was done during that period).
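The retry decision can be sketched in C as below. How the first, local-only sample is reconciled with the later all-CPU sum is not spelled out in the text above; this sketch assumes the first failure always retries once and re-baselines on the sum, and struct alloc_ctx, should_retry() and retry_or_enospc() are hypothetical names, not the kernel's.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS_MODEL 4   /* hypothetical CPU count for the model */

static uint64_t discard_pa_seq[NR_CPUS_MODEL];

static uint64_t seq_sum(void)
{
	uint64_t sum = 0;

	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		sum += discard_pa_seq[cpu];
	return sum;
}

/* Hypothetical per-allocation state. */
struct alloc_ctx {
	uint64_t seq;   /* last sampled counter value */
	bool strict;    /* true once seq holds an all-CPU sum */
};

/* Called after an allocation attempt failed. freed is the number of
 * blocks released by discarding PAs during this attempt. */
static bool should_retry(struct alloc_ctx *ac, unsigned int freed)
{
	uint64_t now;

	if (freed)
		return true;    /* the discard made room: try again */

	now = seq_sum();
	if (!ac->strict || now != ac->seq) {
		/* Either the baseline was only a local sample, which cannot
		 * be compared with a sum, or some CPU allocated, freed or
		 * discarded since the last sum: re-baseline and retry. */
		ac->strict = true;
		ac->seq = now;
		return true;
	}
	return false;           /* nothing changed on any CPU */
}

/* 0 means "retry the allocation", -ENOSPC means "give up". */
static int retry_or_enospc(struct alloc_ctx *ac, unsigned int freed)
{
	return should_retry(ac, freed) ? 0 : -ENOSPC;
}
```

With n threads failing at once and no allocation or discard anywhere, each thread's second check sees the same unchanged sum and gets -ENOSPC, while any counter movement on any CPU buys another attempt.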

Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/7f254686903b87c419d798742fd9a1be34f0657b.1589955723.git.riteshh@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03 23:16:53 -04:00
acl.c ext4: handle ext4_mark_inode_dirty errors 2020-06-03 23:16:50 -04:00
acl.h ext4: fix up remaining files with SPDX cleanups 2017-12-17 22:00:59 -05:00
balloc.c ext4: balloc: use task_pid_nr() helper 2020-06-03 23:16:52 -04:00
bitmap.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
block_validity.c ext4: save all error info in save_error_info() and drop ext4_set_errno() 2020-04-01 17:29:06 -04:00
dir.c ext4: use flexible-array member in struct fname 2020-03-14 14:43:13 -04:00
ext4.h ext4: add casefold flag to EXT4_INODE_* flags 2020-06-03 23:16:53 -04:00
ext4_extents.h ext4: fix EXT_MAX_EXTENT/INDEX to check for zeroed eh_max 2020-06-03 23:16:49 -04:00
ext4_jbd2.c ext4: remove set but not used variable 'es' in ext4_jbd2.c 2020-04-15 23:58:49 -04:00
ext4_jbd2.h ext4: handle ext4_mark_inode_dirty errors 2020-06-03 23:16:50 -04:00
extents.c ext4: rework map struct instantiation in ext4_ext_map_blocks() 2020-06-03 23:16:53 -04:00
extents_status.c ext4: remove unnecessary comparisons to bool 2020-06-03 23:16:49 -04:00
extents_status.h ext4: fix extent_status trace points 2020-01-25 02:03:03 -05:00
file.c ext4: handle ext4_mark_inode_dirty errors 2020-06-03 23:16:50 -04:00
fsmap.c ext4: fix miscellaneous sparse warnings 2019-05-12 04:49:47 -04:00
fsmap.h ext4: fix up remaining files with SPDX cleanups 2017-12-17 22:00:59 -05:00
fsync.c ext4: fix race between ext4_sync_parent() and rename() 2020-06-03 23:16:51 -04:00
hash.c ext4: fix kernel oops caused by spurious casefold flag 2019-09-03 01:43:17 -04:00
ialloc.c ext4: fix buffer_head refcnt leak when ext4_iget() fails 2020-06-03 23:16:49 -04:00
indirect.c ext4: handle ext4_mark_inode_dirty errors 2020-06-03 23:16:50 -04:00
inline.c ext4: handle ext4_mark_inode_dirty errors 2020-06-03 23:16:50 -04:00
inode-test.c kunit: allow kunit tests to be loaded as a module 2020-01-09 16:42:29 -07:00
inode.c ext4: make ext_debug() implementation to use pr_debug() 2020-06-03 23:16:52 -04:00
ioctl.c 1) Replace ext4's bmap and iopoll implementations to use iomap. 2020-04-05 10:54:03 -07:00
Kconfig ext4: make ext_debug() implementation to use pr_debug() 2020-06-03 23:16:52 -04:00
Makefile kunit: allow kunit tests to be loaded as a module 2020-01-09 16:42:29 -07:00
mballoc.c ext4: mballoc: introduce pcpu seqcnt for freeing PA to improve ENOSPC handling 2020-06-03 23:16:53 -04:00
mballoc.h ext4: mballoc: make mb_debug() implementation to use pr_debug() 2020-06-03 23:16:52 -04:00
migrate.c ext4: handle ext4_mark_inode_dirty errors 2020-06-03 23:16:50 -04:00
mmp.c ext4: save all error info in save_error_info() and drop ext4_set_errno() 2020-04-01 17:29:06 -04:00
move_extent.c ext4: save all error info in save_error_info() and drop ext4_set_errno() 2020-04-01 17:29:06 -04:00
namei.c ext4: handle ext4_mark_inode_dirty errors 2020-06-03 23:16:50 -04:00
page-io.c fs/buffer: Make BH_Uptodate_Lock bit_spin_lock a regular spinlock_t 2020-03-28 13:21:08 +01:00
readpage.c ext4: remove unneeded check for error allocating bio_post_read_ctx 2020-01-17 16:24:54 -05:00
resize.c ext4: fix potential race between s_flex_groups online resizing and access 2020-02-21 19:31:46 -05:00
super.c ext4: handle ext4_mark_inode_dirty errors 2020-06-03 23:16:50 -04:00
symlink.c ext4: switch to fscrypt_get_symlink() 2018-01-11 22:10:40 -05:00
sysfs.c block: move the part_stat* helpers from genhd.h to a new header 2020-03-25 09:50:09 -06:00
truncate.h ext4: handle layout changes to pinned DAX mappings 2018-07-29 17:00:22 -04:00
verity.c fs-verity: implement readahead of Merkle tree pages 2020-01-14 13:27:32 -08:00
xattr.c ext4: handle ext4_mark_inode_dirty errors 2020-06-03 23:16:50 -04:00
xattr.h ext4: use flexible-array member for xattr structs 2020-03-14 14:43:13 -04:00
xattr_security.c ext4: use XATTR_CREATE in ext4_initxattrs() 2018-05-10 11:52:14 -04:00
xattr_trusted.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
xattr_user.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00