Although subpage itself is conflicting with higher folio, since subpage
(sectorsize < PAGE_SIZE and nodesize < PAGE_SIZE) means we will never
need higher order folio, there is a hidden pitfall:
- btrfs_page_*() helpers
Those helpers are an abstraction to handle both subpage and non-subpage
cases, which means we're going to pass pages pointers to those helpers.
And since those helpers are shared between data and metadata paths, it's
unavoidable to let them to handle folios, including higher order
folios).
Meanwhile for true subpage case, we should only have a single page
backed folios anyway, thus add a new ASSERT() for btrfs_subpage_assert()
to ensure that.
Also since those helpers are shared between both data and metadata, add
some extra ASSERT()s for data path to make sure we only get single page
backed folio for now.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The root argument for btrfs_update_inode() always matches the root of the
given inode, so remove the root argument and get it from the inode
argument.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When marking an extent buffer as dirty, at btrfs_mark_buffer_dirty(),
we check if its generation matches the running transaction and if not we
just print a warning. Such mismatch is an indicator that something really
went wrong and only printing a warning message (and stack trace) is not
enough to prevent a corruption. Allowing a transaction to commit with such
an extent buffer will trigger an error if we ever try to read it from disk
due to a generation mismatch with its parent generation.
So abort the current transaction with -EUCLEAN if we notice a generation
mismatch. For this we need to pass a transaction handle to
btrfs_mark_buffer_dirty() which is always available except in test code,
in which case we can pass NULL since it operates on dummy extent buffers
and all test roots have a single node/leaf (root node at level 0).
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This simply sends the same arguments into crc32c(), and is just used in
a few places. Remove this wrapper and directly call crc32c() in these
instances.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is the only place this helper is used, take it out of ctree.h and
move it into free-space-cache.c.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Now that we switched to write time activation, we no longer need to (and
must not) count the fresh region as zone unusable. This commit is similar
to revert of commit fa2068d7e9 ("btrfs: zoned: count fresh BG
region as zone unusable").
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When dumping free space, with btrfs_dump_free_space(), we pass a bytes
argument in order to count how many free space entries in the block group
have a size greater than or equal to that number of bytes. We then print
how many suitable entries we found, but we don't print the target number
of bytes, we just say "bytes". Change the message to actually print the
number of bytes, which makes debugging -ENOSPC issues a bit easier.
Also sligthly change the odd grammar and terminology: the sentence is
ending with 'is', which doesn't make sense, and the term 'blocks' is
confusing as we are referring to free space entries within the block
group's free space cache.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently find_first_extent_bit() returns a 0 if it found a range in the
given io tree and 1 if it didn't find any. There's no need to return any
errors, so make the return value a boolean and invert the logic to make
more sense: return true if it found a range and false if it didn't find
any range.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is completely related to block rsv's, move it out of the free space
cache code and into block-rsv.c.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Removing a free space entry from an in memory space cache requires having
the corresponding btrfs_free_space_ctl's 'tree_lock' held. We have several
code paths that remove an entry, so add assertions where appropriate to
verify we are holding the lock, as the lock is acquired by some other
function up in the call chain, which makes it easy to miss in the future.
Note: for this to work we need to lock the local btrfs_free_space_ctl at
load_free_space_cache(), which was not being done because it's local,
declared on the stack, so no other task has access to it.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When linking a free space entry, at link_free_space(), the caller should
be holding the spinlock 'tree_lock' of the given btrfs_free_space_ctl
argument, which is necessary for manipulating the red black tree of free
space entries (done by tree_insert_offset(), which already asserts the
lock is held) and for manipulating the 'free_space', 'free_extents',
'discardable_extents' and 'discardable_bytes' counters of the given
struct btrfs_free_space_ctl.
So assert that the spinlock 'tree_lock' of the given btrfs_free_space_ctl
is held by the current task. We have multiple code paths that end up
calling link_free_space(), and all currently take the lock before calling
it.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When searching for a free space entry by offset, at tree_search_offset(),
we are supposed to have the btrfs_free_space_ctl's 'tree_lock' held, so
assert that. We have multiple callers of tree_search_offset(), and all
currently hold the necessary lock before calling it.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There are multiple code paths leading to tree_insert_offset(), and each
path takes the necessary locks before tree_insert_offset() is called,
since they do other things that require those locks to be held. This makes
it easy to miss the locking somewhere, so make tree_insert_offset() assert
that the required locks are being held by the calling task.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
For the in-memory component of space caching (free space cache and free
space tree), three of the arguments passed to tree_insert_offset() can
always be taken from the new free space entry that we are about to add.
So simplify tree_insert_offset() to take the new entry instead of the
'offset', 'node' and 'bitmap' arguments. This will also allow to make
further changes simpler.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The are two computations of end offsets at do_trimming() that are not
necessary, as they were previously computed and stored in local const
variables. So just use the variables instead, to make the source code
shorter and easier to read.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
At try_merge_free_space(), avoid calling twice rb_prev() to find the
previous node, as that requires looping through the red black tree, so
store the result of the rb_prev() call and then use it.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
At copy_free_space_cache(), we add a new entry to the block group's ctl
before we free the entry from the temporary ctl. Adding a new entry
requires the allocation of a new struct btrfs_free_space, so we can
avoid a temporary extra allocation by freeing the entry from the
temporary ctl before we add a new entry to the main ctl, which possibly
also reduces the chances for a memory allocation failure in case of very
high memory pressure. So just do that.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When loading a free space cache from disk, at __load_free_space_cache(),
if we fail to insert a bitmap entry, we still increment the number of
total bitmaps in the btrfs_free_space_ctl structure, which is incorrect
since we failed to add the bitmap entry. On error we then empty the
cache by calling __btrfs_remove_free_space_cache(), which will result
in getting the total bitmaps counter set to 1.
A failure to load a free space cache is not critical, so if a failure
happens we just rebuild the cache by scanning the extent tree, which
happens at block-group.c:caching_thread(). Yet the failure will result
in having the total bitmaps of the btrfs_free_space_ctl always bigger
by 1 then the number of bitmap entries we have. So fix this by having
the total bitmaps counter be incremented only if we successfully added
the bitmap entry.
Fixes: a67509c300 ("Btrfs: add a io_ctl struct and helpers for dealing with the space cache")
Reviewed-by: Anand Jain <anand.jain@oracle.com>
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The naming of space_info->active_total_bytes is misleading. It counts
not only active block groups but also full ones which are previously
active but now inactive. That confusion results in a bug not counting
the full BGs into active_total_bytes on mount time.
For a background, there are three kinds of block groups in terms of
activation.
1. Block groups never activated
2. Block groups currently active
3. Block groups previously active and currently inactive (due to fully
written or zone finish)
What we really wanted to exclude from "total_bytes" is the total size of
BGs #1. They seem empty and allocatable but since they are not activated,
we cannot rely on them to do the space reservation.
And, since BGs #1 never get activated, they should have no "used",
"reserved" and "pinned" bytes.
OTOH, BGs #3 can be counted in the "total", since they are already full
we cannot allocate from them anyway. For them, "total_bytes == used +
reserved + pinned + zone_unusable" should hold.
Tracking #2 and #3 as "active_total_bytes" (current implementation) is
confusing. And, tracking #1 and subtract that properly from "total_bytes"
every time you need space reservation is cumbersome.
Instead, we can count the whole region of a newly allocated block group as
zone_unusable. Then, once that block group is activated, release
[0 .. zone_capacity] from the zone_unusable counters. With this, we can
eliminate the confusing ->active_total_bytes and the code will be common
among regular and the zoned mode. Also, no additional counter is needed
with this approach.
Fixes: 6a921de589 ("btrfs: zoned: introduce space_info->active_total_bytes")
CC: stable@vger.kernel.org # 6.1+
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The function is for internal interfaces so we should use the
btrfs_inode.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The div_factor* helpers calculate fraction or percentage fraction. The
name is a bit confusing, we use it only for percentage calculations and
there are two helpers.
There's a helper mult_frac that's for general fractions, that tries to
be accurate but we multiply and divide by small numbers so we can use
the div_u64 helper.
Rename the div_factor* helpers and use 1..100 percentage range, also drop
the case checking for percentage == 100, it's never hit.
The conversions:
* div_factor calculates tenths and the numbers need to be adjusted
* div_factor_fine is direct replacement
Signed-off-by: David Sterba <dsterba@suse.com>
This will make syncing fs.h to user space a little easier if we can pull
the super block specific helpers out of fs.h and put them in super.h.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Move these out of ctree.h into file.h to cut down on code in ctree.h.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Move these prototypes out of ctree.h and into file-item.h.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Update, reformat or reword function comments. This also removes the kdoc
marker so we don't get reports when the function name is missing.
Changes made:
- remove kdoc markers
- reformat the brief description to be a proper sentence
- reword to imperative voice
- align parameter list
- fix typos
Signed-off-by: David Sterba <dsterba@suse.com>
This is a large patch, but because they're all macros it's impossible to
split up. Simply copy all of the item accessors in ctree.h and paste
them in accessors.h, and then update any files to include the header so
everything compiles.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ reformat comments, style fixups ]
Signed-off-by: David Sterba <dsterba@suse.com>
We're going to use fs.h to hold fs wide related helpers and definitions,
move the FS_STATE enum and related helpers to fs.h, and then update all
files that need these definitions to include fs.h.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We have a bunch of printk helpers that are in ctree.h. These have
nothing to do with ctree.c, so move them into their own header.
Subsequent patches will cleanup the printk helpers.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is local to the free-space-cache.c code, remove it from ctree.h and
inode.c, create new init/exit functions for the cachep, and move it
locally to free-space-cache.c.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The function btrfs_drop_extent_cache() doesn't really belong at file.c
because what it does is drop a range of extent maps for a file range.
It directly allocates and manipulates extent maps, by dropping,
splitting and replacing them in an extent map tree, so it should be
located at extent_map.c, where all manipulations of an extent map tree
and its extent maps are supposed to be done.
So move it out of file.c and into extent_map.c. Additionally do the
following changes:
1) Rename it into btrfs_drop_extent_map_range(), as this makes it more
clear about what it does. The term "cache" is a bit confusing as it's
not widely used, "extent maps" or "extent mapping" is much more common;
2) Change its 'skip_pinned' argument from int to bool;
3) Turn several of its local variables from int to bool, since they are
used as booleans;
4) Move the declaration of some variables out of the function's main
scope and into the scopes where they are used;
5) Remove pointless assignment of false to 'modified' early in the while
loop, as later that variable is set and it's not used before that
second assignment;
6) Remove checks for NULL before calling free_extent_map().
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Instead of taking up a whole argument to indicate we're clearing
everything in a range, simply add another EXTENT bit to control this,
and then update all the callers to drop this argument from the
clear_extent_bit variants.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We have two variants of lock/unlock extent, one set that takes a cached
state, another that does not. This is slightly annoying, and generally
speaking there are only a few places where we don't have a cached state.
Simplify this by making lock_extent/unlock_extent the only variant and
make it take a cached state, then convert all the callers appropriately.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is only used in the case that we are clearing EXTENT_LOCKED, so
infer this value from the bits passed in instead of taking it as an
argument.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We are calling __btrfs_remove_free_space_cache everywhere to cleanup the
block group free space, however we can just use
btrfs_remove_free_space_cache and pass in the block group in all of
these places. Then we can remove __btrfs_remove_free_space_cache and
rename __btrfs_remove_free_space_cache_locked to
__btrfs_remove_free_space_cache.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Now that lockdep is staying enabled through our entire CI runs I started
seeing the following stack in generic/475
------------[ cut here ]------------
WARNING: CPU: 1 PID: 2171864 at fs/btrfs/discard.c:604 btrfs_discard_update_discardable+0x98/0xb0
CPU: 1 PID: 2171864 Comm: kworker/u4:0 Not tainted 5.19.0-rc8+ #789
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
Workqueue: btrfs-cache btrfs_work_helper
RIP: 0010:btrfs_discard_update_discardable+0x98/0xb0
RSP: 0018:ffffb857c2f7bad0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8c85c605c200 RCX: 0000000000000001
RDX: 0000000000000000 RSI: ffffffff86807c5b RDI: ffffffff868a831e
RBP: ffff8c85c4c54000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff8c85c66932f0 R11: 0000000000000001 R12: ffff8c85c3899010
R13: ffff8c85d5be4f40 R14: ffff8c85c4c54000 R15: ffff8c86114bfa80
FS: 0000000000000000(0000) GS:ffff8c863bd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f2e7f168160 CR3: 000000010289a004 CR4: 0000000000370ee0
Call Trace:
__btrfs_remove_free_space_cache+0x27/0x30
load_free_space_cache+0xad2/0xaf0
caching_thread+0x40b/0x650
? lock_release+0x137/0x2d0
btrfs_work_helper+0xf2/0x3e0
? lock_is_held_type+0xe2/0x140
process_one_work+0x271/0x590
? process_one_work+0x590/0x590
worker_thread+0x52/0x3b0
? process_one_work+0x590/0x590
kthread+0xf0/0x120
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
This is the code
ctl = block_group->free_space_ctl;
discard_ctl = &block_group->fs_info->discard_ctl;
lockdep_assert_held(&ctl->tree_lock);
We have a temporary free space ctl for loading the free space cache in
order to avoid having allocations happening while we're loading the
cache. When we hit an error we free it all up, however this also calls
btrfs_discard_update_discardable, which requires
block_group->free_space_ctl->tree_lock to be held. However this is our
temporary ctl so this lock isn't held. Fix this by calling
__btrfs_remove_free_space_cache_locked instead so that we only clean up
the entries and do not mess with the discardable stats.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We use a bit field in the btrfs_block_group for different flags, however
this is awkward because we have to hold the block_group->lock for any
modification of any of these fields, and makes the code clunky for a few
of these flags. Convert these to a properly flags setup so we can
utilize the bit helpers.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Reinitialize the class of the lockdep map for struct inode's
mapping->invalidate_lock in load_free_space_cache() function in
fs/btrfs/free-space-cache.c. This will prevent lockdep from producing
false positives related to execution paths that make use of free space
inodes and paths that make use of normal inodes.
Specifically, with this change lockdep will create separate lock
dependencies that include the invalidate_lock, in the case that free
space inodes are used and in the case that normal inodes are used.
The lockdep class for this lock was first initialized in
inode_init_always() in fs/inode.c.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Ioannis Angelakopoulos <iangelak@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There is an internal report on hitting the following ASSERT() in
recalculate_thresholds():
ASSERT(ctl->total_bitmaps <= max_bitmaps);
Above @max_bitmaps is calculated using the following variables:
- bytes_per_bg
8 * 4096 * 4096 (128M) for x86_64/x86.
- block_group->length
The length of the block group.
@max_bitmaps is the rounded up value of block_group->length / 128M.
Normally one free space cache should not have more bitmaps than above
value, but when it happens the ASSERT() can be triggered if
CONFIG_BTRFS_ASSERT is also enabled.
But the ASSERT() itself won't provide enough info to know which is going
wrong.
Is the bg too small thus it only allows one bitmap?
Or is there something else wrong?
So although I haven't found extra reports or crash dump to do further
investigation, add the extra info to make it more helpful to debug.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The chained assignments may be convenient to write, but make readability
a bit worse as it's too easy to overlook that there are several values
set on the same line while this is rather an exception. Making it
consistent everywhere avoids surprises.
The pattern where inode times are initialized reuses the first value and
the order is mtime, ctime. In other blocks the assignments are expanded
so the order of variables is similar to the neighboring code.
Signed-off-by: David Sterba <dsterba@suse.com>
- Appoint myself page cache maintainer
- Fix how scsicam uses the page cache
- Use the memalloc_nofs_save() API to replace AOP_FLAG_NOFS
- Remove the AOP flags entirely
- Remove pagecache_write_begin() and pagecache_write_end()
- Documentation updates
- Convert several address_space operations to use folios:
- is_dirty_writeback
- readpage becomes read_folio
- releasepage becomes release_folio
- freepage becomes free_folio
- Change filler_t to require a struct file pointer be the first argument
like ->read_folio
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAmKNMDUACgkQDpNsjXcp
gj4/mwf/bpHhXH4ZoNIvtUpTF6rZbqeffmc0VrbxCZDZ6igRnRPglxZ9H9v6L53O
7B0FBQIfxgNKHZpdqGdOkv8cjg/GMe/HJUbEy5wOakYPo4L9fZpHbDZ9HM2Eankj
xBqLIBgBJ7doKr+Y62DAN19TVD8jfRfVtli5mqXJoNKf65J7BkxljoTH1L3EXD9d
nhLAgyQjR67JQrT/39KMW+17GqLhGefLQ4YnAMONtB6TVwX/lZmigKpzVaCi4r26
bnk5vaR/3PdjtNxIoYvxdc71y2Eg05n2jEq9Wcy1AaDv/5vbyZUlZ2aBSaIVbtKX
WfrhN9O3L0bU5qS7p9PoyfLc9wpq8A==
=djLv
-----END PGP SIGNATURE-----
Merge tag 'folio-5.19' of git://git.infradead.org/users/willy/pagecache
Pull page cache updates from Matthew Wilcox:
- Appoint myself page cache maintainer
- Fix how scsicam uses the page cache
- Use the memalloc_nofs_save() API to replace AOP_FLAG_NOFS
- Remove the AOP flags entirely
- Remove pagecache_write_begin() and pagecache_write_end()
- Documentation updates
- Convert several address_space operations to use folios:
- is_dirty_writeback
- readpage becomes read_folio
- releasepage becomes release_folio
- freepage becomes free_folio
- Change filler_t to require a struct file pointer be the first
argument like ->read_folio
* tag 'folio-5.19' of git://git.infradead.org/users/willy/pagecache: (107 commits)
nilfs2: Fix some kernel-doc comments
Appoint myself page cache maintainer
fs: Remove aops->freepage
secretmem: Convert to free_folio
nfs: Convert to free_folio
orangefs: Convert to free_folio
fs: Add free_folio address space operation
fs: Convert drop_buffers() to use a folio
fs: Change try_to_free_buffers() to take a folio
jbd2: Convert release_buffer_page() to use a folio
jbd2: Convert jbd2_journal_try_to_free_buffers to take a folio
reiserfs: Convert release_buffer_page() to use a folio
fs: Remove last vestiges of releasepage
ubifs: Convert to release_folio
reiserfs: Convert to release_folio
orangefs: Convert to release_folio
ocfs2: Convert to release_folio
nilfs2: Remove comment about releasepage
nfs: Convert to release_folio
jfs: Convert to release_folio
...
We keep track of the start offset of the block group with the lowest start
offset at fs_info->first_logical_byte. This requires explicitly updating
that field every time we add, delete or lookup a block group to/from the
red black tree at fs_info->block_group_cache_tree.
Since the block group with the lowest start address happens to always be
the one that is the leftmost node of the tree, we can use a red black tree
that caches the left most node. Then when we need the start address of
that block group, we can just quickly get the leftmost node in the tree
and extract the start offset of that node's block group. This avoids the
need to explicitly keep track of that address in the dedicated member
fs_info->first_logical_byte, and it also allows the next patch in the
series to switch the lock that protects the red black tree from a spin
lock to a read/write lock - without this change it would be tricky
because block group searches also update fs_info->first_logical_byte.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
For non-zoned file systems it's useful to have the auto reclaim feature,
however there are different use cases for non-zoned, for example we may
not want to reclaim metadata chunks ever, only data chunks. Move this
sysfs flag to per-space_info. This won't affect current users because
this tunable only ever did anything for zoned, and that is currently
hidden behind BTRFS_CONFIG_DEBUG.
Tested-by: Pankaj Raghav <p.raghav@samsung.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
[ jth restore global bg_reclaim_threshold ]
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is a "weak" conversion which converts straight back to using pages.
A full conversion should be performed at some point, hopefully by
someone familiar with the filesystem.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
In the future we're going to want to use btrfs_truncate_inode_items
without looking up the associated inode. In order to accommodate this
add the inode to btrfs_truncate_control and handle the case where
control->inode is NULL appropriately. This is fairly straightforward,
we simply need to add a helper for the trace points, as the file extent
map update is controlled by a flag on btrfs_truncate_control.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In the future we are going to want to truncate inode items without
needing to have an btrfs_inode to pass in, so add ino to the
btrfs_truncate_control and use that to look up the inode items to
truncate.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We only care about updating the file extent range when we are doing a
normal truncation. We skip this for tree logging currently, but we can
also skip this for eviction as well. Using a flag makes it more
explicit when we want to do this work.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We currently have a bunch of awkward checks to make sure we only update
the inode i_bytes if we're truncating the real inode. Instead keep
track of the number of bytes we need to sub in the
btrfs_truncate_control, and then do the appropriate adjustment in the
truncate paths that care.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We currently will update the i_size of the inode as we truncate it down,
however we skip this if we're calling btrfs_truncate_inode_items from
the tree log code. However we also don't care about this in the case of
evict. Instead keep track of this value in the btrfs_truncate_control
and then have btrfs_truncate() and the free space cache truncate path
both do the i_size update themselves.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
I'm going to be adding more arguments and counters to
btrfs_truncate_inode_items, so add a control struct to handle all of the
extra arguments to make it easier to follow.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently we are locking the extent and dropping the extent cache for
any inodes we truncate, unless they're in the tree log. We call this
helper from:
- truncate
- evict
- tree log
- free space cache truncation
For evict we've already dropped all of the extent cache for this inode
once we've gotten here, and we're the only one accessing this inode, so
this step is unnecessary.
For the tree log code we already skip this part.
Pull this work into the truncate path and the free space cache
truncation path.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>