Commit graph

12475 commits

Author SHA1 Message Date
David Sterba
321f4992c1 btrfs: reduce size of struct btrfs_ref
We can reduce two members' size that in turn reduce size of struct
btrfs_ref from 64 to 56 bytes. As the structure is often used as a local
variable several functions reduce their stack usage.

- make enum btrfs_ref_type packed, there are only 4 values

- switch action and its values to a packed enum

Final structure layout:

struct btrfs_ref {
        enum btrfs_ref_type        type;                 /*     0     1 */
        enum btrfs_delayed_ref_action action;            /*     1     1 */
        bool                       skip_qgroup;          /*     2     1 */

        /* XXX 5 bytes hole, try to pack */

        u64                        bytenr;               /*     8     8 */
        u64                        len;                  /*    16     8 */
        u64                        parent;               /*    24     8 */
        union {
                struct btrfs_data_ref data_ref;          /*    32    24 */
                struct btrfs_tree_ref tree_ref;          /*    32    16 */
        };                                               /*    32    24 */

        /* size: 56, cachelines: 1, members: 7 */
        /* sum members: 51, holes: 1, sum holes: 5 */
        /* last cacheline: 56 bytes */
};

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:04 +02:00
David Sterba
e41570d379 btrfs: reduce size and reorder compression members in struct btrfs_inode
Currently the compression type values are bounded and fit to an u8, we
can pack the btrfs_inode a bit by reordering them to the space created
by the location key. This reduces size from 1112 to 1104.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:04 +02:00
David Sterba
105c8c4214 btrfs: reduce size of prelim_ref::level
The values of level are bounded and fit into a byte so let's use it for
the structure to reduce size from 88 to 80 bytes on a release build,
which increases number of objects in the default 8K slab from 93 to 102.

struct prelim_ref {
        struct rb_node             rbnode __attribute__((__aligned__(8))); /*     0    24 */
        u64                        root_id;              /*    24     8 */
        struct btrfs_key           key_for_search;       /*    32    17 */
        u8                         level;                /*    49     1 */

        /* XXX 2 bytes hole, try to pack */

        int                        count;                /*    52     4 */
        struct extent_inode_elem * inode_list;           /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        u64                        parent;               /*    64     8 */
        u64                        wanted_disk_byte;     /*    72     8 */

        /* size: 80, cachelines: 2, members: 8 */
        /* sum members: 78, holes: 1, sum holes: 2 */
        /* forced alignments: 1 */
        /* last cacheline: 16 bytes */
} __attribute__((__aligned__(8)));

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:04 +02:00
David Sterba
02cd00fa78 btrfs: reduce arguments of helpers space accounting root item
There are two helpers to increase used bytes of root items that add or
subtract one node size, we don't need to pass the argument for that.
Rename the function so it matches the root item member that gets
changed.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:04 +02:00
David Sterba
007dec8c7e btrfs: reduce parameters of btrfs_pin_extent_for_log_replay
Both callers of btrfs_pin_extent_for_log_replay expand the parameters to
extent buffer members. We can simply pass the extent buffer instead.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:04 +02:00
David Sterba
f863c50277 btrfs: reduce parameters of btrfs_pin_reserved_extent
There is only one caller of btrfs_pin_reserved_extent that expands the
parameters to extent buffer members. We can simply pass the extent
buffer instead.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:04 +02:00
David Sterba
203f6a8772 btrfs: drop __must_check annotations
Drop all __must_check annotations because they're used in random
functions and not consistently. All errors should be handled.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:04 +02:00
David Sterba
9580503bcb btrfs: reformat remaining kdoc style comments
Function name in the comment does not bring much value to code not
exposed as API and we don't stick to the kdoc format anymore. Update
formatting of parameter descriptions.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:04 +02:00
David Sterba
33b6b25191 btrfs: move functions comments from qgroup.h to qgroup.c
We keep the comments next to the implementation, there were some left
to move.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:04 +02:00
Anand Jain
cb6eb4757e btrfs: comment about fsid and metadata_uuid relationship
Add a comment explaining the relationship between fsid and metadata_uuid
in the on-disk superblock and the in-memory struct btrfs_fs_devices.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:04 +02:00
Jiapeng Chong
1246873114 btrfs: qgroup: remove unused helpers for ulist aux data
These functions are defined in the qgroup.c file, but not called
anymore since commit "btrfs: qgroup: use qgroup_iterator_nested to in
qgroup_update_refcnt()" so we can delete them.

fs/btrfs/qgroup.c:149:19: warning: unused function 'qgroup_to_aux'.
fs/btrfs/qgroup.c:154:36: warning: unused function 'unode_aux_to_qgroup'.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=6566
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:03 +02:00
Qu Wenruo
79ace7b807 btrfs: qgroup: prealloc btrfs_qgroup_list for __add_relation_rb()
Currently we go GFP_ATOMIC allocation for qgroup relation add, this
includes the following 3 call sites:

- btrfs_read_qgroup_config()
  This is not really needed, as at that time we're still in single
  thread mode, and no spin lock is held.

- btrfs_add_qgroup_relation()
  This one is holding a spinlock, but we're ensured to add at most one
  relation, thus we can easily do a preallocation and use the
  preallocated memory to avoid GFP_ATOMIC.

- btrfs_qgroup_inherit()
  This is a little more tricky, as we may have as many relationships as
  inherit::num_qgroups.
  Thus we have to properly allocate an array then preallocate all the
  memory.

This patch would remove the GFP_ATOMIC allocation for above involved
call sites, by doing preallocation before holding the spinlock, and let
__add_relation_rb() to handle the freeing of the structure.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:03 +02:00
Qu Wenruo
8d54518b5e btrfs: qgroup: pre-allocate btrfs_qgroup to reduce GFP_ATOMIC usage
Qgroup is the heaviest user of GFP_ATOMIC, but one call site does not
really need GFP_ATOMIC, that is add_qgroup_rb().

That function only searches the rbtree to find if we already have such
entry.  If not, then it would try to allocate memory for it.

This means we can afford to pre-allocate such structure unconditionally,
then free the memory if it's not needed.

Considering this function is not a hot path, only utilized by the
following functions:

- btrfs_qgroup_inherit()
  For "btrfs subvolume snapshot -i" option.

- btrfs_read_qgroup_config()
  At mount time, and we're ensured there would be no existing rb tree
  entry for each qgroup.

- btrfs_create_qgroup()

Thus we're completely safe to pre-allocate the extra memory for btrfs_qgroup
structure, and reduce unnecessary GFP_ATOMIC usage.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:03 +02:00
Qu Wenruo
dce28769a3 btrfs: qgroup: use qgroup_iterator_nested to in qgroup_update_refcnt()
The ulist @qgroups is utilized to record all involved qgroups from both
old and new roots inside btrfs_qgroup_account_extent().

Due to the fact that qgroup_update_refcnt() itself is already utilizing
qgroup_iterator, here we have to introduce another list_head,
btrfs_qgroup::nested_iterator, allowing nested iteration.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:03 +02:00
Qu Wenruo
a4a81383fb btrfs: qgroup: use qgroup_iterator to replace tmp ulist in qgroup_update_refcnt()
For function qgroup_update_refcnt(), we use @tmp list to iterate all the
involved qgroups of a subvolume.

It's a perfect match for qgroup_iterator facility, as that @tmp ulist
has a very limited lifespan (just inside the while() loop).

By migrating to qgroup_iterator, we can get rid of the GFP_ATOMIC memory
allocation and no error handling is needed.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:03 +02:00
Qu Wenruo
a0bdc04b07 btrfs: qgroup: use qgroup_iterator in __qgroup_excl_accounting()
With the new qgroup_iterator_add() and qgroup_iterator_clean(), we can
get rid of the ulist and its GFP_ATOMIC memory allocation.

Furthermore we can merge the code handling the initial and parent
qgroups into one loop, and drop the @tmp ulist parameter for involved
call sites.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:03 +02:00
Qu Wenruo
0913445082 btrfs: qgroup: use qgroup_iterator in qgroup_convert_meta()
With the new qgroup_iterator_add() and qgroup_iterator_clean(), we can
get rid of the ulist and its GFP_ATOMIC memory allocation.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:03 +02:00
Qu Wenruo
25152cb7a8 btrfs: qgroup: use qgroup_iterator in btrfs_qgroup_free_refroot()
With the new qgroup_iterator_add() and qgroup_iterator_clean(), we can
get rid of the ulist and its GFP_ATOMIC memory allocation.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:03 +02:00
Qu Wenruo
686c4a5a42 btrfs: qgroup: iterate qgroups without memory allocation for qgroup_reserve()
Qgroup heavily relies on ulist to go through all the involved
qgroups, but since we're using ulist inside fs_info->qgroup_lock
spinlock, this means we're doing a lot of GFP_ATOMIC allocations.

This patch reduces the GFP_ATOMIC usage for qgroup_reserve() by
eliminating the memory allocation completely.

This is done by moving the needed memory to btrfs_qgroup::iterator
list_head, so that we can put all the involved qgroup into a on-stack
list, thus eliminating the need to allocate memory while holding
spinlock.

The only cost is the slightly higher memory usage, but considering the
reduce GFP_ATOMIC during a hot path, it should still be acceptable.

Function qgroup_reserve() is the perfect start point for this
conversion.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:03 +02:00
Josef Bacik
2a3a1dd99e btrfs: remove extraneous includes from ctree.h
We don't need any of these includes in the ctree.h header file for the
header file itself, remove them to clean up ctree.h a little bit.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:03 +02:00
Josef Bacik
c60a28806c btrfs: include linux/security.h in super.c
We use some of the security related code in here, include it in super.c
so we can remove the include from ctree.h.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:03 +02:00
Josef Bacik
5335f4376c btrfs: include trace header in where necessary
If we no longer include the tracepoints from ctree.h we fail to compile
because we have the dependency in some of the header files and source
files.  Add the include where we have these dependencies to allow us to
remove the include from ctree.h.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:03 +02:00
Josef Bacik
82cc2ade2a btrfs: add btrfs_delayed_ref_head declaration to extent-tree.h
extent-tree.h uses btrfs_delayed_ref_head in a function argument but
doesn't pull it's declaration from anywhere, add it to the top of the
header.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:02 +02:00
Josef Bacik
04cc63d12c btrfs: add fscrypt related dependencies to respective headers
These headers have struct fscrypt_str as function arguments, so add
struct fscrypt_str to the theader, and include linux/fscrypt.h in
btrfs_inode.h as it also needs the definition of struct fscrypt_name for
the new inode args.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:02 +02:00
Josef Bacik
3ecb43cb64 btrfs: include linux/iomap.h in file.c
We use the iomap code in file.c, include it so we have our dependencies.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:02 +02:00
Josef Bacik
f005d997c4 btrfs: include asm/unaligned.h in accessors.h
We use the unaligned helpers directly in accessors.h, add the include
here.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:02 +02:00
Josef Bacik
1b9e6a15bc btrfs: move btrfs_name_hash to dir-item.h
This is related to the name hashing for dir items, move it into
dir-item.h.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:02 +02:00
Josef Bacik
98e4f060c4 btrfs: move btrfs_extref_hash into inode-item.h
Ideally this would be un-inlined, but that is a cleanup for later.  For
now move this into inode-item.h, which is where the extref code lives.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:02 +02:00
Josef Bacik
03e8634896 btrfs: remove btrfs_crc32c wrapper
This simply sends the same arguments into crc32c(), and is just used in
a few places.  Remove this wrapper and directly call crc32c() in these
instances.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:02 +02:00
Josef Bacik
102f2640a3 btrfs: move btrfs_crc32c_final into free-space-cache.c
This is the only place this helper is used, take it out of ctree.h and
move it into free-space-cache.c.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:02 +02:00
Qu Wenruo
1c94674b25 btrfs: do not require EXTENT_NOWAIT for btrfs_redirty_list_add()
The flag EXTENT_NOWAIT is a special flag to notify extent-io-tree code
that this operation should not sleep for the extent state preallocation.

However for btrfs_redirty_list_add(), all callers are able to sleep:

- clean_log_buffer()
  Just 2 lines before, we call btrfs_pin_reserved_extent(), which calls
  pin_down_extent(), and that function does not require EXTENT_NOWAIT.
  Thus we're safe to call it without EXTENT_NOWAIT.

- btrfs_free_tree_block()
  This function have several call sites which trigger tree read, e.g.
  walk_up_proc(), thus we're safe to call it without EXTENT_NOWAIT.

Thus there is no need to require EXTENT_NOWAIT flag.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:02 +02:00
Anand Jain
f7361d8c3f btrfs: sipmlify uuid parameters of alloc_fs_devices()
Among all the callers, only the device_list_add() function uses the
second argument of alloc_fs_devices(). It passes metadata_uuid when
available, otherwise, it passes NULL. And in turn, alloc_fs_devices()
is designed to copy either metadata_uuid or fsid into
fs_devices::metadata_uuid.

So remove the second argument in alloc_fs_devices(), and always copy the
fsid.  In the caller device_list_add() function, we will overwrite it
with metadata_uuid when it is available.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:02 +02:00
Filipe Manana
01fc062bd0 btrfs: update comment for reservation of metadata space for delayed items
The second comment at btrfs_delayed_item_reserve_metadata() refers to a
field named "index_items_size" of a delayed inode, however that field
does not exists - it existed in a previous patch version, but then it
split into the fields "curr_index_batch_size" and "index_item_leaves"
in the final patch version that was picked. So update the comment.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:02 +02:00
Linus Torvalds
759d1b653f for-6.6-rc5-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmUmbQMACgkQxWXV+ddt
 WDtBshAAqOwMrqRwOKOze/LQ4Kl9A8p0l+XxYdt7nRSY7n15xpN6uLVsc0gTwO5n
 HOquDe2ivrpdOXI6ArcujTTFHaBGX+mmubU/yi54MH0iwuCR32dYhj3j7mDUIf6F
 GpTEjgxIdE4AMUw7e7Rzqbdcmq//+H+bBdm+2YkNNEBmPP06483GYthjKJ7zWdrn
 pPksR9f611aHU4jZnKZJeHgZh4iVrIszIxkjeMD5NJ6KUb8LJmISLOOJzowkmugt
 JH8bd1F/+/53MmpntWGnHnURI9J6UxBL0cNnYW26FjY21N3RGR2BumotW73hYaD7
 6fwuxs4ZWlLqHUtIOaAVUUSfEVse7k/i7m4+sDB1JLh26alqUHunqCFV+3ROTnOY
 jHwWW+qyQhxJnfgtHyDrwcybfW0V41hhmDIhoeezkSDtbnacNTMfwzXS2ELcp0KJ
 /13TCruweFN0g4lBR8HfbKJCCzPayxCirtubx1nIMRysHfo10aDWz1MSvr3mkOyo
 gwif/j9BMKN0+fg6l9eZNHWHfQ8qfL3dvSRBlvJcP5mnG5ZuVkxJUFH0m/UfdFbZ
 sbeJHSP9wex5tJKmG3kJPAuZWwGLHCiMMCnsWoq+02KV8IXrw3Ji5z/8Hhsb51Ps
 r7BGRO2A2rD9XLJtc9BCiwiV177/WknmTUtRpOyxHFfb37bKmHg=
 =Wz/9
 -----END PGP SIGNATURE-----

Merge tag 'for-6.6-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "A revert of recent mount option parsing fix, this breaks mounts with
  security options.

  The second patch is a flexible array annotation"

* tag 'for-6.6-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: add __counted_by for struct btrfs_delayed_item and use struct_size()
  Revert "btrfs: reject unknown mount options early"
2023-10-11 13:58:32 -07:00
Gustavo A. R. Silva
75f5f60bf7 btrfs: add __counted_by for struct btrfs_delayed_item and use struct_size()
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for
array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).

While there, use struct_size() helper, instead of the open-coded
version, to calculate the size for the allocation of the whole
flexible structure, including of course, the flexible-array member.

This code was found with the help of Coccinelle, and audited and
fixed manually.

Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-11 11:37:19 +02:00
David Sterba
54f67decdd Revert "btrfs: reject unknown mount options early"
This reverts commit 5f521494cc.

The patch breaks mounts with security mount options like

  $ mount -o context=system_u:object_r:root_t:s0 /dev/sdX /mn
  mount: /mnt: wrong fs type, bad option, bad superblock on /dev/sdX, missing codepage or helper program, ...

We cannot reject all unknown options in btrfs_parse_subvol_options() as
intended, the security options can be present at this point and it's not
possible to enumerate them in a future proof way. This means unknown
mount options are silently accepted like before when the filesystem is
mounted with either -o subvol=/path or as followup mounts of the same
device.

Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-10 15:27:56 +02:00
Linus Torvalds
7de25c855b for-6.6-rc4-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmUe+t0ACgkQxWXV+ddt
 WDv6MA/7B31L45dH+qHM3XFUygJuTBk44OynDSRD/JrPS6ruycu3QpWCZ82+ozUz
 v8ULN3xJV4j2EWWa7w20CNfMITqEdOAvHHX6GAuXwTfLwy3ov+/L8tOt2OAQ44go
 kr6jiQULdBwfMxEp+6a5kMw0enVuEz3H+P8gWWUfQHuse+Cgk1TIdvLL8YuaoL0x
 mEphDtNLFh7UcsKxxVwgNXWowPxIO62xW/11hJKrF9ZpyFfER1TzfaO9kZStH2oe
 ylHYkWsVf6GdHtXlsVnvDSNdj+GW/KLRLWKouQNjbInSjmZzEBliBbVbXLCI1fvO
 /LpN1uu8T1XezBvxoEFw2JenkmFqMDg+ocl81owoG/IdJLOqPWCerUGb7VPtooT3
 dLx3buXXVBhx70qRdCgg5SwsjNTSElV5Ub9AnYGP5oux5of8oLOb9dSpQsxcE7iE
 yJEltu6+A1X+uVFHiDI8IIGghyZRq2UXc6zVdE3cHFfjwwB22aOtcRKZDw4O3Qzn
 DMuACRWZk8WL9gpQZEPa07JmSS3VPN6iY1gq3CYeZpoHOW6BMMDYb2p5/f+yNbWW
 a2JkDW+BnorEqqssMUyB2tf5k3fbOn1M15LSAH5oVXKA/F7dlxnSQksa7AI/pfFK
 InAmPLWQhzcIuNhpUs/+FwZ2csc0mbAWroX+fIRF3S99GR2e9ag=
 =/WDi
 -----END PGP SIGNATURE-----

Merge tag 'for-6.6-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - reject unknown mount options

 - adjust transaction abort error message level

 - fix one more build warning with -Wmaybe-uninitialized

 - proper error handling in several COW-related cases

* tag 'for-6.6-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: error out when reallocating block for defrag using a stale transaction
  btrfs: error when COWing block from a root that is being deleted
  btrfs: error out when COWing block using a stale transaction
  btrfs: always print transaction aborted messages with an error level
  btrfs: reject unknown mount options early
  btrfs: fix some -Wmaybe-uninitialized warnings in ioctl.c
2023-10-06 08:07:47 -07:00
Filipe Manana
e36f949140 btrfs: error out when reallocating block for defrag using a stale transaction
At btrfs_realloc_node() we have these checks to verify we are not using a
stale transaction (a past transaction with an unblocked state or higher),
and the only thing we do is to trigger two WARN_ON(). This however is a
critical problem, highly unexpected and if it happens it's most likely due
to a bug, so we should error out and turn the fs into error state so that
such issue is much more easily noticed if it's triggered.

The problem is critical because in btrfs_realloc_node() we COW tree blocks,
and using such stale transaction will lead to not persisting the extent
buffers used for the COW operations, as allocating tree block adds the
range of the respective extent buffers to the ->dirty_pages iotree of the
transaction, and a stale transaction, in the unlocked state or higher,
will not flush dirty extent buffers anymore, therefore resulting in not
persisting the tree block and resource leaks (not cleaning the dirty_pages
iotree for example).

So do the following changes:

1) Return -EUCLEAN if we find a stale transaction;

2) Turn the fs into error state, with error -EUCLEAN, so that no
   transaction can be committed, and generate a stack trace;

3) Combine both conditions into a single if statement, as both are related
   and have the same error message;

4) Mark the check as unlikely, since this is not expected to ever happen.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-04 01:04:33 +02:00
Filipe Manana
a2caab2988 btrfs: error when COWing block from a root that is being deleted
At btrfs_cow_block() we check if the block being COWed belongs to a root
that is being deleted and if so we log an error message. However this is
an unexpected case and it indicates a bug somewhere, so we should return
an error and abort the transaction. So change this in the following ways:

1) Abort the transaction with -EUCLEAN, so that if the issue ever happens
   it can easily be noticed;

2) Change the logged message level from error to critical, and change the
   message itself to print the block's logical address and the ID of the
   root;

3) Return -EUCLEAN to the caller;

4) As this is an unexpected scenario, that should never happen, mark the
   check as unlikely, allowing the compiler to potentially generate better
   code.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-04 01:04:28 +02:00
Filipe Manana
48774f3bf8 btrfs: error out when COWing block using a stale transaction
At btrfs_cow_block() we have these checks to verify we are not using a
stale transaction (a past transaction with an unblocked state or higher),
and the only thing we do is to trigger a WARN with a message and a stack
trace. This however is a critical problem, highly unexpected and if it
happens it's most likely due to a bug, so we should error out and turn the
fs into error state so that such issue is much more easily noticed if it's
triggered.

The problem is critical because using such stale transaction will lead to
not persisting the extent buffer used for the COW operation, as allocating
a tree block adds the range of the respective extent buffer to the
->dirty_pages iotree of the transaction, and a stale transaction, in the
unlocked state or higher, will not flush dirty extent buffers anymore,
therefore resulting in not persisting the tree block and resource leaks
(not cleaning the dirty_pages iotree for example).

So do the following changes:

1) Return -EUCLEAN if we find a stale transaction;

2) Turn the fs into error state, with error -EUCLEAN, so that no
   transaction can be committed, and generate a stack trace;

3) Combine both conditions into a single if statement, as both are related
   and have the same error message;

4) Mark the check as unlikely, since this is not expected to ever happen.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-04 01:04:24 +02:00
Filipe Manana
f8d1b011ca btrfs: always print transaction aborted messages with an error level
Commit b7af0635c8 ("btrfs: print transaction aborted messages with an
error level") changed the log level of transaction aborted messages from
a debug level to an error level, so that such messages are always visible
even on production systems where the log level is normally above the debug
level (and also on some syzbot reports).

Later, commit fccf0c842e ("btrfs: move btrfs_abort_transaction to
transaction.c") changed the log level back to debug level when the error
number for a transaction abort should not have a stack trace printed.
This happened for absolutely no reason. It's always useful to print
transaction abort messages with an error level, regardless of whether
the error number should cause a stack trace or not.

So change back the log level to error level.

Fixes: fccf0c842e ("btrfs: move btrfs_abort_transaction to transaction.c")
CC: stable@vger.kernel.org # 6.5+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-04 01:03:59 +02:00
Qu Wenruo
5f521494cc btrfs: reject unknown mount options early
[BUG]
The following script would allow invalid mount options to be specified
(although such invalid options would just be ignored):

  # mkfs.btrfs -f $dev
  # mount $dev $mnt1		<<< Successful mount expected
  # mount $dev $mnt2 -o junk	<<< Failed mount expected
  # echo $?
  0

[CAUSE]
For the 2nd mount, since the fs is already mounted, we won't go through
open_ctree() thus no btrfs_parse_options(), but only through
btrfs_parse_subvol_options().

However we do not treat unrecognized options from valid but irrelevant
options, thus those invalid options would just be ignored by
btrfs_parse_subvol_options().

[FIX]
Add the handling for Opt_err to handle invalid options and error out,
while still ignore other valid options inside btrfs_parse_subvol_options().

Reported-by: Anand Jain <anand.jain@oracle.com>
CC: stable@vger.kernel.org # 4.14+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-04 01:03:08 +02:00
Josef Bacik
9147b9ded4 btrfs: fix some -Wmaybe-uninitialized warnings in ioctl.c
Jens reported the following warnings from -Wmaybe-uninitialized recent
Linus' branch.

  In file included from ./include/asm-generic/rwonce.h:26,
		   from ./arch/arm64/include/asm/rwonce.h:71,
		   from ./include/linux/compiler.h:246,
		   from ./include/linux/export.h:5,
		   from ./include/linux/linkage.h:7,
		   from ./include/linux/kernel.h:17,
		   from fs/btrfs/ioctl.c:6:
  In function ‘instrument_copy_from_user_before’,
      inlined from ‘_copy_from_user’ at ./include/linux/uaccess.h:148:3,
      inlined from ‘copy_from_user’ at ./include/linux/uaccess.h:183:7,
      inlined from ‘btrfs_ioctl_space_info’ at fs/btrfs/ioctl.c:2999:6,
      inlined from ‘btrfs_ioctl’ at fs/btrfs/ioctl.c:4616:10:
  ./include/linux/kasan-checks.h:38:27: warning: ‘space_args’ may be used
  uninitialized [-Wmaybe-uninitialized]
     38 | #define kasan_check_write __kasan_check_write
  ./include/linux/instrumented.h:129:9: note: in expansion of macro
  ‘kasan_check_write’
    129 |         kasan_check_write(to, n);
	|         ^~~~~~~~~~~~~~~~~
  ./include/linux/kasan-checks.h: In function ‘btrfs_ioctl’:
  ./include/linux/kasan-checks.h:20:6: note: by argument 1 of type ‘const
  volatile void *’ to ‘__kasan_check_write’ declared here
     20 | bool __kasan_check_write(const volatile void *p, unsigned int
	size);
	|      ^~~~~~~~~~~~~~~~~~~
  fs/btrfs/ioctl.c:2981:39: note: ‘space_args’ declared here
   2981 |         struct btrfs_ioctl_space_args space_args;
	|                                       ^~~~~~~~~~
  In function ‘instrument_copy_from_user_before’,
      inlined from ‘_copy_from_user’ at ./include/linux/uaccess.h:148:3,
      inlined from ‘copy_from_user’ at ./include/linux/uaccess.h:183:7,
      inlined from ‘_btrfs_ioctl_send’ at fs/btrfs/ioctl.c:4343:9,
      inlined from ‘btrfs_ioctl’ at fs/btrfs/ioctl.c:4658:10:
  ./include/linux/kasan-checks.h:38:27: warning: ‘args32’ may be used
  uninitialized [-Wmaybe-uninitialized]
     38 | #define kasan_check_write __kasan_check_write
  ./include/linux/instrumented.h:129:9: note: in expansion of macro
  ‘kasan_check_write’
    129 |         kasan_check_write(to, n);
	|         ^~~~~~~~~~~~~~~~~
  ./include/linux/kasan-checks.h: In function ‘btrfs_ioctl’:
  ./include/linux/kasan-checks.h:20:6: note: by argument 1 of type ‘const
  volatile void *’ to ‘__kasan_check_write’ declared here
     20 | bool __kasan_check_write(const volatile void *p, unsigned int
	size);
	|      ^~~~~~~~~~~~~~~~~~~
  fs/btrfs/ioctl.c:4341:49: note: ‘args32’ declared here
   4341 |                 struct btrfs_ioctl_send_args_32 args32;
	|                                                 ^~~~~~

This was due to his config options and having KASAN turned on,
which adds some extra checks around copy_from_user(), which then
triggered the -Wmaybe-uninitialized checker for these cases.

Fix the warnings by initializing the different structs we're copying
into.

Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-04 01:03:05 +02:00
Linus Torvalds
cac405a3bf for-6.6-rc3-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmURvloACgkQxWXV+ddt
 WDt+CQ/+NgBtQn7eyABsdHzXWPxpFyGZrdw5ldKnly3G+WDW2GKMaZ6CpDuEZGNQ
 vMAkSGX5LIHXvO79pDnGG0i+bRINWrc5HZVZ/p5Da6wplBTgIPlbLmxaZX9MJLbx
 j7Oz37GXiQJY8BxnVCnsb+bhhTrTbO9HFUQr/nxefIvu22OBdL1WXYcfuBOeEsFG
 qr/aeC52YqCVgXvt+8a5DqAKE0NWc4PFMFUMo4vlf1xuL652fvff7xiup1CAIgBh
 qsCa17E7q+qjri2phAhbFNadfpH5wGfyjTWScOlaFuXjRhW2v2oqz3WU5IQj4dmu
 PI+k++PLUzIxT0IcjD1YbZzRFaEI6fR2W0GA4LK08fjVehh2ao5jOjtRgLl8HlqG
 qC5fslAPzUxRmwMmCjSGfXF14sgtyLy8eVWf69xn06/1cbEmfHDrWNXP1QHuq6eT
 Jqy8Ywia3jRzzfZ1utABJPLBW4hFQKkyobtyd67fxslUFmtuLvLqGTiOdmVFiD9K
 o+BF2xjEz2n8O1+aRZk5SFNC9zcaASaRg/wQrhvSI9qxM18fh4TXgKQOniLzAK7v
 lZc+JkegFW4CVquCUpmbsdZAOpVNRXfPOJIt/w6G+oRbaiTvPUnrH+uyq8IGREbw
 E7d8XIP0qlF0DQBGK4Mw/riZz/e5MmEKNjza6M+fj2uglpfWTv4=
 =6WEW
 -----END PGP SIGNATURE-----

Merge tag 'for-6.6-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - delayed refs fixes:
     - fix race when refilling delayed refs block reserve
     - prevent transaction block reserve underflow when starting
       transaction
     - error message and value adjustments

 - fix build warnings with CONFIG_CC_OPTIMIZE_FOR_SIZE and
   -Wmaybe-uninitialized

 - fix for smatch report where uninitialized data from invalid extent
   buffer range could be returned to the caller

 - fix numeric overflow in statfs when calculating lower threshold
   for a full filesystem

* tag 'for-6.6-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: initialize start_slot in btrfs_log_prealloc_extents
  btrfs: make sure to initialize start and len in find_free_dev_extent
  btrfs: reset destination buffer when read_extent_buffer() gets invalid range
  btrfs: properly report 0 avail for very full file systems
  btrfs: log message if extent item not found when running delayed extent op
  btrfs: remove redundant BUG_ON() from __btrfs_inc_extent_ref()
  btrfs: return -EUCLEAN for delayed tree ref with a ref count not equals to 1
  btrfs: prevent transaction block reserve underflow when starting transaction
  btrfs: fix race when refilling delayed refs block reserve
2023-09-26 09:44:08 -07:00
Linus Torvalds
b5cbe7c00a v6.6-rc3.vfs.ctime.revert
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZQsZLQAKCRCRxhvAZXjc
 op0vAP96hkSUnmXmxTr8GHId3yfElN8ZZ3aSfePeBdljjKEZVAEA2+cbHLy4GqRi
 TpjP1HNIdmtbVSC2ZnrgqkbwGageQgg=
 =s92y
 -----END PGP SIGNATURE-----

Merge tag 'v6.6-rc3.vfs.ctime.revert' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull finegrained timestamp reverts from Christian Brauner:
 "Earlier this week we sent a few minor fixes for the multi-grained
  timestamp work in [1]. While we were polishing those up after Linus
  realized that there might be a nicer way to fix them we received a
  regression report in [2] that fine grained timestamps break gnulib
  tests and thus possibly other tools.

  The kernel will elide fine-grain timestamp updates when no one is
  actively querying for them to avoid performance impacts. So a sequence
  like write(f1) stat(f2) write(f2) stat(f2) write(f1) stat(f1) may
  result in timestamp f1 to be older than the final f2 timestamp even
  though f1 was last written too but the second write didn't update the
  timestamp.

  Such plotholes can lead to subtle bugs when programs compare
  timestamps. For example, the nap() function in [2] will estimate that
  it needs to wait one ns on a fine-grain timestamp enabled filesytem
  between subsequent calls to observe a timestamp change. But in general
  we don't update timestamps with more than one jiffie if we think that
  no one is actively querying for fine-grain timestamps to avoid
  performance impacts.

  While discussing various fixes the decision was to go back to the
  drawing board and ultimately to explore a solution that involves only
  exposing such fine-grained timestamps to nfs internally and never to
  userspace.

  As there are multiple solutions discussed the honest thing to do here
  is not to fix this up or disable it but to cleanly revert. The general
  infrastructure will probably come back but there is no reason to keep
  this code in mainline.

  The general changes to timestamp handling are valid and a good cleanup
  that will stay. The revert is fully bisectable"

Link: https://lore.kernel.org/all/20230918-hirte-neuzugang-4c2324e7bae3@brauner [1]
Link: https://lore.kernel.org/all/bf0524debb976627693e12ad23690094e4514303.camel@linuxfromscratch.org [2]

* tag 'v6.6-rc3.vfs.ctime.revert' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  Revert "fs: add infrastructure for multigrain timestamps"
  Revert "btrfs: convert to multigrain timestamps"
  Revert "ext4: switch to multigrain timestamps"
  Revert "xfs: switch to multigrain timestamps"
  Revert "tmpfs: add support for multigrain timestamps"
2023-09-21 10:15:26 -07:00
Josef Bacik
b4c639f699 btrfs: initialize start_slot in btrfs_log_prealloc_extents
Jens reported a compiler warning when using
CONFIG_CC_OPTIMIZE_FOR_SIZE=y that looks like this

  fs/btrfs/tree-log.c: In function ‘btrfs_log_prealloc_extents’:
  fs/btrfs/tree-log.c:4828:23: warning: ‘start_slot’ may be used
  uninitialized [-Wmaybe-uninitialized]
   4828 |                 ret = copy_items(trans, inode, dst_path, path,
	|                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   4829 |                                  start_slot, ins_nr, 1, 0);
	|                                  ~~~~~~~~~~~~~~~~~~~~~~~~~
  fs/btrfs/tree-log.c:4725:13: note: ‘start_slot’ was declared here
   4725 |         int start_slot;
	|             ^~~~~~~~~~

The compiler is incorrect, as we only use this code when ins_len > 0,
and when ins_len > 0 we have start_slot properly initialized.  However
we generally find the -Wmaybe-uninitialized warnings valuable, so
initialize start_slot to get rid of the warning.

Reported-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-09-21 18:52:23 +02:00
Josef Bacik
20218dfbaa btrfs: make sure to initialize start and len in find_free_dev_extent
Jens reported a compiler error when using CONFIG_CC_OPTIMIZE_FOR_SIZE=y
that looks like this

  In function ‘gather_device_info’,
      inlined from ‘btrfs_create_chunk’ at fs/btrfs/volumes.c:5507:8:
  fs/btrfs/volumes.c:5245:48: warning: ‘dev_offset’ may be used uninitialized [-Wmaybe-uninitialized]
   5245 |                 devices_info[ndevs].dev_offset = dev_offset;
	|                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~
  fs/btrfs/volumes.c: In function ‘btrfs_create_chunk’:
  fs/btrfs/volumes.c:5196:13: note: ‘dev_offset’ was declared here
   5196 |         u64 dev_offset;

This occurs because find_free_dev_extent is responsible for setting
dev_offset, however if we get an -ENOMEM at the top of the function
we'll return without setting the value.

This isn't actually a problem because we will see the -ENOMEM in
gather_device_info() and return and not use the uninitialized value,
however we also just don't want the compiler warning so rework the code
slightly in find_free_dev_extent() to make sure it's always setting
*start and *len to avoid the compiler warning.

Reported-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-09-21 18:52:20 +02:00
Qu Wenruo
74ee79142c btrfs: reset destination buffer when read_extent_buffer() gets invalid range
Commit f98b6215d7 ("btrfs: extent_io: do extra check for extent buffer
read write functions") changed how we handle invalid extent buffer range
for read_extent_buffer().

Previously if the range is invalid we just set the destination to zero,
but after the patch we do nothing and error out.

This can lead to smatch static checker errors like:

  fs/btrfs/print-tree.c:186 print_uuid_item() error: uninitialized symbol 'subvol_id'.
  fs/btrfs/tests/extent-io-tests.c:338 check_eb_bitmap() error: uninitialized symbol 'has'.
  fs/btrfs/tests/extent-io-tests.c:353 check_eb_bitmap() error: uninitialized symbol 'has'.
  fs/btrfs/uuid-tree.c:203 btrfs_uuid_tree_remove() error: uninitialized symbol 'read_subid'.
  fs/btrfs/uuid-tree.c:353 btrfs_uuid_tree_iterate() error: uninitialized symbol 'subid_le'.
  fs/btrfs/uuid-tree.c:72 btrfs_uuid_tree_lookup() error: uninitialized symbol 'data'.
  fs/btrfs/volumes.c:7415 btrfs_dev_stats_value() error: uninitialized symbol 'val'.

Fix those warnings by reverting back to the old memset() behavior.
By this we keep the static checker happy and would still make a lot of
noise when such invalid ranges are passed in.

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Fixes: f98b6215d7 ("btrfs: extent_io: do extra check for extent buffer read write functions")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-09-20 20:44:57 +02:00
Josef Bacik
58bfe2ccec btrfs: properly report 0 avail for very full file systems
A user reported some issues with smaller file systems that get very
full.  While investigating this issue I noticed that df wasn't showing
100% full, despite having 0 chunk space and having < 1MiB of available
metadata space.

This turns out to be an overflow issue, we're doing:

  total_available_metadata_space - SZ_4M < global_block_rsv_size

to determine if there's not enough space to make metadata allocations,
which overflows if total_available_metadata_space is < 4M.  Fix this by
checking to see if our available space is greater than the 4M threshold.
This makes df properly report 100% usage on the file system.

CC: stable@vger.kernel.org # 4.14+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-09-20 20:44:40 +02:00
Filipe Manana
8ec0a4a577 btrfs: log message if extent item not found when running delayed extent op
When running a delayed extent operation, if we don't find the extent item
in the extent tree we just return -EIO without any logged message. This
indicates some bug or possibly a memory or fs corruption, so the return
value should not be -EIO but -EUCLEAN instead, and since it's not expected
to ever happen, print an informative error message so that if it happens
we have some idea of what went wrong, where to look at.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-09-20 20:42:58 +02:00