Commit graph

57 commits

Author SHA1 Message Date
Qu Wenruo
ce4a71ee15 btrfs: subpage: remove btrfs_fs_info::subpage_info member
The member btrfs_fs_info::subpage_info stores the cached bitmap start
position inside the merged bitmap.

However in reality there is only one thing depending on the sectorsize,
bitmap_nr_bits, which records the number of sectors that fit inside a
page.

The sequence of sub-bitmaps have fixed order, thus it's just a quick
multiplication to calculate the start position of each sub-bitmaps.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10 16:51:18 +02:00
Linus Torvalds
53a5182c8a for-6.11-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmaak1cACgkQxWXV+ddt
 WDvtDw/9GLN/pzAhH+rg0v51b4Ofxqa5QXuz7GIHb+UOqOdiZIjun+UmBst8794t
 oZ0rhO1sjXVNwci5TG2oWUHcoAkjoPrVgJ/X5nzBHOzHJCQlV1kNzgsDyY+HFL9F
 FBkJ6u1pCXUIxry6zp55NLWLMcjtFdMphjLXENh+GzUx5aAXl7VamK41B3+pHGfW
 gDCNJ41vaaHqTGN8+9AADpZ2Oo4CUCtr8eNxGXZ0O5/0s5CdRJrjphtBTcGwbGch
 SYVceySM4ElLmFi+hKFxx68MkWgnuTnBhZhbOM4V2fPFv0hLzXyQ7OrAk+MY8O2m
 AHLx2jHKfCdONFeTUdt1cY5/wM6Afhy+N/iKtv3t2+Wi70C/LEHQRY1Op3rfx1rn
 vOIbR9IXEHVi6ncO/E9c4LacSqLd/KSOaZn2Z/6i5wN+NY86CrCMuPwr6Pv0LL6x
 aSHka47SFFTQLvHUdwmzexJ2YuosFdI0BhpiEu8ylAZTJ17yDJatk8wM+FB6Rfh1
 vdPMRi93nVfrCwkU63Y2gqtJ3ncb3mbk/0uUdtMMflZJgjL0qkxTmcu3pSEdzIYR
 gHFLVlns4cljL5PB9yMH/JjYjYn+Y6bCVvVyuhQZ3FAanUVOSFin/YWfo1bIi1et
 ENNP+lhUKYvKLcz3QcnQpX3a6PkFPrmFi5wniAvxymrmVKJ3g3E=
 =ROO7
 -----END PGP SIGNATURE-----

Merge tag 'for-6.11-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fix from David Sterba:
 "A fix for build breakage on 32bit platforms"

* tag 'for-6.11-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: change BTRFS_MOUNT_* flags to 64bit type
2024-07-19 14:34:52 -07:00
Qu Wenruo
c3ece6b7ff btrfs: change BTRFS_MOUNT_* flags to 64bit type
Currently the BTRFS_MOUNT_* flags are already beyond 32 bits, this is
going to cause compilation errors for some 32 bit systems, as their
unsigned long is only 32 bits long, thus flag
BTRFS_MOUNT_IGNORESUPERFLAGS overflows and can lead to errors.

Fix the problem by:

- Migrate all existing BTRFS_MOUNT_* flags to unsigned long long
- Migrate all mount option related variables to unsigned long long
  * btrfs_fs_info::mount_opt
  * btrfs_fs_context::mount_opt
  * mount_opt parameter of btrfs_check_options()
  * old_opts parameter of btrfs_remount_begin()
  * old_opts parameter of btrfs_remount_cleanup()
  * mount_opt parameter of btrfs_check_mountopts_zoned()
  * mount_opt and opt parameters of check_ro_option()

Fixes: 32e6216512 ("btrfs: introduce new "rescue=ignoresuperflags" mount option")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-07-19 17:20:23 +02:00
Linus Torvalds
a1b547f0f2 for-6.11-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmaVN3MACgkQxWXV+ddt
 WDtpIRAAl+1NjsEj8e5V/UYn8Jr06ujTOnrkR3PCTICxDHbUaMLkQEw21H0K/ogQ
 3fOiEVpSlZOfKdYXtXaMQbC0jd/Af2eA10Uht96nAEjAtxu1uJ4cFZGu2meNdXZP
 xUioivJ/CElMPH2aluG6FaQvUTqmhrEr8tSoYbxzQmUd434q9kqqyjtw1tfzYDG1
 VDn2f7ykhpB/8P0aoqgWSshWTmaCzG0GkuI28o1o0iZUIF/P9TKdzxlLRW6BVHE7
 T2oGLEQjN1GQbCH75L4IeNJDkCBVfcDcbZkUDJ/ae4Pt/jJQTFY53YIP9wXFZQnd
 mdfHmK7Atpsk75ATftYSq+ENkbQ5fsuut5CD63u54gAqA4M1FncDXTAWS1Y30F76
 P8juSCmsSy0o3gTflDIo/IMdntoh/JmncwwStF6oKzmyUZZzzarsqM8mc1P03ZNt
 3ttlnbY7lC1TDAlD5J2wXE0INCT2pN+4C9IToWdRypeuLu6qrI7cQ0oylyp9OVQM
 t9umTXm0B6s1cyqEDjJf0xJZS/JTHYwu7S4EmAJwicgiLpOjABVTmO8021rVmDJy
 TAUu6yEhSsrTT6Dxm7/2Et1EEOKFF5hhsG1SiGD9oUIZK6B5+0waT+rbkEWl7osR
 4/TAv2zX6tuCc7HIW0fQloM/6/Gyd5wcDVaQNDUzFA075uKstwY=
 =k5d3
 -----END PGP SIGNATURE-----

Merge tag 'for-6.11-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs updates from David Sterba:
 "The highlights are new logic behind background block group reclaim,
  automatic removal of qgroup after removing a subvolume and new
  'rescue=' mount options.

  The rest is optimizations, cleanups and refactoring.

  User visible features:

   - dynamic block group reclaim:
      - tunable framework to avoid situations where eager data
        allocations prevent creating new metadata chunks due to lack of
        unallocated space
      - reuse sysfs knob bg_reclaim_threshold (otherwise used only in
        zoned mode) for a fixed value threshold
      - new on/off sysfs knob "dynamic_reclaim" calculating the value
        based on heuristics, aiming to keep spare working space for
        relocating chunks but not to needlessly relocate partially
        utilized block groups or reclaim newly allocated ones
      - stats are exported in sysfs per block group type, files
        "reclaim_*"
      - this may increase IO load at unexpected times but the corner
        case of no allocatable block groups is known to be worse

   - automatically remove qgroup of deleted subvolumes:
      - adjust qgroup removal conditions, make sure all related
        subvolume data are already removed, or return EBUSY, also take
        into account setting of sysfs drop_subtree_threshold
      - also works in squota mode

   - mount option updates: new modes of 'rescue=' that allow to mount
     images (read-only) that could have been partially converted by user
     space tools
      - ignoremetacsums  - invalid metadata checksums are ignored
      - ignoresuperflags - super block flags that track conversion in
                           progress (like UUID or checksums)

  Core:

   - size of struct btrfs_inode is now below 1024 (on a release config),
     improved memory packing and other secondary effects

   - switch tracking of open inodes from rb-tree to xarray, minor
     performance improvement

   - reduce number of empty transaction commits when there are no dirty
     data/metadata

   - memory allocation optimizations (reduced numbers, reordering out of
     critical sections)

   - extent map structure optimizations and refactoring, more sanity
     checks

   - more subpage in zoned mode preparations or fixes

   - general snapshot code cleanups, improvements and documentation

   - tree-checker updates: more file extent ram_bytes fixes, continued

   - raid-stripe-tree update (not backward compatible):
      - remove extent encoding field from the structure, can be inferred
        from other information
      - requires btrfs-progs 6.9.1 or newer

   - cleanups and refactoring
      - error message updates
      - error handling improvements
      - return type and parameter cleanups and improvements"

* tag 'for-6.11-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (152 commits)
  btrfs: fix extent map use-after-free when adding pages to compressed bio
  btrfs: fix bitmap leak when loading free space cache on duplicate entry
  btrfs: remove the BUG_ON() inside extent_range_clear_dirty_for_io()
  btrfs: move extent_range_clear_dirty_for_io() into inode.c
  btrfs: enhance compression error messages
  btrfs: fix data race when accessing the last_trans field of a root
  btrfs: rename the extra_gfp parameter of btrfs_alloc_page_array()
  btrfs: remove the extra_gfp parameter from btrfs_alloc_folio_array()
  btrfs: introduce new "rescue=ignoresuperflags" mount option
  btrfs: introduce new "rescue=ignoremetacsums" mount option
  btrfs: output the unrecognized super block flags as hex
  btrfs: remove unused Opt enums
  btrfs: tree-checker: add extra ram_bytes and disk_num_bytes check
  btrfs: fix the ram_bytes assignment for truncated ordered extents
  btrfs: make validate_extent_map() catch ram_bytes mismatch
  btrfs: ignore incorrect btrfs_file_extent_item::ram_bytes
  btrfs: cleanup the bytenr usage inside btrfs_extent_item_to_extent_map()
  btrfs: fix typo in error message in btrfs_validate_super()
  btrfs: move the direct IO code into its own file
  btrfs: pass a btrfs_inode to btrfs_set_prop()
  ...
2024-07-17 12:38:04 -07:00
Filipe Manana
4484940514 btrfs: avoid races when tracking progress for extent map shrinking
We store the progress (root and inode numbers) of the extent map shrinker
in fs_info without any synchronization but we can have multiple tasks
calling into the shrinker during memory allocations when there's enough
memory pressure for example.

This can result in a task A reading fs_info->extent_map_shrinker_last_ino
after another task B updates it, and task A reading
fs_info->extent_map_shrinker_last_root before task B updates it, making
task A see an odd state that isn't necessarily harmful but may make it
skip certain inode ranges or do more work than necessary by going over
the same inodes again. These unprotected accesses would also trigger
warnings from tools like KCSAN.

So add a lock to protect access to these progress fields.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-07-11 16:50:54 +02:00
Qu Wenruo
32e6216512 btrfs: introduce new "rescue=ignoresuperflags" mount option
This new mount option allows the kernel to skip the super flags check,
it's mostly to allow the kernel to do a rescue mount of an interrupted
checksum conversion.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-07-11 15:33:30 +02:00
Qu Wenruo
169aaaf2e0 btrfs: introduce new "rescue=ignoremetacsums" mount option
Introduce "rescue=ignoremetacsums" to ignore metadata csums, all the
other metadata sanity checks are still kept as is.

This new mount option is mostly to allow the kernel to mount an
interrupted checksum conversion (at the metadata csum overwrite stage).

And since the main part of metadata sanity checks is inside
tree-checker, we shouldn't lose much safety, and the new mount option is
rescue mount option it requires full read-only mount.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-07-11 15:33:29 +02:00
David Sterba
2917f74102 btrfs: constify pointer parameters where applicable
We can add const to many parameters, this is for clarity and minor
addition to safety. There are some minor effects, in the assembly
code and .ko measured on release config. This patch does not cover all
possible conversions.

Signed-off-by: David Sterba <dsterba@suse.com>
2024-07-11 15:33:22 +02:00
Qu Wenruo
3b8dbf3425 btrfs: cleanup recursive include of the same header
We have several headers that are including themselves, triggering clangd
warnings.
Such includes are caused by commit 602035d7fe ("btrfs: add forward
declarations and headers, part 2").

Just remove such unnecessary include.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-07-11 15:33:22 +02:00
Thorsten Blum
58a774ca16 btrfs: remove duplicate included header from fs.h
Remove duplicate included header file linux/blkdev.h .

Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-05-07 21:31:10 +02:00
Filipe Manana
956a17d9d0 btrfs: add a shrinker for extent maps
Extent maps are used either to represent existing file extent items, or to
represent new extents that are going to be written and the respective file
extent items are created when the ordered extent completes.

We currently don't have any limit for how many extent maps we can have,
neither per inode nor globally. Most of the time this not too noticeable
because extent maps are removed in the following situations:

1) When evicting an inode;

2) When releasing folios (pages) through the btrfs_release_folio() address
   space operation callback.

   However we won't release extent maps in the folio range if the folio is
   either dirty or under writeback or if the inode's i_size is less than
   or equals to 16M (see try_release_extent_mapping(). This 16M i_size
   constraint was added back in 2008 with commit 70dec8079d ("Btrfs:
   extent_io and extent_state optimizations"), but there's no explanation
   about why we have it or why the 16M value.

This means that for buffered IO we can reach an OOM situation due to too
many extent maps if either of the following happens:

1) There's a set of tasks constantly doing IO on many files with a size
   not larger than 16M, specially if they keep the files open for very
   long periods, therefore preventing inode eviction.

   This requires a really high number of such files, and having many non
   mergeable extent maps (due to random 4K writes for example) and a
   machine with very little memory;

2) There's a set tasks constantly doing random write IO (therefore
   creating many non mergeable extent maps) on files and keeping them
   open for long periods of time, so inode eviction doesn't happen and
   there's always a lot of dirty pages or pages under writeback,
   preventing btrfs_release_folio() from releasing the respective extent
   maps.

This second case was actually reported in the thread pointed by the Link
tag below, and it requires a very large file under heavy IO and a machine
with very little amount of RAM, which is probably hard to happen in
practice in a real world use case.

However when using direct IO this is not so hard to happen, because the
page cache is not used, and therefore btrfs_release_folio() is never
called. Which means extent maps are dropped only when evicting the inode,
and that means that if we have tasks that keep a file descriptor open and
keep doing IO on a very large file (or files), we can exhaust memory due
to an unbounded amount of extent maps. This is especially easy to happen
if we have a huge file with millions of small extents and their extent
maps are not mergeable (non contiguous offsets and disk locations).
This was reported in that thread with the following fio test:

   $ cat test.sh
   #!/bin/bash

   DEV=/dev/sdj
   MNT=/mnt/sdj
   MOUNT_OPTIONS="-o ssd"
   MKFS_OPTIONS=""

   cat <<EOF > /tmp/fio-job.ini
   [global]
   name=fio-rand-write
   filename=$MNT/fio-rand-write
   rw=randwrite
   bs=4K
   direct=1
   numjobs=16
   fallocate=none
   time_based
   runtime=90000

   [file1]
   size=300G
   ioengine=libaio
   iodepth=16

   EOF

   umount $MNT &> /dev/null
   mkfs.btrfs -f $MKFS_OPTIONS $DEV
   mount $MOUNT_OPTIONS $DEV $MNT

   fio /tmp/fio-job.ini
   umount $MNT

Monitoring the btrfs_extent_map slab while running the test with:

   $ watch -d -n 1 'cat /sys/kernel/slab/btrfs_extent_map/objects \
                        /sys/kernel/slab/btrfs_extent_map/total_objects'

Shows the number of active and total extent maps skyrocketing to tens of
millions, and on systems with a short amount of memory it's easy and quick
to get into an OOM situation, as reported in that thread.

So to avoid this issue add a shrinker that will remove extents maps, as
long as they are not pinned, and takes proper care with any concurrent
fsync to avoid missing extents (setting the full sync flag while in the
middle of a fast fsync). This shrinker is triggered through the callbacks
nr_cached_objects and free_cached_objects of struct super_operations.

The shrinker will iterate over all roots and over all inodes of each
root, and keeps track of the last scanned root and inode, so that the
next time it runs, it starts from that root and from the next inode.
This is similar to what xfs does for its inode reclaim (implements those
callbacks, and cycles through inodes by starting from where it ended
last time).

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-05-07 21:31:06 +02:00
Filipe Manana
f1d97e7691 btrfs: add a global per cpu counter to track number of used extent maps
Add a per cpu counter that tracks the total number of extent maps that are
in extent trees of inodes that belong to fs trees. This is going to be
used in an upcoming change that adds a shrinker for extent maps. Only
extent maps for fs trees are considered, because for special trees such as
the data relocation tree we don't want to evict their extent maps which
are critical for the relocation to work, and since those are limited, it's
not a concern to have them in memory during the relocation of a block
group. Another case are extent maps for free space cache inodes, which
must always remain in memory, but those are limited (there's only one per
free space cache inode, which means one per block group).

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-05-07 21:31:06 +02:00
David Sterba
5ab2b18088 btrfs: factor out validation of btrfs_ioctl_vol_args::name
The validation of vol args name in several ioctls is not done properly.
a terminating NUL is written to the end of the buffer unconditionally,
assuming that this would be the last place in case the buffer is used
completely. This does not communicate back the actual error (either an
invalid or too long path).

Factor out all such cases and use a helper to do the verification,
simply look for NUL in the buffer. There's no expected practical change,
the size of buffer is 4088, this is enough for most paths or names.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-04 16:24:52 +01:00
David Sterba
5693a1286a btrfs: add forward declarations and headers, part 3
Do a cleanup in the rest of the headers:

- add forward declarations for types referenced by pointers
- add includes when types need them

This fixes potential compilation problems if the headers are reordered
or the missing includes are not provided indirectly.

Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-04 16:24:49 +01:00
David Sterba
41044b41ad btrfs: add helper to get fs_info from struct inode pointer
Add a convenience helper to get a fs_info from a VFS inode pointer
instead of open coding the chain or using btrfs_sb() that in some cases
does one more pointer hop.  This is implemented as a macro (still with
type checking) so we don't need full definitions of struct btrfs_inode,
btrfs_root or btrfs_fs_info.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-04 16:24:49 +01:00
David Sterba
b33d2e535f btrfs: add helpers to get fs_info from page/folio pointers
Add convenience helpers to get a fs_info from a page or folio pointer
instead of open coding the chain or using btrfs_sb() that in some cases
does one more pointer hop.  This is implemented as a macro (still with
type checking) so we don't need full definitions of struct page, folio,
btrfs_root and btrfs_fs_info. The latter can't be static inlines as this
would create loop between ctree.h <-> fs.h, or the headers would have to
be restructured.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-04 16:24:49 +01:00
David Sterba
c8293894af btrfs: add helpers to get inode from page/folio pointers
Add convenience helpers to get a struct btrfs_inode from a page or folio
pointer instead of open coding the chain or intermediate BTRFS_I. This
is implemented as a macro (still with type checking) so we don't need
full definitions of struct page or address_space.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-04 16:24:49 +01:00
Filipe Manana
4d9450113e btrfs: document what the spinlock unused_bgs_lock protects
Add some comments to struct btrfs_fs_info to explicitly document which
members are protected by the spinlock unused_bgs_lock. It is currently
used to protect two linked lists, the reclaim_bgs and unused_bgs lists.

So add an explicit comment on top of each list to mention its protected
by unused_bgs_lock, as well as comment on top of unused_bgs_lock to
mention the lists it protects.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-04 16:24:46 +01:00
David Sterba
2b712e3bb2 btrfs: remove unused included headers
With help of neovim, LSP and clangd we can identify header files that
are not actually needed to be included in the .c files. This is focused
only on removal (with minor fixups), further cleanups are possible but
will require doing the header files properly with forward declarations,
minimized includes and include-what-you-use care.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-04 16:24:46 +01:00
Josef Bacik
6941823cc8 btrfs: remove old mount API code
Now that we've switched to the new mount API, remove the old stuff.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Acked-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15 20:27:04 +01:00
Josef Bacik
2496bff6e5 btrfs: add a NOSPACECACHE mount option flag
With the old mount API we'd pre-populate the mount options with the
space cache settings of the file system, and then the user toggled them
on or off with the mount options.  When we switch to the new mount API
the mount options will be set before we get into opening the file
system, so we need a flag to indicate that the user explicitly asked for
-o nospace_cache so we can make the appropriate changes after the fact.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Acked-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15 20:27:04 +01:00
Filipe Manana
7dc66abb5a btrfs: use a dedicated data structure for chunk maps
Currently we abuse the extent_map structure for two purposes:

1) To actually represent extents for inodes;
2) To represent chunk mappings.

This is odd and has several disadvantages:

1) To create a chunk map, we need to do two memory allocations: one for
   an extent_map structure and another one for a map_lookup structure, so
   more potential for an allocation failure and more complicated code to
   manage and link two structures;

2) For a chunk map we actually only use 3 fields (24 bytes) of the
   respective extent map structure: the 'start' field to have the logical
   start address of the chunk, the 'len' field to have the chunk's size,
   and the 'orig_block_len' field to contain the chunk's stripe size.

   Besides wasting a memory, it's also odd and not intuitive at all to
   have the stripe size in a field named 'orig_block_len'.

   We are also using 'block_len' of the extent_map structure to contain
   the chunk size, so we have 2 fields for the same value, 'len' and
   'block_len', which is pointless;

3) When an extent map is associated to a chunk mapping, we set the bit
   EXTENT_FLAG_FS_MAPPING on its flags and then make its member named
   'map_lookup' point to the associated map_lookup structure. This means
   that for an extent map associated to an inode extent, we are not using
   this 'map_lookup' pointer, so wasting 8 bytes (on a 64 bits platform);

4) Extent maps associated to a chunk mapping are never merged or split so
   it's pointless to use the existing extent map infrastructure.

So add a dedicated data structure named 'btrfs_chunk_map' to represent
chunk mappings, this is basically the existing map_lookup structure with
some extra fields:

1) 'start' to contain the chunk logical address;
2) 'chunk_len' to contain the chunk's length;
3) 'stripe_size' for the stripe size;
4) 'rb_node' for insertion into a rb tree;
5) 'refs' for reference counting.

This way we do a single memory allocation for chunk mappings and we don't
waste memory for them with unused/unnecessary fields from an extent_map.

We also save 8 bytes from the extent_map structure by removing the
'map_lookup' pointer, so the size of struct extent_map is reduced from
144 bytes down to 136 bytes, and we can now have 30 extents map per 4K
page instead of 28.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15 20:27:02 +01:00
Filipe Manana
0124855ff1 btrfs: add and use helpers for reading and writing last_trans_committed
Currently the last_trans_committed field of struct btrfs_fs_info is
modified and read without any locking or other protection. For example
early in the fsync path, skip_inode_logging() is called which reads
fs_info->last_trans_committed, but at the same time we can have a
transaction commit completing and updating that field.

In the case of an fsync this is harmless and any data race should be
rare and at most cause an unnecessary logging of an inode.

To avoid data race warnings from tools like KCSAN and other issues such
as load and store tearing (amongst others, see [1]), create helpers to
access the last_trans_committed field of struct btrfs_fs_info using
READ_ONCE() and WRITE_ONCE(), and use these helpers everywhere.

[1] https://lwn.net/Articles/793253/

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:17 +02:00
Filipe Manana
4a4f8fe2b0 btrfs: add and use helpers for reading and writing fs_info->generation
Currently the generation field of struct btrfs_fs_info is always modified
while holding fs_info->trans_lock locked. Most readers will access this
field without taking that lock but while holding a transaction handle,
which is safe to do due to the transaction life cycle.

However there are other readers that are neither holding the lock nor
holding a transaction handle open:

1) When reading an inode from disk, at btrfs_read_locked_inode();

2) When reading the generation to expose it to sysfs, at
   btrfs_generation_show();

3) Early in the fsync path, at skip_inode_logging();

4) When creating a hole at btrfs_cont_expand(), during write paths,
   truncate and reflinking;

5) In the fs_info ioctl (btrfs_ioctl_fs_info());

6) While mounting the filesystem, in the open_ctree() path. In these
   cases it's safe to directly read fs_info->generation as no one
   can concurrently start a transaction and update fs_info->generation.

In case of the fsync path, races here should be harmless, and in the worst
case they may cause a fsync to log an inode when it's not really needed,
so nothing bad from a functional perspective. In the other cases it's not
so clear if functional problems may arise, though in case 1 rare things
like a load/store tearing [1] may cause the BTRFS_INODE_NEEDS_FULL_SYNC
flag not being set on an inode and therefore result in incorrect logging
later on in case a fsync call is made.

To avoid data race warnings from tools like KCSAN and other issues such
as load and store tearing (amongst others, see [1]), create helpers to
access the generation field of struct btrfs_fs_info using READ_ONCE() and
WRITE_ONCE(), and use these helpers where needed.

[1] https://lwn.net/Articles/793253/

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:17 +02:00
Boris Burkov
bd7c1ea3a3 btrfs: qgroup: check generation when recording simple quota delta
Simple quotas count extents only from the moment the feature is enabled.
Therefore, if we do something like:

1. create subvol S
2. write F in S
3. enable quotas
4. remove F
5. write G in S

then after 3. and 4. we would expect the simple quota usage of S to be 0
(putting aside some metadata extents that might be written) and after
5., it should be the size of G plus metadata. Therefore, we need to be
able to determine whether a particular quota delta we are processing
predates simple quota enablement.

To do this, store the transaction id when quotas were enabled. In
fs_info for immediate use and in the quota status item to make it
recoverable on mount. When we see a delta, check if the generation of
the extent item is less than that of quota enablement. If so, we should
ignore the delta from this extent.

Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:11 +02:00
Boris Burkov
182940f4f4 btrfs: qgroup: add new quota mode for simple quotas
Add a new quota mode called "simple quotas". It can be enabled by the
existing quota enable ioctl via a new command, and sets an incompat
bit, as the implementation of simple quotas will make backwards
incompatible changes to the disk format of the extent tree.

Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:10 +02:00
Johannes Thumshirn
e9b9b911e0 btrfs: add raid stripe tree to features enabled with debug config
Until the raid stripe tree code is well enough tested and feature
complete, "hide" it behind CONFIG_BTRFS_DEBUG so only people who
want to use it are actually using it.

The scrub support may still fail some tests (btrfs/060 and up) and will
be fixed, RAID5/6 is not supported.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:10 +02:00
Johannes Thumshirn
515020900d btrfs: read raid stripe tree from disk
If we find the raid-stripe-tree on mount, read it from disk. This is
a backward incompatible feature. The rescue=ignorebadroots mount option
will skip this tree.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:09 +02:00
Qu Wenruo
6d3a61945b btrfs: warn on tree blocks which are not nodesize aligned
A long time ago, we had some metadata chunks which started at sector
boundary but not aligned to nodesize boundary.

This led to some older filesystems which can have tree blocks only
aligned to sectorsize, but not nodesize.

Later 'btrfs check' gained the ability to detect and warn about such tree
blocks, and kernel fixed the chunk allocation behavior, nowadays those
tree blocks should be pretty rare.

But in the future, if we want to migrate metadata to folio, we cannot
have such tree blocks, as filemap_add_folio() requires the page index to
be aligned with the folio number of pages.  Such unaligned tree blocks
can lead to VM_BUG_ON().

So this patch adds extra warning for those unaligned tree blocks, as a
preparation for the future folio migration.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:09 +02:00
Qu Wenruo
732fab95ab btrfs: check-integrity: remove CONFIG_BTRFS_FS_CHECK_INTEGRITY option
Since all check-integrity entry points have been removed, let's also
remove the config and all related code relying on that.

And since we have removed the mount option for check-integrity, we also
need to re-number all the BTRFS_MOUNT_* enums.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:05 +02:00
Naohiro Aota
13bb483d32 btrfs: zoned: activate metadata block group on write time
In the current implementation, block groups are activated at reservation
time to ensure that all reserved bytes can be written to an active metadata
block group. However, this approach has proven to be less efficient, as it
activates block groups more frequently than necessary, putting pressure on
the active zone resource and leading to potential issues such as early
ENOSPC or hung_task.

Another drawback of the current method is that it hampers metadata
over-commit, and necessitates additional flush operations and block group
allocations, resulting in decreased overall performance.

To address these issues, this commit introduces a write-time activation of
metadata and system block group. This involves reserving at least one
active block group specifically for a metadata and system block group.

Since metadata write-out is always allocated sequentially, when we need to
write to a non-active block group, we can wait for the ongoing IOs to
complete, activate a new block group, and then proceed with writing to the
new block group.

Fixes: b093151391 ("btrfs: zoned: activate metadata block group on flush_space")
CC: stable@vger.kernel.org # 6.1+
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-08-21 14:52:19 +02:00
Filipe Manana
ae3364e521 btrfs: store the error that turned the fs into error state
Currently when we turn the fs into an error state, typically after a
transaction abort, we don't store the error anywhere, we just set a bit
(BTRFS_FS_STATE_ERROR) at struct btrfs_fs_info::fs_state to signal the
error state.

There are cases where it would be useful to have access to the specific
error in order to provide a more meaningful error to users/applications.
This change adds a member to struct btrfs_fs_info to store the error and
removes the BTRFS_FS_STATE_ERROR bit. When there's no error, the new
member (fs_error) has a value of 0, otherwise its value is a negative
errno value.

Followup changes will make use of this new member.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-08-21 14:52:18 +02:00
Filipe Manana
08eb2ad9db btrfs: add comment to struct btrfs_fs_info::dirty_cowonly_roots
Add a comment to struct btrfs_fs_info::dirty_cowonly_roots to mention
that struct btrfs_fs_info::trans_lock is the lock that protects that
list.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19 20:14:44 +02:00
Qu Wenruo
81db6ae842 btrfs: scrub: remove btrfs_fs_info::scrub_wr_completion_workers
Since the scrub rework introduced by commit 2af2aaf982 ("btrfs:
scrub: introduce structure for new BTRFS_STRIPE_LEN based interface")
and later commits, scrub only needs one single workqueue,
fs_info::scrub_worker.

That scrub_wr_completion_workers is initially to handle the delay work
after write bios finished.  But the new scrub code goes submit-and-wait
for write bios, thus all the work are done inside the scrub_worker.

The last user of fs_info::scrub_wr_completion_workers is removed in
commit 16f9399349 ("btrfs: scrub: remove the old writeback
infrastructure"), so we can safely remove the workqueue.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19 13:59:40 +02:00
Christoph Hellwig
dd8b7b0416 btrfs: optimize out btrfs_is_zoned for !CONFIG_BLK_DEV_ZONED
Add an IS_ENABLED check for CONFIG_BLK_DEV_ZONED in addition to the
run-time check for the zone size.  This will allow to make use of
compiler dead code elimination for code guarded by btrfs_is_zoned, and
for example provide just a dangling prototype for a function instead of
adding a stub.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19 13:59:32 +02:00
Christoph Hellwig
8bfec2e426 btrfs: remove hipri_workers workqueue
Now that btrfs_wq_submit_bio is never called for synchronous I/O,
the hipri_workers workqueue is not used anymore and can be removed.

Reviewed-by: Chris Mason <clm@fb.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19 13:59:23 +02:00
Qu Wenruo
5dc96f8d5d btrfs: scrub: remove scrub_parity structure
The structure scrub_parity is used to indicate that some extents are
scrubbed for the purpose of RAID56 P/Q scrubbing.

Since the whole RAID56 P/Q scrubbing path has been replaced with new
scrub_stripe infrastructure, and we no longer need to use scrub_parity
to modify the behavior of data stripes, we can remove it completely.

This removal involves:

- scrub_parity_workers
  Now only one worker would be utilized, scrub_workers, to do the read
  and repair.
  All writeback would happen at the main scrub thread.

- scrub_block::sparity member
- scrub_parity structure
- function scrub_parity_get()
- function scrub_parity_put()
- function scrub_free_parity()

- function __scrub_mark_bitmap()
- function scrub_parity_mark_sectors_error()
- function scrub_parity_mark_sectors_data()
  These helpers are no longer needed, scrub_stripe has its bitmaps and
  we can use bitmap helpers to get the error/data status.

- scrub_parity_bio_endio()
- scrub_parity_check_and_repair()
- function scrub_sectors_for_parity()
- function scrub_extent_for_parity()
- function scrub_raid56_data_stripe_for_parity()
- function scrub_raid56_parity()
  The new code would reuse the scrub read-repair and writeback path.
  Just skip the dev-replace phase.
  And scrub_stripe infrastructure allows us to submit and wait for those
  data stripes before scrubbing P/Q, without extra infrastructure.

The following two functions are temporarily exported for later cleanup:

- scrub_find_csum()
- scrub_add_sector_to_rd_bio()

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17 18:01:24 +02:00
Filipe Manana
5630e2bcfe btrfs: use a constant for the number of metadata units needed for an unlink
Instead of hard coding the number of metadata units for an unlink operation
in a couple places, define a macro and use it instead. This eliminates the
problem of one place getting out of sync with the other, such as recently
fixed by the previous patch in the series ("btrfs: fix calculation of the
global block reserve's size").

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17 18:01:20 +02:00
Filipe Manana
d1085c9c52 btrfs: constify fs_info argument of the metadata size calculation helpers
The fs_info argument of the helpers btrfs_calc_insert_metadata_size() and
btrfs_calc_metadata_size() is not modified so it can be const. This will
also allow a new helper function in one of the next patches to have its
fs_info argument as const.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17 18:01:19 +02:00
Filipe Manana
a8fdc05172 btrfs: remove obsolete delayed ref throttling logic when truncating items
We have this logic encapsulated in btrfs_should_throttle_delayed_refs()
where we try to estimate if running the current amount of delayed
references we have will take more than half a second, and if so, the
caller btrfs_should_throttle_delayed_refs() should do something to
prevent more and more delayed refs from being accumulated.

This logic was added in commit 0a2b2a844a ("Btrfs: throttle delayed
refs better") and then further refined in commit a79b7d4b3e ("Btrfs:
async delayed refs"). The idea back then was that the caller of
btrfs_should_throttle_delayed_refs() would release its transaction
handle (by calling btrfs_end_transaction()) when that function returned
true, then btrfs_end_transaction() would trigger an async job to run
delayed references in a workqueue, and later start/join a transaction
again and do more work.

However we don't run delayed references asynchronously anymore, that
was removed in commit db2462a6ad ("btrfs: don't run delayed refs in
the end transaction logic"). That makes the logic that tries to estimate
how long we will take to run our current delayed references, at
btrfs_should_throttle_delayed_refs(), pointless as we don't take any
action to run delayed references anymore. We do have other type of
throttling, which consists of checking the size and reserved space of
the delayed and global block reserves, as well as if fluhsing delayed
references for the current transaction was already started, etc - this
is all done by btrfs_should_end_transaction(), and the only user of
btrfs_should_throttle_delayed_refs() does periodically call
btrfs_should_end_transaction().

So remove btrfs_should_throttle_delayed_refs() and the infrastructure
that keeps track of the average time used for running delayed references,
as well as adapting btrfs_truncate_inode_items() to call
btrfs_check_space_for_delayed_refs() instead.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17 18:01:19 +02:00
Anand Jain
0f202b256a btrfs: avoid repetitive define BTRFS_FEATURE_INCOMPAT_SUPP
BTRFS_FEATURE_INCOMPAT_SUPP is defined twice, once under
CONFIG_BTRFS_DEBUG and once without it, resulting in repetitive code. The
reason for this is to add experimental features under CONFIG_BTRFS_DEBUG.

To avoid repetitive code, add a common list BTRFS_FEATURE_INCOMPAT_SUPP_STABLE,
and append experimental features only under CONFIG_BTRFS_DEBUG.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17 18:01:16 +02:00
Josef Bacik
bf1f1fec27 btrfs: rename BTRFS_FS_NO_OVERCOMMIT to BTRFS_FS_ACTIVE_ZONE_TRACKING
This flag only gets set when we're doing active zone tracking, and we're
going to need to use this flag for things related to this behavior.
Rename the flag to represent what it actually means for the file system
so it can be used in other ways and still make sense.

Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-15 20:51:06 +01:00
Christoph Hellwig
243cf8d1b6 btrfs: calculate file system wide queue limit for zoned mode
To be able to split a write into properly sized zone append commands,
we need a queue_limits structure that contains the least common
denominator suitable for all devices.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15 19:38:53 +01:00
Qu Wenruo
b7625f461d btrfs: sysfs: update fs features directory asynchronously
[BUG]
Since the introduction of per-fs feature sysfs interface
(/sys/fs/btrfs/<UUID>/features/), the content of that directory is never
updated.

Thus for the following case, that directory will not show the new
features like RAID56:

  # mkfs.btrfs -f $dev1 $dev2 $dev3
  # mount $dev1 $mnt
  # btrfs balance start -f -mconvert=raid5 $mnt
  # ls /sys/fs/btrfs/$uuid/features/
  extended_iref  free_space_tree  no_holes  skinny_metadata

While after unmount and mount, we got the correct features:

  # umount $mnt
  # mount $dev1 $mnt
  # ls /sys/fs/btrfs/$uuid/features/
  extended_iref  free_space_tree  no_holes  raid56 skinny_metadata

[CAUSE]
Because we never really try to update the content of per-fs features/
directory.

We had an attempt to update the features directory dynamically in commit
14e46e0495 ("btrfs: synchronize incompat feature bits with sysfs
files"), but unfortunately it get reverted in commit e410e34fad
("Revert "btrfs: synchronize incompat feature bits with sysfs files"").
The problem in the original patch is, in the context of
btrfs_create_chunk(), we can not afford to update the sysfs group.

The exported but never utilized function, btrfs_sysfs_feature_update()
is the leftover of such attempt.  As even if we go sysfs_update_group(),
new files will need extra memory allocation, and we have no way to
specify the sysfs update to go GFP_NOFS.

[FIX]
This patch will address the old problem by doing asynchronous sysfs
update in the cleaner thread.

This involves the following changes:

- Make __btrfs_(set|clear)_fs_(incompat|compat_ro) helpers to set
  BTRFS_FS_FEATURE_CHANGED flag when needed

- Update btrfs_sysfs_feature_update() to use sysfs_update_group()
  And drop unnecessary arguments.

- Call btrfs_sysfs_feature_update() in cleaner_kthread
  If we have the BTRFS_FS_FEATURE_CHANGED flag set.

- Wake up cleaner_kthread in btrfs_commit_transaction if we have
  BTRFS_FS_FEATURE_CHANGED flag

By this, all the previously dangerous call sites like
btrfs_create_chunk() need no new changes, as above helpers would
have already set the BTRFS_FS_FEATURE_CHANGED flag.

The real work happens at cleaner_kthread, thus we pay the cost of
delaying the update to sysfs directory, but the delayed time should be
small enough that end user can not distinguish though it might get
delayed if the cleaner thread is busy with removing subvolumes or
defrag.

CC: stable@vger.kernel.org # 4.14+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-13 17:50:35 +01:00
Naohiro Aota
85e79ec7b7 btrfs: zoned: enable metadata over-commit for non-ZNS setup
The commit 79417d040f ("btrfs: zoned: disable metadata overcommit for
zoned") disabled the metadata over-commit to track active zones properly.

However, it also introduced a heavy overhead by allocating new metadata
block groups and/or flushing dirty buffers to release the space
reservations. Specifically, a workload (write only without any sync
operations) worsen its performance from 343.77 MB/sec (v5.19) to 182.89
MB/sec (v6.0).

The performance is still bad on current misc-next which is 187.95 MB/sec.
And, with this patch applied, it improves back to 326.70 MB/sec (+73.82%).

This patch introduces a new fs_info->flag BTRFS_FS_NO_OVERCOMMIT to
indicate it needs to disable the metadata over-commit. The flag is enabled
when a device with max active zones limit is loaded into a file-system.

Fixes: 79417d040f ("btrfs: zoned: disable metadata overcommit for zoned")
CC: stable@vger.kernel.org # 6.0+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-01-11 20:04:25 +01:00
Qu Wenruo
1a1a285139 btrfs: remove the unused endio_raid56_workers and btrfs_raid_bio::end_io_work
Since we have switched all raid56 workload to submit-and-wait method,
there is no use for btrfs_fs_info::endio_raid56_workers workqueue and
btrfs_raid_bio::end_io_work.

Remove them to save some memory.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:49 +01:00
Josef Bacik
7f0add250f btrfs: move super_block specific helpers into super.h
This will make syncing fs.h to user space a little easier if we can pull
the super block specific helpers out of fs.h and put them in super.h.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:47 +01:00
Josef Bacik
6a6b4daf92 btrfs: move CONFIG_BTRFS_FS_RUN_SANITY_TESTS checks to fs.h
We already have a few of these in fs.h, move the remaining checks out of
ctree.h into fs.h.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:47 +01:00
Josef Bacik
3683fbbc23 btrfs: add dependencies to fs.h and block-rsv.h
There's several structures that are embedded inside of fs_info.h, so if
we don't have all the proper includes when we include fs.h we'll get a
variety of compile errors.  I fixed this by adding a temporary c file
that just had #include "fs.h" and then added include files until the
compiler stopped complaining.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:45 +01:00
Josef Bacik
a56159d408 btrfs: move btrfs_fs_info declarations into fs.h
Now that we have a lot of the fs_info related helpers and stuff
isolated, copy these over to fs.h out of ctree.h.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ reformat comments ]
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:44 +01:00