Commit graph

221 commits

Author SHA1 Message Date
Qu Wenruo
4618d0a66b btrfs: fix mismatching parameter names for btrfs_get_extent()
The definition for btrfs_get_extent() is using "u64 end" as the last
parameter, but in implementation we go "u64 len", and all call sites
follows the implementation.

This can be very confusing during development, as most developers
including me, would just use the snippet returned by LSP (clangd in my
case), which would only check the definition.

Unfortunately this mismatch is introduced from the very beginning of
btrfs.

Fix it to prevent further confusion.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15 22:59:30 +01:00
David Sterba
637e6e0f50 btrfs: allocate btrfs_inode::file_extent_tree only without NO_HOLES
The file_extent_tree was added in 41a2ee75aa ("btrfs: introduce
per-inode file extent tree") so we have an explicit mapping of the file
extents to know where it is safe to update i_size. When the feature
NO_HOLES is enabled, and it's been a mkfs default since 5.15, the tree
is not necessary.

To save some space in the inode, allocate the tree only when necessary.
This reduces size by 16 bytes from 1096 to 1080 on a x86_64 release
config.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15 22:59:01 +01:00
Josef Bacik
ed9b50a13e btrfs: cache that we don't have security.capability set
When profiling a workload I noticed we were constantly calling getxattr.
These were mostly coming from __remove_privs, which will lookup if
security.capability exists to remove it.  However instrumenting getxattr
showed we get called nearly constantly on an idle machine on a lot of
accesses.

These are wasteful and not free.  Other security LSMs have a way to
cache their results, but capability doesn't have this, so it's asking us
all the time for the xattr.

Fix this by setting a flag in our inode that it doesn't have a
security.capability xattr.  We set this on new inodes and after a failed
lookup of security.capability.  If we set this xattr at all we'll clear
the flag.

I haven't found a test in fsperf that this makes a visible difference
on, but I assume fs_mark related tests would show it clearly.  This is a
perf report output of the smallfiles100k run where it shows 20% of our
time spent in __remove_privs because we're looking up the non-existent
xattr.

--21.86%--btrfs_write_check.constprop.0
  --21.62%--__file_remove_privs
    --21.55%--security_inode_need_killpriv
      --21.54%--cap_inode_need_killpriv
        --21.53%--__vfs_getxattr
          --20.89%--btrfs_getxattr

Obviously this is just CPU time in a mostly IO bound test, so the actual
effect of removing this callchain is minimal.  However in just normal
testing of an idle system tracing showed around 100 getxattr calls per
minute, and with this patch there are 0.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15 20:27:05 +01:00
David Sterba
c6e8f898f5 btrfs: open code timespec64 in struct btrfs_inode
The type of timespec64::tv_nsec is 'unsigned long', while we have only
u32 for on-disk and in-memory. This wastes a few bytes in btrfs_inode.
Add separate members for sec and nsec with the corresponding type width.
This creates a 4 byte hole in btrfs_inode which can be utilized in the
future.

Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:19 +02:00
Filipe Manana
68539bd0e7 btrfs: update comment for struct btrfs_inode::lock
Update the comment for the lock named "lock" in struct btrfs_inode because
it does not mention that the fields "delalloc_bytes", "defrag_bytes",
"csum_bytes", "outstanding_extents" and "disk_i_size" are also protected
by that lock.

Also add a comment on top of each field protected by this lock to mention
that the lock protects them.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:18 +02:00
Filipe Manana
f985078796 btrfs: add and use helpers for reading and writing last_log_commit
Currently, the last_log_commit of a root can be accessed concurrently
without any lock protection. Readers can be calling btrfs_inode_in_log()
early in a fsync call, which reads a root's last_log_commit, while a
writer can change the last_log_commit while a log tree if being synced,
at btrfs_sync_log(). Any races here should be harmless, and in the worst
case they may cause a fsync to log an inode when it's not really needed,
so nothing bad from a functional perspective.

To avoid data race warnings from tools like KCSAN and other issues such
as load and store tearing (amongst others, see [1]), create helpers to
access the last_log_commit field of a root using READ_ONCE() and
WRITE_ONCE(), and use these helpers everywhere.

[1] https://lwn.net/Articles/793253/

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:17 +02:00
David Sterba
398fb9131f btrfs: reorder btrfs_inode to fill gaps
Previous commit created a hole in struct btrfs_inode, we can move
outstanding_extents there. This reduces size by 8 bytes from 1120 to
1112 on a release config.

Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:16 +02:00
David Sterba
54c6537146 btrfs: open code btrfs_ordered_inode_tree in btrfs_inode
The structure btrfs_ordered_inode_tree is used only in one place, in
btrfs_inode. The structure itself has a 4 byte hole which is wasted
space.

Move the btrfs_ordered_inode_tree members to btrfs_inode with a common
prefix 'ordered_tree_' where the hole can be utilized and shrink inode
size.

Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:16 +02:00
Filipe Manana
8b9d032225 btrfs: remove redundant root argument from btrfs_update_inode()
The root argument for btrfs_update_inode() always matches the root of the
given inode, so remove the root argument and get it from the inode
argument.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:12 +02:00
Filipe Manana
0a5d0dc55f btrfs: remove redundant root argument from btrfs_update_inode_fallback()
The root argument for btrfs_update_inode_fallback() always matches the
root of the given inode, so remove the root argument and get it from the
inode argument.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:12 +02:00
David Sterba
e41570d379 btrfs: reduce size and reorder compression members in struct btrfs_inode
Currently the compression type values are bounded and fit to an u8, we
can pack the btrfs_inode a bit by reordering them to the space created
by the location key. This reduces size from 1112 to 1104.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:04 +02:00
Josef Bacik
5335f4376c btrfs: include trace header in where necessary
If we no longer include the tracepoints from ctree.h we fail to compile
because we have the dependency in some of the header files and source
files.  Add the include where we have these dependencies to allow us to
remove the include from ctree.h.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:03 +02:00
Josef Bacik
04cc63d12c btrfs: add fscrypt related dependencies to respective headers
These headers have struct fscrypt_str as function arguments, so add
struct fscrypt_str to the theader, and include linux/fscrypt.h in
btrfs_inode.h as it also needs the definition of struct fscrypt_name for
the new inode args.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12 16:44:02 +02:00
Christoph Hellwig
c56cbe9059 btrfs: reduce the number of arguments to btrfs_run_delalloc_range
Instead of a separate page_started argument that tells the callers that
btrfs_run_delalloc_range already started writeback by itself, overload
the return value with a positive 1 in additio to 0 and a negative error
code to indicate that is has already started writeback, and remove the
nr_written argument as that caller can calculate it directly based on
the range, and in fact already does so for the case where writeback
wasn't started yet.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-08-21 14:52:14 +02:00
Christoph Hellwig
6648cedd86 btrfs: remove btrfs_writepage_endio_finish_ordered
btrfs_writepage_endio_finish_ordered is a small wrapper around
btrfs_mark_ordered_io_finished that just changs the argument passing
slightly, and adds a tracepoint.

Move the tracpoint to btrfs_mark_ordered_io_finished, which means
it now also covers the error handling in btrfs_cleanup_ordered_extent
and switch all callers to just call btrfs_mark_ordered_io_finished
directly.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-08-21 14:52:14 +02:00
Naohiro Aota
6442550027 btrfs: tracepoints: also show actual number of the outstanding extents
The btrfs_inode_mod_outstanding_extents trace event only shows the modified
number to the number of outstanding extents. It would be helpful if we can
see the resulting extent number as well.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19 19:47:58 +02:00
Christoph Hellwig
71df088c1c btrfs: defer splitting of ordered extents until I/O completion
The btrfs zoned completion code currently needs an ordered_extent and
extent_map per bio so that it can account for the non-predictable
write location from Zone Append.  To archive that it currently splits
the ordered_extent and extent_map at I/O submission time, and then
records the actual physical address in the ->physical field of the
ordered_extent.

This patch instead switches to record the "original" physical address
that the btrfs allocator assigned in spare space in the btrfs_bio,
and then rewrites the logical address in the btrfs_ordered_sum
structure at I/O completion time.  This allows the ordered extent
completion handler to simply walk the list of ordered csums and
split the ordered extent as needed.  This removes an extra ordered
extent and extent_map lookup and manipulation during the I/O
submission path, and instead batches it in the I/O completion path
where we need to touch these anyway.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19 13:59:33 +02:00
Josef Bacik
f541833c8e btrfs: move split_flags/combine_flags helpers to inode-item.h
These are more related to the inode item flags on disk than the
in-memory btrfs_inode, move the helpers to inode-item.h.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19 13:59:25 +02:00
Christoph Hellwig
e917ff56c8 btrfs: determine synchronous writers from bio or writeback control
The writeback_control structure already passes down the information about
a writeback being synchronous from the core VM code, and thus information
is propagated into the bio REQ_SYNC flag through the wbc_to_write_flags
helper.

Use that information to decide if checksums calculation is offloaded to
a workqueue instead of btrfs_inode::sync_writers field that not only
bloats the inode but also has too wide scope, being inode wide instead
of limited to the actual writeback request.

The sync writes were set in:

- btrfs_do_write_iter - regular IO, sync status is set
- start_ordered_ops - ordered write start, writeback with WB_SYNC_ALL
  mode
- btrfs_write_marked_extents - write marked extents, writeback with
  WB_SYNC_ALL mode

Reviewed-by: Chris Mason <clm@fb.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19 13:59:23 +02:00
Filipe Manana
fa4b8cb173 btrfs: avoid iterating over all indexes when logging directory
When logging a directory, after copying all directory index items from the
subvolume tree to the log tree, we iterate over the subvolume tree to find
all dir index items that are located in leaves COWed (or created) in the
current transaction. If we keep logging a directory several times during
the same transaction, we end up iterating over the same dir index items
everytime we log the directory, wasting time and adding extra lock
contention on the subvolume tree.

So just keep track of the last logged dir index offset in order to start
the search for that index (+1) the next time the directory is logged, as
dir index values (key offsets) come from a monotonically increasing
counter.

The following test measures the difference before and after this change:

  $ cat test.sh
  #!/bin/bash

  DEV=/dev/nullb0
  MNT=/mnt/nullb0

  umount $DEV &> /dev/null
  mkfs.btrfs -f $DEV
  mount -o ssd $DEV $MNT

  # Time values in milliseconds.
  declare -a fsync_times
  # Total number of files added to the test directory.
  num_files=1000000
  # Fsync directory after every N files are added.
  fsync_period=100

  mkdir $MNT/testdir

  fsync_total_time=0
  for ((i = 1; i <= $num_files; i++)); do
        echo -n > $MNT/testdir/file_$i

        if [ $((i % fsync_period)) -eq 0 ]; then
                start=$(date +%s%N)
                xfs_io -c "fsync" $MNT/testdir
                end=$(date +%s%N)
                fsync_total_time=$((fsync_total_time + (end - start)))
                fsync_times[i]=$(( (end - start) / 1000000 ))
                echo -n -e "Progress $i / $num_files\r"
        fi
  done

  echo -e "\nHistogram of directory fsync duration in ms:\n"

  printf '%s\n' "${fsync_times[@]}" | \
     perl -MStatistics::Histogram -e '@d = <>; print get_histogram(\@d);'

  fsync_total_time=$((fsync_total_time / 1000000))
  echo -e "\nTotal time spent in fsync: $fsync_total_time ms\n"
  echo

  umount $MNT

The test was run on a non-debug kernel (Debian's default kernel config)
against a 15G null block device.

Result before this change:

   Histogram of directory fsync duration in ms:

   Count: 10000
   Range:  3.000 - 362.000; Mean: 34.556; Median: 31.000; Stddev: 25.751
   Percentiles:  90th: 71.000; 95th: 77.000; 99th: 81.000
      3.000 -    5.278:  1423 #################################
      5.278 -    8.854:  1173 ###########################
      8.854 -   14.467:   591 ##############
     14.467 -   23.277:  1025 #######################
     23.277 -   37.105:  1422 #################################
     37.105 -   58.809:  2036 ###############################################
     58.809 -   92.876:  2316 #####################################################
     92.876 -  146.346:     6 |
    146.346 -  230.271:     6 |
    230.271 -  362.000:     2 |

   Total time spent in fsync: 350527 ms

Result after this change:

   Histogram of directory fsync duration in ms:

   Count: 10000
   Range:  3.000 - 1088.000; Mean:  8.704; Median:  8.000; Stddev: 12.576
   Percentiles:  90th: 12.000; 95th: 14.000; 99th: 17.000
      3.000 -    6.007:  3222 #################################
      6.007 -   11.276:  5197 #####################################################
     11.276 -   20.506:  1551 ################
     20.506 -   36.674:    24 |
     36.674 -  201.552:     1 |
    201.552 -  353.841:     4 |
    353.841 - 1088.000:     1 |

   Total time spent in fsync: 92114 ms

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17 19:52:19 +02:00
Christoph Hellwig
7edd339c8a btrfs: pass an ordered_extent to btrfs_extract_ordered_extent
To prepare for a new caller that already has the ordered_extent
available, change btrfs_extract_ordered_extent to take an argument
for it.  Add a wrapper for the bio case that still has to do the
lookup (for now).

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17 18:01:21 +02:00
Linus Torvalds
885ce48739 for-6.3-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmPzxWcACgkQxWXV+ddt
 WDt+fRAAg5pz7gWNMtIK30gp/uojjAkCWXymxRtK2tZU3naI+6IYSAKxuKq8Iz1Y
 drdlpSvTX/Gv3XlGB9QuoH6digTjQzeVzjAm0eP6w8t8354KGSRUYdtoFp8I8E5Z
 q0JUuZ6w/KvpZfOIsmcgpOScgcl+8+UlOxs2iuSrOvAqP8Dg1VCt5vBm7htIb0tm
 5ClbgmIacxWrOII55XGuY0mWuZSlS4hdyWdYMelvtM8aPPG+e8eEzKjscVOOueLz
 Smi1kN5QU3o+m4oKjN1OJlKfeURdbcZUwva9zOsegSbPHUzNwIao44cQ5cQhMR0r
 kI3nCpJwGKdUd6IblEdcqBN5F4V64edLSruOLuGYzxySnEWhFE2YU2xW/v5b1eQW
 GHurI52FGrPqcX9FgQNzfTjQzk341iQ0QIs5exycJH7xeohEZnlaK2yNUngKSo1C
 naqczEMMMcxNjQaooUuxRkL/zz36D/Dkyo2YOCODtWyu61XY9LqvaxMvClFI20lL
 40dzzYnnMQwkXJrQ/MVQhz1BBaPVqizt8+ErL7GQp2CWr9miD6mcA5b2pyZm5Q3r
 hHadzeTXXS7P9g9UnuDxpZqkhvadGC2Sy4l/D6jURyKFzr8mtplaRRwUS2gSuP3z
 zxavvP4UukwNWXxDz755NAhiGbA+xpSMATKCrZ/Sdogvxe8IhRg=
 =NCpw
 -----END PGP SIGNATURE-----

Merge tag 'for-6.3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs updates from David Sterba:
 "The usual mix of performance improvements and new features.

  The core change is reworking how checksums are processed, with
  followup cleanups and simplifications. There are two minor changes in
  block layer and iomap code.

  Features:

   - block group allocation class heuristics:
      - pack files by size (up to 128k, up to 8M, more) to avoid
        fragmentation in block groups, assuming that file size and life
        time is correlated, in particular this may help during balance
      - with tracepoints and extensible in the future

  Performance:

   - send: cache directory utimes and only emit the command when
     necessary
      - speedup up to 10x
      - smaller final stream produced (no redundant utimes commands
        issued)
      - compatibility not affected

   - fiemap: skip backref checks for shared leaves
      - speedup 3x on sample filesystem with all leaves shared (e.g. on
        snapshots)

   - micro optimized b-tree key lookup, speedup in metadata operations
     (sample benchmark: fs_mark +10% of files/sec)

  Core changes:

   - change where checksumming is done in the io path:
      - checksum and read repair does verification at lower layer
      - cascaded cleanups and simplifications

   - raid56 refactoring and cleanups

  Fixes:

   - sysfs: make sure that a run-time change of a feature is correctly
     tracked by the feature files

   - scrub: better reporting of tree block errors

  Other:

   - locally enable -Wmaybe-uninitialized after fixing all warnings

   - misc cleanups, spelling fixes

  Other code:

   - block: export bio_split_rw

   - iomap: remove IOMAP_F_ZONE_APPEND"

* tag 'for-6.3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (109 commits)
  btrfs: make kobj_type structures constant
  btrfs: remove the bdev argument to btrfs_rmap_block
  btrfs: don't rely on unchanging ->bi_bdev for zone append remaps
  btrfs: never return true for reads in btrfs_use_zone_append
  btrfs: pass a btrfs_bio to btrfs_use_append
  btrfs: set bbio->file_offset in alloc_new_bio
  btrfs: use file_offset to limit bios size in calc_bio_boundaries
  btrfs: do unsigned integer division in the extent buffer binary search loop
  btrfs: eliminate extra call when doing binary search on extent buffer
  btrfs: raid56: handle endio in scrub_rbio
  btrfs: raid56: handle endio in recover_rbio
  btrfs: raid56: handle endio in rmw_rbio
  btrfs: raid56: submit the read bios from scrub_assemble_read_bios
  btrfs: raid56: fold rmw_read_wait_recover into rmw_read_bios
  btrfs: raid56: fold recover_assemble_read_bios into recover_rbio
  btrfs: raid56: add a bio_list_put helper
  btrfs: raid56: wait for I/O completion in submit_read_bios
  btrfs: raid56: simplify code flow in rmw_rbio
  btrfs: raid56: simplify error handling and code flow in raid56_parity_write
  btrfs: replace btrfs_wait_tree_block_writeback by wait_on_extent_buffer_writeback
  ...
2023-02-20 12:54:27 -08:00
Christoph Hellwig
35a8d7da3c btrfs: remove now spurious bio submission helpers
Call btrfs_submit_bio and btrfs_submit_compressed_read directly from
submit_one_bio now that all additional functionality has moved into
btrfs_submit_bio.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15 19:38:53 +01:00
Christoph Hellwig
69ccf3f424 btrfs: handle recording of zoned writes in the storage layer
Move the code that splits the ordered extents and records the physical
location for them to the storage layer so that the higher level consumers
don't have to care about physical block numbers at all.  This will also
allow to eventually remove accounting for the zone append write sizes in
the upper layer with a little bit more block layer work.

Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15 19:38:52 +01:00
Christoph Hellwig
deb6216fa0 btrfs: open code the submit_bio_start helpers
The submit helpers are now trivial and can be called directly.  Note
that btree_csum_one_bio has to be moved up in the file a bit to avoid a
forward declaration.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15 19:38:52 +01:00
Christoph Hellwig
0571b6357c btrfs: remove the io_failure_record infrastructure
struct io_failure_record and the io_failure_tree tree are unused now,
so remove them. This in turn makes struct btrfs_inode smaller by 16
bytes.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15 19:38:51 +01:00
Christoph Hellwig
3d49d0d312 btrfs: remove now unused checksumming helpers
Remove the unused btrfs_verify_data_csum helper, and fold
btrfs_check_data_csum into its only caller.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15 19:38:51 +01:00
Christoph Hellwig
e52190441b btrfs: add a btrfs_data_csum_ok helper
Add a new checksumming helper that wraps btrfs_check_data_csum and
does all the checks to if we're dealing with some form of nodatacsum
I/O.  This helper will be used by the new storage layer checksum
validation and repair code.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15 19:38:51 +01:00
Christian Brauner
f2d40141d5
fs: port inode_init_owner() to mnt_idmap
Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19 09:24:28 +01:00
David Sterba
e55cf7ca85 btrfs: pass btrfs_inode to btrfs_add_delayed_iput
The function is for internal interfaces so we should use the
btrfs_inode.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:55 +01:00
David Sterba
bd54766e40 btrfs: pass btrfs_inode to btrfs_clear_delalloc_extent
The function is for internal interfaces so we should use the
btrfs_inode.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:54 +01:00
David Sterba
62798a4915 btrfs: pass btrfs_inode to btrfs_split_delalloc_extent
The function is for internal interfaces so we should use the
btrfs_inode.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:54 +01:00
David Sterba
4c5d166f6b btrfs: pass btrfs_inode to btrfs_set_delalloc_extent
The function is for internal interfaces so we should use the
btrfs_inode.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:54 +01:00
David Sterba
2454151cde btrfs: pass btrfs_inode to btrfs_merge_delalloc_extent
The function is for internal interfaces so we should use the
btrfs_inode.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:54 +01:00
David Sterba
3c4f91e23a btrfs: pass btrfs_inode to btrfs_delete_subvolume
The function is for internal interfaces so we should use the
btrfs_inode.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:53 +01:00
David Sterba
621af94af3 btrfs: pass btrfs_inode to btrfs_check_data_csum
The function is for internal interfaces so we should use the
btrfs_inode.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:53 +01:00
David Sterba
e5d4d75bd3 btrfs: pass btrfs_inode to btrfs_inode_unlock
The function is for internal interfaces so we should use the
btrfs_inode.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:53 +01:00
David Sterba
29b6352b14 btrfs: pass btrfs_inode to btrfs_inode_lock
The function is for internal interfaces so we should use the
btrfs_inode.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:53 +01:00
David Sterba
d781c1c315 btrfs: pass btrfs_inode to btrfs_submit_dio_repair_bio
The function is for internal interfaces so we should use the
btrfs_inode.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:52 +01:00
David Sterba
b762041629 btrfs: pass btrfs_inode to btrfs_submit_data_read_bio
The function is for internal interfaces so we should use the
btrfs_inode.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:52 +01:00
David Sterba
535a7e5d6b btrfs: pass btrfs_inode to btrfs_submit_data_write_bio
The function is for internal interfaces so we should use the
btrfs_inode.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:52 +01:00
David Sterba
bfa1706682 btrfs: pass btrfs_inode to btrfs_submit_bio_start_direct_io
The function is for internal interfaces so we should use the
btrfs_inode.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:52 +01:00
David Sterba
882681ac98 btrfs: pass btrfs_inode to btrfs_submit_bio_start
The function is for internal interfaces so we should use the
btrfs_inode.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:52 +01:00
David Sterba
ad65ecf30b btrfs: simplify btree_submit_bio_start and btrfs_submit_bio_start parameters
After previous patches the unused parameters can be removed from
btree_submit_bio_start and btrfs_submit_bio_start as they don't need to
conform to the extent_submit_bio_start_t typedef.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:51 +01:00
David Sterba
ab2072b292 btrfs: change how submit bio callback is passed to btrfs_wq_submit_bio
There's a callback function parameter for btrfs_wq_submit_bio that can
be one of: metadata, buffered data, direct io data. The callback
abstraction is unnecessary as we have all functions available.

Replace the parameter with a command that leads to a direct call in
run_one_async_start. The called functions can be then simplified and we
can also remove the extent_submit_bio_start_t typedef.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:51 +01:00
David Sterba
7920b773bd btrfs: drop parameter compression_type from btrfs_submit_dio_repair_bio
Compression and direct io don't work together so the compression
parameter can be dropped after previous patch that changed the call
to direct.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:51 +01:00
David Sterba
19af6a7d34 btrfs: change how repair action is passed to btrfs_repair_one_sector
There's a function pointer passed to btrfs_repair_one_sector that will
submit the right bio for repair. However there are only two callbacks,
for buffered and for direct IO. This can be simplified to a bool-based
switch and call either function, indirect calls in this case is an
unnecessary abstraction. This allows to remove the submit_bio_hook_t
typedef.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:51 +01:00
Josef Bacik
2885fd6320 btrfs: move inode prototypes to btrfs_inode.h
I initially wanted to make a new header file for this, but these
prototypes do naturally fit into btrfs_inode.h.  If we want to extract
vfs from pure btrfs code in the future we may need to split this up, but
btrfs_inode embeds the vfs_inode, so it makes sense to put the
prototypes in this header for now.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:45 +01:00
Josef Bacik
f60acad355 btrfs: move btrfs_print_data_csum_error into inode.c
This isn't used outside of inode.c, there's no reason to define it in
btrfs_inode.h. Drop the inline and add __cold as it's for errors that
are not in any hot path.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:37 +01:00
Josef Bacik
9b9b885465 btrfs: use a runtime flag to indicate an inode is a free space inode
We always check the root of an inode as well as it's inode number to
determine if it's a free space inode.  This is problematic as the helper
is in a header file where it doesn't have the fs_info definition.  To
avoid this and make the check a little cleaner simply add a flag to the
runtime_flags to indicate that the inode is a free space inode, set that
when we create the inode, and then change the helper to check for this
flag.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-09-26 12:28:07 +02:00