Commit Graph

82 Commits

Author SHA1 Message Date
Kent Overstreet 65eaf4e24a bcachefs: s/bkey_invalid_flags/bch_validate_flags
We're about to start using bch_validate_flags for superblock section
validation - it's no longer bkey specific.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-09 16:23:36 -04:00
Kent Overstreet 78e9b548f3 bcachefs: bch2_dev_bucket_exists() uses bch2_dev_rcu()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:23 -04:00
Kent Overstreet abe2f470bc bcachefs: simplify bch2_trans_start_alloc_update()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:22 -04:00
Kent Overstreet be11ae16c4 bcachefs: __mark_pointer now takes bch_alloc_v4
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:21 -04:00
Kent Overstreet c02eb9e891 bcachefs: kill bch2_dev_usage_update_m()
by using bucket_m_to_alloc() more, we can get some nice code cleanup.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:21 -04:00
Kent Overstreet fa9bb741fe bcachefs: alloc_data_type_set()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:21 -04:00
Nathan Chancellor 2d288745eb bcachefs: Fix type of flags parameter for some ->trigger() implementations
When building with clang's -Wincompatible-function-pointer-types-strict
(a warning designed to catch potential kCFI failures at build time),
there are several warnings along the lines of:

  fs/bcachefs/bkey_methods.c:118:2: error: incompatible function pointer types initializing 'int (*)(struct btree_trans *, enum btree_id, unsigned int, struct bkey_s_c, struct bkey_s, enum btree_iter_update_trigger_flags)' with an expression of type 'int (struct btree_trans *, enum btree_id, unsigned int, struct bkey_s_c, struct bkey_s, unsigned int)' [-Werror,-Wincompatible-function-pointer-types-strict]
    118 |         BCH_BKEY_TYPES()
        |         ^~~~~~~~~~~~~~~~
  fs/bcachefs/bcachefs_format.h:394:2: note: expanded from macro 'BCH_BKEY_TYPES'
    394 |         x(inode,                8)                      \
        |         ^~~~~~~~~~~~~~~~~~~~~~~~~~
  fs/bcachefs/bkey_methods.c:117:41: note: expanded from macro 'x'
    117 | #define x(name, nr) [KEY_TYPE_##name]   = bch2_bkey_ops_##name,
        |                                           ^~~~~~~~~~~~~~~~~~~~
  <scratch space>:277:1: note: expanded from here
    277 | bch2_bkey_ops_inode
        | ^~~~~~~~~~~~~~~~~~~
  fs/bcachefs/inode.h:26:13: note: expanded from macro 'bch2_bkey_ops_inode'
     26 |         .trigger        = bch2_trigger_inode,           \
      |                           ^~~~~~~~~~~~~~~~~~

There are several functions that did not have their flags parameter
converted to 'enum btree_iter_update_trigger_flags' in the recent
unification, which will cause kCFI failures at runtime because the
types, while ABI compatible (hence no warning from the non-strict
version of this warning), do not match exactly.

Fix up these functions (as well as a few other obvious functions that
should have it, even if there are no warnings currently) to resolve the
warnings and potential kCFI runtime failures.

Fixes: 31e4ef3280c8 ("bcachefs: iter/update/trigger/str_hash flag cleanup")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:21 -04:00
Kent Overstreet c4e8db2b5d bcachefs: bucket_data_type_mismatch()
We're working on potentially unifying bch2_check_bucket_ref() and
bch2_check_fix_ptrs() - or at least eliminating gratuitious differences.

Most immediately, there's a bunch of cleanups to be done regarding
BCH_DATA_stripe.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:20 -04:00
Kent Overstreet 2f724563fc bcachefs: member helper cleanups
Some renaming for better consistency

bch2_member_exists	-> bch2_member_alive
bch2_dev_exists		-> bch2_member_exists
bch2_dev_exsits2	-> bch2_dev_exists
bch_dev_locked		-> bch2_dev_locked
bch_dev_bkey_exists	-> bch2_dev_bkey_exists

new helper - bch2_dev_safe

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:19 -04:00
Kent Overstreet d155272b6e bcachefs: bucket_valid()
cut out a branch from doing it the obvious way

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:19 -04:00
Kent Overstreet 6b8cbfc3db bcachefs: Fix assert in bch2_alloc_v4_invalid()
Reported-by: syzbot+10827fa6b176e1acf1d0@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-06 10:58:17 -04:00
Kent Overstreet a393f33123 bcachefs: Split out discard fastpath
Buckets usually can't be discarded until the transaction that made them
empty has been committed in the journal.

Tracing has indicated that we're queuing the discard worker excessively,
only for it to skip over many buckets that are still waiting on a
journal commit, discarding only one or two buckets per iteration.

We want to switch to only queuing the discard worker after a journal
flush write, but there's an important optimization we need to preserve:
if a bucket becomes empty and it was never committed in the journal
while it was in use, we want to discard it and reuse it right away -
since overwriting it before the previous writes are flushed from the
device cache eans those writes only cost bus bandwidth.

So, this patch implements a fast path for buckets that can be discarded
right away. We need new locking between the two discard workers; the new
list of buckets being discarded provides that locking.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:25 -04:00
Kent Overstreet f0431c5f47 bcachefs: Combine .trans_trigger, .atomic_trigger
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05 23:24:20 -05:00
Kent Overstreet 153d1c63c2 bcachefs: unify alloc trigger
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05 23:24:20 -05:00
Kent Overstreet 6820ac2cdc bcachefs: move bch2_mark_alloc() to alloc_background.c
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05 23:24:20 -05:00
Kent Overstreet 717296c34c bcachefs: trans_mark now takes bkey_s
Prep work for disk space accounting rewrite: we're going to want to use
a single callback for both of our current triggers, so we need to change
them to have the same type signature first.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05 23:24:19 -05:00
Kent Overstreet dafff7e575 bcachefs: New bucket sector count helpers
This introduces bch2_bucket_sectors() and bch2_bucket_sectors_dirty(),
prep work for separately accounting stripe sectors.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:38 -05:00
Kent Overstreet 1f7056b735 bcachefs: Ensure copygc does not spin
If copygc does no work - finds no fragmented buckets - wait for a bit of
IO to happen.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-04 14:17:11 -04:00
Kent Overstreet b65db750e2 bcachefs: Enumerate fsck errors
This patch adds a superblock error counter for every distinct fsck
error; this means that when analyzing filesystems out in the wild we'll
be able to see what sorts of inconsistencies are being found and repair,
and hence what bugs to look for.

Errors validating bkeys are not yet considered distinct fsck errors, but
this patch adds a new helper, bkey_fsck_err(), in order to add distinct
error types for them as well.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-01 21:11:08 -04:00
Kent Overstreet 69d1f052d1 bcachefs: Correctly initialize new buckets on device resize
bch2_dev_resize() was never updated for the allocator rewrite with
persistent freelists, and it wasn't noticed because the tests weren't
running fsck - oops.

Fix this by running bch2_dev_freespace_init() for the new buckets.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:16 -04:00
Kent Overstreet 10a6ced2da bcachefs: Kill bch2_bucket_gens_read()
This folds bch2_bucket_gens_read() into bch2_alloc_read(), doing the
version check there.

This is prep work for enumarating all recovery passes: we need some
cleanup first to make calling all the recovery passes consistent.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:06 -04:00
Kent Overstreet 8726dc936f bcachefs: Change check for invalid key types
As part of the forward compatibility patch series, we need to allow for
new key types without complaining loudly when running an old version.

This patch changes the flags parameter of bkey_invalid to an enum, and
adds a new flag to indicate we're being called from the transaction
commit path.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:06 -04:00
Kent Overstreet e53a961c6b bcachefs: Rename enum alloc_reserve -> bch_watermark
This is prep work for consolidating with JOURNAL_WATERMARK.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:04 -04:00
Kent Overstreet 174f930b8e bcachefs: bkey_ops.min_val_size
This adds a new field to bkey_ops for the minimum size of the value,
which standardizes that check and also enforces the new rule (previously
done somewhat ad-hoc) that we can extend value types by adding new
fields on to the end.

To make that work we do _not_ initialize min_val_size with sizeof,
instead we initialize it to the size of the first version of those
values.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:00 -04:00
Kent Overstreet e84face6f0 bcachefs: RESERVE_stripe
Rework stripe creation path - new algorithm for deciding when to create
new stripes or reuse existing stripes.

We add a new allocation watermark, RESERVE_stripe, above RESERVE_none.
Then we always try to create a new stripe by doing RESERVE_stripe
allocations; if this fails, we reuse an existing stripe and allocate
buckets for it with the reserve watermark for the given write
(RESERVE_none or RESERVE_movinggc).

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:55 -04:00
Kent Overstreet 910659763e bcachefs: Mark stripe buckets with correct data type
Currently, we don't use bucket data type for tracking whether buckets
are part of a stripe; parity buckets are BCH_DATA_parity, but data
buckets in a stripe are BCH_DATA_user. There's a separate counter,
buckets_ec, outside the BCH_DATA_TYPES system for tracking number of
buckets on a device that are part of a stripe.

The trouble with this approach is that it's too coarse grained, and we
need better information on fragmentation for debugging copygc.

With this patch, data buckets in a stripe are now tracked as
BCH_DATA_stripe buckets.

This doesn't yet differentiate between erasure coded and non-erasure
coded data in a stripe bucket, nor do we yet track empty data buckets in
stripes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:55 -04:00
Kent Overstreet 80c3308578 bcachefs: Fragmentation LRU
Now that we have much more efficient updates to the LRU btree, this
patch adds a new LRU that indexes buckets by fragmentation.

This means copygc no longer has to scan every bucket to find buckets
that need to be evacuated.

Changes:
 - A new field in bch_alloc_v4, fragmentation_lru - this corresponds to
   the bucket's position in the fragmentation LRU. We add a new field
   for this instead of calculating it as needed because we may make the
   fragmentation LRU optional; this field indicates whether a bucket is
   on the fragmentation LRU.

   Also, zoned devices will introduce variable bucket sizes; explicitly
   recording the LRU position will be safer for them.

 - A new copygc path for using the fragmentation LRU instead of
   scanning every bucket and building up an in-memory heap.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:53 -04:00
Kent Overstreet facafdcbc1 bcachefs: Change bkey_invalid() rw param to flags
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:52 -04:00
Kent Overstreet 350175bf9b bcachefs: Improved nocow locking
This improves the nocow lock table so that hash table entries have
multiple locks, and locks specify which bucket they're for - i.e. we can
now resolve hash collisions.

This is important because the allocator has to skip buckets that are
locked in the nocow lock table, and previously hash collisions would
cause it to spuriously skip unlocked buckets.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:52 -04:00
Kent Overstreet 5250b74d55 bcachefs: bucket_gens btree
To improve mount times, add a btree for just bucket gens, 256 of them
per key: this means we'll have to scan drastically less metadata at
startup.

This adds
 - trigger for keeping it in sync with the all btree
 - initialization code, for filesystems from previous versions
 - new path for reading bucket gens
 - new fsck code

And a new on disk format version.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:51 -04:00
Kent Overstreet a8c752bb1d bcachefs: New on disk format: Backpointers
This patch adds backpointers: we now have a reverse index from device
and offset on that device (specifically, offset within a bucket) back to
btree nodes and (non cached) data extents.

The first 40 backpointers within a bucket are stored in the alloc key;
after that backpointers spill over to the next backpointers btree. This
is to help avoid performance regressions from additional btree updates
on large streaming workloads.

This patch adds all the code for creating, checking and repairing
backpointers. The next patch in the series is going to use backpointers
for copygc - finally getting rid of the need to scan all extents to do
copygc.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:50 -04:00
Kent Overstreet 19a614d2e4 bcachefs: Better inlining for bch2_alloc_to_v4_mut
This separates out the slowpath into a separate function, and inlines
bch2_alloc_v4_mut into bch2_trans_start_alloc_update(), the main place
it's called.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:49 -04:00
Kent Overstreet a101957649 bcachefs: More style fixes
Fixes for various checkpatch errors.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:45 -04:00
Kent Overstreet b962552eab bcachefs: Fix should_invalidate_buckets()
Like bch2_copygc_wait_amount, should_invalidate_buckets() needs to try
to ensure that there are always more buckets free than the largest
reserve.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:36 -04:00
Kent Overstreet e1b8f5f5ca bcachefs: Plumb btree_id & level to trans_mark
For backpointers, we'll need the full key location - that means btree_id
and btree level. This patch plumbs it through.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:32 -04:00
Kent Overstreet 822835ffea bcachefs: Fold bucket_state in to BCH_DATA_TYPES()
Previously, we were missing accounting for buckets in need_gc_gens and
need_discard states. This matters because buckets in those states need
other btree operations done before they can be used, so they can't be
conuted when checking current number of free buckets against the
allocation watermark.

Also, we weren't directly counting free buckets at all. Now, data type 0
== BCH_DATA_free, and free buckets are counted; this means we can get
rid of the separate (poorly defined) count of unavailable buckets.

This is a new on disk format version, with upgrade and fsck required for
the accounting changes.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:30 -04:00
Kent Overstreet 62491956f4 bcachefs: Move alloc assertion to .key_invalid()
.key_invalid is a better place for this assertion.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:30 -04:00
Kent Overstreet 275c8426fb bcachefs: Add rw to .key_invalid()
This adds a new parameter to .key_invalid() methods for whether the key
is being read or written; the idea being that methods can do more
aggressive checks when a key is newly created and being written, when we
wouldn't want to delete the key because of those checks.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:30 -04:00
Kent Overstreet e1effd42a1 bcachefs: More improvements for alloc info checks
- Move checks for whether the device & bucket are valid from the
   .key_invalid method to bch2_check_alloc_key(). This is because
   .key_invalid() is called on keys that may no longer exist (post
   journal replay), which is a problem when removing/resizing devices.

 - We weren't checking the need_discard btree to ensure that every set
   bucket has a corresponding alloc key. This refactors the code for
   checking the freespace btree, so that it now checks both.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:30 -04:00
Kent Overstreet f0ac7df23d bcachefs: Convert .key_invalid methods to printbufs
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:30 -04:00
Kent Overstreet 5735608c14 bcachefs: Kill main in-memory bucket array
All code using the in-memory bucket array, excluding GC, has now been
converted to use the alloc btree directly - so we can finally delete it.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:29 -04:00
Kent Overstreet 5add07d56a bcachefs: Fsck for need_discard & freespace btrees
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:29 -04:00
Kent Overstreet caece7fe3f bcachefs: New bucket invalidate path
In the old allocator code, preparing an existing empty bucket was part
of the same code path that invalidated buckets containing cached data.
In the new allocator code this is no longer the case: the main allocator
path finds empty buckets (via the new freespace btree), and can't
allocate buckets that contain cached data.

We now need a separate code path to invalidate buckets containing cached
data when we're low on empty buckets, which this patch implements. When
the number of free buckets decreases that triggers the new invalidate
path to run, which uses the LRU btree to pick cached data buckets to
invalidate until we're above our watermark.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:29 -04:00
Kent Overstreet 59cc38b8d4 bcachefs: New discard implementation
In the old allocator code, buckets would be discarded just prior to
being used - this made sense in bcache where we were discarding buckets
just after invalidating the cached data they contain, but in a
filesystem where we typically have more free space we want to be
discarding buckets when they become empty.

This patch implements the new behaviour - it checks the need_discard
btree for buckets awaiting discards, and then clears the appropriate
bit in the alloc btree, which moves the buckets to the freespace btree.

Additionally, discards are now enabled by default.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:29 -04:00
Kent Overstreet f25d8215f4 bcachefs: Kill allocator threads & freelists
Now that we have new persistent data structures for the allocator, this
patch converts the allocator to use them.

Now, foreground bucket allocation uses the freespace btree to find
buckets to allocate, instead of popping buckets off the freelist.

The background allocator threads are no longer needed and are deleted,
as well as the allocator freelists. Now we only need background tasks
for invalidating buckets containing cached data (when we are low on
empty buckets), and for issuing discards.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:29 -04:00
Kent Overstreet c6b2826cd1 bcachefs: Freespace, need_discard btrees
This adds two new btrees for the upcoming allocator rewrite: an extents
btree of free buckets, and a btree for buckets awaiting discards.

We also add a new trigger for alloc keys to keep the new btrees up to
date, and a compatibility path to initialize them on existing
filesystems.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:29 -04:00
Kent Overstreet 3d48a7f85f bcachefs: KEY_TYPE_alloc_v4
This introduces a new alloc key which doesn't use varints. Soon we'll be
adding backpointers and storing them in alloc keys, which means our
pack/unpack workflow for alloc keys won't really work - we'll need to be
mutating alloc keys in place.

Instead of bch2_alloc_unpack(), we now have bch2_alloc_to_v4() that
converts older types of alloc keys to v4 if needed.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:29 -04:00
Kent Overstreet 880e2275f9 bcachefs: Move trigger fns to bkey_ops
This replaces the switch statements in bch2_mark_key(),
bch2_trans_mark_key() with new bkey methods - prep work for the next
patch, which fixes BTREE_TRIGGER_WANTS_OLD_AND_NEW.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:28 -04:00
Kent Overstreet ec061b215d bcachefs: btree_gc no longer uses main in-memory bucket array
This changes the btree_gc code to only use the second bucket array, the
one dedicated to GC. On completion, it compares what's in its in memory
bucket array to the allocation information in the btree and writes it
directly, instead of updating the main in-memory bucket array and
writing that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:23 -04:00
Kent Overstreet abe19d458e bcachefs: Refactor open_bucket code
Prep work for adding a hash table of open buckets - instead of embedding
a bch_extent_ptr, we need to refer to the bucket directly so that we're
not calling sector_to_bucket() in the hash table lookup code, which has
an expensive divide.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:20 -04:00