Commit Graph

260 Commits

Author SHA1 Message Date
Kent Overstreet 27c15ed297 bcachefs: bch_member.btree_allocated_bitmap
This adds a small (64 bit) per-device bitmap that tracks ranges that
have btree nodes, for accelerating btree node scan if it is ever needed.

- New helpers, bch2_dev_btree_bitmap_marked() and
  bch2_dev_bitmap_mark(), for checking and updating the bitmap

- Interior btree update path updates the bitmaps when required

- The check_allocations pass has a new fsck_err check,
  btree_bitmap_not_marked

- New on disk format version, mi_btree_mitmap, which indicates the new
  bitmap is present

- Upgrade table lists the required recovery pass and expected fsck error

- Btree node scan uses the bitmap to skip ranges if we're on the new
  version

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-14 20:02:11 -04:00
Kent Overstreet 5ab4beb759 bcachefs: Don't scan for btree nodes when we can reconstruct
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-09 00:53:14 -04:00
Kent Overstreet 359571c327 bcachefs: Fix check_topology() when using node scan
shoot down journal keys _before_ populating journal keys with pointers
to scanned nodes

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-09 00:04:57 -04:00
Kent Overstreet 43f5ea4646 bcachefs: Topology repair now uses nodes found by scanning to fill holes
With the new btree node scan code, we can now recover from corrupt btree
roots - simply create a new fake root at depth 1, and then insert all
the leaves we found.

If the root wasn't corrupt but there's corruption elsewhere in the
btree, we can fill in holes as needed with the newest version of a given
node(s) from the scan; we also check if a given btree node is older than
what we found from the scan.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-03 14:45:30 -04:00
Kent Overstreet b268aa4e7f bcachefs: Don't skip fake btree roots in fsck
When a btree root is unreadable, we might still have keys fro the
journal to walk and mark.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-03 14:44:18 -04:00
Kent Overstreet d2554263ad bcachefs: Split out recovery_passes.c
We've grown a fair amount of code for managing recovery passes; tracking
which ones we're running, which ones need to be run, and flagging in the
superblock which ones need to be run on the next recovery.

So it's worth splitting out into its own file, this code is pretty
different from the code in recovery.c.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31 20:36:11 -04:00
Kent Overstreet 47d2080e30 bcachefs: Kill bch2_bkey_ptr_data_type()
Remove some duplication, and inconsistency between check_fix_ptrs and
the main ptr marking paths

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31 20:36:11 -04:00
Kent Overstreet 6f5869ffd9 bcachefs: Fix use after free in bch2_check_fix_ptrs()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31 20:36:11 -04:00
Kent Overstreet 79032b0781 bcachefs: Improved topology repair checks
Consolidate bch2_gc_check_topology() and btree_node_interior_verify(),
and replace them with an improved version,
bch2_btree_node_check_topology().

This checks that children of an interior node correctly span the full
range of the parent node with no overlaps.

Also, ensure that topology repairs at runtime are always a fatal error;
in particular, this adds a check in btree_iter_down() - if we don't find
a key while walking down the btree that's indicative of a topology error
and should be flagged as such, not a null ptr deref.

Some checks in btree_update_interior.c remaining BUG_ONS(), because we
already checked the node for topology errors when starting the update,
and the assertions indicate that we _just_ corrupted the btree node -
i.e. the problem can't be that existing on disk corruption, they
indicate an actual algorithmic bug.

In the future, we'll be annotating the fsck errors list with which
recovery pass corrects them; the open coded "run explicit recovery pass
or fatal error" in bch2_btree_node_check_topology() will in the future
be done for every fsck_err() call.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31 20:36:11 -04:00
Kent Overstreet 37bb9c9572 bcachefs: Fix locking in bch2_alloc_write_key()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-17 20:53:11 -04:00
Kent Overstreet cdce109431 bcachefs: reconstruct_alloc cleanup
Now that we've got the errors_silent mechanism, we don't have to check
if the reconstruct_alloc option is set all over the place.

Also - users no longer have to explicitly select fsck and fix_errors.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:26 -04:00
Kent Overstreet b6fc661f09 bcachefs: Fix order of gc_done passes
gc_stripes_done() and gc_reflink_done() may do alloc btree updates (i.e.
when deleting an indirect extent) - we need bucket gens to be fixed by
then.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:26 -04:00
Kent Overstreet 06ebc48306 bcachefs: fix deletion of indirect extents in btree_gc
we need to run the normal extent update path on deletion -
bch2_bkey_make_mut() is incorrect when key type is changing.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:26 -04:00
Kent Overstreet 52946d828a bcachefs: Kill more -EIO error codes
This converts -EIOs related to btree node errors to private error codes,
which will help with some ongoing debugging by giving us better error
messages.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:23 -04:00
Kent Overstreet cb6fc943b6 bcachefs: kill kvpmalloc()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 18:39:12 -04:00
Kent Overstreet 5f43b0134e bcachefs: btree node prefetching in check_topology
btree_and_journal_iter is old code that we want to get rid of, but we're
not ready to yet.

lack of btree node prefetching is, it turns out, a real performance
issue for fsck on spinning rust, so - add it.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-10 15:34:08 -04:00
Kent Overstreet fc634d8e46 bcachefs: btree_and_journal_iter.trans
we now always have a btree_trans when using a btree_and_journal_iter;
prep work for adding prefetching to btree_and_journal_iter

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-10 15:34:08 -04:00
Kent Overstreet b3eba6a4a7 bcachefs: Fix degraded mode fsck
We don't know where the superblock and journal lives on offline devices;
that means if a device is offline fsck can't check those buckets.

Previously, fsck would incorrectly clear bucket data types for those
buckets on offline devices; now we just use the previous state.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-10 15:18:45 -04:00
Kent Overstreet 8e7834a883 bcachefs: bch_fs_usage_base
Split out base filesystem usage into its own type; prep work for
breaking up bch2_trans_fs_usage_apply().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21 06:01:45 -05:00
Kent Overstreet e58f963cec bcachefs: helpers for printing data types
We need bounds checking since new versions may introduce new data types.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21 06:01:45 -05:00
Kent Overstreet f0431c5f47 bcachefs: Combine .trans_trigger, .atomic_trigger
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05 23:24:20 -05:00
Kent Overstreet 089e311347 bcachefs: Kill BTREE_TRIGGER_NOATOMIC
dead code

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05 23:24:19 -05:00
Kent Overstreet ad00bce07d bcachefs: mark now takes bkey_s
Prep work for disk space accounting rewrite: we're going to want to use
a single callback for both of our current triggers, so we need to change
them to have the same type signature first.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05 23:24:19 -05:00
Kent Overstreet 717296c34c bcachefs: trans_mark now takes bkey_s
Prep work for disk space accounting rewrite: we're going to want to use
a single callback for both of our current triggers, so we need to change
them to have the same type signature first.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05 23:24:19 -05:00
Kent Overstreet 0beebd9245 bcachefs: bkey_for_each_ptr() now declares loop iter
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:43 -05:00
Kent Overstreet 9fea2274f7 bcachefs: for_each_member_device() now declares loop iter
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:42 -05:00
Kent Overstreet 80eab7a7c2 bcachefs: for_each_btree_key() now declares loop iter
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:42 -05:00
Kent Overstreet cf904c8d96 bcachefs: bch_err_(fn|msg) check if should print
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:41 -05:00
Kent Overstreet 5028b9078c bcachefs: Rename for_each_btree_key2() -> for_each_btree_key()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:40 -05:00
Kent Overstreet 27b2df982f bcachefs: Kill for_each_btree_key()
for_each_btree_key() handles transaction restarts, like
for_each_btree_key2(), but only calls bch2_trans_begin() after a
transaction restart - for_each_btree_key2() wraps every loop iteration
in a transaction.

The for_each_btree_key() behaviour is problematic when it leads to
holding the SRCU lock that prevents key cache reclaim for an unbounded
amount of time - there's no real need to keep it around.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:40 -05:00
Kent Overstreet 3c471b6588 bcachefs: convert bch_fs_flags to x-macro
Now we can print out filesystem flags in sysfs, useful for debugging
various "what's my filesystem doing" issues.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:38 -05:00
Kent Overstreet 9b34f02cdc bcachefs: Kill dev_usage->buckets_ec
This counter is redundant; it's simply the sum of BCH_DATA_stripe and
BCH_DATA_parity buckets.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:38 -05:00
Kent Overstreet 086a52f7fa bcachefs: Rename bch_replicas_entry -> bch_replicas_entry_v1
Prep work for introducing bch_replicas_entry_v2

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:38 -05:00
Kent Overstreet cb52d23e77 bcachefs: Rename BTREE_INSERT flags
BTREE_INSERT flags are actually transaction commit flags - rename them
for clarity.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:37 -05:00
Kent Overstreet 463086d998 bcachefs: Convert gc_alloc_start() to for_each_btree_key2()
This eliminates some SRCU warnings: for_each_btree_key2() runs every
loop iteration in a distinct transaction context.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-28 22:58:22 -05:00
Kent Overstreet b65db750e2 bcachefs: Enumerate fsck errors
This patch adds a superblock error counter for every distinct fsck
error; this means that when analyzing filesystems out in the wild we'll
be able to see what sorts of inconsistencies are being found and repair,
and hence what bugs to look for.

Errors validating bkeys are not yet considered distinct fsck errors, but
this patch adds a new helper, bkey_fsck_err(), in order to add distinct
error types for them as well.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-01 21:11:08 -04:00
Kent Overstreet 253ba178c8 bcachefs: Fix ca->oldest_gen allocation
The ca->oldest_gen array needs to be the same size as the bucket_gens
array; ca->mi.nbuckets is updated with only state_lock held, not
gc_lock, so bch2_gc_gens() could race with device resize and allocate
too small of an oldest_gens array.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-31 12:18:37 -04:00
Kent Overstreet 88dfe193bd bcachefs: bch2_btree_id_str()
Since we can run with unknown btree IDs, we can't directly index btree
IDs into fixed size arrays.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-31 12:18:37 -04:00
Kent Overstreet 6bd68ec266 bcachefs: Heap allocate btree_trans
We're using more stack than we'd like in a number of functions, and
btree_trans is the biggest object that we stack allocate.

But we have to do a heap allocatation to initialize it anyways, so
there's no real downside to heap allocating the entire thing.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:13 -04:00
Kent Overstreet 96dea3d599 bcachefs: Fix W=12 build errors
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:13 -04:00
Yang Li b5e85d4d0c bcachefs: Remove unneeded semicolon
./fs/bcachefs/btree_gc.c:1249:2-3: Unneeded semicolon
./fs/bcachefs/btree_gc.c:1521:2-3: Unneeded semicolon
./fs/bcachefs/btree_gc.c:1575:2-3: Unneeded semicolon
./fs/bcachefs/counters.c:46:2-3: Unneeded semicolon

Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:13 -04:00
Kent Overstreet e46c181af9 bcachefs: Convert more code to bch_err_msg()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:12 -04:00
Kent Overstreet 401585fe87 bcachefs: btree_journal_iter.c
Split out a new file from recovery.c for managing the list of keys we
read from the journal: before journal replay finishes the btree iterator
code needs to be able to iterate over and return keys from the journal
as well, so there's a fair bit of code here.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:10 -04:00
Kent Overstreet 1e81f89b02 bcachefs: Fix assorted checkpatch nits
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:10 -04:00
Kent Overstreet 0ed4ca146e bcachefs: Ensure topology repair runs
This fixes should_restart_for_topology_repair() - previously it was
returning false if the btree io path had already seleceted topology
repair to run, even if it hadn't run yet.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:09 -04:00
Kent Overstreet 922bc5a037 bcachefs: Make topology repair a normal recovery pass
This adds bch2_run_explicit_recovery_pass(), for rewinding recovery and
explicitly running a specific recovery pass - this is a more general
replacement for how we were running topology repair before.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:08 -04:00
Kent Overstreet a0f8faea5f bcachefs: fix_errors option is now a proper enum
Before, it was parsed as a bool but internally it was really an enum:
this lets us pass in all the possible values.

But we special case the option parsing: no supplied value is parsed as
FSCK_FIX_yes, to match the previous behaviour.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:07 -04:00
Kent Overstreet 067d228bb0 bcachefs: Enumerate recovery passes
Recovery and fsck have many different passes/jobs to do, which always
run in the same order - but not all of them run all the time. Some are
for fsck, some for unclean shutdown, some for version upgrades.

This adds some new structure: a defined list of recovery passes that we
can run in a loop, as well as consolidating the log messages.

The main benefit is consolidating the "should run this recovery pass"
logic, as well as cleaning up the "this recovery pass has finished"
state; instead of having a bunch of ad-hoc state bits in c->flags, we've
now got c->curr_recovery_pass.

By consolidating the "should run this recovery pass" logic, in the
future on disk format upgrades will be able to say "upgrading to this
version requires x passes to run", instead of forcing all of fsck to
run.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:06 -04:00
Kent Overstreet 73bd774d28 bcachefs: Assorted sparse fixes
- endianness fixes
 - mark some things static
 - fix a few __percpu annotations
 - fix silent enum conversions

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:06 -04:00
Kent Overstreet faa6cb6c13 bcachefs: Allow for unknown btree IDs
We need to allow filesystems with metadata from newer versions to be
mountable and usable by older versions.

This patch enables us to roll out new btrees without a new major version
number; we can now handle btree roots for unknown btree types.

The unknown btree roots will be retained, and fsck (including
backpointers) will check them, the same as other btree types.

We add a dynamic array for the extra, unknown btree roots, in addition
to the fixed size btree root array, and add new helpers for looking up
btree roots.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:05 -04:00