Commit Graph

347 Commits

Author SHA1 Message Date
Kent Overstreet cd3b31f9d4 bcachefs: Ensure we're RW before journalling
Reported-by: syzbot+c60cd352aedb109528bf@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-22 20:17:33 -04:00
Kent Overstreet f108ddd467 bcachefs: Fix shift overflow in btree_lost_data()
Reported-by: syzbot+29f65db1a5fe427b5c56@syzkaller.appspotmail.com
Fixes: 55936afe11 ("bcachefs: Flag btrees with missing data")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-20 05:37:26 -04:00
Kent Overstreet adf81796ee bcachefs: journal_replay_entry_early() checks for nonexistent device
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:22 -04:00
Kent Overstreet 45150765d3 bcachefs: bch_member.last_journal_bucket
On recovery from clean shutdown we don't typically read the journal, but
we still want to avoid overwriting existing entries in the journal for
list_journal debugging.

Thus, add some fields to the member info section so we can remember
where we left off.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:21 -04:00
Kent Overstreet f04158290d bcachefs: journal seq blacklist gc no longer has to walk btree
Since btree_ptr_v2, we no longer require the journal seq blacklist table
for skipping blacklisted bsets (btree node entries); the pointer to a
given node indicates how much data is present.

Therefore there's no longer any need for journal seq blacklist gc to
walk the btree - we can prune entries older than journal last_seq.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:20 -04:00
Kent Overstreet ca563dccb2 bcachefs: bch2_trans_unlock() must always be followed by relock() or begin()
We're about to add new asserts for btree_trans locking consistency, and
part of that requires that aren't using the btree_trans while it's
unlocked.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:19 -04:00
Kent Overstreet 2f724563fc bcachefs: member helper cleanups
Some renaming for better consistency

bch2_member_exists	-> bch2_member_alive
bch2_dev_exists		-> bch2_member_exists
bch2_dev_exsits2	-> bch2_dev_exists
bch_dev_locked		-> bch2_dev_locked
bch_dev_bkey_exists	-> bch2_dev_bkey_exists

new helper - bch2_dev_safe

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:19 -04:00
Kent Overstreet 5dd8c60e1e bcachefs: iter/update/trigger/str_hash flag cleanup
Combine iter/update/trigger/str_hash flags into a single enum, and
x-macroize them for a to_text() function later.

These flags are all for a specific iter/key/update context, so it makes
sense to group them together - iter/update/trigger flags were already
given distinct bits, this cleans up and unifies that handling.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:18 -04:00
Kent Overstreet d1b213a00d bcachefs: Finish converting reconstruct_alloc to errors_silent
with errors_silent, reconstruct_alloc no longer requires fsck and
fix_errors to work

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:18 -04:00
Kent Overstreet 4dcd90b6d1 bcachefs: Don't read journal just for fsck
reading the journal can take a decent amount of time compared to the
rest of fsck, let's only read it when required.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:18 -04:00
Kent Overstreet 62606398d5 bcachefs: Run upgrade/downgrade even in -o nochanges mode
We need to be able to test these paths in dry run mode.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:17 -04:00
Kent Overstreet 7ffec9ccdc bcachefs: don't free error pointers
Reported-by: syzbot+3333603f569fc2ef258c@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-06 10:58:17 -04:00
Kent Overstreet 79055f50a6 bcachefs: make sure to release last journal pin in replay
This fixes a deadlock when journal replay has many keys to insert that
were from fsck, not the journal.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-16 19:14:01 -04:00
Kent Overstreet 5ab4beb759 bcachefs: Don't scan for btree nodes when we can reconstruct
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-09 00:53:14 -04:00
Kent Overstreet a292be3b68 bcachefs: Reconstruct missing snapshot nodes
When the snapshots btree is going, we'll have to delete huge amounts of
data - unless we can reconstruct it by looking at the keys that refer to
it.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-03 14:46:51 -04:00
Kent Overstreet 55936afe11 bcachefs: Flag btrees with missing data
We need this to know when we should attempt to reconstruct the snapshots
btree

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-03 14:46:51 -04:00
Kent Overstreet 4409b8081d bcachefs: Repair pass for scanning for btree nodes
If a btree root or interior btree node goes bad, we're going to lose a
lot of data, unless we can recover the nodes that it pointed to by
scanning.

Fortunately btree node headers are fully self describing, and
additionally the magic number is xored with the filesytem UUID, so we
can do so safely.

This implements the scanning - next patch will rework topology repair to
make use of the found nodes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-03 14:44:18 -04:00
Kent Overstreet f2f61f4192 bcachefs: bch2_btree_root_alloc() -> bch2_btree_root_alloc_fake()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-03 14:44:18 -04:00
Kent Overstreet bdbf953b3c bcachefs: bch2_shoot_down_journal_keys()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-03 14:44:18 -04:00
Kent Overstreet 27fcec6c27 bcachefs: Clear recovery_passes_required as they complete without errors
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-03 14:44:18 -04:00
Kent Overstreet 13c1e583f9 bcachefs: Improve -o norecovery; opts.recovery_pass_limit
This adds opts.recovery_pass_limit, and redoes -o norecovery to make use
of it; this fixes some issues with -o norecovery so it can be safely
used for data recovery.

Norecovery means "don't do journal replay"; it's an important data
recovery tool when we're getting stuck in journal replay.

When using it this way we need to make sure we don't free journal keys
after startup, so we continue to overlay them: thus it needs to imply
retain_recovery_info, as well as nochanges.

recovery_pass_limit is an explicit option for telling recovery to exit
after a specific recovery pass; this is a much cleaner way of
implementing -o norecovery, as well as being a useful debug feature in
its own right.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31 20:36:12 -04:00
Kent Overstreet 0a34c058fc bcachefs: Ensure bch_sb_field_ext always exists
This makes bch_sb_field_ext more consistent with the rest of -o
nochanges - we don't want to be varying other codepaths based on -o
nochanges, since it's used for testing in dry run mode; also fixes some
potential null ptr derefs.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31 20:36:12 -04:00
Kent Overstreet 4fe0eeeae4 bcachefs: Flush journal immediately after replay if we did early repair
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31 20:36:12 -04:00
Kent Overstreet d2554263ad bcachefs: Split out recovery_passes.c
We've grown a fair amount of code for managing recovery passes; tracking
which ones we're running, which ones need to be run, and flagging in the
superblock which ones need to be run on the next recovery.

So it's worth splitting out into its own file, this code is pretty
different from the code in recovery.c.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31 20:36:11 -04:00
Kent Overstreet a586036841 bcachefs: Don't corrupt journal keys gap buffer when dropping alloc info
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-17 21:17:38 -04:00
Kent Overstreet cdce109431 bcachefs: reconstruct_alloc cleanup
Now that we've got the errors_silent mechanism, we don't have to check
if the reconstruct_alloc option is set all over the place.

Also - users no longer have to explicitly select fsck and fix_errors.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:26 -04:00
Kent Overstreet 2cce3752ce bcachefs: split out ignore_blacklisted, ignore_not_dirty
prep work for replaying the journal backwards

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:25 -04:00
Kent Overstreet 69426613cd bcachefs: improve move_gap()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:25 -04:00
Kent Overstreet 95ffc7fb8c bcachefs: journal_keys now uses darray helpers
nice bit of code cleanup

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:25 -04:00
Kent Overstreet 894d062254 bcachefs: Rename journal_keys.d -> journal_keys.data
This will let us use some darray helpers in the next patch.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:25 -04:00
Kent Overstreet 52946d828a bcachefs: Kill more -EIO error codes
This converts -EIOs related to btree node errors to private error codes,
which will help with some ongoing debugging by giving us better error
messages.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:23 -04:00
Kent Overstreet ba89083e9f bcachefs: Fix journal replay with unreadable btree roots
When a btree root is unreadable, we still might be able to get some data
back by replaying what's in the journal. Previously though, we got
confused when journal replay would attempt to replay a key for a level
that didn't exist.

This adds bch2_btree_increase_depth(), so that journal replay can handle
this.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-10 15:18:13 -04:00
Kent Overstreet 6fa30fe7f7 bcachefs: journal_seq_blacklist_add() now handles entries being added out of order
bch2_journal_seq_blacklist_add() was bugged when the new entry
overlapped with multiple existing entries, and it also assumed new
entries are being added in increasing order.

This is true on any sane filesystem, but when trying to recover from
very badly mangled filesystems we might end up with the journal sequence
number rewinding vs. what the blacklist list knows about - easiest to
just handle that here.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-10 15:09:59 -04:00
Kent Overstreet 2eeccee86d bcachefs: Fix check_version_upgrade()
When also downgrading, check_version_upgrade() could pick a new version
greater than the latest supported version.

Fixes:
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-13 21:58:37 -05:00
Kent Overstreet 8e7834a883 bcachefs: bch_fs_usage_base
Split out base filesystem usage into its own type; prep work for
breaking up bch2_trans_fs_usage_apply().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21 06:01:45 -05:00
Kent Overstreet 72e2c920e4 bcachefs: Restart recovery passes more reliably
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05 23:24:21 -05:00
Kent Overstreet 15eaaa4c31 bcachefs: Upgrades now specify errors to fix, like downgrades
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05 23:24:20 -05:00
Kent Overstreet d55ddf6e7a bcachefs: Online fsck can now fix errors
BCH_FS_fsck_done -> BCH_FS_fsck_running; set when we might be fixing
fsck errors. Also; set fix_errors to ask by default when fsck is
running.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05 23:24:20 -05:00
Kent Overstreet eff1f728be bcachefs: Upgrading uses bch_sb.recovery_passes_required
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05 23:24:19 -05:00
Kent Overstreet 62719cf33c bcachefs: Fix nochanges/read_only interaction
nochanges means "we cannot issue writes at all"; it's possible to go
into a pseudo read-write mode where we pin dirty metadata in memory,
which is used for fsck in dry run mode and doing journal replay on a
read only mount, but we do not want to allow an actual read-write mount
in nochanges mode.

But we do always want to allow early read-write, during recovery - this
patch clarifies that.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05 23:24:19 -05:00
Kent Overstreet cea07a7b6a bcachefs: vstruct_for_each() now declares loop iter
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:42 -05:00
Kent Overstreet 9fea2274f7 bcachefs: for_each_member_device() now declares loop iter
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:42 -05:00
Kent Overstreet defd9e39b5 bcachefs: darray_for_each() now declares loop iter
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:42 -05:00
Kent Overstreet cf904c8d96 bcachefs: bch_err_(fn|msg) check if should print
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:41 -05:00
Kent Overstreet 249bf593e8 bcachefs: Fix snapshot.c assertion for online fsck
c->curr_recovery_pass can go backwards; this adds a non rewinding
version, c->recovery_pass_done.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:41 -05:00
Kent Overstreet 27b2df982f bcachefs: Kill for_each_btree_key()
for_each_btree_key() handles transaction restarts, like
for_each_btree_key2(), but only calls bch2_trans_begin() after a
transaction restart - for_each_btree_key2() wraps every loop iteration
in a transaction.

The for_each_btree_key() behaviour is problematic when it leads to
holding the SRCU lock that prevents key cache reclaim for an unbounded
amount of time - there's no real need to keep it around.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:40 -05:00
Kent Overstreet 7f391b2f8e bcachefs: bch2_run_online_recovery_passes()
Add a new helper for running online recovery passes - i.e. online fsck.
This is a subset of our normal recovery passes, and does not - for now -
use or follow c->curr_recovery_pass.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:40 -05:00
Kent Overstreet 2b41226d7f bcachefs: Add ability to redirect log output
Upcoming patches are going to add two new ioctls for running fsck in the
kernel, but pretending that we're running our normal userspace fsck.

This patch adds some plumbing for redirecting our normal log messages
away from the dmesg log to a thread_with_file file descriptor - via a
struct log_output, which will be consumed by the fsck f_op's read method.

The new ioctls will allow for running fsck in the kernel against an
offline filesystem (without mounting it), and an online filesystem. For
an offline filesystem we need a way to pass in a pointer to the
log_output, which is done via a new hidden opts.h option.

For online fsck, we can set c->output directly, but only want to
redirect log messages from the thread running fsck - hence the new
c->output_filter method.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:40 -05:00
Kent Overstreet 3f0e297d86 bcachefs: Explicity go RW for fsck
This eliminates a lot of BCH_TRANS_COMMIT_lazy_rw flags, and is less
error prone.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:39 -05:00
Kent Overstreet 3c471b6588 bcachefs: convert bch_fs_flags to x-macro
Now we can print out filesystem flags in sysfs, useful for debugging
various "what's my filesystem doing" issues.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:38 -05:00