Commit graph

88035 commits

Author SHA1 Message Date
Kent Overstreet
80eab7a7c2 bcachefs: for_each_btree_key() now declares loop iter
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:42 -05:00
Kent Overstreet
c47e8bfbb7 bcachefs: kill for_each_btree_key_norestart()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:42 -05:00
Kent Overstreet
44ddd8ad1e bcachefs: kill for_each_btree_key_old_upto()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:42 -05:00
Kent Overstreet
3a860b5ad5 bcachefs: for_each_btree_key_upto() -> for_each_btree_key_old_upto()
And for_each_btree_key2_upto -> for_each_btree_key_upto

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:42 -05:00
Kent Overstreet
c8ef2dc2fc bcachefs: bch2_dirent_lookup() -> lockrestart_do()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:42 -05:00
Kent Overstreet
79904fa2bb bcachefs: bch2_trans_srcu_lock() should be static
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:42 -05:00
Kent Overstreet
6d5c606c1c bcachefs: use track_event_change() for allocator blocked stats
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:42 -05:00
Kent Overstreet
ef23397c30 bcachefs: fix warning about uninitialized time_stats
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:42 -05:00
Kent Overstreet
e34ec13a56 bcachefs: add more verbose logging
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:42 -05:00
Kent Overstreet
53b67d8dcf bcachefs: better error message in btree_node_write_work()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:42 -05:00
Kent Overstreet
037a2d9f48 bcachefs: simplify bch_devs_list
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:42 -05:00
Kent Overstreet
defd9e39b5 bcachefs: darray_for_each() now declares loop iter
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:42 -05:00
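The darray_for_each() change (defd9e39b5) means the macro declares its iterator in the `for` initializer (valid since C99), so callers no longer declare it separately and it stays scoped to the loop. Below is a minimal sketch of that pattern. It assumes a hypothetical `darray_int` type; the real bcachefs darray is generic and its macro differs.

```c
#include <stddef.h>

/* Hypothetical dynamic array of ints; the real bcachefs darray is generic. */
typedef struct {
	int	*data;
	size_t	nr;
} darray_int;

/*
 * The iterator _i is declared inside the for-initializer, so it is scoped
 * to the loop and the caller does not need its own declaration.
 */
#define darray_for_each(_d, _i) \
	for (int *_i = (_d).data; _i < (_d).data + (_d).nr; _i++)

static int darray_sum(darray_int d)
{
	int sum = 0;

	darray_for_each(d, i)	/* no "int *i;" needed beforehand */
		sum += *i;
	return sum;
}
```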
Kent Overstreet
559e6c2336 bcachefs: trans_for_each_update() now declares loop iter
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:42 -05:00
Kent Overstreet
cee0a8ea6d bcachefs: Improve the nopromote tracepoint
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:42 -05:00
Kent Overstreet
1ad36a010c bcachefs: Use GFP_KERNEL for promote allocations
We already have btree locks dropped here - no need for GFP_NOFS.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:42 -05:00
Randy Dunlap
920388254f bcachefs: mean and variance: fix kernel-doc for function params
Add missing function parameter descriptions in mean_and_variance.c.
This also eliminates the "Excess function parameter" warnings.

Prevents these kernel-doc warnings:

mean_and_variance.c:67: warning: Function parameter or member 's' not described in 'mean_and_variance_get_mean'
mean_and_variance.c:78: warning: Function parameter or member 's1' not described in 'mean_and_variance_get_variance'
mean_and_variance.c:94: warning: Function parameter or member 's' not described in 'mean_and_variance_get_stddev'
mean_and_variance.c:108: warning: Function parameter or member 's' not described in 'mean_and_variance_weighted_update'
mean_and_variance.c:108: warning: Function parameter or member 'x' not described in 'mean_and_variance_weighted_update'
mean_and_variance.c:108: warning: Excess function parameter 's1' description in 'mean_and_variance_weighted_update'
mean_and_variance.c:108: warning: Excess function parameter 's2' description in 'mean_and_variance_weighted_update'
mean_and_variance.c:134: warning: Function parameter or member 's' not described in 'mean_and_variance_weighted_get_mean'
mean_and_variance.c:143: warning: Function parameter or member 's' not described in 'mean_and_variance_weighted_get_variance'
mean_and_variance.c:153: warning: Function parameter or member 's' not described in 'mean_and_variance_weighted_get_stddev'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Brian Foster <bfoster@redhat.com>
Cc: linux-bcachefs@vger.kernel.org
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:42 -05:00
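The kernel-doc warnings in 920388254f come from `@name:` lines that don't match the function's actual parameters (for example, documenting `s1` and `s2` when the parameter is `s`). The sketch below shows the kernel-doc layout where every `@param:` line matches a real parameter. The `mav_*` names and struct are hypothetical stand-ins, not the layout of the real `struct mean_and_variance`.

```c
#include <stdint.h>

/* Hypothetical accumulator; the real struct mean_and_variance differs. */
struct mav {
	int64_t	n;
	int64_t	sum;
	int64_t	sum_squares;
};

/**
 * mav_update() - add a sample to the running totals
 * @s: accumulator to update
 * @x: new sample (x * x must not overflow int64_t in this sketch)
 */
static void mav_update(struct mav *s, int64_t x)
{
	s->n++;
	s->sum += x;
	s->sum_squares += x * x;
}

/**
 * mav_get_mean() - mean of all samples seen so far
 * @s: accumulator to read; must hold at least one sample
 *
 * Return: integer mean, truncated toward zero
 */
static int64_t mav_get_mean(const struct mav *s)
{
	return s->sum / s->n;
}

/**
 * mav_get_variance() - population variance of all samples seen so far
 * @s: accumulator to read; must hold at least one sample
 *
 * Return: E[x^2] - E[x]^2, computed with integer division
 */
static int64_t mav_get_variance(const struct mav *s)
{
	int64_t mean = mav_get_mean(s);

	return s->sum_squares / s->n - mean * mean;
}
```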
Kent Overstreet
447c1c0105 bcachefs: check for failure to downgrade
With the upcoming member seq patch, it's now critical that we don't ever
write to a superblock that hasn't been version downgraded - failure to
update member seq fields will cause split brain detection to fire
erroneously.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:42 -05:00
Kent Overstreet
44fd13a4c6 bcachefs: Fixes for rust bindgen
bindgen doesn't seem to like u128 or DECLARE_FLEX_ARRAY(), but we can
hack around them.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:42 -05:00
Kent Overstreet
023f9ac9f7 bcachefs: Delete dio read alignment check
We'll typically format devices with the supported physical blocksize, but
the logical blocksize will be smaller.

There's no real need to check the blocksize at the filesystem level
anyway; the block layer has to check this regardless.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:42 -05:00
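The reasoning in 023f9ac9f7 can be illustrated with a simple alignment test: a direct I/O aligned to a 512-byte logical block size passes a check against that size, but would be rejected by a stricter check against a 4096-byte physical/filesystem block size. This is a hypothetical helper for illustration, not the block layer's actual code, and it assumes the block size is a power of two.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if a direct I/O of len bytes at offset is aligned to lbs, the
 * block size being checked against (assumed to be a power of two).
 */
static bool dio_aligned(uint64_t offset, uint64_t len, unsigned lbs)
{
	return ((offset | len) & (lbs - 1)) == 0;
}
```

With a 512-byte logical block size, a 512-byte read at offset 4096 is valid for the device, even though a check against 4096 bytes would refuse it.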
Brian Foster
d8d819580a bcachefs: clean up some dead fallocate code
The have_reservation local variable in bch2_extent_fallocate() is
initialized to false and set to true further down in the function.
Between these two points, one branch of code checks for a negative
value and another for a positive one, and nothing ever checks the variable
after it is set to true. Clean up some of the unnecessary logic and
code.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:41 -05:00
Kent Overstreet
a7dc10ce68 bcachefs: Make sure allocation failure errors are logged
The previous patch fixed a bug in allocation path error handling, and it
would've been noticed sooner had it been logged properly.

Generally speaking, errors that shouldn't happen in normal operation and
are being returned up the stack should be logged: the write path was
already logging IO errors, but non-IO errors were missed.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:41 -05:00
Kent Overstreet
548673f8d3 bcachefs: drop extra semicolon
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:41 -05:00
Gustavo A. R. Silva
4c26dea1c0 bcachefs: Replace zero-length array with flex-array member and use __counted_by
Fake flexible arrays (zero-length and one-element arrays) are
deprecated, and should be replaced by flexible-array members.
So, replace zero-length array with a flexible-array member in
`struct bch_ioctl_fsck_offline`.

Also annotate array `devs` with `__counted_by()` to prepare for the
coming implementation by GCC and Clang of the `__counted_by` attribute.
Flexible array members annotated with `__counted_by` can have their
accesses bounds-checked at run-time via `CONFIG_UBSAN_BOUNDS` (for
array indexing) and `CONFIG_FORTIFY_SOURCE` (for strcpy/memcpy-family
functions).

This fixes the following -Warray-bounds warnings:
fs/bcachefs/chardev.c: In function 'bch2_ioctl_fsck_offline':
fs/bcachefs/chardev.c:363:34: warning: array subscript 0 is outside array bounds of '__u64[0]' {aka 'long long unsigned int[]'} [-Warray-bounds=]
  363 |         if (copy_from_user(devs, &user_arg->devs[0], sizeof(user_arg->devs[0]) * arg.nr_devs)) {
      |                                  ^~~~~~~~~~~~~~~~~~
In file included from fs/bcachefs/chardev.c:5:
fs/bcachefs/bcachefs_ioctl.h:400:33: note: while referencing 'devs'
  400 |         __u64                   devs[0];

This results in no differences in binary output.

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:41 -05:00
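For 4c26dea1c0, the sketch below shows a flexible-array member annotated with `__counted_by`, allocated with room for `nr` trailing elements. The struct is a hypothetical stand-in loosely modeled on `struct bch_ioctl_fsck_offline` (its real fields are not reproduced here), and the macro fallback is an assumption for compilers that don't yet support the `counted_by` attribute.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Use the counted_by attribute when the compiler has it; otherwise no-op. */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif
#if __has_attribute(__counted_by__)
#define __counted_by(m) __attribute__((__counted_by__(m)))
#else
#define __counted_by(m)
#endif

/* Hypothetical stand-in; not the real ioctl struct layout. */
struct fsck_args {
	uint64_t	flags;
	uint64_t	nr_devs;
	uint64_t	devs[] __counted_by(nr_devs);	/* flexible-array member */
};

static struct fsck_args *fsck_args_alloc(const uint64_t *devs, uint64_t nr)
{
	/* sizeof(*a) excludes devs[], so add space for nr elements. */
	struct fsck_args *a = malloc(sizeof(*a) + nr * sizeof(a->devs[0]));

	if (!a)
		return NULL;
	a->flags = 0;
	a->nr_devs = nr;	/* set the count before accessing devs[] */
	memcpy(a->devs, devs, nr * sizeof(a->devs[0]));
	return a;
}
```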
Gustavo A. R. Silva
ac19c4c3d0 bcachefs: Use array_size() in call to copy_from_user()
Use array_size() helper, instead of the open-coded version in
call to copy_from_user().

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:41 -05:00
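The kernel's `array_size()` helper multiplies with an overflow check and saturates to `SIZE_MAX` instead of wrapping. As a result, an overflowing count should yield an impossibly large size that the copy or allocation rejects, rather than a small wrapped-around size. The function below is a userspace approximation built on the GCC/Clang `__builtin_mul_overflow` builtin, not the kernel's implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace approximation of array_size(): saturate to SIZE_MAX on overflow. */
static size_t array_size_sat(size_t n, size_t size)
{
	size_t bytes;

	if (__builtin_mul_overflow(n, size, &bytes))
		return SIZE_MAX;
	return bytes;
}
```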
Kent Overstreet
038fecc045 bcachefs: qstr_eq()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:41 -05:00
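The qstr_eq() commit (038fecc045) has no description. One plausible definition compares the lengths first and then the bytes, because qstr names are length-delimited and not necessarily NUL-terminated. The sketch below assumes a simplified `struct qstr_sketch` without the hash field of the kernel's `struct qstr`. It is an illustration, not the bcachefs code.

```c
#include <stdbool.h>
#include <string.h>

/* Simplified stand-in for the kernel's struct qstr (hash field omitted). */
struct qstr_sketch {
	unsigned int		len;
	const unsigned char	*name;	/* not necessarily NUL-terminated */
};

/* Equal when the lengths match and the first len bytes match. */
static bool qstr_eq_sketch(struct qstr_sketch l, struct qstr_sketch r)
{
	return l.len == r.len && !memcmp(l.name, r.name, l.len);
}
```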
Kent Overstreet
cf904c8d96 bcachefs: bch_err_(fn|msg) check if should print
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:41 -05:00
Kent Overstreet
e06af20719 bcachefs: fix userspace build errors
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:41 -05:00
Kent Overstreet
73ffa53056 bcachefs: Drop journal entry compaction
Previously, we dropped empty journal entries and coalesced entries where
possible, but it's not worth the overhead; we very rarely leave unused
journal entries after getting a journal reservation.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:41 -05:00
Kent Overstreet
679972348d bcachefs: kill btree_trans->wb_updates
The btree write buffer path now creates a journal entry directly.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:41 -05:00
Kent Overstreet
002c76dcf6 bcachefs: check_root() can now be run online
check_root() is simple enough to run as a single transaction, so it is
trivial to run online.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:41 -05:00
Kent Overstreet
38ced43bb0 bcachefs: Inline btree write buffer sort
The sort in the btree write buffer flush path is a very hot path, and
it's particularly performance sensitive since it's single threaded and
can block every other thread on a multithreaded write workload.

It's well worth doing a sort with inlined cmp and swap functions.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:41 -05:00
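The idea behind 38ced43bb0 is that a generic sort takes cmp and swap callbacks as function pointers, so every comparison is an indirect call. A sort written directly against `static inline` helpers lets the compiler inline them. The sketch uses heapsort, the algorithm behind the kernel's generic `sort()`; the commit doesn't say which algorithm the inlined version uses. The `wb_key` struct and its ordering are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical write-buffer entry, ordered by (btree, pos). */
struct wb_key {
	uint32_t	btree;
	uint64_t	pos;
};

static inline int wb_key_cmp(const struct wb_key *l, const struct wb_key *r)
{
	if (l->btree != r->btree)
		return l->btree < r->btree ? -1 : 1;
	if (l->pos != r->pos)
		return l->pos < r->pos ? -1 : 1;
	return 0;
}

static inline void wb_key_swap(struct wb_key *l, struct wb_key *r)
{
	struct wb_key tmp = *l;

	*l = *r;
	*r = tmp;
}

/* Restore the max-heap property for the subtree rooted at i (heap size n). */
static inline void wb_sift_down(struct wb_key *a, size_t i, size_t n)
{
	for (;;) {
		size_t largest = i, l = 2 * i + 1, r = l + 1;

		if (l < n && wb_key_cmp(&a[l], &a[largest]) > 0)
			largest = l;
		if (r < n && wb_key_cmp(&a[r], &a[largest]) > 0)
			largest = r;
		if (largest == i)
			return;
		wb_key_swap(&a[i], &a[largest]);
		i = largest;
	}
}

/* Heapsort calling cmp/swap directly, so both can be inlined. */
static void wb_sort(struct wb_key *a, size_t n)
{
	for (size_t i = n / 2; i-- > 0;)	/* build the heap */
		wb_sift_down(a, i, n);
	for (size_t end = n; end-- > 1;) {	/* move max to the end */
		wb_key_swap(&a[0], &a[end]);
		wb_sift_down(a, 0, end);
	}
}
```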
Kent Overstreet
09caeabe1a bcachefs: btree write buffer now slurps keys from journal
Previously, the transaction commit path would have to add keys to the
btree write buffer as a separate operation, requiring additional global
synchronization.

This patch introduces a new journal entry type, which indicates that the
keys need to be copied into the btree write buffer prior to being
written out. We switch the journal entry type back to
JSET_ENTRY_btree_keys prior to write, so this is not an on disk format
change.

Flushing the btree write buffer may require pulling keys out of journal
entries yet to be written, and quiescing outstanding journal
reservations; we previously added journal->buf_lock for synchronization
with the journal write path.

We also can't put strict bounds on the number of keys in the journal
destined for the write buffer, which means we might overflow the size of
the preallocated buffer and have to reallocate - this introduces a
potentially fatal memory allocation failure. This is something we'll
have to watch for; if it becomes an issue in practice, we can do
additional mitigation.

The transaction commit path no longer has to explicitly check if the
write buffer is full and wait on flushing; this is another performance
optimization. Instead, when the btree write buffer is close to full we
change the journal watermark, so that only reservations for journal
reclaim are allowed.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:41 -05:00
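The entry-type trick in 09caeabe1a can be sketched as follows. Before a journal buffer is written, keys in write-buffer entries are copied into the btree write buffer and the entry is retagged as ordinary btree keys, so nothing new reaches disk. Everything here is hypothetical: the enum values only echo the commit's naming, an `int` stands in for a bkey, and the write buffer is fixed-size, whereas the commit notes the real one may need to be reallocated.

```c
#include <stddef.h>

/* Hypothetical entry types; only the naming echoes the commit message. */
enum jset_entry_type {
	JSET_ENTRY_btree_keys,
	JSET_ENTRY_write_buffer_keys,
};

struct jentry {
	enum jset_entry_type	type;
	int			key;	/* stand-in for the entry's keys */
};

/*
 * Copy write-buffer keys out of the journal buffer, then retag each such
 * entry as JSET_ENTRY_btree_keys so the on-disk format is unchanged.
 * Returns the number of keys copied, or -1 if the buffer is full.
 */
static int journal_prep_write(struct jentry *e, size_t nr,
			      int *wb, size_t *wb_nr, size_t wb_size)
{
	int copied = 0;

	for (size_t i = 0; i < nr; i++) {
		if (e[i].type != JSET_ENTRY_write_buffer_keys)
			continue;
		if (*wb_nr == wb_size)
			return -1;	/* real code would have to grow the buffer */
		wb[(*wb_nr)++] = e[i].key;
		e[i].type = JSET_ENTRY_btree_keys;
		copied++;
	}
	return copied;
}
```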
Kent Overstreet
b05c0e9370 bcachefs: journal->buf_lock
Add a new lock for synchronizing between journal IO path and btree write
buffer flush.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:41 -05:00
Kent Overstreet
0ba9375a11 bcachefs: Unwritten journal buffers are always dirty
Ensure that journal bufs that haven't been written can't be reclaimed
from the journal pin fifo, and can thus have new pins taken.

Prep work for changing the btree write buffer to pull keys from the
journal directly.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:41 -05:00
Kent Overstreet
f33600057f bcachefs: bch2_trans_node_add no longer uses trans_for_each_path()
In the future we'll be making trans->paths resizable and potentially
having _many_ more paths (for fsck); we need to start fixing algorithms
that walk each path in a transaction where possible.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:41 -05:00
Kent Overstreet
24de63dacb bcachefs: Improve trans->extra_journal_entries
Instead of using a darray, we now allocate journal entries for the
transaction commit path with our normal bump allocator - with an inlined
fastpath, and using btree_transaction_stats to remember how much to
initially allocate so as to avoid transaction restarts.

This is prep work for converting write buffer updates to use this
mechanism.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:41 -05:00
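The allocation scheme in 24de63dacb pairs an inlined bump-allocator fastpath with per-transaction stats that remember peak usage. The next run of the same transaction can then allocate enough up front and avoid a restart. The sketch below assumes that a miss simply reports failure (standing in for a transaction restart) and records the size that was needed. `struct bump` and `struct trans_stats` are hypothetical names, not the bcachefs types.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-transaction-type stats: remembers peak memory use. */
struct trans_stats {
	size_t	max_mem;
};

struct bump {
	char	*buf;
	size_t	used, size;
};

static int bump_init(struct bump *b, const struct trans_stats *s)
{
	/* Start with the most this transaction type has needed before. */
	b->size = s->max_mem ? s->max_mem : 64;
	b->used = 0;
	b->buf = malloc(b->size);
	return b->buf ? 0 : -1;
}

/*
 * Inlined fastpath: bump the offset if the request fits.  On a miss,
 * record the size that would have been needed and return NULL; in this
 * sketch the caller restarts, and the next bump_init() allocates enough.
 */
static inline void *bump_alloc(struct bump *b, struct trans_stats *s, size_t n)
{
	n = (n + 7) & ~(size_t) 7;	/* keep returned pointers 8-byte aligned */

	if (b->used + n <= b->size) {
		void *p = b->buf + b->used;

		b->used += n;
		return p;
	}
	if (b->used + n > s->max_mem)
		s->max_mem = b->used + n;
	return NULL;
}
```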
Kent Overstreet
e4e49375a8 bcachefs: kill bch2_btree_key_cache_flush()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:41 -05:00
Kent Overstreet
a83b6c895c bcachefs: kill btree_path->(alloc_seq|downgrade_seq)
These were for extra info in tracepoints for debugging a specialized
issue - we do not want to bloat btree_path for this, at least in release
builds.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:41 -05:00
Kent Overstreet
249bf593e8 bcachefs: Fix snapshot.c assertion for online fsck
c->curr_recovery_pass can go backwards; this adds a non-rewinding
version, c->recovery_pass_done.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:41 -05:00
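The distinction drawn in 249bf593e8 amounts to keeping a high-water mark next to a cursor that can rewind. An assertion that a pass has already run should consult the mark, not the cursor. The struct and functions below are a hypothetical sketch of that idea; only the two field names come from the commit message.

```c
/* Hypothetical subset of filesystem state. */
struct fs_state {
	unsigned	curr_recovery_pass;	/* may move backwards (rewind) */
	unsigned	recovery_pass_done;	/* high-water mark, never rewinds */
};

static void recovery_pass_complete(struct fs_state *c, unsigned pass)
{
	c->curr_recovery_pass = pass + 1;
	if (pass > c->recovery_pass_done)
		c->recovery_pass_done = pass;
}

static void recovery_rewind(struct fs_state *c, unsigned pass)
{
	c->curr_recovery_pass = pass;	/* recovery_pass_done is untouched */
}
```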
Randy Dunlap
b56cee70e7 bcachefs: six lock: fix typos
Fix a few typos in the six.h header file.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Brian Foster <bfoster@redhat.com>
Cc: linux-bcachefs@vger.kernel.org
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:40 -05:00
Kent Overstreet
f8fd5871be bcachefs: reserve path idx 0 for sentinel
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:40 -05:00
Kent Overstreet
5028b9078c bcachefs: Rename for_each_btree_key2() -> for_each_btree_key()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:40 -05:00
Kent Overstreet
27b2df982f bcachefs: Kill for_each_btree_key()
for_each_btree_key() handles transaction restarts, like
for_each_btree_key2(), but only calls bch2_trans_begin() after a
transaction restart - for_each_btree_key2() wraps every loop iteration
in a transaction.

The for_each_btree_key() behaviour is problematic when it leads to
holding the SRCU lock that prevents key cache reclaim for an unbounded
amount of time - there's no real need to keep it around.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:40 -05:00
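The behavioural difference described in 27b2df982f comes down to where the "begin" call sits in the loop. The old macro begins a transaction only after a restart, so a restart-free walk never resets and can hold the SRCU lock for its whole length. The newer macro begins on every iteration. The mock below counts begin calls to make this visible. `trans_mock`, `trans_begin()` and `process_key()` are hypothetical stand-ins, not the bcachefs API.

```c
#include <stdbool.h>

/* Hypothetical transaction mock; each begin would drop/retake SRCU. */
struct trans_mock {
	int	begins;
	int	restarts_left;	/* number of restarts to inject */
};

static void trans_begin(struct trans_mock *t)
{
	t->begins++;
}

/* Returns true if this step hit a (simulated) transaction restart. */
static bool process_key(struct trans_mock *t, int key)
{
	(void) key;
	if (t->restarts_left) {
		t->restarts_left--;
		return true;
	}
	return false;
}

/* Old style: begin only after a restart. */
static void walk_begin_on_restart(struct trans_mock *t, int nr_keys)
{
	for (int k = 0; k < nr_keys; k++)
		while (process_key(t, k))
			trans_begin(t);
}

/* New style: every iteration runs inside its own begin. */
static void walk_begin_each_iter(struct trans_mock *t, int nr_keys)
{
	for (int k = 0; k < nr_keys; k++) {
		do {
			trans_begin(t);
		} while (process_key(t, k));
	}
}
```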
Kent Overstreet
8c066edeb4 bcachefs: continue now works in for_each_btree_key2()
continue now works as it does in any other loop.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:40 -05:00
Kent Overstreet
be1fa63de8 bcachefs: Fix bch2_read_btree()
In the debugfs code, we had an incorrect use of drop_locks_do(); on
transaction restart we don't want to restart the current loop iteration,
since we've already emitted the current key to the buffer for userspace.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:40 -05:00
Kent Overstreet
a0acc24fed bcachefs: Fix open coded set_btree_iter_dontneed()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:40 -05:00
Kent Overstreet
267b801fda bcachefs: BCH_IOCTL_FSCK_ONLINE
This adds a new ioctl for running fsck on a mounted, in use filesystem.

This reuses the fsck_thread code from the previous patch for running
fsck on an offline, unmounted filesystem, so that log messages for the
fsck thread are redirected to userspace.

Only one running fsck instance is allowed at a time; a new semaphore
(since the lock will be taken by one thread and released by another) is
added for this.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:40 -05:00
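The locking argument in 267b801fda rests on ownership. A mutex must be released by the thread that acquired it, but here the ioctl path takes the lock and the fsck thread releases it. A semaphore initialized to 1 has no owner, so it fits. The sketch uses POSIX semaphores and threads as a userspace analogue; the kernel code uses its own primitives, and every name below is hypothetical. Build it with `-pthread`.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>

/* One slot: at most one online fsck at a time. */
static sem_t fsck_slot;

static void fsck_slot_init(void)
{
	sem_init(&fsck_slot, 0, 1);
}

/* Hypothetical fsck thread: releases the slot the ioctl path acquired. */
static void *fsck_thread(void *arg)
{
	(void) arg;
	/* ... run fsck ... */
	sem_post(&fsck_slot);	/* released by a different thread */
	return NULL;
}

/* ioctl side: fail immediately if an fsck is already running. */
static bool start_online_fsck(pthread_t *thr)
{
	if (sem_trywait(&fsck_slot))
		return false;
	if (pthread_create(thr, NULL, fsck_thread, NULL)) {
		sem_post(&fsck_slot);
		return false;
	}
	return true;
}
```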
Kent Overstreet
8408fa570e bcachefs: BCH_IOCTL_FSCK_OFFLINE
This adds a new ioctl for running fsck on a list of devices.

Normally, if we wish to use the kernel's implementation of fsck we'd run
it at mount time with -o fsck. This ioctl lets us run fsck without
mounting, so that userspace bcachefs-tools can transparently switch to
the kernel's implementation of fsck when appropriate - primarily if the
kernel version of bcachefs better matches the filesystem on disk.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:40 -05:00
Kent Overstreet
7f391b2f8e bcachefs: bch2_run_online_recovery_passes()
Add a new helper for running online recovery passes - i.e. online fsck.
This is a subset of our normal recovery passes, and does not - for now -
use or follow c->curr_recovery_pass.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:40 -05:00
Kent Overstreet
0953450af7 bcachefs: Mark recovery passes that are safe to run online
Online fsck is coming, and many of our recovery/fsck passes are already
safe to run while the filesystem is in use - mark which ones.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:40 -05:00
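The two recovery-pass commits above (0953450af7 and 7f391b2f8e) fit together: passes carry a flag marking them safe to run on a mounted filesystem, and the online helper runs only those. The sketch below assumes a hypothetical `PASS_ONLINE` flag and pass table. Only `check_root` is named in this log; the other entries are placeholders.

```c
#include <stddef.h>

#define PASS_ONLINE	(1U << 0)	/* hypothetical: safe while mounted */

struct recovery_pass {
	const char	*name;
	unsigned	flags;
	int		(*fn)(void);
};

static int nr_run;

static int pass_ok(void)
{
	nr_run++;
	return 0;
}

/* Hypothetical pass table; placeholder names except check_root. */
static const struct recovery_pass passes[] = {
	{ "example_offline_pass",	0,		pass_ok },
	{ "check_root",			PASS_ONLINE,	pass_ok },
	{ "example_online_pass",	PASS_ONLINE,	pass_ok },
};

/* Run only the passes marked safe for a mounted filesystem. */
static int run_online_recovery_passes(void)
{
	for (size_t i = 0; i < sizeof(passes) / sizeof(passes[0]); i++) {
		if (!(passes[i].flags & PASS_ONLINE))
			continue;

		int ret = passes[i].fn();

		if (ret)
			return ret;
	}
	return 0;
}
```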