Commit graph

32 commits

Author SHA1 Message Date
Kent Overstreet
231db03c57 bcachefs: Journal pin refactoring
This deletes some duplicated code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:48 -04:00
Kent Overstreet
5731cf0156 bcachefs: Fix journal reclaim spinning in recovery
We can't run journal reclaim until we've finished replaying updates to
interior btree nodes - the check for this was in the wrong place though,
leading to journal reclaim spinning before it was allowed to proceed.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:48 -04:00
Kent Overstreet
b7a9bbfc1b bcachefs: Move journal reclaim to a kthread
This is to make tracing easier.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:48 -04:00
Kent Overstreet
9d4582ffdb bcachefs: Journal reclaim requires memalloc_noreclaim_save()
Memory reclaim requires journal reclaim to make forward progress - it's
what cleans our caches - thus, while we're in journal reclaim or holding
the journal reclaim lock we can't recurse into  memory reclaim.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:48 -04:00
Kent Overstreet
8a92e54559 bcachefs: Ensure journal reclaim runs when btree key cache is too dirty
Ensuring the key cache isn't too dirty is critical for ensuring that the
shrinker can reclaim memory.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:48 -04:00
Kent Overstreet
ed0e24c099 bcachefs: Be more precise with journal error reporting
We were incorrectly detecting a journal deadlock - the journal filling
up - when only the journal pin fifo had filled up; if the journal pin
fifo is full that just means we need to wait on reclaim.

This plumbs through better error reporting so we can better discriminate
in the journal_res_get path what's going on.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:47 -04:00
Kent Overstreet
f526d26d71 bcachefs: Fix btree key cache shutdown
On emergency shutdown, we might still have dirty keys in the btree key
cache that need to be cleaned up properly.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:46 -04:00
Kent Overstreet
6a747c4683 bcachefs: Add accounting for dirty btree nodes/keys
This lets us improve journal reclaim, so that it now tries to make sure
no more than 3/4s of the btree node cache and btree key cache are dirty
- ensuring the shrinkers can free memory.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:46 -04:00
Kent Overstreet
2f33ece9b4 bcachefs: Minor journal reclaim improvement
With the btree key cache code, journal reclaim now has a lot more work
to do. It could be the case that after journal reclaim has finished one
iteration there's already more work to do, so put it in a loop to check
for that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:45 -04:00
Kent Overstreet
89fd25be70 bcachefs: Use x-macros for data types
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:42 -04:00
Kent Overstreet
5d20ba48f0 bcachefs: Use cached iterators for alloc btree
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:41 -04:00
Kent Overstreet
2ca88e5ad9 bcachefs: Btree key cache
This introduces a new kind of btree iterator, cached iterators, which
point to keys cached in a hash table. The cache also acts as a write
cache - in the update path, we journal the update but defer updating the
btree until the cached entry is flushed by journal reclaim.

Cache coherency is for now up to the users to handle, which isn't ideal
but should be good enough for now.

These new iterators will be used for updating inodes and alloc info (the
alloc and stripes btrees).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:41 -04:00
Kent Overstreet
a27443bc76 bcachefs: Kill old allocator startup code
It's not needed anymore since we can now write to buckets before
updating the alloc btree.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:40 -04:00
Kent Overstreet
039fc4c522 bcachefs: Fixes for going RO
Now that interior btree updates are fully transactional, we don't need
to write out alloc info in a loop. However, interior btree updates do
put more things in the journal, so we still need a loop in the RO
sequence.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:40 -04:00
Kent Overstreet
00b8ccf707 bcachefs: Interior btree updates are now fully transactional
We now update the alloc info (bucket sector counts) atomically with
journalling the update to the interior btree nodes, and we also set new
btree roots atomically with the journalled part of the btree update.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:40 -04:00
Kent Overstreet
94035eed52 bcachefs: Fix a locking bug in bch2_journal_pin_copy()
There was a race where the src pin would be flushed - releasing the last
pin on that sequence number - before adding the new journal pin. Oops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:38 -04:00
Kent Overstreet
3f58a19763 bcachefs: Journal pin cleanups
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:36 -04:00
Kent Overstreet
b5d056358d bcachefs: minor journal reclaim fixes
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:17 -04:00
Kent Overstreet
68ef94a63c bcachefs: Add a pre-reserve mechanism for the journal
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:17 -04:00
Kent Overstreet
9ace606e93 bcachefs: Don't block on reclaim_lock from journal_res_get
When we're doing btree updates from journal flush, this becomes a
locking inversion

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:17 -04:00
Kent Overstreet
03d5eaed86 bcachefs: bch2_journal_space_available improvements
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:17 -04:00
Kent Overstreet
2384db8f32 bcachefs: Separate discards from rest of journal reclaim
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:17 -04:00
Kent Overstreet
0ce2dbbe99 bcachefs: ja->discard_idx, ja->dirty_idx
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:17 -04:00
Kent Overstreet
dc9aa17841 bcachefs: Drop a faulty assertion
the assertion was meant to check that bch2_journal_reclaim_fast() was
always being called, but since the atomic dec can happen outside of
j->lock the assertion itself can race

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:16 -04:00
Kent Overstreet
e5a66496a0 bcachefs: Journal reclaim refactoring
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:16 -04:00
Kent Overstreet
7ef2a73a58 bcachefs: Fix check for if extent update is allocating
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:14 -04:00
Kent Overstreet
0519b72dd2 bcachefs: Add a workqueue for journal reclaim
journal reclaim writes btree nodes, which can end up waiting for in
flight btree writes to complete, and btree write completions run out of
workqueues - so we can't run out of the same workqueue or we risk
deadlock

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:14 -04:00
Kent Overstreet
000de45996 bcachefs: fixes for getting stuck flushing journal pins
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:14 -04:00
Kent Overstreet
3636ed489a bcachefs: Deferred btree updates
Will be used in the future for inode updates, which will be very helpful
for multithreaded workloads that have to update the inode with every
extent update (appends, or updates that change i_sectors)

Also will be used eventually for fully persistent alloc info

However - we still need a mechanism for reserving space in the journal
prior to getting a journal reservation, so it's not technically safe to
make use of this just yet, we could deadlock with the journal full
(although not likely to be an issue in practice)

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:12 -04:00
Kent Overstreet
a9ec345401 bcachefs: Journal refactoring
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:12 -04:00
Kent Overstreet
4077991c85 bcachefs: Fix a use after free in the journal code
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:07 -04:00
Kent Overstreet
1c6fdbd8f2 bcachefs: Initial commit
Initially forked from drivers/md/bcache, bcachefs is a new copy-on-write
filesystem with every feature you could possibly want.

Website: https://bcachefs.org

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:07 -04:00