Commit graph

119 commits

Author SHA1 Message Date
Kent Overstreet
87ced107f3 bcachefs: Convert EAGAIN errors to private error codes
More error code cleanup, for better error messages and debugability.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:49 -04:00
Kent Overstreet
858536c7ce bcachefs: Convert EROFS errors to private error codes
More error code improvements - this gets us more useful error messages.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:49 -04:00
Kent Overstreet
5bbe3f2d0e bcachefs: Log more messages in the journal
This patch

 - Adds a mechanism for queuing up journal entries prior to the journal
   being started, which will be used for early journal log messages

 - Adds bch2_fs_log_msg() and improves bch2_trans_log_msg(), which now
   take format strings. bch2_fs_log_msg() can be used before or after
   the journal has been started, and will use the appropriate mechanism.

 - Deletes the now obsolete bch2_journal_log_msg()

 - And adds more log messages to the recovery path - messages for
   journal/filesystem started, journal entries being blacklisted, and
   journal replay starting/finishing.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:48 -04:00
Kent Overstreet
3e3e02e6bc bcachefs: Assorted checkpatch fixes
checkpatch.pl gives lots of warnings that we don't want - suggested
ignore list:

 ASSIGN_IN_IF
 UNSPECIFIED_INT	- bcachefs coding style prefers single token type names
 NEW_TYPEDEFS		- typedefs are occasionally good
 FUNCTION_ARGUMENTS	- we prefer to look at functions in .c files
			  (hopefully with docbook documentation), not .h
			  file prototypes
 MULTISTATEMENT_MACRO_USE_DO_WHILE
			- we have _many_ x-macros and other macros where
			  we can't do this

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:44 -04:00
Kent Overstreet
098ef98d5b bcachefs: Add private error codes for ENOSPC
Continuing the saga of introducing private dedicated error codes for
each error path, this patch converts ENOSPC to error codes that are
subtypes of ENOSPC. We've recently had a test failure where we got
-ENOSPC where we shouldn't have, and didn't have enough information to
tell where it came from, so this patch will solve that problem.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:40 -04:00
Kent Overstreet
674cfc2624 bcachefs: Add persistent counters for all tracepoints
Also, do some reorganizing/renaming, convert atomic counters in bch_fs
to persistent counters, and add a few missing counters.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:39 -04:00
Kent Overstreet
dadecd02c4 bcachefs: bch2_trans_run()
This adds a new helper, bch2_trans_run(), that runs a function with a
btree_transaction context but without handling transaction restarts.
We're adding checks for nested transaction restart handling: when an
inner transaction handles a transaction restart it will still have to
return it to the outer transaction, or else assertions will be popped in
the outer transaction.

But some places don't need restart handling at the outer scope, so this
helper does what they need.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:36 -04:00
Kent Overstreet
401ec4db63 bcachefs: Printbuf rework
This converts bcachefs to the modern printbuf interface/implementation,
synced with the version to be submitted upstream.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:33 -04:00
Kent Overstreet
ce6201c456 bcachefs: Use a genradix for reading journal entries
Previously, the journal read path used a linked list for storing the
journal entries we read from disk. But there's been a bug that's been
causing journal_flush_delay to incorrectly be set to 0, leading to far
more journal entries than is normal being written out, which then means
filesystems are no longer able to start due to the O(n^2) behaviour of
inserting into/searching that linked list.

Fix this by switching to a radix tree.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:30 -04:00
Kent Overstreet
f25d8215f4 bcachefs: Kill allocator threads & freelists
Now that we have new persistent data structures for the allocator, this
patch converts the allocator to use them.

Now, foreground bucket allocation uses the freespace btree to find
buckets to allocate, instead of popping buckets off the freelist.

The background allocator threads are no longer needed and are deleted,
as well as the allocator freelists. Now we only need background tasks
for invalidating buckets containing cached data (when we are low on
empty buckets), and for issuing discards.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:29 -04:00
Kent Overstreet
25be2e5d4a bcachefs: bch_sb_field_journal_v2
Add a new superblock field which represents journal buckets as ranges:
also move code for the superblock journal fields to journal_sb.c.

This also reworks the code for resizing the journal to write the new
superblock before using the new journal buckets, and thus be a bit
safer.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:29 -04:00
Kent Overstreet
31f63fd124 bcachefs: Introduce a separate journal watermark for copygc
Since journal reclaim -> btree key cache flushing may require the
allocation of new btree nodes, it has an implicit dependency on copygc
in order to make forward progress - so we should avoid blocking copygc
unless the journal is really close to full.

This introduces watermarks to replace our single MAY_GET_UNRESERVED bit
in the journal, and adds a watermark for copygc and plumbs it through.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:29 -04:00
Kent Overstreet
3e1547116f bcachefs: x-macroize alloc_reserve enum
This makes an array of strings available, like our other enums.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:29 -04:00
Kent Overstreet
f6c92ebbb8 bcachefs: Allocate journal buckets sequentially
This tweaks __bch2_set_nr_journal_buckets() so that we aren't reversing
their order in the jorunal anymore - nice for rotating disks.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:27 -04:00
Kent Overstreet
d5d3be7dc5 bcachefs: bch2_journal_log_msg()
This adds bch2_journal_log_msg(), which just logs a message to the
journal, and uses it to mark startup and when journal replay finishes.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:27 -04:00
Kent Overstreet
718ce1eb8a bcachefs: Skip periodic wakeup of journal reclaim when journal empty
Less system noise.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:27 -04:00
Kent Overstreet
b60c380bca bcachefs: Don't arm journal->write_work when journal entry !open
This fixes a shutdown race where we were rearming journal->write_work
after the journal has already shut down.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:27 -04:00
Kent Overstreet
e0c014e7e4 bcachefs: Finish writing journal after journal error
After emergency shutdown, all journal entries will be written as noflush
entries, meaning they will never be used - but they'll still exist for
debugging tools to examine.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:26 -04:00
Kent Overstreet
24a3d53b28 bcachefs: __journal_entry_close() never fails
Previous patch just moved responsibility for incrementing the journal
sequence number and initializing the new journal entry from
__journal_entry_close() to journal_entry_open(); this patch makes the
analagous change for journal reservation state, incrementing the index
into array of journal_bufs at open time.

This means that __journal_entry_close() never fails to close an open
journal entry, which is important for the next patch that will change
our emergency shutdown behaviour.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:26 -04:00
Kent Overstreet
30ef633a0b bcachefs: Refactor journal code to not use unwritten_idx
It makes the code more readable if we work off of sequence numbers,
instead of direct indexes into the array of journal buffers.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:26 -04:00
Kent Overstreet
f0a3a2ccab bcachefs: Journal seq now incremented at entry open, not close
This patch changes journal_entry_open() to initialize the new journal
entry, not __journal_entry_close().

This also means that journal_cur_seq() refers to the sequence number of
the last journal entry when we don't have an open journal entry, not the
next one.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:26 -04:00
Kent Overstreet
dfc0f7ea00 bcachefs: bch2_journal_halt() now takes journal lock
This change is prep work for moving some work from
__journal_entry_close() to journal_entry_open(): without this change,
journal_entry_open() doesn't know if it's going to be able to open a new
journal entry until the cmpxchg loop, meaning it can't create the new
journal pin entry and update other global state because those have to be
done prior to the cmpxchg opening the new journal entry.

Fortunately, we don't call bch2_journal_halt() from interrupt context.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:26 -04:00
Kent Overstreet
fbec3b8800 bcachefs: Kill JOURNAL_NEED_WRITE
This replaces the journal flag JOURNAL_NEED_WRITE with per-journal buf
state - more explicit, and solving a race in the old code that would
lead to entries being opened and written unnecessarily.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:26 -04:00
Kent Overstreet
506bac7e59 bcachefs: Delete some dead journal code
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:26 -04:00
Kent Overstreet
75ef2c59bc bcachefs: Start moving debug info from sysfs to debugfs
In sysfs, files can only output at most PAGE_SIZE. This is a problem for
debug info that needs to list an arbitrary number of times, and because
of this limit some of our debug info has been terser and harder to read
than we'd like.

This patch moves info about journal pins and cached btree nodes to
debugfs, and greatly expands and improves the output we return.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:26 -04:00
Kent Overstreet
fa8e94faee bcachefs: Heap allocate printbufs
This patch changes printbufs dynamically allocate and reallocate a
buffer as needed. Stack usage has become a bit of a problem, and a major
cause of that has been static size string buffers on the stack.

The most involved part of this refactoring is that printbufs must now be
exited with printbuf_exit().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:25 -04:00
Kent Overstreet
eac91bf27f bcachefs: Fix bch2_journal_pins_to_text()
When key cache pins were put onto their own list, we neglected to update
bch2_journal_pins_to_text() to print them.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:25 -04:00
Kent Overstreet
e201f70b11 bcachefs: Fix for journal getting stuck
The journal can get stuck if we need to get a journal reservation for
something we have a pre-reservation for, but aren't able to reclaim
space, or if the pin fifo is full - it's impractical to resize the pin
fifo at runtime.

Previously, we reserved 8 entries in the pin fifo for pre-reservations,
but that seems small - we're seeing the journal occasionally get stuck.
Let's reserve a quarter of it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:25 -04:00
Kent Overstreet
5b2e599f50 bcachefs: bch2_journal_noflush_seq()
Add bch2_journal_noflush_seq(), for telling the journal that entries
before a given sequence number should not be flushes - to be used by an
upcoming allocator optimization.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:20 -04:00
Kent Overstreet
abe19d458e bcachefs: Refactor open_bucket code
Prep work for adding a hash table of open buckets - instead of embedding
a bch_extent_ptr, we need to refer to the bucket directly so that we're
not calling sector_to_bucket() in the hash table lookup code, which has
an expensive divide.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:20 -04:00
Kent Overstreet
8511632d44 bcachefs: Journal initialization fixes
This fixes a rare bug when mounting & unmounting RO - flushing a clean
filesystem that never went RO should be a no op.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:19 -04:00
Kent Overstreet
4141fde0be bcachefs: Fix bch2_journal_meta()
This patch ensures that the journal entry written gets written as flush
entry, which is important for the shutdown path - the last entry written
needs to be a flush entry.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:19 -04:00
Kent Overstreet
77170d0dd7 bcachefs: bch2_bucket_alloc_new_fs() no longer depends on bucket marks
Now that bch2_bucket_alloc_new_fs() isn't looking at bucket marks to
decide what buckets are eligible to allocate, we can clean up the
filesystem initialization and device add paths. Previously, we had to
use ancient code to mark superblock/journal buckets in the in memory
bucket marks as we allocated them, and then zero that out and re-do that
marking using the newer transational bucket mark paths. Now, we can
simply delete the in-memory bucket marking.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:19 -04:00
Kent Overstreet
991ba02112 bcachefs: Add more time_stats
This adds more latency/event measurements and breaks some apart into
more events. Journal writes are broken apart into flush writes and
noflush writes, btree compactions are broken out from btree splits,
btree mergers are added, as well as btree_interior_updates - foreground
and total.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:18 -04:00
Kent Overstreet
1d81313f22 bcachefs: Make __bch2_journal_debug_to_text() more readable
Switch to one line of output per pr_buf() call - longer lines but quite
a bit more readable.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:18 -04:00
Kent Overstreet
2430e72f42 bcachefs: Convert journal sysfs params to regular options
This converts journal_write_delay, journal_flush_disabled, and
journal_reclaim_delay to normal filesystems options, and also adds them
to the superblock.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:18 -04:00
Kent Overstreet
9be1efe9c5 bcachefs: Fix error reporting from bch2_journal_flush_seq
- bch2_journal_halt() was unconditionally overwriting j->err_seq, the
  sequence number that we failed to write
- journal_write_done was updating seq_ondisk and flushed_seq_ondisk even
  for writes that errored, which broke the way bch2_journal_flush_seq_async()
  locklessly checked for completions.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:17 -04:00
Kent Overstreet
531b69e9af bcachefs: Convert journal BUG_ON() to a warning
It's definitely indicative of a bug if we request to flush a journal
sequence number that hasn't happened yet, but it's more useful if we
warn and print out the relevant sequence numbers instead of just dying.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:17 -04:00
Kent Overstreet
0e030f5e20 bcachefs: Kill journal buf bloom filter
This was used for recording which inodes have been modified by in flight
journal writes, but was broken and has been superceded.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:16 -04:00
Kent Overstreet
731bdd2eff bcachefs: Add a workqueue for btree io completions
Also, clean up workqueue usage - we shouldn't be using system
workqueues, pretty much everything we do needs to be on our own
WQ_MEM_RECLAIM workqueues.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:04 -04:00
Kent Overstreet
6ebe32b94c bcachefs: Fix locking in __bch2_set_nr_journal_buckets()
We weren't holding mark_lock correctly - it's needed for the new_fs
path.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:03 -04:00
Kent Overstreet
1784d43a88 bcachefs: Fix usage of last_seq + encryption
jset->last_seq is in the region that's encrypted - on journal write
completion, we were using it and getting garbage. This patch shadows it
to fix.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:02 -04:00
Kent Overstreet
2ce867df31 bcachefs: Make sure to initialize j->last_flushed
If the journal reclaim thread makes it to the timeout without ever
initializing j->last_flushed, we could end up sleeping for a very long
time.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:02 -04:00
Kent Overstreet
633632ef1b bcachefs: Simplify bch2_set_nr_journal_buckets()
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:00 -04:00
Kent Overstreet
d62ab355d7 bcachefs: Fix bch2_trans_mark_dev_sb()
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:00 -04:00
Kent Overstreet
241e26369e bcachefs: Don't flush btree writes more aggressively because of btree key cache
We need to flush the btree key cache when it's too dirty, because
otherwise the shrinker won't be able to reclaim memory - this is done by
journal reclaim. But journal reclaim also kicks btree node writes: this
meant that btree node writes were getting kicked much too often just
because we needed to flush btree key cache keys.

This patch splits journal pins into two different lists, and teaches
journal reclaim to not flush btree node writes when it only needs to
flush key cache keys.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:59 -04:00
Kent Overstreet
671cc8a51b bcachefs: Eliminate memory barrier from fast path of journal_preres_put()
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:59 -04:00
Kent Overstreet
2940295c97 bcachefs: Be more careful about JOURNAL_RES_GET_RESERVED
JOURNAL_RES_GET_RESERVED should only be used for updatse that need to be
done to free up space in the journal. In particular, when we're flushing
keys from the key cache, if we're flushing them out of order we
shouldn't be using it, since we're using up our remaining space in the
journal without dropping a pin that will let us make forward progress.

With this patch, BTREE_INSERT_JOURNAL_RECLAIM without
BTREE_INSERT_JOURNAL_RESERVED may return -EAGAIN - we can't wait on
journal reclaim if we're already in journal reclaim.

This means we need to propagate these errors up to journal reclaim,
indicating that flushing a journal pin should be retried in the future.

This is prep work for a patch to change the way journal reclaim works,
to split out flushing key cache keys because the btree key cache is too
dirty from journal reclaim because we need space in the journal.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:59 -04:00
Kent Overstreet
24db24c749 bcachefs: Don't make foreground writes wait behind journal reclaim too long
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:58 -04:00
Kent Overstreet
7c8b166e58 bcachefs: Increase default journal size
The default was 1/256th of the device and capped at 512MB, which is
fairly tiny these days.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:57 -04:00