Commit Graph

22 Commits

Author SHA1 Message Date
Kent Overstreet 5d32c5bb07 bcachefs: Be more conservation about journal pre-reservations
- Try to always keep 1/8th of the journal free, on top of
   pre-reservations
 - Move the check for whether the journal is stuck to
   bch2_journal_space_available, and make it only fire when there aren't
   any journal writes in flight (that might free up space by updating
   last_seq)

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:49 -04:00
Kent Overstreet adbcada43f bcachefs: Don't require flush/fua on every journal write
This patch adds a flag to journal entries which, if set, indicates that
they weren't done as flush/fua writes.

 - non flush/fua journal writes don't update last_seq (i.e. they don't
   free up space in the journal), thus the journal free space
   calculations now check whether nonflush journal writes are currently
   allowed (i.e. are we low on free space, or would doing a flush write
   free up a lot of space in the journal)

 - write_delay_ms, the user configurable option for when open journal
   entries are automatically written, is now interpreted as the max
   delay between flush journal writes (default 1 second).

 - bch2_journal_flush_seq_async is changed to ensure a flush write >=
   the requested sequence number has happened

 - journal read/replay must now ignore, and blacklist, any journal
   entries newer than the most recent flush entry in the journal. Also,
   the way the read_entire_journal option is handled has been improved;
   struct journal_replay now has an entry, 'ignore', for entries that
   were read but should not be used.

 - assorted refactoring and improvements related to journal read in
   journal_io.c and recovery.c

Previously, we'd have to issue a flush/fua write every time we
accumulated a full journal entry - typically the bucket size. Now we
need to issue them much less frequently: when an fsync is requested, or
it's been more than write_delay_ms since the last flush, or when we need
to free up space in the journal. This is a significant performance
improvement on many write heavy workloads.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:49 -04:00
Kent Overstreet b6df4325cd bcachefs: Improve journal free space calculations
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:49 -04:00
Kent Overstreet ebb84d0941 bcachefs: Increase journal pipelining
This patch increases the maximum journal buffers in flight from 2 to 4 -
this will be particularly helpful when in the future we stop requiring
flush+fua for every journal write.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:49 -04:00
Kent Overstreet f299d57350 bcachefs: Refactor filesystem usage accounting
Various filesystem usage counters are kept in percpu counters, with one
set per in flight journal buffer. Right now all the code that deals with
it assumes that there's only two buffers/sets of counters, but the
number of journal bufs is getting increased to 4 in the next patch - so
refactor that code to not assume a constant.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:49 -04:00
Kent Overstreet b7a9bbfc1b bcachefs: Move journal reclaim to a kthread
This is to make tracing easier.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:48 -04:00
Kent Overstreet ed0e24c099 bcachefs: Be more precise with journal error reporting
We were incorrectly detecting a journal deadlock - the journal filling
up - when only the journal pin fifo had filled up; if the journal pin
fifo is full that just means we need to wait on reclaim.

This plumbs through better error reporting so we can better discriminate
in the journal_res_get path what's going on.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:47 -04:00
Kent Overstreet 158eecb88e bcachefs: Assorted journal refactoring
Improved the way we track various state by adding j->err_seq, which
records the first journal sequence number that encountered an error
being written, and j->last_empty_seq, which records the most recent
journal entry that was completely empty.

Also, use the low bits of the journal sequence number to index the
corresponding journal_buf.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:47 -04:00
Kent Overstreet 5d20ba48f0 bcachefs: Use cached iterators for alloc btree
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:41 -04:00
Kent Overstreet 644d180b05 bcachefs: Journal replay refactoring
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:20 -04:00
Kent Overstreet 1dd7f9d98d bcachefs: Rewrite journal_seq_blacklist machinery
Now, we store blacklisted journal sequence numbers in the superblock,
not the journal: this helps to greatly simplify the code, and more
importantly it's now implemented in a way that doesn't require all btree
nodes to be visited before starting the journal - instead, we
unconditionally blacklist the next 4 journal sequence numbers after an
unclean shutdown.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:20 -04:00
Kent Overstreet 68ef94a63c bcachefs: Add a pre-reserve mechanism for the journal
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:17 -04:00
Kent Overstreet 2384db8f32 bcachefs: Separate discards from rest of journal reclaim
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:17 -04:00
Kent Overstreet 0ce2dbbe99 bcachefs: ja->discard_idx, ja->dirty_idx
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:17 -04:00
Kent Overstreet e5a66496a0 bcachefs: Journal reclaim refactoring
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:16 -04:00
Kent Overstreet d16b4a77a5 bcachefs: Assorted journal refactoring
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:16 -04:00
Kent Overstreet 768ac63924 bcachefs: Add a mechanism for blocking the journal
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:16 -04:00
Kent Overstreet eac3ca0f49 bcachefs: New journal_entry_res mechanism
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:15 -04:00
Kent Overstreet a9ec345401 bcachefs: Journal refactoring
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:12 -04:00
Kent Overstreet 9c859dc91b bcachefs: Assorted journal refactoring
Also improve error reporting - only return an error from
bch2_journal_flush_seq() if we had an error writing that entry (i.e. not
if there was an error with a newer entry).

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:08 -04:00
Kent Overstreet 4077991c85 bcachefs: Fix a use after free in the journal code
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:07 -04:00
Kent Overstreet 1c6fdbd8f2 bcachefs: Initial commit
Initially forked from drivers/md/bcache, bcachefs is a new copy-on-write
filesystem with every feature you could possibly want.

Website: https://bcachefs.org

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:07 -04:00