bcachefs: Fix a journal deadlock in replay

Recently, journal pre-reservations were removed. They were for reserving
space ahead of time in the journal for operations that are required for
journal reclaim, e.g. btree key cache flushing and interior node btree
updates.

Instead we have watermarks - only operations for journal reclaim are
allowed when the journal is low on space, and in general we're quite
good about doing operations in the order that will free up space in the
journal quickest when we're low on space. If we're doing a journal
reclaim operation out of order, we usually do it in nonblocking mode if
it's not freeing up space at the end of the journal.

There's an exceptino though - interior btree node update operations have
to be BCH_WATERMARK_reclaim - once they've been started, and they can't
be nonblocking. Generally this is fine because they'll only be a very
small fraction of transaction commits - but there's an exception, which
is during journal replay.

Journal replay does many btree operations, but doesn't need to commit
them to the journal since they're already in the journal. So killing off
of pre-reservation, plus another change to make journal replay more
efficient by initially doing the replay in sorted btree order, made it
possible for the interior update operations replay generates to fill and
deadlock the journal.

Fix this by introducing a new check on journal space at the _start_ of
an interior update operation. This causes us to block if necessary in
exactly the same way as we used to when interior updates took a journal
pre-reservaiton, but without all the expensive accounting
pre-reservations required.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This commit is contained in:
Kent Overstreet 2023-12-01 01:14:50 -05:00
parent ef6fae4a13
commit 87b0d8d3d0

View file

@ -1056,6 +1056,17 @@ bch2_btree_update_start(struct btree_trans *trans, struct btree_path *path,
flags &= ~BCH_WATERMARK_MASK;
flags |= watermark;
if (!(flags & BTREE_INSERT_JOURNAL_RECLAIM) &&
watermark < c->journal.watermark) {
struct journal_res res = { 0 };
ret = drop_locks_do(trans,
bch2_journal_res_get(&c->journal, &res, 1,
watermark|JOURNAL_RES_GET_CHECK));
if (ret)
return ERR_PTR(ret);
}
while (1) {
nr_nodes[!!update_level] += 1 + split;
update_level++;