Commit Graph

74 Commits

Author SHA1 Message Date
Kent Overstreet d8585a79be bcachefs: bch2_dev_have_ref()
bch2_dev_bkey_exists() is going away; bch2_dev_have_ref() documents that
we're looking up a device without checking if it's present because we
have a reference to it already.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:24 -04:00
Kent Overstreet 222eacabc1 bcachefs: kill bch2_dev_bkey_exists() in data_update_init()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:23 -04:00
Kent Overstreet 302c980a81 bcachefs: extent_ptr_durability() -> bch2_dev_rcu()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:23 -04:00
Kent Overstreet 1f2f92ec3f bcachefs: PTR_BUCKET_POS() now takes bch_dev
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:23 -04:00
Kent Overstreet f295298b8c bcachefs: New helpers for device refcounts
This will be used in the next patch for adding some new debug mode
asserts.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:22 -04:00
Kent Overstreet 9a768ab75b bcachefs: bch2_bkey_drop_ptrs() declares loop iter
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:22 -04:00
Kent Overstreet ca563dccb2 bcachefs: bch2_trans_unlock() must always be followed by relock() or begin()
We're about to add new asserts for btree_trans locking consistency, and
part of that requires that aren't using the btree_trans while it's
unlocked.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:19 -04:00
Kent Overstreet 2f724563fc bcachefs: member helper cleanups
Some renaming for better consistency

bch2_member_exists	-> bch2_member_alive
bch2_dev_exists		-> bch2_member_exists
bch2_dev_exsits2	-> bch2_dev_exists
bch_dev_locked		-> bch2_dev_locked
bch_dev_bkey_exists	-> bch2_dev_bkey_exists

new helper - bch2_dev_safe

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:19 -04:00
Kent Overstreet 5dd8c60e1e bcachefs: iter/update/trigger/str_hash flag cleanup
Combine iter/update/trigger/str_hash flags into a single enum, and
x-macroize them for a to_text() function later.

These flags are all for a specific iter/key/update context, so it makes
sense to group them together - iter/update/trigger flags were already
given distinct bits, this cleans up and unifies that handling.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:18 -04:00
Kent Overstreet 5957e0a28b bcachefs: Fix rebalance from durability=0 device
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-05 03:05:30 -04:00
Kent Overstreet c42cd606e4 bcachefs: fix nocow lock deadlock
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-02 01:04:10 -04:00
Kent Overstreet ec9cc18fc2 bcachefs: Add checks for invalid snapshot IDs
Previously, we assumed that keys were consistent with the snapshots
btree - but that's not correct as fsck may not have been run or may not
be complete.

This adds checks and error handling when using the in-memory snapshots
table (that mirrors the snapshots btree).

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31 20:36:11 -04:00
Kent Overstreet d7e77f53e9 bcachefs: opts->compression can now also be applied in the background
The "apply this compression method in the background" paths now use the
compression option if background_compression is not set; this means that
setting or changing the compression option will cause existing data to
be compressed accordingly in the background.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21 13:27:10 -05:00
Kent Overstreet 0beebd9245 bcachefs: bkey_for_each_ptr() now declares loop iter
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:43 -05:00
Kent Overstreet a7dc10ce68 bcachefs: Make sure allocation failure errors are logged
The previous patch fixed a bug in allocation path error handling, and it
would've been noticed sooner had it been logged properly.

Generally speaking, errors that shouldn't happen in normal operation and
are being returned up the stack should be logged: the write path was
already logging IO errors, but non IO errors were missed.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:41 -05:00
Daniel Hill 21e07cc966 bcachefs: remove redundant condition from data_update_index_update
Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:39 -05:00
Kent Overstreet 7464403009 bcachefs: count_event()
Small helper for event counters.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:39 -05:00
Kent Overstreet 25d1e39df0 bcachefs: Add a rebalance, data_update tracepoints
Add a tracepoint for rebalance, printing out
 - the target option
 - the compression option
 - the key being rebalanced

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:38 -05:00
Kent Overstreet cb52d23e77 bcachefs: Rename BTREE_INSERT flags
BTREE_INSERT flags are actually transaction commit flags - rename them
for clarity.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:37 -05:00
Kent Overstreet 7b474c77da bcachefs: Fix promotes
The recent work to fix data moves w.r.t. durability broke promotes,
because the caused us to bail out when the extent minus pointers being
dropped still has enough pointers to satisfy the current number of
replicas.

Disable this check when we're adding cached replicas.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-26 19:31:11 -05:00
Kent Overstreet bedd6fe4d3 bcachefs: Fix nocow locks deadlock
On trylock failure we were waiting for outstanding reads to complete -
but nocow locks need to be held until the whole move is finished.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-11 20:43:11 -05:00
Kent Overstreet 131898b0cb bcachefs: Fix bch2_extent_drop_ptrs() call
Also, make bch2_extent_drop_ptrs() safer, so it works with extents and
non-extents iterators.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-04 16:04:55 -05:00
Kent Overstreet 7d9f8468ff bcachefs: Data update path won't accidentaly grow replicas
Previously, there was a bug where if an extent had greater durability
than required (because we needed to move a durability=1 pointer and
ended up putting it on a durability 2 device), we would submit a write
for replicas=2 - the durability of the pointer being rewritten - instead
of the number of replicas required to bring it back up to the
data_replicas option.

This, plus the allocation path sometimes allocating on a greater
durability device than requested, meant that extents could continue
having more and more replicas added as they were being rewritten.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-25 21:48:42 -05:00
Kent Overstreet 701ff57eb3 bcachefs: Check for nonce offset inconsistency in data_update path
We've rarely been seeing a nonce offset inconsistency that doesn't show
up in tests: this adds some extra verification code to the data update
path that prints out more relevant info when it occurs.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-13 21:45:03 -05:00
Kent Overstreet be9e782df3 bcachefs: Don't downgrade locks on transaction restart
We should only be downgrading locks on success - otherwise, our
transaction restarts won't be getting the correct locks and we'll
livelock.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-01 21:11:08 -04:00
Kent Overstreet fb3f57bb11 bcachefs: rebalance_work
This adds a new btree, rebalance_work, to eliminate scanning required
for finding extents that need work done on them in the background - i.e.
for the background_target and background_compression options.

rebalance_work is a bitset btree, where a KEY_TYPE_set corresponds to an
extent in the extents or reflink btree at the same pos.

A new extent field is added, bch_extent_rebalance, which indicates that
this extent has work that needs to be done in the background - and which
options to use. This allows per-inode options to be propagated to
indirect extents - at least in some circumstances. In this patch,
changing IO options on a file will not propagate the new options to
indirect extents pointed to by that file.

Updating (setting/clearing) the rebalance_work btree is done by the
extent trigger, which looks at the bch_extent_rebalance field.

Scanning is still requrired after changing IO path options - either just
for a given inode, or for the whole filesystem. We indicate that
scanning is required by adding a KEY_TYPE_cookie key to the
rebalance_work btree: the cookie counter is so that we can detect that
scanning is still required when an option has been flipped mid-way
through an existing scan.

Future possible work:
 - Propagate options to indirect extents when being changed
 - Add other IO path options - nr_replicas, ec, to rebalance_work so
   they can be applied in the background when they change
 - Add a counter, for bcachefs fs usage output, showing the pending
   amount of rebalance work: we'll probably want to do this after the
   disk space accounting rewrite (moving it to a new btree)

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-01 21:11:05 -04:00
Kent Overstreet 96a363a7e6 bcachefs: move: move_stats refactoring
data_progress_list is gone - it was redundant with moving_context_list

The upcoming rebalance rewrite is going to have it using two different
move_stats objects with the same moving_context, depending on whether
it's scanning or using the rebalance_work btree - this patch plumbs
stats around a bit differently so that will work.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-31 12:18:38 -04:00
Kent Overstreet d5eade9345 bcachefs: move: convert to bbpos
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-31 12:18:37 -04:00
Kent Overstreet 633169035a bcachefs: moving_context now owns a btree_trans
btree_trans and moving_context are used together, and having the
moving_context owns the transaction object reduces some plumbing.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-31 12:18:37 -04:00
Kent Overstreet 6bd68ec266 bcachefs: Heap allocate btree_trans
We're using more stack than we'd like in a number of functions, and
btree_trans is the biggest object that we stack allocate.

But we have to do a heap allocatation to initialize it anyways, so
there's no real downside to heap allocating the entire thing.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:13 -04:00
Kent Overstreet 96dea3d599 bcachefs: Fix W=12 build errors
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:13 -04:00
Kent Overstreet 1809b8cba7 bcachefs: Break up io.c
More reorganization, this splits up io.c into
 - io_read.c
 - io_misc.c - fallocate, fpunch, truncate
 - io_write.c

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:12 -04:00
Kent Overstreet 93ee2c4b21 bcachefs: Don't open code closure_nr_remaining()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:10 -04:00
Kent Overstreet 986e9842fb bcachefs: Compression levels
This allows including a compression level when specifying a compression
type, e.g.
  compression=zstd:15

Values from 1 through 15 indicate compression levels, 0 or unspecified
indicates the default.

For LZ4, values 3-15 specify that the HC algorithm should be used.

Note that for compatibility, extents themselves only include the
compression type, not the compression level. This means that specifying
the same compression algorithm but different compression levels for the
compression and background_compression options will have no effect.

XXX: perhaps we could add a warning for this

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:07 -04:00
Kent Overstreet f33c58fc46 bcachefs: Kill BTREE_INSERT_USE_RESERVE
Now that we have journal watermarks and alloc watermarks unified,
BTREE_INSERT_USE_RESERVE is redundant and can be deleted.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:05 -04:00
Kent Overstreet e53a961c6b bcachefs: Rename enum alloc_reserve -> bch_watermark
This is prep work for consolidating with JOURNAL_WATERMARK.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:04 -04:00
Kent Overstreet 91ecd41b7f bcachefs: bch2_extent_ptr_desired_durability()
This adds a new helper for getting a pointer's durability irrespective
of the device state, and uses it in the the data update path.

This fixes a bug where we do a data update but request 0 replicas to be
allocated, because the replica being rewritten is on a device marked as
failed.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:04 -04:00
Kent Overstreet ad520141b1 bcachefs: Fix corruption with writeable snapshots
When partially overwriting an extent in an older snapshot, the existing
extent has to be split.

If the existing extent was overwritten in a different (sibling)
snapshot, we have to ensure that the split won't be visible in the
sibling snapshot.

data_update.c already has code for this,
bch2_insert_snapshot_writeouts() - we just need to move it into
btree_update_leaf.c and change bch2_trans_update_extent() to use it as
well.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:03 -04:00
Kent Overstreet c26463ce99 bcachefs: Fix move_extent_fail counter
fail counters need to be events, not numbers of sectors - or the
calculations the tests use for determining if we've had too many
slowpath events don't work.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:02 -04:00
Kent Overstreet bcb79a51cb bcachefs: bch2_bkey_get_iter() helpers
Introduce new helpers for a common pattern:

  bch2_trans_iter_init();
  bch2_btree_iter_peek_slot();

 - bch2_bkey_get_iter_type() returns -ENOENT if it doesn't find a key of
   the correct type
 - bch2_bkey_get_val_typed() copies the val out of the btree to a
   (typically stack allocated) variable; it handles the case where the
   value in the btree is smaller than the current version of the type,
   zeroing out the remainder.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:00 -04:00
Kent Overstreet 5a21764db1 bcachefs: Improve move path tracepoints
Move path tracepoints now include the key being moved. Also, add new
tracepoints for the start of move_extent, and evacuate_bucket.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:00 -04:00
Kent Overstreet bb6c4b92fd bcachefs: Improve trace_move_extent_fail()
This greatly expands the move_extent_fail tracepoint - now it includes
all the information we have available, including exactly why the extent
wasn't updated.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:59 -04:00
Kent Overstreet 25d8f40560 bcachefs: Data update path no longer leaves cached replicas
It turns out that it's currently impossible to invalidate buckets
containing only cached data if they're part of a stripe. The normal
bucket invalidate path can't do it because we have to be able to
incerement the bucket's gen, which isn't correct becasue it's still a
member of the stripe - and the bucket invalidate path makes the bucket
availabel for reuse right away, which also isn't correct for buckets in
stripes.

What would work is invalidating cached data by following backpointers,
except that cached replicas don't currently get backpointers - because
they would be awkward for the existing bucket invalidate path to delete
and they haven't been needed elsewhere.

So for the time being, to prevent running out of space in stripes,
switch the data update path to not leave cached replicas; we may revisit
this in the future.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:58 -04:00
Kent Overstreet b40901b0f7 bcachefs: New erasure coding shutdown path
This implements a new shutdown path for erasure coding, which is needed
for the upcoming BCH_WRITE_WAIT_FOR_EC write path.

The process is:
 - Cancel new stripes being built up
 - Close out/cancel open buckets on write points or the partial list
   that are for stripes
 - Shutdown rebalance/copygc
 - Then wait for in flight new stripes to finish

With BCH_WRITE_WAIT_FOR_EC, move ops will be waiting on stripes to fill
up before they complete; the new ec shutdown path is needed for shutting
down copygc/rebalance without deadlocking.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:57 -04:00
Kent Overstreet 57c723de7d bcachefs: Rework __bch2_data_update_index_update()
This makes some improvements to the logic for adding/removing replicas,
as part of the larger erasure coding improvements. We now directly
consider number of replicas desired for the given inode, and
extent/pointer durability: this ensures that the extent ends up with the
desired number of replicas when we're replacing multiple pointers with
one that has higher durability (e.g. erasure coded).

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:56 -04:00
Kent Overstreet 702ffea204 bcachefs: Extent helper improvements
- __bch2_bkey_drop_ptr() -> bch2_bkey_drop_ptr_noerror(), now available
   outside extents.

 - Split bch2_bkey_has_device() and bch2_bkey_has_device_c(), const and
   non const versions

 - bch2_extent_has_ptr() now returns the pointer it found

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:56 -04:00
Kent Overstreet 2f528663c5 bcachefs: moving_context->stats is allowed to be NULL
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:55 -04:00
Kent Overstreet 11bb67a4a3 bcachefs: bch2_data_update_init() considers ptr durability
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:55 -04:00
Kent Overstreet 039c45feef bcachefs: bch2_data_update_index_update() -> bch2_trans_run()
Convert to use the standard helper

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:54 -04:00
Kent Overstreet 33669e0cc9 bcachefs: Add option for completely disabling nocow
This adds an option for completely disabling nocow mode, including the
locking in the data move path.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:54 -04:00