This adds some horrible hacks, but the atomic ops for closures were
getting to be a pretty expensive part of the write path. We don't want
to rip out closures entirely from the write path, because they're used
for e.g. waiting on the allocator, or waiting on the journal flush, and
that stuff would get really ugly without closures.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
bch2_extent_ptr_decoded_append() is more general than we need here; we
know we're initializing a new extent so e.g. we're going to need the crc
entry.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We can't reuse bios without reinitializing them, and in the retry path
it's safer to just make sure we don't reuse them at all.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The generic IO path now handles inode updates for i_size and i_sectors -
this means we can drop a fair amount of code from fs-io.c.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The next few patches are going to be more moving the logic around
i_size/i_sectors updates to io.c, and better separating the Linux VFS
specific code from core bcachefs code, to better support the fuse port.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The btree_trans struct needs to memoize/cache btree iterators, so that
on transaction restart we don't have to completely redo btree lookups,
and so that we can do them all at once in the correct order when the
transaction had to restart to avoid a deadlock.
This switches the btree iterator lookups to work based on iterator
position, instead of trying to match them up based on the stack trace.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
With reflink, various code now has to handle both KEY_TYPE_extent
or KEY_TYPE_reflink_v - so, convert it to be generic across all keys
with pointers.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
for_each_btree_key() calls bch2_trans_get_iter() - we have to reset the
transaction state before getting the iterator again, in the retry path
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
With reflink, we'll no longer be able to calculate the offset of the
data we want into the extent we're reading from from the extent pos and
the iter pos - we'll have to pass it in separately.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
bio_uncompress_inplace() used to potentially need to extend the bio to
be big enough for the uncompressed data, which has become problematic
with multipage bvecs - but, the move extent path actually already
allocated the bios to be big enough for the uncompressed data.
The promote path needed to be fixed, though.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Switch to always using bio_add_page(), which merges contiguous pages now
that we have multipage bvecs.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
bch_write_op->written used to be a u16, but it's not so the assertion
isn't needed anymore - and 5.1 can send larger bios.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
this lets us get rid of a lot of extra switch statements - in a lot of
places we dispatch on the btree node type, and then the key type, so
this is a nice cleanup across a lot of code.
Also improve the on disk format versioning stuff.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This new helper for the move path avoids creating a new CRC entry when
we already have one that matches the pointer being added.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
bch2_alloc_sectors_start() was a nightmare to work with - it's got some
tricky stuff to do, since it wants to use the buckets the writepoint
already has, unless they're not in the target it wants to write to,
unless it can't allocate from any other devices in which case it will
use those buckets if it has to - et cetera.
This restructures the code to start with a new empty list of open
buckets we're going to use for the new allocation, pulling buckets from
the write point's list as we decide that we really are going to use
them - making the code somewhat more functional and drastically easier
to understand.
Also fixes a bug where we could end up waiting on c->freelist_wait
(because allocating from one device failed) but return success from
bch2_bucket_alloc(), because allocating from a different device
succeeded.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Initially forked from drivers/md/bcache, bcachefs is a new copy-on-write
filesystem with every feature you could possibly want.
Website: https://bcachefs.org
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>