We can assume that usually buffered and O_DIRECT IO won't be mixed, and
the calls to flush the page cache won't be needed.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
In theory we should be able to do (non appending/extending) dio writes
without taking the inode lock at all - but this gets us most of the way
there.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This adds some horrible hacks, but the atomic ops for closures were
getting to be a pretty expensive part of the write path. We don't want
to rip out closures entirely from the write path, because they're used
for e.g. waiting on the allocator, or waiting on the journal flush, and
that stuff would get really ugly without closures.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
On IO error, bch2_writepages_io_done() will set the page state to
indicate nothing's already reserved (since the write didn't happen, we
don't know what's already reserved). This can race with the buffered IO
path, in between getting a disk reservation and calling
bch2_set_page_dirty().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This works around a bug where bio_full() doesn't check for
bio->bi_iter.bi_size overflowing - and, we don't really want to build
bios that are that big anyways.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The generic IO path now handles inode updates for i_size and i_sectors -
this means we can drop a fair amount of code from fs-io.c.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The next few patches are going to be more moving the logic around
i_size/i_sectors updates to io.c, and better separating the Linux VFS
specific code from core bcachefs code, to better support the fuse port.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Moving bch2_extent_update() to io.c will be greatly simplified if we
no longer have to keep ei_inode.bi_size/bi_sectors up to date.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
In bch2_extent_update(), we have to update the inode if i_size is
changing (the file is being extend) or if i_sectors is changing, but we
want to avoid touching the inode if it's not necessary.
Change sum_sector_overwrites() to also check if there's already data
above where we're writing to - this means we're definitely not extending
the file.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We can't use the page lock to protect it, because on writeback IO error
we need to access the page state before calling end_page_writeback() and
the page lock semantics are completely insane so that deadlocks.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This fixes a bug in io.c bch2_write_index_default() - it was missing the
traverse call, but bch2_extent_atomic_end returns an error now and can
just call it itself.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
When grab_cache_page_write_begin() fails but we did pin some pages, we
shouldn't return -ENOMEM, we should do a partial write.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The btree_trans struct needs to memoize/cache btree iterators, so that
on transaction restart we don't have to completely redo btree lookups,
and so that we can do them all at once in the correct order when the
transaction had to restart to avoid a deadlock.
This switches the btree iterator lookups to work based on iterator
position, instead of trying to match them up based on the stack trace.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We shouldn't ever be writing past i_size - but, apparently there's still
a bug to track down.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
If the user buffer isn't aligned to the filesystem block size, on a
large enough IO - where it won't fit into a single bio -
bio_iov_iter_get_pages() won't necessarily return a bio with the proper
alignment.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Move extents instead of copying them - this way, we can iterate over
only live extents, not the entire keyspace. Also, this means we can
mostly skip running triggers.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
With reflink, various code now has to handle both KEY_TYPE_extent
or KEY_TYPE_reflink_v - so, convert it to be generic across all keys
with pointers.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Prep work for reflink - for reflink, we're going to be using
bch2_extent_update() with other updates in the same transaction.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
With reflink, we'll no longer be able to calculate the offset of the
data we want into the extent we're reading from from the extent pos and
the iter pos - we'll have to pass it in separately.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This will mean we don't have to use cmpxchg for modifying page state,
which will simplify a fair amount of code
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Switch to always using bio_add_page(), which merges contiguous pages now
that we have multipage bvecs.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
this lets us get rid of a lot of extra switch statements - in a lot of
places we dispatch on the btree node type, and then the key type, so
this is a nice cleanup across a lot of code.
Also improve the on disk format versioning stuff.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
previously, if the code traversed to the next btree node, that could
return an error (due to lock restarts) - which was not being checked
for.
fix is to rework it so it never iterates past the current leaf node, and
pops an assertion if it ever sees an error.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
ei_update_lock isn't currently needed for write inode (but it will be
needed again when deferred btree updates are used for inode updates)
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Initially forked from drivers/md/bcache, bcachefs is a new copy-on-write
filesystem with every feature you could possibly want.
Website: https://bcachefs.org
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>