1215734 commits

Author SHA1 Message Date
Kent Overstreet
54847d253a bcachefs: DIO write path only needs to shoot down pagecache once, not twice
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:31 -04:00
Kent Overstreet
1b783a690d bcachefs: Add pagecache_add lock to buffered IO path, fault path
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:31 -04:00
Justin Husted
6d01598ecd bcachefs: Fix uninitialized field in hash_check_init()
The chain_end field was not initialized before use in
hash_set_chain_start.

Signed-off-by: Justin Husted <sigstop@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:31 -04:00
Kent Overstreet
7edcfbfefe bcachefs: Don't hold inode lock longer than necessary in dio write path
In theory we should be able to do (non appending/extending) dio writes
without taking the inode lock at all - but this gets us most of the way
there.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:31 -04:00
Kent Overstreet
f8f3086338 bcachefs: Avoid atomics in write fast path
This adds some horrible hacks, but the atomic ops for closures were
getting to be a pretty expensive part of the write path. We don't want
to rip out closures entirely from the write path, because they're used
for e.g. waiting on the allocator, or waiting on the journal flush, and
that stuff would get really ugly without closures.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:31 -04:00
Kent Overstreet
f7f63211a4 bcachefs: Don't use extent_ptr_decoded_append() in write path (fixup patch)
bch2_extent_ptr_decoded_append() is more general than we need here; we
know we're initializing a new extent so e.g. we're going to need the crc
entry.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:31 -04:00
Kent Overstreet
887c2a4ee5 bcachefs: bch2_btree_iter_fix_key_modified()
This is considerably cheaper than bch2_btree_node_iter_fix(), for cases
where the key was only modified and key ordering isn't changing.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:30 -04:00
Kent Overstreet
b7ba66c845 bcachefs: Inline more of bch2_trans_commit hot path
The main optimization here is that if we let
bch2_replicas_delta_list_apply() fail, we can completely skip calling
bch2_bkey_replicas_marked_locked().

And assorted other small optimizations.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:30 -04:00
Kent Overstreet
ff929515cc bcachefs: Trust btree alloc info at runtime
This lets us avoid a cache miss in the write path.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:30 -04:00
Kent Overstreet
c4e065c23c bcachefs: More bset.c microoptimization
Improve a few paper cuts that've shown up during profiling.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:30 -04:00
Justin Husted
928c839cc9 bcachefs: Initialize btree_node flags field in bch2_btree_root_alloc.
Valgrind data indicated that the flags field was only partially
initialized when written to disk.

Signed-off-by: Justin Husted <sigstop@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:30 -04:00
Justin Husted
43cfbad6e4 bcachefs: Further padding fixes in bch2_journal_super_entries_add_common()
The previous patch 128cb1a to fix uninitialized data was incorrect and
did not initialize the padding space correctly. Furthermore, several
other cases in this function do not initialize their padding space
correctly.

Move initialization into some helper functions in a more robust way.

Signed-off-by: Justin Husted <sigstop@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:30 -04:00
Justin Husted
e3728b5003 bcachefs: Initialize padding space after alloc bkey
Packed bkeys are padded up to 64 bit alignment, but the alloc bkey type
was not clearing the pad bytes after the last data byte. This left the
key possibly containing some random garbage at the end.

This problem was found using valgrind.

This patch also changes a path with the inode bkey to clear in the same
way.

Signed-off-by: Justin Husted <sigstop@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:30 -04:00
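To illustrate the padding fix described in e3728b5003, here is a minimal sketch of clearing the bytes between the end of a key's payload and the next 64-bit boundary. The helper names `pad_bytes_u64()` and `clear_key_padding()` are hypothetical and are not taken from the bcachefs source; the actual patch may be structured differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: pad bytes needed to reach the next 8-byte boundary. */
static size_t pad_bytes_u64(size_t data_len)
{
	return (8 - (data_len % 8)) % 8;
}

/*
 * Hypothetical helper: zero the pad bytes that follow the last data byte,
 * so no stale memory ends up in the padded key. Returns the padded length.
 * The caller must supply a buffer large enough for the padded length.
 */
static size_t clear_key_padding(uint8_t *buf, size_t buf_len, size_t data_len)
{
	size_t pad = pad_bytes_u64(data_len);

	assert(data_len + pad <= buf_len);
	memset(buf + data_len, 0, pad);
	return data_len + pad;
}
```

With a 13-byte payload, the three trailing bytes are zeroed and the padded length is 16; the payload itself is left untouched.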
Kent Overstreet
e219965586 bcachefs: Add missing error checking in bch2_find_by_inum_trans()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:30 -04:00
Kent Overstreet
406d6d5a07 bcachefs: Fix an error path race
On IO error, bch2_writepages_io_done() will set the page state to
indicate nothing's already reserved (since the write didn't happen, we
don't know what's already reserved). This can race with the buffered IO
path, in between getting a disk reservation and calling
bch2_set_page_dirty().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:30 -04:00
Kent Overstreet
92384391c8 bcachefs: Don't reuse bio in retry path
We can't reuse bios without reinitializing them, and in the retry path
it's safer to just make sure we don't reuse them at all.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:30 -04:00
Kent Overstreet
b8098f36dd bcachefs: Don't use rep movsq for small memcopies
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:30 -04:00
Kent Overstreet
7f9473d171 bcachefs: Avoid calling iter_prev() in extent update path
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:30 -04:00
Kent Overstreet
2e050d96b0 bcachefs: kill bch2_extent_merge_inline()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:30 -04:00
Kent Overstreet
cdd775e6d7 bcachefs: Don't use FUA unnecessarily
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:30 -04:00
Kent Overstreet
2a9101a989 bcachefs: Refactor bch2_trans_commit() path
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:30 -04:00
Kent Overstreet
8f1965391c bcachefs: Make btree_node_type_needs_gc() cheaper
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:30 -04:00
Kent Overstreet
77d63522f0 bcachefs: Make replicas_delta_list smaller
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:30 -04:00
Kent Overstreet
fbc519ab2e bcachefs: Don't submit bio in write path under lock
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:30 -04:00
Kent Overstreet
2d78737d96 bcachefs: Drop bch_write_op->io_wq
This is dead code

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:30 -04:00
Kent Overstreet
a94407434b bcachefs: Limit bios in writepages path to 256M
This works around a bug where bio_full() doesn't check for
bio->bi_iter.bi_size overflowing - and, we don't really want to build
bios that are that big anyways.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:30 -04:00
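The overflow that a94407434b works around can be sketched as follows, assuming the size counter is a 32-bit unsigned value. `MAX_IO_BYTES`, `size_would_overflow()` and `can_add()` are hypothetical names for illustration only; they are not the kernel's `bio_full()` and not the code in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed cap, chosen to match the 256M figure in the commit subject. */
#define MAX_IO_BYTES (256u << 20)

/* True if adding len to a 32-bit byte counter would wrap around. */
static bool size_would_overflow(uint32_t cur, uint32_t len)
{
	return len > UINT32_MAX - cur;
}

/* True if a hypothetical writepages-style loop may add len more bytes. */
static bool can_add(uint32_t cur, uint32_t len)
{
	return !size_would_overflow(cur, len) && cur + len <= MAX_IO_BYTES;
}
```

Capping well below the 32-bit limit means the wraparound case is never reached in practice, which is the effect the commit message describes.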
Kent Overstreet
71603f1ffe bcachefs: Fix an iterator counting bug
The iterator counting assumed we're doing an obvious optimization when
only updating the refcount on indirect extents - but we're not doing it
yet.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:30 -04:00
Kent Overstreet
ae93a62895 bcachefs: Fix flushing held btree writes when there's a fs error
Previously, we'd go into an infinite loop.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:29 -04:00
Kent Overstreet
f38fe2dc5d bcachefs: Fix iterator counting for reflink pointers (again)
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:29 -04:00
Kent Overstreet
538abcb8a1 bcachefs: Fix a debug assertion
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:29 -04:00
Kent Overstreet
821a99b7ba bcachefs: Switch to .iterate_shared for readdir
We definitely don't need an exclusive inode lock for readdir.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:29 -04:00
Kent Overstreet
05240ba6b8 bcachefs: Fix creation of lost+found
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:29 -04:00
Kent Overstreet
ea3532cbf7 bcachefs: Fix a subtle race in the btree split path
We have to free the old (in memory) btree node _before_ unlocking the
new nodes - else, some other thread with a read lock on the old node
could see stale data after another thread has already updated the new
node.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:29 -04:00
Kent Overstreet
9a3df993e1 bcachefs: Kill bchfs_extent_update()
The generic IO path now handles inode updates for i_size and i_sectors -
this means we can drop a fair amount of code from fs-io.c.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:29 -04:00
Kent Overstreet
2e87eae1fb bcachefs: Convert bch2_fpunch to bch2_extent_update()
As before - we're moving non Linux specific code out of fs-io.c.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:29 -04:00
Kent Overstreet
2925fc49b3 bcachefs: Split out bchfs_extent_update()
The next few patches are going to be more moving the logic around
i_size/i_sectors updates to io.c, and better separating the Linux VFS
specific code from core bcachefs code, to better support the fuse port.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:29 -04:00
Kent Overstreet
e0541a9346 bcachefs: Kill some dependencies on ei_inode
Moving bch2_extent_update() to io.c will be greatly simplified if we
no longer have to keep ei_inode.bi_size/bi_sectors up to date.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:29 -04:00
Kent Overstreet
daf3fe502a bcachefs: Check if extending inode differently
In bch2_extent_update(), we have to update the inode if i_size is
changing (the file is being extended) or if i_sectors is changing, but we
want to avoid touching the inode if it's not necessary.

Change sum_sector_overwrites() to also check if there's already data
above where we're writing to - this means we're definitely not extending
the file.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:29 -04:00
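The check described in daf3fe502a can be sketched roughly as below. This is not the real sum_sector_overwrites(): the `struct extent` layout, the `struct overwrite_info` result and the function `sum_overwrites()` are hypothetical, and the sketch assumes existing extents are non-overlapping half-open sector ranges.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent: the half-open sector range [start, end). */
struct extent {
	uint64_t start, end;
};

struct overwrite_info {
	int64_t sectors_delta;   /* change in allocated sectors */
	bool    maybe_extending; /* false once data is found past the write */
};

/* Walk existing extents and summarize the effect of writing [start, end). */
static struct overwrite_info sum_overwrites(const struct extent *existing,
					    size_t nr,
					    uint64_t start, uint64_t end)
{
	struct overwrite_info ret = { (int64_t)(end - start), true };

	assert(start < end);

	for (size_t i = 0; i < nr; i++) {
		uint64_t lo = existing[i].start > start ? existing[i].start : start;
		uint64_t hi = existing[i].end < end ? existing[i].end : end;

		/* Sectors we overwrite were already counted. */
		if (lo < hi)
			ret.sectors_delta -= (int64_t)(hi - lo);

		/* Data above the write means the file is not being extended. */
		if (existing[i].end > end)
			ret.maybe_extending = false;
	}
	return ret;
}
```

When `sectors_delta` is zero and `maybe_extending` is false, a caller in this sketch could skip the inode update entirely.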
Kent Overstreet
14989d547e bcachefs: Fix bch2_btree_iter_next() after peek_slot()
This deserves a unit test.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:29 -04:00
Kent Overstreet
495fa1a2ec bcachefs: Refactor bch2_readdir() a bit
The tweaks to ctx->pos handling are also to help the fuse port

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:29 -04:00
Kent Overstreet
3826ee0b17 bcachefs: Add a lock to bch_page_state
We can't use the page lock to protect it, because on writeback IO error
we need to access the page state before calling end_page_writeback(),
and taking the page lock there deadlocks.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:29 -04:00
Kent Overstreet
43de7376f3 bcachefs: Fix erasure coding disk space accounting
Disk space accounting for erasure coding + compression was completely
broken - we need to calculate the parity sectors delta the same way we
calculate disk_sectors, by calculating the old and new usage and
subtracting to get the difference.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:29 -04:00
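A small sketch of why the delta in 43de7376f3 is computed by subtracting old usage from new usage, rather than by applying the formula to the size difference. The rounding formula in `parity_sectors()` is an assumption made up for illustration; it is not the accounting formula bcachefs uses.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical usage formula: parity sectors for a given number of data
 * sectors, rounded up. The rounding is what makes the naive delta wrong.
 */
static uint64_t parity_sectors(uint64_t sectors,
			       unsigned nr_data, unsigned nr_parity)
{
	assert(nr_data > 0);
	return (sectors * nr_parity + nr_data - 1) / nr_data;
}

/* Delta computed as (new usage) - (old usage), as the commit describes. */
static int64_t parity_delta(uint64_t old_sectors, uint64_t new_sectors,
			    unsigned nr_data, unsigned nr_parity)
{
	return (int64_t)parity_sectors(new_sectors, nr_data, nr_parity) -
	       (int64_t)parity_sectors(old_sectors, nr_data, nr_parity);
}
```

Under these assumed numbers (4 data, 2 parity), growing from 5 to 8 sectors changes parity usage by 1, while applying the formula directly to the 3-sector difference would give 2.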
Kent Overstreet
9ec211b0ff bcachefs: Fix ec_stripes_read()
The bkey_s_c returned by btree_iter_(peek|next) points into the btree
iter type, so advancing the iterator and then using the one previously
returned is a bug...

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:29 -04:00
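The class of bug fixed in 9ec211b0ff can be shown with a toy iterator whose returned key points into storage owned by the iterator. The `struct iter`, `iter_peek()` and `iter_next()` below are hypothetical stand-ins, not the bcachefs btree iterator API.

```c
#include <stddef.h>

struct key {
	int id;
};

/* Toy iterator: returned keys alias the 'cur' field. */
struct iter {
	const int *src;
	size_t     nr;
	size_t     pos;
	struct key cur;
};

/* Returns a pointer into the iterator, valid only until it advances. */
static const struct key *iter_peek(struct iter *it)
{
	if (it->pos >= it->nr)
		return NULL;
	it->cur.id = it->src[it->pos];
	return &it->cur;
}

/* Advances the iterator; any previously returned key is now overwritten. */
static const struct key *iter_next(struct iter *it)
{
	it->pos++;
	return iter_peek(it);
}
```

In this sketch, a caller that needs the old key after advancing has to copy it by value before calling `iter_next()`; holding on to the pointer silently yields the new key instead.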
Kent Overstreet
37954a275f bcachefs: Limit pointers to being in only one stripe
This makes the disk accounting code saner, and it's not clear why we'd
ever want the same data to be in multiple stripes simultaneously.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:29 -04:00
Kent Overstreet
9ef6068c4d bcachefs: Fix bch2_extent_ptr_durability()
We were looking up the wrong entry in the stripes radix tree.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:29 -04:00
Kent Overstreet
332c6e5370 bcachefs: Fix bch2_mark_extent()
If an extent only contained cached or erasure coded pointers, there
won't be any devices in the normal dirty replicas list or an entry to
update.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:29 -04:00
Justin Husted
bf974f9203 bcachefs: Initialize journal pad data in bch_replica_entry objects.
Running the filesystem under valgrind exposed some garbage data being
written to disk in bch2_journal_super_entries_add_common(), in the
portion which encodes bch_replica_entry objects.

Signed-off-by: Justin Husted <sigstop@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:29 -04:00
Justin Husted
f7c0fcdd39 bcachefs: Fix uninitialized data in bch2_gc_btree()
Running the filesystem under valgrind exposed a path where the max_stale
variable in bch2_gc_btree() might not be initialized before use in a
rare case when there are no btree nodes in a transaction.

Signed-off-by: Justin Husted <sigstop@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:29 -04:00
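The pattern behind f7c0fcdd39 is a variable that is only assigned inside a loop, and therefore stays uninitialized when the loop body never runs. A minimal sketch of the initialized form, using a hypothetical `max_stale_of()` helper rather than the real bch2_gc_btree() code:

```c
#include <assert.h>
#include <stddef.h>

/* Returns the largest staleness value seen, or 0 if there are no nodes. */
static unsigned max_stale_of(const unsigned *stale, size_t nr)
{
	/* Initialized up front: the loop may execute zero times. */
	unsigned max_stale = 0;

	assert(stale != NULL || nr == 0);

	for (size_t i = 0; i < nr; i++)
		if (stale[i] > max_stale)
			max_stale = stale[i];

	return max_stale;
}
```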
Kent Overstreet
a40d97a771 bcachefs: Fix incorrect use of bch2_extent_atomic_end()
bch2_extent_atomic_end counts the number of iterators required for
marking overwrites - but journal replay never marks overwrites, so that
part was incorrect. And counting iterators for the key being inserted
should be unnecessary, because that was already done when the key was
originally inserted, before it was first journalled.

This should fix an iterator overflow bug - the iterators for walking
overwrites were totally unneeded.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:28 -04:00
Kent Overstreet
63fbf458cb bcachefs: Can't be holding read locks while taking write locks
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:28 -04:00