Commit Graph

1215955 Commits

Author SHA1 Message Date
Kent Overstreet b3d1e6cab2 bcachefs: Fix build warning when CONFIG_BCACHEFS_DEBUG=n
this function is only used by debug code, but we'd like to always build
it so we know that it does build.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:46 -04:00
Kent Overstreet 811d2bcd85 bcachefs: Drop typechecking from bkey_cmp_packed()
This only did anything in two places, and those can just be replaced
wiht bkey_cmp_left_packed()).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:46 -04:00
Kent Overstreet 73e7470b31 bcachefs: More inlinining in the btree key cache code
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:46 -04:00
Kent Overstreet 7e7ae6ca57 bcachefs: Fix spurious transaction restarts
The checks for lock ordering violations weren't quite right.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:46 -04:00
Kent Overstreet 1a21bf9866 bcachefs: Add a single slot percpu buf for btree iters
Allocating our array of btree iters is a big enough allocation that it
hits the buddy allocator, and we're seeing lots of lock contention.
Sticking a single element buffer in front of it should help.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:46 -04:00
Matthew Wilcox (Oracle) 00276f9f34 bcachefs: Use attach_page_private and detach_page_private
These recently added helpers simplify the code.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:46 -04:00
Matthew Wilcox (Oracle) 96fee47e44 bcachefs: Remove page_state_init_for_read
This is dead code; delete the function.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:46 -04:00
Kent Overstreet b735d73a00 bcachefs: Build fixes for 32bit x86
PAGE_SIZE and size_t are not unsigned longs on 32 bit, annoying...

also switch to atomic64_cmpxchg instead of cmpxchg() for
journal_seq_copy, as atomic64_cmpxchg has a fallback that uses spinlocks
for when it's not supported.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:46 -04:00
Kent Overstreet b5e8a6992f bcachefs: Improved inode create optimization
This shards new inodes into different btree nodes by using the processor
ID for the high bits of the new inode number. Much faster than the
previous inode create optimization - this also helps with sharding in
the other btrees that index by inode number.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:46 -04:00
Kent Overstreet df082b3a50 bcachefs: Report inode counts via statfs
Took awhile to figure out exactly what statfs wanted...

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:46 -04:00
Kent Overstreet d108efc254 bcachefs: add const annotations to bset.c
perhaps a bit silly, but some debug assertions we want to add need const
propagated a bit more.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:46 -04:00
Kent Overstreet ae1ede5893 bcachefs: Don't embed btree iters in btree_trans
These haven't been in used since reallocing iterators has been disabled,
and saves us a lot of stack if we get rid of it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:46 -04:00
Kent Overstreet 692d4031a4 bcachefs: Split out debug_check_btree_accounting
This check is very expensive

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:45 -04:00
Kent Overstreet 29364f3453 bcachefs: Drop sysfs interface to debug parameters
It's not used much anymore, the module paramter interface is better.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:45 -04:00
Kent Overstreet 2f33ece9b4 bcachefs: Minor journal reclaim improvement
With the btree key cache code, journal reclaim now has a lot more work
to do. It could be the case that after journal reclaim has finished one
iteration there's already more work to do, so put it in a loop to check
for that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:45 -04:00
Kent Overstreet 45e4dcba79 bcachefs: Inode create optimization
On workloads that do a lot of multithreaded creates all at once, lock
contention on the inodes btree turns out to still be an issue.

This patch adds a small buffer of inode numbers that are known to be
free, so that we can avoid touching the btree on every create. Also,
this changes inode creates to update via the btree key cache for the
initial create.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:45 -04:00
Kent Overstreet b16fa0bae5 bcachefs: Improve check for when bios are physically contiguous
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:45 -04:00
Kent Overstreet dcf141b9e1 bcachefs: Fix spurious transaction restarts
The check for whether locking a btree node would deadlock was wrong - we
have to check that interior nodes are locked before descendents, but
this check was wrong when consider cached vs. non cached iterators.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:45 -04:00
Kent Overstreet a301dc38ef bcachefs: Improve tracing for transaction restarts
We have a bug where we can get stuck with a process spinning in
transaction restarts - need more information.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:45 -04:00
Kent Overstreet 527087c741 bcachefs: Fix stack corruption
A bkey_on_stack_realloc() call was in the wrong place, and broken for
indirect extents

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:45 -04:00
Kent Overstreet 8cad3e2f73 bcachefs: Use cached iterators for inode updates
This switches inode updates to use cached btree iterators - which should
be a nice performance boost, since lock contention on the inodes btree
can be a bottleneck on multithreaded workloads.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:45 -04:00
Kent Overstreet e7b854b1f7 bcachefs: fiemap fixes
- fiemap didn't know about inline extents, fixed
 - advancing to the next extent after we'd chased a pointer to the
   reflink btree was wrong, fixed

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:45 -04:00
Kent Overstreet 645d72aa36 bcachefs: Fix btree updates when mixing cached and non cached iterators
There was a bug where bch2_trans_update() would incorrectly delete a
pending update where the new update did not actually overwrite the
existing update, because we were incorrectly using BTREE_ITER_TYPE when
sorting pending btree updates.

This affects the pending patch to use cached iterators for inode
updates.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:45 -04:00
Kent Overstreet eb4609796d bcachefs: Add mode to bch2_inode_to_text
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:45 -04:00
Kent Overstreet 8be901d5d4 bcachefs: Always write a journal entry when stopping journal
This is to fix a (harmless) bug where the read clock hand in the
superblock doesn't match the journal.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:45 -04:00
Kent Overstreet 33114c2d89 bcachefs: Drop alloc keys from journal when -o reconstruct_alloc
This fixes a bug where we'd pop an assertion due to replaying a key for
an interior btree node when that node no longer exists.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:45 -04:00
Kent Overstreet 801a3de642 bcachefs: Indirect inline data extents
When inline data extents were added, reflink was forgotten about - we
need indirect inline data extents for reflink + inline data to work
correctly.

This patch adds them, and a new feature bit that's flipped when they're
used.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:45 -04:00
Kent Overstreet 13dcd4abcd bcachefs: Fix rare use after free in read path
If the bkey_on_stack_reassemble() call in __bch2_read_indirect_extent()
reallocates the buffer, k in bch2_read - which we pointed at the
bkey_on_stack buffer - will now point to a stale buffer. Whoops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:45 -04:00
Kent Overstreet e00711d2ca bcachefs: Improve some error messages
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:45 -04:00
Kent Overstreet a10e677a15 bcachefs: Fix for passing target= opts as mount opts
Some options can't be parsed until the filesystem initialized;
previously, passing these options to mount or remount would cause mount
to fail.

This changes the mount path so that we parse the options passed in
twice, and just ignore any options that can't be parsed the first time.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:45 -04:00
Kent Overstreet 5b088c1dd0 bcachefs: Fix bch2_mark_stripe()
There's no reason not to always recalculate these fields

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:45 -04:00
Kent Overstreet b88e971e45 bcachefs: Don't drop replicas when copygcing ec data
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:45 -04:00
Kent Overstreet af4d05c46b bcachefs: Account for stripe parity sectors separately
Instead of trying to charge EC parity to the data within the stripe
(which is subject to rounding errors), let's charge it to the stripe
itself. It should also make -ENOSPC issues easier to deal with if we
charge for parity blocks up front, and means we can also make more fine
grained accounting available to the user.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:45 -04:00
Kent Overstreet 39283c712e bcachefs: Fix for bad stripe pointers
The allocator usually doesn't increment bucket gens right away on
buckets that it's about to hand out (for reasons that need to be
documented), instead deferring that to whatever extent update first
references that bucket.

But stripe pointers reference buckets without changing bucket sector
counts, meaning we could end up with a pointer in a stripe with a gen
newer than the bucket it points to.

Fix this by adding a transactional trigger for KEY_TYPE_stripe that just
writes out the keys in the alloc btree for the buckets it points to.

Also - consolidate the code that checks pointer validity.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:44 -04:00
Kent Overstreet 289980195f bcachefs: Start/stop io clock hands in read/write paths
This fixes a bug where the clock hands in the journal and superblock
didn't match, because we were still incrementing the read clock hand
while read-only.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:44 -04:00
Kent Overstreet 8d6b6222bf bcachefs: Improvements to writing alloc info
Now that we've got transactional alloc info updates (and have for
awhile), we don't need to write it out on shutdown, and we don't need to
write it out on startup except when GC found errors - this is a big
improvement to mount/unmount performance.

This patch also fixes a few bugs where we weren't writing out alloc
info (on new filesystems, and new devices) and should have been.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:44 -04:00
Kent Overstreet aa8889c07a bcachefs: Fix assertion popping in transaction commit path
We can't be holding read locks on btree nodes when we go to take write
locks: this would deadlock if another thread is holding an intent lock
on the node we have a read lock on, and it tries to commit and upgrade
to a write lock.

But instead of triggering an assertion, if this happens we can just
upgrade the read lock to an intent lock.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:44 -04:00
Kent Overstreet f3721e12d0 bcachefs: Perf improvements for bch_alloc_read()
On large filesystems reading in the alloc info takes a significant
amount of time. But we don't need to be calling into the fully general
bch2_mark_key() path, just open code what we need in
bch2_alloc_read_fn().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:44 -04:00
Kent Overstreet 9f20ed157d bcachefs: Fix copygc dying on startup
The copygc threads errors out and makes the filesystem go RO if it ever
tries to run and discovers it has no reserve allocated - which is a
problem if it races with the allocator thread and its reserve hasn't
been filled yet.

The allocator thread doesn't start filling the copygc reserve until
after BCH_FS_STARTED has been set, so make sure to wake up the allocator
threads after setting that and before starting copygc.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:44 -04:00
Kent Overstreet 6ea873d172 bcachefs: Fix copygc of compressed data
The check for when we need to get a disk reservation was wrong.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:44 -04:00
Kent Overstreet 97c0e19502 bcachefs: Fix another lockdep splat
vfree() can allocate memory, so we need to call memalloc_nofs_save().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:44 -04:00
Kent Overstreet 505b7a4c28 bcachefs: Fix errors early in the fs init process
At some point bch2_fs_alloc() was changed to always call bch2_fs_free()
in the error path, which means we need c->cl to always be initialized.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:44 -04:00
Kent Overstreet 922ae9f455 bcachefs: Copy ptr->cached when migrating data
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:44 -04:00
Kent Overstreet c47c50f856 bcachefs: Fix gc of stale ptr gens
Awhile back, gcing of stale pointers was split out from full
mark-and-sweep gc - but, the bit to actually drop those stale pointers
wasn't implemnted. Whoops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:44 -04:00
Kent Overstreet 9ee38f62da bcachefs: Fix off-by-one error in ptr gen check
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:44 -04:00
Kent Overstreet 5d0b7f906a bcachefs: Fix a lockdep splat
We can't allocate memory with GFP_FS while holding the btree cache lock,
and vfree() can allocate memory.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:44 -04:00
Kent Overstreet 9ba2eb25f0 bcachefs: Fix __bch2_truncate_page()
__bch2_truncate_page() will mark some of the blocks in a page as
unallocated. But, if the page is mmapped (and writable), every block in
the page needs to be marked dirty, else those blocks won't be written by
__bch2_writepage().

The solution is to change those userspace mappings to RO, so that we
force bch2_page_mkwrite() to be called again.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:44 -04:00
Kent Overstreet 61ce38b862 bcachefs: Fix journal_seq_copy()
We also need to update the journal's bloom filter of inode numbers that
each journal write has upudates for - in case the inode gets evicted
before it gets fsynced.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:44 -04:00
Kent Overstreet d5e4dcc29c bcachefs: Fix unmount path
There was a long standing race in the mount/unmount code - the VFS
intends for mount/unmount synchronizatino to be handled by the list of
superblocks, but we were still holding devices open after tearing down
our superblock in the unmount path.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:44 -04:00
Kent Overstreet 625104ea21 bcachefs: Don't fail mount if device has been removed
Also - make sure to show the devices we actually have open in /proc

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:44 -04:00