Commit graph

1216023 commits

Author SHA1 Message Date
Kent Overstreet
af4d05c46b bcachefs: Account for stripe parity sectors separately
Instead of trying to charge EC parity to the data within the stripe
(which is subject to rounding errors), let's charge it to the stripe
itself. It should also make -ENOSPC issues easier to deal with if we
charge for parity blocks up front, and means we can also make more fine
grained accounting available to the user.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:45 -04:00
Kent Overstreet
39283c712e bcachefs: Fix for bad stripe pointers
The allocator usually doesn't increment bucket gens right away on
buckets that it's about to hand out (for reasons that need to be
documented), instead deferring that to whatever extent update first
references that bucket.

But stripe pointers reference buckets without changing bucket sector
counts, meaning we could end up with a pointer in a stripe with a gen
newer than the bucket it points to.

Fix this by adding a transactional trigger for KEY_TYPE_stripe that just
writes out the keys in the alloc btree for the buckets it points to.

Also - consolidate the code that checks pointer validity.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:44 -04:00
Kent Overstreet
289980195f bcachefs: Start/stop io clock hands in read/write paths
This fixes a bug where the clock hands in the journal and superblock
didn't match, because we were still incrementing the read clock hand
while read-only.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:44 -04:00
Kent Overstreet
8d6b6222bf bcachefs: Improvements to writing alloc info
Now that we've got transactional alloc info updates (and have for
awhile), we don't need to write it out on shutdown, and we don't need to
write it out on startup except when GC found errors - this is a big
improvement to mount/unmount performance.

This patch also fixes a few bugs where we weren't writing out alloc
info (on new filesystems, and new devices) and should have been.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:44 -04:00
Kent Overstreet
aa8889c07a bcachefs: Fix assertion popping in transaction commit path
We can't be holding read locks on btree nodes when we go to take write
locks: this would deadlock if another thread is holding an intent lock
on the node we have a read lock on, and it tries to commit and upgrade
to a write lock.

But instead of triggering an assertion, if this happens we can just
upgrade the read lock to an intent lock.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:44 -04:00
Kent Overstreet
f3721e12d0 bcachefs: Perf improvements for bch_alloc_read()
On large filesystems reading in the alloc info takes a significant
amount of time. But we don't need to be calling into the fully general
bch2_mark_key() path, just open code what we need in
bch2_alloc_read_fn().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:44 -04:00
Kent Overstreet
9f20ed157d bcachefs: Fix copygc dying on startup
The copygc threads errors out and makes the filesystem go RO if it ever
tries to run and discovers it has no reserve allocated - which is a
problem if it races with the allocator thread and its reserve hasn't
been filled yet.

The allocator thread doesn't start filling the copygc reserve until
after BCH_FS_STARTED has been set, so make sure to wake up the allocator
threads after setting that and before starting copygc.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:44 -04:00
Kent Overstreet
6ea873d172 bcachefs: Fix copygc of compressed data
The check for when we need to get a disk reservation was wrong.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:44 -04:00
Kent Overstreet
97c0e19502 bcachefs: Fix another lockdep splat
vfree() can allocate memory, so we need to call memalloc_nofs_save().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:44 -04:00
Kent Overstreet
505b7a4c28 bcachefs: Fix errors early in the fs init process
At some point bch2_fs_alloc() was changed to always call bch2_fs_free()
in the error path, which means we need c->cl to always be initialized.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:44 -04:00
Kent Overstreet
922ae9f455 bcachefs: Copy ptr->cached when migrating data
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:44 -04:00
Kent Overstreet
c47c50f856 bcachefs: Fix gc of stale ptr gens
Awhile back, gcing of stale pointers was split out from full
mark-and-sweep gc - but, the bit to actually drop those stale pointers
wasn't implemnted. Whoops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:44 -04:00
Kent Overstreet
9ee38f62da bcachefs: Fix off-by-one error in ptr gen check
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:44 -04:00
Kent Overstreet
5d0b7f906a bcachefs: Fix a lockdep splat
We can't allocate memory with GFP_FS while holding the btree cache lock,
and vfree() can allocate memory.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:44 -04:00
Kent Overstreet
9ba2eb25f0 bcachefs: Fix __bch2_truncate_page()
__bch2_truncate_page() will mark some of the blocks in a page as
unallocated. But, if the page is mmapped (and writable), every block in
the page needs to be marked dirty, else those blocks won't be written by
__bch2_writepage().

The solution is to change those userspace mappings to RO, so that we
force bch2_page_mkwrite() to be called again.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:44 -04:00
Kent Overstreet
61ce38b862 bcachefs: Fix journal_seq_copy()
We also need to update the journal's bloom filter of inode numbers that
each journal write has upudates for - in case the inode gets evicted
before it gets fsynced.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:44 -04:00
Kent Overstreet
d5e4dcc29c bcachefs: Fix unmount path
There was a long standing race in the mount/unmount code - the VFS
intends for mount/unmount synchronizatino to be handled by the list of
superblocks, but we were still holding devices open after tearing down
our superblock in the unmount path.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:44 -04:00
Kent Overstreet
625104ea21 bcachefs: Don't fail mount if device has been removed
Also - make sure to show the devices we actually have open in /proc

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:44 -04:00
Kent Overstreet
ca73852a13 bcachefs: Improvements to the journal read error paths
- Print out more information in error messages
 - On checksum error, keep the journal entry but mark it bad so that we
   can prefer entries from other devices that don't have bad checksums

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:44 -04:00
Kent Overstreet
a672fb8f5d bcachefs: Make sure to go rw if lazy in fsck
The paths where we delete or truncate inodes don't pass commit flags for
BTREE_INSERT_LAZY_RW, so just go rw if necessary in the fsck code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:44 -04:00
Kent Overstreet
7af0cec307 bcachefs: Some project id fixes
Inode options that are accessible via the xattr interface are stored
with a +1 bias, so that a value of 0 means unset. We weren't handling
this consistently.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:44 -04:00
Kent Overstreet
ac7eef0318 bcachefs: Don't report inodes to statfs
We don't have a limit on the number of inodes in a filesystem, so this
is apparently the right way to report that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:44 -04:00
Kent Overstreet
f9adbb7d5d bcachefs: Add a cond_resched() to bch2_alloc_write()
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:43 -04:00
Kent Overstreet
2d8c0da1a7 bcachefs: Fix a couple null ptr derefs when no disk groups exist
Normally successfully parsing a target means disk groups should exist,
but we don't want a BUG() or null ptr deref if we end up with an invalid
target.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:43 -04:00
Kent Overstreet
01566db2fb bcachefs: Fix disk groups not being updated when set via sysfs
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:43 -04:00
Kent Overstreet
142cbdff9b bcachefs: Change copygc to consider bucket fragmentation
When devices have different sized buckets this is more correct.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:43 -04:00
Kent Overstreet
1421bea38a bcachefs: Don't block on allocations when only writing to specific device
Since the copygc thread is now global and not per device, we're not
freeing up space on any one device in bounded time - and indeed we never
really were, since rebalance wasn't moving data around between devices
with that objective.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:43 -04:00
Kent Overstreet
9f115ce9e9 bcachefs: Fix a bug with the journal_seq_blacklist mechanism
Previously, we would start doing btree updates before writing the first
journal entry; if this was after an unclean shutdown, this could cause
those btree updates to not be blacklisted.

Also, move some code to headers for userspace debug tools.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:43 -04:00
Kent Overstreet
00c24f53b5 bcachefs: Fix bch2_new_stripes_to_text()
painful looking typo, fortunately difficult to hit.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:43 -04:00
Kent Overstreet
768b42a7eb bcachefs: Don't disallow btree writes to RO devices
There's an inherent race with setting devices RO when they have dirty
btree nodes on them. We already check if a btree node is on an RO device
before we dirty it, so this patch just allows those writes so that we
don't have errors forcing the entire filesystem read only when trying to
remove a device.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:43 -04:00
Kent Overstreet
79e72a9093 bcachefs: Fix maximum btree node size
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:43 -04:00
Kent Overstreet
7807e14384 bcachefs: Convert various code to printbuf
printbufs know how big the buffer is that was allocated, so we can get
rid of the random PAGE_SIZEs all over the place.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:43 -04:00
Kent Overstreet
4580baec7f bcachefs: Remove some uses of PAGE_SIZE in the btree code
For portability to userspace, we should try to avoid working in kernel
pages.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:43 -04:00
Kent Overstreet
760992aac8 bcachefs: Ensure we wake up threads locking node when reusing it
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:43 -04:00
Kent Overstreet
f805824220 bcachefs: Fix bch2_btree_node_insert_fits()
It should be checking for the recently added flag
btree_node_needs_rewrite.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:43 -04:00
Kent Overstreet
d3a2b5d809 bcachefs: Ensure we only allocate one EC bucket per writepoint
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:43 -04:00
Kent Overstreet
33e339619f bcachefs: Fix a race with BCH_WRITE_SKIP_CLOSURE_PUT
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:43 -04:00
Kent Overstreet
74ed7e560b bcachefs: Don't let copygc buckets be stolen by other threads
And assorted other copygc fixes.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:43 -04:00
Kent Overstreet
3d080aa52f bcachefs: Delete unused arguments
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:43 -04:00
Kent Overstreet
4fe7efa177 bcachefs: Fix an error path
We were missing a 'goto retry' and continuing on with an error pointer.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:43 -04:00
Kent Overstreet
988e98cfce bcachefs: Refactor replicas code
Awhile back the mechanism for garbage collecting unused replicas entries
was significantly improved, but some cleanup was missed - this patch
does that now.

This is also prep work for a patch to account for erasure coded parity
blocks separately - we need to consolidate the logic for
checking/marking the various replicas entries from one bkey into a
single function.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:43 -04:00
Kent Overstreet
8f3b41ab4f bcachefs: Don't restrict copygc writes to the same device
This no longer makes any sense, since copygc is now one thread per
filesystem, not per device, with a single write point.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:43 -04:00
Kent Overstreet
63b214e75b bcachefs: Add bch2_blk_status_to_str()
We define our own BLK_STS_REMOVED, so we need our own to_str helper too.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:43 -04:00
Kent Overstreet
a2b5313a39 bcachefs: Fix a faulty assertion
Now that updates to interior nodes are journalled, we shouldn't be
checking topology of interior nodes until we've finished replaying
updates to that node.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:42 -04:00
Kent Overstreet
e8306e3b4c bcachefs: Wrap write path in memalloc_nofs_save()
This fixes a lockdep splat where we're allocating memory with vmalloc in
the compression bounce path, which doesn't always obey GFP_NOFS.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:42 -04:00
Kent Overstreet
f621e1521c bcachefs: Add an option for rebuilding the replicas section
There is a bug where we cnan end up clearing the data_has field in the
superblock members section, which causes us to skip reading the journal
and thus journal replay fails. This option tells the recovery path to
not trust those fields.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:42 -04:00
Kent Overstreet
e6d1161530 bcachefs: Make copygc thread global
Per device copygc threads don't move data to different devices and they
make fragmentation works - they don't make much sense anymore.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:42 -04:00
Kent Overstreet
f793bc1549 bcachefs: Drop extra pointers when marking data as in a stripe
We ideally want the buckets used for the extra initial replicas to be
reused right away.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:42 -04:00
Kent Overstreet
1d2ff0a630 bcachefs: Fix extent_ptr_durability() calculation for erasure coded data
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:42 -04:00
Kent Overstreet
89fd25be70 bcachefs: Use x-macros for data types
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:42 -04:00