linux-stable

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2024-09-28 13:22:57 +00:00

Author	SHA1	Message	Date
Kent Overstreet	8ee529e9c1	bcachefs: Make sure bch2_trans_mark_update uses correct iter flags Now that bch2_btree_iter_peek_with_updates() has been removed in favor of BTREE_ITER_WITH_UPDATES, we need to make sure it's not used where we don't want it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:06 -04:00
Kent Overstreet	2ed5cd508d	bcachefs: Fix a memory leak in dio write path Commit `c42bca92be` "bio: don't copy bvec for direct IO" changed bio_iov_iter_get_pages() to point bio->bi_iovec at the incoming biovec, meaning if we already allocated one, it'll be leaked. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:06 -04:00
Janpieter Sollie	120f63e321	bcachefs: fix a possible bcachefs checksum mapping error opt-checksum enum to type-checksum enum This fixes some rare cases where the metadata checksum option specified may map to the wrong actual checksum type. Signed-off-by: Janpieter Sollie <janpieter.sollie@edpnet.be> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:06 -04:00
Kent Overstreet	bb6bbf4a06	bcachefs: Clear iter->should_be_locked in bch2_trans_reset Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:06 -04:00
Kent Overstreet	290448ed2e	bcachefs: Don't underflow c->sectors_available This rarely used error path should've been checking for underflow - oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:06 -04:00
Kent Overstreet	953ee28a3e	bcachefs: Kill bch2_btree_iter_peek_cached() It's now been rolled into bch2_btree_iter_peek_slot() Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:06 -04:00
Kent Overstreet	45c2e33f79	bcachefs: Allow shorter JSET_ENTRY_dev_usage entries If the last entry(ies) would be all zeros, there's no need to write them out - the read path already handles that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:06 -04:00
Dan Robertson	044c8c9e05	bcachefs: mount: fix null deref with null devname - Fix null deref on mount when given a null device name. - Move the dev_name checks to return EINVAL when it is invalid. Signed-off-by: Dan Robertson <dan@dlrobertson.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:06 -04:00
Kent Overstreet	a49e9a0589	bcachefs: Fix null ptr deref when splitting compressed extents Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:06 -04:00
Kent Overstreet	90d22a660a	bcachefs: Fix overflow in journal_replay_entry_early If filesystem on disk was used by a version with a larger BCH_DATA_NR thas the currently running version, we don't want this to cause a buffer overrun. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:06 -04:00
Kent Overstreet	7ed158f294	bcachefs: Always zero memory from bch2_trans_kmalloc() Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:06 -04:00
Kent Overstreet	b058ac2091	bcachefs: Merging for indirect extents Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:06 -04:00
Kent Overstreet	c2177e4da3	bcachefs: Improved extent merging Previously, checksummed extents could only be merged when the checksum covered only the currently live data. xfstest generic/064 creates a test file, then uses finsert calls to split the extent, then collapse calls to see if they get merged. But without any reads to trigger the narrow_crcs path, each of the split extents will still have a checksum for the entire original extent. This patch improves the extent merge path so that if either of the extents we're attempting to merge has a checksum that covers the entire merged extent, we just use that checksum. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:06 -04:00
Kent Overstreet	5db95e50e1	bcachefs: Re-implement extent merging in transaction commit path We haven't had extent merging in quite some time. It used to be done by the btree code when sorting btree nodes, but that was eliminated as part of the work to separate extent handling from core btree code. This patch re-implements extent merging in the transaction commit path. We don't currently have the ability to merge reflink pointers, we need to do some work on the triggers code to be able to do that without ending up with incorrect refcounts. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:06 -04:00
Kent Overstreet	81d22e5d83	bcachefs: Refactor extent_handle_overwrites() Prep work for extent merging Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:06 -04:00
Kent Overstreet	59ba21d99f	bcachefs: Clean up key merging This patch simplifies the key merging code by getting rid of partial merges - it's simpler and saner if we just don't merge extents when they'd overflow k->size. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:06 -04:00
Kent Overstreet	cd8319fdd9	bcachefs: Kill trans->updates2 Now that extent handling has been lifted to bch2_trans_update(), we don't need to keep two different lists of updates. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:06 -04:00
Kent Overstreet	c1949baa51	bcachefs: Simplify reflink trigger Now that we only mark entire extents, we can ditch the "reflink_p_frag_references" code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:06 -04:00
Kent Overstreet	8e6bbc4181	bcachefs: Move extent_handle_overwrites() to bch2_trans_update() This lifts handling of overlapping extents out of __bch2_trans_commit() and moves it to where we first do the update - which means that BTREE_ITER_WITH_UPDATES can now work correctly in extents mode. Also, this patch reworks how extent triggers work: previously, on partial extent overwrite we would pass this information to the trigger, telling it what part of the extent was being overwritten. But, this approach has had too many subtle corner cases - now, we only mark whole extents, meaning on partial extent overwrite we unmark the old extent and mark the new extent. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:06 -04:00
Kent Overstreet	b1d87f527d	bcachefs: bch2_btree_iter_peek_slot() now saves initial position when searching Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:06 -04:00
Kent Overstreet	1d214eb18d	bcachefs: Kill __bch2_btree_iter_peek_slot_extents() This codepath won't just be for extents in the future, it'll also be for BTREE_ITER_FILTER_SNAPSHOTS mode. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:05 -04:00
Kent Overstreet	e750296bf5	bcachefs: bch2_btree_iter_peek_slot() now supports BTREE_ITER_WITH_UPDATES Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:05 -04:00
Kent Overstreet	5288e66a7b	bcachefs: BTREE_ITER_WITH_UPDATES This drops bch2_btree_iter_peek_with_updates() and replaces it with a new flag, BTREE_ITER_WITH_UPDATES, and also reworks bch2_btree_iter_peek_slot() to respect it too. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:05 -04:00
Kent Overstreet	509d3e0a8d	bcachefs: Child btree iterators This adds the ability for btree iterators to own child iterators - to be used by an upcoming rework of bch2_btree_iter_peek_slot(), so we can scan forwards while maintaining our current position. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:05 -04:00
Kent Overstreet	c205321b12	bcachefs: Drop all btree locks when submitting btree node reads As a rule we don't want to be holding btree locks while submitting IO - this will improve overall filesystem latency. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:05 -04:00
Kent Overstreet	4351d3ecb4	bcachefs: More topology repair code This improves the handling of overlapping btree nodes; now, we handle the case where one btree node completely overwrites another. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:05 -04:00
Kent Overstreet	74cc1abdbf	bcachefs: Fix a buffer overrun In make_extent_indirect(), we were allocating too small of a buffer for the new indirect extent. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:05 -04:00
Kent Overstreet	224ec3e677	bcachefs: Don't mark superblocks past end of usable space bcachefs-tools recently started putting a backup superblock at the end of the device. This causes a problem if the bucket size doesn't divide the device size - but we can fix it by just skipping marking that part. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:05 -04:00
Kent Overstreet	7138f22097	bcachefs: Fix a spurious debug mode assertion When we switched to using bch2_btree_bset_insert_key() for extents it turned out it started leaving invalid keys around - of type deleted but nonzero size - but this is fine (if ugly) because they're never written out. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:05 -04:00
Brett Holman	ca47fa2362	bcachefs: Fix unitialized use of a value Signed-off-by: Brett Holman <bpholman5@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:05 -04:00
Dan Robertson	59e2480ff7	bcachefs: do not compile acl mod on minimal config Do not compile the acl.o target if BCACHEFS_POSIX_ACL is not enabled. Signed-off-by: Dan Robertson <dan@dlrobertson.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:05 -04:00
Kent Overstreet	66a0a49750	bcachefs: btree_iter->should_be_locked Add a field to struct btree_iter for tracking whether it should be locked - this fixes spurious transaction restarts in bch2_trans_relock(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:05 -04:00
Kent Overstreet	531a0095c9	bcachefs: Improve btree iterator tracepoints This patch adds some new tracepoints to the btree iterator code, and adds new fields to the existing tracepoints - primarily for the iterator position. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:05 -04:00
Kent Overstreet	f7beb4ca04	bcachefs: Preallocate transaction mem This helps avoid transaction restarts. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:05 -04:00
Kent Overstreet	bc3f8b25f3	bcachefs: Check for errors from bch2_trans_update() Upcoming refactoring is going to change bch2_trans_update() to start returning transaction restarts. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:05 -04:00
Kent Overstreet	01254036a3	bcachefs; Check for allocator thread shutdown We were missing a kthread_should_stop() check in the loop in bch2_invalidate_buckets(), very occasionally leading to us getting stuck while shutting down. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:05 -04:00
Kent Overstreet	d7fc453bdb	bcachefs: Journal space calculation fix When devices have different bucket sizes, we may accumulate a journal write that doesn't fit on some of our devices - previously, we'd underflow when calculating space on that device and then everything would get weird. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:05 -04:00
Kent Overstreet	649d9a4dfc	bcachefs: Don't fragment extents when making them indirect This fixes a "disk usage increased without a reservation" bug, when reflinking compressed extents. Also, there's no good reason for reflink to be fragmenting extents anyways. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:05 -04:00
Kent Overstreet	890b74f03d	bcachefs: Fsck for reflink refcounts Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:05 -04:00
Kent Overstreet	c0ebe3e48c	bcachefs: Assorted endianness fixes Found by sparse Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:05 -04:00
Kent Overstreet	ee7570546e	bcachefs: Fix a deadlock Waiting on a btree node write with btree locks held can deadlock, if the write errors: the write error path has to do do a btree update to drop the pointer to the replica that errored. The interior update path has to wait on in flight btree writes before freeing nodes on disk. Previously, this was done in bch2_btree_interior_update_will_free_node(), and could deadlock; now, we just stash a pointer to the node and do it in btree_update_nodes_written(), just prior to the transactional part of the update. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:05 -04:00
Kent Overstreet	9f2772c454	bcachefs: Split out btree_error_wq We can't use btree_update_wq becuase btree updates may be waiting on btree writes to complete. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:04 -04:00
Kent Overstreet	bff796ae65	bcachefs: Fix pathalogical behaviour with inode sharding by cpu ID If the transactior restarts on a different CPU, it could end up needing to read in a different btree node, which makes another transaction restart more likely... Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:04 -04:00
Kent Overstreet	d797ca3d8e	bcachefs: Fix journal write error path Journal write errors were racing with the submission path - potentially causing writes to other replicas to not get submitted. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:04 -04:00
Kent Overstreet	9eba7c8d15	bcachefs: Reflink refcount fix __bch2_trans_mark_reflink_p wasn't always correctly returning the number of sectors processed - the new logic is a bit more straightforward overall too. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:04 -04:00
Kent Overstreet	b282a74fae	bcachefs: Add an option to control sharding new inode numbers We're seeing a bug where inode creates end up spinning in bch2_inode_create - disabling sharding will simplify what we're testing. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:04 -04:00
Kent Overstreet	9f311f2166	bcachefs: Don't use bch_write_op->cl for delivering completions We already had op->end_io as an alternative mechanism to op->cl.parent for delivering write completions; this switches all code paths to using op->end_io. Two reasons: - op->end_io is more efficient, due to fewer atomic ops, this completes the conversion that was originally only done for the direct IO path. - We'll be restructing the write path to use a different mechanism for punting to process context, refactoring to not use op->cl will make that easier. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:04 -04:00
Kent Overstreet	af17118319	bcachefs: Kill bch_write_op.index_update_fn This deletes bch_write_op.index_update_fn: indirect function calls have gotten considerably more expensive post spectre/meltdown, and we only have two different index_update_fns - this patch adds a flag to specify which one to use (normal vs. data move path). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:04 -04:00
Kent Overstreet	7e94eeffe0	bcachefs: Inline fastpath of bch2_disk_reservation_add() The fastpath now doesn't even disable preemption - instead we use a (non locked) cmpxchg. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:04 -04:00
Kent Overstreet	ddc7dd62f0	bcachefs: Don't use uuid in tracepoints %pU for printing out pointers to uuids doesn't work in perf trace Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:04 -04:00
Kent Overstreet	19d2819d2d	bcachefs: Add a tracepoint for copygc waiting Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:04 -04:00
Kent Overstreet	c4d4b2f01a	bcachefs: Add a cond_resched call to the copygc main loop We seem to have a bug where the copygc thread ends up spinning and making the system unusable - this will at least prevent it from locking up the machine, and it's a good thing to have anyways. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:04 -04:00
Kent Overstreet	443d2760e5	bcachefs: Fix a null ptr deref bch2_btree_iter_peek() won't always return a key - whoops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:04 -04:00
Kent Overstreet	9dd89a05fd	bcachefs: Fix an issue with inconsistent btree writes after unclean shutdown After unclean shutdown, btree writes may have completed on one device and not others - and this inconsistency could lead us to writing new bsets with a gap in our btree node in one of our replicas. Fortunately, this is only an issue with bsets that are newer than the most recent journal flush, and we already have a mechanism for detecting and blacklisting those. We just need to make sure to start new btree writes after the most recent _non_ blacklisted bset. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:04 -04:00
Kent Overstreet	4495cbed56	bcachefs: Improve FS_IOC_GOINGDOWN ioctl We weren't interpreting the flags argument at all. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:04 -04:00
Kent Overstreet	731bdd2eff	bcachefs: Add a workqueue for btree io completions Also, clean up workqueue usage - we shouldn't be using system workqueues, pretty much everything we do needs to be on our own WQ_MEM_RECLAIM workqueues. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:04 -04:00
Brett Holman	2eba51a69a	bcachefs: rewrote prefetch asm in gas syntax for clang compatibility Signed-off-by: Brett Holman <bpholman5@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:04 -04:00
Kent Overstreet	1ce0cf5fe9	bcachefs: Add a debug mode that always reads from every btree replica There's a new module parameter, verify_all_btree_replicas, that enables reading from every btree replica when reading in btree nodes and comparing them against each other. We've been seeing some strange btree corruption - this will hopefully aid in tracking it down and catching it more often. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:04 -04:00
Kent Overstreet	596d3bdc1e	bcachefs: Don't repair btree nodes until after interior journal replay is done We need the btree to be in a consistent state before we can rewrite btree nodes. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:04 -04:00
Kent Overstreet	304b7e08c7	bcachefs: Fix an uninitialized var this fixes a valgrind complaint Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:04 -04:00
Kent Overstreet	a6336910b1	bcachefs: Fix for buffered writes getting -ENOSPC Buffered writes may have to increase their disk reservation at btree update time, due to compression and erasure coding being unpredictable: O_DIRECT writes should be checking for -ENOSPC, but buffered writes have already been accepted and should not. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:04 -04:00
Kent Overstreet	16ac8c9523	bcachefs: Fix inode backpointers in RENAME_OVERWRITE When we delete the dirent an inode points to, we need to zero out the backpointer fields - this was missed in the RENAME_OVERWRITE case. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:04 -04:00
Kent Overstreet	e7084c9c81	bcachefs: Make bch2_remap_range respect O_SYNC Caught by xfstest generic/628 Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:04 -04:00
Kent Overstreet	d6462f494d	bcachefs: Split extents if necessary in bch2_trans_update() Currently, we handle multiple overlapping extents in the same transaction commit by doing fixups in bch2_trans_update() - this patch extents that to split updates when necessary. The next patch that changes the reflink code to not fragment extents when making them indirect will require this. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:03 -04:00
Kent Overstreet	ef1b20924b	bcachefs: Ratelimiting for writeback IOs Writeback throttling is a kernel config option and not always enabled. When it's not enabled we need a fallback, to avoid unbounded memory pinning and work item backlogs. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:03 -04:00
Dan Robertson	ed34341189	bcachefs: statfs resports incorrect avail blocks The current implementation of bch_statfs does not scale the number of available blocks provided in f_bavail by the reserve factor. This causes an allocation of a file of this size to fail. Signed-off-by: Dan Robertson <dan@dlrobertson.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:03 -04:00
Kent Overstreet	c21d537779	bcachefs: Fix for bch2_bkey_pack_pos() not initializing len/version fields This bug led to push_whiteout() generating whiteouts that failed bch2_bkey_invalid() due to nonzero length fields - oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:03 -04:00
Kent Overstreet	82355e2882	bcachefs: Fix a memcpy call Not supposed to pass a null ptr to memcpy (even if the size is 0). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:03 -04:00
Kent Overstreet	bbfcb4519d	bcachefs: Fix bch2_extent_can_insert() call It was being skipped when hole punching, leading to problems when splitting compressed extents. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:03 -04:00
Kent Overstreet	2e8f9d23cb	bcachefs: Make sure to pass a disk reservation to bch2_extent_update() It's needed when we split an existing compressed extent - we get a null ptr deref without it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:03 -04:00
Brett Holman	2cd0563461	bcachefs: made changes to support clang, fixed a couple bugs fs/bcachefs/bset.c edited prefetch macro to add clang support fs/bcachefs/btree_iter.c bugfix: initialize iter->real_pos in bch2_btree_iter_init for later use fs/bcachefs/io.c bugfix: eliminated undefined behavior (negative bitshift) fs/bcachefs/buckets.c bugfix: invert sign to handle 64bit abs() Signed-off-by: Brett Holman <bpholman5@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:03 -04:00
Kent Overstreet	6ebe32b94c	bcachefs: Fix locking in __bch2_set_nr_journal_buckets() We weren't holding mark_lock correctly - it's needed for the new_fs path. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:03 -04:00
Dan Robertson	d125615a4e	bcachefs: properly initialize used values - Ensure the second key value in bch_hash_info is initialized to zero if the info type is of type BCH_STR_HASH_SIPHASH. - Initialize the possibly returned value in bch2_inode_create. Assuming bch2_btree_iter_peek returns bkey_s_c_null, the uninitialized value of ret could be returned to the user as an error pointer. - Fix compiler warning in initialization of bkey_s_c_stripe fs/bcachefs/buckets.c:1646:35: warning: suggest braces around initialization of subobject [-Wmissing-braces] struct bkey_s_c_stripe new_s = { NULL }; ^~~~ Signed-off-by: Dan Robertson <dan@dlrobertson.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:03 -04:00
Kent Overstreet	e1036ce581	bcachefs: Repair code for multiple types of data in same bucket bch2_check_fix_ptrs() is awkward, we need to find a way to improve it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:03 -04:00
Dan Robertson	faf1a5f417	bcachefs: Fix out of bounds read in fs usage ioctl Fix a possible read out of bounds if bch2_ioctl_fs_usage is called when replica_entries_bytes is set to a value that is smaller than the size of bch_replicas_usage. Signed-off-by: Dan Robertson <dan@dlrobertson.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:03 -04:00
Dan Robertson	2b25de552f	bcachefs: Fix null deref in bch2_ioctl_read_super Do not attempt to cleanup the returned value of bch2_device_lookup if the returned value was an error pointer. We currently check to see if the returned value is null and run the cleanup otherwise. As a result, we attempt to run the cleanup on a error pointer. Signed-off-by: Dan Robertson <dan@dlrobertson.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:03 -04:00
Dan Robertson	ec4ab9d2fc	bcachefs: Fix possible null deref on mount Ensure that the block device pointer in a superblock handle is not null before dereferencing it in bch2_dev_to_fs. The block device pointer may be null when mounting a new bcachefs filesystem given another mounted bcachefs filesystem exists that has at least one device that is offline. Signed-off-by: Dan Robertson <dan@dlrobertson.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:03 -04:00
Dan Robertson	baf056b87d	bcachefs: Fix error in parsing of mount options When parsing the mount options duplicate the given options. This is required as the options are parsed twice and strsep is used in parsing. The options will be modified into a possibly invalid options set for the second round of parsing if the options are not duplicated before parsing. Signed-off-by: Dan Robertson <dan@dlrobertson.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:03 -04:00
Stijn Tintel	ffcf9ec78c	bcachefs: avoid out-of-bounds in split_devs Calling mount with an empty source string causes an out-of-bounds error in split_devs. Check the length of the source string to avoid this. Signed-off-by: Stijn Tintel <stijn@linux-ipv6.be> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:03 -04:00
Kent Overstreet	909004d2f9	bcachefs: Make sure to use BTREE_ITER_PREFETCH in fsck Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:03 -04:00
Kent Overstreet	360746bf6f	bcachefs: Fix bch2_btree_iter_peek_with_updates() By not re-fetching the next update we were going into an infinite loop. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:03 -04:00
Kent Overstreet	933532b8b2	bcachefs: Fix reflink trigger The trigger for reflink pointers wasn't always incrementing/decrementing the refcounts correctly - this patch fixes that logic. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:03 -04:00
Kent Overstreet	3a402c8dab	bcachefs: Fix some refcounting bugs We really need debug mode assertions that ca->ref and ca->io_ref are used correctly. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:03 -04:00
Dan Robertson	5bc38f44fa	bcachefs: Fix oob write in __bch2_btree_node_write Fix a possible out of bounds write in __bch2_btree_node_write when the data buffer padding is cleared up to the block size. The out of bounds write is possible if the data buffers size is not a multiple of the block size. Signed-off-by: Dan Robertson <dan@dlrobertson.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:03 -04:00
Kent Overstreet	1784d43a88	bcachefs: Fix usage of last_seq + encryption jset->last_seq is in the region that's encrypted - on journal write completion, we were using it and getting garbage. This patch shadows it to fix. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:02 -04:00
Kent Overstreet	ac1019d32b	bcachefs: Clean up bch2_btree_and_journal_walk() Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:02 -04:00
Kent Overstreet	e68031fb46	bcachefs: Mark newly allocated btree nodes as accessed This was a major oversight - this means under memory pressure we can end up reading in a btree node, then having it evicted before we get to use it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:02 -04:00
Kent Overstreet	595c1e9bab	bcachefs: Fix time handling There were some overflows in the time conversion functions - fix this by converting tv_sec and tv_nsec separately. Also, set sb->time_min and sb->time_max. Fixes xfstest generic/258. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:02 -04:00
Kent Overstreet	4f6dad46cb	bcachefs: Add a tracepoint for when we block on journal reclaim Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:02 -04:00
Kent Overstreet	2ce867df31	bcachefs: Make sure to initialize j->last_flushed If the journal reclaim thread makes it to the timeout without ever initializing j->last_flushed, we could end up sleeping for a very long time. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:02 -04:00
Kent Overstreet	050197b1c1	bcachefs: Ensure that fpunch updates inode timestamps Fixes xfstests generic/059 Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:02 -04:00
Kent Overstreet	d4b4422345	bcachefs: Change copygc wait amount to be min of per device waits We're seeing a filesystem get stuck when all devices but one have no more reclaimable buckets - because the copygc wait amount is curretly filesystem wide. This patch should fix that, possibly at the expensive of running too much when only one or a few devices is full and the rebalance thread needs to move data around. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:02 -04:00
Kent Overstreet	baa6502905	bcachefs: Change bch2_btree_key_cache_count() to exclude dirty keys We're seeing livelocks that appear to be due to bch2_btree_key_cache_scan repeatedly scanning and blocking other tasks from using the key cache lock - we probably shouldn't be reporting objects that can't actually be freed yet. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:02 -04:00
Kent Overstreet	d99af4f194	bcachefs: Call bch2_inconsistent_error() on missing stripe/indirect extent Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:02 -04:00
Kent Overstreet	3dea728ce6	bcachefs: New tracepoint for bch2_trans_get_iter() Trying to debug an issue where after traverse_all() we shouldn't have to traverse any iterators... yet we are Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:02 -04:00
Kent Overstreet	d36cdb045a	bcachefs: Fix __bch2_trans_get_iter() We need to also set iter->uptodate to indicate it needs to be traversed. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:02 -04:00
Kent Overstreet	ceda1b9a17	bcachefs: Evict btree nodes we're deleting There was a bug that led to duplicate btree node pointers being inserted at the wrong level. The new topology repair code can fix that, except that the btree cache code gets confused when we read in a btree node from the pointer that was at the wrong level. This patch evicts nodes that we're deleting to, which nicely solves the problem. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:02 -04:00
Kent Overstreet	fc51b041b7	bcachefs: New check_nlinks algorithm for snapshots With snapshots, using a radix tree for the table of link counts won't work anymore because we also need to distinguish between inodes with different snapshot IDs. Instead, this patch builds up a sorted array of inodes that have hardlinks that we can binary search on - taking advantage of the fact that with inode backpointers, the check_nlinks() pass _only_ needs to concern itself with inodes that have hardlinks now. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:02 -04:00
Kent Overstreet	e3b4b48c17	bcachefs: Fix a null ptr deref Fix a few memory safety issues, found by asan in userspace. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:02 -04:00
Kent Overstreet	aae15aafcd	bcachefs: New and improved topology repair code This splits out btree topology repair into a separate pass, and makes some improvements: - When we have to pick which of two overlapping nodes to drop keys from, we use the btree node header sequence number to preserve the newer node - the gc code has been changed so that it doesn't bail out if we're continuing/ignoring on fsck error - this way the dump tool can skip running the repair pass but still walk all reachable metadata - add a new superblock flag indicating when a filesystem is known to have btree topology issues, and the topology repair pass should be run - changing the start/end of a node might mean keys in that node have to be deleted: this patch handles that better by splitting it out into a separate function and running it explicitly in the topology repair code, previously those keys were only being dropped when the btree node was read in. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:02 -04:00
Kent Overstreet	4932e07ea0	bcachefs: Fix key cache assertion Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:02 -04:00
Kent Overstreet	0098376f03	bcachefs: New helper __bch2_btree_insert_keys_interior() Consolidate common parts of bch2_btree_insert_keys_interior() and btree_split_insert_keys() - prep work for adding some new topology assertions. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:02 -04:00
Kent Overstreet	bcd25dac53	bcachefs: Rewrite btree nodes with errors This patch adds self healing functionality for btree nodes - if we notice a problem when reading a btree node, we just rewrite it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:02 -04:00
Kent Overstreet	8058b532ac	bcachefs: Fix bch2_verify_keylist_sorted Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:02 -04:00
Kent Overstreet	bc2e5d5c66	bcachefs: Fix an out of bounds read bch2_varint_decode() can read up to 7 bytes past the end of the buffer, which means we need to allocate slightly larger key cache buffers. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:02 -04:00
Kent Overstreet	65c0601a32	bcachefs: Use mmap() instead of vmalloc_exec() in userspace Calling mmap() directly is much better than malloc() then mprotect(), we end up with much less address space fragmentation. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:02 -04:00
Kent Overstreet	537c32f521	bcachefs: Don't BUG_ON() btree topology error This replaces an assertion in the btree merge path with a bch2_inconsistent_error() - fsck will fix it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:01 -04:00
Kent Overstreet	1c8441bea5	bcachefs: Fix repair leading to replicas not marked bch2_check_fix_ptrs() was being called after checking if the replicas set was marked - but repair could change which replicas set needed to be marked. Oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:01 -04:00
Kent Overstreet	58686a259e	bcachefs: Lookup/create lost+found lazily This is prep work for subvolumes - each subvolume will have its own lost+found. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:01 -04:00
Kent Overstreet	eb365fbc33	bcachefs: Don't BUG() in update_replicas Apparently, we have a bug where in mark and sweep while accounting for a key, a replicas entry isn't found. Change the code to print out the key we couldn't mark and halt instead of a BUG_ON(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:01 -04:00
Kent Overstreet	f09517fc51	bcachefs: Fix a deadlock on journal reclaim Flushing the btree key cache needs to use allocation reserves - journal reclaim depends on flushing the btree key cache for making forward progress, and the allocator and copygc depend on journal reclaim making forward progress. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:01 -04:00
Kent Overstreet	6adaac0b95	bcachefs: Update bch2_btree_verify() bch2_btree_verify() verifies that the btree node on disk matches what we have in memory. This patch changes it to verify every replica, and also fixes it for interior btree nodes - there's a mem_ptr field which is used as a scratch space and needs to be zeroed out for comparing with what's on disk. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:01 -04:00
Kent Overstreet	7b7278bbaf	bcachefs: Fix two btree iterator leaks Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:01 -04:00
Kent Overstreet	51c804ed2a	bcachefs: Punt btree writes to workqueue to submit We don't want to be submitting IO with btree locks held, and btree writes usually aren't latency sensitive. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:01 -04:00
Kent Overstreet	4d47b21c4d	bcachefs: Fix a use after free Turns out, we weren't waiting on in flight btree writes when freeing existing btree nodes. This lead to stray btree writes overwriting newly allocated buckets, but only started showing itself with some of the recent allocator work and another patch to move submitting of btree writes to worqueues. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:01 -04:00
Kent Overstreet	8ce600d447	bcachefs: Fix for btree_gc repairing interior btree ptrs Using the normal transaction commit path to insert and journal updates to interior nodes hadn't been done before this repair code was written, not surprising that there was a bug. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:01 -04:00
Kent Overstreet	e95d7edfb7	bcachefs: Preallocate trans mem in bch2_migrate_index_update() This will help avoid transaction restarts. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:01 -04:00
Kent Overstreet	89baec780f	bcachefs: Allocator refactoring This uses the kthread_wait_freezable() macro to simplify a lot of the allocator thread code, along with cleaning up bch2_invalidate_bucket2(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:01 -04:00
Kent Overstreet	fa272f33bb	bcachefs: Always check for invalid bkeys in trans commit path We check for this prior to metadata being written, but we're seeing some strange bugs lately, and this will help catch those closer to where they occur. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:01 -04:00
Kent Overstreet	27cc532ef2	bcachefs: Check that keys are in the correct btrees We've started seeing bug reports of pointers to btree nodes being detected in leaf nodes. This should catch that before it's happened, and it's something we should've been checking anyways. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:01 -04:00
Kent Overstreet	04903131db	bcachefs: Handle errors in bch2_trans_mark_update() It's not actually the case that iterators are always checked here - __bch2_trans_commit() checks for that after running triggers. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:01 -04:00
Kent Overstreet	6ad060b0eb	bcachefs: Allocator thread doesn't need gc_lock anymore Even with runtime gc (which currently isn't supported), runtime gc no longer clears/recalculates the main set of bucket marks - it allocates and calculates another set, updating the primary at the end. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:01 -04:00
Kent Overstreet	dac1525d9c	bcachefs: gc shouldn't care about owned_by_allocator The owned_by_allocator field is a purely in memory thing, even if/when we bring back GC at runtime there's no need for it to be recalculating this field. This is prep work for pulling it out of struct bucket, and eventually getting rid of the bucket array. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:01 -04:00
Kent Overstreet	694015c2b1	bcachefs: Refactor bchfs_fallocate() to not nest btree_trans on stack Upcoming patch is going to disallow multiple btree_trans on the stack. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:01 -04:00
Kent Overstreet	f02810a1a4	bcachefs: Fix an unused var warning in userspace Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:01 -04:00
Kent Overstreet	f24fab9cba	bcachefs: Fix some small memory leaks Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:01 -04:00
Kent Overstreet	ae8bbb9fac	bcachefs: Simplify fsck remove_dirent() Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:01 -04:00
Kent Overstreet	5e6a668b19	bcachefs: Fix transaction restarts due to upgrading of cloned iterators This fixes a regression from 52d86202fd bcachefs: Improve bch2_btree_iter_traverse_all() We want to avoid mucking with other iterators in the btree transaction in operations that are only supposed to be touching individual iterators - that patch was a cleanup to move lock ordering handling to bch2_btree_iter_traverse_all(). But it broke upgrading of cloned iterators. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:00 -04:00
Kent Overstreet	96f399d0ee	bcachefs: Fix journal reclaim loop When dirty key cache keys were separated from other journal pins, we broke the loop conditional in __bch2_journal_reclaim() - it's supposed to keep looping as long as there's work to do. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:00 -04:00
Kent Overstreet	3e07a7300f	bcachefs: Fix an RCU splat Writepoints are never deallocated so the rcu_read_lock() isn't really needed, but we are doing lockless list traversal. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:00 -04:00
Kent Overstreet	633632ef1b	bcachefs: Simplify bch2_set_nr_journal_buckets() Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:00 -04:00
Kent Overstreet	d62ab355d7	bcachefs: Fix bch2_trans_mark_dev_sb() Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:00 -04:00
Kent Overstreet	73a117d2d8	bcachefs: Improve trans_restart_mem_realloced tracepoint Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:00 -04:00
Kent Overstreet	558509aa01	bcachefs: Don't downgrade iterators in bch2_trans_get_iter() This fixes a livelock with btree node splits. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:00 -04:00
Kent Overstreet	2527dd9158	bcachefs: Improve bch2_btree_iter_traverse_all() By changing it to upgrade iterators to intent locks to avoid lock restarts we can simplify __bch2_btree_node_lock() quite a bit - this fixes a probable bug where it could potentially drop a lock on an unrelated error but still succeed instead of causing a transaction restart. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:00 -04:00
Kent Overstreet	0ef107859b	bcachefs: Fix journal_reclaim_wait_done() Can't run arbitrary code inside a wait_event() conditional, due to task state being weird... Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:00 -04:00
Kent Overstreet	1b9374adec	bcachefs: Fix bch2_gc_done() error messages Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:00 -04:00
Kent Overstreet	5e427571c5	bcachefs: Don't call bch2_btree_iter_traverse() unnecessarily If we let bch2_trans_commit() do it, it'll traverse iterators in sorted order which means we'll get fewer lock restarts. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:00 -04:00
Kent Overstreet	a0c9cc1727	bcachefs: Better iterator picking Avoid cloning iterators if we don't have to. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:00 -04:00
Kent Overstreet	d44a6e350e	bcachefs: Drop old style btree node coalescing We have foreground btree node merging now, and any future btree node merging improvements are going to be based off of that code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:00 -04:00
Kent Overstreet	4aac975b6c	bcachefs: Add a perf test for multiple updates per commit Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:00 -04:00
Kent Overstreet	e949fbbba0	bcachefs: Ensure bucket gen gc completes We don't want it to block, if it can't allocate it should just continue instead of possibly deadlocking. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:00 -04:00
Kent Overstreet	ac516d0e7d	bcachefs: Add the status of bucket gen gc to sysfs Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:00 -04:00
Kent Overstreet	319c130507	bcachefs: Fix heap overrun in bch2_fs_usage_read() XXX squash oops Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:00 -04:00
Kent Overstreet	423300e8fe	bcachefs: BCH_BEATURE_atomic_nlink is obsolete Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:00 -04:00
Kent Overstreet	d3ff7fec9c	bcachefs: Improved check_directory_structure() Now that we have inode backpointers, we can simplify checking directory structure: instead of doing a DFS from the filesystem root and then checking if we found everything, we can iterate over every inode and see if we can go up until we get to the root. This patch also has a number of fixes and simplifications for the inode backpointer checks. Also, it turns out we don't actually need the BCH_INODE_BACKPTR_UNTRUSTED flag. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:00 -04:00
Kent Overstreet	176cf4bf59	bcachefs: Fix fsck to not use bch2_link_trans() bch2_link_trans() uses the btree key cache for inode updates, and fsck isn't supposed to - also, it's not really what we want for reattaching unreachable inodes anyways. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:00 -04:00
Kent Overstreet	b69ac13cb3	bcachefs: Fix bch2_trans_relock() The patch that changed bch2_trans_relock() to not look at iter->uptodate also tried to add an optimization by only having it relock btree_iter_key() iterators (iterators that are live or have been marked as keep). But, this wasn't thought through - this pops internal iterator assertions because on transaction restart, when we're traversing iterators we traverse all iterators marked as linked, and having bch2_trans_relock() skip some of those mean that it can skil the iterator that bch2_btree_iter_traverse_one() is currently traversing. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:00 -04:00
Kent Overstreet	b906aaddf2	bcachefs: Redo check_nlink fsck pass Now that we have inode backpointers the check_nlink pass only is concerned with files that have hardlinks, and can be simplified. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:00 -04:00
Kent Overstreet	8a85b20cd7	bcachefs: Inode backpointers are now required This lets us simplify fsck quite a bit, which we need for making fsck snapshot aware. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:59 -04:00
Kent Overstreet	7ac2c55e4d	bcachefs: Simplify hash table checks Very early on there was a period where we were accidentally generating dirents with trailing garbage; we've since dropped support for filesystems that old and the fsck code can be dropped. Also, this patch switches to a simpler algorithm for checking hash tables. It's less efficient on hash collision - but with 64 bit keys, those are very rare. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:59 -04:00
Kent Overstreet	5c16add5ad	bcachefs: Check inodes at start of fsck This splits out checking inode nlinks from the rest of the inode checks and moves most of the inode checks to the start of fsck, so that other fsck passes can depend on it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:59 -04:00
Kent Overstreet	d7f35163e6	bcachefs: Fix BTREE_ITER_NOT_EXTENTS bch2_btree_iter_peek() wasn't properly checking for BTREE_ITER_IS_EXTENTS when updating iter->pos. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:59 -04:00
Kent Overstreet	0e96452eef	bcachefs: Fix bch2_gc_btree_gens() Since we're using a NOT_EXTENTS iterator, we shouldn't be setting the iter pos to the start of the extent. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:59 -04:00
Kent Overstreet	6ae0d16d29	bcachefs: Make sure to kick journal reclaim when we're waiting on it Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:59 -04:00
Kent Overstreet	b1bd955ba5	bcachefs: Don't wait for ALLOC_SCAN_BATCH buckets in allocator It used to be necessary for the allocator thread to batch up invalidating buckets when possible - but since we added the btree key cache that hasn't been a concern, and now it's causing the allocator thread to livelock when the filesystem is nearly full. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:59 -04:00
Kent Overstreet	3a14d58e7b	bcachefs: Drop bch2_fsck_inode_nlink() We've had BCH_FEATURE_atomic_nlink for quite some time, we can drop this now. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:59 -04:00
Kent Overstreet	b6d4f474e4	bcachefs: Move some dirent checks to bch2_dirent_invalid() Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:59 -04:00
Kent Overstreet	2177147b39	bcachefs: Improve bset compaction The previous patch that fixed btree nodes being written too aggressively now meant that we weren't sorting btree node bsets optimally - this patch fixes that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:59 -04:00
Kent Overstreet	241e26369e	bcachefs: Don't flush btree writes more aggressively because of btree key cache We need to flush the btree key cache when it's too dirty, because otherwise the shrinker won't be able to reclaim memory - this is done by journal reclaim. But journal reclaim also kicks btree node writes: this meant that btree node writes were getting kicked much too often just because we needed to flush btree key cache keys. This patch splits journal pins into two different lists, and teaches journal reclaim to not flush btree node writes when it only needs to flush key cache keys. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:59 -04:00
Kent Overstreet	9d8022db1c	bcachefs: Eliminate more PAGE_SIZE uses In userspace, we don't really have a well defined PAGE_SIZE and shouln't be relying on it. This is some more incremental work to remove references to it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:59 -04:00
Kent Overstreet	a085778500	bcachefs: Increase BSET_CACHELINE to 256 bytes Linear searches have gotten cheaper relative to binary searches on modern hardware, due to better branch prediction behaviour. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:59 -04:00
Kent Overstreet	f72b1fd710	bcachefs: Fix a startup race Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:59 -04:00
Kent Overstreet	ecc1420944	bcachefs: Fix an uninitialized variable Fortunately it was just used in an error message Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:59 -04:00
Kent Overstreet	3ce8b463e3	bcachefs: kill bset_tree->max_key Since we now ensure a btree node's max key fits in its packed format, this isn't needed for the reasons it used to be - and, it was being used inconsistently. Also reorder struct btree a bit for performance, and kill some dead code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:59 -04:00
Kent Overstreet	671cc8a51b	bcachefs: Eliminate memory barrier from fast path of journal_preres_put() Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:59 -04:00
Kent Overstreet	08e337618f	bcachefs: Drop some memset() calls gcc is emitting rep stos here, which is silly (and slow) for an 8 byte memset. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:59 -04:00
Kent Overstreet	35d5aff263	bcachefs: Kill bch2_fs_usage_scratch_get() This is an important cleanup, eliminating an unnecessary copy in the transaction commit path. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:59 -04:00
Kent Overstreet	9c2e624290	bcachefs: Fix livelock calling bch2_mark_bkey_replicas() The bug was that we were trying to find a replicas entry that wasn't sorted - but, we can also simplify the code by not using bch2_mark_bkey_replicas and instead ensuring the list of replicas entries exists directly. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:59 -04:00
Kent Overstreet	2940295c97	bcachefs: Be more careful about JOURNAL_RES_GET_RESERVED JOURNAL_RES_GET_RESERVED should only be used for updatse that need to be done to free up space in the journal. In particular, when we're flushing keys from the key cache, if we're flushing them out of order we shouldn't be using it, since we're using up our remaining space in the journal without dropping a pin that will let us make forward progress. With this patch, BTREE_INSERT_JOURNAL_RECLAIM without BTREE_INSERT_JOURNAL_RESERVED may return -EAGAIN - we can't wait on journal reclaim if we're already in journal reclaim. This means we need to propagate these errors up to journal reclaim, indicating that flushing a journal pin should be retried in the future. This is prep work for a patch to change the way journal reclaim works, to split out flushing key cache keys because the btree key cache is too dirty from journal reclaim because we need space in the journal. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:59 -04:00
Kent Overstreet	6167f7c8ff	bcachefs: Fix journal deadlock After we get a journal reservation, we need to use it - if we erorr out of a transaction commit, we'll be eating into space in the journal and if our transaction needs to make forward progress in order to reclaim space in the journal, we'll deadlock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:58 -04:00
Kent Overstreet	b753d4b338	bcachefs: Fix this_cpu_ptr() usage Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:58 -04:00
Kent Overstreet	2d587674ba	bcachefs: Increase commality between BTREE_ITER_NODES and BTREE_ITER_KEYS Eventually BTREE_ITER_NODES should be going away. This patch is to fix a transaction iterator overflow in the btree node merge path because BTREE_ITER_NODES iterators couldn't be reused. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:58 -04:00
Kent Overstreet	2fa81d0b5b	bcachefs: Fix BTREE_FOREGROUND_MERGE_HYSTERESIS We were multiplying instead of dividing - oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:58 -04:00
Kent Overstreet	5c1d808ad8	bcachefs: Drop trans->nounlock Since we're no longer doing btree node merging post commit, we can now delete a bunch of code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:58 -04:00
Kent Overstreet	b182ff609f	bcachefs: Move btree node merging to before transaction commit Currently, BTREE_INSERT_NOUNLOCK makes it hard to ensure btree node merging happens reliably - since btree node merging happens after transaction commit, we can't drop btree locks and block when starting the btree update. This patch moves it to before transaction commit - and failure to do a merge that we wanted to do just restarts the transaction. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:58 -04:00
Kent Overstreet	ecab6be7e5	bcachefs: bch2_foreground_maybe_merge() now correctly reports lock restarts This means that btree node splits don't have to automatically trigger a transaction restart. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:58 -04:00
Kent Overstreet	54ca47e114	bcachefs: Kill bch2_btree_node_get_sibling() This patch reworks the btree node merge path to use a second btree iterator to get the sibling node - which means bch2_btree_iter_get_sibling() can be deleted. Also, it uses bch2_btree_iter_traverse_all() if necessary - which means it should be more reliable. We don't currently even try to make it work when trans->nounlock is set - after a BTREE_INSERT_NOUNLOCK transaction commit, hopefully this will be a worthwhile tradeoff. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:58 -04:00
Kent Overstreet	1259cc31b2	bcachefs: Change where merging of interior btree nodes is trigger from Previously, we were doing btree node merging from bch2_btree_insert_node() - but this is called from the split path, when we're in the middle of creating new nodes and deleting new nodes and the iterators are in a weird state. Also, this means we're starting a new btree_update while in the middle of an existing one, and that's asking for deadlocks. Much simpler and saner to trigger btree node merging _after_ the whole btree node split path is finished. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:58 -04:00
Kent Overstreet	e264b2f62a	bcachefs: Improve bch2_btree_update_start() bch2_btree_update_start() is now responsible for taking gc_lock and upgrading the iterator to lock parent nodes - greatly simplifying error handling and all of the callers. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:58 -04:00
Kent Overstreet	ba5f03d362	bcachefs: Add a sysfs var for average btree write size Useful number for performance tuning. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:58 -04:00
Kent Overstreet	d5a43661a1	bcachefs: Improve bch2_trans_relock() We're getting away from relying on iter->uptodate - this changes bch2_trans_relock() to more directly specify which iterators should be relocked. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:58 -04:00
Kent Overstreet	acb3b26e76	bcachefs: Move btree lock debugging to slowpath fn Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:58 -04:00
Kent Overstreet	24db24c749	bcachefs: Don't make foreground writes wait behind journal reclaim too long Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:58 -04:00
Kent Overstreet	65bcd6579d	buckets.c fixups XXX squash Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:58 -04:00
Kent Overstreet	5f65d74d79	bcachefs: Add repair code for out of order keys in a btree node. This just drops the offending key - in the bug report where this was seen, it was clearly a single bit memory error, and fsck will fix the missing key. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:58 -04:00
Kent Overstreet	a84b6c50f1	bcachefs: Free iterator in bch2_btree_delete_range_trans() This is specifically to speed up bch2_inode_rm(), so that we're not traversing iterators we're done with. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:58 -04:00
Kent Overstreet	c5f51cdd5f	bcachefs: Have journal reclaim thread flush more aggressively This adds a new watermark for the journal reclaim when flushing btree key cache entries - it should try and stay ahead of where foreground threads doing transaction commits will enter direct journal reclaim. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:58 -04:00
Kent Overstreet	883d9701f1	bcachefs: Don't use bch2_inode_find_by_inum() in move.c Since move.c isn't aware of what subvolume we're in, we can't use the standard inode lookup code - fortunately, we're just using it for reading IO options. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:58 -04:00
Kent Overstreet	e6ae272724	bcachefs: Change inode allocation code for snapshots For snapshots, when we allocate a new inode we want to allocate an inode number that isn't in use in any other subvolume. We won't be able to use ITER_SLOTS for this, inode allocation needs to change to use BTREE_ITER_ALL_SNAPSHOTS. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:58 -04:00
Kent Overstreet	ab2a29ccff	bcachefs: Inode backpointers This patch adds two new inode fields, bi_dir and bi_dir_offset, that point back to the inode's dirent. Since we're only adding fields for a single backpointer, files that have been hardlinked won't necessarily have valid backpointers: we also add a new inode flag, BCH_INODE_BACKPTR_UNTRUSTED, that's set if an inode has ever had multiple links to it. That's ok, because we only really need this functionality for directories, which can never have multiple hardlinks - when we add subvolumes, we'll need a way to enemurate and print subvolumes, and this will let us reconstruct a path to a subvolume root given a subvolume root inode. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:58 -04:00
Kent Overstreet	e751c01a8e	bcachefs: Start using bpos.snapshot field This patch starts treating the bpos.snapshot field like part of the key in the btree code: * bpos_successor() and bpos_predecessor() now include the snapshot field * Keys in btrees that will be using snapshots (extents, inodes, dirents and xattrs) now always have their snapshot field set to U32_MAX The btree iterator code gets a new flag, BTREE_ITER_ALL_SNAPSHOTS, that determines whether we're iterating over keys in all snapshots or not - internally, this controlls whether bkey_(successor\|predecessor) increment/decrement the snapshot field, or only the higher bits of the key. We add a new member to struct btree_iter, iter->snapshot: when BTREE_ITER_ALL_SNAPSHOTS is not set, iter->pos.snapshot should always equal iter->snapshot, which will be 0 for btrees that don't use snapshots, and alsways U32_MAX for btrees that will use snapshots (until we enable snapshot creation). This patch also introduces a new metadata version number, and compat code for reading from/writing to older versions - this isn't a forced upgrade (yet). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:57 -04:00
Kent Overstreet	4cf91b0270	bcachefs: Split out bpos_cmp() and bkey_cmp() With snapshots, we're going to need to differentiate between comparisons that should and shouldn't include the snapshot field. bpos_cmp is now the comparison function that does include the snapshot field, used by core btree code. Upper level filesystem code generally does _not_ want to compare against the snapshot field - that code wants keys to compare as equal even when one of them is in an ancestor snapshot. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:57 -04:00
Kent Overstreet	43d002432d	bcachefs: Add a mechanism for running callbacks at trans commit time Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:57 -04:00
Kent Overstreet	331194a230	bcachefs: btree key cache locking improvements The btree key cache mutex was becoming a significant bottleneck - it was mainly used to protect the lists of dirty, clean and freed cached keys. This patch eliminates the dirty and clean lists - instead, when we need to scan for keys to drop from the cache we iterate over the rhashtable, and thus we're able to remove most uses of that lock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:57 -04:00
Kent Overstreet	2649b514b6	bcachefs: Simplify btree_node_iter_init_pack_failed() Since we now make sure to always generate packed bkey formats that can pack the min_key of a btree node, this path should actually never happen. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:57 -04:00
Kent Overstreet	f793fd85dc	bcachefs: Fix for bch2_trans_commit() unlocking when it's not supposed to When we pass BTREE_INSERT_NOUNLOCK bch2_trans_commit isn't supposed to unlock after a successful commit, but it was calling bch2_trans_cond_resched() - oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:57 -04:00
Kent Overstreet	3bf57160c2	bcachefs: Fix packed bkey format calculation for new btree roots Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:57 -04:00
Kent Overstreet	c7e04e22e0	bcachefs: Fix building of aux search trees We weren't packing the min/max keys, which was a major oversight and completely disabled generating bkey_floats for adjacent nodes. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:57 -04:00
Kent Overstreet	2da5d000b9	bcachefs: Generate better bkey formats when splitting nodes On btree node split, we weren't ensuring the min_key of the new larger node packs in the new format for this node. This triggers some painful slowpaths in the bset.c aux search tree code - this patch fixes that by calculating a new format for the new node with the new min_key. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:57 -04:00

... 2 3 4 5 6 ...

85120 commits