linux-stable

Commit Graph

Author	SHA1	Message	Date
Kent Overstreet	3398124444	bcachefs: Improve trace_trans_restart_would_deadlock In the CI, we're seeing tests failing due to excessive would_deadlock transaction restarts - the tracepoint now includes the lock cycle that occured. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:39 -05:00
Kent Overstreet	e153a0d70b	bcachefs: Improve trace_trans_restart_too_many_iters() We now include the list of paths in use. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:39 -05:00
Kent Overstreet	3c471b6588	bcachefs: convert bch_fs_flags to x-macro Now we can print out filesystem flags in sysfs, useful for debugging various "what's my filesystem doing" issues. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:38 -05:00
Kent Overstreet	ad9c7992eb	bcachefs: Kill btree_iter->journal_pos For BTREE_ITER_WITH_JOURNAL, we memoize lookups in the journal keys, to avoid the binary search overhead. Previously we stashed the pos of the last key returned from the journal, in order to force the lookup to be redone when rewinding. Now bch2_journal_keys_peek_upto() handles rewinding itself when necessary - so we can slim down btree_iter. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:37 -05:00
Kent Overstreet	e56978c80d	bcachefs: Kill BTREE_ITER_ALL_LEVELS As discussed in the previous patch, BTREE_ITER_ALL_LEVELS appears to be racy with concurrent interior node updates - and perhaps it is fixable, but it's tricky and unnecessary. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:37 -05:00
Kent Overstreet	cd5bd16282	bcachefs: Fix redundant variable initialization path->level was being read, but never used. Reported-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:37 -05:00
Kent Overstreet	fa014953f9	bcachefs: Fix extents iteration + snapshots interaction peek_upto() checks against the end position and bails out before FILTER_SNAPSHOTS checks; this is because if we end up at a different inode number than the original search key none of the keys we see might be visibile in the current snapshot - we might be looking at inode in a completely different subvolume. But this is broken, because when we're iterating over extents we're checking against the extent start position to decide when to bail out, and the extent start position isn't monotonically increasing until after we've run FILTER_SNAPSHOTS. Fix this by adding a simple inode number check where the old bailout check was, and moving the main check to the correct position. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Reported-by: "Carl E. Thompson" <list-bcachefs@carlthompson.net>	2024-01-01 11:42:39 -05:00
Thomas Bertschinger	50a8a732d2	bcachefs: fix invalid memory access in bch2_fs_alloc() error path When bch2_fs_alloc() gets an error before calling bch2_fs_btree_iter_init(), bch2_fs_btree_iter_exit() makes an invalid memory access because btree_trans_list is uninitialized. Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com> Fixes: `6bd68ec266` ("bcachefs: Heap allocate btree_trans") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-12-14 15:24:14 -05:00
Kent Overstreet	8a443d3ea1	bcachefs: Proper refcounting for journal_keys The btree iterator code overlays keys from the journal until journal replay is finished; since we're now starting copygc/rebalance etc. before replay is finished, this is multithreaded access and thus needs refcounting. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-24 02:43:12 -05:00
Kent Overstreet	006ccc3090	bcachefs: Kill journal pre-reservations This deletes the complicated and somewhat expensive journal pre-reservation machinery in favor of just using journal watermarks: when the journal is more than half full, we run journal reclaim more aggressively, and when the journal is more than 3/4s full we only allow journal reclaim to get new journal reservations. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-14 23:44:43 -05:00
Kent Overstreet	9fcdd23b6e	bcachefs: Add a comment for BTREE_INSERT_NOJOURNAL usage BTREE_INSERT_NOJOURNAL is primarily used for a performance optimization related to inode updates and fsync - document it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-04 22:19:13 -04:00
Kent Overstreet	f82755e4e8	bcachefs: Data move path now uses bch2_trans_unlock_long() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-04 22:19:11 -04:00
Kent Overstreet	c4accde498	bcachefs: Ensure srcu lock is not held too long The SRCU read lock that btree_trans takes exists to make it safe for bch2_trans_relock() to deref pointers to btree nodes/key cache items we don't have locked, but as a side effect it blocks reclaim from freeing those items. Thus, it's important to not hold it for too long: we need to differentiate between bch2_trans_unlock() calls that will be only for a short duration, and ones that will be for an unbounded duration. This introduces bch2_trans_unlock_long(), to be used mainly by the data move paths. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-04 14:17:11 -04:00
Kent Overstreet	be9e782df3	bcachefs: Don't downgrade locks on transaction restart We should only be downgrading locks on success - otherwise, our transaction restarts won't be getting the correct locks and we'll livelock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-01 21:11:08 -04:00
Kent Overstreet	88dfe193bd	bcachefs: bch2_btree_id_str() Since we can run with unknown btree IDs, we can't directly index btree IDs into fixed size arrays. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-31 12:18:37 -04:00
Kent Overstreet	6bd68ec266	bcachefs: Heap allocate btree_trans We're using more stack than we'd like in a number of functions, and btree_trans is the biggest object that we stack allocate. But we have to do a heap allocatation to initialize it anyways, so there's no real downside to heap allocating the entire thing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:13 -04:00
Kent Overstreet	96dea3d599	bcachefs: Fix W=12 build errors Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:13 -04:00
Colin Ian King	6bf3766b52	bcachefs: Fix a handful of spelling mistakes in various messages There are several spelling mistakes in error messages. Fix these. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:13 -04:00
Kent Overstreet	5b7fbdcd5b	bcachefs: Fix silent enum conversion error This changes mark_btree_node_locked() to take an enum btree_node_locked_type, not a six_lock_type, since BTREE_NODE_UNLOCKED is -1 which may cause problems converting back and forth to six_lock_type if short enums are in use. With this change, we never store BTREE_NODE_UNLOCKED in a six_lock_type enum. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:12 -04:00
Kent Overstreet	8e877caaad	bcachefs: Split out snapshot.c subvolume.c has gotten a bit large, this splits out a separate file just for managing snapshot trees - BTREE_ID_snapshots. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:11 -04:00
Kent Overstreet	ff5b741c25	bcachefs: Zero btree_paths on allocation This fixes a bug in the cycle detector, bch2_check_for_deadlock() - we have to make sure the node pointers in the btree paths array are set to something not-garbage before another thread may see them. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:11 -04:00
Kent Overstreet	401585fe87	bcachefs: btree_journal_iter.c Split out a new file from recovery.c for managing the list of keys we read from the journal: before journal replay finishes the btree iterator code needs to be able to iterate over and return keys from the journal as well, so there's a fair bit of code here. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:10 -04:00
Kent Overstreet	1e81f89b02	bcachefs: Fix assorted checkpatch nits Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:10 -04:00
Kent Overstreet	bf5a261c7a	bcachefs: Assorted fixes for clang clang had a few more warnings about enum conversion, and also didn't like the opts.c initializer. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:09 -04:00
Kent Overstreet	73bd774d28	bcachefs: Assorted sparse fixes - endianness fixes - mark some things static - fix a few __percpu annotations - fix silent enum conversions Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:06 -04:00
Kent Overstreet	faa6cb6c13	bcachefs: Allow for unknown btree IDs We need to allow filesystems with metadata from newer versions to be mountable and usable by older versions. This patch enables us to roll out new btrees without a new major version number; we can now handle btree roots for unknown btree types. The unknown btree roots will be retained, and fsck (including backpointers) will check them, the same as other btree types. We add a dynamic array for the extra, unknown btree roots, in addition to the fixed size btree root array, and add new helpers for looking up btree roots. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:05 -04:00
Kent Overstreet	a5b696ee6e	bcachefs: seqmutex; fix a lockdep splat We can't be holding btree_trans_lock while copying to user space, which might incur a page fault. To fix this, convert it to a seqmutex so we can unlock/relock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:04 -04:00
Kent Overstreet	49c7cd9d8d	bcachefs: More drop_locks_do() conversions Using drop_locks_do() ensures that every unlock() is paired with a relock(), with proper error checking. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:03 -04:00
Kent Overstreet	78367aaa5a	bcachefs: bch2_trans_kmalloc no longer allocates memory with btree locks held When allocating memory, gfp flags should generally be - GFP_NOWAIT\|__GFP_NOWARN if btree locks are held - GFP_NOFS if in the IO path or otherwise holding resources needed for IO submission - GFP_KERNEL otherwise Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:03 -04:00
Kent Overstreet	b5fd75669a	bcachefs: drop_locks_do() Add a new helper for the common pattern of: - trans_unlock() - do something - trans_relock() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:03 -04:00
Kent Overstreet	f154c3eb42	bcachefs: trans_for_each_path_safe() bch2_btree_trans_to_text() is used on btree_trans objects that are owned by different threads - when printing out deadlock cycles - so we need a safe version of trans_for_each_path(), else we race with seeing a btree_path that was just allocated and not fully initialized: Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:02 -04:00
Kent Overstreet	32913f49f5	six locks: Seq now only incremented on unlock Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:02 -04:00
Kent Overstreet	1fb4fe6317	six locks: Kill six_lock_state union As suggested by Linus, this drops the six_lock_state union in favor of raw bitmasks. On the one hand, bitfields give more type-level structure to the code. However, a significant amount of the code was working with six_lock_state as a u64/atomic64_t, and the conversions from the bitfields to the u64 were deemed a bit too out-there. More significantly, because bitfield order is poorly defined (#ifdef __LITTLE_ENDIAN_BITFIELD can be used, but is gross), incrementing the sequence number would overflow into the rest of the bitfield if the compiler didn't put the sequence number at the high end of the word. The new code is a bit saner when we're on an architecture without real atomic64_t support - all accesses to lock->state now go through atomic64_*() operations. On architectures with real atomic64_t support, we additionally use atomic bit ops for setting/clearing individual bits. Text size: 7467 bytes -> 4649 bytes - compilers still suck at bitfields. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:02 -04:00
Kent Overstreet	f375d6ca58	bcachefs: Don't call local_clock() twice in trans_begin() local_clock() is not as cheap as we'd like it to be, alas Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:01 -04:00
Kent Overstreet	11f117374a	bcachefs: Call bch2_path_put_nokeep() before bch2_path_put() bch2_path_put_nokeep() is sketchy, and we should consider removing it: it unconditionally frees btree_paths once their ref hits 0. The assumption is that we only use it for paths that have never been visible outside the btree core btree code; i.e. higher level code will never be making assumptions about locking based on these paths. However, there's subtle brokenness with this approach: - If we call bch2_path_put(), then bch2_path_put_nokeep(), bch2_path_put() may free the first path on the assumption that we we have another path keeping a node locked - but then bch2_path_put_nokeep() just unconditionally frees it. The same bug may arise if we're calling bch2_path_put() and bch2_path_put_nokeep() on the same (refcounted) path, or two adjacent paths that point to the same btree node. This patch hacks around one of these bugs by calling bch2_path_put_nokeep() first in bch2_trans_iter_exit. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:58 -04:00
Kent Overstreet	65d48e3525	bcachefs: Private error codes: ENOMEM This adds private error codes for most (but not all) of our ENOMEM uses, which makes it easier to track down assorted allocation failures. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:57 -04:00
Kent Overstreet	511b629aca	bcachefs: bch2_btree_iter_peek_node_and_restart() Minor refactoring for the Rust interface. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:56 -04:00
Kent Overstreet	1306f87de3	bcachefs: Plumb btree_trans through btree cache code Soon, __bch2_btree_node_write() is going to require a btree_trans: zoned device support is going to require a new allocation for every btree node write. This is a bit of prep work. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:55 -04:00
Kent Overstreet	0f2ea6550f	bcachefs: bch2_btree_iter_peek_and_restart_outlined() Needed for interfacing with Rust - bindgen can't handle inline functions, alas. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:55 -04:00
Kent Overstreet	5e2d8be8bd	bcachefs: Split trans->last_begin_ip and trans->last_restarted_ip These are two different things - this improves our debug assert messages. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:53 -04:00
Kent Overstreet	6623c0fcdf	bcachefs: Add an assertion for using multiple btree_trans A thread should never be using more than one btree_trans - doing so is an invitation for deadlocks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:53 -04:00
Kent Overstreet	920e69bc3d	bcachefs: Btree write buffer This adds a new method of doing btree updates - a straight write buffer, implemented as a flat fixed size array. This is only useful when we don't need to read from the btree in order to do the update, and when reading is infrequent - perfect for the LRU btree. This will make LRU btree updates fast enough that we'll be able to use it for persistently indexing buckets by fragmentation, which will be a massive boost to copygc performance. Changes: - A new btree_insert_type enum, for btree_insert_entries. Specifies btree, btree key cache, or btree write buffer. - bch2_trans_update_buffered(): updates via the btree write buffer don't need a btree path, so we need a new update path. - Transaction commit path changes: The update to the btree write buffer both mutates global, and can fail if there isn't currently room. Therefore we do all write buffer updates in the transaction all at once, and also if it fails we have to revert filesystem usage counter changes. If there isn't room we flush the write buffer in the transaction commit error path and retry. - A new persistent option, for specifying the number of entries in the write buffer. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:50 -04:00
Kent Overstreet	60b5538877	bcachefs: trans->notrace_relock_fail When we unlock in order to submit IO, the next relock event is likely to fail if submit_bio() blocked - we shouldn't those events in our _fail stats, since those are expected events and shouldn't cause test failures. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:50 -04:00
Kent Overstreet	434b1c75a4	bcachefs: Switch a BUG_ON() to a panic() This assert is popping - rarely - in the CI, this will help us track it down from the logs. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:50 -04:00
Kent Overstreet	992fa4e620	bcachefs: Fix btree_path_alloc() We need to call bch2_trans_update_max_paths() before marking the new path as allocated, since we're not initializing it yet. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:50 -04:00
Kent Overstreet	c72f687a1f	bcachefs: Use for_each_btree_key_upto() more consistently It's important that in BTREE_ITER_FILTER_SNAPSHOTS mode we always use peek_upto() and provide an end for the interval we're searching for - otherwise, when we hit the end of the inode the next inode be in a different subvolume and not have any keys in the current snapshot, and we'd iterate over arbitrarily many keys before returning one. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:50 -04:00
Kent Overstreet	94c69fafa7	bcachefs: Use six_lock_ip() This uses the new _ip() interface to six locks and hooks it up to btree_path->ip_allocated, when available. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:50 -04:00
Kent Overstreet	12344c7cb9	bcachefs: bch2_trans_in_restart_error() This replaces various BUG_ON() assertions with panics that tell us where the restart was done and the restart type. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:50 -04:00
Kent Overstreet	464b415539	bcachefs: Fix bch2_trans_reset_updates() This should have been resetting trans->fs_usage_deltas as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:50 -04:00
Kent Overstreet	4e3d18991a	bcachefs: Inline bch2_btree_path_traverse() fastpath Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:50 -04:00
Kent Overstreet	ee2c6ea776	bcachefs: btree_iter->ip_allocated In debug mode, we now track where btree iterators and paths are initialized/allocated - helpful in tracking down btree path overflows. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:49 -04:00
Kent Overstreet	6c36318cc7	bcachefs: key cache: Don't hold btree locks while using GFP_RECLAIM This is something we need to do more widely: instead of bothering with GFP_NOIO/GFP_NOFS, if we need to allocate memory while holding locks: - first attempt the allocation with GFP_NOWAIT - if that fails, drop btree locks with bch2_trans_unlock(), then retry with GFP_KERNEL. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:49 -04:00
Kent Overstreet	c82ed3047b	bcachefs: Fix bch2_btree_path_traverse_all() We need to take a ref on a path while we're traversing it: this fixes a bug with paths getting reused while being traversed, in the key cache fill code. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:49 -04:00
Kent Overstreet	ee94c413a7	bcachefs: Delete a faulty assertion Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:48 -04:00
Kent Overstreet	9d7f2a4111	bcachefs: bch2_btree_trans_to_text(): print blocked time Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:48 -04:00
Kent Overstreet	e242b92af5	bcachefs: Fix for long running btree transactions & key cache While a btree transaction is running, we hold a SRCU read lock on the btree key cache that prevents btree key cache keys from being freed - this is so that relock() operations won't access freed memory. The downside of this is that long running btree transactions prevent memory from being freed from the key cache. This adds a check in bch2_trans_begin() - if the transaction has been running longer than 1 second, drop and retake the SRCU read lock and zero out pointers to unlock key cache paths. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:48 -04:00
Kent Overstreet	08f7803159	bcachefs: bch2_trans_revalidate_updates_in_node() When we started stashing the key being overwritten in btree_insert_entry, this introduced a typical iterator invalidation problem, triggered by btree node splits or resorts. Previously, dealt with this by unconditionally re-validating those stashed pointers in the transaction commit path. This patch gets rid of that by doing it only when needed, in bch2_trans_node_add() or bch2_trans_node_reinit_iter(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:48 -04:00
Kent Overstreet	321bdc73f3	bcachefs: bkey_min(), bkey_max() Parallel to bpos_min(), bpos_max() - trivial refactoring. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:48 -04:00
Kent Overstreet	ef0732861a	bcachefs: Add a missing bch2_btree_path_traverse() call bch2_btree_iter_peek_upto() in snapshots mode may need to keep a btree_path for the insert position, not just the position of the key we're returning. The code was incorrectly assuming this would be in the same btree node - we were missing a bch2_btree_path_traverse() call. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:47 -04:00
Kent Overstreet	5c792e1b64	bcachefs: Fix a btree iter assertion pop This fixes a (harmless) broken invariant in __bch2_btree_path_set_pos(): iterators to interior nodes should point to the first non whiteout. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:47 -04:00
Kent Overstreet	1617d56dc9	bcachefs: Key cache now works for snapshots btrees This switches btree_key_cache_fill() to use a btree iterator, not a btree path, so that it can search for keys in previous snapshots. We also add another iterator flag, BTREE_ITER_KEY_CACHE_FILL, to avoid recursion back into the key cache. This will allow us to re-enable the key cache for inodes in the next patch. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:47 -04:00
Kent Overstreet	087e53c255	bcachefs: Bring back BTREE_ITER_CACHED_NOFILL Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:47 -04:00
Kent Overstreet	dcced06942	bcachefs: Kill __btree_trans_peek_key_cache() There was no reason for this to be a separate helper - we always want the relock call that btree_trans_peek_key_cache() did. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:47 -04:00
Kent Overstreet	52bf51b91f	bcachefs: Fix __btree_trans_peek_key_cache() We were returning a pointer to a variable on the stack - oops. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:47 -04:00
Kent Overstreet	e88a75ebe8	bcachefs: New bpos_cmp(), bkey_cmp() replacements This patch introduces - bpos_eq() - bpos_lt() - bpos_le() - bpos_gt() - bpos_ge() and equivalent replacements for bkey_cmp(). Looking at the generated assembly these could probably be improved further, but we already see a significant code size improvement with this patch. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:47 -04:00
Kent Overstreet	c96f108b05	bcachefs: Optimize bch2_trans_iter_init() When flags & btree_id are constants, we can constant fold the entire calculation of the actual iterator flags - and the whole thing becomes small enough to inline. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:46 -04:00
Kent Overstreet	c9ee99ad8c	bcachefs: Move some asserts behind CONFIG_BCACHEFS_DEBUG Convert some non-critical asserts in long-stable code to debug asserts. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:46 -04:00
Kent Overstreet	0f35e0860a	bcachefs: Fix return code from btree_path_traverse_one() trans->restarted is a positive error code, not the usual negative Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:46 -04:00
Kent Overstreet	b2d1d56b1d	bcachefs: Fixes for building in userspace - Marking a non-static function as inline doesn't actually work and is now causing problems - drop that - Introduce BCACHEFS_LOG_PREFIX for when we want to prefix log messages with bcachefs (filesystem name) - Userspace doesn't have real percpu variables (maybe we can get this fixed someday), put an #ifdef around bch2_disk_reservation_add() fastpath Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:46 -04:00
Kent Overstreet	984dc67e3b	bcachefs: Improve __bch2_btree_path_make_mut() btree_path_copy() doesn't need to call bch2_btree_path_check_sort_fast() - the newly allocated path will always be in the correct position, post copy; also delete some redundant branches from __bch2_btree_path_make_mut(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:45 -04:00
Kent Overstreet	0cc455b3ca	bcachefs: Inlining improvements - Don't call into bch2_encrypt_bio() when we're not encrypting - Pull slowpath out of trans_lock_write() - Make sure bc2h_trans_journal_res_get() gets inlined. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:45 -04:00
Kent Overstreet	a101957649	bcachefs: More style fixes Fixes for various checkpatch errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:45 -04:00
Kent Overstreet	c167f9e541	bcachefs: Journal keys overlay fixes - In the btree iterator code that overlays keys from the journal, we were incorrectly specifying level=0 instead of the btree_path's current level in a few places - When we didn't do journal replay, we shouldn't free the journal keys: this fixes cmd_list and cmd_dump, which run in norecovery mode Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:45 -04:00
Kent Overstreet	1f69368c5c	bcachefs: Fix an out-of-bounds shift roundup_pow_of_two() is undefined for 0 - oops. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:45 -04:00
Kent Overstreet	c81f5836a4	bcachefs: Don't touch c->flags in bch2_trans_iter_init() This moves the JOURNAL_REPLAY_DONE flag check from bch2_trans_iter_init() to bch2_trans_init(), where we stash a copy in btree_trans - gaining us a small performance improvement. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:44 -04:00
Kent Overstreet	3e3e02e6bc	bcachefs: Assorted checkpatch fixes checkpatch.pl gives lots of warnings that we don't want - suggested ignore list: ASSIGN_IN_IF UNSPECIFIED_INT - bcachefs coding style prefers single token type names NEW_TYPEDEFS - typedefs are occasionally good FUNCTION_ARGUMENTS - we prefer to look at functions in .c files (hopefully with docbook documentation), not .h file prototypes MULTISTATEMENT_MACRO_USE_DO_WHILE - we have _many_ x-macros and other macros where we can't do this Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:44 -04:00
Kent Overstreet	307e3c1319	bcachefs: Optimize bch2_trans_init() Now we store the transaction's fn idx in a local variable, instead of redoing the lookup every time we call bch2_trans_init(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:44 -04:00
Kent Overstreet	29aa78f15e	bcachefs: Split out __btree_path_up_until_good_node() This breaks up btree_path_up_until_good_node() so that only the fastpath gets inlined. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:44 -04:00
Kent Overstreet	d7e4e51370	bcachefs: Switch to local_clock() for fastpath time source local_clock() isn't always completely accurate - e.g. on machines with TSC drift - but ktime_get_ns() overhead is too high, unfortunately. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:44 -04:00
Kent Overstreet	dccedaaa52	bcachefs: Fix btree node prefetchig We were forgetting to count down the number of nodes to prefetch, firing off _way_ more than intended - whoops. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:44 -04:00
Kent Overstreet	f42238b5cd	bcachefs: Fix a rare path in bch2_btree_path_peek_slot() In the drop_alloc tests, we may end up calling bch2_btree_iter_peek_slot() on an interior level that doesn't exist. Previously, this would hit the path->uptodate assertion in bch2_btree_path_peek_slot(); this path first checks a NULL btree node, which is how we know we're at the end of the btree. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:43 -04:00
Kent Overstreet	7dcbdbd85c	bcachefs: bch2_path_put_nokeep() The btree iterator code may allocate extra btree paths, temporarily, that do not refer to keys being returned: we don't need to wait until transaction restart to drop these, when they're not referenced they should be deleted right away. This fixes a transaction path overflow bug. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:43 -04:00
Kent Overstreet	969576ecae	bcachefs: bch2_btree_iter_peek() now works with interior nodes Needed by the next patch, which will be iterating over keys in nodes at level 1. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:43 -04:00
Kent Overstreet	29cea6f483	bcachefs: Fix bch2_btree_path_up_until_good_node() There was a rare bug when path->locks_want was nonzero, but not BTREE_MAX_DEPTH, where we'd return on a valid node that wasn't locked - oops. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:42 -04:00
Kent Overstreet	c298fd7d34	bcachefs; Mark __bch2_trans_iter_init as inline This function is fairly small and only used in two places: one very hot, the other cold, so it should definitely be inlined. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:42 -04:00
Kent Overstreet	3f3bc66ef0	bcachefs: Optimize btree_path_alloc() - move slowpath code to a separate function, btree_path_overflow() - no need to use hweight64 - copy nr_max_paths from btree_transaction_stats to btree_trans, avoiding a data dependency in the fast path Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:42 -04:00
Kent Overstreet	14d8f26ad0	bcachefs: Inline bch2_trans_kmalloc() fast path Small performance optimization. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:42 -04:00
Kent Overstreet	a8f3542843	bcachefs: bch2_print_string_as_lines() This adds a helper for printing a large buffer one line at a time, to avoid the 1k printk limit. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:41 -04:00
Kent Overstreet	e9174370d0	bcachefs: bch2_btree_node_relock_notrace() Most of the node_relock_fail trace events are generated from bch2_btree_path_verify_level(), when debugcheck_iterators is enabled - but we're not interested in these trace events, they don't indicate that we're in a slowpath. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:41 -04:00
Kent Overstreet	afbc719468	bcachefs: Improve bch2_btree_trans_to_text() This is just a formatting/readability improvement. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:41 -04:00
Kent Overstreet	0d7009d7ca	bcachefs: Delete old deadlock avoidance code This deletes our old lock ordering based deadlock avoidance code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:41 -04:00
Kent Overstreet	96d994b37c	bcachefs: Print deadlock cycle in debugfs In the event that we're not finished debugging the cycle detector, this adds a new file to debugfs that shows what the cycle detector finds, if anything. By comparing this with btree_transactions, which shows held locks for every btree_transaction, we'll be able to determine if it's the cycle detector that's buggy or something else. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:41 -04:00
Kent Overstreet	33bd5d0686	bcachefs: Deadlock cycle detector We've outgrown our own deadlock avoidance strategy. The btree iterator API provides an interface where the user doesn't need to concern themselves with lock ordering - different btree iterators can be traversed in any order. Without special care, this will lead to deadlocks. Our previous strategy was to define a lock ordering internally, and whenever we attempt to take a lock and trylock() fails, we'd check if the current btree transaction is holding any locks that cause a lock ordering violation. If so, we'd issue a transaction restart, and then bch2_trans_begin() would re-traverse all previously used iterators, but in the correct order. That approach had some issues, though. - Sometimes we'd issue transaction restarts unnecessarily, when no deadlock would have actually occured. Lock ordering restarts have become our primary cause of transaction restarts, on some workloads totally 20% of actual transaction commits. - To avoid deadlock or livelock, we'd often have to take intent locks when we only wanted a read lock: with the lock ordering approach, it is actually illegal to hold _any_ read lock while blocking on an intent lock, and this has been causing us unnecessary lock contention. - It was getting fragile - the various lock ordering rules are not trivial, and we'd been seeing occasional livelock issues related to this machinery. So, since bcachefs is already a relational database masquerading as a filesystem, we're stealing the next traditional database technique and switching to a cycle detector for avoiding deadlocks. When we block taking a btree lock, after adding ourself to the waitlist but before sleeping, we do a DFS of btree transactions waiting on other btree transactions, starting with the current transaction and walking our held locks, and transactions blocking on our held locks. If we find a cycle, we emit a transaction restart. Occasionally (e.g. the btree split path) we can not allow the lock() operation to fail, so if necessary we'll tell another transaction that it has to fail. Result: trans_restart_would_deadlock events are reduced by a factor of 10 to 100, and we'll be able to delete a whole bunch of grotty, fragile code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:41 -04:00
Kent Overstreet	845cffed0d	bcachefs: Add a debug assert Chasing down a strange locking bug. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:41 -04:00
Kent Overstreet	57ce827442	bcachefs: Make an assertion more informative Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:40 -04:00
Kent Overstreet	e4215d0fec	bcachefs: All held locks must be in a btree path With the new deadlock cycle detector, it's critical that all held locks be marked in a btree_path, because that's what the cycle detector traverses - any locks that aren't correctly marked will cause deadlocks. This changes the btree_path to allocate some btree_paths for the new nodes, since until the final update is done we otherwise don't have a path referencing them. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:40 -04:00
Kent Overstreet	38474c2642	bcachefs: Avoid using btree_node_lock_nopath() With the upcoming cycle detector, we have to be careful about using btree_node_lock_nopath - in particular, using it to take write locks can cause deadlocks. All held locks need to be tracked in a btree_path, so that the cycle detector knows about them - unless we know that we cannot cause deadlocks for other reasons: e.g. we are only taking read locks, or we're in very early fsck (topology repair). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:40 -04:00
Kent Overstreet	4e6defd106	bcachefs: btree_bkey_cached_common->cached Add a type descriptor to btree_bkey_cached_common - there's no reason not to since we've got padding that was otherwise unused, and this is a nice cleanup (and helpful in later patches). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:40 -04:00
Kent Overstreet	674cfc2624	bcachefs: Add persistent counters for all tracepoints Also, do some reorganizing/renaming, convert atomic counters in bch_fs to persistent counters, and add a few missing counters. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:39 -04:00
Kent Overstreet	c240c3a944	bcachefs: Print lock counts in debugs btree_transactions Improve our debugfs output, to help in debugging deadlocks: this shows, for every btree node we print, the current number of readers/intent locks/write locks held. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:39 -04:00

1 2 3 4 5 ...

471 Commits