linux-stable

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2024-09-15 15:15:47 +00:00

Author	SHA1	Message	Date
Filipe Manana	19288951ff	btrfs: update comment for btrfs_join_transaction_nostart() Update the comment for btrfs_join_transaction_nostart() to be more clear about how it works and how it's different from btrfs_attach_transaction(). Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:17 +02:00
Filipe Manana	4490e803e1	btrfs: don't start transaction when joining with TRANS_JOIN_NOSTART When joining a transaction with TRANS_JOIN_NOSTART, if we don't find a running transaction we end up creating one. This goes against the purpose of TRANS_JOIN_NOSTART which is to join a running transaction if its state is at or below the state TRANS_STATE_COMMIT_START, otherwise return an -ENOENT error and don't start a new transaction. So fix this to not create a new transaction if there's no running transaction at or below that state. CC: stable@vger.kernel.org # 4.14+ Fixes: `a6d155d2e3` ("Btrfs: fix deadlock between fiemap and transaction commits") Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:17 +02:00
Qu Wenruo	096d230165	btrfs: refactor main loop in memmove_extent_buffer() [BACKGROUND] Currently memove_extent_buffer() does a loop where it strop at any page boundary inside [dst_offset, dst_offset + len) or [src_offset, src_offset + len). This is mostly allowing us to do copy_pages(), but if we're going to use folios we will need to handle multi-page (the old behavior) or single folio (the new optimization). The current code would be a burden for future changes. [ENHANCEMENT] Instead of sticking with copy_pages(), here we utilize the new __write_extent_buffer() helper to handle the writes. Unlike the refactoring in memcpy_extent_buffer(), we can not just rely on the write_extent_buffer() and only handle page boundaries inside src range. The function write_extent_buffer() itself is still doing forward writing, thus it cannot handle the following case: (already in the extent buffer memory operation tests, cross page overlapping run 2) Src Page boundary \|///////\| \|///\|////\| Dst In the above case, if we just follow page boundary in the src range, we have no need to do any split, just one __write_extent_buffer() with use_memmove = true. But __write_extent_buffer() would split the dst range into two, so it first copies the beginning part of the src range into the first half of the dst range. After this operation, the beginning of the dst range is already updated, causing corruption. So we have to follow the old behavior of handling both page boundaries. And since we're the last caller of copy_pages(), we can remove it completely. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:17 +02:00
Qu Wenruo	13840f3f28	btrfs: refactor main loop in memcpy_extent_buffer() [BACKGROUND] Currently memcpy_extent_buffer() does a loop where it would stop at any page boundary inside [dst_offset, dst_offset + len) or [src_offset, src_offset + len). This is mostly allowing us to do copy_pages(), but if we're going to use folios we will need to handle multi-page (the old behavior) or single folio (the new optimization). The current code would be a burden for future changes. [ENHANCEMENT] There is a hidden pitfall of the naming memcpy_extent_buffer(), unlike regular memcpy(), this function can handle overlapping ranges. So here we extract write_extent_buffer() into a new internal helper, __write_extent_buffer(), and add a new parameter @use_memmove, to indicate whether we should use memmove() or regular memcpy(). Now we can go __write_extent_buffer() to handle writing into the dst range, with proper overlapping detection. This has a tiny change to the chance of calling memmove(). As the split only happens at the source range page boundaries, the memcpy/memmove() range would be slightly larger than the old code, thus slightly increase the chance we call memmove() other than memcopy(). Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:17 +02:00
Qu Wenruo	682a0bc557	btrfs: copy all pages at once at the end of btrfs_clone_extent_buffer() btrfs_clone_extent_buffer() calls copy_page() at each iteration but we can copy all pages at the end in one go if there were no errors. This would make later conversion to folios easier. Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:17 +02:00
Qu Wenruo	54948681c2	btrfs: refactor main loop in copy_extent_buffer_full() [BACKGROUND] copy_extent_buffer_full() currently does different handling for regular and subpage cases, for regular cases it does a page by page copying. For subpage cases, it just copies the content. This is fine for the page based extent buffer code, but for the incoming folio conversion, it can be a burden to add a new branch just to handle all the different combinations (subpage vs regular, one single folio vs multi pages). [ENHANCE] Instead of handling the different combinations, just go one single handling for all cases, utilizing write_extent_buffer() to do the copying. Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:17 +02:00
Qu Wenruo	730c374e5b	btrfs: use write_extent_buffer() to implement write_extent_buffer_*id() Helpers write_extent_buffer_chunk_tree_uuid() and write_extent_buffer_fsid(), they can be implemented by write_extent_buffer(). These two helpers are not that frequently used, they only get called during initialization of a new tree block. There is not much need for those slightly optimized versions. And since they can be easily converted to one write_extent_buffer() call, define them as inline helpers. This would make later page/folio switch much easier, as all change only need to happen in write_extent_buffer(). Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:17 +02:00
Qu Wenruo	cb22964f1d	btrfs: refactor extent buffer bitmaps operations [BACKGROUND] Currently we handle extent bitmaps manually in extent_buffer_bitmap_set() and extent_buffer_bitmap_clear(). Although with various helpers like eb_bitmap_offset() it's still a little messy to read. The code seems to be a copy of bitmap_set(), but with all the cross-page handling embedded into the code. [ENHANCEMENT] This patch would enhance the readability by introducing two helpers: - memset_extent_buffer() To handle the byte aligned range, thus all the cross-page handling is done there. - extent_buffer_get_byte() This for the first and the last byte operations, which only need to grab one byte, thus no need for any cross-page handling. So we can split both extent_buffer_bitmap_set() and extent_buffer_bitmap_clear() into 3 parts: - Handle the first byte If the range fits inside the first byte, we can exit early. - Handle the byte aligned part This is the part which can have cross-page operations, and it would be handled by memset_extent_buffer(). - Handle the last byte This refactoring does not only make the code a little easier to read, but also makes later folio/page switch much easier, as the switch only needs to be done inside memset_extent_buffer() and extent_buffer_get_byte(). Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:16 +02:00
Qu Wenruo	5864f1da6b	btrfs: tests: add self tests for extent buffer memory operations The new self tests would populate a memory range with random bytes, then copy it to the extent buffer, so that we can verify if the extent buffer memory operation and memmove()/memcopy() are resulting the same contents. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:16 +02:00
Qu Wenruo	257deed2a9	btrfs: tests: enhance extent buffer bitmap tests Enhance extent bitmap tests for the following aspects: - Remove unnecessary @len from __test_eb_bitmaps() We can fetch the length from extent buffer - Explicitly distinguish bit and byte length Now every start/len inside bitmap tests would have either "byte_" or "bit_" prefix to make it more explicit. - Better error reporting If we have mismatch bits, the error report would dump the following contents: * start bytenr * bit number * the full byte from bitmap * the full byte from the extent This is to save developers time so obvious problem can be found immediately - Extract bitmap set/clear and check operation into two helpers This is to save some code lines, as we will have more tests to do. - Add new tests The following tests are added, mostly for the incoming extent bitmap accessor refactoring: * Set bits inside the same byte * Clear bits inside the same byte * Cross byte boundary set * Cross byte boundary clear * Cross multi-byte boundary set * Cross multi-byte boundary clear Those new tests have already saved my backend for the incoming extent buffer bitmap refactoring. Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:16 +02:00
Josef Bacik	b9d97cff25	btrfs: move comments to btrfs_loop_type definition Some of these loop types aren't described, and they should be with the definitions to make it easier to tell what each of them do. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:16 +02:00
Anand Jain	7f9879eb60	btrfs: print name and pid when device scanning processes race There is a race between systemd and mount, as both of them try to register the device in the kernel. When systemd loses the race, it prints the following message: BTRFS error: device /dev/sdb7 belongs to fsid 1b3bacbf-14db-49c9-a3ef-547998aacc4e, and the fs is already mounted. The 'btrfs dev scan' registers one device at a time, so there is no way for the mount thread to wait in the kernel for all the devices to have registered as it won't know if all the devices are discovered. For now, improve the error log by printing the command name and process ID along with the error message. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:16 +02:00
Christoph Hellwig	256b0cf90d	btrfs: fix zoned handling in submit_uncompressed_range For zoned file systems we need to use run_delalloc_zoned to submit writeback, as we need to write out partial allocations when running into zone active limits. submit_uncompressed_range currently always calls cow_file_range to allocate blocks and thus misses the active zone limits handling. Fix this by passing the pages_dirty argument to run_delalloc_zoned and always using it from submit_uncompressed_range as it does the right thing for zoned and non-zoned file systems. To account for the fact that run_delalloc_zoned is now also used for non-zoned file systems rename it to run_delalloc_cow, and add comment describing it. Fixes: `42c0110009` ("btrfs: zoned: introduce dedicated data write path for zoned filesystems") Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:16 +02:00
Christoph Hellwig	778b878543	btrfs: don't redirty locked_page in run_delalloc_zoned extent_write_locked_range currently expects that either all or no pages are dirty when it is called. Bur run_delalloc_zoned is called directly in the writepages path, and has the dirty bit cleared only for locked_page and which the extent_write_cache_pages currently operates. It currently works around this by redirtying locked_page, but that is a bit inefficient and cumbersome. Pass a locked_page argument to run_delalloc_zoned so that clearing the dirty bit can be skipped on just that page. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:16 +02:00
Christoph Hellwig	6e144bf16b	btrfs: refactor the zoned device handling in cow_file_range Handling of the done_offset to cow_file_range is a bit confusing, as it is not updated at all when the function succeeds, and the -EAGAIN status is used bother for the case where we need to wait for a zone finish and the one where the allocation was partially successful. Change the calling convention so that done_offset is always updated, and 0 is returned if some allocation was successful (partial allocation can still only happen for zoned devices), and waiting for a zone finish is done internally in cow_file_range instead of the caller. Also write a comment explaining the logic. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:16 +02:00
Christoph Hellwig	44962ca37c	btrfs: don't redirty pages in compress_file_range compress_file_range needs to clear the dirty bit before handing off work to the compression worker threads to prevent processes coming in through mmap and changing the file contents while the compression is accessing the data (See commit `4adaa61102` ("Btrfs: fix race between mmap writes and compression"). But when compress_file_range decides to not compress the data, it falls back to submit_uncompressed_range which uses extent_write_locked_range to write the uncompressed data. extent_write_locked_range currently expects all pages to be marked dirty so that it can clear the dirty bit itself, and thus compress_file_range has to redirty the page range. Redirtying the page range is rather inefficient and also pointless, so instead pass a pages_dirty parameter to extent_write_locked_range and skip the redirty game entirely. Note that compress_file_range was even redirtying the locked_page twice given that extent_range_clear_dirty_for_io already redirties all pages in the range, which must include locked_page if there is one. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:15 +02:00
Christoph Hellwig	f778b6b8e0	btrfs: share the code to free the page array in compress_file_range compress_file_range has two code blocks to free the page array for the compressed data. Share the code using a goto label. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:15 +02:00
Christoph Hellwig	184aa1ffa5	btrfs: use a separate label for the incompressible case in compress_file_range compress_file_range can fail to compress either because of resource or alignment constraints or because the data is incompressible. In the latter case the inode is marked so that compression isn't tried again. Currently that check is based on the condition that the pages array has been allocated which is rather cryptic. Use a separate label to clearly distinguish this case. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:15 +02:00
Christoph Hellwig	6a7167bf9c	btrfs: further simplify the compress or not logic in compress_file_range Currently the logic whether to compress or not in compress_file_range is a bit convoluted because it tries to share code for creating inline extents for the compressible [1] path and the bail to uncompressed path. But the latter isn't needed at all, because cow_file_range as called by submit_uncompressed_range will already create inline extents as needed, so there is no need to have special handling for it if we can live with the fact that it will be called a bit later in the ->ordered_func of the workqueue instead of right now. [1] there is undocumented logic that creates an uncompressed inline extent outside of the shall not compress logic if total_in is too small. This logic isn't explained in comments or any commit log I could find, so I've preserved it. Documentation explaining it would be appreciated if anyone understands this code. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:15 +02:00
Christoph Hellwig	e94e54e89b	btrfs: streamline compress_file_range Reorder compress_file_range so that the main compression flow happens straight line and not in branches. To do this ensure that pages is always zeroed before a page allocation happens, which allows the cleanup_and_bail_uncompressed label to clean up the page allocations as needed. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:15 +02:00
Christoph Hellwig	00d31d1766	btrfs: merge submit_compressed_extents and async_cow_submit The code in submit_compressed_extents just loops over the async_extents, and doesn't need to be conditional on an inode being present, as there won't be any async_extent in the list if we created and inline extent. Merge the two functions to simplify the logic. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:15 +02:00
Christoph Hellwig	c15d8cf295	btrfs: merge async_cow_start and compress_file_range There is no good reason to have the simple async_cow_start wrapper, merge the argument conversion into the main compress_file_range function. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:15 +02:00
Christoph Hellwig	3134508e47	btrfs: don't clear async_chunk->inode in async_cow_start Now that the ->inode check isn't needed in submit_compressed_extents any more, there is no reason to clear the field early. Always keep the inode around until the work item is finished and remove the special casing, and the counting of compressed extents in compress_file_range. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:15 +02:00
Christoph Hellwig	6758346808	btrfs: clean up the check for uncompressed ranges in submit_one_async_extent Instead of checking for a NULL !pages and explaining this with a cryptic comment, just check the compression type for BTRFS_COMPRESS_NONE to make the check self-explanatory. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:15 +02:00
Christoph Hellwig	c56cbe9059	btrfs: reduce the number of arguments to btrfs_run_delalloc_range Instead of a separate page_started argument that tells the callers that btrfs_run_delalloc_range already started writeback by itself, overload the return value with a positive 1 in additio to 0 and a negative error code to indicate that is has already started writeback, and remove the nr_written argument as that caller can calculate it directly based on the range, and in fact already does so for the case where writeback wasn't started yet. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:14 +02:00
Christoph Hellwig	2c73162d64	btrfs: improve the delalloc_to_write calculation in writepage_delalloc Currently writepage_delalloc adds to delalloc_to_write in every loop operation. That is not only more work than doing it once after the loop, but can also over-increment the counter due to rounding errors when a new loop iteration starts with an offset into a page. Add a new page_start variable instead of recaculation that value over and over, move the delalloc_to_write calculation out of the loop, use the DIV_ROUND_UP helper instead of open coding it and remove the pointless found local variable. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:14 +02:00
Christoph Hellwig	0835d1e66e	btrfs: remove the return value from extent_write_locked_range The return value from extent_write_locked_range is ignored, and that's fine because the error reporting happens through the mapping and ordered_extent. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:14 +02:00
Christoph Hellwig	ff20d6a4a9	btrfs: remove the return value from submit_uncompressed_range The return value from submit_uncompressed_range is ignored, and that's fine because the error reporting happens through the mapping and ordered_extent. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:14 +02:00
Christoph Hellwig	84f262f009	btrfs: reduce debug spam from submit_compressed_extents Move the printk that is supposed to help to debug failures in submit_one_async_extent into submit_one_async_extent and make it coniditonal on actually having an error condition instead of spamming the log unconditionally. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:14 +02:00
Christoph Hellwig	9783e4deed	btrfs: remove end_extent_writepage end_extent_writepage is a small helper that combines a call to btrfs_mark_ordered_io_finished with conditional error-only calls to btrfs_page_clear_uptodate and mapping_set_error with a somewhat unfortunate calling convention that passes and inclusive end instead of the len expected by the underlying functions. Remove end_extent_writepage and open code it in the 4 callers. Out of those two already are error-only and thus don't need the extra conditional, and one already has the mapping_set_error, so a duplicate call can be avoided. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:14 +02:00
Christoph Hellwig	6648cedd86	btrfs: remove btrfs_writepage_endio_finish_ordered btrfs_writepage_endio_finish_ordered is a small wrapper around btrfs_mark_ordered_io_finished that just changs the argument passing slightly, and adds a tracepoint. Move the tracpoint to btrfs_mark_ordered_io_finished, which means it now also covers the error handling in btrfs_cleanup_ordered_extent and switch all callers to just call btrfs_mark_ordered_io_finished directly. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:14 +02:00
Christoph Hellwig	ef4e88e6a5	btrfs: split page locking out of __process_pages_contig There is a lot of complexity in __process_pages_contig to deal with the PAGE_LOCK case that can return an error unlike all the other actions. Open code the page iteration for page locking in lock_delalloc_pages and remove all the now unused code from __process_pages_contig. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:14 +02:00
Christoph Hellwig	53ffb30a78	btrfs: don't create inline extents in fallback_to_cow For NOCOW files, run_delalloc_nocow can still fall back to COW allocations when required and calls to fallback_to_cow helper for that. For such an allocation we can have multiple ordered_extents for existing extents that NOCOW overwrites and new allocations that fallback_to_cow creates. If one of the new extents is an inline extent, the writepages could would have to avoid normal page writeback for them as indicated by the page_started return argument, which run_delalloc_nocow can't return. Fix this by never creating inline extents from fallback_to_cow. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:14 +02:00
Christoph Hellwig	ba9145add5	btrfs: pass a flags argument to cow_file_range The int used as bool unlock is not a very good way to describe the behavior, and the next patch will have to add another behavior modifier. We'll do that by two bool parameters instead of adding bit flags. Now specifies that the pages should always be kept locked. This is the inverse of the old unlock argument. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> [ switch flags to bool ] Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:14 +02:00
Boris Burkov	a649684967	btrfs: fix start transaction qgroup rsv double free btrfs_start_transaction reserves metadata space of the PERTRANS type before it identifies a transaction to start/join. This allows flushing when reserving that space without a deadlock. However, it results in a race which temporarily breaks qgroup rsv accounting. T1 T2 start_transaction do_stuff start_transaction qgroup_reserve_meta_pertrans commit_transaction qgroup_free_meta_all_pertrans hit an error starting txn goto reserve_fail qgroup_free_meta_pertrans (already freed!) The basic issue is that there is nothing preventing another commit from committing before start_transaction finishes (in fact sometimes we intentionally wait for it) so any error path that frees the reserve is at risk of this race. While this exact space was getting freed anyway, and it's not a huge deal to double free it (just a warning, the free code catches this), it can result in incorrectly freeing some other pertrans reservation in this same reservation, which could then lead to spuriously granting reservations we might not have the space for. Therefore, I do believe it is worth fixing. To fix it, use the existing prealloc->pertrans conversion mechanism. When we first reserve the space, we reserve prealloc space and only when we are sure we have a transaction do we convert it to pertrans. This way any racing commits do not blow away our reservation, but we still get a pertrans reservation that is freed when _this_ transaction gets committed. This issue can be reproduced by running generic/269 with either qgroups or squotas enabled via mkfs on the scratch device. Reviewed-by: Josef Bacik <josef@toxicpanda.com> CC: stable@vger.kernel.org # 5.10+ Signed-off-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:13 +02:00
Boris Burkov	e28b02118b	btrfs: free qgroup rsv on io failure If we do a write whose bio suffers an error, we will never reclaim the qgroup reserved space for it. We allocate the space in the write_iter codepath, then release the reservation as we allocate the ordered extent, but we only create a delayed ref if the ordered extent finishes. If it has an error, we simply leak the rsv. This is apparent in running any error injecting (dmerror) fstests like btrfs/146 or btrfs/160. Such tests fail due to dmesg on umount complaining about the leaked qgroup data space. When we clean up other aspects of space on failed ordered_extents, also free the qgroup rsv. Reviewed-by: Josef Bacik <josef@toxicpanda.com> CC: stable@vger.kernel.org # 5.10+ Signed-off-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:13 +02:00
Goldwyn Rodrigues	75d305c55b	btrfs: remove duplicate free_async_extent_pages() on reservation error While performing compressed writes, if the extent reservation fails, the async extent pages are first freed in the error check for return value ret, and then again at out_free label. Remove the first call to free_async_extent_pages(). Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:13 +02:00
Qu Wenruo	52ea5bfbfa	btrfs: move eb subpage preallocation out of the loop Initially we preallocate btrfs_subpage structure in the main loop of alloc_extent_buffer(). But later commit `fbca46eb46` ("btrfs: make nodesize >= PAGE_SIZE case to reuse the non-subpage routine") has made sure we only go subpage routine if our nodesize is smaller than PAGE_SIZE. This means for that case, we only need to allocate the subpage structure once anyway. So this patch would make the preallocation out of the main loop. This would slightly reduce the workload when we hold the page lock, and make code a little easier to read. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:13 +02:00
Christoph Hellwig	b2cc440058	btrfs: simplify the no-bioc fast path condition in btrfs_map_block nr_alloc_stripes can't be one if we are writing to a replacement device, as it is incremented for that case right above. Remove the duplicate checks. Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:13 +02:00
Qu Wenruo	17353a3447	btrfs: scrub: remove unused btrfs_path in scrub_simple_mirror() The @path in scrub_simple_mirror() is no longer utilized after commit `e02ee89baa` ("btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe infrastructure"). Before that commit, we call find_first_extent_item() directly, which needs a path and that path can be reused. But after that switch commit, the extent search is done inside queue_scrub_stripe(), which will no longer accept a path from outside. So the @path variable can be safely removed. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> [ remove the stale comment ] Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:13 +02:00
Minjie Du	7b365a2a3d	btrfs: use folio_next_index() helper in extent_write_cache_pages Simplify code pattern of 'folio->index + folio_nr_pages(folio)' by using the existing helper folio_next_index(). Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Minjie Du <duminjie@vivo.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:13 +02:00
David Sterba	98efb4eb31	btrfs: use helper sizeof_field in struct accessors There's a helper for obtaining size of a struct member, we can use it instead of open coding the pointer magic. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:13 +02:00
Qu Wenruo	16c3a47648	btrfs: deprecate integrity checker feature The integrity checker feature needs to be enabled at compile time (BTRFS_FS_CHECK_INTEGRITY) and then enabled by mount options check_int*. Although it provides some unique features which can not be provided by any other sanity checks like tree-checker, it does not only have high CPU and memory overhead, but is also a maintenance burden. For example, it's the only caller of btrfs_map_block() with @need_raid_map = 0. Considering most btrfs developers are not even testing this feature, I'm here to propose deprecation of this feature. For now only warning messages will be printed, the feature itself would still work. Removal time has been set to 6.7 release. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:13 +02:00
Filipe Manana	98b5a8fd2a	btrfs: move btrfs_free_excluded_extents() into block-group.c The function btrfs_free_excluded_extents() is only used by block-group.c, so move it into block-group.c and make it static. Also removed unnecessary variables that are used only once. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:13 +02:00
Filipe Manana	b1c8f527fe	btrfs: open code trivial btrfs_add_excluded_extent() The code for btrfs_add_excluded_extent() is trivial, it's just a set_extent_bit() call. However it's defined in extent-tree.c but it is only used (twice) in block-group.c. So open code it in block-group.c, reducing the need to export a trivial function. Also since the only caller btrfs_add_excluded_extent() is prepared to deal with errors, stop ignoring errors from the set_extent_bit() call. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:13 +02:00
Filipe Manana	e5860f8207	btrfs: make find_first_extent_bit() return a boolean Currently find_first_extent_bit() returns a 0 if it found a range in the given io tree and 1 if it didn't find any. There's no need to return any errors, so make the return value a boolean and invert the logic to make more sense: return true if it found a range and false if it didn't find any range. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:12 +02:00
Filipe Manana	46d81ebd4a	btrfs: make btrfs_destroy_pinned_extent() return void Currently btrfs_destroy_pinned_extent() is always returning 0 no matter what and its caller ignores its return value (as well everything up in the call chain). This is because this is called in the transaction abort path, where we can't even deal with any errors since we are in a critical situation already and cleanup of resources is done in a best effort fashion. So make btrfs_destroy_pinned_extent() return void. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:12 +02:00
Filipe Manana	aec5716c3e	btrfs: make btrfs_destroy_marked_extents() return void Currently btrfs_destroy_marked_extents() is returning the value of the last call to find_first_extent_bit(), which returns a value of 1 meaning no more ranges found the dirty pages io tree. This value is useless to the single caller of btrfs_destroy_marked_extents(), which ignores any return value from btrfs_destroy_marked_extents(). This is because it's only used in the transaction abort path, where we can't even deal with any errors since we are in a critical situation already and cleanup of resources is done in a best effort fashion. So make btrfs_destroy_marked_extents() return void. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:12 +02:00
Filipe Manana	3b9f0995d8	btrfs: rename add_new_free_space() to btrfs_add_new_free_space() Since add_new_free_space() is exported, used outside block-group.c, rename it to include the 'btrfs_' prefix. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:12 +02:00
Filipe Manana	28f6089490	btrfs: update documentation for add_new_free_space() The documentation for add_new_free_space() is stale and no longer correct: 1) It's no longer used only when caching a block group. It's also called when creating a block group (btrfs_make_block_group()), when reading a block group at mount time (read_one_block_group()) and when reading the free space tree for a block group (typically the first time we attempt to allocate from the block group); 2) It has nothing to do with pinned extents. It only deals with the excluded extents io tree, which is used to track the locations of super blocks in order to make sure we never add the location of a super block to the free space cache of a block group. So update the documention and also add a description of the arguments and return values. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:12 +02:00

1 2 3 4 5 ...

12338 commits