bcachefs: fix truncate overflow if folio is beyond EOF

generic/083 occasionally reproduces a panic caused by an overflow when accessing the bch_folio_sector array of the folio being processed by __bch2_truncate_folio(). The immediate cause of the overflow is that the folio offset is beyond i_size, and therefore the sector index calculation underflows on subtraction of the folio offset. One cause of this is mainly observed on nocow mounts. When nocow is enabled, fallocate performs physical block allocation (as opposed to block reservation in cow mode), which range_has_data() then interprets as valid data that requires partial zeroing on truncate. Therefore, if a post-eof zero range request lands across post-eof preallocated blocks, __bch2_truncate_folio() may actually create a post-eof folio in order to perform zeroing. To avoid this problem, update range_has_data() to filter out unwritten blocks from folio creation and partial zeroing. Even though we should never create folios beyond EOF like this, the mere existence of such folios is not necessarily a fatal error. Fix up the truncate code to warn about this condition and not overflow the sector array and possibly crash the system. The addition of this warning without the corresponding unwritten extent fix has shown that various other fstests are able to reproduce this problem fairly frequently, but often in ways that doesn't necessarily result in a kernel panic or a change in user observable behavior, and therefore the problem goes undetected. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-02 15:18:19 +00:00 · 2023-03-29 09:49:04 -04:00 · 2023-03-29 09:49:04 -04:00 · 4ad6aa46e1
commit 4ad6aa46e1
parent 550a6a496d
1 changed files with 13 additions and 4 deletions
--- a/fs/bcachefs/fs-io.c
+++ b/fs/bcachefs/fs-io.c
@ -2783,7 +2783,7 @@ static inline int range_has_data(struct bch_fs *c, u32 subvol,
 		goto err;

 	for_each_btree_key_upto_norestart(&trans, iter, BTREE_ID_extents, start, end, 0, k, ret)
-		if (bkey_extent_is_data(k.k)) {
+		if (bkey_extent_is_data(k.k) && !bkey_extent_is_unwritten(k)) {
 			ret = 1;
 			break;
 		}
@ -2809,6 +2809,7 @@ static int __bch2_truncate_folio(struct bch_inode_info *inode,
 	struct folio *folio;
 	s64 i_sectors_delta = 0;
 	int ret = 0;
+	loff_t end_pos;

 	folio = filemap_lock_folio(mapping, index);
 	if (!folio) {
@ -2875,10 +2876,18 @@ static int __bch2_truncate_folio(struct bch_inode_info *inode,
 	/*
 	 * Caller needs to know whether this folio will be written out by
 	 * writeback - doing an i_size update if necessary - or whether it will
-	 * be responsible for the i_size update:
+	 * be responsible for the i_size update.
+	 *
+	 * Note that we shouldn't ever see a folio beyond EOF, but check and
+	 * warn if so. This has been observed by failure to clean up folios
+	 * after a short write and there's still a chance reclaim will fix
+	 * things up.
 	 */
-	ret = s->s[(min(inode->v.i_size, folio_end_pos(folio)) -
-		    folio_pos(folio) - 1) >> 9].state >= SECTOR_dirty;
+	WARN_ON_ONCE(folio_pos(folio) >= inode->v.i_size);
+	end_pos = folio_end_pos(folio);
+	if (inode->v.i_size > folio_pos(folio))
+		end_pos = min(inode->v.i_size, end_pos);
+	ret = s->s[(end_pos - folio_pos(folio) - 1) >> 9].state >= SECTOR_dirty;

 	folio_zero_segment(folio, start_offset, end_offset);