iomap: don't skip reading in !uptodate folios when unsharing a range

Prior to commit a01b8f2252, we would always read in the contents of a
!uptodate folio prior to writing userspace data into the folio,
allocated a folio state object, etc.  Ritesh introduced an optimization
that skips all of that if the write would cover the entire folio.

Unfortunately, the optimization misses the unshare case, where we always
have to read in the folio contents since there isn't a data buffer
supplied by userspace.  This can result in stale kernel memory exposure
if userspace issues a FALLOC_FL_UNSHARE_RANGE call on part of a shared
file that isn't already cached.

This was caught by observing fstests regressions in the "unshare around"
mechanism that is used for unaligned writes to a reflinked realtime
volume when the realtime extent size is larger than 1FSB, though I think
it applies to any shared file.

Cc: ritesh.list@gmail.com, willy@infradead.org
Fixes: a01b8f2252 ("iomap: Allocate ifs in ->write_begin() early")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
This commit is contained in:
Darrick J. Wong 2023-09-18 15:57:39 -07:00
parent 4aa8cdd5e5
commit 35d30c9cf1
1 changed files with 4 additions and 2 deletions

View File

@ -640,11 +640,13 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
size_t poff, plen;
/*
* If the write completely overlaps the current folio, then
* If the write or zeroing completely overlaps the current folio, then
* entire folio will be dirtied so there is no need for
* per-block state tracking structures to be attached to this folio.
* For the unshare case, we must read in the ondisk contents because we
* are not changing pagecache contents.
*/
if (pos <= folio_pos(folio) &&
if (!(iter->flags & IOMAP_UNSHARE) && pos <= folio_pos(folio) &&
pos + len >= folio_pos(folio) + folio_size(folio))
return 0;