linux-stable/fs
Filipe Manana 939b656bc8 btrfs: fix corruption after buffer fault in during direct IO append write
During an append (O_APPEND write flag) direct IO write if the input buffer
was not previously faulted in, we can corrupt the file in a way that the
final size is unexpected and it includes an unexpected hole.

The problem happens like this:

1) We have an empty file, with size 0, for example;

2) We do an O_APPEND direct IO with a length of 4096 bytes and the input
   buffer is not currently faulted in;

3) We enter btrfs_direct_write(), lock the inode and call
   generic_write_checks(), which calls generic_write_checks_count(), and
   that function sets the iocb position to 0 with the following code:

	if (iocb->ki_flags & IOCB_APPEND)
		iocb->ki_pos = i_size_read(inode);

4) We call btrfs_dio_write() and enter into iomap, which will end up
   calling btrfs_dio_iomap_begin() and that calls
   btrfs_get_blocks_direct_write(), where we update the i_size of the
   inode to 4096 bytes;

5) After btrfs_dio_iomap_begin() returns, iomap will attempt to access
   the page of the write input buffer (at iomap_dio_bio_iter(), with a
   call to bio_iov_iter_get_pages()) and fail with -EFAULT, which gets
   returned to btrfs at btrfs_direct_write() via btrfs_dio_write();

6) At btrfs_direct_write() we get the -EFAULT error, unlock the inode,
   fault in the write buffer and then goto to the label 'relock';

7) We lock again the inode, do all the necessary checks again and call
   again generic_write_checks(), which calls generic_write_checks_count()
   again, and there we set the iocb's position to 4K, which is the current
   i_size of the inode, with the following code pointed above:

        if (iocb->ki_flags & IOCB_APPEND)
                iocb->ki_pos = i_size_read(inode);

8) Then we go again to btrfs_dio_write() and enter iomap and the write
   succeeds, but it wrote to the file range [4K, 8K), leaving a hole in
   the [0, 4K) range and an i_size of 8K, which goes against the
   expectations of having the data written to the range [0, 4K) and get an
   i_size of 4K.

Fix this by not unlocking the inode before faulting in the input buffer,
in case we get -EFAULT or an incomplete write, and not jumping to the
'relock' label after faulting in the buffer - instead jump to a location
immediately before calling iomap, skipping all the write checks and
relocking. This solves this problem and it's fine even in case the input
buffer is memory mapped to the same file range, since only holding the
range locked in the inode's io tree can cause a deadlock, it's safe to
keep the inode lock (VFS lock), as was fixed and described in commit
51bd9563b6 ("btrfs: fix deadlock due to page faults during direct IO
reads and writes").

A sample reproducer provided by a reporter is the following:

   $ cat test.c
   #ifndef _GNU_SOURCE
   #define _GNU_SOURCE
   #endif

   #include <fcntl.h>
   #include <stdio.h>
   #include <sys/mman.h>
   #include <sys/stat.h>
   #include <unistd.h>

   int main(int argc, char *argv[])
   {
       if (argc < 2) {
           fprintf(stderr, "Usage: %s <test file>\n", argv[0]);
           return 1;
       }

       int fd = open(argv[1], O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT |
                     O_APPEND, 0644);
       if (fd < 0) {
           perror("creating test file");
           return 1;
       }

       char *buf = mmap(NULL, 4096, PROT_READ,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
       ssize_t ret = write(fd, buf, 4096);
       if (ret < 0) {
           perror("pwritev2");
           return 1;
       }

       struct stat stbuf;
       ret = fstat(fd, &stbuf);
       if (ret < 0) {
           perror("stat");
           return 1;
       }

       printf("size: %llu\n", (unsigned long long)stbuf.st_size);
       return stbuf.st_size == 4096 ? 0 : 1;
   }

A test case for fstests will be sent soon.

Reported-by: Hanna Czenczek <hreitz@redhat.com>
Link: https://lore.kernel.org/linux-btrfs/0b841d46-12fe-4e64-9abb-871d8d0de271@redhat.com/
Fixes: 8184620ae2 ("btrfs: fix lost file sync on direct IO write with nowait and dsync iocb")
CC: stable@vger.kernel.org # 6.1+
Tested-by: Hanna Czenczek <hreitz@redhat.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-07-29 19:21:22 +02:00
..
9p Two fixes headed to stable trees: 2024-05-29 09:25:15 -07:00
adfs mm, slab: remove last vestiges of SLAB_MEM_SPREAD 2024-03-12 20:32:19 -07:00
affs affs: remove SLAB_MEM_SPREAD flag usage 2024-02-26 11:36:28 +01:00
afs afs: Convert comma to semicolon 2024-07-02 21:23:00 +02:00
autofs
bcachefs bcachefs: Fix kmalloc bug in __snapshot_t_mut 2024-06-25 20:51:14 -04:00
befs mm, slab: remove last vestiges of SLAB_MEM_SPREAD 2024-03-12 20:32:19 -07:00
bfs mm, slab: remove last vestiges of SLAB_MEM_SPREAD 2024-03-12 20:32:19 -07:00
btrfs btrfs: fix corruption after buffer fault in during direct IO append write 2024-07-29 19:21:22 +02:00
cachefiles cachefiles: remove unneeded include of <linux/fdtable.h> 2024-06-03 15:39:17 +02:00
ceph We have a series from Xiubo that adds support for additional access 2024-05-25 14:23:58 -07:00
coda mm, slab: remove last vestiges of SLAB_MEM_SPREAD 2024-03-12 20:32:19 -07:00
configfs
cramfs use ->bd_mapping instead of ->bd_inode->i_mapping 2024-05-03 02:36:51 -04:00
crypto The usual shower of singleton fixes and minor series all over MM, 2024-05-19 09:21:03 -07:00
debugfs debugfs: continue to ignore unknown mount options 2024-05-28 14:32:42 +02:00
devpts
dlm dlm: return -ENOMEM if ls_recover_buf fails 2024-04-23 16:08:55 -05:00
ecryptfs hardening updates for 6.10-rc1 2024-05-13 14:14:05 -07:00
efivarfs efi: Clear up misconceptions about a maximum variable name size 2024-04-13 10:33:02 +02:00
efs efs: remove SLAB_MEM_SPREAD flag usage 2024-02-27 11:21:33 +01:00
erofs erofs: ensure m_llen is reset to 0 if metadata is invalid 2024-06-30 10:54:28 +08:00
exfat exfat: zero the reserved fields of file and stream extension dentries 2024-04-25 21:59:59 +09:00
exportfs
ext2 ext2: Remove LEGACY_DIRECT_IO dependency 2024-05-03 11:50:28 +02:00
ext4 bd_inode series 2024-05-21 09:51:42 -07:00
f2fs f2fs update for 6.10-rc1 2024-05-20 13:23:43 -07:00
fat fs: add kernel-doc comments to fat_parse_long() 2024-04-25 21:07:02 -07:00
freevxfs freevxfs: Convert freevxfs to the new mount API. 2024-03-26 09:04:53 +01:00
fuse virtio: features, fixes, cleanups 2024-05-23 12:04:36 -07:00
gfs2 bd_inode series 2024-05-21 09:51:42 -07:00
hfs
hfsplus hfsplus: refactor copy_name to not use strncpy 2024-04-24 16:55:28 -07:00
hostfs
hpfs mm, slab: remove last vestiges of SLAB_MEM_SPREAD 2024-03-12 20:32:19 -07:00
hugetlbfs The usual shower of singleton fixes and minor series all over MM, 2024-05-19 09:21:03 -07:00
iomap iomap: Fix iomap_adjust_read_range for plen calculation 2024-06-05 17:27:03 +02:00
isofs isofs: Use *-y instead of *-objs in Makefile 2024-05-09 18:09:57 +02:00
jbd2 bd_inode series 2024-05-21 09:51:42 -07:00
jffs2 This pull request contains the following changes for JFFS2: 2024-05-25 13:23:42 -07:00
jfs jfs: xattr: fix buffer overflow for invalid xattr 2024-06-04 18:09:03 +02:00
kernfs kernfs: mount: Remove unnecessary ‘NULL’ values from knparent 2024-05-04 19:02:39 +02:00
lockd lockd: host: Remove unnecessary statements'host = NULL;' 2024-05-06 09:07:20 -04:00
minix minix: convert minix to use the new mount api 2024-03-26 09:04:55 +01:00
netfs netfs: Fix netfs_page_mkwrite() to flush conflicting data, not wait 2024-06-26 14:19:08 +02:00
nfs nfs: drop the incorrect assertion in nfs_swap_rw() 2024-06-24 20:52:11 -07:00
nfs_common
nfsd nfsd-6.10 fixes: 2024-06-28 09:32:33 -07:00
nilfs2 nilfs2: fix incorrect inode allocation from reserved inodes 2024-07-03 12:29:25 -07:00
nls
notify Revert "fanotify: remove unneeded sub-zero check for unsigned value" 2024-05-20 12:43:58 -07:00
ntfs3 driver ntfs3 for linux 6.10 2024-05-25 14:19:01 -07:00
ocfs2 ocfs2: fix DIO failure due to insufficient transaction credits 2024-06-24 20:52:10 -07:00
omfs
openpromfs openpromfs: finish conversion to the new mount API 2024-03-26 09:04:54 +01:00
orangefs orangefs: fix out-of-bounds fsid access 2024-05-14 17:44:14 -07:00
overlayfs ovl: fix encoding fid for lower only root 2024-06-14 10:30:40 +02:00
proc /proc/pid/smaps: add mseal info for vma 2024-06-24 20:52:09 -07:00
pstore pstore/zone: Don't clear memory twice 2024-03-09 12:33:22 -08:00
qnx4 mm, slab: remove last vestiges of SLAB_MEM_SPREAD 2024-03-12 20:32:19 -07:00
qnx6 qnx6: convert qnx6 to use the new mount api 2024-03-26 09:04:53 +01:00
quota quota: fix to propagate error of mark_dquot_dirty() to caller 2024-04-12 14:52:29 +02:00
ramfs mm: switch mm->get_unmapped_area() to a flag 2024-04-25 20:56:25 -07:00
reiserfs getting rid of bogus set_blocksize() uses, switching it 2024-05-21 08:34:51 -07:00
romfs fs,block: yield devices early 2024-03-27 13:17:15 +01:00
smb cifs: Fix read-performance regression by dropping readahead expansion 2024-07-02 21:23:41 -05:00
squashfs Mainly singleton patches, documented in their respective changelogs. 2024-05-19 14:02:03 -07:00
sysfs Merge 6.9-rc5 into driver-core-next 2024-04-23 13:27:43 +02:00
sysv sysv: remove SLAB_MEM_SPREAD flag usage 2024-02-27 11:21:31 +01:00
tracefs eventfs: Do not use attributes for events directory 2024-05-23 09:31:50 -04:00
ubifs This pull request contains updates for UBI and UBIFS: 2024-03-21 15:09:29 -07:00
udf udf: Use a folio in udf_write_end() 2024-04-23 15:37:02 +02:00
ufs mm, slab: remove last vestiges of SLAB_MEM_SPREAD 2024-03-12 20:32:19 -07:00
unicode kbuild: use $(src) instead of $(srctree)/$(src) for source directory 2024-05-10 04:34:52 +09:00
vboxsf vboxsf: explicitly deny setlease attempts 2024-04-03 16:06:39 +02:00
verity fsverity: use register_sysctl_init() to avoid kmemleak warning 2024-05-03 08:30:58 -07:00
xfs xfs: honor init_xattrs in xfs_init_new_inode for !ATTR fs 2024-06-26 14:29:25 +05:30
zonefs zonefs: Use str_plural() to fix Coccinelle warning 2024-04-10 07:23:47 +09:00
aio.c Assorted commits that had missed the last merge window... 2024-05-21 13:11:44 -07:00
anon_inodes.c fs: Create anon_inode_getfile_fmode() 2024-04-26 10:33:05 +02:00
attr.c lsm/stable-6.9 PR 20240312 2024-03-12 20:03:34 -07:00
backing-file.c ovl: implement tmpfile 2024-05-02 20:35:57 +02:00
bad_inode.c
binfmt_elf.c Mainly singleton patches, documented in their respective changelogs. 2024-05-19 14:02:03 -07:00
binfmt_elf_fdpic.c binfmt_elf_fdpic: fix /proc/<pid>/auxv 2024-04-24 15:55:28 -07:00
binfmt_elf_test.c
binfmt_flat.c
binfmt_misc.c
binfmt_script.c
buffer.c bd_inode series 2024-05-21 09:51:42 -07:00
char_dev.c
compat_binfmt_elf.c
coredump.c virtio: features, fixes, cleanups 2024-05-23 12:04:36 -07:00
d_path.c
dax.c dax: use huge_zero_folio 2024-04-25 20:56:20 -07:00
dcache.c fs: better handle deep ancestor chains in is_subdir() 2024-07-02 21:18:32 +02:00
direct-io.c fs/direct-io: remove redundant assignment to variable retval 2024-04-11 10:21:24 +02:00
drop_caches.c
eventfd.c eventfd: strictly check the count parameter of eventfd_write to avoid inputting illegal strings 2024-02-08 10:12:26 +01:00
eventpoll.c epoll: be better about file lifetimes 2024-05-05 14:00:48 -07:00
exec.c The usual shower of singleton fixes and minor series all over MM, 2024-05-19 09:21:03 -07:00
fcntl.c fcntl: add F_DUPFD_QUERY fcntl() 2024-05-10 08:26:31 +02:00
fhandle.c fs: Annotate struct file_handle with __counted_by() and use struct_size() 2024-04-05 15:53:47 +02:00
file.c fs/file: fix the check in find_next_fd() 2024-05-30 09:11:47 +02:00
file_table.c lsm/stable-6.9 PR 20240312 2024-03-12 20:03:34 -07:00
filesystems.c
fs-writeback.c fs/writeback: remove unnecessary return in writeback_inodes_sb 2024-04-05 15:53:45 +02:00
fs_context.c
fs_parser.c __fs_parse: Correct a documentation comment 2024-02-02 13:11:50 +01:00
fs_pin.c
fs_struct.c
fs_types.c
fsopen.c
init.c
inode.c bcachefs updates for 6.9 2024-03-15 09:00:09 -07:00
internal.h ovl: implement tmpfile 2024-05-02 20:35:57 +02:00
ioctl.c fs/ioctl: Add a comment to keep the logic in sync with LSM policies 2024-05-13 06:58:35 +02:00
Kconfig - Sumanth Korikkar has taught s390 to allocate hotplug-time page frames 2024-03-14 17:43:30 -07:00
Kconfig.binfmt
kernel_read_file.c
libfs.c shmem: Fix shmem_rename2() 2024-04-17 13:49:44 +02:00
locks.c filelock: Remove locks reliably when fcntl/close race is detected 2024-07-02 20:48:14 +02:00
Makefile vfs-6.9.pidfd 2024-03-11 10:21:06 -07:00
mbcache.c vfs: remove SLAB_MEM_SPREAD flag usage 2024-02-27 11:21:31 +01:00
mnt_idmapping.c fs/mnt_idmapping.c: Return -EINVAL when no map is written 2024-02-08 10:12:37 +01:00
mount.h
mpage.c block, fs: Restore the per-bio/request data lifetime fields 2024-02-06 14:31:05 +01:00
namei.c vfs: generate FS_CREATE before FS_OPEN when ->atomic_open used. 2024-06-18 16:26:09 +02:00
namespace.c fs: relax mount_setattr() permission checks 2024-02-07 21:16:29 +01:00
nsfs.c pidfs: remove config option 2024-03-13 12:53:53 -07:00
open.c vfs-6.10-rc7.fixes 2024-07-01 09:22:08 -07:00
pidfs.c fs/pidfs: make 'lsof' happy with our inode changes 2024-05-21 08:08:00 -07:00
pipe.c fs/pipe: Convert to lockdep_cmp_fn 2024-02-02 13:11:49 +01:00
pnode.c
pnode.h
posix_acl.c lsm/stable-6.9 PR 20240312 2024-03-12 20:03:34 -07:00
proc_namespace.c
read_write.c Assorted commits that had missed the last merge window... 2024-05-21 13:11:44 -07:00
readdir.c
remap_range.c vfs: export remap and write check helpers 2024-04-15 14:54:13 -07:00
select.c fs/select: rework stack allocation hack for clang 2024-02-20 09:23:52 +01:00
seq_file.c seq_file: Simplify __seq_puts() 2024-05-02 16:28:20 +02:00
signalfd.c signalfd: drop an obsolete comment 2024-05-24 13:34:07 +02:00
splice.c remove call_{read,write}_iter() functions 2024-04-15 16:03:25 -04:00
stack.c
stat.c statx: stx_subvol 2024-03-26 09:01:18 +01:00
statfs.c
super.c fs: don't misleadingly warn during thaw operations 2024-06-18 16:20:47 +02:00
sync.c
sysctls.c
timerfd.c timerfd: convert to ->read_iter() 2024-04-10 16:23:02 -06:00
userfaultfd.c The usual shower of singleton fixes and minor series all over MM, 2024-05-19 09:21:03 -07:00
utimes.c
xattr.c evm: Move to LSM infrastructure 2024-02-15 23:43:47 -05:00