linux-stable/fs
Filipe Manana f833deae2a Btrfs: fix deadlock between fiemap and transaction commits
[ Upstream commit a6d155d2e3 ]

The fiemap handler locks a file range that can have unflushed delalloc,
and after locking the range, it tries to attach to a running transaction.
If the running transaction started its commit, that is, it is in state
TRANS_STATE_COMMIT_START, and either the filesystem was mounted with the
flushoncommit option or the transaction is creating a snapshot for the
subvolume that contains the file that fiemap is operating on, we end up
deadlocking. This happens because fiemap is blocked on the transaction,
waiting for it to complete, and the transaction is waiting for the flushed
dealloc to complete, which requires locking the file range that the fiemap
task already locked. The following stack traces serve as an example of
when this deadlock happens:

  (...)
  [404571.515510] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
  [404571.515956] Call Trace:
  [404571.516360]  ? __schedule+0x3ae/0x7b0
  [404571.516730]  schedule+0x3a/0xb0
  [404571.517104]  lock_extent_bits+0x1ec/0x2a0 [btrfs]
  [404571.517465]  ? remove_wait_queue+0x60/0x60
  [404571.517832]  btrfs_finish_ordered_io+0x292/0x800 [btrfs]
  [404571.518202]  normal_work_helper+0xea/0x530 [btrfs]
  [404571.518566]  process_one_work+0x21e/0x5c0
  [404571.518990]  worker_thread+0x4f/0x3b0
  [404571.519413]  ? process_one_work+0x5c0/0x5c0
  [404571.519829]  kthread+0x103/0x140
  [404571.520191]  ? kthread_create_worker_on_cpu+0x70/0x70
  [404571.520565]  ret_from_fork+0x3a/0x50
  [404571.520915] kworker/u8:6    D    0 31651      2 0x80004000
  [404571.521290] Workqueue: btrfs-flush_delalloc btrfs_flush_delalloc_helper [btrfs]
  (...)
  [404571.537000] fsstress        D    0 13117  13115 0x00004000
  [404571.537263] Call Trace:
  [404571.537524]  ? __schedule+0x3ae/0x7b0
  [404571.537788]  schedule+0x3a/0xb0
  [404571.538066]  wait_current_trans+0xc8/0x100 [btrfs]
  [404571.538349]  ? remove_wait_queue+0x60/0x60
  [404571.538680]  start_transaction+0x33c/0x500 [btrfs]
  [404571.539076]  btrfs_check_shared+0xa3/0x1f0 [btrfs]
  [404571.539513]  ? extent_fiemap+0x2ce/0x650 [btrfs]
  [404571.539866]  extent_fiemap+0x2ce/0x650 [btrfs]
  [404571.540170]  do_vfs_ioctl+0x526/0x6f0
  [404571.540436]  ksys_ioctl+0x70/0x80
  [404571.540734]  __x64_sys_ioctl+0x16/0x20
  [404571.540997]  do_syscall_64+0x60/0x1d0
  [404571.541279]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
  (...)
  [404571.543729] btrfs           D    0 14210  14208 0x00004000
  [404571.544023] Call Trace:
  [404571.544275]  ? __schedule+0x3ae/0x7b0
  [404571.544526]  ? wait_for_completion+0x112/0x1a0
  [404571.544795]  schedule+0x3a/0xb0
  [404571.545064]  schedule_timeout+0x1ff/0x390
  [404571.545351]  ? lock_acquire+0xa6/0x190
  [404571.545638]  ? wait_for_completion+0x49/0x1a0
  [404571.545890]  ? wait_for_completion+0x112/0x1a0
  [404571.546228]  wait_for_completion+0x131/0x1a0
  [404571.546503]  ? wake_up_q+0x70/0x70
  [404571.546775]  btrfs_wait_ordered_extents+0x27c/0x400 [btrfs]
  [404571.547159]  btrfs_commit_transaction+0x3b0/0xae0 [btrfs]
  [404571.547449]  ? btrfs_mksubvol+0x4a4/0x640 [btrfs]
  [404571.547703]  ? remove_wait_queue+0x60/0x60
  [404571.547969]  btrfs_mksubvol+0x605/0x640 [btrfs]
  [404571.548226]  ? __sb_start_write+0xd4/0x1c0
  [404571.548512]  ? mnt_want_write_file+0x24/0x50
  [404571.548789]  btrfs_ioctl_snap_create_transid+0x169/0x1a0 [btrfs]
  [404571.549048]  btrfs_ioctl_snap_create_v2+0x11d/0x170 [btrfs]
  [404571.549307]  btrfs_ioctl+0x133f/0x3150 [btrfs]
  [404571.549549]  ? mem_cgroup_charge_statistics+0x4c/0xd0
  [404571.549792]  ? mem_cgroup_commit_charge+0x84/0x4b0
  [404571.550064]  ? __handle_mm_fault+0xe3e/0x11f0
  [404571.550306]  ? do_raw_spin_unlock+0x49/0xc0
  [404571.550608]  ? _raw_spin_unlock+0x24/0x30
  [404571.550976]  ? __handle_mm_fault+0xedf/0x11f0
  [404571.551319]  ? do_vfs_ioctl+0xa2/0x6f0
  [404571.551659]  ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
  [404571.552087]  do_vfs_ioctl+0xa2/0x6f0
  [404571.552355]  ksys_ioctl+0x70/0x80
  [404571.552621]  __x64_sys_ioctl+0x16/0x20
  [404571.552864]  do_syscall_64+0x60/0x1d0
  [404571.553104]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
  (...)

If we were joining the transaction instead of attaching to it, we would
not risk a deadlock because a join only blocks if the transaction is in a
state greater then or equals to TRANS_STATE_COMMIT_DOING, and the delalloc
flush performed by a transaction is done before it reaches that state,
when it is in the state TRANS_STATE_COMMIT_START. However a transaction
join is intended for use cases where we do modify the filesystem, and
fiemap only needs to peek at delayed references from the current
transaction in order to determine if extents are shared, and, besides
that, when there is no current transaction or when it blocks to wait for
a current committing transaction to complete, it creates a new transaction
without reserving any space. Such unnecessary transactions, besides doing
unnecessary IO, can cause transaction aborts (-ENOSPC) and unnecessary
rotation of the precious backup roots.

So fix this by adding a new transaction join variant, named join_nostart,
which behaves like the regular join, but it does not create a transaction
when none currently exists or after waiting for a committing transaction
to complete.

Fixes: 03628cdbc6 ("Btrfs: do not start a transaction during fiemap")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-08-25 10:47:54 +02:00
..
9p 9p: pass the correct prototype to read_cache_page 2019-07-31 07:27:08 +02:00
adfs fs/adfs: super: fix use-after-free bug 2019-08-06 19:06:49 +02:00
affs
afs afs: Fix uninitialised spinlock afs_volume::cb_break_lock 2019-07-21 09:03:06 +02:00
autofs autofs: fix error return in autofs_fill_super() 2019-03-13 14:02:32 -07:00
befs
bfs bfs: add sanity check at bfs_fill_super() 2018-12-01 09:37:27 +01:00
btrfs Btrfs: fix deadlock between fiemap and transaction commits 2019-08-25 10:47:54 +02:00
cachefiles fscache, cachefiles: remove redundant variable 'cache' 2018-12-17 09:24:40 +01:00
ceph ceph: return -ERANGE if virtual xattr value didn't fit in buffer 2019-08-06 19:06:50 +02:00
cifs smb3: send CAP_DFS capability during session setup 2019-08-16 10:12:52 +02:00
coda coda: add error handling for fget 2019-08-06 19:06:51 +02:00
configfs configfs: Fix use-after-free when accessing sd->s_dentry 2019-06-22 08:15:17 +02:00
cramfs Cramfs: fix abad comparison when wrap-arounds occur 2018-11-13 11:08:55 -08:00
crypto fscrypt: clean up some BUG_ON()s in block encryption/decryption 2019-07-26 09:14:02 +02:00
debugfs debugfs: fix use-after-free on symlink traversal 2019-05-08 07:21:48 +02:00
devpts fs/devpts: always delete dcache dentry-s in dput() 2019-03-23 20:09:59 +01:00
dlm dlm: check if workqueues are NULL before flushing/destroying 2019-07-31 07:27:07 +02:00
ecryptfs eCryptfs: fix a couple type promotion bugs 2019-07-26 09:14:29 +02:00
efivarfs efivars: Call guid_parse() against guid_t type of variable 2018-07-22 14:13:44 +02:00
efs
exofs fs/exofs: fix potential memory leak in mount option parsing 2018-11-27 16:13:00 +01:00
exportfs exportfs: do not read dentry after free 2018-12-17 09:24:35 +01:00
ext2 ext2: Fix underflow in ext2_max_size() 2019-03-23 20:10:03 +01:00
ext4 ext4: allow directory holes 2019-07-28 08:29:30 +02:00
f2fs f2fs: avoid out-of-range memory access 2019-07-31 07:27:07 +02:00
fat fs/fat/file.c: issue flush after the writeback of FAT 2019-06-15 11:53:59 +02:00
freevxfs
fscache fscache: fix race between enablement and dropping of object 2018-12-17 09:24:40 +01:00
fuse fuse: retrieve: cap requested size to negotiated max_write 2019-06-15 11:54:07 +02:00
gfs2 gfs2: gfs2_walk_metadata fix 2019-08-16 10:12:41 +02:00
hfs hfs: do not free node before using 2018-12-17 09:24:41 +01:00
hfsplus hfsplus: do not free node before using 2018-12-17 09:24:41 +01:00
hostfs vfs: discard ATTR_ATTR_FLAG 2018-08-17 16:20:28 -07:00
hpfs hpfs: remove unnecessary checks on the value of r when assigning error code 2018-08-25 12:42:33 -07:00
hugetlbfs hugetlb: use same fault hash key for shared and private mappings 2019-05-22 07:37:40 +02:00
isofs isofs: reject hardware sector size > 2048 bytes 2018-08-21 11:37:41 +02:00
jbd2 jbd2: introduce jbd2_inode dirty range scoping 2019-07-28 08:29:29 +02:00
jffs2 jffs2: fix use-after-free on symlink traversal 2019-05-08 07:21:48 +02:00
jfs Just one jfs patch for 4.19 2018-08-15 22:47:23 -07:00
kernfs kernfs: fix barrier usage in __kernfs_new_node() 2019-05-16 19:41:18 +02:00
lockd Revert "lockd: Show pid of lockd for remote locks" 2019-06-09 09:17:22 +02:00
minix
nfs NFSv4: Fix an Oops in nfs4_do_setattr 2019-08-16 10:12:53 +02:00
nfs_common
nfsd nfsd: Fix overflow causing non-working mounts on 1 TB machines 2019-07-10 09:53:47 +02:00
nilfs2 nilfs2: convert to SPDX license tags 2018-09-04 16:45:02 -07:00
nls
notify memcg, fsnotify: no oom-kill for remote memcg charging 2019-07-31 07:27:08 +02:00
ntfs ntfs: mft: remove VLA usage 2018-08-17 16:20:27 -07:00
ocfs2 ocfs2: fix error path kobject memory leak 2019-06-22 08:15:21 +02:00
omfs
openpromfs
orangefs orangefs: remove redundant pointer orangefs_inode 2018-08-14 12:07:14 -04:00
overlayfs ovl: fix bogus -Wmaybe-unitialized warning 2019-06-25 11:35:52 +08:00
proc /proc/<pid>/cmdline: add back the setproctitle() special case 2019-08-04 09:30:56 +02:00
pstore pstore/ram: Run without kernel crash dump region 2019-06-11 12:20:52 +02:00
qnx4
qnx6
quota quota: fix a problem about transfer quota 2019-07-14 08:11:15 +02:00
ramfs
reiserfs reiserfs: propagate errors from fill_with_dentries() properly 2018-11-27 16:12:59 +01:00
romfs
squashfs Squashfs: Compute expected length from inode size rather than block length 2018-08-02 09:34:02 -07:00
sysfs Driver core patches for 4.19-rc1 2018-08-18 11:44:53 -07:00
sysv sysv: return 'err' instead of 0 in __sysv_write_inode 2018-12-17 09:24:30 +01:00
tracefs tracefs: Annotate tracefs_ops with __ro_after_init 2018-07-31 11:32:44 -04:00
ubifs ubifs: Handle re-linking of inodes correctly while recovery 2018-12-29 13:37:55 +01:00
udf udf: Fix incorrect final NOT_ALLOCATED (hole) extent length 2019-07-14 08:11:16 +02:00
ufs ufs: fix braino in ufs_get_inode_gid() for solaris UFS flavour 2019-05-25 18:23:46 +02:00
xfs xfs: abort unaligned nowait directio early 2019-07-26 09:14:29 +02:00
aio.c Fix aio_poll() races 2019-05-02 09:58:59 +02:00
anon_inodes.c anon_inode_getfile(): switch to alloc_file_pseudo() 2018-07-12 10:04:27 -04:00
attr.c fs: Fix attr.c kernel-doc 2018-07-03 16:44:45 -04:00
bad_inode.c get rid of 'opened' argument of ->atomic_open() - part 3 2018-07-12 10:04:20 -04:00
binfmt_aout.c
binfmt_elf.c Here are the main MIPS changes for 4.19. 2018-08-13 19:24:32 -07:00
binfmt_elf_fdpic.c
binfmt_em86.c
binfmt_flat.c fs/binfmt_flat.c: make load_flat_shared_library() work 2019-07-03 13:14:44 +02:00
binfmt_misc.c turn filp_clone_open() into inline wrapper for dentry_open() 2018-07-10 23:29:03 -04:00
binfmt_script.c Revert "exec: load_script: don't blindly truncate shebang string" 2019-02-15 09:09:54 +01:00
block_dev.c block: fix the return errno for direct IO 2019-04-17 08:38:52 +02:00
buffer.c fs: fix guard_bio_eod to check for real EOD errors 2019-04-05 22:33:00 +02:00
char_dev.c chardev: add additional check for minor range overlap 2019-05-31 06:46:27 -07:00
compat.c
compat_binfmt_elf.c
compat_ioctl.c compat_ioctl: pppoe: fix PPPOEIOCSFWD handling 2019-08-09 17:52:34 +02:00
coredump.c
d_path.c
dax.c dax: dax_layout_busy_page() should not unmap cow pages 2019-08-16 10:12:52 +02:00
dcache.c dcache: sort the freeing-without-RCU-delay mess for good. 2019-05-25 18:23:26 +02:00
dcookies.c
direct-io.c direct-io: allow direct writes to empty inodes 2019-03-05 17:58:50 +01:00
drop_caches.c fs/drop_caches.c: avoid softlockups in drop_pagecache_sb() 2019-03-13 14:02:32 -07:00
eventfd.c Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL 2018-06-28 10:40:47 -07:00
eventpoll.c fs/epoll: drop ovflist branch prediction 2019-02-12 19:47:19 +01:00
exec.c sched/fair: Don't free p->numa_faults with concurrent readers 2019-08-04 09:30:56 +02:00
fcntl.c signal: Don't send signals to tasks that don't exist 2018-08-15 23:03:20 -05:00
fhandle.c
file.c fs/file.c: initialize init_files.resize_wait 2019-04-05 22:32:59 +02:00
file_table.c overlayfs update for 4.19 2018-08-21 18:19:09 -07:00
filesystems.c
fs-writeback.c blkcg, writeback: dead memcgs shouldn't contribute to writeback ownership arbitration 2019-07-26 09:14:08 +02:00
fs_pin.c
fs_struct.c
inode.c Abort file_remove_privs() for non-reg. files 2019-06-22 08:15:21 +02:00
internal.h acct_on(): don't mess with freeze protection 2019-05-31 06:46:05 -07:00
ioctl.c vfs: fix FIGETBSZ ioctl on an overlayfs file 2018-11-21 09:19:14 +01:00
iomap.c iomap: fix a use after free in iomap_dio_rw 2019-03-13 14:02:29 -07:00
Kconfig
Kconfig.binfmt kconfig: move the "Executable file formats" menu to fs/Kconfig.binfmt 2018-08-02 08:06:55 +09:00
libfs.c
locks.c overlayfs update for 4.19 2018-08-21 18:19:09 -07:00
Makefile
mbcache.c
mount.h
mpage.c mpage: mpage_readpages() should submit IO as read-ahead 2018-08-17 16:20:29 -07:00
namei.c Revert "vfs: Allow userns root to call mknod on owned filesystems." 2018-12-29 13:37:54 +01:00
namespace.c mnt: fix __detach_mounts infinite loop 2018-11-21 09:19:22 +01:00
no-block.c
nsfs.c dcache: sort the freeing-without-RCU-delay mess for good. 2019-05-25 18:23:26 +02:00
open.c access: avoid the RCU grace period for the temporary subjective credentials 2019-07-31 07:27:11 +02:00
pipe.c fs: prevent page refcount overflow in pipe_buf_get 2019-05-04 09:20:11 +02:00
pnode.c
pnode.h
posix_acl.c
proc_namespace.c
read_write.c fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock 2019-05-08 07:21:51 +02:00
readdir.c
select.c Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL 2018-06-28 10:40:47 -07:00
seq_file.c seq_file: fix problem when seeking mid-record 2019-08-25 10:47:43 +02:00
signalfd.c
splice.c fs: prevent page refcount overflow in pipe_buf_get 2019-05-04 09:20:11 +02:00
stack.c
stat.c
statfs.c kernel: add kcompat_sys_{f,}statfs64() 2018-07-12 14:49:48 +01:00
super.c Merge branch 'ida-4.19' of git://git.infradead.org/users/willy/linux-dax 2018-08-26 11:48:42 -07:00
sync.c
timerfd.c Merge branch 'work.aio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-08-13 20:56:23 -07:00
userfaultfd.c fs/userfaultfd.c: disable irqs for fault_pending and event locks 2019-07-10 09:53:42 +02:00
utimes.c
xattr.c sysfs: Do not return POSIX ACL xattrs via listxattr 2018-09-18 07:30:48 -04:00