linux-stable/fs
Filipe Manana 7227ff4de5 Btrfs: fix race between adding and putting tree mod seq elements and nodes
There is a race between adding and removing elements to the tree mod log
list and rbtree that can lead to use-after-free problems.

Consider the following example that explains how/why the problems happens:

1) Task A has mod log element with sequence number 200. It currently is
   the only element in the mod log list;

2) Task A calls btrfs_put_tree_mod_seq() because it no longer needs to
   access the tree mod log. When it enters the function, it initializes
   'min_seq' to (u64)-1. Then it acquires the lock 'tree_mod_seq_lock'
   before checking if there are other elements in the mod seq list.
   Since the list it empty, 'min_seq' remains set to (u64)-1. Then it
   unlocks the lock 'tree_mod_seq_lock';

3) Before task A acquires the lock 'tree_mod_log_lock', task B adds
   itself to the mod seq list through btrfs_get_tree_mod_seq() and gets a
   sequence number of 201;

4) Some other task, name it task C, modifies a btree and because there
   elements in the mod seq list, it adds a tree mod elem to the tree
   mod log rbtree. That node added to the mod log rbtree is assigned
   a sequence number of 202;

5) Task B, which is doing fiemap and resolving indirect back references,
   calls btrfs get_old_root(), with 'time_seq' == 201, which in turn
   calls tree_mod_log_search() - the search returns the mod log node
   from the rbtree with sequence number 202, created by task C;

6) Task A now acquires the lock 'tree_mod_log_lock', starts iterating
   the mod log rbtree and finds the node with sequence number 202. Since
   202 is less than the previously computed 'min_seq', (u64)-1, it
   removes the node and frees it;

7) Task B still has a pointer to the node with sequence number 202, and
   it dereferences the pointer itself and through the call to
   __tree_mod_log_rewind(), resulting in a use-after-free problem.

This issue can be triggered sporadically with the test case generic/561
from fstests, and it happens more frequently with a higher number of
duperemove processes. When it happens to me, it either freezes the VM or
it produces a trace like the following before crashing:

  [ 1245.321140] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
  [ 1245.321200] CPU: 1 PID: 26997 Comm: pool Not tainted 5.5.0-rc6-btrfs-next-52 #1
  [ 1245.321235] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
  [ 1245.321287] RIP: 0010:rb_next+0x16/0x50
  [ 1245.321307] Code: ....
  [ 1245.321372] RSP: 0018:ffffa151c4d039b0 EFLAGS: 00010202
  [ 1245.321388] RAX: 6b6b6b6b6b6b6b6b RBX: ffff8ae221363c80 RCX: 6b6b6b6b6b6b6b6b
  [ 1245.321409] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8ae221363c80
  [ 1245.321439] RBP: ffff8ae20fcc4688 R08: 0000000000000002 R09: 0000000000000000
  [ 1245.321475] R10: ffff8ae20b120910 R11: 00000000243f8bb1 R12: 0000000000000038
  [ 1245.321506] R13: ffff8ae221363c80 R14: 000000000000075f R15: ffff8ae223f762b8
  [ 1245.321539] FS:  00007fdee1ec7700(0000) GS:ffff8ae236c80000(0000) knlGS:0000000000000000
  [ 1245.321591] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 1245.321614] CR2: 00007fded4030c48 CR3: 000000021da16003 CR4: 00000000003606e0
  [ 1245.321642] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [ 1245.321668] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [ 1245.321706] Call Trace:
  [ 1245.321798]  __tree_mod_log_rewind+0xbf/0x280 [btrfs]
  [ 1245.321841]  btrfs_search_old_slot+0x105/0xd00 [btrfs]
  [ 1245.321877]  resolve_indirect_refs+0x1eb/0xc60 [btrfs]
  [ 1245.321912]  find_parent_nodes+0x3dc/0x11b0 [btrfs]
  [ 1245.321947]  btrfs_check_shared+0x115/0x1c0 [btrfs]
  [ 1245.321980]  ? extent_fiemap+0x59d/0x6d0 [btrfs]
  [ 1245.322029]  extent_fiemap+0x59d/0x6d0 [btrfs]
  [ 1245.322066]  do_vfs_ioctl+0x45a/0x750
  [ 1245.322081]  ksys_ioctl+0x70/0x80
  [ 1245.322092]  ? trace_hardirqs_off_thunk+0x1a/0x1c
  [ 1245.322113]  __x64_sys_ioctl+0x16/0x20
  [ 1245.322126]  do_syscall_64+0x5c/0x280
  [ 1245.322139]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
  [ 1245.322155] RIP: 0033:0x7fdee3942dd7
  [ 1245.322177] Code: ....
  [ 1245.322258] RSP: 002b:00007fdee1ec6c88 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
  [ 1245.322294] RAX: ffffffffffffffda RBX: 00007fded40210d8 RCX: 00007fdee3942dd7
  [ 1245.322314] RDX: 00007fded40210d8 RSI: 00000000c020660b RDI: 0000000000000004
  [ 1245.322337] RBP: 0000562aa89e7510 R08: 0000000000000000 R09: 00007fdee1ec6d44
  [ 1245.322369] R10: 0000000000000073 R11: 0000000000000246 R12: 00007fdee1ec6d48
  [ 1245.322390] R13: 00007fdee1ec6d40 R14: 00007fded40210d0 R15: 00007fdee1ec6d50
  [ 1245.322423] Modules linked in: ....
  [ 1245.323443] ---[ end trace 01de1e9ec5dff3cd ]---

Fix this by ensuring that btrfs_put_tree_mod_seq() computes the minimum
sequence number and iterates the rbtree while holding the lock
'tree_mod_log_lock' in write mode. Also get rid of the 'tree_mod_seq_lock'
lock, since it is now redundant.

Fixes: bd989ba359 ("Btrfs: add tree modification log functions")
Fixes: 097b8a7c9e ("Btrfs: join tree mod log code with the code holding back delayed refs")
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-31 14:01:20 +01:00
..
9p 9p pull request for inclusion in 5.4 2019-09-27 15:10:34 -07:00
adfs Merge branch 'work.adfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-07-19 11:33:22 -07:00
affs affs: fix a memory leak in affs_remount 2019-11-18 14:26:43 +01:00
afs Merge branch 'dhowells' (patches from DavidH) 2020-01-14 09:56:31 -08:00
autofs Merge branch 'next.autofs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-12-05 17:11:48 -08:00
befs fs: Fill in max and min timestamps in superblock 2019-08-30 07:27:17 -07:00
bfs fs: Fill in max and min timestamps in superblock 2019-08-30 07:27:17 -07:00
btrfs Btrfs: fix race between adding and putting tree mod seq elements and nodes 2020-01-31 14:01:20 +01:00
cachefiles treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 36 2019-05-24 17:27:11 +02:00
ceph ceph: add more debug info when decoding mdsmap 2019-12-09 20:55:10 +01:00
cifs cifs: Optimize readdir on reparse points 2019-12-23 09:04:44 -06:00
coda y2038: add inode timestamp clamping 2019-09-19 09:42:37 -07:00
configfs configfs: calculate the depth of parent item 2019-11-06 18:36:01 +01:00
cramfs cramfs: fix usage on non-MTD device 2019-11-23 21:44:49 -05:00
crypto treewide: Use sizeof_field() macro 2019-12-09 10:36:44 -08:00
debugfs Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-12-06 09:06:58 -08:00
devpts devpts_pty_kill(): don't bother with d_delete() 2019-09-03 09:30:56 -04:00
dlm dlm for 5.3 2019-07-12 17:37:53 -07:00
ecryptfs compat_ioctl: remove most of fs/compat_ioctl.c 2019-12-01 13:46:15 -08:00
efivarfs Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-07-19 10:42:02 -07:00
efs fs: Fill in max and min timestamps in superblock 2019-08-30 07:27:17 -07:00
erofs Changes since last update: 2019-12-11 12:25:32 -08:00
exportfs race in exportfs_decode_fh() 2019-11-11 09:21:59 -05:00
ext2 \n 2019-11-30 11:16:07 -08:00
ext4 Ext4 bug fixes (including a regression fix) for 5.5 2019-12-22 10:41:48 -08:00
f2fs compat_ioctl: remove most of fs/compat_ioctl.c 2019-12-01 13:46:15 -08:00
fat compat_ioctl: move drivers to compat_ptr_ioctl 2019-10-23 17:23:43 +02:00
freevxfs fs: Fill in max and min timestamps in superblock 2019-08-30 07:27:17 -07:00
fscache Revert "Merge tag 'keys-acl-20190703' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs" 2019-07-10 18:43:43 -07:00
fuse fuse: fix fuse_send_readpages() in the syncronous read case 2020-01-16 11:09:36 +01:00
gfs2 GFS2 changes for this merge window: 2019-12-05 13:20:11 -08:00
hfs treewide: Add SPDX license identifier - Makefile/Kconfig 2019-05-21 10:50:46 +02:00
hfsplus fs/hfsplus/xattr.c: replace strncpy with memcpy 2019-07-16 19:23:23 -07:00
hostfs This pull request contains the following changes for UML: 2019-05-12 17:52:13 -04:00
hpfs fs: compat_ioctl: move FITRIM emulation into file systems 2019-10-23 17:23:46 +02:00
hugetlbfs mm/hugetlbfs: fix for_each_hstate() loop in init_hugetlbfs_fs() 2020-01-03 10:39:08 -08:00
iomap iomap: stop using ioend after it's been freed in iomap_finish_ioend() 2019-12-05 07:41:16 -08:00
isofs y2038: add inode timestamp clamping 2019-09-19 09:42:37 -07:00
jbd2 This merge window saw the the following new featuers added to ext4: 2019-11-30 10:53:02 -08:00
jffs2 Revert "jffs2: Fix possible null-pointer dereferences in jffs2_add_frag_to_fragtree()" 2019-11-29 11:29:58 +01:00
jfs y2038: add inode timestamp clamping 2019-09-19 09:42:37 -07:00
kernfs Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-12-06 09:06:58 -08:00
lockd NFSv4.1: Don't rebind to the same source port when reconnecting to the server 2019-11-03 21:28:45 -05:00
minix fs: Fill in max and min timestamps in superblock 2019-08-30 07:27:17 -07:00
nfs reimplement path_mountpoint() with less magic 2020-01-15 01:36:06 -05:00
nfs_common treewide: Add SPDX license identifier - Makefile/Kconfig 2019-05-21 10:50:46 +02:00
nfsd This is a relatively quiet cycle for nfsd, mainly various bugfixes. 2019-12-07 16:56:00 -08:00
nilfs2 fs: compat_ioctl: move FITRIM emulation into file systems 2019-10-23 17:23:46 +02:00
nls treewide: Add SPDX license identifier - Makefile/Kconfig 2019-05-21 10:50:46 +02:00
notify fs: call fsnotify_sb_delete after evict_inodes 2019-12-18 00:03:01 -05:00
ntfs ntfs: remove (un)?likely() from IS_ERR() conditions 2019-09-26 10:10:44 -07:00
ocfs2 ocfs2: fix the crash due to call ocfs2_get_dlm_debug once less 2020-01-04 13:55:09 -08:00
omfs fs: omfs: Initialize filesystem timestamp ranges 2019-08-30 08:11:25 -07:00
openpromfs Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-07-19 10:42:02 -07:00
orangefs orangefs: posix open permission checking... 2019-12-04 08:52:55 -05:00
overlayfs overlayfs fixes for 5.5-rc2 2019-12-14 11:13:54 -08:00
proc sched/cputime, proc/stat: Fix incorrect guest nice cpustat value 2019-12-11 07:09:58 +01:00
pstore pstore/ram: Regularize prz label allocation lifetime 2020-01-08 17:05:45 -08:00
qnx4 fs: Fill in max and min timestamps in superblock 2019-08-30 07:27:17 -07:00
qnx6 fs: Fill in max and min timestamps in superblock 2019-08-30 07:27:17 -07:00
quota fs: avoid softlockups in s_inodes iterators 2019-12-18 00:03:01 -05:00
ramfs vfs: Convert ramfs, shmem, tmpfs, devtmpfs, rootfs to use the new mount API 2019-09-12 21:05:34 -04:00
reiserfs reiserfs: replace open-coded atomic_dec_and_mutex_lock() 2019-11-05 12:25:22 +01:00
romfs Merge branch 'work.mount2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-09-19 10:06:57 -07:00
squashfs Merge branch 'work.mount2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-09-19 10:06:57 -07:00
sysfs Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-07-19 10:42:02 -07:00
sysv fs: sysv: Initialize filesystem timestamp ranges 2019-08-30 07:27:18 -07:00
tracefs tracing: Do not create tracefs files if tracefs lockdown is in effect 2019-10-12 20:49:07 -04:00
ubifs ubifs: ubifs_tnc_start_commit: Fix OOB in layout_in_gaps 2019-11-17 22:22:54 +01:00
udf fs-udf: Delete an unnecessary check before brelse() 2019-09-04 18:19:43 +02:00
ufs y2038: add inode timestamp clamping 2019-09-19 09:42:37 -07:00
unicode unicode: make array 'token' static const, makes object smaller 2019-09-17 11:48:24 -04:00
verity treewide: Use sizeof_field() macro 2019-12-09 10:36:44 -08:00
xfs xfs: Make the symbol 'xfs_rtalloc_log_count' static 2019-12-20 08:07:31 -08:00
aio.c y2038: syscall implementation cleanups 2019-12-01 14:00:59 -08:00
anon_inodes.c Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-07-19 10:42:02 -07:00
attr.c timestamp_truncate: Replace users of timespec64_trunc 2019-08-30 07:27:17 -07:00
bad_inode.c
binfmt_aout.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
binfmt_elf.c fs/binfmt_elf.c: extract elf_read() function 2019-12-04 19:44:13 -08:00
binfmt_elf_fdpic.c y2038: elfcore: Use __kernel_old_timeval for process times 2019-11-15 14:38:29 +01:00
binfmt_em86.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
binfmt_flat.c fs/binfmt_flat.c: remove set but not used variable 'inode' 2019-07-16 19:23:22 -07:00
binfmt_misc.c Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-07-19 10:42:02 -07:00
binfmt_script.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
block_dev.c block: don't send uevent for empty disk when not invalidating 2019-12-02 18:49:30 -07:00
buffer.c fs: move guard_bio_eod() after bio_set_op_attrs 2020-01-09 08:16:12 -07:00
char_dev.c chardev: Avoid potential use-after-free in 'chrdev_open()' 2020-01-06 20:10:26 +01:00
compat.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
compat_binfmt_elf.c y2038: elfcore: Use __kernel_old_timeval for process times 2019-11-15 14:38:29 +01:00
compat_ioctl.c New code for 5.5: 2019-12-02 14:46:22 -08:00
coredump.c coredump: split pipe command whitespace before expanding template 2019-08-03 07:02:01 -07:00
d_path.c [PATCH] fix d_absolute_path() interplay with fsmount() 2019-08-30 19:31:09 -04:00
dax.c New code for 5.5: 2019-11-30 10:44:49 -08:00
dcache.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-12-08 11:08:28 -08:00
dcookies.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
direct-io.c fs/direct-io.c: include fs/internal.h for missing prototype 2020-01-04 13:55:09 -08:00
drop_caches.c fs: avoid softlockups in s_inodes iterators 2019-12-18 00:03:01 -05:00
eventfd.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
eventpoll.c fs/epoll: remove unnecessary wakeups of nested epoll 2019-12-04 19:44:13 -08:00
exec.c Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-12-03 12:20:25 -08:00
fcntl.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-12-08 11:08:28 -08:00
fhandle.c fs/handle.c - fix up kerneldoc 2019-08-07 21:51:47 -04:00
file.c Revert "fs: remove ksys_dup()" 2020-01-02 16:15:33 -08:00
file_table.c vfs: Export flush_delayed_fput for use by knfsd. 2019-08-19 11:00:39 -04:00
filesystems.c vfs: Implement a filesystem superblock creation/configuration context 2019-02-28 03:29:26 -05:00
fs-writeback.c cgroup,writeback: don't switch wbs immediately on dead wbs if the memcg is dead 2019-11-08 13:37:24 -07:00
fs_context.c vfs: subtype handling moved to fuse 2019-09-06 21:28:49 +02:00
fs_parser.c vfs: Make fs_parse() handle fs_param_is_fd-type params better 2019-09-12 21:06:14 -04:00
fs_pin.c switch the remnants of releasing the mountpoint away from fs_pin 2019-07-16 22:52:37 -04:00
fs_struct.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
fs_types.c fs: common implementation of file type 2019-01-21 17:48:13 +01:00
fsopen.c Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-07-19 10:42:02 -07:00
inode.c fs: avoid softlockups in s_inodes iterators 2019-12-18 00:03:01 -05:00
internal.h fs: move guard_bio_eod() after bio_set_op_attrs 2020-01-09 08:16:12 -07:00
io-wq.c io-wq: cancel work if we fail getting a mm reference 2020-01-14 22:06:11 -07:00
io-wq.h io-wq: re-add io_wq_current_is_worker() 2019-12-17 19:57:20 -07:00
io_uring.c io_uring: only allow submit from owning task 2020-01-16 21:43:24 -07:00
ioctl.c New code for 5.5: 2019-12-02 14:46:22 -08:00
Kconfig io-wq: small threadpool implementation for io_uring 2019-10-29 12:43:00 -06:00
Kconfig.binfmt binfmt_flat: make support for old format binaries optional 2019-06-24 09:16:47 +10:00
libfs.c fs/libfs.c: fix kernel-doc warning 2019-10-14 15:04:01 -07:00
locks.c locks: print unsigned ino in /proc/locks 2019-12-29 09:00:58 -05:00
Makefile io-wq: small threadpool implementation for io_uring 2019-10-29 12:43:00 -06:00
mbcache.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
mount.h switch the remnants of releasing the mountpoint away from fs_pin 2019-07-16 22:52:37 -04:00
mpage.c fs: move guard_bio_eod() after bio_set_op_attrs 2020-01-09 08:16:12 -07:00
namei.c fix autofs regression caused by follow_managed() changes 2020-01-15 01:36:46 -05:00
namespace.c fs/namespace.c: make to_mnt_ns() static 2020-01-04 13:55:09 -08:00
no-block.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
nsfs.c fs/nsfs.c: include headers for missing declarations 2020-01-04 13:55:09 -08:00
open.c Revert "vfs: properly and reliably lock f_pos in fdget_pos()" 2019-11-26 11:34:06 -08:00
pipe.c pipe: fix empty pipe check in pipe_write() 2019-12-22 09:47:47 -08:00
pnode.c fs/namespace: fix unprivileged mount propagation 2019-06-17 17:36:09 -04:00
pnode.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 209 2019-05-30 11:29:53 -07:00
posix_acl.c fs/posix_acl.c: fix kernel-doc warnings 2020-01-04 13:55:09 -08:00
proc_namespace.c vfs: subtype handling moved to fuse 2019-09-06 21:28:49 +02:00
read_write.c vfs: fix page locking deadlocks when deduping files 2019-08-16 18:43:24 -07:00
readdir.c filldir[64]: remove WARN_ON_ONCE() for bad directory entries 2019-10-18 18:41:16 -04:00
select.c y2038: syscalls: change remaining timeval to __kernel_old_timeval 2019-11-15 14:38:29 +01:00
seq_file.c seq_file: fix problem when seeking mid-record 2019-08-13 16:06:52 -07:00
signalfd.c fs: mark expected switch fall-throughs 2019-04-08 18:21:02 -05:00
splice.c pipe: remove 'waiting_writers' merging logic 2019-12-07 13:21:01 -08:00
stack.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
stat.c fs: move generic stat response attr handling to vfs_getattr_nosec 2019-02-01 01:55:45 -05:00
statfs.c vfs: Fix EOVERFLOW testing in put_compat_statfs64 2019-10-03 14:21:35 -07:00
super.c fs: call fsnotify_sb_delete after evict_inodes 2019-12-18 00:03:01 -05:00
sync.c fs/sync.c: sync_file_range(2) may use WB_SYNC_ALL writeback 2019-05-14 09:47:50 -07:00
timerfd.c y2038: timerfd: Use timespec64 internally 2019-11-15 14:38:30 +01:00
userfaultfd.c Merge branch 'akpm' (patches from Andrew) 2019-12-01 20:36:41 -08:00
utimes.c y2038: syscalls: change remaining timeval to __kernel_old_timeval 2019-11-15 14:38:29 +01:00
xattr.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00