linux-stable/fs
Filipe Manana 341d33128a btrfs: fix hang during unmount when block group reclaim task is running
commit 31e70e5278 upstream.

When we start an unmount, at close_ctree(), if we have the reclaim task
running and in the middle of a data block group relocation, we can trigger
a deadlock when stopping an async reclaim task, producing a trace like the
following:

[629724.498185] task:kworker/u16:7   state:D stack:    0 pid:681170 ppid:     2 flags:0x00004000
[629724.499760] Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs]
[629724.501267] Call Trace:
[629724.501759]  <TASK>
[629724.502174]  __schedule+0x3cb/0xed0
[629724.502842]  schedule+0x4e/0xb0
[629724.503447]  btrfs_wait_on_delayed_iputs+0x7c/0xc0 [btrfs]
[629724.504534]  ? prepare_to_wait_exclusive+0xc0/0xc0
[629724.505442]  flush_space+0x423/0x630 [btrfs]
[629724.506296]  ? rcu_read_unlock_trace_special+0x20/0x50
[629724.507259]  ? lock_release+0x220/0x4a0
[629724.507932]  ? btrfs_get_alloc_profile+0xb3/0x290 [btrfs]
[629724.508940]  ? do_raw_spin_unlock+0x4b/0xa0
[629724.509688]  btrfs_async_reclaim_metadata_space+0x139/0x320 [btrfs]
[629724.510922]  process_one_work+0x252/0x5a0
[629724.511694]  ? process_one_work+0x5a0/0x5a0
[629724.512508]  worker_thread+0x52/0x3b0
[629724.513220]  ? process_one_work+0x5a0/0x5a0
[629724.514021]  kthread+0xf2/0x120
[629724.514627]  ? kthread_complete_and_exit+0x20/0x20
[629724.515526]  ret_from_fork+0x22/0x30
[629724.516236]  </TASK>
[629724.516694] task:umount          state:D stack:    0 pid:719055 ppid:695412 flags:0x00004000
[629724.518269] Call Trace:
[629724.518746]  <TASK>
[629724.519160]  __schedule+0x3cb/0xed0
[629724.519835]  schedule+0x4e/0xb0
[629724.520467]  schedule_timeout+0xed/0x130
[629724.521221]  ? lock_release+0x220/0x4a0
[629724.521946]  ? lock_acquired+0x19c/0x420
[629724.522662]  ? trace_hardirqs_on+0x1b/0xe0
[629724.523411]  __wait_for_common+0xaf/0x1f0
[629724.524189]  ? usleep_range_state+0xb0/0xb0
[629724.524997]  __flush_work+0x26d/0x530
[629724.525698]  ? flush_workqueue_prep_pwqs+0x140/0x140
[629724.526580]  ? lock_acquire+0x1a0/0x310
[629724.527324]  __cancel_work_timer+0x137/0x1c0
[629724.528190]  close_ctree+0xfd/0x531 [btrfs]
[629724.529000]  ? evict_inodes+0x166/0x1c0
[629724.529510]  generic_shutdown_super+0x74/0x120
[629724.530103]  kill_anon_super+0x14/0x30
[629724.530611]  btrfs_kill_super+0x12/0x20 [btrfs]
[629724.531246]  deactivate_locked_super+0x31/0xa0
[629724.531817]  cleanup_mnt+0x147/0x1c0
[629724.532319]  task_work_run+0x5c/0xa0
[629724.532984]  exit_to_user_mode_prepare+0x1a6/0x1b0
[629724.533598]  syscall_exit_to_user_mode+0x16/0x40
[629724.534200]  do_syscall_64+0x48/0x90
[629724.534667]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[629724.535318] RIP: 0033:0x7fa2b90437a7
[629724.535804] RSP: 002b:00007ffe0b7e4458 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[629724.536912] RAX: 0000000000000000 RBX: 00007fa2b9182264 RCX: 00007fa2b90437a7
[629724.538156] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000555d6cf20dd0
[629724.539053] RBP: 0000555d6cf20ba0 R08: 0000000000000000 R09: 00007ffe0b7e3200
[629724.539956] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[629724.540883] R13: 0000555d6cf20dd0 R14: 0000555d6cf20cb0 R15: 0000000000000000
[629724.541796]  </TASK>

This happens because:

1) Before entering close_ctree() we have the async block group reclaim
   task running and relocating a data block group;

2) There's an async metadata (or data) space reclaim task running;

3) We enter close_ctree() and park the cleaner kthread;

4) The async space reclaim task is at flush_space() and runs all the
   existing delayed iputs;

5) Before the async space reclaim task calls
   btrfs_wait_on_delayed_iputs(), the block group reclaim task which is
   doing the data block group relocation, creates a delayed iput at
   replace_file_extents() (called when COWing leaves that have file extent
   items pointing to relocated data extents, during the merging phase
   of relocation roots);

6) The async reclaim space reclaim task blocks at
   btrfs_wait_on_delayed_iputs(), since we have a new delayed iput;

7) The task at close_ctree() then calls cancel_work_sync() to stop the
   async space reclaim task, but it blocks since that task is waiting for
   the delayed iput to be run;

8) The delayed iput is never run because the cleaner kthread is parked,
   and no one else runs delayed iputs, resulting in a hang.

So fix this by stopping the async block group reclaim task before we
park the cleaner kthread.

Fixes: 18bb8bbf13 ("btrfs: zoned: automatically reclaim zones")
CC: stable@vger.kernel.org # 5.15+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-29 09:03:19 +02:00
..
9p 9p: fix fid refcount leak in v9fs_vfs_get_link 2022-06-29 09:03:19 +02:00
adfs
affs
afs afs: Fix infinite loop found by xfstest generic/676 2022-06-14 18:36:13 +02:00
autofs autofs: fix wait name hash calculation in autofs_wait() 2021-10-20 21:09:02 -04:00
befs isystem: ship and use stdarg.h 2021-08-19 09:02:55 +09:00
bfs
btrfs btrfs: fix hang during unmount when block group reclaim task is running 2022-06-29 09:03:19 +02:00
cachefiles cachefiles: Change %p in format strings to something else 2021-08-27 13:34:02 +01:00
ceph ceph: flush the mdlog for filesystem sync 2022-06-14 18:36:23 +02:00
cifs cifs: fix reconnect on smb3 mount types 2022-06-14 18:36:25 +02:00
coda
configfs configfs: fix a race in configfs_{,un}register_subsystem() 2022-03-02 11:48:02 +01:00
cramfs
crypto fscrypt: allow 256-bit master keys with AES-256-XTS 2021-11-18 19:16:11 +01:00
debugfs debugfs: lockdown: Allow reading debugfs files that are not world readable 2022-01-27 11:03:55 +01:00
devpts fsnotify: fix fsnotify hooks in pseudo filesystems 2022-02-01 17:27:01 +01:00
dlm dlm: fix missing lkb refcount handling 2022-06-09 10:23:22 +02:00
ecryptfs
efivarfs
efs
erofs iomap: Add done_before argument to iomap_dio_rw 2022-05-01 17:22:32 +02:00
exfat exfat: check if cluster num is valid 2022-06-06 08:43:37 +02:00
exportfs exportfs: support idmapped mounts 2022-06-09 10:23:32 +02:00
ext2 ext2: correct max file size computing 2022-04-08 14:23:35 +02:00
ext4 ext4: add reserved GDT blocks check 2022-06-22 14:22:05 +02:00
f2fs f2fs: fix to tag gcing flag on page during file defragment 2022-06-14 18:36:15 +02:00
fat fat: add ratelimit to fat*_ent_bread() 2022-06-09 10:22:42 +02:00
freevxfs
fscache fscache: Remove an unused static variable 2021-10-04 22:13:12 +01:00
fuse iov_iter: Turn iov_iter_fault_in_readable into fault_in_iov_iter_readable 2022-05-01 17:22:28 +02:00
gfs2 gfs2: use i_lock spin_lock for inode qadata 2022-06-09 10:22:40 +02:00
hfs
hfsplus
hostfs hostfs: support splice_write 2021-08-26 22:28:02 +02:00
hpfs hpfs: use iomap_fiemap to implement ->fiemap 2021-07-27 11:00:36 +02:00
hugetlbfs hugetlbfs: fix hugetlbfs_statfs() locking 2022-06-09 10:23:11 +02:00
iomap iomap: iomap_write_failed fix 2022-06-09 10:22:55 +02:00
isofs isofs: Fix out of bound access for corrupted isofs image 2021-11-12 15:05:50 +01:00
jbd2 jbd2: fix a potential race while discarding reserved buffers after an abort 2022-04-27 14:39:02 +02:00
jffs2 jffs2: fix memory leak in jffs2_do_fill_super 2022-06-14 18:36:10 +02:00
jfs fs: jfs: fix possible NULL pointer dereference in dbFree() 2022-06-09 10:22:41 +02:00
kernfs kernfs: Separate kernfs_pr_cont_buf and rename_lock. 2022-06-14 18:36:22 +02:00
ksmbd ksmbd: fix reference count leak in smb_check_perm_dacl() 2022-06-14 18:36:06 +02:00
lockd lockd: fix failure to cleanup client locks 2022-02-05 12:38:57 +01:00
minix minix: fix bug when opening a file with O_DIRECT 2022-04-13 20:59:10 +02:00
netfs netfs: fix parameter of cleanup() 2021-12-29 12:28:59 +01:00
nfs pNFS: Avoid a live lock condition in pnfs_update_layout() 2022-06-22 14:21:59 +02:00
nfs_common nfs: Fix kerneldoc warning shown up by W=1 2021-10-04 22:02:17 +01:00
nfsd nfsd: Replace use of rwsem with errseq_t 2022-06-22 14:21:54 +02:00
nilfs2 nilfs2: fix lockdep warnings during disk space reclamation 2022-05-25 09:57:26 +02:00
nls
notify fsnotify: fix wrong lockdep annotations 2022-06-09 10:22:50 +02:00
ntfs iov_iter: Turn iov_iter_fault_in_readable into fault_in_iov_iter_readable 2022-05-01 17:22:28 +02:00
ntfs3 fs/ntfs3: Fix invalid free in log_replay 2022-06-09 10:23:32 +02:00
ocfs2 ocfs2: dlmfs: fix error handling of user_dlm_destroy_lock 2022-06-09 10:23:22 +02:00
omfs
openpromfs
orangefs orangefs: Fix the size of a memory allocation in orangefs_bufmap_alloc() 2022-01-20 09:13:13 +01:00
overlayfs ovl: fix NULL pointer dereference in copy up warning 2022-02-05 12:38:59 +01:00
proc proc: fix dentry/inode overinstantiating under /proc/${pid}/net 2022-06-09 10:23:10 +02:00
pstore pstore: Don't use semaphores in always-atomic-context code 2022-04-08 14:23:01 +02:00
qnx4 qnx4: work around gcc false positive warning bug 2021-09-21 08:36:48 -07:00
qnx6
quota quota: Prevent memory allocation recursion while holding dq_lock 2022-06-22 14:21:56 +02:00
ramfs
reiserfs Kbuild updates for v5.15 2021-09-03 15:33:47 -07:00
romfs
smbfs_common cifs: Fix crash on unload of cifs_arc4.ko 2021-12-14 10:57:12 +01:00
squashfs squashfs: use bvec_virt 2021-08-16 10:50:32 -06:00
sysfs sysfs: Allow deferred execution of iomem_get_mapping() 2021-08-06 13:05:28 +02:00
sysv
tracefs tracefs: Set the group ownership in apply_options() not parse_options() 2022-03-02 11:48:05 +01:00
ubifs ubifs: rename_whiteout: correct old_dir size computing 2022-04-08 14:24:08 +02:00
udf udf: Avoid using stale lengthOfImpUse 2022-05-15 20:18:52 +02:00
ufs isystem: ship and use stdarg.h 2021-08-19 09:02:55 +09:00
unicode
vboxsf vboxfs: fix broken legacy mount signature checking 2021-09-27 11:26:21 -07:00
verity fs-verity: fix signed integer overflow with i_size near S64_MAX 2021-09-22 10:56:34 -07:00
xfs iomap: Add done_before argument to iomap_dio_rw 2022-05-01 17:22:32 +02:00
zonefs zonefs: fix zonefs_iomap_begin() for reads 2022-06-25 15:18:40 +02:00
aio.c aio: Fix incorrect usage of eventfd_signal_allowed() 2021-12-14 10:57:22 +01:00
anon_inodes.c
attr.c fs: handle circular mappings correctly 2021-11-25 09:48:46 +01:00
bad_inode.c vfs: add rcu argument to ->get_acl() callback 2021-08-18 22:08:24 +02:00
binfmt_aout.c binfmt: a.out: Fix bogus semicolon 2021-09-05 10:15:05 -07:00
binfmt_elf.c coredump: Use the vma snapshot in fill_files_note 2022-04-08 14:24:18 +02:00
binfmt_elf_fdpic.c coredump: Snapshot the vmas in do_coredump 2022-04-08 14:24:17 +02:00
binfmt_flat.c binfmt_flat: do not stop relocating GOT entries prematurely on riscv 2022-06-09 10:22:26 +02:00
binfmt_misc.c
binfmt_script.c
buffer.c mm: fs: fix lru_cache_disabled race in bh_lru 2022-04-08 14:22:54 +02:00
char_dev.c
compat_binfmt_elf.c
coredump.c coredump: Use the vma snapshot in fill_files_note 2022-04-08 14:24:18 +02:00
d_path.c d_path: make 'prepend()' fill up the buffer exactly on overflow 2021-09-02 10:07:29 -07:00
dax.c dax: fix cache flush on PMD-mapped pages 2022-06-09 10:23:09 +02:00
dcache.c
direct-io.c
drop_caches.c fs: drop_caches: fix skipping over shadow cache inodes 2021-09-03 09:58:10 -07:00
eventfd.c eventfd: Export eventfd_wake_count to modules 2021-09-06 07:20:56 -04:00
eventpoll.c ARM development updates for 5.15: 2021-09-09 13:25:49 -07:00
exec.c exec: Force single empty string when argv is empty 2022-04-08 14:23:01 +02:00
fcntl.c Merge branch 'akpm' (patches from Andrew) 2021-09-03 10:08:28 -07:00
fhandle.c
file.c fs: fix fd table size alignment properly 2022-04-08 14:23:54 +02:00
file_table.c SUNRPC: Ensure we flush any closed sockets before xs_xprt_free() 2022-05-18 10:26:57 +02:00
filesystems.c fs: simplify get_filesystem_list / get_all_fs_names 2021-08-23 01:25:40 -04:00
fs-writeback.c writeback: Fix inode->i_io_list not be protected by inode->i_lock error 2022-06-14 18:36:26 +02:00
fs_context.c vfs: fs_context: fix up param length parsing in legacy_parse_param 2022-01-20 09:13:14 +01:00
fs_parser.c namei: Standardize callers of filename_lookup() 2021-09-07 16:07:47 -04:00
fs_pin.c
fs_struct.c
fs_types.c
fsopen.c
init.c
inode.c writeback: Fix inode->i_io_list not be protected by inode->i_lock error 2022-06-14 18:36:26 +02:00
internal.h block: simplify the block device syncing code 2022-04-27 14:38:50 +02:00
io-wq.c io-wq: drop wqe lock before creating new worker 2021-12-22 09:32:51 +01:00
io-wq.h io-wq: provide a way to limit max number of workers 2021-08-29 07:55:55 -06:00
io_uring.c io_uring: fix races with buffer table unregister 2022-06-22 14:22:00 +02:00
ioctl.c fs: fix an infinite loop in iomap_fiemap 2022-05-25 09:57:26 +02:00
Kconfig 4 cifs/smb3 fixes, one for DFS reconnect, and one to begin creating common headers for server and client and the other two to rename the cifs_common directory to smbfs_common to be more consistent ie change use of the name cifs to smb which is more accurate 2021-09-12 10:10:21 -07:00
Kconfig.binfmt binfmt: remove support for em86 (alpha only) 2021-07-25 22:33:03 -07:00
kernel_read_file.c vfs: check fd has read access in kernel_read_file_from_fd() 2021-10-18 20:22:03 -10:00
libfs.c
locks.c Revert "memcg: enable accounting for file lock caches" 2021-09-07 11:21:48 -07:00
Makefile 4 cifs/smb3 fixes, one for DFS reconnect, and one to begin creating common headers for server and client and the other two to rename the cifs_common directory to smbfs_common to be more consistent ie change use of the name cifs to smb which is more accurate 2021-09-12 10:10:21 -07:00
mbcache.c
mount.h
mpage.c
namei.c fs: add two trivial lookup helpers 2022-06-09 10:23:32 +02:00
namespace.c fs/mount_setattr: always cleanup mount_kattr 2022-01-05 12:42:39 +01:00
no-block.c
nsfs.c
open.c mm, thp: fix incorrect unmap behavior for private pages 2021-11-18 19:17:17 +01:00
pipe.c pipe: Fix missing lock in pipe_resize_ring() 2022-06-06 08:43:37 +02:00
pnode.c
pnode.h
posix_acl.c ovl: enable RCU'd ->get_acl() 2021-08-18 22:08:24 +02:00
proc_namespace.c
read_write.c fs: clean up after mandatory file locking support removal 2021-08-24 07:52:45 -04:00
readdir.c
remap_range.c fs: remove mandatory file locking support 2021-08-23 06:15:36 -04:00
select.c select: Fix indefinitely sleeping task in poll_schedule_timeout() 2022-01-29 10:58:25 +01:00
seq_file.c
signalfd.c signalfd: use wake_up_pollfree() 2021-12-14 10:57:15 +01:00
splice.c
stack.c
stat.c stat: fix inconsistency between struct stat and struct compat_stat 2022-04-27 14:38:57 +02:00
statfs.c
super.c vfs: make freeze_super abort when sync_filesystem returns error 2022-02-23 12:03:05 +01:00
sync.c vfs: make sync_filesystem return errors from ->sync_fs 2022-04-27 14:38:50 +02:00
timerfd.c timerfd: Provide timerfd_resume() 2021-08-10 17:57:22 +02:00
userfaultfd.c userfaultfd: fix a race between writeprotect and exit_mmap() 2021-10-18 20:22:02 -10:00
utimes.c
xattr.c