linux-stable

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2024-10-30 08:02:30 +00:00

Josef Bacik a15fc58933 btrfs: do not take the uuid_mutex in btrfs_rm_device	2021-11-17 09:48:33 +01:00

[ Upstream commit `8ef9dc0f14` ]

We got the following lockdep splat while running fstests (specifically btrfs/003 and btrfs/020 in a row) with the new rc. This was uncovered by `87579e9b7d` ("loop: use worker per cgroup instead of kworker") which converted loop to using workqueues, which comes with lockdep annotations that don't exist with kworkers. The lockdep splat is as follows:

  WARNING: possible circular locking dependency detected
  5.14.0-rc2-custom+ #34 Not tainted
  ------------------------------------------------------
  losetup/156417 is trying to acquire lock:
  ffff9c7645b02d38 ((wq_completion)loop0){+.+.}-{0:0}, at: flush_workqueue+0x84/0x600

  but task is already holding lock:
  ffff9c7647395468 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x650 [loop]

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #5 (&lo->lo_mutex){+.+.}-{3:3}:
         __mutex_lock+0xba/0x7c0
         lo_open+0x28/0x60 [loop]
         blkdev_get_whole+0x28/0xf0
         blkdev_get_by_dev.part.0+0x168/0x3c0
         blkdev_open+0xd2/0xe0
         do_dentry_open+0x163/0x3a0
         path_openat+0x74d/0xa40
         do_filp_open+0x9c/0x140
         do_sys_openat2+0xb1/0x170
         __x64_sys_openat+0x54/0x90
         do_syscall_64+0x3b/0x90
         entry_SYSCALL_64_after_hwframe+0x44/0xae

  -> #4 (&disk->open_mutex){+.+.}-{3:3}:
         __mutex_lock+0xba/0x7c0
         blkdev_get_by_dev.part.0+0xd1/0x3c0
         blkdev_get_by_path+0xc0/0xd0
         btrfs_scan_one_device+0x52/0x1f0 [btrfs]
         btrfs_control_ioctl+0xac/0x170 [btrfs]
         __x64_sys_ioctl+0x83/0xb0
         do_syscall_64+0x3b/0x90
         entry_SYSCALL_64_after_hwframe+0x44/0xae

  -> #3 (uuid_mutex){+.+.}-{3:3}:
         __mutex_lock+0xba/0x7c0
         btrfs_rm_device+0x48/0x6a0 [btrfs]
         btrfs_ioctl+0x2d1c/0x3110 [btrfs]
         __x64_sys_ioctl+0x83/0xb0
         do_syscall_64+0x3b/0x90
         entry_SYSCALL_64_after_hwframe+0x44/0xae

  -> #2 (sb_writers#11){.+.+}-{0:0}:
         lo_write_bvec+0x112/0x290 [loop]
         loop_process_work+0x25f/0xcb0 [loop]
         process_one_work+0x28f/0x5d0
         worker_thread+0x55/0x3c0
         kthread+0x140/0x170
         ret_from_fork+0x22/0x30

  -> #1 ((work_completion)(&lo->rootcg_work)){+.+.}-{0:0}:
         process_one_work+0x266/0x5d0
         worker_thread+0x55/0x3c0
         kthread+0x140/0x170
         ret_from_fork+0x22/0x30

  -> #0 ((wq_completion)loop0){+.+.}-{0:0}:
         __lock_acquire+0x1130/0x1dc0
         lock_acquire+0xf5/0x320
         flush_workqueue+0xae/0x600
         drain_workqueue+0xa0/0x110
         destroy_workqueue+0x36/0x250
         __loop_clr_fd+0x9a/0x650 [loop]
         lo_ioctl+0x29d/0x780 [loop]
         block_ioctl+0x3f/0x50
         __x64_sys_ioctl+0x83/0xb0
         do_syscall_64+0x3b/0x90
         entry_SYSCALL_64_after_hwframe+0x44/0xae

  other info that might help us debug this:

  Chain exists of:
    (wq_completion)loop0 --> &disk->open_mutex --> &lo->lo_mutex

   Possible unsafe locking scenario:

         CPU0                    CPU1
         ----                    ----
    lock(&lo->lo_mutex);
                                 lock(&disk->open_mutex);
                                 lock(&lo->lo_mutex);
    lock((wq_completion)loop0);

   *** DEADLOCK ***

  1 lock held by losetup/156417:
   #0: ffff9c7647395468 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x650 [loop]

  stack backtrace:
  CPU: 8 PID: 156417 Comm: losetup Not tainted 5.14.0-rc2-custom+ #34
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  Call Trace:
   dump_stack_lvl+0x57/0x72
   check_noncircular+0x10a/0x120
   __lock_acquire+0x1130/0x1dc0
   lock_acquire+0xf5/0x320
   ? flush_workqueue+0x84/0x600
   flush_workqueue+0xae/0x600
   ? flush_workqueue+0x84/0x600
   drain_workqueue+0xa0/0x110
   destroy_workqueue+0x36/0x250
   __loop_clr_fd+0x9a/0x650 [loop]
   lo_ioctl+0x29d/0x780 [loop]
   ? __lock_acquire+0x3a0/0x1dc0
   ? update_dl_rq_load_avg+0x152/0x360
   ? lock_is_held_type+0xa5/0x120
   ? find_held_lock.constprop.0+0x2b/0x80
   block_ioctl+0x3f/0x50
   __x64_sys_ioctl+0x83/0xb0
   do_syscall_64+0x3b/0x90
   entry_SYSCALL_64_after_hwframe+0x44/0xae
  RIP: 0033:0x7f645884de6b

Usually the uuid_mutex exists to protect the fs_devices that map together all of the devices that match a specific uuid. In rm_device we're messing with the uuid of a device, so it makes sense to protect that here.

However in doing that it pulls in a whole host of lockdep dependencies, as we call mnt_may_write() on the sb before we grab the uuid_mutex, thus we end up with the dependency chain under the uuid_mutex being added under the normal sb write dependency chain, which causes problems with loop devices.

We don't need the uuid mutex here however. If we call btrfs_scan_one_device() before we scratch the super block we will find the fs_devices and not find the device itself and return EBUSY because the fs_devices is open. If we call it after the scratch happens it will not appear to be a valid btrfs file system.

We do not need to worry about other fs_devices modifying operations here because we're protected by the exclusive operations locking.

So drop the uuid_mutex here in order to fix the lockdep splat.

A more detailed explanation from the discussion:

We are worried about rm and scan racing with each other, before this change we'll zero the device out under the UUID mutex so when scan does run it'll make sure that it can go through the whole device scan thing without rm messing with us.

We aren't worried if the scratch happens first, because the result is we don't think this is a btrfs device and we bail out.

The only case we are concerned with is we scratch _after_ scan is able to read the superblock and gets a seemingly valid super block, so lets consider this case.

Scan will call device_list_add() with the device we're removing. We'll call find_fsid_with_metadata_uuid() and get our fs_devices for this UUID. At this point we lock the fs_devices->device_list_mutex. This is what protects us in this case, but we have two cases here.

1. We aren't to the device removal part of the RM. We found our device, and device name matches our path, we go down and we set total_devices to our super number of devices, which doesn't affect anything because we haven't done the remove yet.

2. We are past the device removal part, which is protected by the device_list_mutex. Scan doesn't find the device, it goes down and does the

     if (fs_devices->opened)
             return -EBUSY;

   check and we bail out.

Nothing about this situation is ideal, but the lockdep splat is real, and the fix is safe, tho admittedly a bit scary looking.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ copy more from the discussion ]
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
..
9p	9P: Cast to loff_t before multiplying	2020-11-05 11:43:34 +01:00
adfs
affs	fs/affs: release old buffer head on error path	2021-03-04 10:26:48 +01:00
afs	afs: Fix incorrect triggering of sillyrename on 3rd-party invalidation	2021-09-30 10:09:22 +02:00
autofs
befs
bfs	bfs: don't use WARNING: string when it's just info.	2021-01-06 14:48:39 +01:00
btrfs	btrfs: do not take the uuid_mutex in btrfs_rm_device	2021-11-17 09:48:33 +01:00
cachefiles	cachefiles: Handle readpage error correctly	2020-11-05 11:43:36 +01:00
ceph	ceph: fix handling of "meta" errors	2021-10-27 09:54:27 +02:00
cifs	cifs: fix a sign extension bug	2021-09-30 10:09:24 +02:00
coda
configfs	configfs: fix memleak in configfs_release_bin_file	2021-07-14 16:53:46 +02:00
cramfs
crypto	fscrypt: add fscrypt_symlink_getattr() for computing st_size	2021-09-12 08:56:38 +02:00
debugfs	debugfs: debugfs_create_file_size(): use IS_ERR to check for error	2021-10-06 15:42:35 +02:00
devpts
dlm	fs: dlm: fix memory leak when fenced	2021-07-14 16:53:17 +02:00
ecryptfs	Revert "ecryptfs: replace BUG_ON with error handling code"	2021-05-26 12:05:19 +02:00
efivarfs	efivarfs: revert "fix memory leak in efivarfs_create()"	2020-12-02 08:49:53 +01:00
efs
erofs	erofs: add unsupported inode i_format check	2021-05-11 14:04:02 +02:00
exportfs
ext2	ext2: fix sleeping in atomic bugs on error	2021-10-09 14:39:49 +02:00
ext4	ext4: correct the error path of ext4_write_inline_data_end()	2021-10-17 10:42:33 +02:00
f2fs	f2fs: fix to unmap pages from userspace process in punch_hole()	2021-09-22 12:26:26 +02:00
fat	fat: don't allow to mount if the FAT length == 0	2020-06-17 16:40:36 +02:00
freevxfs
fscache	fscache: Fix cookie key hashing	2021-09-22 12:26:25 +02:00
fuse	fuse: fix page stealing	2021-11-17 09:48:19 +01:00
gfs2	gfs2: Don't call dlm after protocol is unmounted	2021-09-22 12:26:33 +02:00
hfs	hfs: add lock nesting notation to hfs_find_init	2021-07-31 08:19:38 +02:00
hfsplus	hfsplus: prevent corruption in shrinking truncate	2021-05-19 10:08:29 +02:00
hostfs	hostfs: fix memory handling in follow_link()	2021-04-14 08:24:14 +02:00
hpfs
hugetlbfs	hugetlbfs: fix mount mode command line processing	2021-07-28 13:31:01 +02:00
iomap	mm/swap: consider max pages in iomap_swapfile_add_extent	2021-09-15 09:47:35 +02:00
isofs	isofs: Fix out of bound access for corrupted isofs image	2021-11-12 14:43:03 +01:00
jbd2	jbd2: fix up sparse warnings in checkpoint code	2020-11-18 19:20:30 +01:00
jffs2	jffs2: check the validity of dstlen in jffs2_zlib_compress()	2021-05-11 14:04:16 +02:00
jfs	fs/jfs: Fix missing error code in lmLogInit()	2021-07-20 16:10:42 +02:00
kernfs	kernfs: do not call fsnotify() with name without a parent	2020-08-19 08:16:12 +02:00
lockd	lockd: lockd server-side shouldn't set fl_ops	2021-09-22 12:26:34 +02:00
minix	fs/minix: remove expected error message in block_to_path()	2020-08-21 13:05:37 +02:00
nfs	NFSv4/pNFS: Don't call _nfs4_pnfs_v3_ds_connect multiple times	2021-07-20 16:10:50 +02:00
nfs_common	nfs_common: need lock during iterate through the list	2020-12-30 11:51:22 +01:00
nfsd	NFSD: Keep existing listeners on portlist error	2021-10-27 09:54:25 +02:00
nilfs2	nilfs2: fix memory leak in nilfs_sysfs_delete_snapshot_group	2021-09-26 14:07:13 +02:00
nls
notify	fanotify: fix ignore mask logic for events on child and on dir	2020-06-17 16:40:24 +02:00
ntfs	ntfs: fix validity check for file name attribute	2021-07-14 16:53:01 +02:00
ocfs2	ocfs2: fix data corruption on truncate	2021-11-17 09:48:17 +01:00
omfs
openpromfs
orangefs	orangefs: fix orangefs df output.	2021-07-20 16:10:48 +02:00
overlayfs	ovl: simplify file splice	2021-10-20 11:40:12 +02:00
proc	mm, oom: make the calculation of oom badness more accurate	2021-09-03 10:08:12 +02:00
pstore	pstore: Fix typo in compression option name	2021-03-04 10:26:45 +01:00
qnx4	qnx4: work around gcc false positive warning bug	2021-09-30 10:09:26 +02:00
qnx6
quota	quota: correct error number in free_dqentry()	2021-11-17 09:48:26 +01:00
ramfs	ramfs: fix nommu mmap with gaps in the page cache	2020-10-29 09:57:53 +01:00
reiserfs	reiserfs: check directory items on read from disk	2021-08-12 13:21:05 +02:00
romfs	romfs: fix uninitialized memory leak in romfs_dev_read()	2020-08-26 10:40:51 +02:00
squashfs	squashfs: fix divide error in calculate_skip()	2021-05-19 10:08:29 +02:00
sysfs	sysfs: Add sysfs_emit and sysfs_emit_at to format sysfs output	2021-03-07 12:20:48 +01:00
sysv
tracefs	tracefs: Have tracefs directories not set OTH permission bits by default	2021-11-17 09:48:30 +01:00
ubifs	ubifs: report correct st_size for encrypted symlinks	2021-09-12 08:56:39 +02:00
udf	udf_get_extendedattr() had no boundary checks.	2021-09-15 09:47:28 +02:00
ufs	fs/ufs: avoid potential u32 multiplication overflow	2020-08-21 13:05:37 +02:00
unicode
verity	fs-verity: fix signed integer overflow with i_size near S64_MAX	2021-10-06 15:42:30 +02:00
xfs	xfs: Fix assert failure in xfs_setattr_size()	2021-03-07 12:20:42 +01:00
aio.c	aio: fix async fsync creds	2020-06-17 16:40:24 +02:00
anon_inodes.c
attr.c	utimes: Clamp the timestamps in notify_change()	2020-02-11 04:35:12 -08:00
bad_inode.c
binfmt_aout.c
binfmt_elf.c	elf: don't use MAP_FIXED_NOREPLACE for elf interpreter mappings	2021-10-06 15:42:35 +02:00
binfmt_elf_fdpic.c
binfmt_em86.c
binfmt_flat.c	binfmt_flat: revert "binfmt_flat: don't offset the data start"	2020-09-03 11:26:39 +02:00
binfmt_misc.c	binfmt_misc: fix possible deadlock in bm_register_write	2021-03-17 17:03:57 +01:00
binfmt_script.c
block_dev.c	block: reexpand iov_iter after read/write	2021-05-22 11:38:29 +02:00
buffer.c	fs: Don't invalidate page buffers in block_write_full_page()	2020-11-05 11:43:24 +01:00
char_dev.c	chardev: Avoid potential use-after-free in 'chrdev_open()'	2020-01-14 20:08:18 +01:00
compat.c
compat_binfmt_elf.c
compat_ioctl.c	fix compat handling of FICLONERANGE, FIDEDUPERANGE and FS_IOC_FIEMAP	2020-01-09 10:20:05 +01:00
coredump.c	coredump: fix core_pattern parse error	2020-12-11 13:23:30 +01:00
d_path.c	fs: fix NULL dereference due to data race in prepend_path()	2020-10-29 09:57:45 +01:00
dax.c	dax: fix ENOMEM handling in grab_mapping_entry()	2021-07-14 16:53:25 +02:00
dcache.c	fix dget_parent() fastpath race	2020-10-01 13:17:19 +02:00
dcookies.c
direct-io.c	fs: direct-io: fix missing sdio->boundary	2021-04-14 08:24:11 +02:00
drop_caches.c	fs: avoid softlockups in s_inodes iterators	2020-01-12 12:21:37 +01:00
eventfd.c	eventfd: track eventfd_signal() recursion depth	2020-02-11 04:35:37 -08:00
eventpoll.c	ep_create_wakeup_source(): dentry name can change under you...	2020-10-07 08:01:31 +02:00
exec.c	vfs: check fd has read access in kernel_read_file_from_fd()	2021-10-27 09:54:27 +02:00
fcntl.c	fcntl: fix potential deadlock for &fasync_struct.fa_lock	2021-09-15 09:47:28 +02:00
fhandle.c
file.c	fix multiplication overflow in copy_fdtable()	2020-05-27 17:46:12 +02:00
file_table.c
filesystems.c	fs/filesystems.c: downgrade user-reachable WARN_ONCE() to pr_warn_once()	2020-04-17 10:50:21 +02:00
fs-writeback.c	writeback: fix obtain a reference to a freeing memcg css	2021-07-14 16:53:35 +02:00
fs_context.c
fs_parser.c
fs_pin.c
fs_struct.c
fs_types.c
fsopen.c
inode.c	futex: Fix inode life-time issue	2020-03-25 08:25:58 +01:00
internal.h	cgroup1: fix leaked context root causing sporadic NULL deref in LTP	2021-07-31 08:19:37 +02:00
io_uring.c	io_uring: Fix current->fs handling in io_sq_wq_submit_work()	2021-01-30 13:54:10 +01:00
ioctl.c
Kconfig
Kconfig.binfmt
libfs.c	libfs: fix error cast of negative value in simple_attr_write()	2020-11-24 13:29:19 +01:00
locks.c	locks: reinstate locks_delete_block optimization	2020-03-25 08:25:41 +01:00
Makefile
mbcache.c
mount.h
mpage.c	fs: move guard_bio_eod() after bio_set_op_attrs	2020-01-17 19:48:21 +01:00
namei.c	namei: only return -ECHILD from follow_dotdot_rcu()	2020-03-05 16:43:48 +01:00
namespace.c	fs: warn about impending deprecation of mandatory locks	2021-08-26 08:36:22 -04:00
no-block.c
nsfs.c
open.c	cifs_atomic_open(): fix double-put on late allocation failure	2020-03-18 07:17:51 +01:00
pipe.c	pipe: increase minimum default pipe size to 2 pages	2021-08-12 13:21:02 +02:00
pnode.c	propagate_one(): mnt_set_mountpoint() needs mount_lock	2020-05-02 08:48:44 +02:00
pnode.h	mount: fix mounting of detached mounts onto targets that reside on shared mounts	2021-03-17 17:03:33 +01:00
posix_acl.c
proc_namespace.c
read_write.c	fs: allow deduplication of eof block into the end of the destination file	2020-02-11 04:35:23 -08:00
readdir.c	readdir: make sure to verify directory entry for legacy interfaces too	2021-04-21 12:56:16 +02:00
select.c	kernel, fs: Introduce and use set_restart_fn() and arch_set_restart_data()	2021-03-24 11:26:44 +01:00
seq_file.c	seq_file: disallow extremely large seq buffer allocations	2021-07-20 16:10:54 +02:00
signalfd.c	fs/signalfd.c: fix inconsistent return codes for signalfd4	2020-08-26 10:40:58 +02:00
splice.c
stack.c
stat.c
statfs.c
super.c	vfs: remove lockdep bogosity in __sb_start_write	2020-11-24 13:29:01 +01:00
sync.c
timerfd.c
userfaultfd.c	userfaultfd: prevent concurrent API initialization	2021-09-22 12:26:26 +02:00
utimes.c	utimes: Clamp the timestamps in notify_change()	2020-02-11 04:35:12 -08:00
xattr.c	xattr: break delegations in {set,remove}xattr	2020-08-11 15:33:39 +02:00