linux-stable/fs/btrfs
Desmond Cheong Zhi Xi cd7b39e7c4 btrfs: reset replace target device to allocation state on close
commit 0d977e0eba upstream.

This crash was observed with a failed assertion on device close:

  BTRFS: Transaction aborted (error -28)
  WARNING: CPU: 1 PID: 3902 at fs/btrfs/extent-tree.c:2150 btrfs_run_delayed_refs+0x1d2/0x1e0 [btrfs]
  Modules linked in: btrfs blake2b_generic libcrc32c crc32c_intel xor zstd_decompress zstd_compress xxhash lzo_compress lzo_decompress raid6_pq loop
  CPU: 1 PID: 3902 Comm: kworker/u8:4 Not tainted 5.14.0-rc5-default+ #1532
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
  Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs]
  RIP: 0010:btrfs_run_delayed_refs+0x1d2/0x1e0 [btrfs]
  RSP: 0018:ffffb7a5452d7d80 EFLAGS: 00010282
  RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000
  RDX: 0000000000000001 RSI: ffffffffabee13c4 RDI: 00000000ffffffff
  RBP: ffff97834176a378 R08: 0000000000000001 R09: 0000000000000001
  R10: 0000000000000000 R11: 0000000000000001 R12: ffff97835195d388
  R13: 0000000005b08000 R14: ffff978385484000 R15: 000000000000016c
  FS:  0000000000000000(0000) GS:ffff9783bd800000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 000056190d003fe8 CR3: 000000002a81e005 CR4: 0000000000170ea0
  Call Trace:
   flush_space+0x197/0x2f0 [btrfs]
   btrfs_async_reclaim_metadata_space+0x139/0x300 [btrfs]
   process_one_work+0x262/0x5e0
   worker_thread+0x4c/0x320
   ? process_one_work+0x5e0/0x5e0
   kthread+0x144/0x170
   ? set_kthread_struct+0x40/0x40
   ret_from_fork+0x1f/0x30
  irq event stamp: 19334989
  hardirqs last  enabled at (19334997): [<ffffffffab0e0c87>] console_unlock+0x2b7/0x400
  hardirqs last disabled at (19335006): [<ffffffffab0e0d0d>] console_unlock+0x33d/0x400
  softirqs last  enabled at (19334900): [<ffffffffaba0030d>] __do_softirq+0x30d/0x574
  softirqs last disabled at (19334893): [<ffffffffab0721ec>] irq_exit_rcu+0x12c/0x140
  ---[ end trace 45939e308e0dd3c7 ]---
  BTRFS: error (device vdd) in btrfs_run_delayed_refs:2150: errno=-28 No space left
  BTRFS info (device vdd): forced readonly
  BTRFS warning (device vdd): failed setting block group ro: -30
  BTRFS info (device vdd): suspending dev_replace for unmount
  assertion failed: !test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state), in fs/btrfs/volumes.c:1150
  ------------[ cut here ]------------
  kernel BUG at fs/btrfs/ctree.h:3431!
  invalid opcode: 0000 [#1] PREEMPT SMP
  CPU: 1 PID: 3982 Comm: umount Tainted: G        W         5.14.0-rc5-default+ #1532
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
  RIP: 0010:assertfail.constprop.0+0x18/0x1a [btrfs]
  RSP: 0018:ffffb7a5454c7db8 EFLAGS: 00010246
  RAX: 0000000000000068 RBX: ffff978364b91c00 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: ffffffffabee13c4 RDI: 00000000ffffffff
  RBP: ffff9783523a4c00 R08: 0000000000000001 R09: 0000000000000001
  R10: 0000000000000000 R11: 0000000000000001 R12: ffff9783523a4d18
  R13: 0000000000000000 R14: 0000000000000004 R15: 0000000000000003
  FS:  00007f61c8f42800(0000) GS:ffff9783bd800000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 000056190cffa810 CR3: 0000000030b96002 CR4: 0000000000170ea0
  Call Trace:
   btrfs_close_one_device.cold+0x11/0x55 [btrfs]
   close_fs_devices+0x44/0xb0 [btrfs]
   btrfs_close_devices+0x48/0x160 [btrfs]
   generic_shutdown_super+0x69/0x100
   kill_anon_super+0x14/0x30
   btrfs_kill_super+0x12/0x20 [btrfs]
   deactivate_locked_super+0x2c/0xa0
   cleanup_mnt+0x144/0x1b0
   task_work_run+0x59/0xa0
   exit_to_user_mode_loop+0xe7/0xf0
   exit_to_user_mode_prepare+0xaf/0xf0
   syscall_exit_to_user_mode+0x19/0x50
   do_syscall_64+0x4a/0x90
   entry_SYSCALL_64_after_hwframe+0x44/0xae

This happens when close_ctree is called while a dev_replace hasn't
completed. In close_ctree, we suspend the dev_replace, but keep the
replace target around so that we can resume the dev_replace procedure
when we mount the root again. This is the call trace:

  close_ctree():
    btrfs_dev_replace_suspend_for_unmount();
    btrfs_close_devices():
      btrfs_close_fs_devices():
        btrfs_close_one_device():
          ASSERT(!test_bit(BTRFS_DEV_STATE_REPLACE_TGT,
                 &device->dev_state));

However, since the replace target sticks around, there is a device
with BTRFS_DEV_STATE_REPLACE_TGT set on close, and we fail the
assertion in btrfs_close_one_device.

To fix this, if we come across the replace target device when
closing, we should properly reset it back to allocation state. This
fix also ensures that if a non-target device has a corrupted state and
has the BTRFS_DEV_STATE_REPLACE_TGT bit set, the assertion will still
catch the error.

Reported-by: David Sterba <dsterba@suse.com>
Fixes: b2a6166768 ("btrfs: fix rw device counting in __btrfs_free_extra_devids")
CC: stable@vger.kernel.org # 4.19+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-22 12:26:19 +02:00
..
tests btrfs: Correctly handle empty trees in find_first_clear_extent_bit 2020-02-11 04:35:34 -08:00
acl.c
async-thread.c Btrfs: fix crash during unmount due to race with delayed inode workers 2020-04-17 10:50:15 +02:00
async-thread.h Btrfs: fix crash during unmount due to race with delayed inode workers 2020-04-17 10:50:15 +02:00
backref.c btrfs: backref, use correct count to resolve normal data refs 2021-02-07 15:35:47 +01:00
backref.h
block-group.c btrfs: delete duplicated words + other fixes in comments 2021-08-08 09:04:07 +02:00
block-group.h btrfs: scrub: Don't check free space before marking a block group RO 2021-03-20 10:39:46 +01:00
block-rsv.c btrfs: force chunk allocation if our global rsv is larger than metadata 2020-06-22 09:31:13 +02:00
block-rsv.h
btrfs_inode.h btrfs: fix race between marking inode needs to be logged and log syncing 2021-09-03 10:08:15 +02:00
check-integrity.c btrfs: fix possible NULL-pointer dereference in integrity checks 2020-02-24 08:36:53 +01:00
check-integrity.h
compression.c btrfs: mark compressed range uptodate only if all bio succeed 2021-08-04 12:27:37 +02:00
compression.h
ctree.c btrfs: delete duplicated words + other fixes in comments 2021-08-08 09:04:07 +02:00
ctree.h btrfs: fix lockdep splat when enabling and disabling qgroups 2021-08-15 13:08:05 +02:00
delalloc-space.c btrfs: make btrfs_qgroup_reserve_data take btrfs_inode 2021-08-15 13:08:04 +02:00
delalloc-space.h
delayed-inode.c btrfs: don't flush from btrfs_delayed_inode_reserve_metadata 2021-08-15 13:08:06 +02:00
delayed-inode.h
delayed-ref.c Btrfs: fix race between adding and putting tree mod seq elements and nodes 2020-02-11 04:35:34 -08:00
delayed-ref.h
dev-replace.c btrfs: dev-replace: fail mount if we don't have replace item with target device 2020-11-18 19:20:29 +01:00
dev-replace.h
dir-item.c
disk-io.c btrfs: qgroup: remove ASYNC_COMMIT mechanism in favor of reserve retry-after-EDQUOT 2021-08-15 13:08:05 +02:00
disk-io.h
export.c btrfs: export helpers for subvolume name/id resolution 2020-08-26 10:40:49 +02:00
export.h btrfs: export helpers for subvolume name/id resolution 2020-08-26 10:40:49 +02:00
extent-tree.c btrfs: check for missing device in btrfs_trim_fs 2021-07-28 13:31:00 +02:00
extent_io.c btrfs: delete duplicated words + other fixes in comments 2021-08-08 09:04:07 +02:00
extent_io.h btrfs: trim: fix underflow in trim length to prevent access beyond device boundary 2020-12-30 11:51:37 +01:00
extent_map.c Btrfs: fix race between using extent maps and merging them 2020-02-19 19:53:00 +01:00
extent_map.h
file-item.c btrfs: fix error handling in btrfs_del_csums 2021-06-10 13:37:13 +02:00
file.c btrfs: fix race between marking inode needs to be logged and log syncing 2021-09-03 10:08:15 +02:00
free-space-cache.c btrfs: delete duplicated words + other fixes in comments 2021-08-08 09:04:07 +02:00
free-space-cache.h
free-space-tree.c btrfs: fix possible free space tree corruption with online conversion 2021-02-03 23:25:57 +01:00
free-space-tree.h
inode-item.c
inode-map.c btrfs: qgroup: Always free PREALLOC META reserve in btrfs_delalloc_release_extents() 2019-10-15 18:50:07 +02:00
inode-map.h
inode.c btrfs: wake up async_delalloc_pages waiters after submit 2021-09-22 12:26:19 +02:00
ioctl.c btrfs: fix metadata extent leak after failure to create subvolume 2021-05-11 14:04:04 +02:00
Kconfig btrfs: disable build on platforms having page size 256K 2021-07-14 16:53:14 +02:00
locking.c
locking.h
lzo.c
Makefile
misc.h
ordered-data.c Btrfs: fix btrfs_wait_ordered_range() so that it waits for all ordered extents 2020-02-28 17:22:24 +01:00
ordered-data.h
orphan.c
print-tree.c btrfs: require only sector size alignment for parent eb bytenr 2020-09-17 13:47:51 +02:00
print-tree.h
props.c
props.h
qgroup.c btrfs: export and rename qgroup_reserve_meta 2021-08-15 13:08:06 +02:00
qgroup.h btrfs: export and rename qgroup_reserve_meta 2021-08-15 13:08:06 +02:00
raid56.c btrfs: fix raid6 qstripe kmap 2021-03-09 11:09:37 +01:00
raid56.h
rcu-string.h
reada.c btrfs: fix readahead hang and use-after-free after removing a device 2020-11-05 11:43:27 +01:00
ref-verify.c btrfs: ref-verify: fix memory leak in btrfs_ref_tree_mod 2020-11-18 19:20:28 +01:00
ref-verify.h
relocation.c btrfs: convert logic BUG_ON()'s in replace_path to ASSERT()'s 2021-05-11 14:04:07 +02:00
root-tree.c btrfs: do not delete mismatched root refs 2020-01-23 08:22:40 +01:00
scrub.c btrfs: scrub: Don't check free space before marking a block group RO 2021-03-20 10:39:46 +01:00
send.c btrfs: send: fix invalid path for unlink operations after parent orphanization 2021-07-14 16:53:02 +02:00
send.h
space-info.c btrfs: take overcommit into account in inc_block_group_ro 2020-10-17 10:11:21 +02:00
space-info.h btrfs: take overcommit into account in inc_block_group_ro 2020-10-17 10:11:21 +02:00
struct-funcs.c
super.c btrfs: fix transaction leak and crash after RO remount caused by qgroup rescan 2021-01-19 18:26:14 +01:00
sysfs.c btrfs: sysfs: use NOFS for device creation 2020-08-21 13:05:22 +02:00
sysfs.h
transaction.c btrfs: qgroup: remove ASYNC_COMMIT mechanism in favor of reserve retry-after-EDQUOT 2021-08-15 13:08:05 +02:00
transaction.h btrfs: fix race between marking inode needs to be logged and log syncing 2021-09-03 10:08:15 +02:00
tree-checker.c btrfs: tree-checker: do not error out if extent ref hash doesn't match 2021-06-10 13:37:01 +02:00
tree-checker.h
tree-defrag.c
tree-log.c btrfs: fix lost inode on log replay after mix of fsync, rename and inode eviction 2021-08-08 09:04:07 +02:00
tree-log.h btrfs: do not commit logs and transactions during link and rename operations 2021-08-08 09:04:07 +02:00
ulist.c
ulist.h
uuid-tree.c btrfs: handle ENOENT in btrfs_uuid_tree_iterate 2019-12-31 16:42:05 +01:00
volumes.c btrfs: reset replace target device to allocation state on close 2021-09-22 12:26:19 +02:00
volumes.h btrfs: fix readahead hang and use-after-free after removing a device 2020-11-05 11:43:27 +01:00
xattr.c btrfs: fix warning when creating a directory with smack enabled 2021-03-09 11:09:37 +01:00
xattr.h
zlib.c
zstd.c