linux-stable/drivers/md
Christoph Hellwig 65aa97c4d2 md: split mddev_find
Split mddev_find into a simple mddev_find that just finds an existing
mddev by the unit number, and a more complicated mddev_find that deals
with find or allocating a mddev.

This turns out to fix this bug reported by Zhao Heming.

----------------------------- snip ------------------------------
commit d3374825ce ("md: make devices disappear when they are no longer
needed.") introduced protection between mddev creating & removing. The
md_open shouldn't create mddev when all_mddevs list doesn't contain
mddev. With currently code logic, there will be very easy to trigger
soft lockup in non-preempt env.

*** env ***
kvm-qemu VM 2C1G with 2 iscsi luns
kernel should be non-preempt

*** script ***

about trigger 1 time with 10 tests

`1  node1="15sp3-mdcluster1"
2  node2="15sp3-mdcluster2"
3
4  mdadm -Ss
5  ssh ${node2} "mdadm -Ss"
6  wipefs -a /dev/sda /dev/sdb
7  mdadm -CR /dev/md0 -b clustered -e 1.2 -n 2 -l mirror /dev/sda \
   /dev/sdb --assume-clean
8
9  for i in {1..100}; do
10    echo ==== $i ====;
11
12    echo "test  ...."
13    ssh ${node2} "mdadm -A /dev/md0 /dev/sda /dev/sdb"
14    sleep 1
15
16    echo "clean  ....."
17    ssh ${node2} "mdadm -Ss"
18 done
`
I use mdcluster env to trigger soft lockup, but it isn't mdcluster
speical bug. To stop md array in mdcluster env will do more jobs than
non-cluster array, which will leave enough time/gap to allow kernel to
run md_open.

*** stack ***

`ID: 2831   TASK: ffff8dd7223b5040  CPU: 0   COMMAND: "mdadm"
 #0 [ffffa15d00a13b90] __schedule at ffffffffb8f1935f
 #1 [ffffa15d00a13ba8] exact_lock at ffffffffb8a4a66d
 #2 [ffffa15d00a13bb0] kobj_lookup at ffffffffb8c62fe3
 #3 [ffffa15d00a13c28] __blkdev_get at ffffffffb89273b9
 #4 [ffffa15d00a13c98] blkdev_get at ffffffffb8927964
 #5 [ffffa15d00a13cb0] do_dentry_open at ffffffffb88dc4b4
 #6 [ffffa15d00a13ce0] path_openat at ffffffffb88f0ccc
 #7 [ffffa15d00a13db8] do_filp_open at ffffffffb88f32bb
 #8 [ffffa15d00a13ee0] do_sys_open at ffffffffb88ddc7d
 #9 [ffffa15d00a13f38] do_syscall_64 at ffffffffb86053cb ffffffffb900008c

or:
[  884.226509]  mddev_put+0x1c/0xe0 [md_mod]
[  884.226515]  md_open+0x3c/0xe0 [md_mod]
[  884.226518]  __blkdev_get+0x30d/0x710
[  884.226520]  ? bd_acquire+0xd0/0xd0
[  884.226522]  blkdev_get+0x14/0x30
[  884.226524]  do_dentry_open+0x204/0x3a0
[  884.226531]  path_openat+0x2fc/0x1520
[  884.226534]  ? seq_printf+0x4e/0x70
[  884.226536]  do_filp_open+0x9b/0x110
[  884.226542]  ? md_release+0x20/0x20 [md_mod]
[  884.226543]  ? seq_read+0x1d8/0x3e0
[  884.226545]  ? kmem_cache_alloc+0x18a/0x270
[  884.226547]  ? do_sys_open+0x1bd/0x260
[  884.226548]  do_sys_open+0x1bd/0x260
[  884.226551]  do_syscall_64+0x5b/0x1e0
[  884.226554]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
`
*** rootcause ***

"mdadm -A" (or other array assemble commands) will start a daemon "mdadm
--monitor" by default. When "mdadm -Ss" is running, the stop action will
wakeup "mdadm --monitor". The "--monitor" daemon will immediately get
info from /proc/mdstat. This time mddev in kernel still exist, so
/proc/mdstat still show md device, which makes "mdadm --monitor" to open
/dev/md0.

The previously "mdadm -Ss" is removing action, the "mdadm --monitor"
open action will trigger md_open which is creating action. Racing is
happening.

`<thread 1>: "mdadm -Ss"
md_release
  mddev_put deletes mddev from all_mddevs
  queue_work for mddev_delayed_delete
  at this time, "/dev/md0" is still available for opening

<thread 2>: "mdadm --monitor ..."
md_open
 + mddev_find can't find mddev of /dev/md0, and create a new mddev and
 |    return.
 + trigger "if (mddev->gendisk != bdev->bd_disk)" and return
      -ERESTARTSYS.
`
In non-preempt kernel, <thread 2> is occupying on current CPU. and
mddev_delayed_delete which was created in <thread 1> also can't be
schedule.

In preempt kernel, it can also trigger above racing. But kernel doesn't
allow one thread running on a CPU all the time. after <thread 2> running
some time, the later "mdadm -A" (refer above script line 13) will call
md_alloc to alloc a new gendisk for mddev. it will break md_open
statement "if (mddev->gendisk != bdev->bd_disk)" and return 0 to caller,
the soft lockup is broken.
------------------------------ snip ------------------------------

Cc: stable@vger.kernel.org
Fixes: d3374825ce ("md: make devices disappear when they are no longer needed.")
Reported-by: Heming Zhao <heming.zhao@suse.com>
Reviewed-by: Heming Zhao <heming.zhao@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
2021-04-07 22:41:26 -07:00
..
bcache block: rename BIO_MAX_PAGES to BIO_MAX_VECS 2021-03-11 07:47:48 -07:00
persistent-data dm persistent data: fix return type of shadow_root() 2021-02-03 10:10:05 -05:00
dm-bio-prison-v1.c dm bio prison: replace spin_lock_irqsave with spin_lock_irq 2019-11-05 14:53:03 -05:00
dm-bio-prison-v1.h
dm-bio-prison-v2.c dm bio prison v2: use true/false for bool variable 2020-01-07 12:07:08 -05:00
dm-bio-prison-v2.h
dm-bio-record.h block: store a block_device pointer in struct bio 2021-01-24 18:17:20 -07:00
dm-bufio.c dm bufio: subtract the number of initial sectors in dm_bufio_get_device_size 2021-03-04 14:53:54 -05:00
dm-builtin.c
dm-cache-background-tracker.c
dm-cache-background-tracker.h
dm-cache-block-types.h
dm-cache-metadata.c dm: use bdev_read_only to check if a device is read-only 2021-01-24 18:15:57 -07:00
dm-cache-metadata.h
dm-cache-policy-internal.h
dm-cache-policy-smq.c
dm-cache-policy.c
dm-cache-policy.h
dm-cache-target.c dm cache: simplify the return expression of load_mapping() 2020-12-22 09:54:48 -05:00
dm-clone-metadata.c dm clone metadata: Fix return type of dm_clone_nr_of_hydrated_regions() 2020-03-27 14:42:51 -04:00
dm-clone-metadata.h dm clone metadata: Fix return type of dm_clone_nr_of_hydrated_regions() 2020-03-27 14:42:51 -04:00
dm-clone-target.c dm-clone: use blkdev_issue_flush in commit_metadata 2021-01-27 09:51:48 -07:00
dm-core.h dm: fix deadlock when swapping to encrypted device 2021-02-11 09:45:28 -05:00
dm-crypt.c block: rename BIO_MAX_PAGES to BIO_MAX_VECS 2021-03-11 07:47:48 -07:00
dm-delay.c block: rename generic_make_request to submit_bio_noacct 2020-07-01 07:27:24 -06:00
dm-dust.c dm dust: remove h from printk format specifier 2021-02-03 10:10:04 -05:00
dm-ebs-target.c dm ebs: avoid double unlikely() notation when using IS_ERR() 2020-12-21 13:59:37 -05:00
dm-era-target.c dm era: only resize metadata in preresume 2021-02-11 09:45:22 -05:00
dm-exception-store.c
dm-exception-store.h
dm-flakey.c dm: simplify target code conditional on CONFIG_BLK_DEV_ZONED 2021-02-11 09:45:27 -05:00
dm-init.c dm init: Set file local variable static 2020-08-04 15:51:28 -04:00
dm-integrity.c dm integrity: introduce the "fix_hmac" argument 2021-02-03 10:10:05 -05:00
dm-io.c block: Add bio_max_segs 2021-02-26 15:49:51 -07:00
dm-ioctl.c dm ioctl: fix error return code in target_message 2020-12-04 18:04:36 -05:00
dm-kcopyd.c dm kcopyd: always complete failed jobs 2019-08-15 15:57:39 -04:00
dm-linear.c dm: simplify target code conditional on CONFIG_BLK_DEV_ZONED 2021-02-11 09:45:27 -05:00
dm-log-userspace-base.c
dm-log-userspace-transfer.c
dm-log-userspace-transfer.h
dm-log-writes.c block: Add bio_max_segs 2021-02-26 15:49:51 -07:00
dm-log.c
dm-mpath.c dm: use dm_table_get_device_name() where appropriate in targets 2020-09-29 16:33:08 -04:00
dm-mpath.h
dm-path-selector.c
dm-path-selector.h dm mpath: pass IO start time to path selector 2020-05-15 10:29:36 -04:00
dm-ps-historical-service-time.c dm: rename multipath path selector source files to have "dm-ps" prefix 2020-12-04 18:04:35 -05:00
dm-ps-io-affinity.c dm: rename multipath path selector source files to have "dm-ps" prefix 2020-12-04 18:04:35 -05:00
dm-ps-queue-length.c dm: rename multipath path selector source files to have "dm-ps" prefix 2020-12-04 18:04:35 -05:00
dm-ps-round-robin.c dm: rename multipath path selector source files to have "dm-ps" prefix 2020-12-04 18:04:35 -05:00
dm-ps-service-time.c dm: rename multipath path selector source files to have "dm-ps" prefix 2020-12-04 18:04:35 -05:00
dm-raid.c dm raid: fix discard limits for raid1 2021-01-04 15:02:30 -05:00
dm-raid1.c block: store a block_device pointer in struct bio 2021-01-24 18:17:20 -07:00
dm-region-hash.c
dm-rq.c block: remove the request_queue to argument request based tracepoints 2020-12-04 09:42:00 -07:00
dm-rq.h
dm-snap-persistent.c dm snap persistent: simplify area_io() 2020-09-29 16:33:12 -04:00
dm-snap-transient.c
dm-snap.c dm snapshot: flush merged data before committing metadata 2021-01-06 18:28:34 -05:00
dm-stats.c dm: replace zero-length array with flexible-array 2020-05-20 17:09:44 -04:00
dm-stats.h
dm-stripe.c dm: add support for REQ_NOWAIT to various targets 2020-12-04 18:04:35 -05:00
dm-switch.c dm: add support for REQ_NOWAIT to various targets 2020-12-04 18:04:35 -05:00
dm-sysfs.c
dm-table.c dm: support key eviction from keyslot managers of underlying devices 2021-02-11 09:45:25 -05:00
dm-target.c
dm-thin-metadata.c dm: use bdev_read_only to check if a device is read-only 2021-01-24 18:15:57 -07:00
dm-thin-metadata.h dm thin metadata: Add support for a pre-commit callback 2019-12-05 17:05:24 -05:00
dm-thin.c writeback: remove bdi->congested_fn 2020-07-08 17:20:46 -06:00
dm-uevent.c
dm-uevent.h
dm-unstripe.c dm: add support for REQ_NOWAIT to various targets 2020-12-04 18:04:35 -05:00
dm-verity-fec.c dm verity: fix FEC for RS roots unaligned to block size 2021-03-04 15:08:18 -05:00
dm-verity-fec.h
dm-verity-target.c dm verity: skip verity work if I/O error when system is shutting down 2020-12-21 13:57:25 -05:00
dm-verity-verify-sig.c dm verity: Add support for signature verification with 2nd keyring 2020-12-04 18:04:35 -05:00
dm-verity-verify-sig.h dm verity: Fix compilation warning 2020-08-04 15:48:13 -04:00
dm-verity.h dm verity: add "panic_on_corruption" error handling mode 2020-07-13 11:47:33 -04:00
dm-writecache.c block: rename BIO_MAX_PAGES to BIO_MAX_VECS 2021-03-11 07:47:48 -07:00
dm-zero.c dm: add support for REQ_NOWAIT to various targets 2020-12-04 18:04:35 -05:00
dm-zoned-metadata.c block: use an on-stack bio in blkdev_issue_flush 2021-01-27 09:51:48 -07:00
dm-zoned-reclaim.c dm zoned: Fix zone reclaim trigger 2020-07-08 12:21:53 -04:00
dm-zoned-target.c for-5.9/block-20200802 2020-08-03 11:57:03 -07:00
dm-zoned.h dm zoned: select reclaim zone based on device index 2020-06-05 14:59:53 -04:00
dm.c dm: fix deadlock when swapping to encrypted device 2021-02-11 09:45:28 -05:00
dm.h dm table: fix DAX iterate_devices based device capability checks 2021-02-09 08:45:30 -05:00
Kconfig dm crypt: support using trusted keys 2021-02-03 10:13:00 -05:00
Makefile dm: rename multipath path selector source files to have "dm-ps" prefix 2020-12-04 18:04:35 -05:00
md-autodetect.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
md-bitmap.c md/bitmap: fix memory leak of temporary bitmap 2020-10-08 22:37:39 -07:00
md-bitmap.h
md-cluster.c for-5.11/drivers-2020-12-14 2020-12-16 13:09:32 -08:00
md-cluster.h
md-faulty.c block: rename generic_make_request to submit_bio_noacct 2020-07-01 07:27:24 -06:00
md-linear.c block: store a block_device pointer in struct bio 2021-01-24 18:17:20 -07:00
md-linear.h md/raid1: Replace zero-length array with flexible-array 2020-05-13 12:02:23 -07:00
md-multipath.c writeback: remove bdi->congested_fn 2020-07-08 17:20:46 -06:00
md-multipath.h
md.c md: split mddev_find 2021-04-07 22:41:26 -07:00
md.h md: add md_submit_discard_bio() for submitting discard bio 2021-03-24 16:29:06 -07:00
raid0.c md: add md_submit_discard_bio() for submitting discard bio 2021-03-24 16:29:06 -07:00
raid0.h md/raid0: avoid RAID0 data corruption due to layout confusion. 2019-09-13 13:10:05 -07:00
raid1-10.c
raid1.c md: remove bio_alloc_mddev 2021-01-27 09:51:48 -07:00
raid1.h md/raid1: Replace zero-length array with flexible-array 2020-05-13 12:02:23 -07:00
raid5-cache.c block: rename BIO_MAX_PAGES to BIO_MAX_VECS 2021-03-11 07:47:48 -07:00
raid5-log.h
raid5-ppl.c block: rename BIO_MAX_PAGES to BIO_MAX_VECS 2021-03-11 07:47:48 -07:00
raid5.c for-5.12/drivers-2021-02-17 2021-02-21 11:06:54 -08:00
raid5.h md/raid5: let multiple devices of stripe_head share page 2020-09-24 16:44:44 -07:00
raid10.c md/raid10: improve discard request for far layout 2021-03-24 16:29:07 -07:00
raid10.h md/raid10: improve discard request for far layout 2021-03-24 16:29:07 -07:00