linux-stable/block
Wanpeng Li 8d7f1fde9d block/mq: fix potential deadlock during cpu hotplug
commit 51d638b1f5 upstream.

This can be triggered by hot-unplug one cpu.

======================================================
 [ INFO: possible circular locking dependency detected ]
 4.11.0+ #17 Not tainted
 -------------------------------------------------------
 step_after_susp/2640 is trying to acquire lock:
  (all_q_mutex){+.+...}, at: [<ffffffffb33f95b8>] blk_mq_queue_reinit_work+0x18/0x110

 but task is already holding lock:
  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffffb306d04f>] cpu_hotplug_begin+0x7f/0xe0

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #1 (cpu_hotplug.lock){+.+.+.}:
        lock_acquire+0x11c/0x230
        __mutex_lock+0x92/0x990
        mutex_lock_nested+0x1b/0x20
        get_online_cpus+0x64/0x80
        blk_mq_init_allocated_queue+0x3a0/0x4e0
        blk_mq_init_queue+0x3a/0x60
        loop_add+0xe5/0x280
        loop_init+0x124/0x177
        do_one_initcall+0x53/0x1c0
        kernel_init_freeable+0x1e3/0x27f
        kernel_init+0xe/0x100
        ret_from_fork+0x31/0x40

 -> #0 (all_q_mutex){+.+...}:
        __lock_acquire+0x189a/0x18a0
        lock_acquire+0x11c/0x230
        __mutex_lock+0x92/0x990
        mutex_lock_nested+0x1b/0x20
        blk_mq_queue_reinit_work+0x18/0x110
        blk_mq_queue_reinit_dead+0x1c/0x20
        cpuhp_invoke_callback+0x1f2/0x810
        cpuhp_down_callbacks+0x42/0x80
        _cpu_down+0xb2/0xe0
        freeze_secondary_cpus+0xb6/0x390
        suspend_devices_and_enter+0x3b3/0xa40
        pm_suspend+0x129/0x490
        state_store+0x82/0xf0
        kobj_attr_store+0xf/0x20
        sysfs_kf_write+0x45/0x60
        kernfs_fop_write+0x135/0x1c0
        __vfs_write+0x37/0x160
        vfs_write+0xcd/0x1d0
        SyS_write+0x58/0xc0
        do_syscall_64+0x8f/0x710
        return_from_SYSCALL_64+0x0/0x7a

 other info that might help us debug this:

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(cpu_hotplug.lock);
                                lock(all_q_mutex);
                                lock(cpu_hotplug.lock);
   lock(all_q_mutex);

  *** DEADLOCK ***

 8 locks held by step_after_susp/2640:
  #0:  (sb_writers#6){.+.+.+}, at: [<ffffffffb3244aed>] vfs_write+0x1ad/0x1d0
  #1:  (&of->mutex){+.+.+.}, at: [<ffffffffb32d3a51>] kernfs_fop_write+0x101/0x1c0
  #2:  (s_active#166){.+.+.+}, at: [<ffffffffb32d3a59>] kernfs_fop_write+0x109/0x1c0
  #3:  (pm_mutex){+.+...}, at: [<ffffffffb30d2ecd>] pm_suspend+0x21d/0x490
  #4:  (acpi_scan_lock){+.+.+.}, at: [<ffffffffb34dc3d7>] acpi_scan_lock_acquire+0x17/0x20
  #5:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffffb306d6d7>] freeze_secondary_cpus+0x27/0x390
  #6:  (cpu_hotplug.dep_map){++++++}, at: [<ffffffffb306cfd5>] cpu_hotplug_begin+0x5/0xe0
  #7:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffffb306d04f>] cpu_hotplug_begin+0x7f/0xe0

 stack backtrace:
 CPU: 3 PID: 2640 Comm: step_after_susp Not tainted 4.11.0+ #17
 Hardware name: Dell Inc. OptiPlex 7040/0JCTF8, BIOS 1.4.9 09/12/2016
 Call Trace:
  dump_stack+0x99/0xce
  print_circular_bug+0x1fa/0x270
  __lock_acquire+0x189a/0x18a0
  lock_acquire+0x11c/0x230
  ? lock_acquire+0x11c/0x230
  ? blk_mq_queue_reinit_work+0x18/0x110
  ? blk_mq_queue_reinit_work+0x18/0x110
  __mutex_lock+0x92/0x990
  ? blk_mq_queue_reinit_work+0x18/0x110
  ? kmem_cache_free+0x2cb/0x330
  ? anon_transport_class_unregister+0x20/0x20
  ? blk_mq_queue_reinit_work+0x110/0x110
  mutex_lock_nested+0x1b/0x20
  ? mutex_lock_nested+0x1b/0x20
  blk_mq_queue_reinit_work+0x18/0x110
  blk_mq_queue_reinit_dead+0x1c/0x20
  cpuhp_invoke_callback+0x1f2/0x810
  ? __flow_cache_shrink+0x160/0x160
  cpuhp_down_callbacks+0x42/0x80
  _cpu_down+0xb2/0xe0
  freeze_secondary_cpus+0xb6/0x390
  suspend_devices_and_enter+0x3b3/0xa40
  ? rcu_read_lock_sched_held+0x79/0x80
  pm_suspend+0x129/0x490
  state_store+0x82/0xf0
  kobj_attr_store+0xf/0x20
  sysfs_kf_write+0x45/0x60
  kernfs_fop_write+0x135/0x1c0
  __vfs_write+0x37/0x160
  ? rcu_read_lock_sched_held+0x79/0x80
  ? rcu_sync_lockdep_assert+0x2f/0x60
  ? __sb_start_write+0xd9/0x1c0
  ? vfs_write+0x1ad/0x1d0
  vfs_write+0xcd/0x1d0
  SyS_write+0x58/0xc0
  ? rcu_read_lock_sched_held+0x79/0x80
  do_syscall_64+0x8f/0x710
  ? trace_hardirqs_on_thunk+0x1a/0x1c
  entry_SYSCALL64_slow_path+0x25/0x25

The cpu hotplug path will hold cpu_hotplug.lock and then reinit all exiting
queues for blk mq w/ all_q_mutex, however, blk_mq_init_allocated_queue() will
contend these two locks in the inversion order. This is due to commit eabe06595d
(blk/mq: Cure cpu hotplug lock inversion), it fixes a cpu hotplug lock inversion
issue because of hotplug rework, however the hotplug rework is still work-in-progress
and lives in a -tip branch and mainline cannot yet trigger that splat. The commit
breaks the linus's tree in the merge window, so this patch reverts the lock order
and avoids to splat linus's tree.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Cc: Thierry Escande <thierry.escande@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-24 09:34:18 +02:00
..
partitions partitions/msdos: Unable to mount UFS 44bsd partitions 2018-04-08 12:12:42 +02:00
badblocks.c badblocks: fix wrong return value in badblocks_set if badblocks are disabled 2017-12-20 10:07:29 +01:00
bio-integrity.c bio-integrity: Do not allocate integrity context for bio w/o data 2018-04-13 19:48:18 +02:00
bio.c Fix slab name "biovec-(1<<(21-12))" 2018-04-08 12:13:00 +02:00
blk-cgroup.c blkcg: fix double free of new_blkg in blkcg_init_queue 2018-03-22 09:17:35 +01:00
blk-core.c block: wake up all tasks blocked in get_request() 2017-12-14 09:28:23 +01:00
blk-exec.c block: Fix spelling in a source code comment 2016-07-20 21:28:22 -06:00
blk-flush.c block: flush: fix IO hang in case of flood fua req 2016-10-26 07:49:27 -06:00
blk-integrity.c block: fix blk_integrity_register to use template's interval_exp if not 0 2017-05-20 14:28:36 +02:00
blk-ioc.c mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd 2015-11-06 17:50:42 -08:00
blk-lib.c block: require write_same and discard requests align to logical block size 2016-10-11 15:06:30 -07:00
blk-map.c blk_rq_map_user_iov: fix error override 2018-02-25 11:05:42 +01:00
blk-merge.c block: make sure a big bio is split into at most 256 bvecs 2016-08-24 08:17:24 -06:00
blk-mq-cpumap.c blk-mq: allow the driver to pass in a queue mapping 2016-09-15 08:42:03 -06:00
blk-mq-pci.c blk-mq-pci: add a fallback when pci_irq_get_affinity returns NULL 2017-08-24 17:12:20 -07:00
blk-mq-sysfs.c blk-mq: initialize mq kobjects in blk_mq_init_allocated_queue() 2017-12-14 09:28:21 +01:00
blk-mq-tag.c blk-mq: Fix tagset reinit in the presence of cpu hot-unplug 2017-12-20 10:07:20 +01:00
blk-mq-tag.h Merge branch 'for-4.9/block-irq' of git://git.kernel.dk/linux-block 2016-10-09 17:29:33 -07:00
blk-mq.c block/mq: fix potential deadlock during cpu hotplug 2018-04-24 09:34:18 +02:00
blk-mq.h blk-mq: initialize mq kobjects in blk_mq_init_allocated_queue() 2017-12-14 09:28:21 +01:00
blk-settings.c block: kill off q->flush_flags 2016-04-13 13:33:19 -06:00
blk-softirq.c This adds a new gcc plugin named "latent_entropy". It is designed to 2016-10-15 10:03:15 -07:00
blk-sysfs.c blk-mq: register device instead of disk 2016-09-21 07:56:16 -06:00
blk-tag.c
blk-throttle.c blk-throttle: make sure expire time isn't too big 2018-03-22 09:17:44 +01:00
blk-timeout.c block: Fix a race between blk_cleanup_queue() and timeout handling 2017-11-30 08:39:06 +00:00
blk.h blk-mq: remove ->map_queue 2016-09-15 08:42:03 -06:00
bounce.c Merge branch 'for-linus' of git://git.kernel.dk/linux-block 2015-09-19 18:57:09 -07:00
bsg-lib.c Revert "bsg-lib: don't free job in bsg_prepare_job" 2017-10-21 17:21:33 +02:00
bsg.c sg_write()/bsg_write() is not fit to be called under KERNEL_DS 2017-01-09 08:32:25 +01:00
cfq-iosched.c cfq-iosched: fix the delay of cfq_group's vdisktime under iops mode 2017-06-14 15:05:58 +02:00
cmdline-parser.c
compat_ioctl.c mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros 2016-04-04 10:41:08 -07:00
deadline-iosched.c block: do not merge requests without consulting with io scheduler 2016-07-20 21:35:12 -06:00
elevator.c block: Fix secure erase 2016-08-16 09:16:51 -06:00
genhd.c block: fix bdi vs gendisk lifetime mismatch 2016-08-04 14:19:16 -06:00
ioctl.c block: invalidate the page cache when issuing BLKZEROOUT 2016-10-11 15:06:30 -07:00
ioprio.c block: fix use-after-free in sys_ioprio_get() 2016-07-01 08:39:24 -06:00
Kconfig Merge branch 'for-4.9/block-irq' of git://git.kernel.dk/linux-block 2016-10-09 17:29:33 -07:00
Kconfig.iosched
Makefile Merge branch 'for-4.9/block-smp' of git://git.kernel.dk/linux-block 2016-10-09 17:32:20 -07:00
noop-iosched.c elevator: use list_{first,prev,next}_entry 2015-11-16 15:21:48 -07:00
partition-generic.c block: fix an error code in add_partition() 2018-04-13 19:48:04 +02:00
scsi_ioctl.c block: allow WRITE_SAME commands with the SG_IO ioctl 2017-03-22 12:43:38 +01:00
t10-pi.c block: Consolidate static integrity profile properties 2015-10-21 14:42:38 -06:00