linux-stable/block
Douglas Anderson 340a4da3c1 block, bfq: NULL out the bic when it's no longer valid
commit dbc3117d4c upstream.

In reboot tests on several devices we were seeing a "use after free"
when slub_debug or KASAN was enabled.  The kernel complained about:

  Unable to handle kernel paging request at virtual address 6b6b6c2b

...which is a classic sign of use after free under slub_debug.  The
stack crawl in kgdb looked like:

 0  test_bit (addr=<optimized out>, nr=<optimized out>)
 1  bfq_bfqq_busy (bfqq=<optimized out>)
 2  bfq_select_queue (bfqd=<optimized out>)
 3  __bfq_dispatch_request (hctx=<optimized out>)
 4  bfq_dispatch_request (hctx=<optimized out>)
 5  0xc056ef00 in blk_mq_do_dispatch_sched (hctx=0xed249440)
 6  0xc056f728 in blk_mq_sched_dispatch_requests (hctx=0xed249440)
 7  0xc0568d24 in __blk_mq_run_hw_queue (hctx=0xed249440)
 8  0xc0568d94 in blk_mq_run_work_fn (work=<optimized out>)
 9  0xc024c5c4 in process_one_work (worker=0xec6d4640, work=0xed249480)
 10 0xc024cff4 in worker_thread (__worker=0xec6d4640)

Digging in kgdb, it could be found that, though bfqq looked fine,
bfqq->bic had been freed.

Through further digging, I postulated that perhaps it is illegal to
access a "bic" (AKA an "icq") after bfq_exit_icq() had been called
because the "bic" can be freed at some point in time after this call
is made.  I confirmed that there certainly were cases where the exact
crashing code path would access the "bic" after bfq_exit_icq() had
been called.  Sspecifically I set the "bfqq->bic" to (void *)0x7 and
saw that the bic was 0x7 at the time of the crash.

To understand a bit more about why this crash was fairly uncommon (I
saw it only once in a few hundred reboots), you can see that much of
the time bfq_exit_icq_fbqq() fully frees the bfqq and thus it can't
access the ->bic anymore.  The only case it doesn't is if
bfq_put_queue() sees a reference still held.

However, even in the case when bfqq isn't freed, the crash is still
rare.  Why?  I tracked what happened to the "bic" after the exit
routine.  It doesn't get freed right away.  Rather,
put_io_context_active() eventually called put_io_context() which
queued up freeing on a workqueue.  The freeing then actually happened
later than that through call_rcu().  Despite all these delays, some
extra debugging showed that all the hoops could be jumped through in
time and the memory could be freed causing the original crash.  Phew!

To make a long story short, assuming it truly is illegal to access an
icq after the "exit_icq" callback is finished, this patch is needed.

Cc: stable@vger.kernel.org
Reviewed-by: Paolo Valente <paolo.valente@unimore.it>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-21 09:04:30 +02:00
..
partitions partitions/aix: fix usage of uninitialized lv_info and lvname structures 2018-09-19 22:43:44 +02:00
badblocks.c badblocks: fix wrong return value in badblocks_set if badblocks are disabled 2017-12-20 10:10:26 +01:00
bfq-cgroup.c block: bfq: swap puts in bfqg_and_blkg_put 2018-09-19 22:43:35 +02:00
bfq-iosched.c block, bfq: NULL out the bic when it's no longer valid 2019-07-21 09:04:30 +02:00
bfq-iosched.h Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block 2017-09-07 11:59:42 -07:00
bfq-wf2q.c block, bfq: correctly charge and reset entity service in all cases 2018-11-13 11:14:55 -08:00
bio-integrity.c Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block 2017-09-07 11:59:42 -07:00
bio.c block: bio_iov_iter_get_pages: pin more pages for multi-segment IOs 2019-07-03 13:15:58 +02:00
blk-cgroup.c blkcg: init root blkcg_gq under lock 2018-06-21 04:02:44 +09:00
blk-core.c blk-mq: move cancel of requeue_work into blk_mq_release 2019-06-15 11:54:54 +02:00
blk-exec.c block: introduce new block status code type 2017-06-09 09:27:32 -06:00
blk-flush.c blk-mq: fix a hung issue when fsync 2019-02-20 10:20:44 +01:00
blk-integrity.c block: switch bios to blk_status_t 2017-06-09 09:27:32 -06:00
blk-ioc.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
blk-lib.c block: fix infinite loop if the device loses discard capability 2018-12-29 13:39:07 +01:00
blk-map.c blk_rq_map_user_iov: fix error override 2018-02-25 11:07:49 +01:00
blk-merge.c blk-mq: fix discard merge with scheduler attached 2018-04-26 11:02:15 +02:00
blk-mq-cpumap.c blk-mq: don't keep offline CPUs mapped to hctx 0 2018-04-19 08:56:20 +02:00
blk-mq-debugfs.c blk-mq-debugfs: don't allow write on attributes with seq_operations set 2018-04-26 11:02:11 +02:00
blk-mq-debugfs.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
blk-mq-pci.c blk-mq-pci: add a fallback when pci_irq_get_affinity returns NULL 2017-08-18 08:08:14 -06:00
blk-mq-rdma.c block: Add rdma affinity based queue mapping helper 2017-08-08 14:58:03 -04:00
blk-mq-sched.c blk-mq: only attempt to merge bio if there is rq in sw queue 2018-09-26 08:38:13 +02:00
blk-mq-sched.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
blk-mq-sysfs.c blk-mq: untangle debugfs and sysfs 2017-05-04 08:24:13 -06:00
blk-mq-tag.c blk-mq: fix updating tags depth 2018-09-19 22:43:39 +02:00
blk-mq-tag.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
blk-mq-virtio.c blk-mq: provide a default queue mapping for virtio device 2017-02-27 20:54:05 +02:00
blk-mq.c blk-mq: move cancel of requeue_work into blk_mq_release 2019-06-15 11:54:54 +02:00
blk-mq.h blk-mq: fix sysfs inflight counter 2018-06-21 04:02:49 +09:00
blk-settings.c block: allow max_discard_segments to be stacked 2018-09-26 08:38:00 +02:00
blk-softirq.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
blk-stat.c blk-stat: don't use this_cpu_ptr() in a preemptable section 2017-05-10 07:40:18 -06:00
blk-stat.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
blk-sysfs.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
blk-tag.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
blk-throttle.c block-throttle: avoid double charge 2017-12-29 17:53:47 +01:00
blk-timeout.c block: Fix a race between blk_cleanup_queue() and timeout handling 2017-11-30 08:40:52 +00:00
blk-wbt.c blk-wbt: account flush requests correctly 2018-02-22 15:42:29 +01:00
blk-wbt.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
blk-zoned.c blkdev_report_zones_ioctl(): Use vmalloc() to allocate large buffers 2018-06-16 09:45:14 +02:00
blk.h block: drain queue before waiting for q_usage_counter becoming zero 2018-03-03 10:24:35 +01:00
bounce.c block: don't let passthrough IO go into .make_request_fn() 2018-01-02 20:31:05 +01:00
bsg-lib.c bsg-lib: fix use-after-free under memory-pressure 2017-10-04 08:35:04 -06:00
bsg.c bsg: remove #if 0'ed code 2017-08-29 10:50:30 -06:00
cfq-iosched.c cfq: Suppress compiler warnings about comparisons 2018-09-15 09:45:31 +02:00
cmdline-parser.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
compat_ioctl.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
deadline-iosched.c block, scheduler: convert xxx_var_store to void 2017-08-28 10:01:08 -06:00
elevator.c elevator: lookup mq vs non-mq elevators 2018-12-21 14:13:10 +01:00
genhd.c blk-mq: fix sysfs inflight counter 2018-06-21 04:02:49 +09:00
ioctl.c block: remove the discard_zeroes_data flag 2017-04-08 11:25:38 -06:00
ioprio.c block: Add fallthrough markers to switch statements 2017-06-21 11:46:07 -06:00
Kconfig License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
Kconfig.iosched License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
kyber-iosched.c block: kyber: fix domain token leak during requeue 2018-03-08 22:41:05 -08:00
Makefile License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mq-deadline.c mq-deadline: Enable auto-loading when built as module 2017-08-29 10:47:23 -06:00
noop-iosched.c block: move existing elevator ops to union 2017-01-17 10:03:33 -07:00
opal_proto.h block: sed-opal: Set MBRDone on S3 resume path if TPER is MBREnabled 2017-09-11 09:45:52 -06:00
partition-generic.c blk-mq: fix sysfs inflight counter 2018-06-21 04:02:49 +09:00
scsi_ioctl.c block: Change argument type of scsi_req_init() 2017-06-20 19:27:14 -06:00
sed-opal.c block: sed-opal: fix IOC_OPAL_ENABLE_DISABLE_MBR 2019-05-31 06:47:29 -07:00
t10-pi.c t10-pi: Move opencoded contants to common header 2017-07-03 16:56:25 -06:00