linux-stable/drivers
Mikulas Patocka b21555786f dm snapshot: rework COW throttling to fix deadlock
Commit 721b1d98fb ("dm snapshot: Fix excessive memory usage and
workqueue stalls") introduced a semaphore to limit the maximum number of
in-flight kcopyd (COW) jobs.

The implementation of this throttling mechanism is prone to a deadlock:

1. One or more threads write to the origin device causing COW, which is
   performed by kcopyd.

2. At some point some of these threads might reach the s->cow_count
   semaphore limit and block in down(&s->cow_count), holding a read lock
   on _origins_lock.

3. Someone tries to acquire a write lock on _origins_lock, e.g.,
   snapshot_ctr(), which blocks because the threads at step (2) already
   hold a read lock on it.

4. A COW operation completes and kcopyd runs dm-snapshot's completion
   callback, which ends up calling pending_complete().
   pending_complete() tries to resubmit any deferred origin bios. This
   requires acquiring a read lock on _origins_lock, which blocks.

   This happens because the read-write semaphore implementation gives
   priority to writers, meaning that as soon as a writer tries to enter
   the critical section, no readers will be allowed in, until all
   writers have completed their work.

   So, pending_complete() waits for the writer at step (3) to acquire
   and release the lock. This writer waits for the readers at step (2)
   to release the read lock and those readers wait for
   pending_complete() (the kcopyd thread) to signal the s->cow_count
   semaphore: DEADLOCK.

The above was thoroughly analyzed and documented by Nikos Tsironis as
part of his initial proposal for fixing this deadlock, see:
https://www.redhat.com/archives/dm-devel/2019-October/msg00001.html

Fix this deadlock by reworking COW throttling so that it waits without
holding any locks. Add a variable 'in_progress' that counts how many
kcopyd jobs are running. A function wait_for_in_progress() will sleep if
'in_progress' is over the limit. It drops _origins_lock in order to
avoid the deadlock.

Reported-by: Guruswamy Basavaiah <guru2018@gmail.com>
Reported-by: Nikos Tsironis <ntsironis@arrikto.com>
Reviewed-by: Nikos Tsironis <ntsironis@arrikto.com>
Tested-by: Nikos Tsironis <ntsironis@arrikto.com>
Fixes: 721b1d98fb ("dm snapshot: Fix excessive memory usage and workqueue stalls")
Cc: stable@vger.kernel.org # v5.0+
Depends-on: 4a3f111a73a8c ("dm snapshot: introduce account_start_copy() and account_end_copy()")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-10-10 09:46:05 -04:00
..
accessibility
acpi Merge branch 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2019-09-28 08:14:15 -07:00
amba ARM updates for 5.4-rc1: 2019-09-22 09:39:09 -07:00
android
ata ata: libahci_platform: Add of_node_put() before loop exit 2019-09-19 12:21:44 -06:00
atm atm: he: clean up an indentation issue 2019-09-25 13:54:45 +02:00
auxdisplay It's a somewhat calmer cycle for docs this time, as the churn of the mass 2019-09-17 16:22:26 -07:00
base mm,thp: stats for file backed THP 2019-09-24 15:54:11 -07:00
bcma bcma: make arrays pwr_info_offset and sprom_sizes static const, shrinks object size 2019-09-13 16:44:49 +03:00
block for-linus-2019-10-03 2019-10-04 09:56:51 -07:00
bluetooth Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-09-15 14:17:27 +02:00
bus ARM: SoC fixes 2019-09-30 10:04:28 -07:00
cdrom
char char/random: Add a newline at the end of the file 2019-10-02 13:49:43 -07:00
clk Fixes for omaps for v5.4-rc cycle 2019-10-03 09:15:19 -07:00
clocksource timer-of: don't use conditional expression with mixed 'void' types 2019-10-02 16:16:07 -07:00
connector
counter
cpufreq Power management updates for 5.4-rc1 2019-09-17 19:15:14 -07:00
cpuidle Power management updates for 5.4-rc1 2019-09-17 19:15:14 -07:00
crypto Merge branch 'akpm' (patches from Andrew) 2019-09-24 16:10:23 -07:00
dax
dca
devfreq
dio
dma Main MIPS changes for v5.4: 2019-09-22 09:30:30 -07:00
dma-buf
edac ARM updates for 5.4-rc1: 2019-09-22 09:39:09 -07:00
eisa
extcon chrome platform changes for v5.4 2019-09-19 14:14:28 -07:00
firewire
firmware ARM: SoC fixes 2019-09-30 10:04:28 -07:00
fpga Char/Misc driver patches for 5.4-rc1 2019-09-18 11:14:31 -07:00
fsi
gnss
gpio pwm: Changes for v5.4-rc1 2019-09-27 12:19:47 -07:00
gpu - Fix DP-MST crtc_mask 2019-10-04 16:31:06 +10:00
greybus
hid - First round of vmbus hibernation support from Dexuan Cui. 2019-09-24 12:36:31 -07:00
hsi HSI changes for the 5.4 series 2019-09-22 12:02:21 -07:00
hv - First round of vmbus hibernation support from Dexuan Cui. 2019-09-24 12:36:31 -07:00
hwmon ARM SCMI fixes for v5.4 2019-09-29 11:20:41 -07:00
hwspinlock
hwtracing Char/Misc driver patches for 5.4-rc1 2019-09-18 11:14:31 -07:00
i2c i2c: slave-eeprom: Add read only mode 2019-09-28 20:44:12 +02:00
i3c
ide
idle
iio chrome platform changes for v5.4 2019-09-19 14:14:28 -07:00
infiniband Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-09-28 17:47:33 -07:00
input chrome platform changes for v5.4 2019-09-19 14:14:28 -07:00
interconnect
iommu IOMMU Fixes for Linux v5.4-rc1 2019-09-29 10:00:14 -07:00
ipack
irqchip Main MIPS changes for v5.4: 2019-09-22 09:30:30 -07:00
isdn mISDN: enforce CAP_NET_RAW for raw sockets 2019-09-24 16:37:18 +02:00
leds leds: lm3532: Fix optional led-max-microamp prop error handling 2019-09-12 20:45:52 +02:00
lightnvm lightnvm: print error when target is not found 2019-09-05 13:17:01 -06:00
macintosh
mailbox mailbox: qcom-apcs: fix max_register value 2019-09-17 00:54:29 -05:00
mcb
md dm snapshot: rework COW throttling to fix deadlock 2019-10-10 09:46:05 -04:00
media media/v4l2-core: untag user pointers in videobuf_dma_contig_user_get 2019-09-25 17:51:41 -07:00
memory
memstick ms_block: fix spelling mistake "randomally" -> "randomly" 2019-09-11 16:11:01 +02:00
message
mfd Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal 2019-09-29 10:24:23 -07:00
misc Merge branch 'i2c/for-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux 2019-09-24 16:48:02 -07:00
mmc mmc: host: sdhci-pci: Add Genesys Logic GL975x support 2019-09-27 20:48:20 +02:00
mtd Main MIPS changes for v5.4: 2019-09-22 09:30:30 -07:00
mux
net net: qlogic: Fix memory leak in ql_alloc_large_buffers 2019-10-04 18:33:13 -07:00
nfc NFC: st95hf: clean up indentation issue 2019-09-27 20:31:18 +02:00
ntb NTB: fix IDT Kconfig typos/spellos 2019-09-23 17:20:40 -04:00
nubus
nvdimm libnvdimm fixes v5.4-rc1 2019-09-29 10:33:41 -07:00
nvme Merge branch 'nvme-5.4' of git://git.infradead.org/nvme into for-linus 2019-09-27 13:17:37 -06:00
nvmem Char/Misc driver patches for 5.4-rc1 2019-09-18 11:14:31 -07:00
of Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-09-28 17:47:33 -07:00
opp
oprofile
parisc dma-mapping updates for 5.4: 2019-09-19 13:27:23 -07:00
parport Char/Misc driver patches for 5.4-rc1 2019-09-18 11:14:31 -07:00
pci Merge branch 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2019-09-28 08:14:15 -07:00
pcmcia Merge branch 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2019-09-28 08:14:15 -07:00
perf
phy pci-v5.4-changes 2019-09-23 19:16:01 -07:00
pinctrl This is the bulk of pin control changes for the v5.4 kernel 2019-09-19 14:19:33 -07:00
platform platform-drivers-x86 for v5.4-2 2019-09-24 12:39:40 -07:00
pnp
power power supply and reset changes for the v5.4 series 2019-09-22 12:04:59 -07:00
powercap Power management updates for 5.4-rc1 2019-09-17 19:15:14 -07:00
pps
ps3
ptp ptp_qoriq: Initialize the registers' spinlock before calling ptp_qoriq_settime 2019-10-02 12:20:38 -04:00
pwm pwm: Changes for v5.4-rc1 2019-09-27 12:19:47 -07:00
rapidio
ras
regulator LED updates for 5.4-rc1 2019-09-17 18:40:42 -07:00
remoteproc remoteproc updates for v5.4 2019-09-22 10:55:08 -07:00
reset ARM: SoC fixes 2019-09-30 10:04:28 -07:00
rpmsg rpmsg: glink-smem: Name the edge based on parent remoteproc 2019-09-17 15:33:31 -07:00
rtc RTC for 5.4 2019-09-22 11:05:43 -07:00
s390 s390 updates for 5.4-rc2 2019-10-05 08:44:02 -07:00
sbus
scsi SCSI fixes on 20191004 2019-10-05 12:53:27 -07:00
sfi
sh
siox
slimbus
soc ARM: SoC driver updates for v5.4 2019-09-16 15:52:38 -07:00
soundwire soundwire updates for v5.4-rc1 2019-09-22 10:52:23 -07:00
spi LED updates for 5.4-rc1 2019-09-17 18:40:42 -07:00
spmi
ssb ssb: make array pwr_info_offset static const, makes object smaller 2019-09-13 17:23:18 +03:00
staging netfilter: drop bridge nf reset from nf_reset 2019-10-01 18:42:15 +02:00
target mm: introduce page_size() 2019-09-24 15:54:08 -07:00
tc
tee tee/shm: untag user pointers in tee_shm_register 2019-09-25 17:51:41 -07:00
thermal Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal 2019-09-29 10:24:23 -07:00
thunderbolt
tty Merge branch 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2019-09-28 08:14:15 -07:00
uio Char/Misc driver patches for 5.4-rc1 2019-09-18 11:14:31 -07:00
usb Merge branch 'work.mount3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-09-24 12:33:34 -07:00
vfio vfio/type1: untag user pointers in vaddr_get_pfn 2019-09-25 17:51:41 -07:00
vhost Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-09-15 14:17:27 +02:00
video video/logo: do not generate unneeded logo C files 2019-10-05 15:29:49 +09:00
virt
virtio virtio_ring: fix unmap of indirect descriptors 2019-09-09 10:43:15 -04:00
visorbus
vlynq
vme
w1
watchdog linux-watchdog 5.4-rc1 tag 2019-09-27 11:17:38 -07:00
xen xen: fixes and cleanups for 5.4-rc2 2019-10-04 11:13:09 -07:00
zorro
Kconfig Staging/IIO driver patches for 5.4-rc1 2019-09-18 11:05:34 -07:00
Makefile Staging/IIO driver patches for 5.4-rc1 2019-09-18 11:05:34 -07:00