linux-stable/Documentation
Mike Snitzer 0ce65797a7 dm: impose configurable deadline for dm_request_fn's merge heuristic
Otherwise, for sequential workloads, the dm_request_fn can allow
excessive request merging at the expense of increased service time.

Add a per-device sysfs attribute to allow the user to control how long a
request, that is a reasonable merge candidate, can be queued on the
request queue.  The resolution of this request dispatch deadline is in
microseconds (ranging from 1 to 100000 usecs), to set a 20us deadline:
  echo 20 > /sys/block/dm-7/dm/rq_based_seq_io_merge_deadline

The dm_request_fn's merge heuristic and associated extra accounting is
disabled by default (rq_based_seq_io_merge_deadline is 0).

This sysfs attribute is not applicable to bio-based DM devices so it
will only ever report 0 for them.

By allowing a request to remain on the queue it will block others
requests on the queue.  But introducing a short dequeue delay has proven
very effective at enabling certain sequential IO workloads on really
fast, yet IOPS constrained, devices to build up slightly larger IOs --
yielding 90+% throughput improvements.  Having precise control over the
time taken to wait for larger requests to build affords control beyond
that of waiting for certain IO sizes to accumulate (which would require
a deadline anyway).  This knob will only ever make sense with sequential
IO workloads and the particular value used is storage configuration
specific.

Given the expected niche use-case for when this knob is useful it has
been deemed acceptable to expose this relatively crude method for
crafting optimal IO on specific storage -- especially given the solution
is simple yet effective.  In the context of DM multipath, it is
advisable to tune this sysfs attribute to a value that offers the best
performance for the common case (e.g. if 4 paths are expected active,
tune for that; if paths fail then performance may be slightly reduced).

Alternatives were explored to have request-based DM autotune this value
(e.g. if/when paths fail) but they were quickly deemed too fragile and
complex to warrant further design and development time.  If this problem
proves more common as faster storage emerges we'll have to look at
elevating a generic solution into the block core.

Tested-by: Shiva Krishna Merla <shivakrishna.merla@netapp.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-04-15 12:10:15 -04:00
..
ABI dm: impose configurable deadline for dm_request_fn's merge heuristic 2015-04-15 12:10:15 -04:00
accounting
acpi
aoe
arm
arm64
auxdisplay
backlight
blackfin
block
blockdev
bus-devices
cdrom
cgroups mm: memcontrol: use "max" instead of "infinity" in control knobs 2015-02-28 09:57:51 -08:00
connector
console
cpu-freq
cpuidle
cris
crypto
development-process
device-mapper dm switch: fix Documentation to use plain text 2015-03-31 12:03:49 -04:00
devicetree ARM: SoC fixes for 4.0-rc3 2015-03-15 10:49:38 -07:00
dmaengine
DocBook
driver-model
dvb
early-userspace
EDID
extcon
fault-injection
fb
filesystems ocfs2: update web page + git tree in documentation 2015-02-28 09:57:50 -08:00
firmware_class
fmc
frv
gpio
hid
hwmon
i2c
i2o
ia64
ide
infiniband
input
ioctl
isdn
ja_JP
kbuild
kdump
ko_KR
laptops
leds
locking
m68k
memory-devices
metag
mic
mips
misc-devices
mmc
mn10300
mtd
namespaces
netlabel
networking
nfc
nios2
parisc
PCI
pcmcia
phy
platform
power genirq / PM: describe IRQF_COND_SUSPEND 2015-03-06 01:28:14 +01:00
powerpc
pps
prctl
pti
ptp
rapidio
RCU
s390
scheduler
scsi
security
serial
sh
sound
spi
sysctl
target
thermal
timers
tpm
trace
usb
vDSO
video4linux
virtual
vm
w1
watchdog
wimax
x86
xtensa
zh_CN
00-INDEX
applying-patches.txt
assoc_array.txt
atomic_ops.txt
bad_memory.txt
basic_profiling.txt
bcache.txt
binfmt_misc.txt
braille-console.txt
bt8xxgpio.txt
btmrvl.txt
BUG-HUNTING
bus-virt-phys-mapping.txt
cachetlb.txt
Changes
circular-buffers.txt
clk.txt
coccinelle.txt
CodeOfConflict Code of Conflict 2015-02-27 11:44:24 -08:00
CodingStyle
cpu-hotplug.txt
cpu-load.txt
cputopology.txt
crc32.txt
dcdbas.txt
debugging-modules.txt
debugging-via-ohci1394.txt
dell_rbu.txt
devices.txt
digsig.txt
DMA-API-HOWTO.txt
DMA-API.txt
DMA-attributes.txt
dma-buf-sharing.txt
DMA-ISA-LPC.txt
dontdiff
dynamic-debug-howto.txt
edac.txt
efi-stub.txt
eisa.txt
email-clients.txt
flexible-arrays.txt
futex-requeue-pi.txt
gcov.txt
gdb-kernel-debugging.txt
highuid.txt
HOWTO
hsi.txt
hw_random.txt
hwspinlock.txt
init.txt
initrd.txt
Intel-IOMMU.txt
intel_txt.txt
io-mapping.txt
io_ordering.txt
iostats.txt
IPMI.txt
IRQ-affinity.txt
IRQ-domain.txt
IRQ.txt
irqflags-tracing.txt
isapnp.txt
java.txt
kasan.txt
kernel-doc-nano-HOWTO.txt
kernel-docs.txt
kernel-parameters.txt
kernel-per-CPU-kthreads.txt
kmemcheck.txt
kmemleak.txt
kobject.txt
kprobes.txt
kref.txt
kselftest.txt
ldm.txt
local_ops.txt
lockup-watchdogs.txt
logo.gif
logo.txt
lzo.txt
magic-number.txt
mailbox.txt
Makefile
ManagementStyle
md.txt
media-framework.txt
memory-barriers.txt
memory-hotplug.txt
module-signing.txt
mono.txt
nommu-mmap.txt
numastat.txt
oops-tracing.txt
padata.txt
parport-lowlevel.txt
parport.txt
percpu-rw-semaphore.txt
phy.txt
pi-futex.txt
pinctrl.txt
pnp.txt
preempt-locking.txt
printk-formats.txt
pwm.txt
ramoops.txt
rbtree.txt
remoteproc.txt
rfkill.txt
robust-futex-ABI.txt
robust-futexes.txt
rpmsg.txt
rtc.txt
SAK.txt
SecurityBugs
serial-console.txt
sgi-ioc4.txt
SM501.txt
smsc_ece1099.txt
sparse.txt
stable_api_nonsense.txt
stable_kernel_rules.txt
static-keys.txt
SubmitChecklist
SubmittingDrivers
SubmittingPatches
svga.txt
sysfs-rules.txt
sysrq.txt
this_cpu_ops.txt
unaligned-memory-access.txt
unicode.txt
unshare.txt
vfio.txt
VGA-softcursor.txt
vgaarbiter.txt
video-output.txt
vme_api.txt
volatile-considered-harmful.txt
workqueue.txt
xillybus.txt
xz.txt
zorro.txt