linux-stable/mm
Christian Ehrhardt 3fb5c298b0 swap: allow swap readahead to be merged
Swap readahead works fine, but the I/O to disk is almost always done in
page size requests, despite the fact that readahead submits
1<<page-cluster pages at a time.
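
To make the `1<<page-cluster` figure concrete, the following is a minimal sketch of how the readahead window grows with the `page-cluster` setting. The 4 KiB page size and the helper name `readahead_window` are assumptions for illustration; the page size depends on the architecture.

```python
# Sketch: size of one swap readahead window.
PAGE_SIZE_KIB = 4  # assumption: 4 KiB pages (architecture-dependent)


def readahead_window(page_cluster, page_size_kib=PAGE_SIZE_KIB):
    """Return (pages, KiB) covered by one readahead of 1 << page_cluster pages."""
    pages = 1 << page_cluster
    return pages, pages * page_size_kib


if __name__ == "__main__":
    for pc in range(5):
        pages, kib = readahead_window(pc)
        print(f"page-cluster={pc}: {pages} pages, {kib} KiB per readahead")
```

With a `page-cluster` of 3, one readahead covers 8 pages, so under these assumptions up to 32 KiB could go to disk as a single request instead of eight page-sized ones.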

On older kernels the old per-device plugging behavior might have captured
this and merged the requests, but currently it all comes down to many more
I/Os than required.
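
The effect of holding requests back so that adjacent ones can be merged can be illustrated with a toy model. This is only a sketch of the idea, not the kernel block layer: the function `merge_contiguous` and the `(start_page, n_pages)` request tuples are hypothetical.

```python
def merge_contiguous(requests):
    """Merge requests that are adjacent on disk.

    Each request is a (start_page, n_pages) tuple. Requests are sorted by
    start and any request that begins exactly where the previous one ends
    is folded into it.
    """
    merged = []
    for start, length in sorted(requests):
        if merged and merged[-1][0] + merged[-1][1] == start:
            prev_start, prev_len = merged[-1]
            merged[-1] = (prev_start, prev_len + length)
        else:
            merged.append((start, length))
    return merged


if __name__ == "__main__":
    # Eight single-page reads for consecutive swap slots, i.e. one readahead
    # window when page-cluster is 3.
    window = [(100 + i, 1) for i in range(8)]
    unbatched = window                   # each request dispatched as submitted
    batched = merge_contiguous(window)   # requests collected, then merged
    print(len(unbatched), "dispatches without batching")
    print(len(batched), "dispatch(es) with batching:", batched)
```

In this model the eight page-sized reads collapse into one eight-page read once they are collected before dispatch. Reads for non-adjacent slots stay separate.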

On a single device this might not be an issue, but as soon as a server
runs on shared SAN resources, saving I/Os not only improves swap-in
throughput but also lowers resource utilization.

With a load running KVM under heavy memory overcommitment (the hot memory
is 1.5 times the host memory), swapping throughput improves significantly,
and the load feels more responsive as well as achieving more throughput.

In a test setup with 16 swap disks, running blktrace on one of those disks
shows the improved merging:
Prior:
Reads Queued:     560,888,    2,243MiB  Writes Queued:     226,242,  904,968KiB
Read Dispatches:  544,701,    2,243MiB  Write Dispatches:  159,318,  904,968KiB
Reads Requeued:         0               Writes Requeued:         0
Reads Completed:  544,716,    2,243MiB  Writes Completed:  159,321,  904,980KiB
Read Merges:       16,187,   64,748KiB  Write Merges:       61,744,  246,976KiB
IO unplugs:       149,614               Timer unplugs:       2,940

With the patch:
Reads Queued:     734,315,    2,937MiB  Writes Queued:     300,188,    1,200MiB
Read Dispatches:  214,972,    2,937MiB  Write Dispatches:  215,176,    1,200MiB
Reads Requeued:         0               Writes Requeued:         0
Reads Completed:  214,971,    2,937MiB  Writes Completed:  215,177,    1,200MiB
Read Merges:      519,343,    2,077MiB  Write Merges:       73,325,  293,300KiB
IO unplugs:       337,130               Timer unplugs:      11,184
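
The two summaries above can be reduced to a read merge ratio and an average read dispatch size. The sketch below uses only the figures quoted above; the helper name `summarize` is hypothetical, and the sizes are approximate because the totals are rounded to whole MiB.

```python
def summarize(reads_queued, read_dispatches, read_merges, read_mib):
    """Derive the merge ratio and average dispatch size for reads."""
    return {
        # Fraction of queued reads that were merged into another request.
        "merge_ratio": read_merges / reads_queued,
        # Average size of a dispatched read request, in KiB.
        "avg_dispatch_kib": read_mib * 1024 / read_dispatches,
    }


# Read-side figures copied from the two summaries.
prior = summarize(560_888, 544_701, 16_187, 2_243)
patched = summarize(734_315, 214_972, 519_343, 2_937)

if __name__ == "__main__":
    for label, s in (("Prior", prior), ("With the patch", patched)):
        print(f"{label}: {s['merge_ratio']:.1%} of reads merged, "
              f"{s['avg_dispatch_kib']:.1f} KiB per dispatched read")
```

By this calculation roughly 3% of queued reads were merged before the patch and about 71% with it, and the average dispatched read grows from about 4.2 KiB to about 14 KiB. In both runs, reads queued minus read merges equals read dispatches.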

I got ~10% to ~40% more throughput in my cases and at the same time much
lower CPU consumption per transferred kilobyte (mostly due to saved
interrupts and better cache handling).  In a shared SAN others might
benefit as well, because this now causes less protocol overhead.

Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31 18:42:39 -07:00
backing-dev.c block: Convert BDI proportion calculations to flexible proportions 2012-06-09 08:37:56 +09:00
bootmem.c bootmem: make ___alloc_bootmem_node_nopanic() really nopanic 2012-07-17 16:21:29 -07:00
bounce.c bounce: allow use of bounce pool via config option 2012-07-18 16:40:35 -04:00
cleancache.c ->encode_fh() API change 2012-05-29 23:28:33 -04:00
compaction.c mm, thp: abort compaction if migration page cannot be charged to memcg 2012-07-11 16:04:43 -07:00
debug-pagealloc.c mm, x86: Remove debug_pagealloc_enabled 2011-12-06 09:24:07 +01:00
dmapool.c mm: fix implicit stat.h usage in dmapool.c 2011-10-31 09:20:12 -04:00
fadvise.c fadvise: only initiate writeback for specified range with FADV_DONTNEED 2012-01-10 16:30:43 -08:00
failslab.c switch debugfs to umode_t 2012-01-03 22:54:56 -05:00
filemap.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-06-01 10:34:35 -07:00
filemap_xip.c fs: introduce inode operation ->update_time 2012-06-01 12:07:25 -04:00
fremap.c mm: delete various needless include <linux/module.h> 2011-10-31 09:20:11 -04:00
frontswap.c mm/frontswap: cleanup doc and comment error 2012-07-23 11:16:20 -04:00
highmem.c Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2011-11-06 19:44:47 -08:00
huge_memory.c mm/memcg: apply add/del_page to lruvec 2012-05-29 16:22:28 -07:00
hugetlb.c mm: fix vma_resv_map() NULL pointer 2012-05-30 08:48:13 -07:00
hwpoison-inject.c HWPOISON: Clean up memory_failure() vs. __memory_failure() 2012-01-03 12:06:32 -08:00
init-mm.c
internal.h Revert "mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks" 2012-06-03 20:05:57 -07:00
Kconfig Frontswap provides a "transcendent memory" interface for swap pages. 2012-06-04 12:28:45 -07:00
Kconfig.debug mm: more intensive memory corruption debugging 2012-01-10 16:30:42 -08:00
kmemcheck.c
kmemleak-test.c
kmemleak.c kmemleak: Disable early logging when kmemleak is off by default 2012-01-20 16:57:05 +00:00
ksm.c ksm: cleanup: introduce find_mergeable_vma() 2012-03-21 17:54:59 -07:00
maccess.c mm: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
madvise.c mm: Hold a file reference in madvise_remove 2012-07-06 10:34:38 -07:00
Makefile Merge branch 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux 2012-07-30 11:32:24 -07:00
memblock.c memblock: free allocated memblock_reserved_regions later 2012-07-11 16:04:50 -07:00
memcontrol.c memcg: remove MEM_CGROUP_CHARGE_TYPE_FORCE 2012-07-31 18:42:39 -07:00
memory-failure.c mm: fix wrong argument of migrate_huge_pages() in soft_offline_huge_page() 2012-07-30 17:25:11 -07:00
memory.c Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-07-26 13:17:17 -07:00
memory_hotplug.c mm/memory_hotplug.c: release memory resources if hotadd_new_pgdat() fails 2012-07-11 16:04:46 -07:00
mempolicy.c Merge branch 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux 2012-07-30 11:32:24 -07:00
mempool.c mempool: fix first round failure behavior 2012-01-10 16:30:45 -08:00
migrate.c mm: fix warning in __set_page_dirty_nobuffers 2012-06-03 20:05:47 -07:00
mincore.c mm: thp: fix pmd_bad() triggering in code paths holding mmap_sem read mode 2012-03-21 17:54:54 -07:00
mlock.c vm: avoid using find_vma_prev() unnecessarily 2012-03-06 18:23:36 -08:00
mm_init.c mm: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
mmap.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-06-01 10:34:35 -07:00
mmu_context.c mm, counters: remove task argument to sync_mm_rss() and __sync_task_rss_stat() 2012-03-21 17:54:59 -07:00
mmu_notifier.c mm: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
mmzone.c mm: add link from struct lruvec to struct zone 2012-05-29 16:22:26 -07:00
mprotect.c Merge branch 'akpm' (Andrew's patch-bomb) 2012-03-22 09:04:48 -07:00
mremap.c move security_mmap_addr() to saner place 2012-06-01 10:37:16 -04:00
msync.c
nobootmem.c memblock: free allocated memblock_reserved_regions later 2012-07-11 16:04:50 -07:00
nommu.c nommu: fix compilation of nommu.c 2012-06-04 17:17:31 -04:00
oom_kill.c mm: fix kernel-doc warnings 2012-06-20 14:39:36 -07:00
page-writeback.c writeback: Fix some comment errors 2012-06-09 19:54:47 +08:00
page_alloc.c mm: cma: don't replace lowmem pages with highmem 2012-07-06 11:07:04 +02:00
page_cgroup.c mm: fix kernel-doc warnings 2012-06-20 14:39:36 -07:00
page_io.c frontswap: s/put_page/store/g s/get_page/load 2012-05-15 11:34:08 -04:00
page_isolation.c mm: page_isolation: MIGRATE_CMA isolation functions added 2012-05-21 15:09:33 +02:00
pagewalk.c mm: fix kernel-doc warnings 2012-06-20 14:39:36 -07:00
percpu-km.c
percpu-vm.c mm: fix kernel-doc warnings 2012-06-20 14:39:36 -07:00
percpu.c kmemleak: Fix the kmemleak tracking of the percpu areas with !SMP 2012-05-09 10:13:29 -07:00
pgtable-generic.c arch/tile: allow building Linux with transparent huge pages enabled 2012-05-25 12:48:21 -04:00
prio_tree.c
process_vm_access.c aio/vfs: cleanup of rw_copy_check_uvector() and compat_rw_copy_check_uvector() 2012-05-31 17:49:32 -07:00
quicklist.c mm: delete various needless include <linux/module.h> 2011-10-31 09:20:11 -04:00
readahead.c mm: move readahead syscall to mm/readahead.c 2012-05-29 16:22:23 -07:00
rmap.c mm: remove swap token code 2012-05-29 16:22:19 -07:00
shmem.c don't pass nameidata to ->create() 2012-07-14 16:34:47 +04:00
slab.c mm, sl[aou]b: Move kmem_cache_create mutex handling to common code 2012-07-09 12:13:42 +03:00
slab.h mm, sl[aou]b: Use a common mutex definition 2012-07-09 12:13:41 +03:00
slab_common.c mm: Fix build warning in kmem_cache_create() 2012-07-30 13:15:40 +03:00
slob.c slob: Fix early boot kernel crash 2012-07-12 10:13:22 +03:00
slub.c mm, slub: ensure irqs are enabled for kmemcheck 2012-07-10 22:43:52 +03:00
sparse-vmemmap.c mm: delete various needless include <linux/module.h> 2011-10-31 09:20:11 -04:00
sparse.c mm: sparse: fix usemap allocation above node descriptor section 2012-07-11 16:04:49 -07:00
swap.c mm/memcg: apply add/del_page to lruvec 2012-05-29 16:22:28 -07:00
swap_state.c swap: allow swap readahead to be merged 2012-07-31 18:42:39 -07:00
swapfile.c swap: fix shmem swapping when more than 8 areas 2012-06-15 21:48:14 -07:00
truncate.c mm/fs: remove truncate_range 2012-05-29 16:22:23 -07:00
util.c new helper: vm_mmap_pgoff() 2012-06-01 10:37:18 -04:00
vmalloc.c mm: make vb_alloc() more foolproof 2012-07-31 18:42:39 -07:00
vmscan.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2012-07-24 13:34:56 -07:00
vmstat.c mm/vmstat.c: remove debug fs entries on failure of file creation and made extfrag_debug_root dentry local 2012-05-29 16:22:19 -07:00