linux-stable/mm
Vlastimil Babka 8b44d2791f mm, compaction: periodically drop lock and restore IRQs in scanners
Compaction scanners regularly check for lock contention and need_resched()
through the compact_checklock_irqsave() function.  However, if there is no
contention, the lock can be held with IRQs disabled for a potentially long
time.

This has been addressed by commit b2eef8c0d0 ("mm: compaction: minimise
the time IRQs are disabled while isolating pages for migration") for the
migration scanner.  However, the refactoring done by commit 2a1402aa04
("mm: compaction: acquire the zone->lru_lock as late as possible") has
changed the conditions so that the lock is dropped only when there's
contention on the lock or need_resched() is true.  Also, need_resched() is
checked only when the lock is already held.  The comment "give a chance to
irqs before checking need_resched" is therefore misleading, as IRQs remain
disabled when the check is done.

This patch restores the behavior intended by commit b2eef8c0d0.  It also
tries to better balance, and make more deterministic, the time spent
checking for contention versus the time the scanners may run between
checks, and it avoids situations where checking was previously not done
often enough.  The result should be neither too frequent nor too
infrequent contention checking.  In particular, it should eliminate
potentially long-running scans with IRQs disabled and no checking of
need_resched() or of a pending fatal signal.  Such scans can happen when
many consecutive pages or pageblocks fail the preliminary tests and never
reach the later call site of compact_checklock_irqsave(), as explained
below.

Before the patch:

In the migration scanner, compact_checklock_irqsave() was called on each
loop iteration, if reached.  If it was not reached, some lower-frequency
checking could still be done if the lock was already held, but this would
not abort contended async compaction until compact_checklock_irqsave() or
the end of the pageblock was reached.  The free scanner was similar but
had no periodic checking at all, so the lock could potentially be held
until the end of the pageblock.
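To make the "before" problem concrete, the following userspace model (not kernel code) measures the longest run of pfns a scanner can walk without any need_resched() or fatal-signal check, assuming, as described above for the pre-patch free scanner, that the only check happens when a pfn reaches the compact_checklock_irqsave() call site. The function name and the boolean-array input are hypothetical illustrations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model, not kernel code.
 * reaches_call_site[i] is true when pfn i passes the preliminary tests and
 * reaches the pre-patch compact_checklock_irqsave() call site, which is the
 * only place where contention, need_resched() and fatal signals are checked
 * in this model.  Returns the longest run of consecutive pfns with no check.
 */
size_t longest_unchecked_run(const bool *reaches_call_site, size_t nr_pfns)
{
	size_t run = 0, longest = 0;

	for (size_t i = 0; i < nr_pfns; i++) {
		if (reaches_call_site[i]) {
			run = 0;	/* a check happens at this pfn */
			continue;
		}
		run++;			/* pfn scanned without any check */
		if (run > longest)
			longest = run;
	}
	return longest;
}
```

For a 512-pfn range (typically one pageblock on x86-64 with 4 KiB pages) where no pfn reaches the call site, the result is the whole range: nothing bounds how long the scanner runs between checks.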

After the patch, in both scanners:

The periodic check is done as the first thing in the loop on each
SWAP_CLUSTER_MAX-aligned pfn, using the new compact_unlock_should_abort()
function, which always unlocks the lock (if locked) and aborts async
compaction if rescheduling is needed.  It also aborts any type of
compaction when a fatal signal is pending.
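A minimal userspace sketch of this periodic check follows, under stated assumptions: SWAP_CLUSTER_MAX is taken as 32, the lock, need_resched(), fatal_signal_pending() and cc->contended are replaced by plain fields in a hypothetical struct scan_state, and the migrate mode is reduced to async/sync. The names unlock_should_abort_model() and scan_pfns() are illustrative only, and cond_resched() is modelled simply as clearing the need_resched flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: SWAP_CLUSTER_MAX is 32, as in mainline kernels of this era. */
#define SWAP_CLUSTER_MAX 32UL

/* Reduced model of the kernel's migrate modes. */
enum migrate_mode { MIGRATE_ASYNC, MIGRATE_SYNC };

/* Hypothetical stand-ins for the lock, scheduler and signal state. */
struct scan_state {
	bool locked;		/* lock held with IRQs disabled */
	bool need_resched;	/* models need_resched() */
	bool fatal_signal;	/* models fatal_signal_pending(current) */
	bool contended;		/* models cc->contended */
	enum migrate_mode mode;
	unsigned long unlocks;	/* how often the lock was dropped */
};

/* Always drop the lock, then decide whether the scan must abort. */
bool unlock_should_abort_model(struct scan_state *s)
{
	if (s->locked) {
		s->locked = false;	/* spin_unlock_irqrestore() in the kernel */
		s->unlocks++;
	}
	if (s->fatal_signal) {		/* aborts any type of compaction */
		s->contended = true;
		return true;
	}
	if (s->need_resched) {
		if (s->mode == MIGRATE_ASYNC) {
			s->contended = true;
			return true;
		}
		s->need_resched = false;	/* models cond_resched() for sync */
	}
	return false;
}

/*
 * Scan [start_pfn, end_pfn), raising need_resched at resched_pfn to
 * simulate the scheduler.  Every pfn takes the lock, the worst case for
 * IRQ latency.  Returns the pfn at which the scan stopped.
 */
unsigned long scan_pfns(struct scan_state *s, unsigned long start_pfn,
			unsigned long end_pfn, unsigned long resched_pfn)
{
	unsigned long pfn;

	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
		if (pfn == resched_pfn)
			s->need_resched = true;
		/* Periodic check as the first thing on each aligned pfn. */
		if (!(pfn % SWAP_CLUSTER_MAX) && unlock_should_abort_model(s))
			break;
		s->locked = true;	/* this pfn needed the lock */
	}
	if (s->locked) {		/* release at the end of the scan */
		s->locked = false;
		s->unlocks++;
	}
	return pfn;
}
```

With need_resched raised at pfn 40, an async scan stops at pfn 64, the next aligned pfn, while a sync scan keeps going but still drops the lock every 32 pfns. In both modes, the time spent with the lock held is bounded by SWAP_CLUSTER_MAX pfns rather than by a whole pageblock.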

The compact_checklock_irqsave() function is replaced with a slightly
different compact_trylock_irqsave().  The biggest difference is that the
function is not called at all if the lock is already held.  The periodic
need_resched() checking is left solely to compact_unlock_should_abort().
For async compaction, lock contention is avoided by the periodic unlock in
compact_unlock_should_abort() and by using trylock in
compact_trylock_irqsave(), aborting when the trylock fails.  Sync
compaction does not use trylock.
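The locking policy just described can be sketched in userspace as follows. This is an illustration under loud assumptions: a pthread mutex stands in for the spinlock, IRQ state is not modelled, and the names trylock_or_abort() and ensure_locked() are hypothetical rather than the kernel's. Async mode uses a trylock and reports contention on failure; sync mode blocks. The wrapper shows the caller skipping the call entirely when the lock is already held.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Reduced model of the kernel's migrate modes. */
enum migrate_mode { MIGRATE_ASYNC, MIGRATE_SYNC };

/*
 * Hypothetical model of the trylock policy.  Returns true if the lock is
 * now held.  On async trylock failure it sets *contended and returns false
 * so that the caller aborts the scan.
 */
bool trylock_or_abort(pthread_mutex_t *lock, enum migrate_mode mode,
		      bool *contended)
{
	if (mode == MIGRATE_ASYNC) {
		if (pthread_mutex_trylock(lock) != 0) {
			*contended = true;	/* lock busy: give up */
			return false;
		}
		return true;
	}
	pthread_mutex_lock(lock);		/* sync compaction waits */
	return true;
}

/* Caller-side pattern: only try to lock when not already holding it. */
bool ensure_locked(pthread_mutex_t *lock, bool *locked,
		   enum migrate_mode mode, bool *contended)
{
	if (*locked)
		return true;	/* already held: no call, no extra check */
	*locked = trylock_or_abort(lock, mode, contended);
	return *locked;
}
```

Keeping need_resched() out of this path is the design point: the trylock helper deals only with lock contention, while the periodic unlock-and-check bounds the time between rescheduling and fatal-signal checks.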

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rik van Riel <riel@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09 22:25:54 -04:00
backing-dev.c arch: Mass conversion of smp_mb__*() 2014-04-18 14:20:48 +02:00
balloon_compaction.c
bootmem.c
cleancache.c
cma.c mm: cma: adjust address limit to avoid hitting low/high memory boundary 2014-10-09 22:25:53 -04:00
compaction.c mm, compaction: periodically drop lock and restore IRQs in scanners 2014-10-09 22:25:54 -04:00
debug-pagealloc.c
dmapool.c Fix unbalanced mutex in dma_pool_create(). 2014-09-18 10:39:16 -07:00
early_ioremap.c
fadvise.c
failslab.c
filemap.c NFS client updates for Linux 3.18 2014-10-08 12:49:23 -04:00
filemap_xip.c
fremap.c mm: mark remap_file_pages() syscall as deprecated 2014-06-06 16:08:17 -07:00
frontswap.c swap: change swap_list_head to plist, add swap_avail_head 2014-06-04 16:54:07 -07:00
gup.c kvm: Faults which trigger IO release the mmap_sem 2014-09-24 14:07:54 +02:00
highmem.c mm/highmem: make kmap cache coloring aware 2014-08-06 18:01:22 -07:00
huge_memory.c mm, THP: don't hold mmap_sem in khugepaged when allocating THP 2014-10-09 22:25:53 -04:00
hugetlb.c mm: fix potential infinite loop in dissolve_free_huge_pages() 2014-08-06 18:01:21 -07:00
hugetlb_cgroup.c hugetlb_cgroup: use lockdep_assert_held rather than spin_is_locked 2014-08-29 16:28:16 -07:00
hwpoison-inject.c mm/hwpoison-inject.c: remove unnecessary null test before debugfs_remove_recursive 2014-08-06 18:01:19 -07:00
init-mm.c
internal.h mm, compaction: khugepaged should not give up due to need_resched() 2014-10-09 22:25:54 -04:00
interval_tree.c
iov_iter.c fuse: honour max_read and max_write in direct_io mode 2014-09-26 21:16:51 -04:00
Kconfig mm/zpool: update zswap to use zpool 2014-08-06 18:01:23 -07:00
Kconfig.debug
kmemcheck.c mm/slab_common: move kmem_cache definition to internal header 2014-10-09 22:25:50 -04:00
kmemleak-test.c mm/kmemleak-test.c: use pr_fmt for logging 2014-06-06 16:08:18 -07:00
kmemleak.c mm: introduce kmemleak_update_trace() 2014-06-06 16:08:17 -07:00
ksm.c sched: Remove proliferation of wait_on_bit() action functions 2014-07-16 15:10:39 +02:00
list_lru.c
maccess.c
madvise.c mm: update the description for madvise_remove 2014-08-06 18:01:18 -07:00
Makefile mm: Support compiling out madvise and fadvise 2014-08-17 19:44:24 -05:00
memblock.c mem-hotplug: let memblock skip the hotpluggable memory regions in __next_mem_range() 2014-09-10 15:42:12 -07:00
memcontrol.c mm: memcontrol: do not iterate uninitialized memcgs 2014-10-02 16:28:44 -07:00
memory-failure.c hwpoison: fix race with changing page during offlining 2014-08-06 18:01:19 -07:00
memory.c mm: softdirty: keep bit when zapping file pte 2014-09-26 08:10:35 -07:00
memory_hotplug.c memory-hotplug: add sysfs valid_zones attribute 2014-10-09 22:25:52 -04:00
mempolicy.c Merge branch 'for-3.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2014-07-10 11:38:23 -07:00
mempool.c mm/mempool.c: update the kmemleak stack trace for mempool allocations 2014-06-06 16:08:17 -07:00
migrate.c mm: migrate: Close race between migration completion and mprotect 2014-10-02 11:57:18 -07:00
mincore.c
mlock.c mm: describe mmap_sem rules for __lock_page_or_retry() and callers 2014-08-06 18:01:20 -07:00
mm_init.c
mmap.c mm/mmap.c: whitespace fixes 2014-10-09 22:25:52 -04:00
mmu_context.c
mmu_notifier.c kvm: Fix page ageing bugs 2014-09-24 14:07:58 +02:00
mmzone.c
mprotect.c
mremap.c mm, thp: close race between mremap() and split_huge_page() 2014-05-11 17:55:48 +09:00
msync.c msync: fix incorrect fstart calculation 2014-07-03 09:21:53 -07:00
nobootmem.c mem-hotplug: let memblock skip the hotpluggable memory regions in __next_mem_range() 2014-09-10 15:42:12 -07:00
nommu.c arm64,ia64,ppc,s390,sh,tile,um,x86,mm: remove default gate area 2014-08-08 15:57:27 -07:00
oom_kill.c mm, oom: remove unnecessary exit_state check 2014-08-06 18:01:21 -07:00
page-writeback.c mm, writeback: prevent race when calculating dirty limits 2014-08-06 18:01:21 -07:00
page_alloc.c mm, compaction: khugepaged should not give up due to need_resched() 2014-10-09 22:25:54 -04:00
page_cgroup.c
page_io.c fix __swap_writepage() compile failure on old gcc versions 2014-06-14 19:30:48 -05:00
page_isolation.c
pagewalk.c
percpu-km.c
percpu-vm.c percpu: perform tlb flush after pcpu_map_pages() failure 2014-08-15 16:06:10 -04:00
percpu.c percpu: free percpu allocation info for uniprocessor system 2014-08-16 08:59:02 -04:00
pgtable-generic.c mm: actually clear pmd_numa before invalidating 2014-08-29 16:28:15 -07:00
process_vm_access.c start adding the tag to iov_iter 2014-05-06 17:32:49 -04:00
quicklist.c
readahead.c mm/readahead.c: remove unused file_ra_state from count_history_pages 2014-08-06 18:01:15 -07:00
rmap.c kvm: Fix page ageing bugs 2014-09-24 14:07:58 +02:00
shmem.c shmem: fix nlink for rename overwrite directory 2014-09-26 21:16:42 -04:00
slab.c mm/slab: use percpu allocator for cpu cache 2014-10-09 22:25:51 -04:00
slab.h mm/slab: use percpu allocator for cpu cache 2014-10-09 22:25:51 -04:00
slab_common.c mm/slab_common: commonize slab merge logic 2014-10-09 22:25:51 -04:00
slob.c mm/sl[ao]b: always track caller in kmalloc_(node_)track_caller() 2014-10-09 22:25:50 -04:00
slub.c mm/slab_common: commonize slab merge logic 2014-10-09 22:25:51 -04:00
sparse-vmemmap.c
sparse.c
swap.c mm: memcontrol: use page lists for uncharge batching 2014-08-08 15:57:18 -07:00
swap_state.c mm: allow drivers to prevent new writable mappings 2014-08-08 15:57:31 -07:00
swapfile.c mm: memcontrol: rewrite uncharge API 2014-08-08 15:57:17 -07:00
truncate.c mm: memcontrol: rewrite uncharge API 2014-08-08 15:57:17 -07:00
util.c proc/maps: make vm_is_stack() logic namespace-friendly 2014-10-09 22:25:50 -04:00
vmacache.c mm,vmacache: optimize overflow system-wide flushing 2014-06-04 16:53:57 -07:00
vmalloc.c mm/vmalloc.c: clean up map_vm_area third argument 2014-08-06 18:01:19 -07:00
vmpressure.c
vmscan.c mm, compaction: defer each zone individually instead of preferred zone 2014-10-09 22:25:53 -04:00
vmstat.c mm: vmscan: only update per-cpu thresholds for online CPU 2014-08-06 18:01:20 -07:00
workingset.c
zbud.c mm/zpool: use prefixed module loading 2014-08-29 16:28:16 -07:00
zpool.c mm/zpool: use prefixed module loading 2014-08-29 16:28:16 -07:00
zsmalloc.c mm/zpool: use prefixed module loading 2014-08-29 16:28:16 -07:00
zswap.c mm/zswap.c: add __init to zswap_entry_cache_destroy() 2014-08-08 15:57:18 -07:00