linux-stable/mm
Andrea Arcangeli 1c2fb7a4c2 ksm: fix deadlock with munlock in exit_mmap
Rawhide users have reported hang at startup when cryptsetup is run: the
same problem can be simply reproduced by running a program int main() {
mlockall(MCL_CURRENT | MCL_FUTURE); return 0; }

The problem is that exit_mmap() applies munlock_vma_pages_all() to
clean up VM_LOCKED areas, and its current implementation (stupidly)
tries to fault in absent pages, for example where PROT_NONE prevented
them being faulted in when mlocking.  Whereas the "ksm: fix oom
deadlock" patch, knowing there's a race by which KSM might try to fault
in pages after exit_mmap() had finally zapped the range, backs out of
such faults doing nothing when its ksm_test_exit() notices mm_users 0.

So revert that part of "ksm: fix oom deadlock" which moved the
ksm_exit() call from before exit_mmap() to the middle of exit_mmap();
and remove those ksm_test_exit() checks from the page fault paths, so
allowing the munlocking to proceed without interference.

ksm_exit, if there are rmap_items still chained on this mm slot, takes
mmap_sem write side: so preventing KSM from working on an mm while
exit_mmap runs.  And KSM will bail out as soon as it notices that
mm_users is already zero, thanks to its internal ksm_test_exit checks.
So that when a task is killed by OOM killer or the user, KSM will not
indefinitely prevent it from running exit_mmap to release its memory.

This does break a part of what "ksm: fix oom deadlock" was trying to
achieve.  When unmerging KSM (echo 2 >/sys/kernel/mm/ksm), and even
when ksmd itself has to cancel a KSM page, it is possible that the
first OOM-kill victim would be the KSM process being faulted: then its
memory won't be freed until a second victim has been selected (freeing
memory for the unmerging fault to complete).

But the OOM killer is already liable to kill a second victim once the
intended victim's p->mm goes to NULL: so there's not much point in
rejecting this KSM patch before fixing that OOM behaviour.  It is very
much more important to allow KSM users to boot up, than to haggle over
an unlikely and poorly supported OOM case.

We also intend to fix munlocking to not fault pages: at which point
this patch _could_ be reverted; though that would be controversial, so
we hope to find a better solution.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Justin M. Forbes <jforbes@redhat.com>
Acked-for-now-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Izik Eidus <ieidus@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-22 07:17:32 -07:00
..
allocpercpu.c percpu: use dynamic percpu allocator as the default percpu allocator 2009-06-24 15:13:35 +09:00
backing-dev.c writeback: splice dirty inode entries to default bdi on bdi_destroy() 2009-09-16 15:18:52 +02:00
bootmem.c kmemleak: Do not report alloc_bootmem blocks as leaks 2009-08-27 14:29:17 +01:00
bounce.c block: remove some includings of blktrace_api.h 2009-06-16 11:19:36 +02:00
debug-pagealloc.c generic debug pagealloc 2009-04-01 08:59:13 -07:00
dmapool.c dmapools: protect page_list walk in show_pools() 2009-06-30 18:56:00 -07:00
fadvise.c readahead: move max_sane_readahead() calls into force_page_cache_readahead() 2009-06-16 19:47:28 -07:00
failslab.c kmemtrace, mm: fix slab.h dependency problem in mm/failslab.c 2009-04-03 12:23:01 +02:00
filemap.c mm: oom analysis: add shmem vmstat 2009-09-22 07:17:27 -07:00
filemap_xip.c mm: do_xip_mapping_read: fix length calculation 2009-04-02 19:04:49 -07:00
fremap.c Do not account for the address space used by hugetlbfs using VM_ACCOUNT 2009-02-10 10:48:42 -08:00
highmem.c block: remove some includings of blktrace_api.h 2009-06-16 11:19:36 +02:00
hugetlb.c hugetlb: restore interleaving of bootmem huge pages 2009-09-22 07:17:26 -07:00
init-mm.c mm: consolidate init_mm definition 2009-06-16 19:47:28 -07:00
internal.h vmscan: do not unconditionally treat zones that fail zone_reclaim() as full 2009-06-16 19:47:45 -07:00
Kconfig ksm: the mm interface to ksm 2009-09-22 07:17:31 -07:00
Kconfig.debug kmemcheck: enable in the x86 Kconfig 2009-06-15 15:49:15 +02:00
kmemcheck.c kmemcheck: add hooks for the page allocator 2009-06-15 15:48:33 +02:00
kmemleak-test.c percpu: clean up percpu variable definitions 2009-06-24 15:13:48 +09:00
kmemleak.c kmemleak: Improve the "Early log buffer exceeded" error message 2009-09-11 10:42:09 +01:00
ksm.c ksm: fix deadlock with munlock in exit_mmap 2009-09-22 07:17:32 -07:00
maccess.c [S390] maccess: add weak attribute to probe_kernel_write 2009-06-12 10:27:37 +02:00
madvise.c ksm: the mm interface to ksm 2009-09-22 07:17:31 -07:00
Makefile ksm: the mm interface to ksm 2009-09-22 07:17:31 -07:00
memcontrol.c cgroup avoid permanent sleep at rmdir 2009-07-29 19:10:35 -07:00
memory.c ksm: fix deadlock with munlock in exit_mmap 2009-09-22 07:17:32 -07:00
memory_hotplug.c memory hotplug: update zone pcp at memory online 2009-09-22 07:17:25 -07:00
mempolicy.c mm: make set_mempolicy(MPOL_INTERLEAV) N_HIGH_MEMORY aware 2009-08-07 10:39:55 -07:00
mempool.c mempool.c: clean up type-casting 2009-08-10 08:31:16 -07:00
migrate.c mm: vmstat: add isolate pages 2009-09-22 07:17:29 -07:00
mincore.c [CVE-2009-0029] System call wrappers part 14 2009-01-14 14:15:24 +01:00
mlock.c mm: remove CONFIG_UNEVICTABLE_LRU config option 2009-06-16 19:47:42 -07:00
mm_init.c mm: mminit_loglevel cannot be __meminitdata anymore 2008-08-20 15:40:30 -07:00
mmap.c ksm: fix deadlock with munlock in exit_mmap 2009-09-22 07:17:32 -07:00
mmu_notifier.c ksm: add mmu_notifier set_pte_at_notify() 2009-09-22 07:17:31 -07:00
mmzone.c [ARM] Double check memmap is actually valid with a memmap has unexpected holes V2 2009-05-18 11:22:24 +01:00
mprotect.c perf: Do the big rename: Performance Counters -> Performance Events 2009-09-21 14:28:04 +02:00
mremap.c ksm: prevent mremap move poisoning 2009-09-22 07:17:31 -07:00
msync.c [CVE-2009-0029] System call wrappers part 13 2009-01-14 14:15:23 +01:00
nommu.c nommu: fix error handling in do_mmap_pgoff() 2009-09-05 11:30:42 -07:00
oom_kill.c mm: revert "oom: move oom_adj value" 2009-08-18 16:31:13 -07:00
page-writeback.c mm: count only reclaimable lru pages 2009-09-22 07:17:30 -07:00
page_alloc.c mm: perform non-atomic test-clear of PG_mlocked on free 2009-09-22 07:17:30 -07:00
page_cgroup.c memory hotplug: alloc page from other node in memory online 2009-09-22 07:17:26 -07:00
page_io.c mm: remove file argument from swap_readpage() 2009-06-16 19:47:44 -07:00
page_isolation.c memory hotplug: fix page_zone() calculation in test_pages_isolated() 2008-11-06 15:41:19 -08:00
pagewalk.c pagemap: pass mm into pagewalkers 2008-06-12 18:05:41 -07:00
percpu.c Merge branch 'for-next' into for-linus 2009-09-15 09:57:19 +09:00
prio_tree.c
quicklist.c percpu: cleanup percpu array definitions 2009-06-24 15:13:45 +09:00
readahead.c readahead: introduce context readahead algorithm 2009-06-16 19:47:30 -07:00
rmap.c ksm: no debug in page_dup_rmap() 2009-09-22 07:17:31 -07:00
shmem.c Driver Core: devtmpfs - kernel-maintained tmpfs-based /dev 2009-09-15 09:50:49 -07:00
shmem_acl.c shmfs: use 'check_acl' instead of 'permission' 2009-09-08 11:08:46 -07:00
slab.c SLAB: Fix lockdep annotations 2009-06-29 09:57:10 +03:00
slob.c slab: remove duplicate kmem_cache_init_late() declarations 2009-08-06 11:36:25 +03:00
slub.c slub: Fix build error in kmem_cache_open() with !CONFIG_SLUB_DEBUG 2009-09-15 22:32:10 +03:00
sparse-vmemmap.c memory hotplug: alloc page from other node in memory online 2009-09-22 07:17:26 -07:00
sparse.c memory hotplug: alloc page from other node in memory online 2009-09-22 07:17:26 -07:00
swap.c mm: fix Committed_AS underflow on large NR_CPUS environment 2009-05-02 15:36:10 -07:00
swap_state.c writeback: add name to backing_dev_info 2009-09-11 09:20:26 +02:00
swapfile.c block: use blkdev_issue_discard in blk_ioctl_discard 2009-09-14 08:24:53 +02:00
thrash.c mm: pass mm to grab_swap_token 2009-06-23 12:50:05 -07:00
truncate.c mm: remove __invalidate_mapping_pages variant 2009-06-16 19:47:43 -07:00
util.c Merge branches 'slab/documentation', 'slab/fixes', 'slob/cleanups' and 'slub/fixes' into for-linus 2009-06-17 08:30:15 +03:00
vmalloc.c vmalloc.c: fix double error checking 2009-09-22 07:17:30 -07:00
vmscan.c vmscan: kill unnecessary prefetch 2009-09-22 07:17:30 -07:00
vmstat.c mm: vmstat: add isolate pages 2009-09-22 07:17:29 -07:00