linux-stable/mm
Tetsuo Handa 4515bbc4e2 mm: don't warn about allocations which stall for too long
[ Upstream commit 400e22499d ]

Commit 63f53dea0c ("mm: warn about allocations which stall for too
long") was a great step for reducing possibility of silent hang up
problem caused by memory allocation stalls.  But this commit reverts it,
for it is possible to trigger OOM lockup and/or soft lockups when many
threads concurrently called warn_alloc() (in order to warn about memory
allocation stalls) due to current implementation of printk(), and it is
difficult to obtain useful information due to limitation of synchronous
warning approach.

Current printk() implementation flushes all pending logs using the
context of a thread which called console_unlock().  printk() should be
able to flush all pending logs eventually unless somebody continues
appending to printk() buffer.

Since warn_alloc() started appending to printk() buffer while waiting
for oom_kill_process() to make forward progress when oom_kill_process()
is processing pending logs, it became possible for warn_alloc() to force
oom_kill_process() loop inside printk().  As a result, warn_alloc()
significantly increased possibility of preventing oom_kill_process()
from making forward progress.

---------- Pseudo code start ----------
Before warn_alloc() was introduced:

  retry:
    if (mutex_trylock(&oom_lock)) {
      while (atomic_read(&printk_pending_logs) > 0) {
        atomic_dec(&printk_pending_logs);
        print_one_log();
      }
      // Send SIGKILL here.
      mutex_unlock(&oom_lock)
    }
    goto retry;

After warn_alloc() was introduced:

  retry:
    if (mutex_trylock(&oom_lock)) {
      while (atomic_read(&printk_pending_logs) > 0) {
        atomic_dec(&printk_pending_logs);
        print_one_log();
      }
      // Send SIGKILL here.
      mutex_unlock(&oom_lock)
    } else if (waited_for_10seconds()) {
      atomic_inc(&printk_pending_logs);
    }
    goto retry;
---------- Pseudo code end ----------

Although waited_for_10seconds() becomes true once per 10 seconds,
unbounded number of threads can call waited_for_10seconds() at the same
time.  Also, since threads doing waited_for_10seconds() keep doing
almost busy loop, the thread doing print_one_log() can use little CPU
resource.  Therefore, this situation can be simplified like

---------- Pseudo code start ----------
  retry:
    if (mutex_trylock(&oom_lock)) {
      while (atomic_read(&printk_pending_logs) > 0) {
        atomic_dec(&printk_pending_logs);
        print_one_log();
      }
      // Send SIGKILL here.
      mutex_unlock(&oom_lock)
    } else {
      atomic_inc(&printk_pending_logs);
    }
    goto retry;
---------- Pseudo code end ----------

when printk() is called faster than print_one_log() can process a log.

One of possible mitigation would be to introduce a new lock in order to
make sure that no other series of printk() (either oom_kill_process() or
warn_alloc()) can append to printk() buffer when one series of printk()
(either oom_kill_process() or warn_alloc()) is already in progress.

Such serialization will also help obtaining kernel messages in readable
form.

---------- Pseudo code start ----------
  retry:
    if (mutex_trylock(&oom_lock)) {
      mutex_lock(&oom_printk_lock);
      while (atomic_read(&printk_pending_logs) > 0) {
        atomic_dec(&printk_pending_logs);
        print_one_log();
      }
      // Send SIGKILL here.
      mutex_unlock(&oom_printk_lock);
      mutex_unlock(&oom_lock)
    } else {
      if (mutex_trylock(&oom_printk_lock)) {
        atomic_inc(&printk_pending_logs);
        mutex_unlock(&oom_printk_lock);
      }
    }
    goto retry;
---------- Pseudo code end ----------

But this commit does not go that direction, for we don't want to
introduce a new lock dependency, and we unlikely be able to obtain
useful information even if we serialized oom_kill_process() and
warn_alloc().

Synchronous approach is prone to unexpected results (e.g.  too late [1],
too frequent [2], overlooked [3]).  As far as I know, warn_alloc() never
helped with providing information other than "something is going wrong".
I want to consider asynchronous approach which can obtain information
during stalls with possibly relevant threads (e.g.  the owner of
oom_lock and kswapd-like threads) and serve as a trigger for actions
(e.g.  turn on/off tracepoints, ask libvirt daemon to take a memory dump
of stalling KVM guest for diagnostic purpose).

This commit temporarily loses ability to report e.g.  OOM lockup due to
unable to invoke the OOM killer due to !__GFP_FS allocation request.
But asynchronous approach will be able to detect such situation and emit
warning.  Thus, let's remove warn_alloc().

[1] https://bugzilla.kernel.org/show_bug.cgi?id=192981
[2] http://lkml.kernel.org/r/CAM_iQpWuPVGc2ky8M-9yukECtS+zKjiDasNymX7rMcBjBFyM_A@mail.gmail.com
[3] commit db73ee0d46 ("mm, vmscan: do not loop on too_many_isolated for ever"))

Link: http://lkml.kernel.org/r/1509017339-4802-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
Reported-by: yuwang.yuwang <yuwang.yuwang@alibaba-inc.com>
Reported-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-12-13 09:18:50 +01:00
..
kasan kasan: fix shadow_size calculation error in kasan_module_alloc 2018-08-24 13:09:12 +02:00
backing-dev.c bdi: Fix another oops in wb_workfn() 2018-07-22 14:28:49 +02:00
balloon_compaction.c virtio_balloon: fix deadlock on OOM 2018-10-13 09:27:30 +02:00
bootmem.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
cleancache.c fs: switch ->s_uuid to uuid_t 2017-06-05 16:59:12 +02:00
cma.c mm/cma.c: take __GFP_NOWARN into account in cma_alloc() 2017-10-13 16:18:32 -07:00
cma.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
cma_debug.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
compaction.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
debug.c mm: get rid of vmacache_flush_all() entirely 2018-09-19 22:43:48 +02:00
debug_page_ref.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
dmapool.c
early_ioremap.c mm/early_ioremap: Fix boot hang with earlyprintk=efi,keep 2018-02-25 11:08:03 +01:00
fadvise.c mm/fadvise.c: fix signed overflow UBSAN complaint 2018-09-15 09:45:28 +02:00
failslab.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
filemap.c mm/filemap.c: fix NULL pointer in page_cache_tree_insert() 2018-04-24 09:36:39 +02:00
frame_vector.c mm/frame_vector.c: release a semaphore in 'get_vaddr_frames()' 2018-03-03 10:24:21 +01:00
frontswap.c
gup.c mm: do not bug_on on incorrect length in __mm_populate() 2018-07-17 11:39:29 +02:00
highmem.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
hmm.c mm/hmm: hmm_pfns_bad() was accessing wrong struct 2018-04-24 09:36:22 +02:00
huge_memory.c mm/huge_memory: fix lockdep complaint on 32-bit i_size_read() 2018-12-05 19:41:08 +01:00
hugetlb.c userfaultfd: use ENOENT instead of EFAULT if the atomic copy user fails 2018-12-08 13:03:37 +01:00
hugetlb_cgroup.c
hwpoison-inject.c
init-mm.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
internal.h mm, oom: do not rely on TIF_MEMDIE for memory reserves access 2017-09-06 17:27:30 -07:00
interval_tree.c lib/interval_tree: fast overlap detection 2017-09-08 18:26:49 -07:00
Kconfig mm: don't allow deferred pages with NEED_PER_CPU_KM 2018-05-22 18:53:58 +02:00
Kconfig.debug kmemcheck: rip it out 2018-02-22 15:42:24 +01:00
khugepaged.c mm/khugepaged: collapse_shmem() do not crash on Compound 2018-12-05 19:41:09 +01:00
kmemleak-test.c
kmemleak.c mm/kmemleak.c: wait for scan completion before disabling free 2018-05-30 07:52:21 +02:00
ksm.c mm/ksm.c: ignore STABLE_FLAG of rmap_item->address in rmap_walk_ksm() 2018-07-03 11:25:03 +02:00
list_lru.c mm: memcontrol: use vmalloc fallback for large kmem memcg arrays 2017-10-03 17:54:25 -07:00
maccess.c
madvise.c mm: madvise(MADV_DODUMP): allow hugetlbfs pages 2018-10-10 08:54:22 +02:00
Makefile kmemcheck: rip it out 2018-02-22 15:42:24 +01:00
memblock.c Revert "mm: page_alloc: skip over regions of invalid pfns where possible" 2018-03-28 18:24:39 +02:00
memcontrol.c memcg: remove memcg_cgroup::id from IDR on mem_cgroup_css_alloc() failure 2018-09-05 09:26:32 +02:00
memory-failure.c mm: hwpoison: disable memory error handling on 1GB hugepage 2018-07-11 16:29:20 +02:00
memory.c mm/memory.c: recheck page table entry with page table lock held 2018-12-01 09:42:51 +01:00
memory_hotplug.c mm/memory_hotplug: define find_{smallest|biggest}_section_pfn as unsigned long 2017-10-03 17:54:26 -07:00
mempolicy.c mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings 2018-11-21 09:24:09 +01:00
mempool.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
memtest.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
migrate.c mm, thp: fix mlocking THP page with migration enabled 2018-10-13 09:27:22 +02:00
mincore.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mlock.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mm_init.c
mmap.c mm: do not bug_on on incorrect length in __mm_populate() 2018-07-17 11:39:29 +02:00
mmu_context.c
mmu_notifier.c mm/mmu_notifier: kill invalidate_page 2017-08-31 16:13:00 -07:00
mmzone.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mprotect.c x86/speculation/l1tf: Disallow non privileged high MMIO PROT_NONE mappings 2018-08-15 18:12:51 +02:00
mremap.c mremap: properly flush TLB before releasing the page 2018-10-20 09:48:53 +02:00
msync.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
nobootmem.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
nommu.c Merge branch 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-09-14 18:13:32 -07:00
oom_kill.c mm, oom: fix concurrent munlock and oom reaper unmap, v3 2018-05-16 10:10:27 +02:00
page-writeback.c writeback: safer lock nesting 2018-04-24 09:36:39 +02:00
page_alloc.c mm: don't warn about allocations which stall for too long 2018-12-13 09:18:50 +01:00
page_counter.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
page_ext.c mm/page_ext.c: check if page_ext is not prepared 2017-11-24 08:37:05 +01:00
page_idle.c mm: thp: fix potential clearing to referenced flag in page_idle_clear_pte_refs_one() 2018-05-30 07:52:24 +02:00
page_io.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
page_isolation.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
page_owner.c mm/page_owner: fix recursion bug after changing skip entries 2018-05-30 07:52:21 +02:00
page_poison.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
page_vma_mapped.c mm/rmap: map_pte() was not handling private ZONE_DEVICE page properly 2018-11-13 11:15:08 -08:00
pagewalk.c mm/pagewalk.c: report holes in hugetlb ranges 2017-11-24 08:37:04 +01:00
percpu-internal.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
percpu-km.c percpu: add __GFP_NORETRY semantics to the percpu balancing path 2018-04-08 14:26:29 +02:00
percpu-stats.c percpu: fix starting offset for chunk statistics traversal 2017-09-27 14:45:57 -07:00
percpu-vm.c percpu: add __GFP_NORETRY semantics to the percpu balancing path 2018-04-08 14:26:29 +02:00
percpu.c percpu: stop leaking bitmap metadata blocks 2018-10-18 09:16:23 +02:00
pgtable-generic.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
process_vm_access.c
quicklist.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
readahead.c readahead: stricter check for bdi io_pages 2018-09-09 19:55:53 +02:00
rmap.c mm: migration: fix migration of huge PMD shared pages 2018-10-13 09:27:22 +02:00
rodata_test.c mm: fix RODATA_TEST failure "rodata_test: test data was not read only" 2017-10-03 17:54:24 -07:00
shmem.c userfaultfd: shmem: UFFDIO_COPY: set the page dirty if VM_WRITE is not set 2018-12-08 13:03:37 +01:00
slab.c mm: don't warn about large allocations for slab 2018-12-01 09:42:51 +01:00
slab.h kmemcheck: stop using GFP_NOTRACK and SLAB_NOTRACK 2018-02-22 15:42:23 +01:00
slab_common.c mm: don't warn about large allocations for slab 2018-12-01 09:42:51 +01:00
slob.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
slub.c slub: make ->cpu_partial unsigned int 2018-10-03 17:00:55 -07:00
sparse-vmemmap.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
sparse.c mm: sections are not offlined during memory hotremove 2018-05-16 10:10:27 +02:00
swap.c mm: avoid marking swap cached page as lazyfree 2017-10-03 17:54:24 -07:00
swap_cgroup.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
swap_slots.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
swap_state.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
swapfile.c mm/swapfile.c: use kvzalloc for swap_info_struct allocation 2018-11-21 09:24:15 +01:00
truncate.c mm: cleancache: fix corruption on missed inode invalidation 2018-12-08 13:03:40 +01:00
usercopy.c
userfaultfd.c userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas 2018-12-08 13:03:38 +01:00
util.c mm: treat indirectly reclaimable memory as free in overcommit logic 2018-10-18 09:16:25 +02:00
vmacache.c mm: get rid of vmacache_flush_all() entirely 2018-09-19 22:43:48 +02:00
vmalloc.c mm: vmalloc: avoid racy handling of debugobjects in vunmap 2018-08-03 07:50:23 +02:00
vmpressure.c mm, vmpressure: pass-through notification support 2017-07-10 16:32:31 -07:00
vmscan.c mm: fix the NULL mapping case in __isolate_lru_page() 2018-06-05 11:41:54 +02:00
vmstat.c mm: hide incomplete nr_indirectly_reclaimable in /proc/zoneinfo 2018-12-08 13:03:40 +01:00
workingset.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
z3fold.c z3fold: fix possible reclaim races 2018-12-01 09:42:54 +01:00
zbud.c
zpool.c
zsmalloc.c zsmalloc: calling zs_map_object() from irq is a bug 2017-12-14 09:53:10 +01:00
zswap.c zswap: re-check zswap_is_full() after do zswap_shrink() 2018-09-05 09:26:30 +02:00