linux-stable/mm
Mel Gorman 73444bc4d8 mm, page_alloc: do not wake kswapd with zone lock held
syzbot reported the following regression in the latest merge window and
it was confirmed by Qian Cai that a similar bug was visible from a
different context.

  ======================================================
  WARNING: possible circular locking dependency detected
  4.20.0+ #297 Not tainted
  ------------------------------------------------------
  syz-executor0/8529 is trying to acquire lock:
  000000005e7fb829 (&pgdat->kswapd_wait){....}, at:
  __wake_up_common_lock+0x19e/0x330 kernel/sched/wait.c:120

  but task is already holding lock:
  000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: spin_lock
  include/linux/spinlock.h:329 [inline]
  000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: rmqueue_bulk
  mm/page_alloc.c:2548 [inline]
  000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: __rmqueue_pcplist
  mm/page_alloc.c:3021 [inline]
  000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: rmqueue_pcplist
  mm/page_alloc.c:3050 [inline]
  000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: rmqueue
  mm/page_alloc.c:3072 [inline]
  000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at:
  get_page_from_freelist+0x1bae/0x52a0 mm/page_alloc.c:3491

It appears to be a false positive in that the only way the lock ordering
should be inverted is if kswapd is waking itself and the wakeup
allocates debugging objects which should already be allocated if it's
kswapd doing the waking.  Nevertheless, the possibility exists and so
it's best to avoid the problem.

This patch flags a zone as needing a kswapd using the, surprisingly,
unused zone flag field.  The flag is read without the lock held to do
the wakeup.  It's possible that the flag setting context is not the same
as the flag clearing context or for small races to occur.  However, each
race possibility is harmless and there is no visible degredation in
fragmentation treatment.

While zone->flag could have continued to be unused, there is potential
for moving some existing fields into the flags field instead.
Particularly read-mostly ones like zone->initialized and
zone->contiguous.

Link: http://lkml.kernel.org/r/20190103225712.GJ31517@techsingularity.net
Fixes: 1c30844d2d ("mm: reclaim small amounts of memory when an external fragmentation event occurs")
Reported-by: syzbot+93d94a001cfbce9e60e1@syzkaller.appspotmail.com
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Qian Cai <cai@lca.pw>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-08 17:15:11 -08:00
..
kasan kasan: fix krealloc handling for tag-based mode 2019-01-08 17:15:11 -08:00
backing-dev.c
balloon_compaction.c
cleancache.c
cma.c kasan, mm, arm64: tag non slab memory allocated via pagealloc 2018-12-28 12:11:44 -08:00
cma.h
cma_debug.c
compaction.c mm: move zone watermark accesses behind an accessor 2018-12-28 12:11:48 -08:00
debug.c mm/debug.c: make "migrate_reason_names[]" const char * 2018-12-28 12:11:48 -08:00
debug_page_ref.c
dmapool.c
early_ioremap.c
fadvise.c
failslab.c
filemap.c mm/: remove caller signal_pending branch predictions 2019-01-04 13:13:48 -08:00
frame_vector.c
frontswap.c
gup.c Merge branch 'akpm' (patches from Andrew) 2019-01-05 09:16:18 -08:00
gup_benchmark.c
highmem.c mm: convert totalram_pages and totalhigh_pages variables to atomic 2018-12-28 12:11:47 -08:00
hmm.c mm/hmm: fix memremap.h, move dev_page_fault_t callback to hmm 2018-12-28 12:11:52 -08:00
huge_memory.c mm: treewide: remove unused address argument from pte_alloc functions 2019-01-04 13:13:47 -08:00
hugetlb.c hugetlbfs: revert "use i_mmap_rwsem for more pmd sharing synchronization" 2019-01-08 17:15:11 -08:00
hugetlb_cgroup.c
hwpoison-inject.c
init-mm.c
internal.h mm: use alloc_flags to record if kswapd can wake 2018-12-28 12:11:48 -08:00
interval_tree.c
Kconfig ksm: replace jhash2 with xxhash 2018-12-28 12:11:46 -08:00
Kconfig.debug
khugepaged.c mm/mmu_notifier: use structure for invalidate_range_start/end calls v2 2018-12-28 12:11:50 -08:00
kmemleak-test.c
kmemleak.c kmemleak: add config to select auto scan 2018-12-28 12:11:51 -08:00
ksm.c ksm: react on changing "sleep_millisecs" parameter faster 2018-12-28 12:11:51 -08:00
list_lru.c
maccess.c
madvise.c mm/mmu_notifier: use structure for invalidate_range_start/end calls v2 2018-12-28 12:11:50 -08:00
Makefile
memblock.c mm/memblock.c: skip kmemleak for kasan_init() 2018-12-28 12:11:49 -08:00
memcontrol.c memcg, oom: notify on oom killer invocation from the charge path 2018-12-28 12:11:52 -08:00
memfd.c
memory-failure.c hugetlbfs: revert "use i_mmap_rwsem for more pmd sharing synchronization" 2019-01-08 17:15:11 -08:00
memory.c mm/memory.c: initialise mmu_notifier_range correctly 2019-01-08 17:15:11 -08:00
memory_hotplug.c mm, memory_hotplug: deobfuscate migration part of offlining 2018-12-28 12:11:50 -08:00
mempolicy.c Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask" 2018-12-08 10:26:20 -08:00
mempool.c
memtest.c
migrate.c hugetlbfs: revert "use i_mmap_rwsem for more pmd sharing synchronization" 2019-01-08 17:15:11 -08:00
mincore.c Change mincore() to count "mapped" pages rather than "cached" pages 2019-01-06 13:43:02 -08:00
mlock.c
mm_init.c mm: convert totalram_pages and totalhigh_pages variables to atomic 2018-12-28 12:11:47 -08:00
mmap.c mm/mmap.c: remove verify_mm_writelocked() 2018-12-28 12:11:47 -08:00
mmu_context.c
mmu_gather.c mm: Replace call_rcu_sched() with call_rcu() 2018-11-27 09:21:46 -08:00
mmu_notifier.c mm/mmu_notifier: use structure for invalidate_range_start/end calls v2 2018-12-28 12:11:50 -08:00
mmzone.c
mprotect.c mm/mmu_notifier: use structure for invalidate_range_start/end calls v2 2018-12-28 12:11:50 -08:00
mremap.c mm: speed up mremap by 20x on large regions 2019-01-04 13:13:48 -08:00
msync.c
nommu.c
oom_kill.c mm/mmu_notifier: use structure for invalidate_range_start/end calls v2 2018-12-28 12:11:50 -08:00
page-writeback.c mm/page-writeback.c: don't break integrity writeback on ->writepage() error 2018-12-28 12:11:49 -08:00
page_alloc.c mm, page_alloc: do not wake kswapd with zone lock held 2019-01-08 17:15:11 -08:00
page_counter.c
page_ext.c
page_idle.c
page_io.c mm/page_io.c: fix polled swap page in 2019-01-04 13:13:48 -08:00
page_isolation.c mm: only report isolation failures when offlining memory 2018-12-28 12:11:46 -08:00
page_owner.c mm/page_owner: clamp read count to PAGE_SIZE 2018-12-28 12:11:46 -08:00
page_poison.c
page_vma_mapped.c
pagewalk.c
percpu-internal.h
percpu-km.c percpu: convert spin_lock_irq to spin_lock_irqsave. 2018-12-18 09:04:08 -08:00
percpu-stats.c
percpu-vm.c
percpu.c
pgtable-generic.c
process_vm_access.c
quicklist.c
readahead.c mm/readahead.c: simplify get_next_ra_size() 2018-12-28 12:11:46 -08:00
rmap.c hugetlbfs: revert "use i_mmap_rwsem for more pmd sharing synchronization" 2019-01-08 17:15:11 -08:00
rodata_test.c
shmem.c mm: convert totalram_pages and totalhigh_pages variables to atomic 2018-12-28 12:11:47 -08:00
slab.c slab: alien caches must not be initialized if the allocation of the alien cache failed 2019-01-08 17:15:11 -08:00
slab.h kasan, mm: change hooks signatures 2018-12-28 12:11:43 -08:00
slab_common.c A fairly normal cycle for documentation stuff. We have a new 2018-12-29 11:21:49 -08:00
slob.c
slub.c kasan: make tag based mode work with CONFIG_HARDENED_USERCOPY 2019-01-08 17:15:11 -08:00
sparse-vmemmap.c
sparse.c mm, sparse: pass nid instead of pgdat to sparse_add_one_section() 2018-12-28 12:11:49 -08:00
swap.c fs: don't open code lru_to_page() 2019-01-04 13:13:48 -08:00
swap_cgroup.c
swap_slots.c
swap_state.c
swapfile.c mm, swap: fix swapoff with KSM pages 2018-12-28 12:11:52 -08:00
truncate.c mm: cleancache: fix corruption on missed inode invalidation 2018-11-30 14:56:14 -08:00
usercopy.c mm/usercopy.c: no check page span for stack objects 2019-01-08 17:15:11 -08:00
userfaultfd.c hugetlbfs: revert "use i_mmap_rwsem for more pmd sharing synchronization" 2019-01-08 17:15:11 -08:00
util.c mm: page_mapped: don't assume compound page is huge or THP 2019-01-08 17:15:11 -08:00
vmacache.c
vmalloc.c mm: convert totalram_pages and totalhigh_pages variables to atomic 2018-12-28 12:11:47 -08:00
vmpressure.c
vmscan.c mm: put_and_wait_on_page_locked() while page is migrated 2018-12-28 12:11:48 -08:00
vmstat.c mm: convert zone->managed_pages to atomic variable 2018-12-28 12:11:47 -08:00
workingset.c mm: convert totalram_pages and totalhigh_pages variables to atomic 2018-12-28 12:11:47 -08:00
z3fold.c
zbud.c
zpool.c
zsmalloc.c
zswap.c mm: convert totalram_pages and totalhigh_pages variables to atomic 2018-12-28 12:11:47 -08:00