linux-stable/mm
Mel Gorman ebf46a9428 mm, numa: fix bad pmd by atomically check for pmd_trans_huge when marking page tables prot_numa
commit 8b272b3cbb upstream.

: A user reported a bug against a distribution kernel while running a
: proprietary workload described as "memory intensive that is not swapping"
: that is expected to apply to mainline kernels.  The workload is
: read/write/modifying ranges of memory and checking the contents.  They
: reported that within a few hours that a bad PMD would be reported followed
: by a memory corruption where expected data was all zeros.  A partial
: report of the bad PMD looked like
:
:   [ 5195.338482] ../mm/pgtable-generic.c:33: bad pmd ffff8888157ba008(000002e0396009e2)
:   [ 5195.341184] ------------[ cut here ]------------
:   [ 5195.356880] kernel BUG at ../mm/pgtable-generic.c:35!
:   ....
:   [ 5195.410033] Call Trace:
:   [ 5195.410471]  [<ffffffff811bc75d>] change_protection_range+0x7dd/0x930
:   [ 5195.410716]  [<ffffffff811d4be8>] change_prot_numa+0x18/0x30
:   [ 5195.410918]  [<ffffffff810adefe>] task_numa_work+0x1fe/0x310
:   [ 5195.411200]  [<ffffffff81098322>] task_work_run+0x72/0x90
:   [ 5195.411246]  [<ffffffff81077139>] exit_to_usermode_loop+0x91/0xc2
:   [ 5195.411494]  [<ffffffff81003a51>] prepare_exit_to_usermode+0x31/0x40
:   [ 5195.411739]  [<ffffffff815e56af>] retint_user+0x8/0x10
:
: Decoding revealed that the PMD was a valid prot_numa PMD and the bad PMD
: was a false detection.  The bug does not trigger if automatic NUMA
: balancing or transparent huge pages is disabled.
:
: The bug is due a race in change_pmd_range between a pmd_trans_huge and
: pmd_nond_or_clear_bad check without any locks held.  During the
: pmd_trans_huge check, a parallel protection update under lock can have
: cleared the PMD and filled it with a prot_numa entry between the transhuge
: check and the pmd_none_or_clear_bad check.
:
: While this could be fixed with heavy locking, it's only necessary to make
: a copy of the PMD on the stack during change_pmd_range and avoid races.  A
: new helper is created for this as the check if quite subtle and the
: existing similar helpful is not suitable.  This passed 154 hours of
: testing (usually triggers between 20 minutes and 24 hours) without
: detecting bad PMDs or corruption.  A basic test of an autonuma-intensive
: workload showed no significant change in behaviour.

Although Mel withdrew the patch on the face of LKML comment
https://lkml.org/lkml/2017/4/10/922 the race window aforementioned is
still open, and we have reports of Linpack test reporting bad residuals
after the bad PMD warning is observed.  In addition to that, bad
rss-counter and non-zero pgtables assertions are triggered on mm teardown
for the task hitting the bad PMD.

 host kernel: mm/pgtable-generic.c:40: bad pmd 00000000b3152f68(8000000d2d2008e7)
 ....
 host kernel: BUG: Bad rss-counter state mm:00000000b583043d idx:1 val:512
 host kernel: BUG: non-zero pgtables_bytes on freeing mm: 4096

The issue is observed on a v4.18-based distribution kernel, but the race
window is expected to be applicable to mainline kernels, as well.

[akpm@linux-foundation.org: fix comment typo, per Rafael]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Rafael Aquini <aquini@redhat.com>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Link: http://lkml.kernel.org/r/20200216191800.22423-1-aquini@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-11 18:03:02 +01:00
..
kasan kasan: fix shadow_size calculation error in kasan_module_alloc 2018-08-24 13:09:12 +02:00
backing-dev.c writeback: synchronize sync(2) against cgroup writeback membership switches 2019-03-05 17:58:01 +01:00
balloon_compaction.c virtio_balloon: fix deadlock on OOM 2018-10-13 09:27:30 +02:00
bootmem.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
cleancache.c
cma.c mm/cma.c: fail if fixed declaration can't be honored 2019-08-06 19:05:23 +02:00
cma.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
cma_debug.c mm/cma_debug.c: fix the break condition in cma_maxchunk_get() 2019-06-15 11:54:51 +02:00
compaction.c mm/compaction.c: clear total_{migrate,free}_scanned before scanning a new zone 2019-10-05 12:48:13 +02:00
debug.c mm: get rid of vmacache_flush_all() entirely 2018-09-19 22:43:48 +02:00
debug_page_ref.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
dmapool.c
early_ioremap.c mm/early_ioremap: Fix boot hang with earlyprintk=efi,keep 2018-02-25 11:08:03 +01:00
fadvise.c mm/fadvise.c: fix signed overflow UBSAN complaint 2018-09-15 09:45:28 +02:00
failslab.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
filemap.c mm/filemap.c: don't initiate writeback if mapping has no dirty pages 2019-11-12 19:18:46 +01:00
frame_vector.c mm/frame_vector.c: release a semaphore in 'get_vaddr_frames()' 2018-03-03 10:24:21 +01:00
frontswap.c
gup.c mm/gup.c: remove some BUG_ONs from get_gate_page() 2019-07-31 07:28:56 +02:00
highmem.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
hmm.c mm, hmm: mark hmm_devmem_{add, add_resource} EXPORT_SYMBOL_GPL 2019-01-13 10:01:02 +01:00
huge_memory.c mm, thp: fix defrag setting if newline is not used 2020-03-11 18:02:55 +01:00
hugetlb.c hugetlbfs: don't access uninitialized memmaps in pfn_range_valid_gigantic() 2019-10-29 09:17:39 +01:00
hugetlb_cgroup.c mm: hugetlb: switch to css_tryget() in hugetlb_cgroup_charge_cgroup() 2019-11-20 17:59:33 +01:00
hwpoison-inject.c
init-mm.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
internal.h vmscan: return NODE_RECLAIM_NOSCAN in node_reclaim() when CONFIG_NUMA is n 2019-12-05 15:37:51 +01:00
interval_tree.c lib/interval_tree: fast overlap detection 2017-09-08 18:26:49 -07:00
Kconfig mm/hmm: select mmu notifier when selecting HMM 2019-06-15 11:54:51 +02:00
Kconfig.debug kmemcheck: rip it out 2018-02-22 15:42:24 +01:00
khugepaged.c coredump: fix race condition between collapse_huge_page() and core dumping 2019-06-22 08:16:19 +02:00
kmemleak-test.c
kmemleak.c mm/kmemleak.c: fix check for softirq context 2019-07-31 07:28:55 +02:00
ksm.c mm/ksm.c: don't WARN if page is still mapped in remove_stable_node() 2019-12-01 09:13:14 +01:00
list_lru.c mm/list_lru.c: fix memory leak in __memcg_init_list_lru_node 2019-06-19 08:20:54 +02:00
maccess.c
madvise.c mm: madvise(MADV_DODUMP): allow hugetlbfs pages 2018-10-10 08:54:22 +02:00
Makefile kmemcheck: rip it out 2018-02-22 15:42:24 +01:00
memblock.c Revert "mm: page_alloc: skip over regions of invalid pfns where possible" 2018-03-28 18:24:39 +02:00
memcontrol.c mm: memcg: switch to css_tryget() in get_mem_cgroup_from_mm() 2019-11-20 17:59:32 +01:00
memory-failure.c mm: hwpoison: fix thp split handing in soft_offline_in_use_page() 2019-03-23 14:35:24 +01:00
memory.c mm/memory.c: fix modifying of page protection by insert_pfn() 2019-05-16 19:42:31 +02:00
memory_hotplug.c mm/memory_hotplug: don't access uninitialized memmaps in shrink_zone_span() 2019-12-01 09:14:18 +01:00
mempolicy.c mm/mempolicy.c: fix out of bounds write in mpol_parse_str() 2020-02-05 14:18:13 +00:00
mempool.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
memtest.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
migrate.c mm/migrate.c: add missing flush_dcache_page for non-mapped page migrate 2019-04-03 06:25:20 +02:00
mincore.c mm/mincore.c: make mincore() more conservative 2019-05-21 18:50:16 +02:00
mlock.c mm/mlock.c: change count_mm_mlocked_page_nr return type 2019-07-10 09:54:36 +02:00
mm_init.c
mmap.c arm64: Revert support for execute-only user mappings 2020-01-09 10:17:56 +01:00
mmu_context.c
mmu_notifier.c mm/mmu_notifier: use hlist_add_head_rcu() 2019-07-31 07:28:56 +02:00
mmzone.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mprotect.c mm, numa: fix bad pmd by atomically check for pmd_trans_huge when marking page tables prot_numa 2020-03-11 18:03:02 +01:00
mremap.c mremap: properly flush TLB before releasing the page 2018-10-20 09:48:53 +02:00
msync.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
nobootmem.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
nommu.c Merge branch 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-09-14 18:13:32 -07:00
oom_kill.c memcg, oom: don't require __GFP_FS when invoking memcg OOM killer 2019-10-05 12:48:09 +02:00
page-writeback.c mm/page-writeback.c: avoid potential division by zero in wb_min_max_ratio() 2020-01-23 08:20:32 +01:00
page_alloc.c mem-hotplug: fix node spanned pages when we have a node with only ZONE_MOVABLE 2019-06-15 11:54:51 +02:00
page_counter.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
page_ext.c mm/page_ext.c: fix an imbalance with kmemleak 2019-04-05 22:31:27 +02:00
page_idle.c mm/page_idle.c: fix oops because end_pfn is larger than max_pfn 2019-07-03 13:15:59 +02:00
page_io.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
page_isolation.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
page_owner.c mm/page_owner: don't access uninitialized memmaps when reading /proc/pagetypeinfo 2019-10-29 09:17:39 +01:00
page_poison.c page_poison: play nicely with KASAN 2019-04-05 22:31:28 +02:00
page_vma_mapped.c mm/rmap: map_pte() was not handling private ZONE_DEVICE page properly 2018-11-13 11:15:08 -08:00
pagewalk.c mm/pagewalk.c: report holes in hugetlb ranges 2017-11-24 08:37:04 +01:00
percpu-internal.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
percpu-km.c percpu: convert spin_lock_irq to spin_lock_irqsave. 2019-02-12 19:46:05 +01:00
percpu-stats.c percpu: fix starting offset for chunk statistics traversal 2017-09-27 14:45:57 -07:00
percpu-vm.c percpu: add __GFP_NORETRY semantics to the percpu balancing path 2018-04-08 14:26:29 +02:00
percpu.c percpu: do not search past bitmap when allocating an area 2019-06-15 11:54:59 +02:00
pgtable-generic.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
process_vm_access.c
quicklist.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
readahead.c readahead: stricter check for bdi io_pages 2018-09-09 19:55:53 +02:00
rmap.c mm: migration: fix migration of huge PMD shared pages 2018-10-13 09:27:22 +02:00
rodata_test.c mm: fix RODATA_TEST failure "rodata_test: test data was not read only" 2017-10-03 17:54:24 -07:00
shmem.c mm/shmem.c: thp, shmem: fix conflict of above-47bit hint address and PMD alignment 2020-01-23 08:20:32 +01:00
slab.c mm/slab.c: fix an infinite loop in leaks_show() 2019-06-15 11:54:51 +02:00
slab.h kmemcheck: stop using GFP_NOTRACK and SLAB_NOTRACK 2018-02-22 15:42:23 +01:00
slab_common.c mm: don't warn about large allocations for slab 2018-12-01 09:42:51 +01:00
slob.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
slub.c mm/slub: fix a deadlock in show_slab_objects() 2019-10-29 09:17:38 +01:00
sparse-vmemmap.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
sparse.c mm: sections are not offlined during memory hotremove 2018-05-16 10:10:27 +02:00
swap.c mm: avoid marking swap cached page as lazyfree 2017-10-03 17:54:24 -07:00
swap_cgroup.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
swap_slots.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
swap_state.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
swapfile.c mm/swap: use nr_node_ids for avail_lists in swap_info_struct 2019-01-26 09:37:06 +01:00
truncate.c mm: cleancache: fix corruption on missed inode invalidation 2018-12-08 13:03:40 +01:00
usercopy.c usercopy: Avoid HIGHMEM pfn warning 2019-10-11 18:18:34 +02:00
userfaultfd.c hugetlb: use same fault hash key for shared and private mappings 2019-05-31 06:47:12 -07:00
util.c mm: page_mapped: don't assume compound page is huge or THP 2019-01-16 22:07:11 +01:00
vmacache.c mm: get rid of vmacache_flush_all() entirely 2018-09-19 22:43:48 +02:00
vmalloc.c mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy() 2019-08-16 10:13:48 +02:00
vmpressure.c
vmscan.c mm/vmscan.c: don't round up scan size for online memory cgroup 2020-02-28 16:36:12 +01:00
vmstat.c mm/vmstat.c: fix NUMA statistics updates 2019-12-17 20:37:57 +01:00
workingset.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
z3fold.c z3fold: fix possible reclaim races 2018-12-01 09:42:54 +01:00
zbud.c
zpool.c
zsmalloc.c mm/zsmalloc.c: fix the migrated zspage statistics. 2020-01-09 10:17:54 +01:00
zswap.c zswap: re-check zswap_is_full() after do zswap_shrink() 2018-09-05 09:26:30 +02:00