linux-stable/mm
Mike Kravetz a3ccc156f3 hugetlb: use same fault hash key for shared and private mappings
commit 1b426bac66 upstream.

hugetlb uses a fault mutex hash table to prevent page faults of the
same pages concurrently.  The key for shared and private mappings is
different.  Shared keys off address_space and file index.  Private keys
off mm and virtual address.  Consider a private mappings of a populated
hugetlbfs file.  A fault will map the page from the file and if needed
do a COW to map a writable page.

Hugetlbfs hole punch uses the fault mutex to prevent mappings of file
pages.  It uses the address_space file index key.  However, private
mappings will use a different key and could race with this code to map
the file page.  This causes problems (BUG) for the page cache remove
code as it expects the page to be unmapped.  A sample stack is:

page dumped because: VM_BUG_ON_PAGE(page_mapped(page))
kernel BUG at mm/filemap.c:169!
...
RIP: 0010:unaccount_page_cache_page+0x1b8/0x200
...
Call Trace:
__delete_from_page_cache+0x39/0x220
delete_from_page_cache+0x45/0x70
remove_inode_hugepages+0x13c/0x380
? __add_to_page_cache_locked+0x162/0x380
hugetlbfs_fallocate+0x403/0x540
? _cond_resched+0x15/0x30
? __inode_security_revalidate+0x5d/0x70
? selinux_file_permission+0x100/0x130
vfs_fallocate+0x13f/0x270
ksys_fallocate+0x3c/0x80
__x64_sys_fallocate+0x1a/0x20
do_syscall_64+0x5b/0x180
entry_SYSCALL_64_after_hwframe+0x44/0xa9

There seems to be another potential COW issue/race with this approach
of different private and shared keys as noted in commit 8382d914eb
("mm, hugetlb: improve page-fault scalability").

Since every hugetlb mapping (even anon and private) is actually a file
mapping, just use the address_space index key for all mappings.  This
results in potentially more hash collisions.  However, this should not
be the common case.

Link: http://lkml.kernel.org/r/20190328234704.27083-3-mike.kravetz@oracle.com
Link: http://lkml.kernel.org/r/20190412165235.t4sscoujczfhuiyt@linux-r8p5
Fixes: b5cec28d36 ("hugetlbfs: truncate_hugepages() takes a range of pages")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-22 07:37:40 +02:00
..
kasan kernel/memremap, kasan: make ZONE_DEVICE with work with KASAN 2018-08-17 16:20:30 -07:00
backing-dev.c writeback: synchronize sync(2) against cgroup writeback membership switches 2019-03-05 17:58:50 +01:00
balloon_compaction.c
bootmem.c docs/mm: bootmem: add overview documentation 2018-08-02 12:17:27 -06:00
cleancache.c mm: use octal not symbolic permissions 2018-06-15 07:55:25 +09:00
cma.c mm/cma.c: cma_declare_contiguous: correct err handling 2019-04-05 22:32:58 +02:00
cma.h
cma_debug.c mm/cma: remove unsupported gfp_mask parameter from cma_alloc() 2018-08-17 16:20:32 -07:00
compaction.c mm: use octal not symbolic permissions 2018-06-15 07:55:25 +09:00
debug.c mm: get rid of vmacache_flush_all() entirely 2018-09-13 15:18:04 -10:00
debug_page_ref.c
dmapool.c mm: use octal not symbolic permissions 2018-06-15 07:55:25 +09:00
early_ioremap.c
fadvise.c vfs: implement readahead(2) using POSIX_FADV_WILLNEED 2018-08-30 20:01:32 +02:00
failslab.c mm: use octal not symbolic permissions 2018-06-15 07:55:25 +09:00
filemap.c mm: use new return type vm_fault_t 2018-06-07 17:34:36 -07:00
frame_vector.c
frontswap.c mm: use octal not symbolic permissions 2018-06-15 07:55:25 +09:00
gup.c mm: prevent get_user_pages() from overflowing page refcount 2019-05-04 09:20:11 +02:00
gup_benchmark.c mm/gup_benchmark: fix unsigned comparison to zero in __gup_benchmark_ioctl 2018-10-05 16:32:04 -07:00
highmem.c
hmm.c mm, hmm: mark hmm_devmem_{add, add_resource} EXPORT_SYMBOL_GPL 2019-01-13 09:51:04 +01:00
huge_memory.c mm/huge_memory: fix vmf_insert_pfn_{pmd, pud}() crash, handle unaligned addresses 2019-05-22 07:37:40 +02:00
hugetlb.c hugetlb: use same fault hash key for shared and private mappings 2019-05-22 07:37:40 +02:00
hugetlb_cgroup.c mm: rename page_counter's count/limit into usage/max 2018-06-07 17:34:35 -07:00
hwpoison-inject.c
init-mm.c mm: Allocate the mm_cpumask (mm->cpu_bitmap[]) dynamically based on nr_cpu_ids 2018-07-17 09:35:30 +02:00
internal.h mm: Change return type int to vm_fault_t for fault handlers 2018-08-23 18:48:44 -07:00
interval_tree.c
Kconfig mm: disable deferred struct page for 32-bit arches 2018-09-20 22:01:11 +02:00
Kconfig.debug mm: clarify CONFIG_PAGE_POISONING and usage 2018-08-22 10:52:44 -07:00
khugepaged.c mm/khugepaged: collapse_shmem() do not crash on Compound 2018-12-05 19:31:57 +01:00
kmemleak-test.c
kmemleak.c mm/kmemleak.c: fix unused-function warning 2019-05-08 07:21:55 +02:00
ksm.c include/linux/compiler*.h: make compiler-*.h mutually exclusive 2018-08-22 17:31:34 -07:00
list_lru.c mm/list_lru: introduce list_lru_shrink_walk_irq() 2018-08-17 16:20:32 -07:00
maccess.c mm: docs: fix parameter names mismatch 2018-02-06 18:32:48 -08:00
madvise.c mm: madvise(MADV_DODUMP): allow hugetlbfs pages 2018-10-05 16:32:05 -07:00
Makefile vfs: implement readahead(2) using POSIX_FADV_WILLNEED 2018-08-30 20:01:32 +02:00
memblock.c mm/memblock.c: replace u64 with phys_addr_t where appropriate 2018-08-17 16:20:30 -07:00
memcontrol.c mm: writeback: use exact memcg dirty counts 2019-04-17 08:38:51 +02:00
memfd.c alloc_file(): switch to passing O_... flags instead of FMODE_... mode 2018-07-12 10:02:57 -04:00
memory-failure.c mm: hwpoison: fix thp split handing in soft_offline_in_use_page() 2019-03-23 20:10:04 +01:00
memory.c mm/memory.c: fix modifying of page protection by insert_pfn() 2019-05-16 19:41:26 +02:00
memory_hotplug.c mm/memory_hotplug.c: drop memory device reference after find_memory_block() 2019-05-16 19:41:25 +02:00
mempolicy.c mm, mempolicy: fix uninit memory access 2019-04-05 22:32:58 +02:00
mempool.c mm/mempool.c: add missing parameter description 2018-08-22 10:52:44 -07:00
memtest.c
migrate.c mm/migrate.c: add missing flush_dcache_page for non-mapped page migrate 2019-04-03 06:26:28 +02:00
mincore.c mm/mincore.c: make mincore() more conservative 2019-05-22 07:37:40 +02:00
mlock.c dax: remove VM_MIXEDMAP for fsdax and device dax 2018-08-17 16:20:27 -07:00
mm_init.c mm: access zone->node via zone_to_nid() and zone_set_nid() 2018-08-22 10:52:45 -07:00
mmap.c coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping 2019-04-27 09:36:37 +02:00
mmu_context.c
mmu_notifier.c mm, oom: distinguish blockable mode for mmu notifiers 2018-08-22 10:52:44 -07:00
mmzone.c
mprotect.c x86/speculation/l1tf: Disallow non privileged high MMIO PROT_NONE mappings 2018-06-20 19:10:01 +02:00
mremap.c mremap: properly flush TLB before releasing the page 2018-10-18 11:30:52 +02:00
msync.c
nobootmem.c mm/memblock: add a name for memblock flags enumeration 2018-08-02 12:17:27 -06:00
nommu.c mm: provide a fallback for PAGE_KERNEL_EXEC for architectures 2018-08-17 16:20:29 -07:00
oom_kill.c mm,oom: don't kill global init via memory.oom.group 2019-04-05 22:32:58 +02:00
page-writeback.c mm/page-writeback.c: don't break integrity writeback on ->writepage() error 2019-01-26 09:32:43 +01:00
page_alloc.c page_poison: play nicely with KASAN 2019-04-05 22:32:59 +02:00
page_counter.c memcg: introduce memory.min 2018-06-07 17:34:36 -07:00
page_ext.c mm/page_ext.c: fix an imbalance with kmemleak 2019-04-05 22:32:58 +02:00
page_idle.c mm: use octal not symbolic permissions 2018-06-15 07:55:25 +09:00
page_io.c swap,blkcg: issue swap io with the appropriate context 2018-07-09 09:07:54 -06:00
page_isolation.c mm, migrate: remove reason argument from new_page_t 2018-04-11 10:28:32 -07:00
page_owner.c mm: use octal not symbolic permissions 2018-06-15 07:55:25 +09:00
page_poison.c page_poison: play nicely with KASAN 2019-04-05 22:32:59 +02:00
page_vma_mapped.c mm/rmap: map_pte() was not handling private ZONE_DEVICE page properly 2018-11-13 11:08:46 -08:00
pagewalk.c mm: kernel-doc: add missing parameter descriptions 2018-04-05 21:36:27 -07:00
percpu-internal.h
percpu-km.c percpu: convert spin_lock_irq to spin_lock_irqsave. 2019-02-12 19:47:12 +01:00
percpu-stats.c treewide: Use array_size() in vmalloc() 2018-06-12 16:19:22 -07:00
percpu-vm.c percpu: allow select gfp to be passed to underlying allocators 2018-02-18 05:33:01 -08:00
percpu.c percpu: stop printing kernel addresses 2019-04-27 09:36:40 +02:00
pgtable-generic.c
process_vm_access.c mm: docs: add blank lines to silence sphinx "Unexpected indentation" errors 2018-02-06 18:32:48 -08:00
quicklist.c
readahead.c vfs: implement readahead(2) using POSIX_FADV_WILLNEED 2018-08-30 20:01:32 +02:00
rmap.c mm/huge_memory: rename freeze_page() to unmap_page() 2018-12-05 19:31:57 +01:00
rodata_test.c
shmem.c tmpfs: fix uninitialized return value in shmem_link 2019-03-23 20:09:52 +01:00
slab.c slab: fix a crash by reading /proc/slab_allocators 2019-05-10 17:54:08 +02:00
slab.h mm: add support for kmem caches in DMA32 zone 2019-04-03 06:26:28 +02:00
slab_common.c mm: add support for kmem caches in DMA32 zone 2019-04-03 06:26:28 +02:00
slob.c slab: __GFP_ZERO is incompatible with a constructor 2018-06-07 17:34:34 -07:00
slub.c mm: add support for kmem caches in DMA32 zone 2019-04-03 06:26:28 +02:00
sparse-vmemmap.c mm/sparse: delete old sparse_init and enable new one 2018-08-17 16:20:32 -07:00
sparse.c mm/sparse: fix a bad comparison 2019-04-05 22:32:58 +02:00
swap.c mm: handle lru_add_drain_all for UP properly 2019-03-23 20:09:50 +01:00
swap_cgroup.c
swap_slots.c mm, swap, get_swap_pages: use entry_size instead of cluster in parameter 2018-08-22 10:52:44 -07:00
swap_state.c treewide: kvzalloc() -> kvcalloc() 2018-06-12 16:19:22 -07:00
swapfile.c mm, swap: bounds check swap_info array accesses to avoid NULL derefs 2019-04-05 22:32:58 +02:00
truncate.c mm: cleancache: fix corruption on missed inode invalidation 2018-12-05 19:32:13 +01:00
usercopy.c mm/usercopy.c: no check page span for stack objects 2019-01-16 22:04:33 +01:00
userfaultfd.c hugetlb: use same fault hash key for shared and private mappings 2019-05-22 07:37:40 +02:00
util.c mm: page_mapped: don't assume compound page is huge or THP 2019-01-16 22:04:36 +01:00
vmacache.c mm: get rid of vmacache_flush_all() entirely 2018-09-13 15:18:04 -10:00
vmalloc.c mm/vmalloc.c: fix kernel BUG at mm/vmalloc.c:512! 2019-04-05 22:32:58 +02:00
vmpressure.c mm/vmpressure.c: convert to use match_string() helper 2018-06-07 17:34:36 -07:00
vmscan.c mm: fix inactive list balancing between NUMA nodes and cgroups 2019-05-16 19:41:23 +02:00
vmstat.c mm/vmstat.c: fix /proc/vmstat format for CONFIG_DEBUG_TLBFLUSH=y CONFIG_SMP=n 2019-04-27 09:36:40 +02:00
workingset.c mm/list_lru: introduce list_lru_shrink_walk_irq() 2018-08-17 16:20:32 -07:00
z3fold.c z3fold: fix possible reclaim races 2018-12-01 09:37:33 +01:00
zbud.c mm: docs: fix parameter names mismatch 2018-02-06 18:32:48 -08:00
zpool.c mm/zpool.c: zpool_evictable: fix mismatch in parameter name and kernel-doc 2018-02-21 15:35:43 -08:00
zsmalloc.c mm/zsmalloc.c: make several functions and a struct static 2018-08-17 16:20:30 -07:00
zswap.c zswap: re-check zswap_is_full() after do zswap_shrink() 2018-07-26 19:38:03 -07:00