linux-stable/mm
Nadav Amit 6ce64428d6 mm/userfaultfd: fix memory corruption due to writeprotect
Userfaultfd self-test fails occasionally, indicating a memory corruption.

Analyzing this problem indicates that there is a real bug since mmap_lock
is only taken for read in mwriteprotect_range() and defers flushes, and
since there is insufficient consideration of concurrent deferred TLB
flushes in wp_page_copy().  Although the PTE is flushed from the TLBs in
wp_page_copy(), this flush takes place after the copy has already been
performed, and therefore changes of the page are possible between the time
of the copy and the time in which the PTE is flushed.

To make matters worse, memory-unprotection using userfaultfd also poses a
problem.  Although memory unprotection is logically a promotion of PTE
permissions, and therefore should not require a TLB flush, the current
userrfaultfd code might actually cause a demotion of the architectural PTE
permission: when userfaultfd_writeprotect() unprotects memory region, it
unintentionally *clears* the RW-bit if it was already set.  Note that this
unprotecting a PTE that is not write-protected is a valid use-case: the
userfaultfd monitor might ask to unprotect a region that holds both
write-protected and write-unprotected PTEs.

The scenario that happens in selftests/vm/userfaultfd is as follows:

cpu0				cpu1			cpu2
----				----			----
							[ Writable PTE
							  cached in TLB ]
userfaultfd_writeprotect()
[ write-*unprotect* ]
mwriteprotect_range()
mmap_read_lock()
change_protection()

change_protection_range()
...
change_pte_range()
[ *clear* “write”-bit ]
[ defer TLB flushes ]
				[ page-fault ]
				...
				wp_page_copy()
				 cow_user_page()
				  [ copy page ]
							[ write to old
							  page ]
				...
				 set_pte_at_notify()

A similar scenario can happen:

cpu0		cpu1		cpu2		cpu3
----		----		----		----
						[ Writable PTE
				  		  cached in TLB ]
userfaultfd_writeprotect()
[ write-protect ]
[ deferred TLB flush ]
		userfaultfd_writeprotect()
		[ write-unprotect ]
		[ deferred TLB flush]
				[ page-fault ]
				wp_page_copy()
				 cow_user_page()
				 [ copy page ]
				 ...		[ write to page ]
				set_pte_at_notify()

This race exists since commit 292924b260 ("userfaultfd: wp: apply
_PAGE_UFFD_WP bit").  Yet, as Yu Zhao pointed, these races became apparent
since commit 09854ba94c ("mm: do_wp_page() simplification") which made
wp_page_copy() more likely to take place, specifically if page_count(page)
> 1.

To resolve the aforementioned races, check whether there are pending
flushes on uffd-write-protected VMAs, and if there are, perform a flush
before doing the COW.

Further optimizations will follow to avoid during uffd-write-unprotect
unnecassary PTE write-protection and TLB flushes.

Link: https://lkml.kernel.org/r/20210304095423.3825684-1-namit@vmware.com
Fixes: 09854ba94c ("mm: do_wp_page() simplification")
Signed-off-by: Nadav Amit <namit@vmware.com>
Suggested-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Tested-by: Peter Xu <peterx@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>	[5.9+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-13 11:27:31 -08:00
..
kasan kasan: clarify that only first bug is reported in HW_TAGS 2021-02-26 09:41:03 -08:00
kfence kfence: fix reports if constant function prefixes exist 2021-03-13 11:27:30 -08:00
backing-dev.c mm/backing-dev.c: use might_alloc() 2021-02-26 09:41:01 -08:00
balloon_compaction.c
cleancache.c
cma.c mm: cma: print region name on failure 2021-02-26 09:41:00 -08:00
cma.h mm: cma: use CMA_MAX_NAME to define the length of cma name array 2020-09-01 09:19:43 +02:00
cma_debug.c debugfs: make sure we can remove u32_array files cleanly 2020-07-10 13:54:00 -07:00
compaction.c mm, compaction: make fast_isolate_freepages() stay within zone 2021-02-24 13:38:34 -08:00
debug.c mm/debug: improve memcg debugging 2021-02-24 13:38:27 -08:00
debug_page_ref.c
debug_vm_pgtable.c mm/debug_vm_pgtable/basic: iterate over entire protection_map[] 2021-02-24 13:38:27 -08:00
dmapool.c mm/dmapool: use might_alloc() 2021-02-26 09:41:01 -08:00
early_ioremap.c mm/early_ioremap.c: use __func__ instead of function name 2021-02-26 09:41:02 -08:00
fadvise.c mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED 2020-10-13 18:38:29 -07:00
failslab.c
filemap.c mm: pass pvec directly to find_get_entries 2021-02-26 09:40:59 -08:00
frontswap.c mm/frontswap: mark various intentional data races 2020-08-14 19:56:56 -07:00
gup.c mm/hugetlb: grab head page refcount once for group of subpages 2021-02-24 13:38:32 -08:00
gup_test.c mm/gup_test.c: mark gup_test_init as __init function 2020-12-15 12:13:38 -08:00
gup_test.h selftests/vm: gup_test: introduce the dump_pages() sub-test 2020-12-15 12:13:38 -08:00
highmem.c mm/highmem.c: fix zero_user_segments() with start > end 2021-03-13 11:27:30 -08:00
hmm.c mm: do page fault accounting in handle_mm_fault 2020-08-12 10:58:02 -07:00
huge_memory.c mm: introduce page_needs_cow_for_dma() for deciding whether cow 2021-03-13 11:27:30 -08:00
hugetlb.c hugetlb: do early cow when page pinned on src mm 2021-03-13 11:27:30 -08:00
hugetlb_cgroup.c hugetlb_cgroup: use helper pages_per_huge_page() in hugetlb_cgroup 2021-02-24 13:38:33 -08:00
hwpoison-inject.c mm,hwpoison-inject: don't pin for hwpoison_filter 2020-10-16 11:11:16 -07:00
init-mm.c mm/gup: prevent gup_fast from racing with COW during fork 2020-12-15 12:13:39 -08:00
internal.h mm: introduce page_needs_cow_for_dma() for deciding whether cow 2021-03-13 11:27:30 -08:00
interval_tree.c
ioremap.c mm: move p?d_alloc_track to separate header file 2020-08-07 11:33:26 -07:00
Kconfig media: videobuf2: Move frame_vector into media subsystem 2021-01-12 14:15:31 +01:00
Kconfig.debug mm, page_poison: remove CONFIG_PAGE_POISONING_ZERO 2020-12-15 12:13:46 -08:00
khugepaged.c mm,thp,shmem: make khugepaged obey tmpfs mount flags 2021-02-26 09:40:59 -08:00
kmemleak.c mm/kmemleak: rely on rcu for task stack scanning 2020-10-13 18:38:27 -07:00
ksm.c mm: cleanup kstrto*() usage 2020-12-15 12:13:47 -08:00
list_lru.c mm/list_lru.c: remove kvfree_rcu_local() 2021-02-24 13:38:30 -08:00
maccess.c uaccess: add force_uaccess_{begin,end} helpers 2020-08-12 10:57:59 -07:00
madvise.c mm/madvise: replace ptrace attach requirement for process_madvise 2021-03-13 11:27:30 -08:00
Makefile mm: add Kernel Electric-Fence infrastructure 2021-02-26 09:41:02 -08:00
mapping_dirty_helpers.c mm/mapping_dirty_helpers: enhance the kernel-doc markups 2020-12-15 12:13:41 -08:00
memblock.c memblock: remove return value of memblock_free_all() 2021-02-22 13:01:23 -08:00
memcontrol.c mm: memcontrol: fix get_active_memcg return value 2021-02-24 13:38:30 -08:00
memfd.c
memory-failure.c mm: fix memory_failure() handling of dax-namespace metadata 2021-02-26 09:41:00 -08:00
memory.c mm/userfaultfd: fix memory corruption due to writeprotect 2021-03-13 11:27:31 -08:00
memory_hotplug.c mm/memory_hotplug: prevalidate the address range being added with platform 2021-02-26 09:41:00 -08:00
mempolicy.c mm/mempolicy: use helper range_in_vma() in queue_pages_test_walk() 2021-02-24 13:38:34 -08:00
mempool.c kasan: move _RET_IP_ to inline wrappers 2021-02-24 13:38:31 -08:00
memremap.c mm/memory_hotplug: prevalidate the address range being added with platform 2021-02-26 09:41:00 -08:00
memtest.c
migrate.c mm: memcg: add swapcache stat for memcg v2 2021-02-24 13:38:29 -08:00
mincore.c inode: make init and permission helpers idmapped mount aware 2021-01-24 14:27:16 +01:00
mlock.c mm/mlock: stop counting mlocked pages when none vma is found 2021-02-26 09:41:01 -08:00
mm_init.c mm: fix fall-through warnings for Clang 2020-12-15 12:13:47 -08:00
mmap.c mm/mmap.c: remove unnecessary local variable 2021-02-24 13:38:30 -08:00
mmap_lock.c mm: mmap_lock: add tracepoints around lock acquisition 2020-12-15 12:13:41 -08:00
mmu_gather.c tlb: mmu_gather: Remove start/end arguments from tlb_gather_mmu() 2021-01-29 20:02:29 +01:00
mmu_notifier.c mm: track mmu notifiers in fs_reclaim_acquire/release 2020-12-15 12:13:41 -08:00
mmzone.c mm/lru: replace pgdat lru_lock with lruvec lock 2020-12-15 14:48:04 -08:00
mprotect.c mm/mprotect.c: optimize error detection in do_mprotect_pkey() 2021-02-24 13:38:30 -08:00
mremap.c mm: mremap: unlink anon_vmas when mremap with MREMAP_DONTUNMAP success 2021-02-24 13:38:30 -08:00
msync.c mmap locking API: use coccinelle to convert mmap_sem rwsem call sites 2020-06-09 09:39:14 -07:00
nommu.c mm/nommu: Fix return type of filemap_map_pages() 2021-01-28 14:10:31 +00:00
oom_kill.c mm, oom: fix a comment in dump_task() 2021-02-24 13:38:34 -08:00
page-writeback.c mm: make wait_on_page_writeback() wait for multiple pending writebacks 2021-01-05 11:33:00 -08:00
page_alloc.c kasan, mm: fix crash with HW_TAGS and DEBUG_PAGEALLOC 2021-03-13 11:27:31 -08:00
page_counter.c mm/page_counter: use page_counter_read in page_counter_set_max 2020-12-15 12:13:40 -08:00
page_ext.c mm: fix some spelling mistakes in comments 2020-12-15 22:46:19 -08:00
page_idle.c mm: page_idle_get_page() does not need lru_lock 2020-12-15 14:48:03 -08:00
page_io.c swap: fix swapfile read/write offset 2021-03-02 17:25:46 -07:00
page_isolation.c mm/page_isolation: do not isolate the max order page 2020-12-15 12:13:45 -08:00
page_owner.c mm/page_owner: use helper function zone_end_pfn() to get end_pfn 2021-02-24 13:38:27 -08:00
page_poison.c kasan, mm: reset tags when accessing metadata 2020-12-22 12:55:08 -08:00
page_reporting.c mm/page_reporting: use list_entry_is_head() in page_reporting_cycle() 2021-02-24 13:38:30 -08:00
page_reporting.h mm: introduce include/linux/pgtable.h 2020-06-09 09:39:13 -07:00
page_vma_mapped.c mm/page_vma_mapped.c: add colon to fix kernel-doc markups error for check_pte 2020-12-15 12:13:41 -08:00
pagewalk.c mmap locking API: convert mmap_sem comments 2020-06-09 09:39:14 -07:00
percpu-internal.h mm: memcg/percpu: account percpu memory to memory cgroups 2020-08-12 10:57:55 -07:00
percpu-km.c mm: memcg/percpu: account percpu memory to memory cgroups 2020-08-12 10:57:55 -07:00
percpu-stats.c mm: memcg/percpu: account percpu memory to memory cgroups 2020-08-12 10:57:55 -07:00
percpu-vm.c mm: memcg/percpu: account percpu memory to memory cgroups 2020-08-12 10:57:55 -07:00
percpu.c percpu: fix clang modpost section mismatch 2021-02-14 18:15:15 +00:00
pgalloc-track.h mm: move p?d_alloc_track to separate header file 2020-08-07 11:33:26 -07:00
pgtable-generic.c mm/pgtable-generic.c: optimize the VM_BUG_ON condition in pmdp_huge_clear_flush() 2021-02-24 13:38:30 -08:00
process_vm_access.c mm/process_vm_access.c: include compat.h 2021-01-12 18:12:54 -08:00
ptdump.c kasan, arm64: expand CONFIG_KASAN checks 2020-12-22 12:55:08 -08:00
readahead.c mm: use limited read-ahead to satisfy read 2020-10-17 13:49:08 -06:00
rmap.c mm/rmap: correct obsolete comment of page_get_anon_vma() 2021-02-26 09:41:01 -08:00
rodata_test.c mm/rodata_test.c: fix missing function declaration 2020-08-21 09:52:53 -07:00
shmem.c mm,shmem,thp: limit shmem THP allocations to requested zones 2021-02-26 09:40:59 -08:00
shuffle.c mm: rename page_order() to buddy_order() 2020-10-16 11:11:19 -07:00
shuffle.h mm/shuffle: remove dynamic reconfiguration 2020-08-07 11:33:29 -07:00
slab.c kfence, slab: fix cache_alloc_debugcheck_after() for bulk allocations 2021-03-13 11:27:30 -08:00
slab.h mm: memcg/slab: pre-allocate obj_cgroups for slab caches with SLAB_ACCOUNT 2021-02-24 13:38:29 -08:00
slab_common.c kasan, mm: optimize krealloc poisoning 2021-02-26 09:41:03 -08:00
slob.c mm, tracing: record slab name for kmem_cache_free() 2021-02-24 13:38:26 -08:00
slub.c Revert "mm, slub: consider rest of partial list if acquire_slab() fails" 2021-03-10 10:18:04 -08:00
sparse-vmemmap.c mm/sparse: only sub-section aligned range would be populated 2020-08-07 11:33:27 -07:00
sparse.c mm/memory_hotplug: guard more declarations by CONFIG_MEMORY_HOTPLUG 2020-10-16 11:11:18 -07:00
swap.c mm: remove pagevec_lookup_entries 2021-02-26 09:40:59 -08:00
swap_cgroup.c
swap_slots.c mm/swap_slots.c: remove redundant NULL check 2021-02-24 13:38:28 -08:00
swap_state.c mm: add FGP_ENTRY 2021-02-26 09:40:59 -08:00
swapfile.c swap: fix swapfile read/write offset 2021-03-02 17:25:46 -07:00
truncate.c mm: remove pagevec_lookup_entries 2021-02-26 09:40:59 -08:00
usercopy.c mm/usercopy.c: delete duplicated word 2020-08-12 10:57:58 -07:00
userfaultfd.c mm/vmscan: protect the workingset on anonymous LRU 2020-08-12 10:57:55 -07:00
util.c mm: Make mem_dump_obj() handle vmalloc() memory 2021-01-22 15:24:04 -08:00
vmacache.c kernel: better document the use_mm/unuse_mm API contract 2020-06-10 19:14:18 -07:00
vmalloc.c Merge branch 'for-mingo-rcu' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu 2021-02-12 12:56:55 +01:00
vmpressure.c
vmscan.c mm/vmscan: restore zone_reclaim_mode ABI 2021-02-24 13:38:34 -08:00
vmstat.c mm/vmstat.c: erase latency in vmstat_shepherd 2021-02-26 09:41:00 -08:00
workingset.c mm: workingset: clarify eviction order and distance calculation 2021-02-24 13:38:34 -08:00
z3fold.c mm: set the sleep_mapped to true for zbud and z3fold 2021-02-26 09:41:01 -08:00
zbud.c mm: set the sleep_mapped to true for zbud and z3fold 2021-02-26 09:41:01 -08:00
zpool.c mm/zswap: add the flag can_sleep_mapped 2021-02-26 09:41:01 -08:00
zsmalloc.c mm/zsmalloc.c: use page_private() to access page->private 2021-02-26 09:41:01 -08:00
zswap.c mm/zswap: add the flag can_sleep_mapped 2021-02-26 09:41:01 -08:00