linux-stable/mm
Linus Torvalds 407faed92b gup: document and work around "COW can break either way" issue
commit 17839856fd upstream.

Doing a "get_user_pages()" on a copy-on-write page for reading can be
ambiguous: the page can be COW'ed at any time afterwards, and the
direction of a COW event isn't defined.

Yes, whoever writes to it will generally do the COW, but if the thread
that did the get_user_pages() unmapped the page before the write (and
that could happen due to memory pressure in addition to any outright
action), the writer could also just take over the old page instead.

End result: the get_user_pages() call might result in a page pointer
that is no longer associated with the original VM, and is associated
with - and controlled by - another VM having taken it over instead.

So when doing a get_user_pages() on a COW mapping, the only really safe
thing to do would be to break the COW when getting the page, even when
only getting it for reading.

At the same time, some users simply don't even care.

For example, the perf code wants to look up the page not because it
cares about the page, but because the code simply wants to look up the
physical address of the access for informational purposes, and doesn't
really care about races when a page might be unmapped and remapped
elsewhere.

This adds logic to force a COW event by setting FOLL_WRITE on any
copy-on-write mapping when FOLL_GET (or FOLL_PIN) is used to get a page
pointer as a result.

The current semantics end up being:

 - __get_user_pages_fast(): no change. If you don't ask for a write,
   you won't break COW. You'd better know what you're doing.

 - get_user_pages_fast(): the fast-case "look it up in the page tables
   without anything getting mmap_sem" now refuses to follow a read-only
   page, since it might need COW breaking.  Which happens in the slow
   path - the fast path doesn't know if the memory might be COW or not.

 - get_user_pages() (including the slow-path fallback for gup_fast()):
   for a COW mapping, turn on FOLL_WRITE for FOLL_GET/FOLL_PIN, with
   very similar semantics to FOLL_FORCE.

If it turns out that we want finer granularity (ie "only break COW when
it might actually matter" - things like the zero page are special and
don't need to be broken) we might need to push these semantics deeper
into the lookup fault path.  So if people care enough, it's possible
that we might end up adding a new internal FOLL_BREAK_COW flag to go
with the internal FOLL_COW flag we already have for tracking "I had a
COW".

Alternatively, if it turns out that different callers might want to
explicitly control the forced COW break behavior, we might even want to
make such a flag visible to the users of get_user_pages() instead of
using the above default semantics.

But for now, this is mostly commentary on the issue (this commit message
being a lot bigger than the patch, and that patch in turn is almost all
comments), with that minimal "enable COW breaking early" logic using the
existing FOLL_WRITE behavior.

[ It might be worth noting that we've always had this ambiguity, and it
  could arguably be seen as a user-space issue.

  You only get private COW mappings that could break either way in
  situations where user space is doing cooperative things (ie fork()
  before an execve() etc), but it _is_ surprising and very subtle, and
  fork() is supposed to give you independent address spaces.

  So let's treat this as a kernel issue and make the semantics of
  get_user_pages() easier to understand. Note that obviously a true
  shared mapping will still get a page that can change under us, so this
  does _not_ mean that get_user_pages() somehow returns any "stable"
  page ]

[surenb: backport notes
	Replaced (gup_flags | FOLL_WRITE) with write=1 in gup_pgd_range.
	Removed FOLL_PIN usage in should_force_cow_break since it's missing in
	the earlier kernels.]

Reported-by: Jann Horn <jannh@google.com>
Tested-by: Christoph Hellwig <hch@lst.de>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Kirill Shutemov <kirill@shutemov.name>
Acked-by: Jan Kara <jack@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[surenb: backport to 4.14 kernel]
Cc: stable@vger.kernel.org # 4.14.x
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-28 12:08:42 +02:00
..
kasan kasan: fix shadow_size calculation error in kasan_module_alloc 2018-08-24 13:09:12 +02:00
backing-dev.c memcg: fix a crash in wb_workfn when a device disappears 2021-02-23 14:00:30 +01:00
balloon_compaction.c virtio_balloon: fix deadlock on OOM 2018-10-13 09:27:30 +02:00
bootmem.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
cleancache.c
cma.c mm/cma.c: fail if fixed declaration can't be honored 2019-08-06 19:05:23 +02:00
cma.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
cma_debug.c mm/cma_debug.c: fix the break condition in cma_maxchunk_get() 2019-06-15 11:54:51 +02:00
compaction.c mm/compaction.c: clear total_{migrate,free}_scanned before scanning a new zone 2019-10-05 12:48:13 +02:00
debug.c mm: get rid of vmacache_flush_all() entirely 2018-09-19 22:43:48 +02:00
debug_page_ref.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
dmapool.c
early_ioremap.c mm/early_ioremap: Fix boot hang with earlyprintk=efi,keep 2018-02-25 11:08:03 +01:00
fadvise.c mm/fadvise.c: fix signed overflow UBSAN complaint 2018-09-15 09:45:28 +02:00
failslab.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
filemap.c mm/filemap.c: clear page error before actual read 2020-10-01 13:12:40 +02:00
frame_vector.c mm/frame_vector.c: release a semaphore in 'get_vaddr_frames()' 2018-03-03 10:24:21 +01:00
frontswap.c
gup.c gup: document and work around "COW can break either way" issue 2021-04-28 12:08:42 +02:00
highmem.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
hmm.c mm, hmm: mark hmm_devmem_{add, add_resource} EXPORT_SYMBOL_GPL 2019-01-13 10:01:02 +01:00
huge_memory.c gup: document and work around "COW can break either way" issue 2021-04-28 12:08:42 +02:00
hugetlb.c mm/hugetlb.c: fix unnecessary address expansion of pmd sharing 2021-03-07 11:27:43 +01:00
hugetlb_cgroup.c mm: hugetlb: switch to css_tryget() in hugetlb_cgroup_charge_cgroup() 2019-11-20 17:59:33 +01:00
hwpoison-inject.c
init-mm.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
internal.h vmscan: return NODE_RECLAIM_NOSCAN in node_reclaim() when CONFIG_NUMA is n 2019-12-05 15:37:51 +01:00
interval_tree.c lib/interval_tree: fast overlap detection 2017-09-08 18:26:49 -07:00
Kconfig mm/hmm: select mmu notifier when selecting HMM 2019-06-15 11:54:51 +02:00
Kconfig.debug kmemcheck: rip it out 2018-02-22 15:42:24 +01:00
khugepaged.c mm: khugepaged: recalculate min_free_kbytes after memory hotplug as expected by khugepaged 2020-10-14 09:51:14 +02:00
kmemleak-test.c
kmemleak.c mm/kmemleak.c: use address-of operator on section symbols 2020-10-01 13:12:40 +02:00
ksm.c mm/ksm: fix NULL pointer dereference when KSM zero page is enabled 2020-05-02 17:24:22 +02:00
list_lru.c mm/list_lru.c: fix memory leak in __memcg_init_list_lru_node 2019-06-19 08:20:54 +02:00
maccess.c uaccess: Add non-pagefault user-space write function 2020-09-09 19:03:11 +02:00
madvise.c mm: madvise(MADV_DODUMP): allow hugetlbfs pages 2018-10-10 08:54:22 +02:00
Makefile kmemcheck: rip it out 2018-02-22 15:42:24 +01:00
memblock.c memblock: do not start bottom-up allocations with kernel_end 2021-02-23 14:00:32 +01:00
memcontrol.c mm: writeback: use exact memcg dirty counts 2021-04-07 12:47:03 +02:00
memory-failure.c mm: hwpoison: fix thp split handing in soft_offline_in_use_page() 2019-03-23 14:35:24 +01:00
memory.c mm: fix race by making init_zero_pfn() early_initcall 2021-04-07 12:47:02 +02:00
memory_hotplug.c mm/memory_hotplug: don't access uninitialized memmaps in shrink_zone_span() 2019-12-01 09:14:18 +01:00
mempolicy.c mm: mempolicy: fix potential pte_unmap_unlock pte error 2020-11-18 18:27:52 +01:00
mempool.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
memtest.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
migrate.c mm/migrate.c: add missing flush_dcache_page for non-mapped page migrate 2019-04-03 06:25:20 +02:00
mincore.c mm/mincore.c: make mincore() more conservative 2019-05-21 18:50:16 +02:00
mlock.c mm/mlock.c: change count_mm_mlocked_page_nr return type 2019-07-10 09:54:36 +02:00
mm_init.c
mmap.c mm/mmap.c: initialize align_offset explicitly for vm_unmapped_area 2020-10-01 13:12:41 +02:00
mmu_context.c
mmu_notifier.c mm/mmu_notifier: use hlist_add_head_rcu() 2019-07-31 07:28:56 +02:00
mmzone.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mprotect.c mm, numa: fix bad pmd by atomically check for pmd_trans_huge when marking page tables prot_numa 2020-03-11 18:03:02 +01:00
mremap.c mm: Fix mremap not considering huge pmd devmap 2020-06-11 09:22:57 +02:00
msync.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
nobootmem.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
nommu.c x86/mm: split vmalloc_sync_all() 2020-04-02 16:34:20 +02:00
oom_kill.c mm: fix oom_kill event handling 2021-04-07 12:47:03 +02:00
page-writeback.c mm/page-writeback.c: avoid potential division by zero in wb_min_max_ratio() 2020-01-23 08:20:32 +01:00
page_alloc.c mm: khugepaged: recalculate min_free_kbytes after memory hotplug as expected by khugepaged 2020-10-14 09:51:14 +02:00
page_counter.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
page_ext.c mm/page_ext.c: fix an imbalance with kmemleak 2019-04-05 22:31:27 +02:00
page_idle.c mm/page_idle.c: fix oops because end_pfn is larger than max_pfn 2019-07-03 13:15:59 +02:00
page_io.c swap: fix swapfile read/write offset 2021-03-07 11:27:46 +01:00
page_isolation.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
page_owner.c mm/page_owner.c: remove drain_all_pages from init_early_allocated_pages 2020-07-31 16:44:45 +02:00
page_poison.c page_poison: play nicely with KASAN 2019-04-05 22:31:28 +02:00
page_vma_mapped.c mm/rmap: map_pte() was not handling private ZONE_DEVICE page properly 2018-11-13 11:15:08 -08:00
pagewalk.c mm: pagewalk: fix termination condition in walk_pte_range() 2020-10-01 13:12:32 +02:00
percpu-internal.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
percpu-km.c percpu: convert spin_lock_irq to spin_lock_irqsave. 2019-02-12 19:46:05 +01:00
percpu-stats.c percpu: fix starting offset for chunk statistics traversal 2017-09-27 14:45:57 -07:00
percpu-vm.c percpu: add __GFP_NORETRY semantics to the percpu balancing path 2018-04-08 14:26:29 +02:00
percpu.c percpu: fix first chunk size calculation for populated bitmap 2020-09-23 10:46:36 +02:00
pgtable-generic.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
process_vm_access.c
quicklist.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
readahead.c readahead: stricter check for bdi io_pages 2018-09-09 19:55:53 +02:00
rmap.c mm: migration: fix migration of huge PMD shared pages 2018-10-13 09:27:22 +02:00
rodata_test.c mm: fix RODATA_TEST failure "rodata_test: test data was not read only" 2017-10-03 17:54:24 -07:00
shmem.c shmem: fix possible deadlocks on shmlock_user_lock 2020-05-20 08:17:04 +02:00
slab.c mm/slab.c: fix an infinite loop in leaks_show() 2019-06-15 11:54:51 +02:00
slab.h kmemcheck: stop using GFP_NOTRACK and SLAB_NOTRACK 2018-02-22 15:42:23 +01:00
slab_common.c mm/slab: use memzero_explicit() in kzfree() 2020-06-30 15:38:08 -04:00
slob.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
slub.c Revert "mm, slub: consider rest of partial list if acquire_slab() fails" 2021-03-17 16:34:28 +01:00
sparse-vmemmap.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
sparse.c mm: sections are not offlined during memory hotremove 2018-05-16 10:10:27 +02:00
swap.c mm: avoid marking swap cached page as lazyfree 2017-10-03 17:54:24 -07:00
swap_cgroup.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
swap_slots.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
swap_state.c mm/swap_state: fix a data race in swapin_nr_pages 2020-10-01 13:12:46 +02:00
swapfile.c swap: fix swapfile read/write offset 2021-03-07 11:27:46 +01:00
truncate.c mm: cleancache: fix corruption on missed inode invalidation 2018-12-08 13:03:40 +01:00
usercopy.c usercopy: Avoid HIGHMEM pfn warning 2019-10-11 18:18:34 +02:00
userfaultfd.c hugetlb: use same fault hash key for shared and private mappings 2019-05-31 06:47:12 -07:00
util.c mm: add kvfree_sensitive() for freeing sensitive data objects 2020-06-20 10:24:59 +02:00
vmacache.c mm: get rid of vmacache_flush_all() entirely 2018-09-19 22:43:48 +02:00
vmalloc.c mm/vmalloc.c: don't dereference possible NULL pointer in __vunmap() 2020-06-03 08:18:11 +02:00
vmpressure.c mm, vmpressure: pass-through notification support 2017-07-10 16:32:31 -07:00
vmscan.c mm: memcg: make sure memory.events is uptodate when waking pollers 2021-04-07 12:47:03 +02:00
vmstat.c mm/vmstat.c: fix NUMA statistics updates 2019-12-17 20:37:57 +01:00
workingset.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
z3fold.c z3fold: fix possible reclaim races 2018-12-01 09:42:54 +01:00
zbud.c
zpool.c
zsmalloc.c zsmalloc: account the number of compacted pages correctly 2021-03-07 11:27:46 +01:00
zswap.c zswap: re-check zswap_is_full() after do zswap_shrink() 2018-09-05 09:26:30 +02:00