linux-stable/mm
Mike Rapoport 6aeb25425d mmap: make mlock_future_check() global
Patch series "mm: introduce memfd_secret system call to create "secret" memory areas", v20.

This is an implementation of "secret" mappings backed by a file
descriptor.

The file descriptor backing secret memory mappings is created using a
dedicated memfd_secret system call The desired protection mode for the
memory is configured using flags parameter of the system call.  The mmap()
of the file descriptor created with memfd_secret() will create a "secret"
memory mapping.  The pages in that mapping will be marked as not present
in the direct map and will be present only in the page table of the owning
mm.

Although normally Linux userspace mappings are protected from other users,
such secret mappings are useful for environments where a hostile tenant is
trying to trick the kernel into giving them access to other tenants
mappings.

It's designed to provide the following protections:

* Enhanced protection (in conjunction with all the other in-kernel
  attack prevention systems) against ROP attacks.  Seceretmem makes
  "simple" ROP insufficient to perform exfiltration, which increases the
  required complexity of the attack.  Along with other protections like
  the kernel stack size limit and address space layout randomization which
  make finding gadgets is really hard, absence of any in-kernel primitive
  for accessing secret memory means the one gadget ROP attack can't work.
  Since the only way to access secret memory is to reconstruct the missing
  mapping entry, the attacker has to recover the physical page and insert
  a PTE pointing to it in the kernel and then retrieve the contents.  That
  takes at least three gadgets which is a level of difficulty beyond most
  standard attacks.

* Prevent cross-process secret userspace memory exposures.  Once the
  secret memory is allocated, the user can't accidentally pass it into the
  kernel to be transmitted somewhere.  The secreremem pages cannot be
  accessed via the direct map and they are disallowed in GUP.

* Harden against exploited kernel flaws.  In order to access secretmem,
  a kernel-side attack would need to either walk the page tables and
  create new ones, or spawn a new privileged uiserspace process to perform
  secrets exfiltration using ptrace.

In the future the secret mappings may be used as a mean to protect guest
memory in a virtual machine host.

For demonstration of secret memory usage we've created a userspace library

https://git.kernel.org/pub/scm/linux/kernel/git/jejb/secret-memory-preloader.git

that does two things: the first is act as a preloader for openssl to
redirect all the OPENSSL_malloc calls to secret memory meaning any secret
keys get automatically protected this way and the other thing it does is
expose the API to the user who needs it.  We anticipate that a lot of the
use cases would be like the openssl one: many toolkits that deal with
secret keys already have special handling for the memory to try to give
them greater protection, so this would simply be pluggable into the
toolkits without any need for user application modification.

Hiding secret memory mappings behind an anonymous file allows usage of the
page cache for tracking pages allocated for the "secret" mappings as well
as using address_space_operations for e.g.  page migration callbacks.

The anonymous file may be also used implicitly, like hugetlb files, to
implement mmap(MAP_SECRET) and use the secret memory areas with "native"
mm ABIs in the future.

Removing of the pages from the direct map may cause its fragmentation on
architectures that use large pages to map the physical memory which
affects the system performance.  However, the original Kconfig text for
CONFIG_DIRECT_GBPAGES said that gigabyte pages in the direct map "...  can
improve the kernel's performance a tiny bit ..." (commit 00d1c5e057
("x86: add gbpages switches")) and the recent report [1] showed that "...
although 1G mappings are a good default choice, there is no compelling
evidence that it must be the only choice".  Hence, it is sufficient to
have secretmem disabled by default with the ability of a system
administrator to enable it at boot time.

In addition, there is also a long term goal to improve management of the
direct map.

[1] https://lore.kernel.org/linux-mm/213b4567-46ce-f116-9cdf-bbd0c884eb3c@linux.intel.com/

This patch (of 7):

It will be used by the upcoming secret memory implementation.

Link: https://lkml.kernel.org/r/20210518072034.31572-1-rppt@kernel.org
Link: https://lkml.kernel.org/r/20210518072034.31572-2-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Palmer Dabbelt <palmerdabbelt@google.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tycho Andersen <tycho@tycho.ws>
Cc: Will Deacon <will@kernel.org>
Cc: kernel test robot <lkp@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-08 11:48:20 -07:00
..
kasan Merge branch 'akpm' (patches from Andrew) 2021-06-29 17:29:11 -07:00
kfence kfence: unconditionally use unbound work queue 2021-07-01 11:06:03 -07:00
backing-dev.c writeback, cgroup: release dying cgwbs by switching attached inodes 2021-06-29 10:53:48 -07:00
balloon_compaction.c mm: fix typos in comments 2021-05-07 00:26:35 -07:00
bootmem_info.c mm: memory_hotplug: factor out bootmem core functions to bootmem_info.c 2021-06-30 20:47:25 -07:00
cleancache.c
cma.c mm: use proper type for cma_[alloc|release] 2021-05-05 11:27:24 -07:00
cma.h mm: cma: support sysfs 2021-05-05 11:27:24 -07:00
cma_debug.c mm/cma: change cma mutex to irq safe spinlock 2021-05-05 11:27:21 -07:00
cma_sysfs.c mm: cma: support sysfs 2021-05-05 11:27:24 -07:00
compaction.c Merge branch 'akpm' (patches from Andrew) 2021-07-02 12:08:10 -07:00
debug.c mm/debug: factor PagePoisoned out of __dump_page 2021-06-29 10:53:53 -07:00
debug_page_ref.c
debug_vm_pgtable.c mm/swapops: rework swap entry manipulation code 2021-07-01 11:06:03 -07:00
dmapool.c mm/dmapool: use DEVICE_ATTR_RO macro 2021-06-29 10:53:52 -07:00
early_ioremap.c mm/early_ioremap.c: use __func__ instead of function name 2021-02-26 09:41:02 -08:00
fadvise.c
failslab.c
filemap.c Merge branch 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2021-07-03 11:30:04 -07:00
frontswap.c mm/mempool: minor coding style tweaks 2021-05-05 11:27:27 -07:00
gup.c mm/madvise: introduce MADV_POPULATE_(READ|WRITE) to prefault page tables 2021-06-30 20:47:30 -07:00
gup_test.c selftests/vm: gup_test: test faulting in kernel, and verify pinnable pages 2021-05-05 11:27:26 -07:00
gup_test.h selftests/vm: gup_test: fix test flag 2021-05-05 11:27:26 -07:00
highmem.c mm: fix typos in comments 2021-05-07 00:26:35 -07:00
hmm.c mm: device exclusive memory access 2021-07-01 11:06:03 -07:00
huge_memory.c mm/rmap: split migration into its own function 2021-07-01 11:06:03 -07:00
hugetlb.c mm/swapops: rework swap entry manipulation code 2021-07-01 11:06:03 -07:00
hugetlb_cgroup.c hugetlb: make free_huge_page irq safe 2021-05-05 11:27:22 -07:00
hugetlb_vmemmap.c mm: hugetlb: introduce CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON 2021-06-30 20:47:26 -07:00
hugetlb_vmemmap.h mm: hugetlb: introduce nr_free_vmemmap_pages in the struct hstate 2021-06-30 20:47:25 -07:00
hwpoison-inject.c
init-mm.c mm/gup: prevent gup_fast from racing with COW during fork 2020-12-15 12:13:39 -08:00
internal.h mmap: make mlock_future_check() global 2021-07-08 11:48:20 -07:00
interval_tree.c mm/interval_tree: add comments to improve code readability 2021-04-30 11:20:38 -07:00
io-mapping.c mm: add a io_mapping_map_user helper 2021-04-30 11:20:39 -07:00
ioremap.c mm/ioremap: fix iomap_max_page_shift 2021-05-14 19:41:32 -07:00
Kconfig mm: generalize ZONE_[DMA|DMA32] 2021-06-30 20:47:30 -07:00
Kconfig.debug mm, page_poison: remove CONFIG_PAGE_POISONING_ZERO 2020-12-15 12:13:46 -08:00
khugepaged.c mm, thp: relax the VM_DENYWRITE constraint on file-backed THPs 2021-06-30 20:47:29 -07:00
kmemleak.c mm/kmemleak: fix possible wrong memory scanning period 2021-06-29 10:53:47 -07:00
ksm.c mm/ksm: use vma_lookup() in find_mergeable_vma() 2021-06-29 10:53:52 -07:00
list_lru.c mm: vmscan: consolidate shrinker_maps handling code 2021-05-05 11:27:23 -07:00
maccess.c
madvise.c mm/madvise: introduce MADV_POPULATE_(READ|WRITE) to prefault page tables 2021-06-30 20:47:30 -07:00
Makefile mm: hugetlb: free the vmemmap pages associated with each HugeTLB page 2021-06-30 20:47:25 -07:00
mapping_dirty_helpers.c mm/mapping_dirty_helpers: remove double Note in kerneldoc 2021-07-01 11:06:02 -07:00
memblock.c memblock, arm: fix crashes caused by holes in the memory map 2021-07-04 12:23:05 -07:00
memcontrol.c Merge branch 'akpm' (patches from Andrew) 2021-07-02 12:08:10 -07:00
memfd.c Reimplement RLIMIT_MEMLOCK on top of ucounts 2021-04-30 14:14:02 -05:00
memory-failure.c mm: fix spelling mistakes 2021-07-01 11:06:02 -07:00
memory.c mm: device exclusive memory access 2021-07-01 11:06:03 -07:00
memory_hotplug.c mm/memory_hotplug: fix kerneldoc comment for __remove_memory 2021-07-01 11:06:02 -07:00
mempolicy.c mm/mempolicy: use unified 'nodes' for bind/interleave/prefer policies 2021-06-30 20:47:29 -07:00
mempool.c kasan: use separate (un)poison implementation for integrated init 2021-06-04 19:32:21 +01:00
memremap.c mm/memremap.c: fix improper SPDX comment style 2021-04-30 11:20:37 -07:00
memtest.c
migrate.c mm: rename migrate_pgmap_owner 2021-07-01 11:06:03 -07:00
mincore.c inode: make init and permission helpers idmapped mount aware 2021-01-24 14:27:16 +01:00
mlock.c Merge branch 'akpm' (patches from Andrew) 2021-07-02 12:08:10 -07:00
mm_init.c include/linux/page-flags-layout.h: cleanups 2021-04-30 11:20:42 -07:00
mmap.c mmap: make mlock_future_check() global 2021-07-08 11:48:20 -07:00
mmap_lock.c mm/mmap_lock: remove dead code for !CONFIG_TRACING configurations 2021-07-01 11:06:03 -07:00
mmu_gather.c mm: eliminate "expecting prototype" kernel-doc warnings 2021-04-16 16:10:36 -07:00
mmu_notifier.c mm/mmu_notifiers: ensure range_end() is paired with range_start() 2021-03-25 09:22:55 -07:00
mmzone.c mm/lru: replace pgdat lru_lock with lruvec lock 2020-12-15 14:48:04 -08:00
mprotect.c mm: device exclusive memory access 2021-07-01 11:06:03 -07:00
mremap.c mm/mremap: use vma_lookup() in vma_to_resize() 2021-06-29 10:53:52 -07:00
msync.c mm/msync: exit early when the flags is an MS_ASYNC and start < vm_start 2021-04-30 11:20:37 -07:00
nommu.c mm/nommu: unexport do_munmap() 2021-06-30 20:47:30 -07:00
oom_kill.c Merge branch 'core-rcu-2021.07.04' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu 2021-07-04 12:58:33 -07:00
page-writeback.c for-5.14/block-2021-06-29 2021-06-30 12:12:56 -07:00
page_alloc.c Merge branch 'akpm' (patches from Andrew) 2021-07-02 12:08:10 -07:00
page_counter.c mm: page_counter: mitigate consequences of a page_counter underflow 2021-04-30 11:20:38 -07:00
page_ext.c mm: replace CONFIG_FLAT_NODE_MEM_MAP with CONFIG_FLATMEM 2021-06-29 10:53:55 -07:00
page_idle.c mm: page_idle_get_page() does not need lru_lock 2020-12-15 14:48:03 -08:00
page_io.c swap: fix swapfile read/write offset 2021-03-02 17:25:46 -07:00
page_isolation.c mm/page_isolation: do not isolate the max order page 2020-12-15 12:13:45 -08:00
page_owner.c mm/page_owner: constify dump_page_owner 2021-06-29 10:53:53 -07:00
page_poison.c mm: page_poison: print page info when corruption is caught 2021-04-30 11:20:36 -07:00
page_reporting.c mm/page_reporting: allow driver to specify reporting order 2021-06-29 10:53:47 -07:00
page_reporting.h mm/page_reporting: export reporting order as module parameter 2021-06-29 10:53:47 -07:00
page_vma_mapped.c mm: device exclusive memory access 2021-07-01 11:06:03 -07:00
pagewalk.c mm: pagewalk: fix walk for hugepage tables 2021-06-29 10:53:49 -07:00
percpu-internal.h Merge branch 'for-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu 2021-07-01 17:17:24 -07:00
percpu-km.c percpu: rework memcg accounting 2021-06-05 20:43:15 +00:00
percpu-stats.c percpu: rework memcg accounting 2021-06-05 20:43:15 +00:00
percpu-vm.c Merge branch 'for-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu 2021-07-01 17:17:24 -07:00
percpu.c Merge branch 'for-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu 2021-07-01 17:17:24 -07:00
pgalloc-track.h mm: fix typos in comments 2021-05-07 00:26:35 -07:00
pgtable-generic.c mm/thp: fix __split_huge_pmd_locked() on shmem migration entry 2021-06-16 09:24:42 -07:00
process_vm_access.c mm/process_vm_access.c: remove duplicate include 2021-05-05 11:27:27 -07:00
ptdump.c mm: ptdump: fix build failure 2021-04-16 16:10:37 -07:00
readahead.c mm: Implement readahead_control pageset expansion 2021-04-23 10:14:29 +01:00
rmap.c mm: device exclusive memory access 2021-07-01 11:06:03 -07:00
rodata_test.c
shmem.c Merge branch 'akpm' (patches from Andrew) 2021-07-02 12:08:10 -07:00
shuffle.c mm: eliminate "expecting prototype" kernel-doc warnings 2021-04-16 16:10:36 -07:00
shuffle.h mm/shuffle: fix section mismatch warning 2021-05-22 15:09:07 -10:00
slab.c mm: fix typos in comments 2021-05-07 00:26:35 -07:00
slab.h Merge branch 'core-rcu-2021.07.04' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu 2021-07-04 12:58:33 -07:00
slab_common.c Merge branch 'core-rcu-2021.07.04' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu 2021-07-04 12:58:33 -07:00
slob.c mm: Don't build mm_dump_obj() on CONFIG_PRINTK=n kernels 2021-03-08 14:18:46 -08:00
slub.c mm/slub: use stackdepot to save stack trace in objects 2021-07-08 11:48:20 -07:00
sparse-vmemmap.c mm: sparsemem: split the huge PMD mapping of vmemmap pages 2021-06-30 20:47:26 -07:00
sparse.c mm: memory_hotplug: factor out bootmem core functions to bootmem_info.c 2021-06-30 20:47:25 -07:00
swap.c mm: fix typos and grammar error in comments 2021-07-01 11:06:02 -07:00
swap_cgroup.c
swap_slots.c mm/swap_slots.c: delete meaningless forward declarations 2021-06-29 10:53:49 -07:00
swap_state.c swap: check mapping_empty() for swap cache before being freed 2021-06-29 10:53:49 -07:00
swapfile.c mm: fix spelling mistakes 2021-07-01 11:06:02 -07:00
truncate.c mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page() 2021-06-16 09:24:42 -07:00
usercopy.c
userfaultfd.c userfaultfd/shmem: modify shmem_mfill_atomic_pte to use install_pte() 2021-06-30 20:47:27 -07:00
util.c Merge branch 'core-rcu-2021.07.04' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu 2021-07-04 12:58:33 -07:00
vmacache.c
vmalloc.c mm/vmalloc: include header for prototype of set_iounmap_nonlazy 2021-07-01 11:06:02 -07:00
vmpressure.c
vmscan.c mm/vmscan: remove kerneldoc-like comment from isolate_lru_pages 2021-07-01 11:06:02 -07:00
vmstat.c mm/vmstat: inline NUMA event counter updates 2021-06-29 10:53:54 -07:00
workingset.c mm: workingset: define macro WORKINGSET_SHIFT 2021-06-30 20:47:28 -07:00
z3fold.c mm/z3fold: add kerneldoc fields for z3fold_pool 2021-07-01 11:06:03 -07:00
zbud.c mm/zbud: add kerneldoc fields for zbud_pool 2021-07-01 11:06:03 -07:00
zpool.c mm: fix typos in comments 2021-05-07 00:26:35 -07:00
zsmalloc.c mm/zsmalloc.c: improve readability for async_free_zspage() 2021-07-01 11:06:02 -07:00
zswap.c mm/zswap.c: fix two bugs in zswap_writeback_entry() 2021-06-30 20:47:31 -07:00